[DEBUG] result={'visible': True, 'value': 'C:\Users\Admin\AppData\Local\Temp\gradio\b63dccf3804fc531779443a72840298a1eb10309ab2c053fc31abd99cfb7be5d\spk_1777043456.wav', '__type__': 'update'}
[DEBUG] result={'visible': True, 'value': 'C:\Users\Admin\AppData\Local\Temp\gradio\013c8d93b84bcb809b322c3b78ddc380dc040150b9c109b344fc5b6c35de67bc\spk_1777043461.wav', '__type__': 'update'}
[DEBUG] result={'visible': True, 'value': 'C:\Users\Admin\AppData\Local\Temp\gradio\104af44553e62299c7b3f9c9157dc6402e1e1d0a20211b71574c77ce14fdf4c7\spk_1777043464.wav', '__type__': 'update'}
[DEBUG] [Subtitle dubbing] channel 11: total time: 47s
speedrate
[DEBUG] [SpeedRate] Init. AudioRate=True, VideoRate=False, Rubberband=True
[DEBUG] [SpeedRate] Speed adjustment enabled; entering alignment mode.
[DEBUG] [Calc] Mode=Only Audio Line=1 | Source=3152 Dubb=2612 -> TargetV=3152 TargetA=3152
[DEBUG] [Calc] Mode=Only Audio Line=2 | Source=4016 Dubb=4331 -> TargetV=4016 TargetA=4016
[DEBUG] [Calc] Mode=Only Audio Line=3 | Source=1800 Dubb=4830
......
[DEBUG] concat_txt='D:/win-pyvideotrans-v3.99-420/tmp/24424/354bba539d/final_audio_concat.txt',filelist[0]='D:/win-pyvideotrans-v3.99-420/tmp/24424/354bba539d/silence_head_0.wav'
[DEBUG] [Audio-Concat] Final audio generated: D:/win-pyvideotrans-v3.99-420/tmp/24424/354bba539d/target.wav
[DEBUG]
==Preparing subtitles to embed: self.cfg.subtitle_type=3
=
[DEBUG] Final subtitle embedding type: 3, target subtitle language: chi, subtitle file: D:/win-pyvideotrans-v3.99-420/tmp/24424/354bba539d/shuang.srt
[DEBUG] [FFMPEG-CMD]:
ffmpeg -hide_banner -nostdin -ignore_unknown -threads 0 -y -i novoice.mp4 -vf tpad=stop_mode=clone:stop_duration=0.002 -c:v libx264 -crf 23 -preset medium -an final_video_with_freeze_lastend.mp4
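[DEBUG] Video freeze frame should be extended by 2ms; actual extension, rounded up in seconds, is 0.002s; operation succeeded.
[DEBUG] Original hw_type='h264_nvenc'
[DEBUG] Normalized hw_type='nvenc'
[DEBUG] [Attempting to run command with hardware encoding/decoding]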
-y -progress compose1777043517.652968.txt -hwaccel cuda -hwaccel_output_format cuda -i novoice.mp4 -i origin_audio.m4a -filter_complex [0:v]hwdownload,format=nv12,subtitles=filename='shuang.ass',hwupload_cuda[v_out] -map [v_out] -map 1:a -c:v h264_nvenc -c:a copy -cq 23 -preset p4 -movflags +faststart -t 26.833000 laste_target.mp4
[DEBUG] Final configuration: self.cfg=TaskCfgVTT(is_cuda=True, uuid='875e61ada6', cache_folder='D:/win-pyvideotrans-v3.99-420/tmp/24424/875e61ada6', target_dir='D:/_Output/_video_out/测试001-mp4', source_language='英语', source_language_code='en', source_sub='D:/_Output/_video_out/测试001-mp4/en.srt', source_wav='D:/win-pyvideotrans-v3.99-420/tmp/24424/875e61ada6/en.wav', source_wav_output='D:/_Output/_video_out/测试001-mp4/en.m4a', target_language='简体中文', target_language_code='zh-cn', target_sub='D:/_Output/_video_out/测试001-mp4/zh-cn.srt', target_wav='D:/win-pyvideotrans-v3.99-420/tmp/24424/875e61ada6/target.wav', target_wav_output='D:/_Output/_video_out/测试001-mp4/zh-cn.m4a', name='D:/_Output/测试001.mp4', noextname='测试001', basename='测试001.mp4', ext='mp4', dirname='D:/_Output', shound_del_name=None, translate_type=4, tts_type=11, volume='+0%', pitch='+0Hz', voice_rate='+0%', voice_role='en_000001_80_chn.wav', voice_autorate=True, video_autorate=False, remove_silent_mid=False, align_sub_audio=True, detect_language='en', recogn_type=0, model_name='large-v3-turbo', shibie_audio=None, remove_noise=False, enable_diariz=False, nums_diariz=0, rephrase=1, fix_punc=True, subtitle_language=None, app_mode='biaozhun', subtitles='', targetdir_mp4='D:/_Output/_video_out/测试001-mp4/测试001.mp4', novoice_mp4='D:/win-pyvideotrans-v3.99-420/tmp/24424/875e61ada6/novoice.mp4', is_separate=False, embed_bgm=False, instrument=None, vocal=None, back_audio='', clear_cache=True, background_music=None, subtitle_type=3, only_out_mp4=False, recogn2pass=False, output_srt=2, copysrt_rawvideo=False)
[DEBUG] [FFMPEG-CMD]:
ffmpeg -hide_banner -nostdin -ignore_unknown -threads 0 -y -fflags +genpts -i D:/_Output/测试001.mp4 -an -c:v copy novoice.mp4
[DEBUG] [recognition]__init__:kwargs={'detect_language': 'en', 'audio_file': 'D:/win-pyvideotrans-v3.99-420/tmp/24424/875e61ada6/en.wav', 'cache_folder': 'D:/win-pyvideotrans-v3.99-420/tmp/24424/875e61ada6', 'model_name': 'large-v3-turbo', 'uuid': '875e61ada6', 'is_cuda': True, 'subtitle_type': 3, 'recogn_type': 0, 'max_speakers': -1, 'llm_post': True, 'recogn2pass': False}
[DEBUG] BaseRecogn initialized
[DEBUG] Before VAD tenvad,_min_speech=1000ms,_max_speech=5000ms,_min_silence=250ms
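[DEBUG] [Ten-VAD]Fix after: VAD segmentation parameters: threshold=0.5,min_speech_duration_ms=1000ms,max_speech_duration_ms=5000ms,min_silent_duration_ms=250ms
[DEBUG] [Ten-VAD]Audio energy: 2911.925941262958, adjusted threshold: 0.5
[DEBUG] [Ten-VAD]Segmentation took 0s
[DEBUG] [Ten-VAD]Segmentation and merging took 0s in total
[DEBUG] In faster-whisper mode, the audio is pre-segmented with VAD, and the text returned by the large-v3-turbo model is used directly
[DEBUG] [Speech recognition] channel 0, large-v3-turbo: total time: 9s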
2026-04-24 23:21:53,216 - modelscope - WARNING - We can not confirm the cached file is for revision: master