#5270 TaskCfgVTT(uuid='d5ed0e4801', name='E:/01-Zwsoft/ZW3D产品管理/13_产品发布/Video/New Feature Video/2027_ZW3D Video Cam/3X Milling

31.22* Posted at: 2 days ago

Batch size mismatch: audio=8, context=0:Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 559, in qwen3asr_fun
File "torch\utils\_contextlib.py", line 116, in decorate_context

return func(*args, **kwargs)

File "D:\videotrans\_internal\qwen_asr\inference\qwen3_asr.py", line 345, in transcribe

raise ValueError(f"Batch size mismatch: audio={n}, context={len(ctxs)}")

ValueError: Batch size mismatch: audio=8, context=0
[Qwen-ASR(本地内置), Google(免费), Edge-TTS(免费)]
Traceback (most recent call last):
File "videotrans\task\only_one.py", line 47, in run
File "videotrans\task\trans_create.py", line 317, in recogn
File "videotrans\recognition\__init__.py", line 190, in run
File "videotrans\recognition\_base.py", line 94, in run
File "videotrans\recognition\_qwenasrlocal.py", line 45, in _exec
File "videotrans\configure\base.py", line 268, in _new_process
videotrans.configure.excepts.VideoTransError: Batch size mismatch: audio=8, context=0:Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 559, in qwen3asr_fun
File "torch\utils\_contextlib.py", line 116, in decorate_context

return func(*args, **kwargs)

File "D:\videotrans\_internal\qwen_asr\inference\qwen3_asr.py", line 345, in transcribe

raise ValueError(f"Batch size mismatch: audio={n}, context={len(ctxs)}")

ValueError: Batch size mismatch: audio=8, context=0
TaskCfgVTT(uuid='d5ed0e4801', name='E:/01-Zwsoft/ZW3D产品管理/13_产品发布/Video/New Feature Video/2027_ZW3D Video Cam/3X Milling Improvements_ENG/3X Milling Improvements_ENG_Finished.mp4', dirname='E:/01-Zwsoft/ZW3D产品管理/13_产品发布/Video/New Feature Video/2027_ZW3D Video Cam/3X Milling Improvements_ENG', noextname='3X Milling Improvements_ENG_Finished', basename='3X Milling Improvements_ENG_Finished.mp4', ext='mp4', target_dir='3X Milling Improvements_ENG_Finished-mp4', cache_folder='D:/videotrans/tmp/22188/d5ed0e4801', is_cuda=True, source_language='英语', source_language_code='en', source_sub='3X Milling Improvements_ENG_Finished-mp4/en.srt', source_wav='D:/videotrans/tmp/22188/d5ed0e4801/en.wav', source_wav_output='3X Milling Improvements_ENG_Finished-mp4/en.m4a', target_language='简体中文', target_language_code='zh-cn', target_sub='3X Milling Improvements_ENG_Finished-mp4/zh-cn.srt', target_wav='D:/videotrans/tmp/22188/d5ed0e4801/target.wav', target_wav_output='3X Milling Improvements_ENG_Finished-mp4/zh-cn.m4a', translate_type=0, tts_type=0, volume='+0%', pitch='+0Hz', voice_rate='+0%', voice_role='No', voice_autorate=True, video_autorate=False, remove_silent_mid=False, align_sub_audio=True, detect_language='en', recogn_type=2, model_name='1.7B', shibie_audio=None, remove_noise=False, enable_diariz=False, nums_diariz=0, rephrase=0, fix_punc=0, subtitle_language=None, app_mode='biaozhun', subtitles='', targetdir_mp4='3X Milling Improvements_ENG_Finished-mp4/3X Milling Improvements_ENG_Finished.mp4', novoice_mp4='D:/videotrans/tmp/22188/d5ed0e4801/novoice.mp4', is_separate=False, embed_bgm=True, instrument='D:/videotrans/tmp/22188/d5ed0e4801/instrument.wav', vocal='D:/videotrans/tmp/22188/d5ed0e4801/vocal.wav', clear_cache=True, background_music='', subtitle_type=1, only_out_mp4=False, recogn2pass=False, output_srt=2, copysrt_rawvideo=False, loop_backaudio=1, backaudio_volume=0.8)


=
system:Windows-10-10.0.22631-SP0
version:v4.02
frozen:True
language:zh
root_dir:D:/videotrans

1 Reply

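The error Batch size mismatch: audio=8, context=0 indicates that during recognition with Qwen-ASR (local), the audio was split into several segments (8 here), but the corresponding context (such as a language hint or speaker information) was empty (0), so the two batch sizes did not match.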
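To make the failure mode concrete, here is a minimal sketch of the kind of length check that produces this message. It is not the actual qwen_asr source: only the `raise` line is taken from the traceback above, while `normalize_contexts`, its parameters, and the broadcasting behaviour are hypothetical and shown purely for illustration.

```python
from typing import List, Sequence, Union


def normalize_contexts(
    n_audio: int,
    context: Union[None, str, Sequence[str]] = None,
) -> List[str]:
    """Return one context string per audio chunk (hypothetical helper)."""
    if context is None:
        # Assumption: a missing context is treated as "no hint" for every chunk.
        ctxs = [""] * n_audio
    elif isinstance(context, str):
        # Assumption: a single string is broadcast to every chunk.
        ctxs = [context] * n_audio
    else:
        ctxs = list(context)

    n = n_audio
    if len(ctxs) != n:
        # Same message format as the raise statement in the traceback.
        raise ValueError(f"Batch size mismatch: audio={n}, context={len(ctxs)}")
    return ctxs
```

Under these assumptions, calling `normalize_contexts(8, [])` reproduces the reported message, because an explicitly empty list is neither `None` nor a single string and so is never broadcast to the eight audio chunks.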

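Root causes

  1. Audio preprocessing anomaly: the input audio may be zero-length or nearly so, or may contain only silence or noise. In that case VAD (voice activity detection) either fails to extract valid speech segments or extracts segments that are extremely short, and the model's internal context generation fails.
  2. Improper VAD parameters: if you previously changed "minimum speech duration (ms)" or "maximum speech duration (s)" in the advanced options, the extracted audio segments may be invalid (for example, too short).
  3. A bug in the model or the channel itself: the error is raised inside the Qwen-ASR model code and may be related to a specific environment or to input-format compatibility.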
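The first cause can be checked without the application. The sketch below uses only the Python standard library to report the duration and RMS level of a WAV file, for example an audio track extracted from the video. It assumes 16-bit PCM input, and the `min_seconds` and `min_rms` thresholds are arbitrary illustrative values, not settings used by the software.

```python
import array
import math
import sys
import wave


def inspect_wav(path: str) -> dict:
    """Return duration (seconds) and RMS level of a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wf.getframerate()
        n_frames = wf.getnframes()
        samples = array.array("h", wf.readframes(n_frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV samples are stored little-endian
    duration = n_frames / rate if rate else 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
    return {"duration_s": duration, "rms": rms}


def looks_usable(info: dict, min_seconds: float = 3.0, min_rms: float = 100.0) -> bool:
    """Rough heuristic: long enough and not (near-)silent. Thresholds are assumptions."""
    return info["duration_s"] >= min_seconds and info["rms"] >= min_rms
```

A file that is very short or has an RMS near zero is a likely candidate for the empty-segment situation described in cause 1. A file that passes this check can still fail for the other two reasons.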

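Solutions

  1. Check the audio of the original video: use a media player to confirm that the video contains clear, continuous speech lasting at least several seconds.
  2. Switch the speech recognition channel: temporarily use faster-whisper (local) with the large-v3 model (the best choice for English recognition) or openai-whisper (local) to bypass this channel's compatibility problem.
  3. Adjust the VAD parameters (if you have changed them):

    • Go to Menu → Tools → Advanced Options → Speech Recognition Parameters and make sure "minimum speech duration (ms)" is at its default value (usually 1000) and "maximum speech duration (s)" is at its default value (30).
    • If you have not changed them, you can try restoring the default settings: back up videotrans/cfg.json, delete it, and restart the software.
  4. Clear the cache and retry: tick "Clear generated" on the main window, then run the task again.
  5. If it still fails: please provide more surrounding context (at least 30 lines) from the latest log file in the logs/ folder under the software installation directory, so the problem can be narrowed down further.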
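For step 5, the following is a rough sketch of how the requested log context could be collected. It assumes the log files are UTF-8 text and that the newest one can be identified by modification time. `latest_log` and `error_context` are hypothetical helper names, and the `logs/` location comes from the step above.

```python
from pathlib import Path
from typing import List, Optional


def latest_log(logs_dir: str) -> Optional[Path]:
    """Return the most recently modified file in logs_dir, or None if it is empty."""
    files = [p for p in Path(logs_dir).iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime) if files else None


def error_context(
    log_path: Path,
    needle: str = "Batch size mismatch",
    radius: int = 30,
) -> List[str]:
    """Return up to `radius` lines before and after the last line containing `needle`."""
    lines = log_path.read_text(encoding="utf-8", errors="replace").splitlines()
    hits = [i for i, line in enumerate(lines) if needle in line]
    if not hits:
        # No match: fall back to the tail of the file.
        return lines[-radius:]
    last = hits[-1]
    return lines[max(0, last - radius): last + radius + 1]
```

Given the `root_dir` reported above, `latest_log("D:/videotrans/logs")` followed by `"\n".join(error_context(path))` would produce a block of text that can be pasted into a reply.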

