#5233 TaskCfgVTT(uuid='982d1065b7', name='E:/55.mp4', dirname='E:/', noextname='55', basename='55.mp4', ext='mp4', target_dir=

120.217* Posted at: 1 day ago

Expected parameter logits (Tensor of shape (1, 51864)) of distribution Categorical(logits: torch.Size([1, 51864])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0'):Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 102, in openai_whisper
File "whisper\transcribe.py", line 295, in transcribe
File "whisper\transcribe.py", line 201, in decode_with_fallback
File "torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
File "whisper\decoding.py", line 824, in decode
File "torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
File "whisper\decoding.py", line 737, in run
File "whisper\decoding.py", line 703, in _main_loop
File "whisper\decoding.py", line 283, in update
File "torch\distributions\categorical.py", line 73, in __init__
    super().__init__(batch_sh
......
"torch\distributions\distribution.py", line 72, in __init__
    raise ValueError(

ValueError: Expected parameter logits (Tensor of shape (1, 51864)) of distribution Categorical(logits: torch.Size([1, 51864])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')
[openai-whisper(本地内置), 微软(免费), Edge-TTS(免费)]
Traceback (most recent call last):
File "videotrans\task\only_one.py", line 47, in run
File "videotrans\task\trans_create.py", line 322, in recogn
File "videotrans\recognition\__init__.py", line 190, in run
File "videotrans\recognition\_base.py", line 94, in run
File "videotrans\recognition\_whisper.py", line 35, in _exec
File "videotrans\recognition\_whisper.py", line 84, in _openai
File "videotrans\configure\base.py", line 258, in _new_process
videotrans.configure.excepts.VideoTransError: Expected parameter logits (Tensor of shape (1, 51864)) of distribution Categorical(logits: torch.Size([1, 51864])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0'):Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 102, in openai_whisper
File "whisper\transcribe.py", line 295, in transcribe
File "whisper\transcribe.py", line 201, in decode_with_fallback
File "torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
File "whisper\decoding.py", line 824, in decode
File "torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
File "whisper\decoding.py", line 737, in run
File "whisper\decoding.py", line 703, in _main_loop
File "whisper\decoding.py", line 283, in update
File "torch\distributions\categorical.py", line 73, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
File "torch\distributions\distribution.py", line 72, in __init__
    raise ValueError(

ValueError: Expected parameter logits (Tensor of shape (1, 51864)) of distribution Categorical(logits: torch.Size([1, 51864])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')
TaskCfgVTT(uuid='982d1065b7', name='E:/55.mp4', dirname='E:/', noextname='55', basename='55.mp4', ext='mp4', target_dir='E:/_video_out/55-mp4', cache_folder='D:/My Softwares/视频翻译配音 Pyvideotran 3.98/tmp/13864/982d1065b7', is_cuda=True, source_language='英语', source_language_code='en', source_sub='E:/_video_out/55-mp4/en.srt', source_wav='D:/My Softwares/视频翻译配音 Pyvideotran 3.98/tmp/13864/982d1065b7/en.wav', source_wav_output='E:/_video_out/55-mp4/en.m4a', target_language='简体中文', target_language_code='zh-cn', target_sub='E:/_video_out/55-mp4/zh-cn.srt', target_wav='D:/My Softwares/视频翻译配音 Pyvideotran 3.98/tmp/13864/982d1065b7/target.wav', target_wav_output='E:/_video_out/55-mp4/zh-cn.m4a', translate_type=1, tts_type=0, volume='+0%', pitch='+0Hz', voice_rate='+0%', voice_role='No', voice_autorate=True, video_autorate=False, remove_silent_mid=False, align_sub_audio=True, detect_language='en', recogn_type=1, model_name='medium.en', shibie_audio=None, remove_noise=False, enable_diariz=False, nums_diariz=0, rephrase=0, fix_punc=0, subtitle_language=None, app_mode='biaozhun', subtitles='', targetdir_mp4='E:/_video_out/55-mp4/55.mp4', novoice_mp4='D:/My Softwares/视频翻译配音 Pyvideotran 3.98/tmp/13864/982d1065b7/novoice.mp4', is_separate=False, embed_bgm=True, instrument=None, vocal='D:/My Softwares/视频翻译配音 Pyvideotran 3.98/tmp/13864/982d1065b7/vocal.wav', clear_cache=True, background_music='', subtitle_type=1, only_out_mp4=False, recogn2pass=False, output_srt=0, copysrt_rawvideo=False, loop_backaudio=1, backaudio_volume=0.8)


=
system:Windows-10-10.0.26200-SP0
version:v4.02
frozen:True
language:zh
root_dir:D:/My Softwares/视频翻译配音 Pyvideotran 3.98

3 Replies

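This is a NaN crash during openai-whisper (local) decoding, caused by insufficient GPU memory or numerical instability in the model. It commonly occurs with the medium.en model when VRAM is tight.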

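Solutions (try in order):

  1. Switch to faster-whisper (local): in the speech recognition channel, select faster-whisper (local) and use the large-v3 model. It is more stable and manages VRAM better.
  2. Reduce VRAM usage

    • If you keep using openai-whisper, go to Menu -> Tools -> Advanced Options -> Speech Recognition Parameters, set beam_size and best_of both to 1, and change the compute data type to float16 or int8.
  3. Check the CUDA environment: confirm that CUDA 12.8+ and cuDNN 9.11+ are installed and that the GPU driver is up to date.
  4. Use a smaller model: in openai-whisper, change the model to small or base (for English-only audio, small.en can be used).
  5. Try CPU mode: uncheck CUDA acceleration and run once on the CPU to verify whether the fault is GPU-related.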

If none of the above resolves the problem, please provide the latest log file from the logs/ folder for further troubleshooting.
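For anyone scripting this outside the GUI, the "try in order" idea can be sketched as a small retry loop. This is only an illustration, not pyvideotrans code: `FALLBACK_CONFIGS`, `transcribe_with_fallback`, and the `transcribe` callable are hypothetical names, and the ladder of settings is an assumption loosely based on the steps in this reply. It catches `ValueError` (the exception type in the traceback above) and `RuntimeError` (which CUDA out-of-memory failures in PyTorch are generally raised as).

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical fallback ladder: lighter decoding first, then a smaller model,
# then CPU. The exact entries are illustrative assumptions.
FALLBACK_CONFIGS: List[Dict[str, Any]] = [
    {"model": "medium.en", "device": "cuda", "beam_size": 1, "best_of": 1},
    {"model": "small.en", "device": "cuda", "beam_size": 1, "best_of": 1},
    {"model": "small.en", "device": "cpu", "beam_size": 1, "best_of": 1},
]


def transcribe_with_fallback(
    transcribe: Callable[..., Any],
    audio_path: str,
    configs: List[Dict[str, Any]] = FALLBACK_CONFIGS,
) -> Tuple[Any, Dict[str, Any]]:
    """Try each configuration in order; return (result, config_that_worked)."""
    errors = []
    for cfg in configs:
        try:
            return transcribe(audio_path, **cfg), cfg
        except (ValueError, RuntimeError) as exc:
            # Record the failure and move on to the next, cheaper configuration.
            errors.append(f"{cfg}: {exc}")
    raise RuntimeError("all configurations failed:\n" + "\n".join(errors))
```

The `transcribe` argument is whatever wrapper you write around your recognition backend. With openai-whisper it could, for example, call `whisper.load_model(model, device=device)` and then `model.transcribe(audio_path, beam_size=beam_size, best_of=best_of)`.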


Application started at 2026-06-16 12:41:37

[DEBUG] 返回缓存的编解码器 Windows-264: h264_nvenc
[DEBUG] 返回缓存的编解码器 Windows-265: hevc_nvenc
[DEBUG] NVIDIA_GPU_NUMS=1
[DEBUG] Found 1 GPUs, cost=0s
[DEBUG] 上次缓存的角色:No,字幕嵌入类型:1,发音语言:英语,目标语言:简体中文,目标语言代码:zh-cn,模型:large-v3,TTS渠道[0]
[DEBUG] process_max_gpu=1,is_multi_gpus=False
[DEBUG] Concurrent task_nums=1, process_max_cpu=0
[DEBUG] start 9 jobs
[DEBUG] app_cfg.proxy=''
[DEBUG] 无法连接 huggingface.co, 使用镜像替换: hf-mirror.com
[DEBUG] [TransCreate]最终配置信息:self=TransCreate(uuid='982d1065b7', proxy_str=None, last_down_time=0, precent=1, hasend=False, should_recogn=True, should_trans=True, should_dubbing=False, should_separate=False, should_hebing=False, source_srt_list=[], target_srt_list=[], video_time=0.0, is_copy_video=False, video_codec_num=264, ignore_align=False, is_audio_trans=False, clone_ref='', cost_duration=1781584930.761656, should_recogn2=False)
self.cfg=TaskCfgVTT(uuid='982d1065b7', name='E:/55.mp4', dirname='E:/', noextname='55', basename='55.mp4', ext='mp4', target_dir='E:/_video_out/55-mp4', cache_folder='D:/My Softwares/视频翻译配音 Pyvideotran 3.98/tmp/7932/982d1065b7', is_cuda=True, source_language='英语', source_language_code='en', source_sub='E:/_video_out/55-mp4/en.srt', source_wav='D:/My Softwares/视频翻译配音 Pyvideotran 3.98/tmp/7932/982d1065b7/en.wav', source_wav_output='E:/_video_out/55-mp4/en.m4a', target_language='简体中文', target_language_code='zh-cn', target_sub='E:/_video_out/55-mp4/zh-cn.srt', target_wav='D:/My Softwares/视频翻译配音 Pyvideotran 3.98/tmp/7932/982d1065b7/target.wav', target_wav_output='E:/_video_out/55-mp4/zh-cn.m4a', translate_type=1, tts_type=0, volume='+0%', pitch='+0Hz', voice_rate='+0%', voice_role='No', voice_autorate=False, video_autorate=False, remove_silent_mid=False, align_sub_audio=True, detect_language='en', recogn_type=0, model_name='large-v3', shibie_audio=None, remove_noise=False, enable_diariz=False, nums_diariz=0, rephrase=0, fix_punc=0, subtitle_language=None, app_mode='tiqu', subtitles='', targetdir_mp4='E:/_video_out/55-mp4/55.mp4', novoice_mp4='D:/My Softwares/视频翻译配音 Pyvideotran 3.98/tmp/7932/982d1065b7/novoice.mp4', is_separate=False, embed_bgm=True, instrument=None, vocal=None, clear_cache=True, background_music='', subtitle_type=0, only_out_mp4=False, recogn2pass=False, output_srt=0, copysrt_rawvideo=False, loop_backaudio=1, backaudio_volume=0.8)
[DEBUG] The file info after process:result={'video_fps': 60.0, 'r_frame_rate': 60.0, 'video_codec_name': 'h264', 'audio_codec_name': 'aac', 'width': 1280, 'height': 720, 'time': 5804716, 'streams_len': 2, 'streams_audio': 1, 'video_streams': 1, 'color': 'yuv420p'}

[DEBUG] 开始语音识别:渠道0
[DEBUG] app_cfg.proxy=None
[DEBUG] 无法连接 huggingface.co, 使用镜像替换: hf-mirror.com
[DEBUG] 从 modelscope.cn 下载模型 model_id='large-v3'
2026-06-16 12:42:17,291 - modelscope - WARNING - We can not confirm the cached file is for revision: master
[DEBUG] [新进程任务 开始:STT use large-v3]
[DEBUG] 任务参数:kwargs={'detect_language': 'en', 'model_name': 'large-v3', 'logs_file': 'D:/My Softwares/视频翻译配音 Pyvideotran 3.98/tmp/7932/982d1065b7/faster-en-1781584937.2920804.log', 'is_cuda': True, 'no_speech_threshold': 0.6, 'condition_on_previous_text': True, 'speech_timestamps': None, 'audio_file': 'D:/My Softwares/视频翻译配音 Pyvideotran 3.98/tmp/7932/982d1065b7/en.wav', 'local_dir': 'D:/My Softwares/视频翻译配音 Pyvideotran 3.98/models/models--Systran--faster-whisper-large-v3', 'compute_type': 'int8', 'jianfan': False, 'audio_duration': 5804736, 'hotwords': '', 'prompt': '', 'beam_size': 5, 'best_of': 5, 'temperature': '', 'repetition_penalty': 1.0, 'compression_ratio_threshold': 2.4, 'max_speech_ms': 5000, 'device_index': 0}
[DEBUG] GPU进程池:max_workers=1
[DEBUG] [faster_whisper]加载模型:当前 is_cuda=True,_compute_type='int8'

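The current log shows the task has correctly switched to faster-whisper (local) with the large-v3 model, using the int8 compute type and CUDA acceleration. This is the standard configuration for resolving the earlier NaN error, and it is stable and reliable with 10 GB or more of VRAM.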
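If you want to confirm these settings yourself rather than read the long log line by eye, something like the following may help. It is a minimal sketch that assumes the task parameters appear on a single line as `kwargs={...}` containing only Python literals, as in the log above. `parse_task_kwargs` and `summarize` are hypothetical helper names, and `sample` is an abbreviated stand-in for the real line.

```python
import ast
from typing import Any, Dict


def parse_task_kwargs(log_line: str) -> Dict[str, Any]:
    """Extract the dict that follows 'kwargs=' on a debug log line."""
    marker = "kwargs="
    start = log_line.index(marker) + len(marker)
    # literal_eval only accepts literals (str, int, float, bool, None, dict...),
    # so it is safe to use on untrusted log text.
    return ast.literal_eval(log_line[start:].strip())


def summarize(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the settings discussed in this thread."""
    keys = ("model_name", "compute_type", "is_cuda", "beam_size", "best_of")
    return {k: kwargs.get(k) for k in keys}


# Abbreviated sample of the log line; the real one carries many more keys.
sample = (
    "[DEBUG] 任务参数:kwargs={'model_name': 'large-v3', 'is_cuda': True, "
    "'compute_type': 'int8', 'beam_size': 5, 'best_of': 5}"
)
print(summarize(parse_task_kwargs(sample)))
```

Note that in the log above beam_size and best_of are both still 5, so lowering them to 1 remains available as a further step if memory problems persist.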

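If you still hit NaN or out of memory errors, adjust the settings in the following order of priority:

  1. Downgrade the model to large-v3-turbo or medium (uses less VRAM and runs faster)
  2. Go to Advanced Options → Speech Recognition Parameters and set beam_size and best_of to 1
  3. Turn off Enable context awareness and set Repetition penalty to 1.2
  4. Confirm that CUDA 12.8+ and cuDNN 9.11+ are correctly installed and that the GPU driver is up to date (check with nvidia-smi)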

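The current log shows the model is being downloaded from modelscope. If recognition still fails after the download completes, please provide the subsequent error log.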
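For the nvidia-smi check in step 4, a rough way to read the driver version and VRAM from a script is sketched below. It assumes `nvidia-smi` is on PATH and uses its `--query-gpu` CSV output. The helper names and the 10 GB threshold (taken from the figure quoted in this reply) are illustrative. It does not verify the cuDNN version, which nvidia-smi does not report.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple

QUERY = [
    "nvidia-smi",
    "--query-gpu=driver_version,memory.total,memory.used",
    "--format=csv,noheader,nounits",  # memory values are printed in MiB
]

GpuRow = Tuple[str, int, int]  # (driver_version, total_mib, used_mib)


def parse_gpu_rows(text: str) -> List[GpuRow]:
    """Parse one 'driver, total, used' CSV line per GPU."""
    rows = []
    for line in text.strip().splitlines():
        driver, total, used = [part.strip() for part in line.split(",")]
        rows.append((driver, int(total), int(used)))
    return rows


def query_gpus() -> Optional[List[GpuRow]]:
    """Return GPU rows, or None when nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_rows(out)


def has_enough_vram(rows: List[GpuRow], min_total_mib: int = 10 * 1024) -> bool:
    """True if any GPU has at least min_total_mib of total memory."""
    return any(total >= min_total_mib for _, total, _ in rows)
```

Calling `query_gpus()` and passing the result to `has_enough_vram` gives a quick yes/no on whether the card meets the 10 GB guideline. Comparing `used` against `total` shows how much memory other programs are already holding.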
