#4824 TaskCfgSTT(uuid='1a66eb0958', name='C:/Users/ilove/Desktop/temp/animBot- Getting Started.mp4', dirname='C:/Users/ilove/D

2409:8a02* Posted at: 4 hours ago

Speech recognition stage failed [openai-whisper (local)]: Expected parameter logits (Tensor of shape (1, 51865)) of distribution Categorical(logits: torch.Size([1, 51865])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0'):Traceback (most recent call last):
  File "videotrans\process\stt_fun.py", line 102, in openai_whisper
  File "whisper\transcribe.py", line 295, in transcribe
  File "whisper\transcribe.py", line 201, in decode_with_fallback
  File "torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "whisper\decoding.py", line 824, in decode
  File "torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "whisper\decoding.py", line 737, in run
  File "whisper\decoding.py", line 703, in _main_loop
  File "whisper\decoding.py", line 283, in update
  File "torch\distributions\categorical.py", line 73, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
  File "torch\distributions\distribution.py", line 72, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 51865)) of distribution Categorical(logits: torch.Size([1, 51865])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')

Traceback (most recent call last):
  File "videotrans\task\job.py", line 54, in run
  File "videotrans\task\job.py", line 119, in process_task
  File "videotrans\task\speech2text.py", line 125, in recogn
  File "videotrans\recognition\__init__.py", line 190, in run
  File "videotrans\recognition\_base.py", line 93, in run
  File "videotrans\recognition\_whisper.py", line 33, in _exec
  File "videotrans\recognition\_whisper.py", line 76, in _openai
  File "videotrans\configure\base.py", line 252, in _new_process

videotrans.configure.excepts.VideoTransError: Expected parameter logits (Tensor of shape (1, 51865)) of distribution Categorical(logits: torch.Size([1, 51865])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')
TaskCfgSTT(uuid='1a66eb0958', name='C:/Users/ilove/Desktop/temp/animBot- Getting Started.mp4', dirname='C:/Users/ilove/Desktop/temp', noextname='animBot- Getting Started', basename='animBot- Getting Started.mp4', ext='mp4', target_dir='D:/Download/win-pyvideotrans-v4.00-528/output/recogn', cache_folder='D:/Programs/win-pyvideotrans-v4.00-528/tmp/7336/1a66eb0958', is_cuda=True, source_language=None, source_language_code=None, source_sub=None, source_wav=None, source_wav_output=None, target_language=None, target_language_code=None, target_sub='D:/Download/win-pyvideotrans-v4.00-528/output/recogn/animBot- Getting Started.srt', target_wav=None, target_wav_output=None, detect_language='en', recogn_type=1, model_name='small', shibie_audio='D:/Programs/win-pyvideotrans-v4.00-528/tmp/7336/1a66eb0958/animBot- Getting Started-1780203873.804824.wav', remove_noise=False, enable_diariz=False, nums_diariz=0, rephrase=0, fix_punc=False)


=
system:Windows-10-10.0.22631-SP0
version:v4.00
frozen:True
language:zh
root_dir:D:/Programs/win-pyvideotrans-v4.00-528


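This error means the Whisper model produced NaN (invalid numeric values) during GPU inference. It is usually caused by one of the following:

  1. Problematic audio: the video's audio track may be empty, silent, or contain no speech, or its format/sample rate may not meet Whisper's requirements.
  2. Unstable GPU environment: incompatible CUDA or cuDNN versions, or insufficient VRAM, leading to overflow during computation.
  3. Model/hardware conflict: some NVIDIA GPU models (e.g. the 50 series) show this problem when paired with an older CUDA version.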
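Why a single numeric overflow produces a whole row of NaN (as in the tensor([[nan, nan, ...]]) in the log) can be shown without a GPU. The sketch below is a simplified model, not Whisper's or PyTorch's actual code: it assumes half-precision arithmetic (float16, whose largest finite value is 65504), simulates overflow to infinity with a hypothetical helper, and then applies a standard numerically stable log-softmax. Once one value becomes inf, inf - inf yields NaN and the normalisation spreads it to every entry.

```python
import math

# Largest finite IEEE 754 half-precision (float16) value.
FP16_MAX = 65504.0


def simulate_fp16_overflow(x: float) -> float:
    """Hypothetical helper: map values outside the float16 range to +/-inf.

    Only overflow is modelled; float16 rounding of in-range values is ignored.
    """
    if abs(x) > FP16_MAX:
        return math.copysign(math.inf, x)
    return x


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax: subtract the row maximum before exp()."""
    row_max = max(logits)
    shifted = [x - row_max for x in logits]  # inf - inf gives nan here
    log_norm = math.log(sum(math.exp(s) for s in shifted))
    return [s - log_norm for s in shifted]


if __name__ == "__main__":
    print(log_softmax([1.0, 2.0, 3.0]))  # healthy row: all finite

    # One activation leaves the float16 range and becomes inf ...
    overflowed = [simulate_fp16_overflow(v) for v in [70000.0, 2.0, 3.0]]
    print(overflowed)  # [inf, 2.0, 3.0]
    # ... and normalisation turns the entire row into NaN.
    print(log_softmax(overflowed))  # [nan, nan, nan]
```

Subtracting the maximum is what normally keeps softmax stable, but it cannot rescue a row that already contains inf; that is why the fix has to happen earlier (different precision, device, or CUDA stack).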

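Solutions (try them in order):

  1. Check the video's audio
    Open the video in a media player and confirm that it contains normal human speech and is not completely silent. If there is no speech, test with a different video.
  2. Test with CUDA acceleration disabled
    In the main window, uncheck CUDA acceleration and run the task again on the CPU. If the error no longer occurs, the problem lies in the GPU environment.
  3. Check the CUDA version
    In CMD, run nvcc -V to see the CUDA compiler version; it must be ≥ 12.8.
    Run nvidia-smi to see the highest CUDA version the driver supports; the driver must support CUDA 12.8 or later.
    If the versions do not match, reinstall CUDA 12.8 and cuDNN 9.x (installation guide).
  4. Lower model size and precision
    Under the menu Tools → Advanced Options → Speech Recognition Parameters, change:

    • CUDA data type to float16 or int8
    • beam_size to 1
    • best_of to 1
  5. Switch the recognition channel
    Change the speech recognition channel from openai-whisper (local) to faster-whisper (local) and use the large-v3 model, which has better compatibility and stability.
  6. Simplify the file path
    Rename the video file to a short English name (e.g. test.mp4) and move it to a shallow directory (e.g. D:/videos/), avoiding spaces and special characters in the path (the current path contains spaces and a hyphen, which may cause file-reading problems).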

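If none of the above steps help, please provide the video file's duration and size, along with the last 30 lines of the newest log in the logs/ folder, for further analysis.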
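To collect the log excerpt requested above, a small script can print the last 30 lines of the most recently modified file in the logs folder. This is a sketch that assumes logs/ sits inside the root_dir reported in the system information (D:/Programs/win-pyvideotrans-v4.00-528); change the path if your installation keeps logs elsewhere.

```python
from collections import deque
from pathlib import Path


def tail_latest_log(log_dir: str, n: int = 30) -> list[str]:
    """Return the last n lines of the most recently modified file in log_dir."""
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    if not files:
        return []
    latest = max(files, key=lambda p: p.stat().st_mtime)
    # deque with maxlen keeps only the final n lines while streaming the file.
    with latest.open(encoding="utf-8", errors="replace") as fh:
        return list(deque(fh, maxlen=n))


if __name__ == "__main__":
    # Assumption: logs/ lives in the pyvideotrans root_dir from the report.
    for line in tail_latest_log("D:/Programs/win-pyvideotrans-v4.00-528/logs"):
        print(line, end="")
```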

Please see the related documentation:
