#4528 TaskCfgSTT(is_cuda=True, uuid='2ea1a37a28', cache_folder='J:/pyvideotrans-v3.99-0508/tmp/6424/2ea1a37a28', target_dir='J


Error during the speech recognition stage [openai-whisper (local)] Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 109, in openai_whisper
File "whisper\transcribe.py", line 295, in transcribe
File "whisper\transcribe.py", line 201, in decode_with_fallback
File "torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "whisper\decoding.py", line 824, in decode
File "torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "whisper\decoding.py", line 737, in run
File "whisper\decoding.py", line 703, in _main_loop
File "whisper\decoding.py", line 283, in update
File "torch\distributions\categorical.py", line 73, in __init__
super().__init__(batch_shape, validate_args=validate_args)
File "torch\distributions\distribution.py", line 72, in __init__
raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 51866)) of distribution Categorical(logits: torch.Size([1, 51866])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')

Traceback (most recent call last):
File "videotrans\task\job.py", line 105, in run
File "videotrans\task\_speech2text.py", line 191, in recogn
File "videotrans\recognition\__init__.py", line 293, in run
File "videotrans\recognition\_base.py", line 143, in run
File "videotrans\recognition\_overall.py", line 31, in _exec
File "videotrans\recognition\_overall.py", line 74, in _openai
File "videotrans\configure\_base.py", line 289, in _new_process
RuntimeError: Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 109, in openai_whisper
File "whisper\transcribe.py", line 295, in transcribe
File "whisper\transcribe.py", line 201, in decode_with_fallback
File "torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "whisper\decoding.py", line 824, in decode
File "torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "whisper\decoding.py", line 737, in run
File "whisper\decoding.py", line 703, in _main_loop
File "whisper\decoding.py", line 283, in update
File "torch\distributions\categorical.py", line 73, in __init__
super().__init__(batch_shape, validate_args=validate_args)
File "torch\distributions\distribution.py", line 72, in __init__
raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 51866)) of distribution Categorical(logits: torch.Size([1, 51866])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')
TaskCfgSTT(is_cuda=True, uuid='2ea1a37a28', cache_folder='J:/pyvideotrans-v3.99-0508/tmp/6424/2ea1a37a28', target_dir='J:/pyvideotrans-v3.99-0508/output/recogn', source_language=None, source_language_code=None, source_sub=None, source_wav=None, source_wav_output=None, target_language=None, target_language_code=None, target_sub='J:/pyvideotrans-v3.99-0508/output/recogn/测试用.srt', target_wav=None, target_wav_output=None, name='J:/测试用.mp4', noextname='测试用', basename='测试用.mp4', ext='mp4', dirname='J:/', shound_del_name=None, detect_language='zh-cn', recogn_type=1, model_name='large-v3', shibie_audio='J:/pyvideotrans-v3.99-0508/tmp/6424/2ea1a37a28/测试用-1778518991.560455.wav', remove_noise=True, enable_diariz=False, nums_diariz=0, rephrase=0, fix_punc=False)


system:Windows-10-10.0.19041-SP0
version:v3.99
frozen:True
language:zh
root_dir:J:/pyvideotrans-v3.99-0508

7 Replies

This happens when insufficient VRAM or an incompatible CUDA environment makes the model emit NaN values during decoding. Likely causes:

  • The large-v3 model with noise reduction enabled pushes VRAM usage too high;
  • The installed CUDA or cuDNN version does not match the bundled CUDA 12.8;
  • The video path contains Chinese characters, which can trip up ffmpeg.

Solutions (try in order):

  1. Disable noise reduction: uncheck the "Noise reduction" option on the main screen, then retry.
  2. Adjust advanced settings: open the menu Tools/Options → Advanced Options → Speech Recognition Settings and make these changes:

    • CUDA data type → choose float16 or int8
    • Recognition accuracy beam_size → set to 1
    • Recognition accuracy best_of → set to 1
  3. Use a smaller model: switch the model from large-v3 to medium or small.
  4. Use faster-whisper: choose faster-whisper (local) as the speech recognition channel; it uses less VRAM and is more stable than openai-whisper.
  5. Check the CUDA environment: make sure CUDA ≥ 12.8 and cuDNN ≥ 9.11 are installed, and verify by running nvcc -V and nvidia-smi on the command line.
  6. Simplify the file path: move the video to an ASCII-only directory (e.g. D:\videos) and rename it to a short English name.
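The model-size advice above can be sketched as a small helper that picks a Whisper model for the available VRAM. The per-model thresholds are rough assumptions based on the figures quoted in this thread, not official requirements:

```python
def pick_whisper_model(vram_gb: float) -> str:
    """Pick the largest Whisper model that plausibly fits in VRAM.
    Thresholds are rough assumptions (large-v3 ~10GB, medium ~5GB,
    small ~2GB), not official figures."""
    if vram_gb >= 10:
        return "large-v3"
    if vram_gb >= 5:
        return "medium"
    if vram_gb >= 2:
        return "small"
    return "base"

print(pick_whisper_model(4))   # a 4GB card -> small
print(pick_whisper_model(12))  # -> large-v3
```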

Please refer to the related documentation:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:42:46_Pacific_Standard_Time_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0

Your CUDA 12.8 environment is installed correctly, but openai-whisper still outputs NaN with the large-v3 model. This is usually caused by insufficient VRAM (large-v3 needs 10GB+), FP16 numerical overflow, or compatibility issues with 50-series cards.

Solutions (in order of priority):

  1. Switch to faster-whisper (more stable, uses less VRAM)
    On the main screen, change the speech recognition channel to faster-whisper (local) and keep the model at large-v3; that implementation is much less prone to NaN.
  2. Disable "Noise reduction" and adjust advanced settings
    Uncheck "Noise reduction" on the main screen.
    Open the menu Tools/Options → Advanced Options → Speech Recognition Settings:

    • CUDA data type → choose float16 or int8
    • beam_size → set to 1
    • best_of → set to 1
    • Context awareness → set to false
    • If it still fails, check "Whisper pre-split audio?"
  3. Use a smaller model
    Switch the model from large-v3 to medium or small.
  4. Check the GPU driver and VRAM
    Run nvidia-smi to confirm enough VRAM is available (large-v3 needs at least 10GB) and update the driver to the latest version.
  5. Enable int8 quantization
    In Advanced Options, set CUDA data type to int8; this greatly reduces VRAM usage and helps avoid NaN.

Please refer to the related documentation:

My graphics card is a GTX 1650.

Your GTX 1650 usually has only 4GB of VRAM (6GB on some variants), while the large-v3 model needs at least 10GB, so the NaN error is guaranteed. Apply the following directly:

  1. Switch the speech recognition channel to faster-whisper (local); it uses less VRAM than openai-whisper.
  2. Change the model to medium or small (do not use the large series).
  3. In Menu → Tools → Advanced Options → Speech Recognition Settings:

    • CUDA data type → choose int8
    • beam_size → set to 1
    • best_of → set to 1
    • Context awareness → set to false
    • Check "Whisper pre-split audio?"
  4. Uncheck "Noise reduction"; it increases VRAM usage.
  5. Make sure the video path is short and ASCII-only.

If the error persists, you can temporarily disable CUDA acceleration (uncheck "CUDA acceleration" on the main screen) and run on CPU; it is slower but will finish.
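The CPU fallback in the last paragraph can be automated: detect NaN in the decoded logits and retry on the next device. A minimal sketch, where `decode_fn` is a hypothetical stand-in for the real Whisper decode step:

```python
import math

def safe_decode(decode_fn, devices=("cuda", "cpu")):
    """Try each device in order; fall back when the logits go NaN.
    `decode_fn` is a hypothetical callable(device) -> list[float]."""
    for device in devices:
        logits = decode_fn(device)
        if not any(math.isnan(x) for x in logits):
            return device, logits
    raise RuntimeError("all devices produced NaN logits")

# Stub reproducing the failure above: fp16 overflow on cuda, fine on cpu.
def fake_decode(device):
    return [float("nan")] * 3 if device == "cuda" else [0.1, 0.7, 0.2]

device, logits = safe_decode(fake_decode)
print(device)  # cpu
```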


Application started at 2026-05-12 01:00:50

First searching GPU...
[DEBUG] returning cached codec Windows-264: libx264
[DEBUG] returning cached codec Windows-265: libx265
NVIDIA_GPU_NUMS=1
Found 1 GPUs, cost=5s
start workers...
process_max_gpu=1
multi_gpus=False
Concurrent task_nums=1
process_max=1
start 9 jobs
ended workers
[DEBUG] [job] entering speech recognition stage trk.cfg=TaskCfgSTT(is_cuda=True, uuid='2ea1a37a28', cache_folder='J:/pyvideotrans-v3.99-0508/tmp/6424/2ea1a37a28', target_dir='J:/pyvideotrans-v3.99-0508/output/recogn', source_language=None, source_language_code=None, source_sub=None, source_wav=None, source_wav_output=None, target_language=None, target_language_code=None, target_sub='J:/pyvideotrans-v3.99-0508/output/recogn/测试用.srt', target_wav=None, target_wav_output=None, name='J:/测试用.mp4', noextname='测试用', basename='测试用.mp4', ext='mp4', dirname='J:/', shound_del_name=None, detect_language='zh-cn', recogn_type=1, model_name='large-v3', shibie_audio='J:/pyvideotrans-v3.99-0508/tmp/6424/2ea1a37
......
ape (1, 51866)) of distribution Categorical(logits: torch.Size([1, 51866])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')

[ERROR] Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 109, in openai_whisper
File "whisper\transcribe.py", line 295, in transcribe
File "whisper\transcribe.py", line 201, in decode_with_fallback
File "torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "whisper\decoding.py", line 824, in decode
File "torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "whisper\decoding.py", line 737, in run
File "whisper\decoding.py", line 703, in _main_loop
File "whisper\decoding.py", line 283, in update
File "torch\distributions\categorical.py", line 73, in __init__
super().__init__(batch_shape, validate_args=validate_args)
File "torch\distributions\distribution.py", line 72, in __init__
raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 51866)) of distribution Categorical(logits: torch.Size([1, 51866])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')
Traceback (most recent call last):
File "videotrans\task\job.py", line 105, in run
File "videotrans\task\_speech2text.py", line 191, in recogn
File "videotrans\recognition\__init__.py", line 293, in run
File "videotrans\recognition\_base.py", line 143, in run
File "videotrans\recognition\_overall.py", line 31, in _exec
File "videotrans\recognition\_overall.py", line 74, in _openai
File "videotrans\configure\_base.py", line 289, in _new_process
RuntimeError: Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 109, in openai_whisper
File "whisper\transcribe.py", line 295, in transcribe
File "whisper\transcribe.py", line 201, in decode_with_fallback
File "torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "whisper\decoding.py", line 824, in decode
File "torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "whisper\decoding.py", line 737, in run
File "whisper\decoding.py", line 703, in _main_loop
File "whisper\decoding.py", line 283, in update
File "torch\distributions\categorical.py", line 73, in __init__
super().__init__(batch_shape, validate_args=validate_args)
File "torch\distributions\distribution.py", line 72, in __init__
raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 51866)) of distribution Categorical(logits: torch.Size([1, 51866])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')

Your GTX 1650 has only 4-6GB of VRAM, while the openai-whisper large-v3 model needs at least 10GB, so decoding overflows to NaN as soon as the model runs. This is a hardware limit; follow these steps:

  1. Change the recognition channel now: in the speech recognition channel dropdown on the main screen, select faster-whisper (local), which uses less VRAM than openai-whisper.
  2. Use a smaller model: in the faster-whisper model dropdown, pick small or base (do not pick any model in the large series).
  3. Disable noise reduction: uncheck the "Noise reduction" option at the top right of the main screen; it significantly increases VRAM usage.
  4. Adjust advanced settings: open the menu Tools/Options → Advanced Options → Speech Recognition Settings:

    • CUDA data type → choose int8
    • beam_size → set to 1
    • best_of → set to 1
    • Context awareness → set to false
    • Check "Whisper pre-split audio?"
  5. Keep Chinese out of the path: move the video to an ASCII-only directory, e.g. D:\videos\test.mp4; do not use a Chinese path.

If it still fails, uncheck "CUDA acceleration" on the main screen and run on CPU (slower, but it completes).
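Whether the card can run a given model can also be checked programmatically. `nvidia-smi --query-gpu=memory.total --format=csv,noheader` prints one line per GPU such as `4096 MiB`; a small parser plus a fit check (the 10GB requirement is this thread's figure for large-v3, treated here as an assumption):

```python
def parse_vram_mib(smi_line: str) -> int:
    """Parse one line of `nvidia-smi --query-gpu=memory.total
    --format=csv,noheader`, e.g. '4096 MiB' -> 4096."""
    return int(smi_line.strip().split()[0])

def large_v3_fits(vram_mib: int, needed_gb: float = 10.0) -> bool:
    # 10GB is the requirement quoted in this thread (assumed figure).
    return vram_mib >= needed_gb * 1024

print(large_v3_fits(parse_vram_mib("4096 MiB")))   # a GTX 1650 -> False
print(large_v3_fits(parse_vram_mib("12288 MiB")))  # a 12GB card -> True
```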
