#5360 Error during speech recognition [openai-whisper (local built-in)] Expected parameter logits (Tensor of shape (1, 51866)) of distribution Categorical(logits

120.217* Posted at: 2 days ago

Error during speech recognition [openai-whisper (local built-in)] Expected parameter logits (Tensor of shape (1, 51866)) of distribution Categorical(logits: torch.Size([1, 51866])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0'):Traceback (most recent call last):
  File "videotrans\process\stt_fun.py", line 67, in openai_whisper
  File "whisper\transcribe.py", line 295, in transcribe
  File "whisper\transcribe.py", line 201, in decode_with_fallback
  File "torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "whisper\decoding.py", line 824, in decode
  File "torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "whisper\decoding.py", line 737, in run
  File "whisper\decoding.py", line 703, in _main_loop
  File "whisper\decoding.py", line 283, in update
  File "torch\distributions\categorical.py", line 73, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
  File "torch\distributions\distribution.py", line 72, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 51866)) of distribution Categorical(logits: torch.Size([1, 51866])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')

Traceback (most recent call last):
  File "videotrans\task\job.py", line 35, in run
  File "videotrans\task\job.py", line 100, in process_task
  File "videotrans\task\trans_create.py", line 319, in recogn
  File "videotrans\recognition\__init__.py", line 191, in run
  File "videotrans\recognition\_base.py", line 90, in run
  File "videotrans\recognition\_whisper.py", line 36, in _exec
  File "videotrans\recognition\_whisper.py", line 85, in _openai
  File "videotrans\configure\base.py", line 272, in _new_process
videotrans.configure.excepts.VideoTransError: Expected parameter logits (Tensor of shape (1, 51866)) of distribution Categorical(logits: torch.Size([1, 51866])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0'):Traceback (most recent call last):
  File "videotrans\process\stt_fun.py", line 67, in openai_whisper
  File "whisper\transcribe.py", line 295, in transcribe
  File "whisper\transcribe.py", line 201, in decode_with_fallback
  File "torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "whisper\decoding.py", line 824, in decode
  File "torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "whisper\decoding.py", line 737, in run
  File "whisper\decoding.py", line 703, in _main_loop
  File "whisper\decoding.py", line 283, in update
  File "torch\distributions\categorical.py", line 73, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
  File "torch\distributions\distribution.py", line 72, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 51866)) of distribution Categorical(logits: torch.Size([1, 51866])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')
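cfg=[TaskCfgVTT]Current working mode: transcribe and translate subtitles
Original input file name: E:/5/02.mp4,
Output saved to folder: E:/5/_video_out/02-mp4,
Temporary folder: D:/My Softwares/Pyvideotran 4.03-0622/tmp/6824/9bab9ebb53
Selected: clean up existing files
Selected: enable CUDA acceleration
Not selected: noise reduction
Speech recognition: openai-whisper (local built-in), model_name: large-v3-turbo, spoken language: English, sentence segmentation: default
Translation channel: M2M100 (local), source language: English, target language: Simplified Chinese, Selected: send complete subtitles
Output subtitle format: single subtitle
Proxy address: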


=
system:Windows-10-10.0.26200-SP0
version:v4.03
frozen:True
language:zh
root_dir:D:/My Softwares/Pyvideotran 4.03-0622

1 Replies

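NaN values appeared during the speech recognition stage, which means the Torch inference hit a numerical overflow or the model computation became unstable. The usual causes are:

  1. Insufficient GPU memory: the large-v3-turbo model needs a fairly large amount of VRAM under openai-whisper (at least 6-8 GB), and computation can produce NaN once VRAM runs out.
  2. Abnormal audio: audio that is too short, silent, or completely drowned in noise can leave the decoder with no confident output, so it emits NaN.
  3. Corrupted model file: the downloaded Whisper model file is incomplete.
  4. Numerical precision: the compute data type used on CUDA may be unstable on this GPU; switching it (for example between float32 and float16) can help.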
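To make the overflow cause concrete, here is a minimal sketch of how a single out-of-range logit can turn a whole probability distribution into NaN. It is an illustration only, not what whisper or torch runs internally: `simulate_float16` is a hypothetical helper that crudely mimics a half-precision cast, and the pure-Python `softmax` stands in for the decoder's sampling step.

```python
import math

# Largest finite value in IEEE 754 half precision (float16).
FLOAT16_MAX = 65504.0


def simulate_float16(x: float) -> float:
    """Hypothetical, crude stand-in for a float16 cast.

    Anything beyond the finite range becomes +/-inf. Real casts also
    round values, which this sketch ignores.
    """
    if abs(x) > FLOAT16_MAX:
        return math.copysign(math.inf, x)
    return x


def softmax(logits):
    """Numerically shifted softmax over a list of floats."""
    peak = max(logits)
    # With an infinite peak, (inf - inf) is NaN and poisons the sum.
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


healthy = softmax([2.0, 1.0, 0.5])
overflowed = softmax([simulate_float16(v) for v in [70000.0, 1.0, 0.5]])

print(healthy)
print(overflowed)
```

Every entry of `overflowed` is NaN even though only one input was out of range, which matches the all-NaN logits tensor in the traceback.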

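Solutions (try in order)

  1. Switch to a less resource-hungry setup

    • Change the speech recognition channel from openai-whisper (local) to faster-whisper (local), which is faster and uses less VRAM.
    • For the model, choose large-v3-turbo or medium.
  2. Reduce the compute load
    Go to Menu -> Tools -> Advanced Options -> Speech Recognition Parameters

    • Set the compute data type to float16 (switch to int8 if the error persists).
    • Set both beam_size and best_of to 1.
    • Turn off "Enable context awareness".
  3. Check the audio quality

    • Play the original video/audio and confirm that it contains clear human speech and no overly long silence at the end.
    • If the audio is not a standalone file, first test it on its own with the Batch Speech-to-Subtitles feature.
  4. Clear and re-download the model

    • Close the software and delete the openai-whisper-large-v3-turbo.pt file under the models/ directory (or the whole folder for that model).
    • Restart the software and let it download the model again automatically.
  5. Close other GPU programs

    • Use Task Manager to close other programs that occupy VRAM (browsers, games, graphics software, and so on).
  6. Last-resort fallback

    • Temporarily uncheck CUDA acceleration and process on the CPU instead (extremely slow; use it only to test whether the GPU is the problem).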
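The CPU test in the last step can also be tried outside the GUI. The sketch below assumes the openai-whisper Python package is installed in a separate Python environment (the packaged build in this thread is frozen). `diagnostic_options` and `transcribe_for_diagnosis` are hypothetical helper names, and mapping "Enable context awareness" to whisper's `condition_on_previous_text` option is an assumption on my part, not something confirmed by the project.

```python
def diagnostic_options(device: str, fp16: bool) -> dict:
    """Build keyword arguments for whisper's transcribe() in a diagnostic run."""
    return {
        # Half precision only applies on GPU; force full precision on CPU.
        "fp16": fp16 and device != "cpu",
        # Assumed equivalent of turning off "Enable context awareness".
        "condition_on_previous_text": False,
        # The thread's source audio is English.
        "language": "en",
    }


def transcribe_for_diagnosis(audio_path: str, device: str = "cpu",
                             fp16: bool = False,
                             model_name: str = "large-v3-turbo") -> dict:
    """Transcribe one file with a chosen device and precision."""
    # Imported lazily so the option helper works without whisper installed.
    import whisper

    model = whisper.load_model(model_name, device=device)
    return model.transcribe(audio_path, **diagnostic_options(device, fp16))


# Example call (slow on CPU):
#   result = transcribe_for_diagnosis("E:/5/02.mp4", device="cpu")
#   print(result["text"])
```

If the CPU run succeeds and a `device="cuda"` run with the same file fails, toggling `fp16` on the GPU run is a quick way to see whether precision is involved.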

Following the steps above usually resolves the NaN error. If it still fails, please provide the latest log file from the logs/ directory.
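For locating that file, a small sketch that picks the most recently modified file in a directory may help. `newest_log` is a hypothetical helper; it makes no assumption about how the log files are named and relies only on modification time.

```python
from pathlib import Path
from typing import Optional


def newest_log(logs_dir: str) -> Optional[Path]:
    """Return the most recently modified file in logs_dir, or None."""
    root = Path(logs_dir)
    if not root.is_dir():
        return None
    files = [p for p in root.iterdir() if p.is_file()]
    if not files:
        return None
    # Pick the file with the latest modification timestamp.
    return max(files, key=lambda p: p.stat().st_mtime)


# Example call, using the root_dir reported in the post:
#   print(newest_log("D:/My Softwares/Pyvideotran 4.03-0622/logs"))
```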
