#4187 TaskCfgSTT(is_cuda=True, uuid='ef179ca233', cache_folder='D:/win-pyvideotrans-v3.99-417/tmp/5052/ef179ca233', target_dir

240e:44d* Posted at: 2 months ago

Error during the speech recognition stage [openai-whisper (local)] Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 112, in openai_whisper
File "whisper\transcribe.py", line 295, in transcribe
File "whisper\transcribe.py", line 201, in decode_with_fallback
File "torch\utils\_contextlib.py", line 116, in decorate_context

return func(*args, **kwargs)

File "whisper\decoding.py", line 824, in decode
File "torch\utils\_contextlib.py", line 116, in decorate_context

return func(*args, **kwargs)

File "whisper\decoding.py", line 737, in run
File "whisper\decoding.py", line 703, in _main_loop
File "whisper\decoding.py", line 283, in update
File "torch\distributions\categorical.py", line 73, in __init__

super().__init__(batch_shape, validate_args=validate_args)

File "torch\distributions\distribution.py", line 72, in __init__

raise ValueError(

ValueError: Expected parameter logits (Tensor of shape (1, 51866)) of distribution Categorical(logits: torch.Size([1, 51866])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')

Traceback (most recent call last):
File "videotrans\task\job.py", line 105, in run
File "videotrans\task\_speech2text.py", line 159, in recogn
File "videotrans\recognition\__init__.py", line 293, in run
File "videotrans\recognition\_base.py", line 143, in run
File "videotrans\recognition\_overall.py", line 31, in _exec
File "videotrans\recognition\_overall.py", line 73, in _openai
File "videotrans\configure\_base.py", line 289, in _new_process
RuntimeError: Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 112, in openai_whisper
File "whisper\transcribe.py", line 295, in transcribe
File "whisper\transcribe.py", line 201, in decode_with_fallback
File "torch\utils\_contextlib.py", line 116, in decorate_context

return func(*args, **kwargs)

File "whisper\decoding.py", line 824, in decode
File "torch\utils\_contextlib.py", line 116, in decorate_context

return func(*args, **kwargs)

File "whisper\decoding.py", line 737, in run
File "whisper\decoding.py", line 703, in _main_loop
File "whisper\decoding.py", line 283, in update
File "torch\distributions\categorical.py", line 73, in __init__

super().__init__(batch_shape, validate_args=validate_args)

File "torch\distributions\distribution.py", line 72, in __init__

raise ValueError(

ValueError: Expected parameter logits (Tensor of shape (1, 51866)) of distribution Categorical(logits: torch.Size([1, 51866])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')
TaskCfgSTT(is_cuda=True, uuid='ef179ca233', cache_folder='D:/win-pyvideotrans-v3.99-417/tmp/5052/ef179ca233', target_dir='D:/win-pyvideotrans-v3.99-417/output/recogn', source_language=None, source_language_code=None, source_sub=None, source_wav=None, source_wav_output=None, target_language=None, target_language_code=None, target_sub='D:/win-pyvideotrans-v3.99-417/output/recogn/Mom Shared Bed with her Son and has a Squirting Orgasm while her Dad wasnt Looking HD 1080p.srt', target_wav=None, target_wav_output=None, name='E:/DownLoad/DCL/Mom Shared Bed with her Son and has a Squirting Orgasm while her Dad wasnt Looking HD 1080p.mp4', noextname='Mom Shared Bed with her Son and has a Squirting Orgasm while her Dad wasnt Looking HD 1080p', basename='Mom Shared Bed with her Son and has a Squirting Orgasm while her Dad wasnt Looking HD 1080p.mp4', ext='mp4', dirname='E:/DownLoad/DCL', shound_del_name=None, detect_language='auto', recogn_type=1, model_name='large-v3-turbo', shibie_audio='D:/win-pyvideotrans-v3.99-417/tmp/5052/ef179ca233/remove_noise.wav', remove_noise=True, enable_diariz=False, nums_diariz=0, rephrase=0, fix_punc=False)

=
system:Windows-10-10.0.26100-SP0
version:v3.99
frozen:True
language:zh
root_dir:D:/win-pyvideotrans-v3.99-417

Floor #1 DeepSeek reply at 2 months ago

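This is likely caused by insufficient GPU VRAM or system memory, which led the Whisper model to produce NaN (not-a-number) values while computing on CUDA.

Solutions: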

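Use a smaller model: switch the openai-whisper (local) model from large-v3-turbo to a smaller one such as medium, small, or base. large-v3-turbo has relatively high VRAM requirements.
Adjust the advanced settings: in the menu bar, open Tools/Options -> Advanced Options and change the following in the [Speech Recognition Parameters] area to reduce VRAM usage:
- CUDA data type: if it is currently float32, try changing it to float16.
- Recognition accuracy beam_size: change 5 to 1.
- Recognition accuracy best_of: change 5 to 1.
- Enable context awareness: make sure this is false (unchecked).
Disable noise reduction: the error message shows shibie_audio='.../remove_noise.wav' and remove_noise=True, so noise reduction is enabled. Noise reduction adds processing steps and resource consumption. Uncheck the noise reduction option in the main window and try again.
Check the file path: your video file name is very long and contains spaces, which may cause processing errors. Rename the video file to a short English name (no spaces or special characters) and move it to a shallow directory (for example D:\video.mp4) before processing.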
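
The renaming step in the last item can be sketched in Python. This is only an illustration and not part of pyvideotrans: the helper names `short_ascii_name` and `copy_with_short_name`, the 32-character limit, and the `video` fallback are all hypothetical choices.

```python
import re
import shutil
from pathlib import Path, PureWindowsPath


def short_ascii_name(original: str, fallback: str = "video", max_len: int = 32) -> str:
    """Return a short file name made of ASCII letters, digits, '_' and '-'.

    Hypothetical helper: the limit and fallback are arbitrary choices.
    """
    # PureWindowsPath accepts both '/' and '\\' separators on any OS.
    p = PureWindowsPath(original)
    # Collapse every run of other characters (spaces, brackets, CJK...) to '_'.
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", p.stem).strip("_")
    stem = stem[:max_len].rstrip("_") or fallback
    return stem + p.suffix.lower()


def copy_with_short_name(src: str, dest_dir: str) -> Path:
    """Copy src into dest_dir under its shortened name and return the new path."""
    dest = Path(dest_dir) / short_ascii_name(src)
    shutil.copy2(src, dest)
    return dest
```

Copying rather than renaming in place keeps the original file untouched, so the step is easy to undo if the shortened name turns out not to matter.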

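If the problem persists after these adjustments, the video itself or the current hardware environment may be incompatible with the openai-whisper model. You can try switching to the faster-whisper (local) channel and selecting the large-v3 model.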
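
For readers who call openai-whisper directly instead of going through the GUI, the advanced settings suggested in this reply roughly correspond to the keyword arguments below. This is a minimal sketch under assumptions: the mapping from the GUI labels ("CUDA data type", "Enable context awareness") to `fp16` and `condition_on_previous_text` is my reading and is not confirmed by the thread, and `low_memory_transcribe_options` is a hypothetical helper name.

```python
def low_memory_transcribe_options(use_fp16: bool = True) -> dict:
    """Build keyword arguments for openai-whisper's transcribe().

    Assumed mapping from the GUI settings named in the reply:
      CUDA data type float16    -> fp16=True
      beam_size 5 -> 1          -> beam_size=1
      best_of 5 -> 1            -> best_of=1
      context awareness off     -> condition_on_previous_text=False
    """
    return {
        "fp16": use_fp16,
        "beam_size": 1,
        "best_of": 1,
        "condition_on_previous_text": False,
    }


# Usage sketch (needs the openai-whisper package and is not run here):
#   import whisper
#   model = whisper.load_model("medium", device="cuda")
#   result = model.transcribe("audio.wav", **low_memory_transcribe_options())
```

`use_fp16` is left as a parameter because half precision is the reply's suggestion rather than a confirmed fix; passing `False` runs the model in float32 if that setting needs to be compared.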
