#871 Speech recognition stage error: [openai-whisper (local)] Invalid value or parameter:


Speech recognition stage error: [openai-whisper (local)] Invalid value or parameter: Expected parameter logits (Tensor of shape (1, 51865)) of distribution Categorical(logits: torch.Size([1, 51865])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:

tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')

```
Traceback (most recent call last):
  File "videotrans\task\job.py", line 112, in run
  File "videotrans\task\_speech2text.py", line 91, in recogn
  File "videotrans\recognition\__init__.py", line 178, in run
  File "videotrans\recognition\_base.py", line 78, in run
  File "videotrans\recognition\_openai.py", line 70, in _exec
  File "whisper\transcribe.py", line 295, in transcribe
    result: DecodingResult = decode_with_fallback(mel_segment)
  File "whisper\transcribe.py", line 201, in decode_with_fallback
    decode_result = model.decode(segment, options)
  File "torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "whisper\decoding.py", line 824, in decode
    result = DecodingTask(model, options).run(mel)
  File "torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "whisper\decoding.py", line 737, in run
    tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
  File "whisper\decoding.py", line 703, in _main_loop
    tokens, completed = self.decoder.update(tokens, logits, sum_logprobs)
  File "whisper\decoding.py", line 283, in update
    next_tokens = Categorical(logits=logits / self.temperature).sample()
  File "torch\distributions\categorical.py", line 73, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
  File "torch\distributions\distribution.py", line 72, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 51865)) of distribution Categorical(logits: torch.Size([1, 51865])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')
```

1 Reply

This reply was generated by DeepSeek
This error occurs when NaN (not-a-number) values appear during model computation while using openai-whisper in local mode for speech recognition. It is usually related to insufficient GPU memory, a corrupted model file, or a CUDA compatibility problem.
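A related workaround worth trying (my own suggestion, not confirmed by this thread): on some GPUs, half-precision inference can overflow and produce exactly these NaN logits. openai-whisper's `transcribe()` accepts an `fp16` flag; passing `fp16=False` forces float32 decoding. A minimal guarded sketch, where the model name and audio path are placeholders:

```python
import importlib.util

def transcribe_fp32(audio_path, model_name="base"):
    """Transcribe with openai-whisper, forcing float32 decoding.

    Half-precision overflow on some GPUs is a known source of NaN logits;
    fp16=False avoids it at the cost of more VRAM and slower inference.
    Returns None when whisper is not installed (guarded for portability).
    """
    if importlib.util.find_spec("whisper") is None:
        return None
    import whisper
    model = whisper.load_model(model_name)
    # fp16=False decodes in float32 instead of float16
    return model.transcribe(audio_path, fp16=False)
```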

Solutions:

1. Check GPU memory usage: if your card has little VRAM (4 GB or less), running a larger whisper model (such as large-v3) can run out of memory. Try:

  • switching to a smaller model (such as base or small) in the speech-recognition settings
  • or running recognition on the CPU (uncheck the "CUDA acceleration" option)
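To make step 1 concrete, the approximate per-model VRAM figures from the openai-whisper README can be turned into a small helper that picks the largest model expected to fit. The numbers are rough estimates; actual usage varies with audio length and settings:

```python
# Approximate VRAM needed per model size, in GB (rough figures from the
# openai-whisper README; treat them as estimates, not hard limits).
VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

# Model names ordered from smallest to largest.
ORDER = ["tiny", "base", "small", "medium", "large"]

def pick_model(available_vram_gb):
    """Return the largest whisper model expected to fit in the given VRAM."""
    fitting = [m for m in ORDER if VRAM_GB[m] <= available_vram_gb]
    return fitting[-1] if fitting else None  # None -> fall back to CPU mode

print(pick_model(4))   # a 4 GB card -> "small"
print(pick_model(12))  # a 12 GB card -> "large"
```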

2. Re-download the model files: the model file may be corrupted or incomplete

  • Delete the models/whisper folder in the software directory
  • Restart the software so it re-downloads the model automatically

3. Update drivers and dependencies

  • Update the NVIDIA GPU driver to the latest version
  • Make sure matching versions of PyTorch and CUDA are installed
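A quick way to gather the version information step 3 refers to is a small probe script. This is a sketch of my own; it reports None fields instead of crashing when PyTorch is missing:

```python
import importlib.util

def cuda_report():
    """Collect the version info needed to diagnose PyTorch/CUDA mismatches.

    Returns a dict; torch/cuda are None when PyTorch is not installed
    or was built without CUDA support.
    """
    info = {"torch": None, "cuda": None, "cuda_available": False}
    if importlib.util.find_spec("torch") is None:
        return info
    import torch
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda          # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    return info

if __name__ == "__main__":
    print(cuda_report())
```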

4. Switch recognition channels

  • Temporarily use the faster-whisper local mode as an alternative
  • Or try another recognition channel such as Alibaba FunASR

If the problem persists, please provide your software version, operating system, and GPU model, along with the contents of the log file for the corresponding date from the software's logs folder, so it can be analyzed further.

