#1713 index 1218 is out of bounds for dimension 0 with size 1218

60.7* Posted at: 3 months ago 👁124

index 1218 is out of bounds for dimension 0 with size 1218

system:Windows-10-10.0.26200-SP0
version:v3.86
frozen:True
language:zh
root_dir:H:/win-PatchUpdate-v3.86

Floor #1 DeepSeek AI reply at 3 months ago

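This error usually occurs during speech recognition with a Faster-Whisper model, when loading the model or processing the audio file fails. The core cause is an incomplete or corrupted model file.

Please try the following steps:

Clear and re-download the model files:
- Close the software.
- Go to the model cache directory C:\Users\[your username]\.cache\huggingface\hub\models--Systran--faster-whisper- (or a similar path).
- Delete the entire folder whose name starts with faster-whisper- (for example, faster-whisper-large-v3).
- Restart the software; it will re-download the model automatically. Make sure your network connection is stable.
Switch to a smaller model (if the first step does not help):
- In row 2 of the main window, "Speech Recognition", change the model from large-v3 or large-v2 to medium or small, then try again. Smaller models have smaller files and are less likely to fail.
Check the audio/video file:
- Try a different, shorter audio or video file to rule out damage to one specific file.
Use another recognition engine:
- If the problem persists, temporarily switch to another engine such as Ali FunASR (local), which handles Chinese well, or OpenAI-Whisper (API), which requires an API key.

If none of these steps works, please provide the latest log file (ending in .log) from H:/win-PatchUpdate-v3.86/logs/ so the problem can be analyzed further.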
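
The cache cleanup in the first step can also be scripted. The following is a minimal sketch, assuming the default Hugging Face cache location under the user's home directory and the `models--Systran--faster-whisper-` folder prefix mentioned above. `find_cached_models` and `remove_cached_models` are hypothetical helper names, not part of pyVideoTrans. The sketch defaults to a dry run, so nothing is deleted until you have checked the list.

```python
import shutil
from pathlib import Path

# Assumption: default Hugging Face hub cache location; adjust if the cache was moved.
DEFAULT_HUB_DIR = Path.home() / ".cache" / "huggingface" / "hub"
# Folder prefix taken from the reply above.
MODEL_PREFIX = "models--Systran--faster-whisper-"


def find_cached_models(hub_dir=DEFAULT_HUB_DIR, prefix=MODEL_PREFIX):
    """Return the cached model folders whose name starts with `prefix`."""
    hub_dir = Path(hub_dir)
    if not hub_dir.is_dir():
        return []
    return sorted(
        p for p in hub_dir.iterdir() if p.is_dir() and p.name.startswith(prefix)
    )


def remove_cached_models(hub_dir=DEFAULT_HUB_DIR, prefix=MODEL_PREFIX, dry_run=True):
    """Return the matching folders; delete them only when dry_run is False."""
    folders = find_cached_models(hub_dir, prefix)
    for folder in folders:
        if not dry_run:
            shutil.rmtree(folder)
    return folders


if __name__ == "__main__":
    # Dry run: only report what would be removed.
    for folder in remove_cached_models(dry_run=True):
        print("would delete:", folder)
```

Close the software before running it with `dry_run=False`, as the reply advises, so no model file is in use while it is being removed.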

Floor #2 60.7* reply at 3 months ago

Downloading Model from https://www.modelscope.cn to directory: H:/win-PatchUpdate-v3.86/models\models\iic\speech_fsmn_vad_zh-cn-16k-common-pytorch
2025-12-01 20:15:03,762 - modelscope - INFO - Use user-specified model revision: v2.0.4
Downloading Model from https://www.modelscope.cn to directory: H:/win-PatchUpdate-v3.86/models\models\iic\punc_ct-transformer_cn-en-common-vocab471067-large
2025-12-01 20:15:11,710 - modelscope - INFO - Use user-specified model revision: v2.0.4
2025-12-01 20:15:17,217 - ERROR - index 1218 is out of bounds for dimension 0 with size 1218
Traceback (most recent call last):
  File "videotrans\recognition\_base.py", line 78, in run
  File "videotrans\recognition\_funasr.py", line 57, in _exec
  File "H:\win-PatchUpdate-v3.86\_internal\funasr\models\fsmn_vad_streaming\model.py", line 722, in inference
    segments_i = self.forward(**batch)
  File "H:\win-PatchUpdate-v3.86\_internal\funasr\models\fsmn_vad_streaming\model.py", line 564, in forward
    self.DetectCommonFrames(cache=cache)
  File "H:\win-PatchUpdate-v3.86\_internal\funasr\models\fsmn_vad_streaming\model.py", line 760, in DetectCommonFrames
    frame_state = self.GetFrameState(
  File "H:\win-PatchUpdate-v3.86\_internal\funasr\models\fsmn_vad_streaming\model.py", line 518, in GetFrameState
    sum_score = cache["stats"].scores[0][t][cache["stats"].sil_pdf_ids[0]].item()
IndexError: index 1218 is out of bounds for dimension 0 with size 1218
2025-12-01 20:15:17,217 - ERROR - index 1218 is out of bounds for dimension 0 with size 1218
Traceback (most recent call last):
  File "videotrans\recognition\_base.py", line 78, in run
  File "videotrans\recognition\_funasr.py", line 57, in _exec
  File "funasr\auto\auto_model.py", line 306, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "funasr\auto\auto_model.py", line 383, in inference_with_vad
    res = self.inference(
  File "funasr\auto\auto_model.py", line 345, in inference
    res = model.inference(**batch, **kwargs)
  File "H:\win-PatchUpdate-v3.86\_internal\funasr\models\fsmn_vad_streaming\model.py", line 722, in inference
    segments_i = self.forward(**batch)
  File "H:\win-PatchUpdate-v3.86\_internal\funasr\models\fsmn_vad_streaming\model.py", line 564, in forward
    self.DetectCommonFrames(cache=cache)
  File "H:\win-PatchUpdate-v3.86\_internal\funasr\models\fsmn_vad_streaming\model.py", line 760, in DetectCommonFrames
    frame_state = self.GetFrameState(
  File "H:\win-PatchUpdate-v3.86\_internal\funasr\models\fsmn_vad_streaming\model.py", line 518, in GetFrameState
    sum_score = cache["stats"].scores[0][t][cache["stats"].sil_pdf_ids[0]].item()
IndexError: index 1218 is out of bounds for dimension 0 with size 1218
ERROR:VideoTrans:index 1218 is out of bounds for dimension 0 with size 1218

Floor #3 DeepSeek AI reply at 3 months ago

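The error clearly occurs during speech recognition with Ali FunASR (local) and is an out-of-bounds array index inside the model. It is usually related to how a particular audio file is processed or to the state of the model cache. Please try the following steps:

Clear the FunASR model cache:
- Close the software.
- Go to the directory H:/win-PatchUpdate-v3.86/models/models/iic/.
- Delete the two folders speech_fsmn_vad_zh-cn-16k-common-pytorch and punc_ct-transformer_cn-en-common-vocab471067-large.
- Restart the software; it will re-download both models automatically.
Switch the recognition engine (temporary workaround):
- In row 2 of the main window, "Speech Recognition", switch the engine from Ali FunASR (local) to faster-whisper (local) and choose a model (for example, large-v3).
- Process the video again. This is the most direct way to bypass the error.
Check and preprocess the audio file:
- If the error occurs only with one specific file, that file's audio stream may be malformed.
- Use the toolbox's audio/video merge/separate feature to extract the video's audio as WAV first, then run recognition on the extracted audio file.
If the problem persists:
- Please provide the following information to help narrow it down:
  - The format and approximate duration of the video/audio file you are processing.
  - Whether you have customized the VAD parameters (speech threshold, silence split, etc.) in the advanced options. If so, restore the defaults.
  - Whether the error still occurs with a very short test file (for example, under 1 minute).

Recommended first: go straight to step 2 and switch to the faster-whisper engine. This is the fastest and most stable solution.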
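
As for what the message itself means: a dimension of size 1218 has valid indices 0 through 1217, so index 1218 is exactly one past the end. The sketch below reproduces that pattern with plain Python lists. It is only an illustration, under the assumption that the frame index `t` in the traceback is what overran the score buffer; `get_frame_score` and the data are hypothetical, and this is not FunASR's code.

```python
def get_frame_score(scores, t, sil_id=0):
    """Return the silence score for frame t, with an explicit bounds check."""
    if not 0 <= t < len(scores):
        raise IndexError(
            f"index {t} is out of bounds for dimension 0 with size {len(scores)}"
        )
    return scores[t][sil_id]


# Hypothetical score buffer: 1218 frames, 2 classes per frame.
scores = [[0.9, 0.1] for _ in range(1218)]

# The last valid frame index is len(scores) - 1.
print(get_frame_score(scores, 1217))

try:
    get_frame_score(scores, 1218)  # one past the end
except IndexError as exc:
    print(exc)
```

An off-by-one like this typically appears when the number of frames a caller expects differs from the number actually computed. Why that would happen for this particular file is not established in the thread.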

Floor #4 60.7* reply at 3 months ago

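After deleting the models and letting them re-download, Ali FunASR now gets stuck: the models finish loading, but it has been in the recognition stage for a long time.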

Floor #5 DeepSeek AI reply at 3 months ago

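This is a typical case of FunASR hanging or becoming unresponsive in the recognition stage after the model has loaded. It is usually related to memory, configuration, or the specific file when the model processes long audio. Please try the following steps:

Force-quit and restart the software:
- End the pyVideoTrans process completely in Task Manager, then restart the software.
Use the faster-whisper engine (recommended):
- This is the most stable and fastest solution. In row 2 of the main window, "Speech Recognition", switch the engine to faster-whisper (local), choose the large-v3 or large-v2 model, and set the recognition mode to overall recognition.
- Process the video again. faster-whisper gives good accuracy and speed in most cases.
Adjust how FunASR processes the file (if you must use it):
- If you need FunASR for its Chinese recognition quality, try the following:
  - Make sure CUDA acceleration is enabled (if you have an NVIDIA GPU).
  - In the advanced options, find the "Faster/OpenAI/Whisper speech recognition fine-tuning" section and try enabling forced batch inference (if the option applies to FunASR, it may change the processing mode).
Test with a short file:
- Process a very short audio or video file (for example, 30 seconds) with FunASR and see whether it completes. If the short file works but the long one does not, the cause may be insufficient memory or a problem with the file itself.
Check system resources:
- While the software is "stuck", open Task Manager and check whether RAM or GPU memory usage is close to 100%. If so, resources are insufficient, and the only options are faster-whisper or a smaller model.
Preprocess the audio file:
- Use the toolbox's audio/video merge/separate feature to extract the video's audio as WAV, then run recognition on that WAV file. The container format sometimes causes problems.

Key recommendation: for the vast majority of users, faster-whisper (local) is the more stable and efficient choice, unless you have extreme requirements for recognition accuracy on spoken Chinese. Try step 2 first.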
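
The short-file test can be prepared from a WAV you have already extracted. The following is a minimal sketch using only Python's standard `wave` module, assuming an uncompressed PCM WAV as input; `make_test_clip` and any file names you pass to it are hypothetical.

```python
import wave


def make_test_clip(src_path, dst_path, seconds=30):
    """Copy the first `seconds` of a PCM WAV file into a new WAV file."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Never read past the end of a file shorter than `seconds`.
        n_frames = min(src.getnframes(), int(seconds * src.getframerate()))
        frames = src.readframes(n_frames)
    with wave.open(dst_path, "wb") as dst:
        # Reuse channels, sample width and rate; the frame count in the
        # header is corrected automatically when the data is written.
        dst.setparams(params)
        dst.writeframes(frames)
    return n_frames / params.framerate  # clip duration in seconds
```

For example, `make_test_clip("full.wav", "test_30s.wav")` writes a clip of at most 30 seconds. If that clip is recognized without hanging while the full file is not, the result points toward file length or memory rather than the installation.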
