#1712 TaskCfg(cache_folder='H:/win-PatchUpdate-v3.86/tmp10352/speech2text', target_dir='h:/xiazai/out/recogn', remove_noise=Fa

60.7* Posted at: 3 months ago 👁112

Error during the speech recognition stage: [Ali FunASR Chinese (local)] index out of range while processing a list or sequence: index 1247 is out of bounds for dimension 0 with size 1247:
Traceback (most recent call last):
File "videotrans\task\job.py", line 113, in run
File "videotrans\task\_speech2text.py", line 140, in recogn
File "videotrans\recognition\__init__.py", line 224, in run
File "videotrans\recognition\_base.py", line 78, in run
File "videotrans\recognition\_funasr.py", line 57, in _exec
File "funasr\auto\auto_model.py", line 306, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
File "funasr\auto\auto_model.py", line 383, in inference_with_vad
    res = self.inference(
File "funasr\auto\auto_model.py", line 345, in inference
    res = model.inference(**batch, **kwargs)
File "H:\win-PatchUpdate-v3.86\_internal\funasr\models\fsmn_vad_streaming\model.py", line 722, in inference
    segments_i = self.forward(**batch)
File "H:\win-PatchUpdate-v3.86\_internal\funasr\models\fsmn_vad_streaming\model.py", line 564, in forward
    self.DetectCommonFrames(cache=cache)
File "H:\win-PatchUpdate-v3.86\_internal\funasr\models\fsmn_vad_streaming\model.py", line 760, in DetectCommonFrames
    frame_state = self.GetFrameState(
File "H:\win-PatchUpdate-v3.86\_internal\funasr\models\fsmn_vad_streaming\model.py", line 518, in GetFrameState
    sum_score = cache["stats"].scores[0][t][cache["stats"].sil_pdf_ids[0]].item()
IndexError: index 1247 is out of bounds for dimension 0 with size 1247

TaskCfg(cache_folder='H:/win-PatchUpdate-v3.86/tmp10352/speech2text', target_dir='h:/xiazai/out/recogn', remove_noise=False, is_separate=False, detect_language='zh-cn', subtitle_language=None, source_language=None, target_language=None, source_language_code=None, target_language_code=None, source_sub=None, target_sub='h:/xiazai/out/recogn/zh-cn.srt', source_wav=None, source_wav_output=None, target_wav=None, target_wav_output=None, subtitles=None, novoice_mp4=None, noextname='zh-cn', shibie_audio='H:/win-PatchUpdate-v3.86/tmp10352/speech2text/zh-cn-1764590888.3198414.wav', targetdir_mp4=None, instrument=None, vocal=None, back_audio=None, background_music=None, app_mode='biaozhun', subtitle_type=0, volume='+0%', pitch='+0Hz', voice_rate='+0%', voice_role=None, copysrt_rawvideo=False, clear_cache=False, translate_type=None, tts_type=None, recogn_type=2, model_name='paraformer-zh', split_type=0, voice_autorate=False, video_autorate=False, cuda=True, name="H:/xiazai/_video_out/Why Carl Jung's Ideas Still Haunt Us Bedtime History Stories Documentary/zh-cn.wav", basename='zh-cn.wav', ext='wav', dirname="H:/xiazai/_video_out/Why Carl Jung's Ideas Still Haunt Us Bedtime History Stories Documentary", shound_del_name=None, uuid='e346a6ce8d', remove_silent_mid=False, align_sub_audio=True, enable_diariz=False, nums_diariz=0, auto_fix=True)

=
system:Windows-10-10.0.26200-SP0
version:v3.86
frozen:True
language:zh
root_dir:H:/win-PatchUpdate-v3.86

Floor #1 DeepSeek AI reply at 3 months ago

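This is an internal index-out-of-bounds error raised by the FunASR speech recognition model while processing this particular audio file. The traceback shows the failure inside the VAD model (fsmn_vad_streaming, GetFrameState): the code requests frame index 1247 from a score tensor that holds only 1247 frames (valid indices 0 to 1246), so it reads one frame past the end. This is usually related to the audio file itself or to the model's caching mechanism when handling long audio.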
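
To make the failure mode concrete, here is a minimal sketch of the same kind of off-by-one lookup using plain Python lists. The function names and the clamping guard are hypothetical and for illustration only; this is not FunASR's code and not a patch for it.

```python
def frame_score(scores, t, sil_id=0):
    """Look up the silence score of frame t, shaped like the failing line."""
    return scores[t][sil_id]


def frame_score_guarded(scores, t, sil_id=0):
    """Same lookup, but clamp t to the last valid frame.

    Illustrative guard only (an assumption, not how FunASR handles it).
    """
    if not scores:
        raise ValueError("no frames to score")
    t = min(t, len(scores) - 1)
    return scores[t][sil_id]


# 1247 frames, so the valid indices are 0..1246.
scores = [[0.1, 0.9] for _ in range(1247)]

try:
    frame_score(scores, 1247)  # one past the end, as in the traceback
except IndexError as exc:
    print("IndexError:", exc)

print(frame_score_guarded(scores, 1247))  # falls back to the last frame
```

The point is only that a frame counter reaching the number of available frames is enough to trigger the error; it says nothing about why the counter got there for this audio file.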

Please try the following steps:

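Switch the recognition model: this is the most direct fix. In row 2 of the main window, under "Speech recognition" (语音识别), change the model from "阿里FunASR中文(本地)" (Ali FunASR Chinese, local) to "faster-whisper(本地)" (faster-whisper, local) and pick a suitable model such as large-v3 or medium. faster-whisper is generally more stable.
Check and preprocess the audio:
- Make sure the audio file to be recognized (zh-cn.wav) is complete and not corrupted.
- Try the "Audio/video merge/separate" (音视频合并/分离) function in the toolbox to re-extract the audio from the video, then run recognition on the newly extracted audio.
Adjust the recognition parameters (if you must use FunASR):
- In "Advanced options" (高级选项), find the "Faster/OpenAI/Whisper speech recognition fine-tuning" (Faster/OpenAI/Whisper 语音识别微调) area; some of these parameters also apply to FunASR.
- Try adjusting the VAD parameters:
  - Raise "Speech threshold" (语音阈值) from the default 0.45 to, for example, 0.5 or 0.55, to make detection less sensitive.
  - Increase "Silence split duration (ms)" (静音分割持续毫秒) from the default 140 to, for example, 200 or 300, so that segmentation is more lenient.
- Tick "Force batch inference" (强制批量推理), which may bypass the streaming logic that triggers the error.
Simplify the pipeline:
- During the recognition stage, temporarily turn off advanced features such as "Noise reduction" (降噪) and "Vocal separation" (人声分离) to rule out interference.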
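
If you prefer to check the audio outside the application, the sketch below shows one way to do it with the Python standard library. It reads the WAV file in chunks, compares the frame count declared in the header with the frames that can actually be read, and builds an ffmpeg command for re-extracting the audio. The function names are hypothetical, and the 16 kHz mono 16-bit PCM target is an assumption about what the recognizer expects, so adjust it if your setup differs.

```python
import wave


def wav_summary(path, chunk_frames=65536):
    """Read a WAV file end to end and report whether it looks complete.

    Raises wave.Error or EOFError if the header itself is damaged.
    """
    with wave.open(str(path), "rb") as wf:
        declared = wf.getnframes()
        rate = wf.getframerate()
        channels = wf.getnchannels()
        frame_bytes = channels * wf.getsampwidth()
        readable = 0
        while True:
            data = wf.readframes(chunk_frames)
            if not data:
                break
            readable += len(data) // frame_bytes
    return {
        "channels": channels,
        "sample_rate": rate,
        "duration_s": readable / rate if rate else 0.0,
        "complete": readable == declared,
    }


def reextract_cmd(video, out_wav):
    """Build an ffmpeg command that re-extracts the audio track.

    Assumed target format: 16 kHz, mono, 16-bit PCM.
    """
    return [
        "ffmpeg", "-y", "-i", str(video),
        "-vn",                # drop the video stream
        "-ac", "1",           # mono
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # 16-bit PCM
        str(out_wav),
    ]
```

To use it, call `wav_summary(...)` on the zh-cn.wav file; if `complete` is false or an exception is raised, re-extract the audio with `subprocess.run(reextract_cmd(video, out_wav), check=True)` (ffmpeg must be on your PATH) and run recognition on the new file.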

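If none of the above works, please provide the latest log file from the H:/win-PatchUpdate-v3.86/videotrans/logs/ directory so that the specific context of the error can be analyzed further.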
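
A small helper like the following can locate the newest log file and print its last lines for pasting into a reply. This is a sketch: the function names are hypothetical, and the `*.log` pattern is an assumption about how the log files are named, so change it if yours use a different extension.

```python
from pathlib import Path


def latest_log(log_dir, pattern="*.log"):
    """Return the most recently modified matching file, or None if there is none."""
    files = [p for p in Path(log_dir).glob(pattern) if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


def tail(path, n=50, encoding="utf-8"):
    """Return the last n lines of a text file, replacing undecodable bytes."""
    with open(path, encoding=encoding, errors="replace") as fh:
        return fh.readlines()[-n:]
```

For example, `print("".join(tail(latest_log(log_dir))))` with `log_dir` set to the directory above shows the end of the newest log, provided at least one file matches.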
