#2762

Posted at: 6 days ago

Error during the speech recognition stage [Ali FunASR (local)]: operation failed, please retry later
Traceback (most recent call last):
  File "videotrans\task\job.py", line 113, in run
  File "videotrans\task\_speech2text.py", line 146, in recogn
  File "videotrans\recognition\__init__.py", line 226, in run
  File "videotrans\recognition\_base.py", line 80, in run
  File "videotrans\recognition\_funasr.py", line 73, in _exec
  File "funasr\auto\auto_model.py", line 311, in generate
    return self.inference_with_vad(
  File "funasr\auto\auto_model.py", line 581, in inference_with_vad
    sv_output = postprocess(all_segments, None, labels, spk_embedding.cpu())
  File "funasr\models\campplus\utils.py", line 117, in postprocess
    assert len(segments) == len(labels)
AssertionError

TaskCfg(cache_folder='C:/SOFT/pyvideotrans/tmp/11556/a4cf4fc431', target_dir='c:/soft/pyvideotrans/output/recogn', remove_noise=False, is_separate=False, detect_language='zh-cn', subtitle_language=None, source_language=None, target_language=None, source_language_code=None, target_language_code=None, source_sub=None, target_sub='c:/soft/pyvideotrans/output/recogn/R_MIC_251227-022342.srt', source_wav=None, source_wav_output=None, target_wav=None, target_wav_output=None, subtitles=None, novoice_mp4=None, noextname='R_MIC_251227-022342', shibie_audio='C:/SOFT/pyvideotrans/tmp/11556/a4cf4fc431/R_MIC_251227-022342-1768957408.6017406.wav', targetdir_mp4=None, instrument=None, vocal=None, back_audio=None, background_music=None, app_mode='biaozhun', subtitle_type=0, volume='+0%', pitch='+0Hz', voice_rate='+0%', voice_role=None, copysrt_rawvideo=False, clear_cache=False, translate_type=None, tts_type=None, recogn_type=2, model_name='paraformer-zh', split_type=0, voice_autorate=False, video_autorate=False, cuda=True, name='K:/Recoder/251225-1230/R_MIC_251227-022342.mp3', basename='R_MIC_251227-022342.mp3', ext='mp3', dirname='K:/Recoder/251225-1230', shound_del_name=None, uuid='a4cf4fc431', remove_silent_mid=False, align_sub_audio=True, enable_diariz=True, nums_diariz=0, rephrase=0, only_out_mp4=False)


system:Windows-10-10.0.22631-SP0
version:v3.92
frozen:True
language:zh
root_dir:C:/SOFT/pyvideotrans

1 Reply

This error occurs when the "identify speakers" (diarization) option is enabled: while matching speech segments to speaker labels, the Ali FunASR model internally produces lists of mismatched lengths. This is a known internal compatibility issue in the software.

Solutions:

  1. Temporary workaround (recommended): during the speech recognition stage, leave the "identify speakers" option unchecked — enabling it is the direct trigger of this error. With it unchecked, FunASR performs plain speech-to-text and works normally.
  2. Alternative: if you genuinely need speaker separation, temporarily switch to another recognition channel that supports diarization, such as faster-whisper (local), and make sure the "identify speakers" option is checked there.

Steps:

  1. In the "Speech recognition" area on row 2 of the main window, switch the channel from "Ali FunASR (local)" to "faster-whisper (local)".
  2. Select a model (e.g. large-v3).
  3. Make sure "identify speakers" is checked.
  4. Re-run the task.

Root cause:
The error (AssertionError: assert len(segments) == len(labels)) shows that, for certain audio, FunASR's speaker-identification module generates a number of speech segments that does not match the number of corresponding speaker labels, so the assertion in postprocess fails. This is usually related to the audio content or the FunASR model version.
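For readers who want to patch around this locally while waiting for an upstream fix, one common defensive pattern is to align the two lists to their common length before the assertion is reached. The sketch below is my own illustration of that pattern, not FunASR's actual code: the function name and the example data are hypothetical, and truncating the longer list silently discards the speaker labels (or segments) beyond the common length.

```python
# Hypothetical workaround sketch (NOT the official fix): align the
# segment list and the label list to their common length so that
# len(segments) == len(labels) holds before the assertion runs.

def align_segments_labels(segments, labels):
    """Truncate both lists to their shared length and warn on mismatch."""
    n = min(len(segments), len(labels))
    if len(segments) != len(labels):
        print(f"warning: {len(segments)} segments vs {len(labels)} labels; "
              f"truncating both to {n}")
    return segments[:n], labels[:n]

# Example: three speech segments but only two speaker labels,
# the shape of mismatch that triggers the AssertionError above.
segments = [(0.0, 1.2), (1.2, 2.5), (2.5, 4.0)]
labels = [0, 1]
segments, labels = align_segments_labels(segments, labels)
assert len(segments) == len(labels)  # the failing check now passes
```

Note that this only suppresses the crash; the trailing segments lose their speaker assignment, so disabling diarization or switching channels (as described above) remains the cleaner option.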

See more reference documentation:
