#2593 TaskCfg(cache_folder='D:/Program Files/win-videotrans-v3.67-tiny/tmp/27028/speech2text', target_dir='E:/Doc/pyvideo/reco

103.196* Posted 1 month ago

Error in the speech recognition stage: [Alibaba FunASR Chinese (local)] Internal program error: expected Tensor as element 1 in argument 0, but got str:
Traceback (most recent call last):
  File "videotrans\task\job.py", line 113, in run
  File "videotrans\task\_speech2text.py", line 146, in recogn
  File "videotrans\recognition\__init__.py", line 227, in run
  File "videotrans\recognition\_base.py", line 80, in run
  File "videotrans\recognition\_funasr.py", line 60, in _exec
  File "funasr\auto\auto_model.py", line 306, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "funasr\auto\auto_model.py", line 383, in inference_with_vad
    res = self.inference(
  File "funasr\auto\auto_model.py", line 345, in inference
    res = model.inference(**batch, **kwargs)
  File "D:\Program Files\win-videotrans-v3.67-tiny\_internal\funasr\models\fsmn_vad_streaming\model.py", line 690, in inference
    audio_sample = torch.cat((cache["prev_samples"], audio_sample_list[0]))
TypeError: expected Tensor as element 1 in argument 0, but got str

TaskCfg(cache_folder='D:/Program Files/win-videotrans-v3.67-tiny/tmp/27028/speech2text', target_dir='E:/Doc/pyvideo/recogn', remove_noise=False, is_separate=False, detect_language='zh-cn', subtitle_language=None, source_language=None, target_language=None, source_language_code=None, target_language_code=None, source_sub=None, target_sub='E:/Doc/pyvideo/recogn/《从0到1》Zero to One - 彼得·蒂尔.srt', source_wav=None, source_wav_output=None, target_wav=None, target_wav_output=None, subtitles=None, novoice_mp4=None, noextname='《从0到1》Zero to One - 彼得·蒂尔', shibie_audio='D:/Program Files/win-videotrans-v3.67-tiny/tmp/27028/speech2text/《从0到1》Zero to One - 彼得·蒂尔-1768127361.4457128.wav', targetdir_mp4=None, instrument=None, vocal=None, back_audio=None, background_music=None, app_mode='biaozhun', subtitle_type=0, volume='+0%', pitch='+0Hz', voice_rate='+0%', voice_role=None, copysrt_rawvideo=False, clear_cache=False, translate_type=None, tts_type=None, recogn_type=2, model_name='paraformer-zh', split_type=0, voice_autorate=False, video_autorate=False, cuda=True, name='E:/Downloads/《从0到1》Zero to One - 彼得·蒂尔.mp4', basename='《从0到1》Zero to One - 彼得·蒂尔.mp4', ext='mp4', dirname='E:/Downloads', shound_del_name=None, uuid='88193bbeb5', remove_silent_mid=False, align_sub_audio=True, enable_diariz=False, nums_diariz=0, rephrase=0, only_out_mp4=False)


system:Windows-10-10.0.19045-SP0
version:v3.90
frozen:True
language:zh
root_dir:D:/Program Files/win-videotrans-v3.67-tiny

1 Reply

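This error is an internal exception raised by the local Alibaba FunASR speech recognition module while processing this particular audio file. The message "expected Tensor as element 1 in argument 0, but got str" comes from the torch.cat call on the last line of the traceback: audio_sample_list[0] was a string (most likely the audio file path) instead of a tensor of decoded audio samples, so the audio apparently was never loaded. The problem therefore most likely lies in the audio preprocessing stage (the file was not read or decoded) or in model loading.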
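To rule out the "audio was never loaded" cause, you can first confirm that the intermediate WAV file exists and can be decoded. The sketch below assumes Python 3.9+ and uses only the standard library; check_wav is a hypothetical helper, not part of pyvideotrans or FunASR. Note that the stdlib wave module only reads uncompressed PCM WAV, so a failure here does not prove FunASR would fail, and a pass does not prove it will succeed.

```python
import os
import wave


def check_wav(path: str) -> list[str]:
    """Hypothetical helper: return problems found with the WAV file at `path`.

    An empty list means the file exists and the stdlib `wave` module could
    read at least some audio frames from it.
    """
    if not os.path.isfile(path):
        return [f"file not found: {path!r}"]
    try:
        with wave.open(path, "rb") as wf:
            n_frames = wf.getnframes()
            # Read a small chunk to make sure sample data is actually present.
            sample = wf.readframes(min(n_frames, 1024))
    except (wave.Error, EOFError) as exc:
        # wave.Error: bad or unsupported header; EOFError: truncated file.
        return [f"not a readable PCM WAV file: {exc}"]
    if n_frames == 0 or not sample:
        return ["WAV file contains no audio frames"]
    return []


if __name__ == "__main__":
    import sys

    # Usage: python check_wav.py path/to/file.wav [more.wav ...]
    for p in sys.argv[1:]:
        print(p, check_wav(p) or "OK")
```

If the temporary file still exists, pass it the shibie_audio path from the TaskCfg line above; "file not found" or an unreadable header would support the explanation that FunASR received a path it could not open.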

Please try the following steps:

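  1. Check the audio file: make sure the audio/video file being recognized is not corrupted and actually contains speech.
  2. Switch the recognition mode: in the "Speech Recognition" settings on row 2 of the main window, change "Recognition mode" from "Whole recognition" to "Batch inference", or vice versa.
  3. Update or re-download the model

    • If you have used FunASR successfully before, try re-downloading the paraformer-zh model under "Settings" -> "Model download".
    • If this is your first time using it, make sure your network connection is stable so the model downloads completely.
  4. Use a different recognition channel: as a temporary workaround, switch to faster-whisper (local) with a model such as medium or large-v2.
  5. Check the installation path: the software is installed under D:/Program Files/..., a path that contains a space and sits in a Program Files folder, which Windows may treat as a protected location; either can cause permission or path-handling problems. We strongly recommend moving the entire software folder to the root of a non-system drive, under a name without spaces or special characters, such as D:/videotrans, and then trying again.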
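For step 5, a quick heuristic check of a path can be sketched in Python (standard library only). risky_path_traits is a hypothetical helper; treating "special characters" as "any non-ASCII character" is an assumption, and a clean result does not guarantee the path will work. The same check can also be run on the input media path, which in this report contains spaces and characters such as 《》 and ·.

```python
from pathlib import PureWindowsPath


def risky_path_traits(path: str) -> list[str]:
    """Hypothetical helper: flag path traits that step 5 advises avoiding.

    Heuristic only: checks for spaces, non-ASCII characters and
    Program Files folders.
    """
    issues = []
    if " " in path:
        issues.append("contains spaces")
    if not path.isascii():
        issues.append("contains non-ASCII characters")
    # PureWindowsPath accepts both / and \ separators and splits off the drive.
    folders = {part.lower() for part in PureWindowsPath(path).parts}
    if folders & {"program files", "program files (x86)"}:
        issues.append("is inside a Program Files folder")
    return issues


if __name__ == "__main__":
    for p in ("D:/Program Files/win-videotrans-v3.67-tiny", "D:/videotrans"):
        print(p, "->", risky_path_traits(p) or "looks fine")
```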

If none of the above works, please provide the latest log file from the videotrans/logs/ directory so we can analyze the problem further.
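If it is unclear which log file is the latest, this small Python sketch (standard library only) picks the most recently modified file in a directory. latest_log is a hypothetical helper; pass it the logs folder mentioned above, whose exact location depends on your installation.

```python
from pathlib import Path
from typing import Optional


def latest_log(logs_dir: str) -> Optional[Path]:
    """Return the most recently modified file in `logs_dir`, or None if it has no files.

    Raises FileNotFoundError if `logs_dir` does not exist.
    """
    files = [p for p in Path(logs_dir).iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    import sys

    # Usage: python latest_log.py path/to/logs
    if len(sys.argv) > 1:
        print(latest_log(sys.argv[1]))
```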
