#2351 TaskCfg(cache_folder='D:/Program Files/win-videotrans-v3.67-tiny/tmp/18764/speech2text', target_dir='E:/Doc/pyvideo/reco

77.83* Posted at: 4 hours ago 👁5

Error during the speech recognition stage: [Alibaba FunASR Chinese (local)] Internal program error: expected Tensor as element 1 in argument 0, but got str:
Traceback (most recent call last):
File "videotrans\task\job.py", line 113, in run
File "videotrans\task\_speech2text.py", line 146, in recogn
File "videotrans\recognition\__init__.py", line 227, in run
File "videotrans\recognition\_base.py", line 80, in run
File "videotrans\recognition\_funasr.py", line 60, in _exec
File "funasr\auto\auto_model.py", line 306, in generate
  return self.inference_with_vad(input, input_len=input_len, **cfg)
File "funasr\auto\auto_model.py", line 383, in inference_with_vad
  res = self.inference(
File "funasr\auto\auto_model.py", line 345, in inference
  res = model.inference(**batch, **kwargs)
File "D:\Program Files\win-videotrans-v3.67-tiny\_internal\funasr\models\fsmn_vad_streaming\model.py", line 690, in inference
  audio_sample = torch.cat((cache["prev_samples"], audio_sample_list[0]))
TypeError: expected Tensor as element 1 in argument 0, but got str

TaskCfg(cache_folder='D:/Program Files/win-videotrans-v3.67-tiny/tmp/18764/speech2text', target_dir='E:/Doc/pyvideo/recogn', remove_noise=False, is_separate=False, detect_language='auto', subtitle_language=None, source_language=None, target_language=None, source_language_code=None, target_language_code=None, source_sub=None, target_sub='E:/Doc/pyvideo/recogn/为什么直到年底中国人才知道斩杀线这种东西?_音频.srt', source_wav=None, source_wav_output=None, target_wav=None, target_wav_output=None, subtitles=None, novoice_mp4=None, noextname='为什么直到年底中国人才知道斩杀线这种东西?_音频', shibie_audio='D:/Program Files/win-videotrans-v3.67-tiny/tmp/18764/speech2text/为什么直到年底中国人才知道斩杀线这种东西?_音频-1767011505.8705533.wav', targetdir_mp4=None, instrument=None, vocal=None, back_audio=None, background_music=None, app_mode='biaozhun', subtitle_type=0, volume='+0%', pitch='+0Hz', voice_rate='+0%', voice_role=None, copysrt_rawvideo=False, clear_cache=False, translate_type=None, tts_type=None, recogn_type=2, model_name='paraformer-zh', split_type=0, voice_autorate=False, video_autorate=False, cuda=True, name='E:/Downloads/为什么直到年底中国人才知道斩杀线这种东西?_音频.mp4', basename='为什么直到年底中国人才知道斩杀线这种东西?_音频.mp4', ext='mp4', dirname='E:/Downloads', shound_del_name=None, uuid='d133b29ee2', remove_silent_mid=False, align_sub_audio=True, enable_diariz=False, nums_diariz=0, rephrase=0, only_out_mp4=False)
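
The last frame of the traceback shows `audio_sample_list[0]` arriving at `torch.cat` as a `str`, so the VAD model was handed text (most likely the file path) instead of decoded samples. Why that happened is not visible in the log. One thing that can be ruled out quickly is whether the WAV named in `shibie_audio` exists and opens as a valid WAV file. A minimal sketch using only the standard library; `preflight_wav` is a hypothetical helper, not part of pyvideotrans or FunASR:

```python
import os
import wave


def preflight_wav(path):
    """Return (sample_rate, n_frames) if `path` is a readable PCM WAV file.

    Hypothetical helper for diagnosis only; not part of pyvideotrans or FunASR.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"audio file not found: {path}")
    # wave.open raises wave.Error when the header is not valid RIFF/WAVE PCM.
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnframes()
```

If this raises `FileNotFoundError` or `wave.Error` for the path in `shibie_audio`, the problem sits upstream of FunASR (the temporary WAV was never written or is not valid), which would be worth including in a follow-up report.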


=
system:Windows-10-10.0.19045-SP0
version:v3.90
frozen:True
language:zh
root_dir:D:/Program Files/win-videotrans-v3.67-tiny

1 Replies

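This error is a known issue with the Alibaba FunASR local speech recognition module and is usually related to the audio file's path or format. Please work through the following steps: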

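  1. Move the software: it is installed in D:/Program Files/win-videotrans-v3.67-tiny, which is a system directory that requires administrator privileges. Cut and paste the entire software folder to an English-only path on a non-system drive, for example D:/videotrans or E:/videotrans. This is the first and most important step.
  2. Simplify the audio file name: rename the file you want to recognize, 为什么直到年底中国人才知道斩杀线这种东西?_音频.mp4, to a simple English name such as test.mp4, place it in a simple path (for example E:/test.mp4), and try again.
  3. Check the audio format: make sure your audio/video file is in a standard format. If the problem persists, use the "Audio/Video Merge/Separate" feature in the toolbox to extract the audio as a .wav file, then run recognition on that .wav file.
  4. Switch recognition channels temporarily: if you need results urgently, you can switch to another speech recognition channel for now, such as faster-whisper (local).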
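
Steps 1 and 2 both come down to using paths that are plain ASCII with no spaces. A rough sketch of that check, plus a copy to a simple name so the original file stays untouched; `is_plain_path` and `stage_copy` are hypothetical helper names, and the "ASCII, no spaces" rule is an assumption about what counts as a safe path here:

```python
import os
import shutil


def is_plain_path(path):
    """True if the path is ASCII-only and contains no spaces (assumed 'safe')."""
    return path.isascii() and " " not in path


def stage_copy(src, dest_dir, new_stem="test"):
    """Copy `src` into `dest_dir` under a simple name, keeping its extension."""
    ext = os.path.splitext(src)[1]
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, new_stem + ext)
    shutil.copyfile(src, dest)
    return dest
```

For the paths in this report, `is_plain_path("D:/Program Files/win-videotrans-v3.67-tiny")` is `False` because of the space, and the source file name fails the check because of its non-ASCII characters.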

Completing step 1 resolves the vast majority of errors of this kind. If the problem persists, please post the new, complete error log.
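
If you would rather do the .wav extraction from step 3 outside the toolbox, it can be approximated with ffmpeg. This is only a sketch: it assumes `ffmpeg` is on your PATH, the helper names are made up for illustration, and 16 kHz mono 16-bit PCM is a conventional choice for speech models rather than a documented requirement of this software.

```python
import subprocess


def build_wav_command(src, dest, sample_rate=16000):
    """Build an ffmpeg argument list that extracts mono PCM audio from `src`."""
    return [
        "ffmpeg", "-y",            # overwrite dest if it already exists
        "-i", src,                 # input audio or video file
        "-vn",                     # ignore any video stream
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample to the requested rate
        "-c:a", "pcm_s16le",       # 16-bit little-endian PCM
        dest,
    ]


def extract_wav(src, dest, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_wav_command(src, dest, sample_rate), check=True)
    return dest
```

For example, `extract_wav("E:/test.mp4", "E:/test.wav")` would write the WAV next to the renamed file from step 2, and that .wav can then be given to the recognizer.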

Related documentation: FAQ
