#1914 TaskCfg(cache_folder='E:/vtrans/tmp/29528/speech2text', target_dir='E:/vtrans/output/recogn', remove_noise=False, is_sep

202.150* Posted at: 3 months ago 👁119

ASR Error:[FunASR-CN (Local)] expected Tensor as element 1 in argument 0, but got str:
Traceback (most recent call last):
File "videotrans\task\job.py", line 113, in run
File "videotrans\task\_speech2text.py", line 146, in recogn
File "videotrans\recognition\__init__.py", line 231, in run
File "videotrans\recognition\_base.py", line 78, in run
File "videotrans\recognition\_funasr.py", line 57, in _exec
File "funasr\auto\auto_model.py", line 306, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
File "funasr\auto\auto_model.py", line 383, in inference_with_vad
    res = self.inference(
File "funasr\auto\auto_model.py", line 345, in inference
    res = model.inference(**batch, **kwargs)
File "E:\vtrans\_internal\funasr\models\fsmn_vad_streaming\model.py", line 690, in inference
    audio_sample = torch.cat((cache["prev_samples"], audio_sample_list[0]))
TypeError: expected Tensor as element 1 in argument 0, but got str

TaskCfg(cache_folder='E:/vtrans/tmp/29528/speech2text', target_dir='E:/vtrans/output/recogn', remove_noise=False, is_separate=False, detect_language='zh-cn', subtitle_language=None, source_language=None, target_language=None, source_language_code=None, target_language_code=None, source_sub=None, target_sub='E:/vtrans/output/recogn/002.srt', source_wav=None, source_wav_output=None, target_wav=None, target_wav_output=None, subtitles=None, novoice_mp4=None, noextname='002', shibie_audio='E:/vtrans/tmp/29528/speech2text/002-1765345908.9103088.wav', targetdir_mp4=None, instrument=None, vocal=None, back_audio=None, background_music=None, app_mode='biaozhun', subtitle_type=0, volume='+0%', pitch='+0Hz', voice_rate='+0%', voice_role=None, copysrt_rawvideo=False, clear_cache=False, translate_type=None, tts_type=None, recogn_type=2, model_name='paraformer-zh', split_type=0, voice_autorate=False, video_autorate=False, cuda=True, name='E:/2023/personal/inspo2/CN Course/Kana/W3/002.m4a', basename='002.m4a', ext='m4a', dirname='E:/2023/personal/inspo2/CN Course/Kana/W3', shound_del_name=None, uuid='d8eef384ee', remove_silent_mid=False, align_sub_audio=True, enable_diariz=False, nums_diariz=0, rephrase=0, only_out_mp4=False)

=
system:Windows-10-10.0.26100-SP0
version:v3.89
frozen:True
language:en
root_dir:E:/vtrans

Floor #1 DeepSeek AI reply at 3 months ago

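The error indicates a data type mismatch in the FunASR local model processing. The message "expected Tensor as element 1 in argument 0, but got str" comes from the torch.cat call in the FSMN VAD model (model.py, line 690): the second item in the tuple, audio_sample_list[0], is a string rather than a Tensor. This suggests the audio passed to the model was never decoded into sample data, so the model received something like a file path instead.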
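The mismatch can be reproduced in isolation. The sketch below is illustrative only: prev_samples and audio_sample_list are stand-ins for the values in the traceback, "002.wav" is a placeholder path, and try_concat is a hypothetical helper, not part of FunASR or pyVideoTrans. It assumes PyTorch is installed.

```python
import torch

# Stand-in for cache["prev_samples"] from the traceback (an empty 1-D tensor).
prev_samples = torch.zeros(0)

# Stand-in for audio_sample_list when the audio was never decoded:
# the list holds a path string instead of a tensor of samples.
audio_sample_list = ["002.wav"]


def try_concat(prev, samples):
    """Attempt the same concatenation as in the traceback; return the result or the error text."""
    try:
        return torch.cat((prev, samples[0]))
    except TypeError as exc:
        return str(exc)


# With a string as the second tuple item, torch.cat raises TypeError.
message = try_concat(prev_samples, audio_sample_list)
print(message)

# With a real tensor in the same position, the call succeeds.
ok = try_concat(prev_samples, [torch.zeros(4)])
print(ok.shape)
```

"Element 1" in the message is the second item of the tuple passed to torch.cat, which matches audio_sample_list[0] in the traceback.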

Primary Solution:
This is likely caused by a corrupted or incompatible audio file generated during the preprocessing stage. Please follow these steps:

1. Clear Cache and Retry: On the main interface, make sure the "Clean generated" checkbox is selected before starting the task. This forces the software to regenerate all intermediate files, including the audio file used for recognition.
2. Check Input File: Verify that your source file 002.m4a is not corrupted by playing it in a standard media player.
3. Use Alternative ASR Model: As a temporary workaround, switch to the "faster-whisper (local)" model in the ASR dropdown menu to check whether the failure is specific to FunASR.
4. Reinstall/Update Dependencies (Advanced): If the issue persists only with FunASR, it might be a library conflict. Since you are using the pre-packaged Windows version (frozen:True), the cleanest solution is to:
- Backup your config.ini file from the software directory.
- Download the latest complete package again from the official website.
- Extract it to a new, clean folder with a full English path (e.g., D:\videotrans_new).
- Copy your backed-up config.ini to the new folder to restore your settings.
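For the file check, the intermediate audio handed to recognition (the shibie_audio path in the TaskCfg dump) is a .wav file, so it can also be inspected with a short script. This is a minimal sketch using only Python's standard library; check_wav is a hypothetical helper, not part of pyVideoTrans, and it assumes the file is plain PCM WAV, which is the only kind the wave module reads.

```python
import wave
from pathlib import Path


def check_wav(path):
    """Return (ok, detail) describing whether `path` is a readable, non-empty PCM WAV."""
    p = Path(path)
    if not p.is_file():
        return False, "file not found"
    if p.stat().st_size == 0:
        return False, "file is empty"
    try:
        with wave.open(str(p), "rb") as wf:
            frames = wf.getnframes()
            rate = wf.getframerate()
            channels = wf.getnchannels()
    except (wave.Error, EOFError) as exc:
        # Raised when the header is missing, truncated, or not PCM WAV.
        return False, f"not a readable PCM WAV: {exc}"
    if frames == 0:
        return False, "WAV header is valid but contains no audio frames"
    return True, f"{frames / rate:.2f}s, {rate} Hz, {channels} channel(s)"
```

Calling check_wav on the shibie_audio path from the error report shows whether the preprocessing step produced usable audio. The temporary file may already have been removed by the time the error appears, in which case the helper reports "file not found".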

If the problem continues after these steps, please provide the complete log file from E:/vtrans/logs/ for further analysis.
