#3054 TaskCfg(cache_folder='D:/Programs/win-pyvideotrans-v3.95/tmp/10352/826c83aa97', target_dir='D:/Programs/win-pyvideotrans

60.28* Posted at: 23 hours ago 👁14

Error during the speech recognition stage [faster-whisper (local)] Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 257, in faster_whisper
File "faster_whisper\transcribe.py", line 586, in _batched_segments_generator
File "faster_whisper\transcribe.py", line 120, in forward
File "faster_whisper\transcribe.py", line 209, in generate_segment_batched
File "faster_whisper\transcribe.py", line 1400, in encode
RuntimeError: CUDA failed with error out of memory

Traceback (most recent call last):
File "videotrans\task\job.py", line 106, in run
File "videotrans\task\_speech2text.py", line 156, in recogn
File "videotrans\recognition\__init__.py", line 276, in run
File "videotrans\recognition\_base.py", line 140, in run
File "videotrans\recognition\_overall.py", line 63, in _exec
File "videotrans\recognition\_overall.py", line 142, in _faster
File "videotrans\configure\_base.py", line 276, in _new_process
RuntimeError: Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 257, in faster_whisper
File "faster_whisper\transcribe.py", line 586, in _batched_segments_generator
File "faster_whisper\transcribe.py", line 120, in forward
File "faster_whisper\transcribe.py", line 209, in generate_segment_batched
File "faster_whisper\transcribe.py", line 1400, in encode
RuntimeError: CUDA failed with error out of memory
TaskCfg(cache_folder='D:/Programs/win-pyvideotrans-v3.95/tmp/10352/826c83aa97', target_dir='D:/Programs/win-pyvideotrans-v3.95/output/recogn', remove_noise=False, is_separate=False, detect_language='zh-cn', subtitle_language=None, source_language=None, target_language=None, source_language_code=None, target_language_code=None, source_sub=None, target_sub='D:/Programs/win-pyvideotrans-v3.95/output/recogn/88a3cac0b9b57dd2f73d886af2773323.srt', source_wav=None, source_wav_output=None, target_wav=None, target_wav_output=None, subtitles='', novoice_mp4=None, noextname='88a3cac0b9b57dd2f73d886af2773323', shibie_audio='D:/Programs/win-pyvideotrans-v3.95/tmp/10352/826c83aa97/88a3cac0b9b57dd2f73d886af2773323-1770082732.0964499.wav', targetdir_mp4=None, instrument=None, vocal=None, back_audio=None, background_music=None, app_mode='biaozhun', subtitle_type=0, volume='+0%', pitch='+0Hz', voice_rate='+0%', voice_role=None, copysrt_rawvideo=False, clear_cache=False, translate_type=None, tts_type=None, recogn_type=0, model_name='medium', voice_autorate=False, video_autorate=False, cuda=True, name='C:/Users/DAWN/Desktop/88a3cac0b9b57dd2f73d886af2773323.mp4', basename='88a3cac0b9b57dd2f73d886af2773323.mp4', ext='mp4', dirname='C:/Users/DAWN/Desktop', shound_del_name=None, uuid='826c83aa97', remove_silent_mid=False, align_sub_audio=True, enable_diariz=False, nums_diariz=0, rephrase=0, only_out_mp4=False, fix_punc=False, recogn2pass=False)


=
system:Windows-10-10.0.22621-SP0
version:v3.95
frozen:True
language:zh
root_dir:D:/Programs/win-pyvideotrans-v3.95

1 Replies

This is caused by insufficient GPU memory (VRAM). CUDA ran out of memory while running speech recognition with the local faster-whisper model.

Solutions:

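  1. Use a smaller model: in the speech recognition settings, switch from medium to a smaller model such as small or base. The medium model has relatively high VRAM requirements.
  2. Enable quantization: under "Advanced Options" in the speech recognition settings, try turning on int8 quantization, which can significantly reduce VRAM usage.
  3. Adjust the recognition parameters: under "Advanced Options", lower the beam_size parameter (for example, from the default of 5 to 2 or 1), which reduces the amount of computation.
  4. Close other programs that use VRAM: make sure no other large programs (such as games or other AI software) are occupying GPU memory at the same time.
  5. Restart the software: after making the changes above, save the settings, restart pyVideoTrans, and retry the task.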
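
For readers who call faster-whisper directly from Python, steps 1 to 3 correspond roughly to the model name, `compute_type`, and `beam_size` arguments. The sketch below is only an illustration under that assumption: `WhisperSettings`, `MODEL_LADDER`, `lower_memory_settings`, and `transcribe` are hypothetical helper names, not part of pyVideoTrans, and this thread does not show how pyVideoTrans passes these values internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhisperSettings:
    """Hypothetical container for the three knobs discussed above."""
    model_name: str
    compute_type: str
    beam_size: int


# Common Whisper model sizes, smallest first (assumed ordering for this sketch).
MODEL_LADDER = ["tiny", "base", "small", "medium", "large-v3"]


def lower_memory_settings(current_model: str = "medium") -> WhisperSettings:
    """Step one size down the ladder and pick int8 with beam_size=1."""
    idx = MODEL_LADDER.index(current_model)
    smaller = MODEL_LADDER[max(idx - 1, 0)]  # stay at "tiny" if already smallest
    return WhisperSettings(model_name=smaller, compute_type="int8", beam_size=1)


def transcribe(audio_path: str, settings: WhisperSettings):
    """Run faster-whisper on the GPU with the reduced-memory settings."""
    # Imported lazily so the helper above works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(
        settings.model_name,
        device="cuda",
        compute_type=settings.compute_type,
    )
    segments, _info = model.transcribe(audio_path, beam_size=settings.beam_size)
    # `segments` is a generator; consuming it performs the actual decoding.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Calling `lower_memory_settings("medium")` yields the small model with int8 and a beam size of 1. Whether that combination fits in memory depends on the GPU, so it may take a further step down to base.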

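If your video is long, you can also consider using the "Speech Transcription" feature first, splitting the long video into several shorter audio clips and processing them separately.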
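
If the audio has already been extracted to a WAV file, the splitting idea can be approximated outside the application with Python's standard `wave` module. This is a minimal sketch, not the pyVideoTrans feature itself: `split_wav` and the 600-second default are hypothetical choices, and the function assumes an uncompressed PCM WAV input.

```python
import wave
from pathlib import Path


def split_wav(src: str, out_dir: str, chunk_seconds: int = 600) -> list[str]:
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = reader.getframerate() * chunk_seconds
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break  # end of file reached
            chunk_path = out / f"{Path(src).stem}-{index:03d}.wav"
            with wave.open(str(chunk_path), "wb") as writer:
                # Copy channels, sample width, and rate; the frame count in
                # the header is corrected automatically when frames are written.
                writer.setparams(params)
                writer.writeframes(frames)
            paths.append(str(chunk_path))
            index += 1
    return paths
```

Fixed-length cuts can land in the middle of a word, so subtitles near a boundary may need manual review. Timestamps from each chunk also have to be offset by the chunk's start time when the results are merged.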

