#3294 TaskCfg(cache_folder='D:/win-pyvideotrans/tmp/6604/d53264c2a9', target_dir='D:/win-pyvideotrans/output/recogn', remove_n

103.151* Posted at: 1 day ago 👁17

Error during the speech recognition stage [faster-whisper (local)] Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 179, in faster_whisper
File "faster_whisper\transcribe.py", line 689, in init
RuntimeError: CUDA failed with error out of memory

Traceback (most recent call last):
File "videotrans\task\job.py", line 106, in run
File "videotrans\task\_speech2text.py", line 156, in recogn
File "videotrans\recognition\__init__.py", line 282, in run
File "videotrans\recognition\_base.py", line 141, in run
File "videotrans\recognition\_overall.py", line 63, in _exec
File "videotrans\recognition\_overall.py", line 142, in _faster
File "videotrans\configure\_base.py", line 276, in _new_process
RuntimeError: Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 179, in faster_whisper
File "faster_whisper\transcribe.py", line 689, in init
RuntimeError: CUDA failed with error out of memory
TaskCfg(cache_folder='D:/win-pyvideotrans/tmp/6604/d53264c2a9', target_dir='D:/win-pyvideotrans/output/recogn', remove_noise=True, is_separate=False, detect_language='zh-cn', subtitle_language=None, source_language=None, target_language=None, source_language_code=None, target_language_code=None, source_sub=None, target_sub='D:/win-pyvideotrans/output/recogn/魔尊大师姐驾到,统统闪开!.srt', source_wav=None, source_wav_output=None, target_wav=None, target_wav_output=None, subtitles='', novoice_mp4=None, noextname='魔尊大师姐驾到,统统闪开!', shibie_audio='D:/win-pyvideotrans/tmp/6604/d53264c2a9/removed_noise_1771494925.9226604.wav', targetdir_mp4=None, instrument=None, vocal=None, back_audio=None, background_music=None, app_mode='biaozhun', subtitle_type=0, volume='+0%', pitch='+0Hz', voice_rate='+0%', voice_role=None, copysrt_rawvideo=False, clear_cache=False, translate_type=None, tts_type=None, recogn_type=0, model_name='large-v3', voice_autorate=False, video_autorate=False, cuda=True, name='U:/shuchu/魔尊大师姐驾到,统统闪开!.mp4', basename='魔尊大师姐驾到,统统闪开!.mp4', ext='mp4', dirname='U:/shuchu', shound_del_name=None, uuid='d53264c2a9', remove_silent_mid=False, align_sub_audio=True, enable_diariz=True, nums_diariz=0, rephrase=0, only_out_mp4=False, fix_punc=False, recogn2pass=False)


=
system:Windows-10-10.0.19045-SP0
version:v3.96
frozen:True
language:zh
root_dir:D:/win-pyvideotrans

1 Replies

This is caused by insufficient GPU memory (VRAM). While running speech recognition with the faster-whisper large-v3 model, the GPU ran out of memory.
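As a rough illustration of why large-v3 is demanding, the sketch below estimates the memory needed for the model weights alone. It assumes about 1.55 billion parameters for the Whisper large family (the figure OpenAI publishes for the large models) and ignores activations, the CUDA context, and any other models loaded in the same run, so real usage will be higher. The function name is made up for this example.

```python
# Approximate parameter count of the Whisper "large" family (assumption:
# ~1.55 billion, as published for the large models).
LARGE_PARAMS = 1_550_000_000

# Bytes used to store one weight at each precision.
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(params: int, precision: str) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return params * BYTES_PER_WEIGHT[precision] / 1024**3


if __name__ == "__main__":
    for precision in ("float32", "float16", "int8"):
        gib = weight_memory_gib(LARGE_PARAMS, precision)
        print(f"{precision}: about {gib:.2f} GiB for weights only")
```

Under these assumptions, int8 needs about half the weight memory of float16, which is why lowering the precision is usually the first thing to try.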

Solutions:

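  1. Lower the model precision: in the "Speech recognition parameters" settings, change "Model precision" from float16 to int8. This significantly reduces VRAM usage but may slightly affect recognition accuracy.
  2. Use a smaller model: change "Model name" from large-v3 to medium or small. The smaller the model, the lower the VRAM usage and the faster it runs, but Chinese recognition accuracy drops accordingly.
  3. Turn off GPU acceleration: in the "Speech recognition parameters" settings, uncheck "Use CUDA acceleration". This forces recognition to run on the CPU, which is slower but uses no VRAM.
  4. Close other programs that use VRAM: make sure no other large programs (such as games or other AI software) are running in the background, so that VRAM is freed.
  5. Change how the video is processed: if the video is long, try enabling the "Overall recognition" mode under "Advanced options". This may change how RAM and VRAM are allocated.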
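For anyone calling faster-whisper directly from their own script, the first three options can be combined into a fallback chain. The following is a minimal sketch, not pyvideotrans's actual code: `load_with_fallback`, `faster_whisper_loader`, and the `FALLBACKS` order are hypothetical names and choices for this example. The only library call used is `WhisperModel(name, device=..., compute_type=...)` from faster-whisper, imported lazily so the rest runs without it installed.

```python
from typing import Any, Callable, Sequence, Tuple

Config = Tuple[str, str, str]  # (model name, device, compute type)

# Hypothetical fallback order, from most to least VRAM-hungry.
FALLBACKS: Sequence[Config] = (
    ("large-v3", "cuda", "float16"),
    ("large-v3", "cuda", "int8"),
    ("medium", "cuda", "int8"),
    ("medium", "cpu", "int8"),
)


def load_with_fallback(
    loader: Callable[[str, str, str], Any],
    configs: Sequence[Config] = FALLBACKS,
) -> Tuple[Any, Config]:
    """Try each configuration in order; move on only after an out-of-memory error."""
    last_error = None
    for name, device, compute_type in configs:
        try:
            return loader(name, device, compute_type), (name, device, compute_type)
        except RuntimeError as exc:
            # Re-raise anything that is not an out-of-memory failure.
            if "out of memory" not in str(exc).lower():
                raise
            last_error = exc
    raise RuntimeError("every configuration ran out of memory") from last_error


def faster_whisper_loader(name: str, device: str, compute_type: str) -> Any:
    """Load a real faster-whisper model (requires the faster-whisper package)."""
    from faster_whisper import WhisperModel  # lazy import

    return WhisperModel(name, device=device, compute_type=compute_type)
```

Usage would look like `model, used = load_with_fallback(faster_whisper_loader)`, after which `used` reports which configuration actually fit. Whether the GPU fully releases memory after a failed load may depend on the driver and runtime, so closing other VRAM-heavy programs first (option 4) is still worthwhile.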

Please refer to the relevant documentation:
