#3792


ASR Error [faster-whisper (Local)] Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 257, in faster_whisper
File "faster_whisper\transcribe.py", line 586, in _batched_segments_generator
File "faster_whisper\transcribe.py", line 120, in forward
File "faster_whisper\transcribe.py", line 209, in generate_segment_batched
File "faster_whisper\transcribe.py", line 1400, in encode
RuntimeError: CUDA failed with error out of memory

Traceback (most recent call last):
File "videotrans\task\job.py", line 106, in run
File "videotrans\task\_speech2text.py", line 153, in recogn
File "videotrans\recognition\__init__.py", line 266, in run
File "videotrans\recognition\_base.py", line 142, in run
File "videotrans\recognition\_overall.py", line 63, in _exec
File "videotrans\recognition\_overall.py", line 136, in _faster
File "videotrans\configure\_base.py", line 290, in _new_process
RuntimeError: Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 257, in faster_whisper
File "faster_whisper\transcribe.py", line 586, in _batched_segments_generator
File "faster_whisper\transcribe.py", line 120, in forward
File "faster_whisper\transcribe.py", line 209, in generate_segment_batched
File "faster_whisper\transcribe.py", line 1400, in encode
RuntimeError: CUDA failed with error out of memory
TaskCfgSTT(is_cuda=True, uuid='a450915a17', cache_folder='E:/ytrans/tmp/23088/a450915a17', target_dir='E:/ytrans/output/recogn', source_language=None, source_language_code=None, source_sub=None, source_wav=None, source_wav_output=None, target_language=None, target_language_code=None, target_sub='E:/ytrans/output/recogn/6.srt', target_wav=None, target_wav_output=None, name='E:/PH/6.m4a', noextname='6', basename='6.m4a', ext='m4a', dirname='E:/PH', shound_del_name=None, detect_language='ko', recogn_type=0, model_name='large-v3-turbo', shibie_audio='E:/ytrans/tmp/23088/a450915a17/6-1774422403.8745005.wav', remove_noise=False, enable_diariz=False, nums_diariz=0, rephrase=1, fix_punc=False)


system:Windows-10-10.0.26200-SP0
version:v3.97
frozen:True
language:en
root_dir:E:/ytrans

1 Reply

This error means the GPU ran out of memory (VRAM) while running the faster-whisper model with CUDA acceleration.
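As a rough illustration of why the compute type matters so much: large-v3-turbo has roughly 809M parameters (per OpenAI's release), so the weight memory alone scales with bytes per parameter. Activations and the batch size add more on top, but the back-of-envelope numbers already show why float32 is the heaviest choice:

```python
# Back-of-envelope: weight memory for a ~809M-parameter model at
# different compute types (activations and batching come on top of this).
PARAMS = 809_000_000

def weight_gb(bytes_per_param: int) -> float:
    return PARAMS * bytes_per_param / 1024**3

for name, bytes_per in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{name}: ~{weight_gb(bytes_per):.2f} GB of weights")
# float32: ~3.01 GB, float16: ~1.51 GB, int8: ~0.75 GB
```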

Solution Steps:

  1. Use a smaller model: In the "Speech Recognition" section, change the model from large-v3-turbo to a smaller one like medium, small, or base.
  2. Adjust advanced settings (to reduce VRAM usage):

    • Go to Menu -> Tools/Options -> Advanced Options.
    • In the "Speech Recognition Parameters" section, make the following changes:

      • CUDA Data Type: Change from float32 to float16 or int8.
      • Recognition Accuracy beam_size: Change from 5 to 1.
      • Recognition Accuracy best_of: Change from 5 to 1.
      • Enable Context Awareness: Set to false.
  3. Ensure only one GPU task is running: In the same "Advanced Options" under "General Settings", set GPU Concurrent Tasks [Restart Required] to 1. Restart the software after changing this.
  4. If translating multiple videos: Disable the "Secondary Recognition" option in the main interface's top-right corner. Running secondary recognition for earlier videos while later videos are still on their first transcription can require 2-3 times the VRAM.
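The steps above amount to a retry ladder: try the current settings, and step down to a lighter configuration whenever CUDA runs out of memory. Here is a minimal sketch of that idea; the helper function and the settings list are hypothetical illustrations, not part of pyvideotrans:

```python
# Hypothetical fallback ladder (not pyvideotrans code): each entry trades
# a little accuracy for less VRAM, mirroring steps 1-2 above.
FALLBACKS = [
    dict(model_name="large-v3-turbo", compute_type="float16", beam_size=1, best_of=1),
    dict(model_name="medium", compute_type="float16", beam_size=1, best_of=1),
    dict(model_name="medium", compute_type="int8", beam_size=1, best_of=1),
    dict(model_name="small", compute_type="int8", beam_size=1, best_of=1),
]

def transcribe_with_fallback(audio_path, transcribe_fn):
    """Call transcribe_fn(audio_path, **settings) with each settings dict
    in turn, stepping down only on a CUDA out-of-memory RuntimeError."""
    last_err = None
    for settings in FALLBACKS:
        try:
            return transcribe_fn(audio_path, **settings)
        except RuntimeError as err:
            if "out of memory" not in str(err).lower():
                raise  # a different error: do not swallow it
            last_err = err
    raise RuntimeError("every fallback setting still ran out of VRAM") from last_err
```

In pyvideotrans itself you apply the same ladder manually through the Advanced Options UI, but the ordering (precision first, then model size) is the useful part.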

If the above steps do not resolve the issue, your GPU's VRAM may be insufficient for the selected task. Consider processing shorter audio segments or using CPU-only mode (uncheck "CUDA Acceleration").
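If you script the switch to CPU mode rather than toggling it by hand, match the error text from the log above defensively so unrelated RuntimeErrors are not silently retried on CPU. A small sketch (the function name is illustrative, not an existing API):

```python
def is_cuda_oom(err: Exception) -> bool:
    """True when an exception message matches the CUDA OOM seen in the log,
    e.g. 'RuntimeError: CUDA failed with error out of memory'."""
    msg = str(err).lower()
    return "cuda" in msg and "out of memory" in msg
```

When falling back to CPU, int8 is generally the fastest compute type to pair with it.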

