#5050 TaskCfgVTT(uuid='fd51a8beac', name='D:/BaiduNetdiskDownload/Blender零基础全能风格化专辑特训营【第二期】/56.mp4', dirname='D:/BaiduNetdiskD

2405:4802* Posted at: 1 hour ago

ASR Error[Qwen-ASR(Local)] CUDA out of memory. Tried to allocate 10.83 GiB. GPU 0 has a total capacity of 12.00 GiB of which 0 bytes is free. Of the allocated memory 16.11 GiB is allocated by PyTorch, and 577.33 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables):
Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 549, in qwen3asr_fun
File "torch\utils\_contextlib.py", line 116, in decorate_context

return func(*args, **kwargs)

File "E:\4.01\_internal\qwen_asr\inference\qwen3_asr.py", line 383, in transcribe

raw_outputs = self._infer_asr(chunk_ctx, chunk_wavs, chunk_lang)

File "E:\4.01\_internal\qwen_asr\inference\qwen3_asr.py", line 485, in _infer_asr

return self._infer_asr_transformers(contexts, wavs, 

......
pl

return self._call_impl(*args, **kwargs)

File "torch\nn\modules\module.py", line 1762, in _call_impl

return forward_call(*args, **kwargs)

File "transformers\utils\generic.py", line 1072, in wrapper

outputs = func(self, *args, **kwargs)

File "E:\4.01\_internal\qwen_asr\core\transformers_backend\modeling_qwen3_asr.py", line 1043, in forward

layer_outputs = decoder_layer(

File "transformers\modeling_layers.py", line 94, in call

return super().__call__(*args, **kwargs)

File "torch\nn\modules\module.py", line 1751, in _wrapped_call_impl

return self._call_impl(*args, **kwargs)

File "torch\nn\modules\module.py", line 1762, in _call_impl

return forward_call(*args, **kwargs)

File "transformers\utils\deprecation.py", line 172, in wrapped_func

return func(*args, **kwargs)

File "E:\4.01\_internal\qwen_asr\core\transformers_backend\modeling_qwen3_asr.py", line 262, in forward

hidden_states, _ = self.self_attn(

File "torch\nn\modules\module.py", line 1751, in _wrapped_call_impl

return self._call_impl(*args, **kwargs)

File "torch\nn\modules\module.py", line 1762, in _call_impl

return forward_call(*args, **kwargs)

File "transformers\utils\deprecation.py", line 172, in wrapped_func

return func(*args, **kwargs)

File "E:\4.01\_internal\qwen_asr\core\transformers_backend\modeling_qwen3_asr.py", line 204, in forward

attn_output, attn_weights = attention_interface(

File "transformers\integrations\sdpa_attention.py", line 96, in sdpa_attention_forward

attn_output = torch.nn.functional.scaled_dot_product_attention(

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 10.83 GiB. GPU 0 has a total capacity of 12.00 GiB of which 0 bytes is free. Of the allocated memory 16.11 GiB is allocated by PyTorch, and 577.33 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
TaskCfgVTT(uuid='fd51a8beac', name='D:/BaiduNetdiskDownload/Blender零基础全能风格化专辑特训营【第二期】/56.mp4', dirname='D:/BaiduNetdiskDownload/Blender零基础全能风格化专辑特训营【第二期】', noextname='56', basename='56.mp4', ext='mp4', target_dir='D:/BaiduNetdiskDownload/Blender零基础全能风格化专辑特训营【第二期】/_video_out/56-mp4', cache_folder='E:/4.01/tmp/21872/fd51a8beac', is_cuda=True, source_language='Simplified Chinese', source_language_code='zh-cn', source_sub='D:/BaiduNetdiskDownload/Blender零基础全能风格化专辑特训营【第二期】/_video_out/56-mp4/zh-cn.srt', source_wav='E:/4.01/tmp/21872/fd51a8beac/zh-cn.wav', source_wav_output='D:/BaiduNetdiskDownload/Blender零基础全能风格化专辑特训营【第二期】/_video_out/56-mp4/zh-cn.m4a', target_language='English', target_language_code='en', target_sub='D:/BaiduNetdiskDownload/Blender零基础全能风格化专辑特训营【第二期】/_video_out/56-mp4/en.srt', target_wav='E:/4.01/tmp/21872/fd51a8beac/target.wav', target_wav_output='D:/BaiduNetdiskDownload/Blender零基础全能风格化专辑特训营【第二期】/_video_out/56-mp4/en.m4a', translate_type=8, tts_type=0, volume='+0%', pitch='+0Hz', voice_rate='+0%', voice_role='No', voice_autorate=False, video_autorate=True, remove_silent_mid=False, align_sub_audio=True, detect_language='zh-cn', recogn_type=2, model_name='1.7B', shibie_audio=None, remove_noise=False, enable_diariz=False, nums_diariz=0, rephrase=0, fix_punc=False, subtitle_language=None, app_mode='tiqu', subtitles='', targetdir_mp4='D:/BaiduNetdiskDownload/Blender零基础全能风格化专辑特训营【第二期】/_video_out/56-mp4/56.mp4', novoice_mp4='E:/4.01/tmp/21872/fd51a8beac/novoice.mp4', is_separate=False, embed_bgm=True, instrument=None, vocal='E:/4.01/tmp/21872/fd51a8beac/vocal.wav', clear_cache=True, background_music='', subtitle_type=0, only_out_mp4=False, recogn2pass=False, output_srt=0, copysrt_rawvideo=False, loop_backaudio=0, backaudio_volume=0.8)


=
system:Windows-10-10.0.26200-SP0
version:v4.01
frozen:True
language:en
root_dir:E:/4.01

1 Reply

CUDA Out of Memory on 12GB GPU for Qwen-ASR (Local) 1.7B

This is a VRAM exhaustion issue. Your GPU has 12 GiB in total with 0 bytes free, and the failing call tried to allocate another 10.83 GiB in a single request, which cannot fit. PyTorch reports 16.11 GiB as already allocated, which is more than the card physically holds; the most likely explanation is that the Windows driver had already spilled part of it into shared system memory.
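The figures in the error message can be pulled apart with a short script. This is only an illustrative sketch: the message text is copied from the log in this thread, while the helper names (`to_gib`, `parse_oom`) are made up for this example and are not part of PyTorch or pyVideoTrans.

```python
import re

# Numeric part of the error text, copied from the log in this thread.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 10.83 GiB. "
    "GPU 0 has a total capacity of 12.00 GiB of which 0 bytes is free. "
    "Of the allocated memory 16.11 GiB is allocated by PyTorch, "
    "and 577.33 MiB is reserved by PyTorch but unallocated."
)

# Conversion factors from the units used in the message to GiB.
UNIT_TO_GIB = {"bytes": 1 / 2**30, "KiB": 1 / 2**20, "MiB": 1 / 2**10, "GiB": 1.0}
SIZE = r"([\d.]+) (bytes|KiB|MiB|GiB)"


def to_gib(value: str, unit: str) -> float:
    """Convert a '<number> <unit>' pair from the message to GiB."""
    return float(value) * UNIT_TO_GIB[unit]


def parse_oom(message: str) -> dict:
    """Extract the memory figures (hypothetical helper, for illustration only)."""
    patterns = {
        "requested": rf"Tried to allocate {SIZE}",
        "capacity": rf"total capacity of {SIZE}",
        "free": rf"of which {SIZE} is free",
        "allocated": rf"allocated memory {SIZE} is allocated by PyTorch",
        "reserved_unallocated": rf"and {SIZE} is reserved by PyTorch",
    }
    stats = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            stats[key] = to_gib(match.group(1), match.group(2))
    return stats


if __name__ == "__main__":
    stats = parse_oom(OOM_MESSAGE)
    for key, gib in stats.items():
        print(f"{key:>22}: {gib:6.2f} GiB")
    share = stats["requested"] / stats["capacity"]
    print(f"single request as share of capacity: {share:.0%}")
```

Read this way, the single request is about 90% of the card's total capacity, and the reserved-but-unallocated pool is only about 0.56 GiB.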

Why this happens:

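  • The traceback ends in torch.nn.functional.scaled_dot_product_attention, so the 10.83 GiB request is for attention activations during inference, not for loading the weights. The weights of a 1.7B-parameter model need roughly 3.4 GB at 16-bit precision (1.7 billion parameters × 2 bytes); the rest goes to audio processing and intermediate tensors, which grow with the length and number of audio chunks processed at once.
  • With only 12 GiB total, there is little headroom once the model is loaded, especially if other applications or previous tasks still occupy VRAM.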
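A back-of-the-envelope sketch helps separate weight memory from activation memory. The parameter count comes from the model name (1.7B); everything else here is an assumption for illustration: the precisions, the head count, and the sequence lengths are hypothetical, and the attention estimate only applies if the full score matrix is materialized (fused attention kernels may avoid that).

```python
GIB = 2**30


def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory for the model weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


def attention_scores_gib(batch: int, heads: int, seq_len: int, bytes_per_value: int) -> float:
    """Size of one materialized attention score tensor (batch x heads x seq x seq)."""
    return batch * heads * seq_len * seq_len * bytes_per_value / GIB


if __name__ == "__main__":
    # Weights of a 1.7B-parameter model at different (assumed) precisions.
    for name, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
        print(f"weights @ {name:<9}: {weights_gib(1.7e9, nbytes):5.2f} GiB")

    # Hypothetical head count and sequence lengths: the point is the
    # quadratic growth with sequence length, not the exact numbers.
    for seq_len in (4096, 8192, 16384):
        size = attention_scores_gib(batch=1, heads=16, seq_len=seq_len, bytes_per_value=2)
        print(f"attention scores @ seq_len={seq_len:<6}: {size:5.2f} GiB")
```

Doubling the sequence length quadruples the score tensor, which is why long audio chunks or large batches can trigger a multi-GiB request even when the weights themselves fit comfortably.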

Solutions (try in order):

  1. Switch to a lighter ASR model
    Qwen-ASR 1.7B ran out of memory on this 12 GB GPU. Try Faster-Whisper (Local) with the large-v3 model instead; it typically needs less VRAM and still provides good accuracy.

    • Change the Speech Recognition (语音识别) drop-down to faster-whisper (Local) (faster-whisper(本地)) and select large-v3.
  2. Reduce VRAM pressure via environment variable
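    As suggested in the error log, set this environment variable in the same command prompt before starting the software, to reduce fragmentation:
    set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
    You can also add it to your system environment variables permanently. Note that the log shows only 577.33 MiB reserved but unallocated, so this setting alone is unlikely to free the 10.83 GiB that was requested.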
  3. Close all other GPU‑intensive applications
    Before running pyVideoTrans, close any browsers, games, or other programs that might be using the GPU (e.g., video players, virtual machines, or other AI tools).
  4. If you must keep Qwen-ASR:

    • Try a smaller model variant, if one is available; Qwen-ASR might offer a 0.5B or other lightweight version.
    • In Menu → Tools → Advanced Options → Speech Recognition Parameters (菜单 → 工具 → 高级选项 → 语音识别参数), set CUDA data type (CUDA数据类型) to int8 (lower precision, less VRAM).
    • Lower beam_size and best_of to 1. These settings primarily affect Whisper, so they may have little or no effect on Qwen-ASR.
  5. Upgrade your GPU (long-term)
    A 12GB card is borderline for Qwen-ASR 1.7B. For comfortable local ASR with large models, 16GB or more VRAM is recommended.
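Steps 2 and 3 can be combined into a small pre-flight launcher. The following is a minimal sketch, not part of pyVideoTrans: it assumes `nvidia-smi` is on the PATH, the application path is a placeholder you would supply yourself, and the `min_free_mib` threshold is an arbitrary illustrative value.

```python
import os
import subprocess


def parse_nvidia_smi(csv_text: str) -> list[tuple[int, int]]:
    """Parse '<total>, <free>' lines (MiB), one line per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        total, free = (int(field.strip()) for field in line.split(","))
        gpus.append((total, free))
    return gpus


def query_gpu_memory() -> list[tuple[int, int]]:
    """Ask nvidia-smi for total and free memory of each GPU, in MiB."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.total,memory.free",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_nvidia_smi(result.stdout)


def build_env(base=None) -> dict:
    """Copy the environment and add the allocator setting from the error log."""
    env = dict(os.environ if base is None else base)
    env["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
    return env


def launch(app_path: str, min_free_mib: int = 10_000) -> bool:
    """Start the application only if GPU 0 has enough free memory.

    app_path and min_free_mib are placeholders chosen for this example.
    """
    total, free = query_gpu_memory()[0]
    if free < min_free_mib:
        print(f"Only {free} of {total} MiB free; close other GPU applications first.")
        return False
    subprocess.Popen([app_path], env=build_env())
    return True


# Example call (the path is hypothetical; replace it with your own):
# launch("path/to/the/application.exe")
```

Launching through a wrapper like this scopes the environment variable to that one process, so nothing has to be changed system-wide.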

