#5618 Speech recognition stage failed [Ali FunASR (local) (built-in)] CUDA out of memory. Tried to allocate 3.42 GiB. GPU 0 has a total capacity of 7.96 GiB of w

15.235* Posted at: 4 hours ago

Speech recognition stage failed [Ali FunASR (local) (built-in)] CUDA out of memory. Tried to allocate 3.42 GiB. GPU 0 has a total capacity of 7.96 GiB of which 0 bytes is free. Of the allocated memory 10.50 GiB is allocated by PyTorch, and 1.65 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables):Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 636, in funasr_mlt
File "funasr\auto\auto_model.py", line 324, in generate

return self.inference(

File "funasr\auto\auto_model.py", line 381, in inference

res = model.inference(**batch, **kwargs)

File "D:\Tool/win-pyvideotrans-v4.03-0622/videotrans/codes\model.py", line 610, in inference

return self.inference_llm(

File "D:\Tool/win-pyvideotrans-v4.03-0622/videotrans/codes\model.py", line 628,
......
task

File "videotrans\task\trans_create.py", line 319, in recogn

File "videotrans\recognition\__init__.py", line 191, in run

File "videotrans\recognition\_base.py", line 90, in run

File "videotrans\recognition\_funasr.py", line 61, in _exec

File "videotrans\configure\base.py", line 272, in _new_process

videotrans.configure.excepts.VideoTransError: CUDA out of memory. Tried to allocate 3.42 GiB. GPU 0 has a total capacity of 7.96 GiB of which 0 bytes is free. Of the allocated memory 10.50 GiB is allocated by PyTorch, and 1.65 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables):Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 636, in funasr_mlt
File "funasr\auto\auto_model.py", line 324, in generate

return self.inference(

File "funasr\auto\auto_model.py", line 381, in inference

res = model.inference(**batch, **kwargs)

File "D:\Tool/win-pyvideotrans-v4.03-0622/videotrans/codes\model.py", line 610, in inference

return self.inference_llm(

File "D:\Tool/win-pyvideotrans-v4.03-0622/videotrans/codes\model.py", line 628, in inference_llm

inputs_embeds, contents, batch, source_ids, meta_data = self.inference_prepare(

File "D:\Tool/win-pyvideotrans-v4.03-0622/videotrans/codes\model.py", line 496, in inference_prepare

encoder_out, encoder_out_lens = self.audio_adaptor(

File "torch\nn\modules\module.py", line 1751, in _wrapped_call_impl

return self._call_impl(*args, **kwargs)

File "torch\nn\modules\module.py", line 1762, in _call_impl

return forward_call(*args, **kwargs)

File "D:\Tool\win-pyvideotrans-v4.03-0622\_internal\funasr\models\llm_asr\adaptor.py", line 153, in forward

x, masks = block(x, masks)

File "torch\nn\modules\module.py", line 1751, in _wrapped_call_impl

return self._call_impl(*args, **kwargs)

File "torch\nn\modules\module.py", line 1762, in _call_impl

return forward_call(*args, **kwargs)

File "D:\Tool\win-pyvideotrans-v4.03-0622\_internal\funasr\models\transformer\encoder.py", line 122, in forward

x = residual + stoch_layer_coeff * self.dropout(self.self_attn(x_q, x, x, mask))

File "torch\nn\modules\module.py", line 1751, in _wrapped_call_impl

return self._call_impl(*args, **kwargs)

File "torch\nn\modules\module.py", line 1762, in _call_impl

return forward_call(*args, **kwargs)

File "D:\Tool\win-pyvideotrans-v4.03-0622\_internal\funasr\models\transformer\attention.py", line 120, in forward

return self.forward_attention(v, scores, mask)

File "D:\Tool\win-pyvideotrans-v4.03-0622\_internal\funasr\models\transformer\attention.py", line 90, in forward_attention

attn = torch.softmax(scores, dim=-1).masked_fill(

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.42 GiB. GPU 0 has a total capacity of 7.96 GiB of which 0 bytes is free. Of the allocated memory 10.50 GiB is allocated by PyTorch, and 1.65 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
cfg=[TaskCfgVTT] Current working mode: transcribe and translate subtitles
原始输入文件名: G:/七天(1)/2020.10.8(第一场)极品00后小萝莉奶声奶气,苗条清新沙发啪啪激情无限.mp4,
输出结果保存到文件夹: G:/七天(1)/_video_out/2020.10.8(第一场)极品00后小萝莉奶声奶气,苗条清新沙发啪啪激情无限-mp4,
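Temporary folder: D:/Tool/win-pyvideotrans-v4.03-0622/tmp/23908/31ed3580c8
Selected: clean up existing files
Selected: enable CUDA acceleration
Not selected: noise reduction
Speech recognition: Ali FunASR (local) (built-in), model_name: Fun-ASR-Nano-2512, spoken language: Simplified Chinese, sentence segmentation: default
Spoken language and target language are the same; subtitles are not translated
Selected: copy the generated subtitles to the video directory
Output subtitle format: single subtitle
Proxy address: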


=
system:Windows-10-10.0.26200-SP0
version:v4.03
frozen:True
language:zh
root_dir:D:/Tool/win-pyvideotrans-v4.03-0622

1 Reply

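This is caused by insufficient GPU memory. GPU 0 has about 8 GB in total (7.96 GiB), yet the error reports 10.50 GiB already allocated by PyTorch plus another 1.65 GiB reserved but unallocated, so the additional 3.42 GiB request could not be satisfied.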
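To make the numbers in the error message easier to read, here is a minimal sketch that pulls the figures out of the message and does the arithmetic. The function name `parse_oom` and the regular expressions are my own illustration, not part of pyVideoTrans or PyTorch, and they assume the message keeps the wording shown in this thread.

```python
import re

# Hypothetical helper: extract the GiB figures from a PyTorch OOM message.
# The patterns assume the exact wording quoted in this thread.
def parse_oom(message: str) -> dict:
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "capacity": r"total capacity of ([\d.]+) GiB",
        "allocated": r"([\d.]+) GiB is allocated by PyTorch",
        "reserved_unallocated": r"([\d.]+) GiB is reserved by PyTorch",
    }
    figures = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            figures[key] = float(match.group(1))
    return figures

MESSAGE = (
    "CUDA out of memory. Tried to allocate 3.42 GiB. GPU 0 has a total "
    "capacity of 7.96 GiB of which 0 bytes is free. Of the allocated memory "
    "10.50 GiB is allocated by PyTorch, and 1.65 GiB is reserved by PyTorch "
    "but unallocated."
)

figures = parse_oom(MESSAGE)
# Memory PyTorch is holding: live tensors plus cached-but-unused blocks.
held = figures["allocated"] + figures["reserved_unallocated"]
# How far that exceeds the physical capacity reported for GPU 0.
over = held - figures["capacity"]

print(f"held by PyTorch: {held:.2f} GiB")
print(f"over physical capacity: {over:.2f} GiB")
```

With the message above this gives 12.15 GiB held against 7.96 GiB of capacity, about 4.19 GiB over. PyTorch holding more than the card physically has is probably the Windows driver spilling into shared system memory, though that is an assumption the log itself does not confirm.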

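Solutions (try them in the recommended order):

  1. Switch to a local channel with similar recognition quality but lower GPU memory usage
    In "ASR Channel", select faster-whisper (local, built-in) and choose the large-v3-turbo or medium model. Both models are friendlier to 8 GB of GPU memory.
  2. If you must use FunASR, reduce its GPU memory consumption
    Go to Menu → Tools → Advanced Options → Speech Recognition Parameters and change "compute data type" from float32 to int8 (lowest memory use) or float16. Also set both "recognition accuracy beam_size" and "best_of" to 1.
  3. Temporarily turn off CUDA acceleration and process on the CPU
    Uncheck "CUDA acceleration" in the main window and run on the CPU. This is much slower but avoids running out of GPU memory. It is suitable for a quick test on a single video.
  4. Free up GPU memory
    Close other programs that use the GPU (browsers, games, live-streaming software, and so on), then restart the software and try again.
  5. Simplify the file path
    Your video file name and path contain Chinese characters, parentheses, spaces and other special characters. These do not directly cause the out-of-memory error, but they may trigger other parsing problems. Consider moving the video to an English-only directory, for example D:\video\2020.mp4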
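Steps 3 and 4 above come down to checking how much GPU memory is actually free before enabling CUDA. If you have a separate Python environment with PyTorch installed (the packaged build in this thread is frozen, so this would run outside it), a rough sketch of that check could look like the following. `query_gpu_memory_gib`, `pick_device`, the 6.0 GiB requirement and the 1.0 GiB headroom are all assumptions for illustration; only `torch.cuda.is_available` and `torch.cuda.mem_get_info` are real PyTorch calls.

```python
from typing import Optional, Tuple

GIB = 1024 ** 3

def query_gpu_memory_gib(device_index: int = 0) -> Optional[Tuple[float, float]]:
    """Return (free, total) in GiB for one GPU, or None without torch/CUDA."""
    try:
        import torch  # optional dependency; absent in a plain Python install
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free_bytes, total_bytes = torch.cuda.mem_get_info(device_index)
    return free_bytes / GIB, total_bytes / GIB

def pick_device(memory_gib: Optional[Tuple[float, float]],
                needed_gib: float,
                headroom_gib: float = 1.0) -> str:
    """Suggest "cuda" only if free memory covers the need plus some headroom."""
    if memory_gib is None:
        return "cpu"
    free_gib, _total_gib = memory_gib
    return "cuda" if free_gib >= needed_gib + headroom_gib else "cpu"

if __name__ == "__main__":
    memory = query_gpu_memory_gib(0)
    print("GPU memory (free, total) in GiB:", memory)
    # needed_gib=6.0 is a placeholder, not a measured FunASR requirement.
    print("suggested device:", pick_device(memory, needed_gib=6.0))
```

If this reports little free memory before the software has even started, another program is holding the GPU and step 4 is the one to try first. If nearly all of it is free and recognition still fails, the model and audio simply need more than the card offers, which points back to steps 1 to 3.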

