#4839 TaskCfgVTT(uuid='32a6aada06', name='D:/yyyyyyyyyyyyy/-Jm7fKpwBIYWeekly Webinar 🔥 Trading Hub 2.O.mp4', dirname='D:/yyyy

2408:8220* Posted at: 2 hours ago

Error during the speech recognition stage [faster-whisper (local)] Unable to allocate 788. MiB for an array with shape (1, 258283, 400) and data type float64:Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 282, in faster_whisper
File "faster_whisper\transcribe.py", line 916, in transcribe
File "faster_whisper\feature_extractor.py", line 215, in call
File "faster_whisper\feature_extractor.py", line 189, in stft
File "numpy\fft\_pocketfft.py", line 409, in rfft
File "numpy\fft\_pocketfft.py", line 70, in _raw_fft
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 788. MiB for an array with shape (1, 258283, 400) and data type float64

Traceback (most recent call last):
File "videotrans\task\job.py", line 54, in run
File "videotrans\task\job.py", line 119, in process_task
File "videotrans\task\trans_create.py", line 320, in recogn
File "videotrans\recognition\__init__.py", line 190, in run
File "videotrans\recognition\_base.py", line 93, in run
File "videotrans\recognition\_whisper.py", line 35, in _exec
File "videotrans\recognition\_whisper.py", line 108, in _faster
File "videotrans\configure\base.py", line 252, in _new_process
videotrans.configure.excepts.VideoTransError: Unable to allocate 788. MiB for an array with shape (1, 258283, 400) and data type float64:Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 282, in faster_whisper
File "faster_whisper\transcribe.py", line 916, in transcribe
File "faster_whisper\feature_extractor.py", line 215, in call
File "faster_whisper\feature_extractor.py", line 189, in stft
File "numpy\fft\_pocketfft.py", line 409, in rfft
File "numpy\fft\_pocketfft.py", line 70, in _raw_fft
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 788. MiB for an array with shape (1, 258283, 400) and data type float64
TaskCfgVTT(uuid='32a6aada06', name='D:/yyyyyyyyyyyyy/-Jm7fKpwBIYWeekly Webinar 🔥 Trading Hub 2.O.mp4', dirname='D:/yyyyyyyyyyyyy', noextname='-Jm7fKpwBIYWeekly Webinar 🔥 Trading Hub 2.O', basename='-Jm7fKpwBIYWeekly Webinar 🔥 Trading Hub 2.O.mp4', ext='mp4', target_dir='D:/hhhhhhhhhhhhhhhhhh/-Jm7fKpwBIYWeekly Webinar 🔥 Trading Hub 2.O-mp4', cache_folder='D:/翻译/win-pyvideotrans-v4.00-528/tmp/22824/32a6aada06', is_cuda=False, source_language='英语', source_language_code='en', source_sub='D:/hhhhhhhhhhhhhhhhhh/-Jm7fKpwBIYWeekly Webinar 🔥 Trading Hub 2.O-mp4/en.srt', source_wav='D:/翻译/win-pyvideotrans-v4.00-528/tmp/22824/32a6aada06/en.wav', source_wav_output='D:/hhhhhhhhhhhhhhhhhh/-Jm7fKpwBIYWeekly Webinar 🔥 Trading Hub 2.O-mp4/en.m4a', target_language='简体中文', target_language_code='zh-cn', target_sub='D:/hhhhhhhhhhhhhhhhhh/-Jm7fKpwBIYWeekly Webinar 🔥 Trading Hub 2.O-mp4/zh-cn.srt', target_wav='D:/翻译/win-pyvideotrans-v4.00-528/tmp/22824/32a6aada06/target.wav', target_wav_output='D:/hhhhhhhhhhhhhhhhhh/-Jm7fKpwBIYWeekly Webinar 🔥 Trading Hub 2.O-mp4/zh-cn.m4a', translate_type=0, tts_type=0, volume='+0%', pitch='+0Hz', voice_rate='+0%', voice_role='Xiaoxiao(Female/CN)', voice_autorate=True, video_autorate=True, remove_silent_mid=False, align_sub_audio=True, detect_language='en', recogn_type=0, model_name='large-v3-turbo', shibie_audio=None, remove_noise=False, enable_diariz=False, nums_diariz=0, rephrase=0, fix_punc=False, subtitle_language=None, app_mode='biaozhun', subtitles='', targetdir_mp4='D:/hhhhhhhhhhhhhhhhhh/-Jm7fKpwBIYWeekly Webinar 🔥 Trading Hub 2.O-mp4/-Jm7fKpwBIYWeekly Webinar 🔥 Trading Hub 2.O.mp4', novoice_mp4='D:/翻译/win-pyvideotrans-v4.00-528/tmp/22824/32a6aada06/novoice.mp4', is_separate=False, embed_bgm=True, instrument=None, vocal='D:/翻译/win-pyvideotrans-v4.00-528/tmp/22824/32a6aada06/vocal.wav', clear_cache=True, background_music='', subtitle_type=1, only_out_mp4=True, recogn2pass=False, output_srt=0, copysrt_rawvideo=False, loop_backaudio=1, 
backaudio_volume=0.8)


system:Windows-10-10.0.19045-SP0
version:v4.00
frozen:True
language:zh
root_dir:D:/翻译/win-pyvideotrans-v4.00-528

1 Replies

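This happens because the large-v3-turbo model used for speech recognition is too large, leaving too little system memory (or GPU VRAM) to allocate the required array. Since the task config shows is_cuda=False, the shortage in this run is in system RAM.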
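
As a rough sanity check on the number in the traceback, the failed allocation can be reproduced with arithmetic alone. The sketch below assumes 8 bytes per float64 element and, for the duration estimate, a 16 kHz sample rate with a 160-sample hop between frames (I believe these are the faster-whisper feature extractor defaults, but treat them as assumptions). `array_size_mib` and `approx_audio_minutes` are hypothetical helper names, not part of any library.

```python
import math

FLOAT64_BYTES = 8      # size of one float64 element
SAMPLE_RATE = 16_000   # assumed: Whisper-style 16 kHz audio
HOP_LENGTH = 160       # assumed: samples between consecutive STFT frames


def array_size_mib(shape, itemsize=FLOAT64_BYTES):
    """Size in MiB of a dense array with the given shape and element size."""
    return math.prod(shape) * itemsize / 2**20


def approx_audio_minutes(n_frames, hop=HOP_LENGTH, rate=SAMPLE_RATE):
    """Approximate audio length, assuming one frame per hop."""
    return (n_frames - 1) * hop / rate / 60


shape = (1, 258283, 400)  # shape reported in the error message
print(f"{array_size_mib(shape):.1f} MiB")            # about 788.2
print(f"{approx_audio_minutes(shape[1]):.1f} min")   # about 43.0
```

Under those assumptions the buffer corresponds to roughly 43 minutes of audio, so it scales with the length of the recording; the memory held by the loaded model comes on top of it.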

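Solutions (try them in order of priority):

  1. Switch to a smaller model
    In the speech recognition channel dropdown, select the base, small, or medium model instead of large-v3-turbo. Smaller models need far fewer resources and also run faster.
  2. Adjust the advanced options to reduce resource usage
    Open Menu → Tools → Advanced Options → Speech Recognition Parameters and set the following:

    • CUDA data type: select int8 (ignore this item if you are not using a GPU)
    • beam_size: set to 1
    • best_of: set to 1
    • Enable context awareness: uncheck (set to false)
  3. Enable CUDA acceleration (NVIDIA GPUs only)
    If you have an NVIDIA GPU, install CUDA 12.8 and cuDNN 9.x, then check CUDA Acceleration in the main window. GPU acceleration can significantly reduce pressure on system memory and greatly speed up processing.
  4. Simplify the file path
    Your video filename contains special symbols such as 🔥 and the path is fairly long (D:/yyyyyyyyyyyyy/...), which may cause ffmpeg and the recognition engine to misbehave. Rename the video to something short in plain English (for example webinar.mp4) and put it in a shallow directory (for example D:\videos\).
  5. Temporarily turn off "Slow down video" and "Speed up audio" under audio/video sync
    If you still run out of memory, uncheck Slow down video and Speed up audio in the main window first. Fewer intermediate files are generated, which lowers overall resource usage.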
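
For anyone calling faster-whisper directly rather than through the GUI, steps 1 and 2 can be sketched roughly as follows. This is a minimal sketch, not pyvideotrans's own code: I am assuming the GUI options correspond to `WhisperModel(..., compute_type=...)` and to the `beam_size`, `best_of`, and `condition_on_previous_text` arguments of `transcribe()`, and that "context awareness" means `condition_on_previous_text`. `low_memory_settings` and `transcribe_low_memory` are hypothetical helper names.

```python
def low_memory_settings(use_cuda=False, model_name="small"):
    """Return (model kwargs, transcribe kwargs) that favour low memory use."""
    model_kwargs = {
        "model_size_or_path": model_name,          # step 1: smaller model
        "device": "cuda" if use_cuda else "cpu",
        "compute_type": "int8",                    # step 2: int8 data type
    }
    transcribe_kwargs = {
        "beam_size": 1,                            # step 2: narrowest beam
        "best_of": 1,                              # step 2: single candidate
        "condition_on_previous_text": False,       # assumed "context awareness"
    }
    return model_kwargs, transcribe_kwargs


def transcribe_low_memory(audio_path, use_cuda=False):
    """Transcribe audio_path with the settings above (needs faster-whisper)."""
    # Imported lazily so the settings helper works without the package installed.
    from faster_whisper import WhisperModel

    model_kwargs, transcribe_kwargs = low_memory_settings(use_cuda)
    model = WhisperModel(**model_kwargs)
    segments, _info = model.transcribe(audio_path, **transcribe_kwargs)
    # segments is a lazy generator; iterating it runs the actual decoding.
    return [(seg.start, seg.end, seg.text) for seg in segments]


# Example call (path is illustrative):
# lines = transcribe_low_memory("D:/videos/webinar.wav")
```

These settings mainly shrink the memory used by the model and the decoder; they leave more headroom for the rest of the pipeline but do not change how the audio itself is loaded.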

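For step 4, renaming by hand is enough, but if there are many files a small helper can do it. The following is only a sketch using the Python standard library; `ascii_safe_name` and `copy_to_short_path` are hypothetical names, and the target directory is whatever short path you choose.

```python
import re
import shutil
import unicodedata
from pathlib import Path


def ascii_safe_name(filename, fallback="video"):
    """Reduce a file name to ASCII letters, digits and underscores."""
    p = Path(filename)
    # Decompose accented characters, then drop anything non-ASCII (emoji etc.).
    stem = unicodedata.normalize("NFKD", p.stem)
    stem = stem.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of other characters into a single underscore.
    stem = re.sub(r"[^A-Za-z0-9]+", "_", stem).strip("_")
    return (stem or fallback) + p.suffix.lower()


def copy_to_short_path(src, dest_dir):
    """Copy src into dest_dir under an ASCII-safe name and return the new path."""
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / ascii_safe_name(src.name)
    shutil.copy2(src, dest)
    return dest


print(ascii_safe_name("-Jm7fKpwBIYWeekly Webinar 🔥 Trading Hub 2.O.mp4"))
# Jm7fKpwBIYWeekly_Webinar_Trading_Hub_2_O.mp4
```

Copying rather than moving keeps the original file untouched until the processed result has been checked.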
Please refer to the relevant documentation:
