#4324 TaskCfgSTT(is_cuda=False, uuid='2a756a7671', cache_folder='E:/win-pyvideotrans-v3.99-420/tmp/3840/2a756a7671', target_di

2408:8256* Posted at: 2 hours ago

Error in the speech recognition stage [faster-whisper (local)] numpy.core._exceptions._ArrayMemoryError: Unable to allocate 563. MiB for an array with shape (147470861,) and data type float32
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "multiprocessing\pool.py", line 125, in worker
File "videotrans\task\vad.py", line 45, in get_speech_timestamp_silero
File "faster_whisper\audio.py", line 69, in decode_audio
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 563. MiB for an array with shape (147470861,) and data type float32
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "videotrans\task\job.py", line 105, in run
File "videotrans\task\_speech2text.py", line 159, in recogn
File "videotrans\recognition\__init__.py", line 293, in run
File "videotrans\recognition\_base.py", line 138, in run
File "videotrans\recognition\_overall.py", line 46, in _download
File "videotrans\recognition\_base.py", line 121, in _vad_split
File "videotrans\configure\_base.py", line 285, in _new_process
File "videotrans\process\signelobj.py", line 13, in result
File "multiprocessing\pool.py", line 774, in get
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 563. MiB for an array with shape (147470861,) and data type float32
TaskCfgSTT(is_cuda=False, uuid='2a756a7671', cache_folder='E:/win-pyvideotrans-v3.99-420/tmp/3840/2a756a7671', target_dir='E:/win-pyvideotrans-v3.99-420/output/recogn', source_language=None, source_language_code=None, source_sub=None, source_wav=None, source_wav_output=None, target_language=None, target_language_code=None, target_sub='E:/win-pyvideotrans-v3.99-420/output/recogn/MISM-437 アナル喉奥マ○コ 最狂贅沢3穴フルコース 雨宮凜.srt', target_wav=None, target_wav_output=None, name='G:/迅雷下载/MISM-437 アナル喉奥マ○コ 最狂贅沢3穴フルコース 雨宮凜/MISM-437 アナル喉奥マ○コ 最狂贅沢3穴フルコース 雨宮凜.mp4', noextname='MISM-437 アナル喉奥マ○コ 最狂贅沢3穴フルコース 雨宮凜', basename='MISM-437 アナル喉奥マ○コ 最狂贅沢3穴フルコース 雨宮凜.mp4', ext='mp4', dirname='G:/迅雷下载/MISM-437 アナル喉奥マ○コ 最狂贅沢3穴フルコース 雨宮凜', shound_del_name=None, detect_language='ja', recogn_type=0, model_name='large-v3-turbo', shibie_audio='E:/win-pyvideotrans-v3.99-420/tmp/3840/2a756a7671/MISM-437 アナル喉奥マ○コ 最狂贅沢3穴フルコース 雨宮凜-1777342905.0351837.wav', remove_noise=False, enable_diariz=False, nums_diariz=0, rephrase=0, fix_punc=False)


system:Windows-10-10.0.19045-SP0
version:v3.99
frozen:True
language:zh
root_dir:E:/win-pyvideotrans-v3.99-420

1 Reply

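Speech recognition failed because the system ran out of memory: decoding the audio produced an array that was too large (about 563 MiB) for the memory available to the process.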
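The size in the error message follows directly from the array shape, since each float32 sample takes 4 bytes. The sketch below reproduces that figure and estimates the audio length. It assumes faster-whisper's `decode_audio` resampled to its default 16 kHz mono; the actual arguments used in `vad.py` are not visible in the log, so treat the duration as an estimate.

```python
# Array shape reported in the traceback: (147470861,), dtype float32
NUM_SAMPLES = 147_470_861
BYTES_PER_FLOAT32 = 4
# Assumption: decode_audio used its default sampling rate of 16 kHz (mono)
SAMPLE_RATE = 16_000


def array_size_mib(num_samples: int, bytes_per_sample: int = BYTES_PER_FLOAT32) -> float:
    """Memory needed for a 1-D array of num_samples elements, in MiB."""
    return num_samples * bytes_per_sample / (1024 ** 2)


def duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Length of mono audio containing num_samples samples at sample_rate Hz."""
    return num_samples / sample_rate


if __name__ == "__main__":
    size = array_size_mib(NUM_SAMPLES)
    secs = duration_seconds(NUM_SAMPLES)
    hours, rem = divmod(int(secs), 3600)
    minutes, seconds = divmod(rem, 60)
    print(f"{size:.1f} MiB")                        # 562.6 MiB (NumPy rounds it to "563. MiB")
    print(f"{hours}h {minutes:02d}m {seconds:02d}s")  # 2h 33m 36s
```

If the 16 kHz assumption holds, the source is roughly 2.5 hours long, so step 5 in the list of solutions (splitting the video) is likely to be relevant here.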

Solutions (try them in the recommended order):

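  1. Use a smaller model. The currently selected model, large-v3-turbo, needs a lot of memory. In the model selector to the right of the speech recognition channel, switch to small, base, or even tiny. The smaller the model, the lower the memory usage.
  2. Enable CUDA acceleration (if your computer has an NVIDIA graphics card). Check CUDA acceleration in the main window; you can then switch the model back to large-v3-turbo, because the model runs in video memory (VRAM) instead of system RAM, which relieves pressure on system memory. Note that audio decoding itself still takes place in system RAM.
  3. Enable audio pre-splitting in the advanced options. Open the menu Tools/Options → Advanced Options, and in the speech recognition parameters section check "Whisper pre-split audio?" (Whisper预分割音频?). This cuts long audio into short segments in advance and lowers the peak memory of each decode.
  4. Shorten the maximum speech duration. Also in the advanced options, set "Maximum speech duration (seconds)" (最长语音持续秒数) to 35 so that speech segments are split earlier.
  5. Check the video itself. If the original video is very long (for example, more than 1 hour), try splitting it into shorter pieces first and processing them one at a time.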
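For step 5, one way to split a long video without re-encoding is FFmpeg's segment muxer. The sketch below only builds the command line and estimates the decoded audio size per piece; the 30-minute segment length and the `_partNNN` naming pattern are illustrative choices, not pyvideotrans settings, and running the command assumes `ffmpeg` is on your PATH. With stream copy, cuts land on keyframes, so pieces can be slightly longer or shorter than requested.

```python
from pathlib import Path


def build_split_command(src: str, out_dir: str, segment_seconds: int = 1800) -> list[str]:
    """Build an ffmpeg command that cuts src into pieces of about segment_seconds each."""
    src_path = Path(src)
    # Hypothetical output naming: <name>_part000.mp4, <name>_part001.mp4, ...
    pattern = Path(out_dir) / f"{src_path.stem}_part%03d{src_path.suffix}"
    return [
        "ffmpeg",
        "-i", str(src_path),
        "-c", "copy",              # stream copy: fast, no re-encoding
        "-f", "segment",           # use ffmpeg's segment muxer
        "-segment_time", str(segment_seconds),
        "-reset_timestamps", "1",  # each piece starts at timestamp 0
        str(pattern),
    ]


def decoded_audio_mib(segment_seconds: int, sample_rate: int = 16_000) -> float:
    """Rough size of one piece decoded to 16 kHz mono float32 (4 bytes/sample), in MiB."""
    return segment_seconds * sample_rate * 4 / (1024 ** 2)


if __name__ == "__main__":
    cmd = build_split_command("input.mp4", "parts")
    print(" ".join(cmd))
    print(f"{decoded_audio_mib(1800):.1f} MiB per 30-minute piece")
    # To run it (needs ffmpeg on PATH and an existing "parts" folder):
    # import subprocess; subprocess.run(cmd, check=True)
```

Each 30-minute piece decodes to roughly 110 MiB under the same 16 kHz assumption, about a fifth of the allocation that failed here.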

If none of this helps, please provide the latest log file from the logs/ folder in the software directory for further analysis.
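To locate the newest log file, a short script can pick the most recently modified file in `logs/`. This is a minimal sketch assuming the folder sits directly under the installation root (`E:/win-pyvideotrans-v3.99-420` in this report); it makes no assumption about log file names or extensions.

```python
from pathlib import Path
from typing import Optional


def latest_log(root_dir: str) -> Optional[Path]:
    """Return the most recently modified file in <root_dir>/logs, or None if there is none."""
    logs_dir = Path(root_dir) / "logs"
    if not logs_dir.is_dir():
        return None
    files = [p for p in logs_dir.iterdir() if p.is_file()]
    # Pick the file with the latest modification time (None if the folder is empty)
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    # Root directory taken from the system info in this report
    newest = latest_log("E:/win-pyvideotrans-v3.99-420")
    print(newest if newest else "No log files found")
```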

Please refer to the related documentation:
