#4100 numpy.core._exceptions._ArrayMemoryError: Unable to allocate 816. MiB for an array with shape (371383, 576) and data type float32


Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 432, in faster_whisper
File "faster_whisper\transcribe.py", line 890, in transcribe
File "faster_whisper\vad.py", line 98, in get_speech_timestamps
File "faster_whisper\vad.py", line 369, in __call__
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 816. MiB for an array with shape (371383, 576) and data type float32

Traceback (most recent call last):
File "videotrans\task\_only_one.py", line 44, in run
File "videotrans\task\trans_create.py", line 360, in recogn
File "videotrans\recognition\__init__.py", line 292, in run
File "videotrans\recognition\_base.py", line 143, in run
File "videotrans\recognition\_overall.py", line 33, in _exec
File "videotrans\recognition\_overall.py", line 105, in _faster
File "videotrans\configure\_base.py", line 289, in _new_process
RuntimeError: Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 432, in faster_whisper
File "faster_whisper\transcribe.py", line 890, in transcribe
File "faster_whisper\vad.py", line 98, in get_speech_timestamps
File "faster_whisper\vad.py", line 369, in __call__
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 816. MiB for an array with shape (371383, 576) and data type float32


system:Windows-10-10.0.19045-SP0
version:v3.99
frozen:True
language:en
root_dir:D:/pyvideotrans

1 Reply

This error occurs during faster-whisper's Voice Activity Detection (VAD) stage on a long audio file: NumPy tries to allocate a float32 array of shape (371383, 576), roughly 816 MiB, and the allocation fails because the system does not have enough free RAM. (Since the failure is a NumPy allocation, it is system memory that ran out, not VRAM, but an overloaded GPU setup can still push more work onto the CPU side.)
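The figure in the traceback is easy to verify: a float32 element takes 4 bytes, so the array size is simply rows times columns times 4.

```python
# Verify the size of the array the VAD stage tried to allocate:
# shape (371383, 576), dtype float32 (4 bytes per element).
rows, cols = 371383, 576
bytes_needed = rows * cols * 4
print(f"{bytes_needed / 1024**2:.0f} MiB")  # 816 MiB, matching the traceback
```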

Solutions:

  1. Use a smaller Whisper model: In the main interface, under "Speech Recognition," change the model from large-v3 to a smaller one like medium, small, or base. The large-v3 model requires significant memory (at least 10GB VRAM with CUDA or 16GB RAM without).
  2. Adjust advanced settings (to reduce memory footprint):

    • Go to Menu -> Tools/Options -> Advanced Options.
    • In the "Speech Recognition Parameters" section:

      • Set CUDA Data Type to float16 or int8 (instead of float32).
      • Set beam_size to 1.
      • Set best_of to 1.
      • Set Enable Context Awareness to false.
  3. Ensure you are using CUDA acceleration: If you have an NVIDIA GPU, make sure "CUDA Acceleration" is checked on the main interface. Verify you have CUDA 12.8 and cuDNN 9.x+ installed.
  4. Disable "Secondary Recognition": If you have "Secondary Recognition" enabled on the main interface (top right), try disabling it, especially when processing multiple videos simultaneously, as it can double memory usage.
  5. Process shorter audio segments: If possible, split very long audio/video files into smaller segments before processing.
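For reference, solutions 1 to 3 map onto arguments of the faster-whisper Python API (pyvideotrans passes these internally based on the settings above). A minimal sketch, assuming faster-whisper is installed; the model name and the input path "audio.wav" are placeholders:

```python
from faster_whisper import WhisperModel

# Smaller model + quantized compute type keeps the memory footprint low.
model = WhisperModel(
    "small",              # smaller than large-v3 (solution 1)
    device="cuda",        # or "cpu" if no NVIDIA GPU (solution 3)
    compute_type="int8",  # int8/float16 instead of float32 (solution 2)
)

segments, info = model.transcribe(
    "audio.wav",                       # placeholder input path
    beam_size=1,                       # solution 2
    best_of=1,                         # solution 2
    condition_on_previous_text=False,  # context awareness off (solution 2)
    vad_filter=True,                   # VAD stage where this error occurred
)
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")
```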

Please refer to the relevant documentation.
