#4100 numpy.core._exceptions._ArrayMemoryError: Unable to allocate 816. MiB for an array with shape (371383, 576) and data type float32


Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 432, in faster_whisper
File "faster_whisper\transcribe.py", line 890, in transcribe
File "faster_whisper\vad.py", line 98, in get_speech_timestamps
File "faster_whisper\vad.py", line 369, in __call__
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 816. MiB for an array with shape (371383, 576) and data type float32

Traceback (most recent call last):
File "videotrans\task\_only_one.py", line 44, in run
File "videotrans\task\trans_create.py", line 360, in recogn
File "videotrans\recognition\__init__.py", line 292, in run
File "videotrans\recognition\_base.py", line 143, in run
File "videotrans\recognition\_overall.py", line 33, in _exec
File "videotrans\recognition\_overall.py", line 105, in _faster
File "videotrans\configure\_base.py", line 289, in _new_process
RuntimeError: Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 432, in faster_whisper
File "faster_whisper\transcribe.py", line 890, in transcribe
File "faster_whisper\vad.py", line 98, in get_speech_timestamps
File "faster_whisper\vad.py", line 369, in __call__
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 816. MiB for an array with shape (371383, 576) and data type float32


system:Windows-10-10.0.19045-SP0
version:v3.99
frozen:True
language:en
root_dir:D:/pyvideotrans

1 Reply

This error occurs during faster-whisper's Voice Activity Detection (VAD) stage on a long audio file: NumPy tries to allocate a float32 array of shape (371383, 576), roughly 816 MiB, and the allocation fails because the system does not have enough free RAM. (Since the failure is a NumPy allocation, it is system memory that ran out, not VRAM, but an overloaded GPU setup can still push more work onto the CPU side.)
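The figure in the traceback is easy to verify: a float32 element takes 4 bytes, so the array size is simply rows times columns times 4.

```python
# Verify the size of the array the VAD stage tried to allocate:
# shape (371383, 576), dtype float32 (4 bytes per element).
rows, cols = 371383, 576
bytes_needed = rows * cols * 4
print(f"{bytes_needed / 1024**2:.0f} MiB")  # 816 MiB, matching the traceback
```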

Solutions:

  1. Use a smaller Whisper model: In the main interface, under "Speech Recognition," change the model from large-v3 to a smaller one like medium, small, or base. The large-v3 model requires significant memory (at least 10GB VRAM with CUDA or 16GB RAM without).
  2. Adjust advanced settings (to reduce memory footprint):

    • Go to Menu -> Tools/Options -> Advanced Options.
    • In the "Speech Recognition Parameters" section:

      • Set CUDA Data Type to float16 or int8 (instead of float32).
      • Set beam_size to 1.
      • Set best_of to 1.
      • Set Enable Context Awareness to false.
  3. Ensure you are using CUDA acceleration: If you have an NVIDIA GPU, make sure "CUDA Acceleration" is checked on the main interface. Verify you have CUDA 12.8 and cuDNN 9.x+ installed.
  4. Disable "Secondary Recognition": If you have "Secondary Recognition" enabled on the main interface (top right), try disabling it, especially when processing multiple videos simultaneously, as it can double memory usage.
  5. Process shorter audio segments: If possible, split very long audio/video files into smaller segments before processing.
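For reference, solutions 1 to 3 map onto arguments of the faster-whisper Python API (pyvideotrans passes these internally based on the settings above). A minimal sketch, assuming faster-whisper is installed; the model name and the input path "audio.wav" are placeholders:

```python
from faster_whisper import WhisperModel

# Smaller model + quantized compute type keeps the memory footprint low.
model = WhisperModel(
    "small",              # smaller than large-v3 (solution 1)
    device="cuda",        # or "cpu" if no NVIDIA GPU (solution 3)
    compute_type="int8",  # int8/float16 instead of float32 (solution 2)
)

segments, info = model.transcribe(
    "audio.wav",                       # placeholder input path
    beam_size=1,                       # solution 2
    best_of=1,                         # solution 2
    condition_on_previous_text=False,  # context awareness off (solution 2)
    vad_filter=True,                   # VAD stage where this error occurred
)
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")
```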

Please refer to the relevant documentation.
