#4668 RuntimeError: No speech detected. Check audio & language.

2a02:8071* Posted at: 5 hours ago

No speech detected. Check audio & language.
Traceback (most recent call last):
File "videotrans\task\_only_one.py", line 50, in run
File "videotrans\task\trans_create.py", line 390, in recogn
File "videotrans\recognition\__init__.py", line 293, in run
File "videotrans\recognition\_base.py", line 145, in run
RuntimeError: No speech detected. Check audio & language.

system:Windows-10-10.0.22631-SP0
version:v3.99
frozen:True
language:en
root_dir:C:/videotranspy

1 Reply

This error means the speech recognition engine found no human voice in the audio. The usual causes are a language mismatch, silent or empty audio, or overly aggressive VAD (voice activity detection) settings.

Solutions (try in order):

  1. Verify source language – Ensure "Pronunciation Language" matches the language actually spoken in the video. Do not rely on "Auto" for video translation; manually select the correct language (e.g., English, Chinese). If you are unsure, use the "Batch audio/video to subtitle" panel, which supports automatic detection.
  2. Check video audio – Confirm the video has an audio track that is audible. Play the file with a media player to ensure it’s not muted or silent. If the audio is very quiet or noisy, enable the "Noise reduction" option on the main interface.
  3. Adjust ASR parameters – Go to Menu → Tools/Options → Advanced Options → Speech Recognition Parameters and set:

    • Voice threshold: 0.5
    • Min speech duration (ms): 3000
    • Max speech duration (s): 6
    • Silent split duration (ms): 140
    • Enable "Merge short subs into neighbors" (optional)
  4. Switch ASR model – Try a larger model like large-v3 (if GPU memory allows) or medium. Avoid .en or distil- models unless the video is 100% English.
  5. Try “Separate voice and background” – Check this option on the main interface. It extracts voice from noise before recognition.
  6. Reduce GPU memory usage – In advanced options, set CUDA data type to int8 or float16, beam_size to 1, best_of to 1, and enable Pre-split audio for whisper models.
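
Step 2 can also be checked from a script instead of by ear. The following is a minimal sketch, assuming ffmpeg and ffprobe are installed and on PATH; it is not part of the application, and the helper names and the -50 dB silence cutoff are illustrative assumptions.

```python
import json
import re
import subprocess

# Assumption: a mean level below this (in dB) is treated as effectively silent.
SILENCE_DB = -50.0


def count_audio_streams(ffprobe_json):
    """Count the audio streams listed in ffprobe's JSON output."""
    return len(json.loads(ffprobe_json).get("streams", []))


def parse_volume(ffmpeg_stderr):
    """Extract mean_volume and max_volume (dB) from volumedetect output."""
    pattern = r"(mean_volume|max_volume):\s*(-?\d+(?:\.\d+)?|-?inf) dB"
    return {name: float(value) for name, value in re.findall(pattern, ffmpeg_stderr)}


def diagnose(path):
    """Return a short verdict on whether the file has audible audio."""
    # List audio streams only; an empty list means there is no audio track.
    probe = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a",
         "-show_entries", "stream=index,codec_name", "-of", "json", path],
        capture_output=True, encoding="utf-8", errors="replace", check=True)
    if count_audio_streams(probe.stdout) == 0:
        return "no audio track"

    # volumedetect writes its statistics to stderr; nothing is encoded to disk.
    detect = subprocess.run(
        ["ffmpeg", "-hide_banner", "-nostats", "-i", path, "-vn",
         "-af", "volumedetect", "-f", "null", "-"],
        capture_output=True, encoding="utf-8", errors="replace")
    mean = parse_volume(detect.stderr).get("mean_volume")
    if mean is None:
        return "could not measure volume"
    if mean < SILENCE_DB:
        return f"audio is nearly silent (mean {mean} dB)"
    return f"audio looks usable (mean {mean} dB)"
```

Calling `diagnose("input.mp4")` (a placeholder file name) returns one of the verdict strings. A "no audio track" or "nearly silent" result points to the source file rather than to the recognition settings.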

If none of the above works, please post the last 30 lines of the log file in the logs/ folder (the file named with the current date) for further analysis.
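
The last 30 lines can be pulled out with a few lines of Python. This is a rough sketch: it picks the most recently modified file in the folder rather than assuming an exact file-name pattern, and the `newest_log` and `tail` helpers are illustrative names, not part of the application.

```python
from collections import deque
from pathlib import Path


def newest_log(log_dir):
    """Return the most recently modified file in log_dir, or None if empty."""
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


def tail(path, n=30):
    """Return the last n lines of a text file without loading it all into memory."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        # A bounded deque keeps only the final n lines as the file is streamed.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]
```

For example, `print("\n".join(tail(newest_log("C:/videotranspy/logs"))))` prints the text to paste into a reply, assuming the logs/ folder sits directly under the root_dir shown in the report.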
