#4339 videotrans.configure._except.StopRetry: No Ref Audio. Clone disabled.


No Ref Audio. Clone disabled.:
Traceback (most recent call last):
File "videotrans\util\ListenVoice.py", line 16, in run
File "videotrans\tts\__init__.py", line 394, in run
File "videotrans\tts\_base.py", line 137, in run
File "videotrans\tts\_clone.py", line 35, in _exec
File "videotrans\tts\_base.py", line 210, in _local_mul_thread
File "videotrans\tts\_clone.py", line 82, in _item_task
File "tenacity\__init__.py", line 338, in wrapped_f
File "tenacity\__init__.py", line 477, in call
File "tenacity\__init__.py", line 378, in iter
File "tenacity\__init__.py", line 400, in
File "concurrent\futures\_base.py", line 451, in result
File "concurrent\futures\_base.py", line 403, in __get_result
File "tenacity\__init__.py", line 480, in call
File "videotrans\tts\_clone.py", line 52, in _run
videotrans.configure._except.StopRetry: No Ref Audio. Clone disabled.


=
system:Windows-10-10.0.22631-SP0
version:v3.99
frozen:True
language:en
root_dir:F:/win-pyvideotrans-v3.99-428

1 Reply

This error occurs because the audio cloning feature is enabled (a clone voice role is selected) but no valid reference audio is found or the extracted speech segments are outside the required duration.

Cause: When you select a clone role (e.g., in F5-TTS, CosyVoice, GPT-SoVITS), the software attempts to clone the voice from the original video. This requires the system to extract short, clean speech segments as reference audio. If the source video has no clear, isolated speech of sufficient length (3–10 seconds) or the reference clips cannot be properly generated, the "Clone disabled" error is triggered.
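To check candidate reference clips yourself before re-running, a small standalone script can report which WAV files in a folder fall inside the 3–10 second window described above. This is a minimal sketch and not part of pyvideotrans. It assumes the reference clips are uncompressed PCM `.wav` files, and the function names are hypothetical.

```python
import os
import wave
from typing import List

# Window taken from the reply above: reference clips should be 3-10 seconds long.
MIN_REF_SECONDS = 3.0
MAX_REF_SECONDS = 10.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / float(wf.getframerate())


def is_usable_reference(seconds: float,
                        min_s: float = MIN_REF_SECONDS,
                        max_s: float = MAX_REF_SECONDS) -> bool:
    """True if a clip length falls inside the accepted reference window."""
    return min_s <= seconds <= max_s


def usable_reference_clips(folder: str) -> List[str]:
    """List .wav files in `folder` whose duration is within the 3-10 s window."""
    if not os.path.isdir(folder):
        return []  # folder is missing, so there is nothing to clone from
    usable = []
    for name in sorted(os.listdir(folder)):
        # Assumption: reference clips are PCM WAV; other formats are skipped.
        if not name.lower().endswith(".wav"):
            continue
        path = os.path.join(folder, name)
        try:
            if is_usable_reference(wav_duration_seconds(path)):
                usable.append(name)
        except (wave.Error, EOFError):
            continue  # unreadable or non-PCM WAV; skip it
    return usable
```

For example, `usable_reference_clips("F:/win-pyvideotrans-v3.99-428/f5-tts")` would list the WAV files in that folder that meet the length requirement. An empty result suggests there is no usable reference audio there.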

Solution:

  1. Verify Reference Audio Files: Check whether the required reference audio files exist in the f5-tts folder of your installation directory (F:/win-pyvideotrans-v3.99-428/f5-tts/). If they are missing or incorrectly named, cloning cannot proceed.
  2. Adjust Speech Segmentation Settings: For the clone process to work, the extracted audio clips must be between 3 and 10 seconds long. Go to Menu → Tools/Options → Advanced Options → Voice Recognition Parameters and set the following:

    • Minimum voice duration (ms): 3000
    • Maximum voice duration (seconds): 10
    • Check (enable) both Whisper pre-segmented audio and Merge short subtitles to adjacent.
  3. Enable Vocal Separation: In the main interface, expand the "More Settings..." panel and check the Separate vocals and background option. This is strongly recommended for a cleaner reference audio sample.
  4. Re-run the Task: Before starting again, ensure you check the Clean up generated checkbox on the main interface to force reprocessing from scratch.
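The combined effect of the minimum/maximum duration and "Merge short subtitles to adjacent" settings can be illustrated with a simplified model. Short speech segments are merged into the next one as long as the combined span stays within 10 seconds, and only segments of 3–10 seconds remain usable as reference audio. This is a hypothetical sketch of the idea, not pyvideotrans's actual segmentation code. Representing segments as `(start_ms, end_ms)` tuples is an assumption.

```python
from typing import List, Tuple

# Hypothetical representation: (start_ms, end_ms) of one detected speech segment.
Segment = Tuple[int, int]

MIN_MS = 3000    # "Minimum voice duration (ms)": 3000
MAX_MS = 10_000  # "Maximum voice duration (seconds)": 10, expressed in ms


def merge_short(segments: List[Segment], min_ms: int = MIN_MS,
                max_ms: int = MAX_MS) -> List[Segment]:
    """Merge a too-short segment into the following one while the merged
    span (including any silent gap between them) stays <= max_ms."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged:
            prev_start, prev_end = merged[-1]
            too_short = prev_end - prev_start < min_ms
            if too_short and end - prev_start <= max_ms:
                merged[-1] = (prev_start, end)  # extend the previous segment
                continue
        merged.append((start, end))
    return merged


def reference_candidates(segments: List[Segment], min_ms: int = MIN_MS,
                         max_ms: int = MAX_MS) -> List[Segment]:
    """Keep only merged segments whose length lies within [min_ms, max_ms]."""
    return [(s, e) for s, e in merge_short(segments, min_ms, max_ms)
            if min_ms <= e - s <= max_ms]
```

In this model, a 1.5 s fragment followed closely by a 2.2 s fragment becomes one 3.8 s candidate, while a single 15 s segment is rejected as too long. If every segment is rejected, there is no reference clip to clone from.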

Please refer to the related documentation:
