The robotic voice occurs because the "Dubbing Speedup" option accelerates the synthesized audio to fit the original segment, and a large speed-up distorts it. This typically happens when the translated speech is much longer than the original segment, which pushes the speed-up ratio past what still sounds natural.
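To make the speed-up ratio concrete, here is a minimal sketch of the arithmetic, not the tool's actual code. The function names and the `safe_limit` of 1.5 are assumptions chosen for illustration; the point is only that the ratio grows with how much longer the dubbed audio is than the original segment.

```python
def speedup_ratio(dubbed_ms: int, slot_ms: int) -> float:
    """Factor by which dubbed audio must be accelerated to fit its segment.

    Hypothetical helper: a value of 1.0 means no acceleration is needed.
    """
    if slot_ms <= 0:
        raise ValueError("slot_ms must be positive")
    return max(1.0, dubbed_ms / slot_ms)


def likely_distorted(dubbed_ms: int, slot_ms: int, safe_limit: float = 1.5) -> bool:
    """Flag segments whose required speed-up exceeds an assumed safe limit."""
    return speedup_ratio(dubbed_ms, slot_ms) > safe_limit
```

For example, a 5-second dubbed line that has to fit a 2-second segment needs a 2.5x speed-up, which this sketch would flag.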
Solutions:
Disable "Dubbing Speedup" and rely on "Video Slowdown" or "Remove silence between subtitles" to handle timing mismatches.
Reduce the global speech rate: set "Dubbing Speech Rate" (配音语速) to a modest value such as +5 or +10 instead of relying on automatic acceleration.
In Menu → Tools/Options → Advanced Options → Subtitle/Audio/Video Alignment, lower "Maximum audio speedup factor" to 50 (or less) so that no segment is accelerated past that cap.
If you are using a voice clone (the clone role), make sure the original reference audio segments are between 3 and 10 seconds long: in Advanced Options, set "Min speech duration" to 3000 ms and "Max speech duration" to 10 s.
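If you want to check ahead of time which segments fall inside the 3 to 10 second window, a small sketch like the one below can help. It assumes you already have segment timings as `(start_ms, end_ms)` pairs, for example parsed from a subtitle file; the function name is hypothetical and not part of the application.

```python
def usable_reference_clips(segments, min_ms=3000, max_ms=10000):
    """Keep (start_ms, end_ms) pairs whose duration lies within [min_ms, max_ms]."""
    return [(start, end) for start, end in segments if min_ms <= end - start <= max_ms]
```

Segments that are dropped here are the ones most likely to give the clone too little or too much reference audio.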