#1557 large-v3-turbo: nothing is recognized even with the speech threshold lowered to 0.2

2403:18c0* Posted at: 7 months ago

2025-11-22 13:26:41,706 - DEBUG - Compression ratio threshold is not met with temperature 0.2 (18.666667 > 2.400000)
2025-11-22 13:26:44,326 - DEBUG - Compression ratio threshold is not met with temperature 0.4 (18.666667 > 2.400000)
2025-11-22 13:26:46,945 - DEBUG - Compression ratio threshold is not met with temperature 0.6 (18.666667 > 2.400000)
2025-11-22 13:26:49,566 - DEBUG - Compression ratio threshold is not met with temperature 0.8 (18.666667 > 2.400000)
2025-11-22 13:26:52,187 - DEBUG - Compression ratio threshold is not met with temperature 1.0 (18.666667 > 2.400000)
Traceback (most recent call last):
File "videotrans\task\_only_one.py", line 42, in run
File "videotrans\task\trans_create.py", line 370, in recogn
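RuntimeError: 2.mp4: no subtitles were recognized. Please check that it contains human speech and that the spoken language matches the source language you selected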
self.proxy_str='http://127.0.0.1:7897',self.uuid=None
2025-11-22 13:28:02,429 - INFO - Final configuration: self.cfg=TaskCfg(cache_folder='I:/tik/win-pyvideotrans-v3.85_20251121_002450/tmp10228/e63a28d4e7',
......
k/_video_out/2/en.m4a', subtitles='', novoice_mp4='I:/tik/win-pyvideotrans-v3.85_20251121_002450/tmp10228/e63a28d4e7/novoice.mp4', noextname='2', shibie_audio=None, targetdir_mp4='I:/tik/_video_out/2/2.mp4', instrument=None, vocal=None, back_audio='', background_music=None, app_mode='biaozhun', subtitle_type=0, volume='+0%', pitch='+0Hz', voice_rate='+2%', voice_role='Guy(Male/US)', copysrt_rawvideo=False, clear_cache=True, translate_type=0, tts_type=0, recogn_type=0, model_name='large-v3-turbo', split_type=0, voice_autorate=False, video_autorate=True, cuda=True, name='I:/tik/2.mp4', basename='2.mp4', ext='mp4', dirname='I:/tik', shound_del_name=None, uuid='e63a28d4e7', remove_silent_mid=False, align_sub_audio=True, enable_diariz=False, nums_diariz=0)
self.proxy_str='http://127.0.0.1:7897',self.uuid='e63a28d4e7'
2025-11-22 13:28:04,535 - INFO - Creating pid file: self.pidfile='I:/tik/win-pyvideotrans-v3.85_20251121_002450/tmp10228/5352.lock'
2025-11-22 13:28:09,191 - INFO - Processing audio with duration 04:17.488
2025-11-22 13:28:09,886 - INFO - VAD filter removed 00:37.488 of audio
2025-11-22 13:28:09,887 - DEBUG - VAD filter kept the following audio segments: [00:00.224 -> 00:05.216], [00:05.440 -> 00:09.472], [00:13.440 -> 00:16.416], [00:16.736 -> 00:20.800], [00:21.184 -> 00:22.048], [00:22.464 -> 00:23.296], [00:23.584 -> 00:26.848], [00:27.072 -> 00:29.600], [00:30.080 -> 00:34.336], [00:34.624 -> 00:36.480], [00:36.768 -> 00:39.776], [00:40.064 -> 00:41.792], [00:42.208 -> 00:43.488], [00:43.872 -> 00:46.304], [00:46.560 -> 00:48.128], [00:48.320 -> 00:50.752], [00:51.040 -> 00:56.032], [00:56.064 -> 01:00.064], [01:00.352 -> 01:03.456], [01:03.616 -> 01:05.760], [01:06.080 -> 01:10.784], [01:11.040 -> 01:14.016], [01:14.400 -> 01:16.704], [01:16.928 -> 01:18.304], [01:18.656 -> 01:19.968], [01:20.160 -> 01:25.152], [01:25.184 -> 01:25.536], [01:25.952 -> 01:26.208], [01:26.560 -> 01:27.360], [01:27.680 -> 01:29.344], [01:29.568 -> 01:33.088], [01:33.376 -> 01:35.264], [01:35.584 -> 01:37.152], [01:37.568 -> 01:39.200], [01:39.520 -> 01:40.768], [01:41.216 -> 01:42.592], [01:42.816 -> 01:44.096], [01:44.416 -> 01:45.920], [01:46.144 -> 01:47.904], [01:48.224 -> 01:50.016], [01:50.336 -> 01:51.328], [01:51.584 -> 01:52.896], [01:53.152 -> 01:56.128], [01:56.480 -> 01:58.144], [01:58.400 -> 02:01.536], [02:01.824 -> 02:04.320], [02:04.704 -> 02:07.456], [02:08.096 -> 02:09.696], [02:10.048 -> 02:12.832], [02:13.152 -> 02:14.496], [02:14.912 -> 02:16.480], [02:16.800 -> 02:19.200], [02:19.488 -> 02:22.432], [02:22.784 -> 02:24.288], [02:24.608 -> 02:26.528], [02:26.784 -> 02:28.096], [02:28.352 -> 02:32.768], [02:33.120 -> 02:35.968], [02:36.224 -> 02:37.920], [02:38.176 -> 02:41.504], [02:41.824 -> 02:42.240], [02:42.528 -> 02:45.472], [02:45.952 -> 02:47.200], [02:47.392 -> 02:49.984], [02:50.432 -> 02:52.128], [02:52.512 -> 02:53.664], [02:54.176 -> 02:58.560], [02:58.752 -> 03:00.064], [03:00.384 -> 03:01.312], [03:01.600 -> 03:03.552], [03:03.968 -> 03:07.648], [03:07.936 -> 03:12.512], [03:12.928 -> 03:16.192], [03:16.672 -> 
03:19.744], [03:20.000 -> 03:24.384], [03:25.536 -> 03:27.424], [03:27.712 -> 03:29.504], [03:29.792 -> 03:31.392], [03:31.744 -> 03:33.504], [03:33.792 -> 03:35.200], [03:35.584 -> 03:37.760], [03:38.048 -> 03:40.000], [03:40.416 -> 03:42.688], [03:43.008 -> 03:45.696], [03:46.112 -> 03:47.360], [03:47.616 -> 03:49.056], [03:49.280 -> 03:52.032], [03:52.256 -> 03:54.016], [03:54.208 -> 03:56.800], [03:57.152 -> 03:59.296], [03:59.680 -> 04:00.640], [04:00.896 -> 04:01.952], [04:02.304 -> 04:04.000], [04:04.352 -> 04:06.624], [04:06.944 -> 04:08.160], [04:08.448 -> 04:10.432], [04:10.784 -> 04:14.880]
2025-11-22 13:28:10,025 - DEBUG - Processing segment at 00:00.000
2025-11-22 13:28:17,193 - DEBUG - Compression ratio threshold is not met with temperature 0.0 (18.666667 > 2.400000)
2025-11-22 13:28:17,193 - DEBUG - Log probability threshold is not met with temperature 0.0 (-291.128876 < -1.000000)

Floor #1 DeepSeek reply at 7 months ago

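Based on the log, the speech recognition model is failing to produce a usable transcript. The messages "Compression ratio threshold is not met" and "Log probability threshold is not met" show that the decoded text was rejected as too repetitive (compression ratio 18.67 against a limit of 2.4) and too low in confidence (average log probability -291.13 against a floor of -1.0), which suggests a problem with the audio quality or content.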
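To make those two log messages concrete, here is a minimal sketch of the kind of check a Whisper-style decoder applies before retrying at a higher temperature. It is a simplified illustration, not the code used by the application: the function names are hypothetical, the thresholds 2.4 and -1.0 are taken from the log above, and the real decoder also considers a no-speech probability that is omitted here.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by zlib-compressed size; repetitive text scores high."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def needs_fallback(text, avg_logprob,
                   compression_ratio_threshold=2.4,
                   log_prob_threshold=-1.0):
    """Return the reasons a decode would be rejected (hypothetical helper)."""
    reasons = []
    if compression_ratio(text) > compression_ratio_threshold:
        reasons.append("compression_ratio")  # output looks like a repetition loop
    if avg_logprob < log_prob_threshold:
        reasons.append("log_prob")  # model has low confidence in its own output
    return reasons


# Illustrative inputs: a looping transcript versus an ordinary sentence.
looped = "thank you " * 200
normal = "The quick brown fox jumps over the lazy dog near the quiet river bank."

print(needs_fallback(looped, avg_logprob=-291.128876))
print(needs_fallback(normal, avg_logprob=-0.3))
```

A ratio as high as 18.67 means the decoded text was almost entirely repetition, so raising the temperature from 0.0 to 1.0 did not help in this case.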

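Please troubleshoot with the following steps:

Check the audio content: confirm that the video really contains clear human speech, rather than only background music, ambient noise, or synthetic speech.
Adjust the speech recognition settings:
- In the speech recognition area, change the recognition mode (识别模式) from whole-file recognition (整体识别) to batch inference (批量推理).
- Under Advanced Options (高级选项) -> Faster/OpenAI/Whisper:
  - Lower the speech threshold (语音阈值) from 0.2 to 0.1 or 0.05 to increase sensitivity.
  - Raise the silence split duration in milliseconds (静音分割持续毫秒) from the default of 140 to 300 or 500 to reduce over-segmentation.
Switch the recognition model:
- If large-v3-turbo still cannot recognize anything, try switching to Ali FunASR (local) (阿里FunASR(本地)), which performs better on Chinese speech.
Verify the audio file:
- Use the audio/video merge/split tool (音视频合并/分离) in the toolbox to extract the audio from the video, then play it back to confirm that it contains intelligible speech.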
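Before lowering the speech threshold further, it may help to check how much audio the VAD filter already keeps. The log reports a duration of 04:17.488 with 00:37.488 removed, and the "VAD filter kept the following audio segments" line lists each kept segment. The sketch below totals those segments; the function name is hypothetical and the sample string holds only the first three segments from the log, so paste the full line to check the whole file.

```python
import re

# Matches segments such as "[00:00.224 -> 00:05.216]"; whitespace around
# the arrow is tolerated because pasted logs are sometimes wrapped there.
SEGMENT = re.compile(r"\[(\d+):(\d+\.\d+)\s*->\s*(\d+):(\d+\.\d+)\]")


def kept_seconds(log_line: str) -> float:
    """Sum the durations of all MM:SS.mmm segments found in a log line."""
    total = 0.0
    for m1, s1, m2, s2 in SEGMENT.findall(log_line):
        start = int(m1) * 60 + float(s1)
        end = int(m2) * 60 + float(s2)
        total += end - start
    return total


# First three segments copied from the log above.
sample = "[00:00.224 -> 00:05.216], [00:05.440 -> 00:09.472], [00:13.440 -> 00:16.416]"
kept = kept_seconds(sample)
print(f"kept {kept:.3f} s of speech")
```

If the kept total is already close to the full duration, the speech threshold is probably not what is blocking recognition, and the other steps are more likely to matter.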

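If the problem persists after these adjustments, please provide basic information about the video file (duration, spoken language, and a description of the audio quality).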
