#4507 TaskCfgSTT(is_cuda=True, uuid='57ea865ef8', cache_folder='C:/Users/user/Documents/WhisperJAV/output/ja.whisperjav/win-py

183.179* Posted at: 1 hour ago 👁15

Error during the speech recognition stage [Deepgram.com] Internal program error: '>' not supported between instances of 'int' and 'str'
Traceback (most recent call last):
File "videotrans\recognition\_base.py", line 143, in run
File "tenacity\__init__.py", line 338, in wrapped_f
File "tenacity\__init__.py", line 477, in call
File "tenacity\__init__.py", line 378, in iter
File "tenacity\__init__.py", line 421, in exc_check
tenacity.RetryError: RetryError[]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "videotrans\task\job.py", line 105, in run
File "videotrans\task\_speech2text.py", line 191, in recogn
File "videotrans\recognition\__init__.py", line 265, in run
File "videotrans\recognition\_base.py", line 180, in run
File "tenacity\__init__.py", line 480, in call
File "videotrans\recognition\_deepgram.py", line 98, in _exec
File "deepgram_captions\srt.py", line 10, in srt
File "deepgram_captions\converters.py", line 18, in get_lines
TypeError: '>' not supported between instances of 'int' and 'str'
TaskCfgSTT(is_cuda=True, uuid='57ea865ef8', cache_folder='C:/Users/user/Documents/WhisperJAV/output/ja.whisperjav/win-pyvideotrans-v3.99-0508/tmp/56756/57ea865ef8', target_dir='C:/Users/user/Documents/WhisperJAV/output/ja.whisperjav/win-pyvideotrans-v3.99-0508/output/recogn', source_language=None, source_language_code=None, source_sub=None, source_wav=None, source_wav_output=None, target_language=None, target_language_code=None, target_sub='C:/Users/user/Documents/WhisperJAV/output/ja.whisperjav/win-pyvideotrans-v3.99-0508/output/recogn/[email protected]', target_wav=None, target_wav_output=None, name='F:/qb bt3/WAV/pyvideotrans test/[email protected]3', noextname='hhd800.com@FC2-PPV-3117668', basename='[email protected]3', ext='mp3', dirname='F:/qb bt3/WAV/pyvideotrans test', shound_del_name=None, detect_language='ja', recogn_type=10, model_name='nova-3', shibie_audio='C:/Users/user/Documents/WhisperJAV/output/ja.whisperjav/win-pyvideotrans-v3.99-0508/tmp/56756/57ea865ef8/[email protected]', remove_noise=False, enable_diariz=False, nums_diariz=0, rephrase=1, fix_punc=False)


=
system:Windows-10-10.0.26200-SP0
version:v3.99
frozen:True
language:zh
root_dir:C:/Users/user/Documents/WhisperJAV/output/ja.whisperjav/win-pyvideotrans-v3.99-0508

9 Replies

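This is caused by an incompatible data format inside the deepgram_captions library. The timestamp fields returned by the Deepgram API may have an unexpected type, which would produce the int vs. str comparison error.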
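As a generic illustration of this class of error (not the actual deepgram_captions code), the sketch below reproduces the same TypeError message and shows one way to avoid it by coercing both operands to int before comparing. The helper name `exceeds_limit` is hypothetical.

```python
def exceeds_limit(length, limit):
    """Compare two values numerically, coercing numeric strings to int first.

    Hypothetical helper for illustration only; not part of deepgram_captions.
    """
    return int(length) > int(limit)


# Reproduce the failure: comparing an int with a str raises TypeError.
try:
    _ = 5 > "3"
    raised = False
    message = ""
except TypeError as exc:
    raised = True
    message = str(exc)

print(message)
print(exceeds_limit(5, "3"))
```

If a value reaches the comparison as a string (for example, a setting read from a config file or a field parsed from JSON), converting it at the boundary is usually enough to make the comparison valid.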

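Solutions

  1. Try updating pyVideoTrans to v3.99 (if you are not already on the latest version).
  2. Temporarily switch to another speech recognition channel, such as faster-whisper (local) or OpenAI Whisper.
  3. If the problem persists, please provide the Deepgram API response log (copy the last 30 lines of the most recent log in the logs/ folder) for further analysis.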

If the problem persists, please provide the Deepgram API response log (copy the last 30 lines of the most recent log in the logs/ folder) for further analysis.
how to access this

Locate the logs folder in the software's installation directory. Inside, you'll find log files named with the current date (e.g., 2025-03-20.log). Open the latest one and copy the last 30 lines of error information.
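If opening the file by hand is inconvenient, a small script can do the same thing. This is a minimal sketch that assumes the logs folder contains plain-text `*.log` files and that the most recently modified one is the one you want; the function name `tail_latest_log` is hypothetical.

```python
from pathlib import Path


def tail_latest_log(logs_dir, n=30):
    """Return the last n lines of the most recently modified .log file."""
    log_files = sorted(Path(logs_dir).glob("*.log"), key=lambda p: p.stat().st_mtime)
    if not log_files:
        return []  # no log files found in the folder
    # errors="replace" avoids crashing on bytes that are not valid UTF-8
    text = log_files[-1].read_text(encoding="utf-8", errors="replace")
    return text.splitlines()[-n:]
```

Call it with the path to your own logs folder, for example `print("\n".join(tail_latest_log("logs")))`, and paste the output into your reply.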

[INFO] Starting call to 'videotrans.translator._chatgpt.ChatGPT.llm_segment.._send', this is the 1st time calling it.
[DEBUG] 需要断句的:message=[{'role': 'system', 'content': 'Role:\nYou are an expert Multi-Language ASR Post-Editor. Your goal is to losslessly repair, correct, and re-segment subtitles from the given SRT.\n\n


\n\n# GOLDEN RULE — ABSOLUTE REQUIREMENTS\n1. Do NOT delete any meaningful information.\n2. Removing meaningless fillers ("uh", "umm", "erm", “えー”, “嗯”) is allowed and does NOT violate rule #1.\n3. All semantic content must appear in the final output. No summarizing. No shortening.\n\n
\n\n# CORE TASKS\n1. Retention: preserve all meaning.\n2. Correction: fix ASR mistakes (typos, homophones, missing punctuation).\n3. Segmentation: cut long text into natural sentences.\n4. Formatting: output valid SRT wrapped in and .\n\n
\n\n# SEGMENTATION RULES (CRITICAL)\n1. If an original block contains multiple sentences or is longer th
......
UT RULES\n1. Always discard original SRT numbering.\n2. Rebuild numbering starting from 1, strictly increasing.\n3. Output must be:\n\n\n1\n00:00:00,000 --> 00:00:02,000\nText...\n...\n\n\n4. If any required structure is missing → regenerate internally before showing.\n\n
\n\n# Input SRT:\n'}, {'role': 'user', 'content': 'srt\n1\n00:07:49,260 --> 00:07:49,660\nじゃ\n\n2\n00:07:50,540 --> 00:07:55,500\nサーいい女のね、条件でそれ。じゃ、それしちゃいけなから。\n\n3\n00:07:58,060 --> 00:07:59,900\nいいね。\n\n4\n00:08:03,855 --> 00:08:04,495\nさ\n\n5\n00:08:06,175 --> 00:08:09,455\nどうなってんの服そうなんですよ。それも\n\n6\n00:08:10,095 --> 00:08:14,490\n抜かされづらい服にしたんで。確かにね。これだとさー、軽\n\n7\n00:08:14,490 --> 00:08:17,290\nい女にならな。連れて帰ってもすぐにセックスできなも\n\n8\n00:08:17,290 --> 00:08:18,890\nね。ここ\n\n9\n00:08:18,890 --> 00:08:21,290\nでまた止めれるから。\n\n10\n00:09:35,760 --> 00:09:37,120\n早いのまだ\n\n11\n00:09:37,200 --> 00:09:40,960\n早い。まだ何回かね。何かあるからいいね\n\n12\n00:09:42,160 --> 00:09:45,920\n。逆に教えてもらえる感じがするわ。あ、こっち。\n\n13\n00:09:46,560 --> 00:09:48,080\n応援して。\n\n14\n00:09:48,880 --> 00:09:50,480\nアイソンの私でも\n\n15\n00:09:52,080 --> 00:09:53,055\n。これこれ\n\n16\n00:09:54,815 --> 00:09:58,015\nほどほどにいのだっけちょっと\n\n17\n00:09:58,015 --> 00:10:00,415\nよくわからなくなってるけど。という\n\n18\n00:10:01,695 --> 00:10:03,855\n感じお、鼻\n\n19\n00:10:03,855 --> 00:10:06,100\nが出てきた。出てきた。いい\n\n20\n00:10:09,940 --> 00:10:11,700\nね、ほら。こう\n\n21\n00:10:12,580 --> 00:10:14,580\nいうのめちゃくちゃそそるんだよ\n\n22\n00:10:15,620 --> 00:10:19,585\n。夏になってさ、みんなこういう感じの格好なるでしょみんな次郎じゃ\n\n23\n00:10:19,585 --> 00:10:23,265\nな。家とか見てるんだ。いや男は全員見てるから。見て\n\n24\n00:10:23,265 --> 00:10:26,785\nないふりして格好せいでけど、みんな見てるよね。てる。そこ\n\n25\n00:10:27,345 --> 00:10:28,545\nのあなた。いい\n\n26\n00:10:31,665 --> 00:10:35,980\nね。海とかよく行かれるですか。海海行かないよ。いや行\n\n27\n00:10:35,980 --> 00:10:40,060\nかな。だって海行ったらさ、こういうとこ見に行くんじゃな違うの、海\n\n28\n00:10:40,060 --> 00:10:44,380\n行ったらさ、俺とは住む世界が違う奴らもいっぱいでしょ。\n\n29\n00:10:44,380 --> 00:10:45,820\n海の男たちが\n\n30\n00:10:46,780 --> 00:10:49,145\n。俺とりあえず帰ってき、こんな\n\n31\n00:10:49,465 --> 00:10:50,505\n感じいつも\n\n32\n00:10:51,145 --> 00:10:53,865\n女がいたらへーってくる。何か\n\n33\n00:10:54,105 --> 00:10:57,865\nやばいすげえ。いい女のね、普段見せなとこ見たいの。\n\n34\n00:10:59,260 --> 00:11:02,060\nわー、すげー綺麗。めちゃ\n\n35\n00:11:03,740 --> 00:11:05,740\nくちゃすべすべじゃん、脇。\n\n36\n00:11:11,035 --> 
00:11:15,115\n何やっぱケアしてるのケアしてるように見えるだけ見える\n\n37\n00:11:15,275 --> 00:11:18,555\n、だってないも怪我、ほらすげえ。\n\n38\n00:11:19,835 --> 00:11:23,755\nおお、そこをつつする人はついてる。今日\n\n39\n00:11:24,395 --> 00:11:26,475\nは初めてづくしだから\n\n40\n00:11:26,980 --> 00:11:28,900\nやったことないとづくし。初めて。。\n\n41\n00:11:33,780 --> 00:11:34,180\nて\n\n42\n00:11:40,005 --> 00:11:42,565\n柔らかさを表すやつうん\n\n43\n00:11:42,565 --> 00:11:45,605\n。本当に嘘でしょ、ちょっと。ま\n\n44\n00:11:47,285 --> 00:11:48,565\nんこっぽくなでも\n\n45\n00:11:49,605 --> 00:11:50,085\nいい\n\n46\n00:11:51,765 --> 00:11:52,565\n感じ\n\n47\n00:11:53,445 --> 00:11:57,520\nどういう漫画が分かった、これで。分かった。二はいいいい\n\n48\n00:12:02,720 --> 00:12:03,920\n匂いするね。\n\n49\n00:12:04,800 --> 00:12:06,160\n構成つけてるここ。\n\n50\n00:12:12,295 --> 00:12:15,335\nちょっとさ、逆やりたいよね一回。え\n'}]

Thanks for providing the log, but it does not contain any Deepgram API response. What it shows is the LLM re-segmentation (ChatGPT) call, not the Deepgram speech recognition error.

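Please check the log file in the logs/ folder that corresponds to the time you used Deepgram, and copy the last 30 lines of error output containing the following keywords:

  • deepgram
  • RetryError
  • TypeError
  • Deepgram API

This will help pinpoint the abnormal data returned by Deepgram.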
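To pull out only the relevant lines, something like the following sketch could be used. It does a case-insensitive substring match against the keywords listed above; the names `KEYWORDS` and `matching_lines` are hypothetical.

```python
KEYWORDS = ("deepgram", "RetryError", "TypeError", "Deepgram API")


def matching_lines(lines, keywords=KEYWORDS):
    """Return the lines that contain any keyword, ignoring case."""
    lowered = [k.lower() for k in keywords]
    return [line for line in lines if any(k in line.lower() for k in lowered)]
```

Pass it the lines of the log file (for example, `open(path, encoding="utf-8").read().splitlines()`) and copy the result.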

ended workers
[WARNING] 修正字幕时间轴重叠:将前面字幕 end_time=137420 改为当前字幕 start_time, it={'line': 13, 'start_time': 136940, 'end_time': 138780, 'text': 'ます撮っ', 'startraw': '00:02:16,940', 'endraw': '00:02:18,780', 'time': '00:02:16,940 --> 00:02:18,780'}
进入执行说话人分离阶段
进入执行完成阶段
进入执行说话人分离阶段
进入执行完成阶段
进入执行说话人分离阶段
进入执行完成阶段
[ERROR] list index out of range
Traceback (most recent call last):
File "videotrans\recognition\_base.py", line 143, in run
File "tenacity\__init__.py", line 338, in wrapped_f
File "tenacity\__init__.py", line 477, in call
File "tenacity\__init__.py", line 378, in iter
File "tenacity\__init__.py", line 421, in exc_check
tenacity.RetryError: RetryError[]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "videotrans\task\job.py", line 105, in run
File "videotrans\task\_speech2text.py", line 191, in recogn
File "videotrans\recognition\__init__.py", line 265, in run
File "videotrans\recognition\_base.py", line 180, in run
File "tenacity\__init__.py", line 480, in call
File "videotrans\recognition\_deepgram.py", line 98, in _exec
File "deepgram_captions\srt.py", line 19, in srt
IndexError: list index out of range

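The list index out of range error indicates that the transcription data returned by the Deepgram API is structurally incomplete, so the deepgram_captions library indexed into an empty list while generating the SRT. This is a compatibility issue in the Deepgram channel itself, not an error in your configuration.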
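For anyone debugging this from source, a defensive check before handing the response to the SRT converter would look roughly like the sketch below. The nested layout (`results` → `channels` → `alternatives` → `words`) is an assumption based on the general shape of Deepgram's pre-recorded transcription responses and is not confirmed by the logs in this thread; `has_words` is a hypothetical helper, not part of pyVideoTrans or deepgram_captions.

```python
def has_words(response):
    """Return True if a Deepgram-style response dict contains at least one word.

    Assumed layout: response["results"]["channels"][0]["alternatives"][0]["words"].
    """
    try:
        alternative = response["results"]["channels"][0]["alternatives"][0]
    except (KeyError, IndexError, TypeError):
        return False  # a level is missing or an expected list is empty
    if not isinstance(alternative, dict):
        return False
    return bool(alternative.get("words"))
```

When the check returns False, the caller can report that no speech was recognized instead of letting an IndexError surface.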

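Solutions (try in the recommended order):

  1. Switch the speech recognition channel: temporarily use a more stable channel, for example:

    • faster-whisper (local) (recommended, supports CUDA acceleration)
    • openai-whisper (local)
    • or choose another online service (such as Volcano Engine or Alibaba Bailian) via Menu → Tools → Options → Speech Recognition Parameters
  2. Upgrade the software: confirm that you are on the latest version, v3.99, which can be downloaded from pyvideotrans.com/downpackage.
  3. If you still need to use Deepgram, try setting rephrase (segmentation mode) to 0 (default) under Advanced Options → Speech Recognition Parameters, or lower the best_of / beam_size values to reduce data complexity.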


set rephrase (segmentation mode) to 0 (default)
cannot see the option of rephrase(断句方式)

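The rephrase option is located under Menu → Tools/Options → Advanced Options → Speech Recognition Parameters, where it appears in the interface as "LLM re-segmentation" or as an option beneath it. Follow these steps:

  1. Click Tools/Options in the top menu bar.
  2. Select the Advanced Options tab.
  3. In the Speech Recognition Parameters section, find the LLM re-segmentation setting and set it to "Do not process" or "Default" (which corresponds to rephrase=0).

If you still cannot see it, you can simply turn off the "LLM re-segmentation" feature (using the disable option in that checkbox area), or keep the default setting (the default value of rephrase is 0).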
