#4973 Admin大神,二次识别后,能不能加个与识别前字幕比对纠错的步骤?

120.229* Posted at: 17 hours ago

二次识别会产生很多错别字,但识别这前的字幕是好的,能不能在识别之后加一步与之前字幕逐字比对的功能

9 Replies

二次识别的主要目的是获取与配音精确对齐的简短时间轴,并非文字校对工具。目前软件在二次识别后会弹出字幕编辑框,允许你手动逐条修改错别字(第5.3节)。软件没有内置自动逐字比对纠错功能。

你可以采取以下方案:

  1. 手动修改:在单视频翻译模式下,二次识别完成后会弹出编辑窗口,你可以在其中对照之前的原始字幕修正错别字。
  2. 启用LLM重新断句:在菜单→工具→高级选项→通用设置中,将“LLM重新断句所用AI渠道”配置为DeepSeek或OpenAI等,可以在语音识别后自动利用AI修正错别字和断句(但需注意在克隆音色时不宜使用,详见第16节)。

如果自动比对纠错是核心需求,建议向开发者反馈,或考虑使用外部字幕校对工具。

请查阅相关文档:

Transcribe bath阶段的语音识别文字阶段,默认好像是4条一次,这个批的数量能改吗

可以在高级选项中修改。在菜单 → 工具/选项 → 高级选项 → 通用设置中:

  • CPU同时任务数[重启生效]:控制语音转录时CPU并行处理的任务数,修改后需重启。
  • GPU同时任务数[重启生效]:控制ASR时GPU并行的任务数,若使用GPU加速,建议设为1(除非单卡显存>24G)。
  • 批量翻译视频时每批数量:这个参数同样适用于“批量语音转字幕”功能,可限制每批同时处理的文件数(设为0表示不限制)。

上述参数均可在高级选项中按需调整,重启软件后生效。若仍无法满足,可检查高级选项中的GPU同时任务数CPU同时任务数的具体数值。

[Errno 2] No such file or directory: 'D:\win-pyvideotrans-v4.01-0602\tmp\34996\68fa2c4fcf\speech_timestamps_1780628668.9381607.json':Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 201, in faster_whisper
File "pathlib.py", line 1134, in read_text
File "pathlib.py", line 1119, in open
FileNotFoundError: [Errno 2] No such file or directory: 'D:\win-pyvideotrans-v4.01-0602\tmp\34996\68fa2c4fcf\speech_timestamps_1780628668.9381607.json'
语音识别阶段出错:faster-whisper(本地), 翻译字幕阶段出错: DeepSeek, 配音阶段出错:Index-TTS(本地API)
Traceback (most recent call last):
File "videotrans\task\only_one.py", line 47, in run
File "videotrans\task\trans_create.py", line 322, in recogn
File "videotrans\recognition\__init__.py", line 190, in run
File "videotrans\recognition\_base.py", line 94, in run
File "videotrans\recognition\_whisper.py", line 36, in _exec
File "videotrans\recognition\_whisper.py", line 109, in _faster
File "videotrans\configure\base.py", line 253, in _new_process
videotrans.configure.excepts.VideoTransError: [Errno 2] No such file or directory: 'D:\win-pyvideotrans-v4.01-0602\tmp\34996\68fa2c4fcf\speech_timestamps_1780628668.9381607.json':Traceback (most recent call last):
File "videotrans\process\stt_fun.py", line 201, in faster_whisper
File "pathlib.py", line 1134, in read_text
File "pathlib.py", line 1119, in open
FileNotFoundError: [Errno 2] No such file or directory: 'D:\win-pyvideotrans-v4.01-0602\tmp\34996\68fa2c4fcf\speech_timestamps_1780628668.9381607.json'
TaskCfgVTT(uuid='68fa2c4fcf', name='F:/#0Daydown/025/video/4 - Creating realistic Grass Foliage/4 - Import Grass Foliage to Unreal Engine 5 and create Subsurface Material.mp4', dirname='F:/#0Daydown/025/video/4 - Creating realistic Grass Foliage', noextname='4 - Import Grass Foliage to Unreal Engine 5 and create Subsurface Material', basename='4 - Import Grass Foliage to Unreal Engine 5 and create Subsurface Material.mp4', ext='mp4', target_dir='F:/#0Daydown/025/video/4 - Creating realistic Grass Foliage/_video_out/4 - Import Grass Foliage to Unreal Engine 5 and create Subsurface Material-mp4', cache_folder='D:/win-pyvideotrans-v4.01-0602/tmp/34996/68fa2c4fcf', is_cuda=True, source_language='英语', source_language_code='en', source_sub='F:/#0Daydown/025/video/4 - Creating realistic Grass Foliage/_video_out/4 - Import Grass Foliage to Unreal Engine 5 and create Subsurface Material-mp4/en.srt', source_wav='D:/win-pyvideotrans-v4.01-0602/tmp/34996/68fa2c4fcf/en.wav', source_wav_output='F:/#0Daydown/025/video/4 - Creating realistic Grass Foliage/_video_out/4 - Import Grass Foliage to Unreal Engine 5 and create Subsurface Material-mp4/en.m4a', target_language='简体中文', target_language_code='zh-cn', target_sub='F:/#0Daydown/025/video/4 - Creating realistic Grass Foliage/_video_out/4 - Import Grass Foliage to Unreal Engine 5 and create Subsurface Material-mp4/zh-cn.srt', target_wav='D:/win-pyvideotrans-v4.01-0602/tmp/34996/68fa2c4fcf/target.wav', target_wav_output='F:/#0Daydown/025/video/4 - Creating realistic Grass Foliage/_video_out/4 - Import Grass Foliage to Unreal Engine 5 and create Subsurface Material-mp4/zh-cn.m4a', translate_type=4, tts_type=9, volume='+0%', pitch='+0Hz', voice_rate='+0%', voice_role='No', voice_autorate=True, video_autorate=False, remove_silent_mid=False, align_sub_audio=True, detect_language='en', recogn_type=0, model_name='large-v3-turbo', shibie_audio=None, remove_noise=False, enable_diariz=False, nums_diariz=0, rephrase=1, fix_punc=True, subtitle_language=None, app_mode='biaozhun', subtitles='', targetdir_mp4='F:/#0Daydown/025/video/4 - Creating realistic Grass Foliage/_video_out/4 - Import Grass Foliage to Unreal Engine 5 and create Subsurface Material-mp4/4 - Import Grass Foliage to Unreal Engine 5 and create Subsurface Material.mp4', novoice_mp4='D:/win-pyvideotrans-v4.01-0602/tmp/34996/68fa2c4fcf/novoice.mp4', is_separate=False, embed_bgm=False, instrument=None, vocal='D:/win-pyvideotrans-v4.01-0602/tmp/34996/68fa2c4fcf/vocal.wav', clear_cache=False, background_music='', subtitle_type=3, only_out_mp4=False, recogn2pass=False, output_srt=2, copysrt_rawvideo=False, loop_backaudio=1, backaudio_volume=0.8)

该错误是由于文件路径过长且包含空格和特殊字符导致的,Windows对路径长度有限制(通常260字符)。日志中文件名:“4 - Import Grass Foliage to Unreal Engine 5 and create Subsurface Material.mp4” 和目录层级过深,导致临时文件创建失败。解决方案:将视频文件重命名为极短的英文名称(如“grass.mp4”),并放到浅层目录下(如 D:\videos\),然后重新执行任务。

这个问题也是随机出现的吗,之前都没问题,偶尔就会报这个错

是的,这个问题具有随机性。虽然根本原因是Windows路径长度限制(260字符),但具体是否报错取决于:

  1. 临时文件路径的长度:每次运行时 tmp 目录下的随机UUID长度不同,加上原始视频文件名和层级深度,有时总路径刚好在限制内(成功),有时超出(失败)。
  2. 视频文件名长度:您使用的文件名过长(4 - Import Grass Foliage to Unreal Engine 5 and create Subsurface Material.mp4,约80字符),且放在多级目录中(F:/#0Daydown/025/video/4 - Creating realistic Grass Foliage/),本身已接近边界,因此偶尔因UUID或缓存文件名的微小差异而触发错误。

之前成功只是因为运气好,UUID较短或临时文件名恰好未触顶。建议将视频重命名为简短名称(如 grass.mp4)并放到浅层目录(如 D:\videos\),即可稳定复现不再报错。

实际模拟一下,将字幕内容和recharge-llm.txt的提示词一并发送给Deepseek的时候,最终合并发送内容的具体形势,比如有以下字幕:
1
00:00:00,000 --> 00:00:01,760
好,欢迎回来。

2
00:00:01,760 --> 00:00:06,200
这一节我们开始制作一些植被。

3
00:00:06,200 --> 00:00:08,880
我来给大家演示一种方法。

4
00:00:08,880 --> 00:00:14,310
如何制作植被?我们会以草为例,使用SpeedTree。

5
00:00:14,310 --> 00:00:19,280
你可能会问,为什么用SpeedTree?为什么不用Blender?

软件会将 prompts/recharge/recharge-llm.txt 中的系统提示词与字幕文本拼接后,作为一条完整消息发送给 DeepSeek。由于知识库未收录该提示词的具体内容,我无法给出精确模拟。但您可以直接打开软件安装目录下的 prompts/recharge/recharge-llm.txt 文件查看完整原文。通常其结构如下(示例性说明):系统提示词(要求AI对字幕进行重新断句、修正错别字等)+ 用户提供的字幕文本(按行排列,包含序号和时间轴)。发送格式一般为:系统消息(System Message)包含提示词,用户消息(User Message)包含字幕内容。实际操作时会将字幕按“每批字幕行数”(高级选项中设置)分组发送。

Post Your Reply
Open source and free maintenance is not easy. If this project is helpful to you, please consider making a small donation to help the project continue to maintain and update. Dondate: ko-fi.com/jianchang512

Related resource