#2761 Batch recognition with Fun-ASR-Nano-2512 hangs with "Identify speakers" and "Insert speaker" checked, and the log shows a repeating message. What is the problem? Downloading Model from https://www.modelscope.cn to direc

240e:3b7* Posted 6 days ago

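When I batch-recognize speech with Fun-ASR-Nano-2512, the job gets stuck. I checked the "Identify speakers" and "Insert speaker" options, and the log keeps printing a repeating message. What is the problem?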
Downloading Model from https://www.modelscope.cn to directory: C:/SOFT/pyvideotrans/models\models\iic\punc_ct-transformer_cn-en-common-vocab471067-large
WARNING:root:trust_remote_code: False
Building prefix dict from the default dictionary ...
DEBUG:jieba:Building prefix dict from the default dictionary ...
Dumping model to file cache C:\TEMP\jieba.cache
DEBUG:jieba:Dumping model to file cache C:\TEMP\jieba.cache
Loading model cost 0.822 seconds.
DEBUG:jieba:Loading model cost 0.822 seconds.
Prefix dict has been built successfully.
DEBUG:jieba:Prefix dict has been built successfully.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:151645 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
end_idx=1,nu=3749
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:151645 for open-end generation.
end_idx=2,nu=3749
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:151645 for open-end generation.
end_idx=3,nu=3749

1 Reply

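According to the log, when you run batch speech recognition with FunASR and check "Identify speakers", the program gets stuck at the stage of downloading or loading the additional models used alongside speaker recognition (the log shows the punctuation model iic/punc_ct-transformer_cn-en-common-vocab471067-large being fetched from ModelScope). The repeating "The attention mask and the pad token id were not set..." message is a warning printed by the model during text generation, not the root cause.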
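
One way to check which explanation fits a given run is to scan the log for repeats. The sketch below is a minimal Python example, not part of pyvideotrans: it counts how often each "Downloading Model from ..." line appears and reports the last end_idx=N,nu=M pair. It assumes, without confirmation from the pyvideotrans source, that end_idx/nu is a progress counter (segments finished out of total). If the same download line keeps repeating, that supports the retry explanation; if only the counter keeps rising, the job may still be working through segments.

```python
import re
from collections import Counter

# Assumption: "end_idx=N,nu=M" is read as a progress counter
# (N = segments finished, M = total). This meaning is not documented.
PROGRESS_RE = re.compile(r"end_idx=(\d+),nu=(\d+)")
DOWNLOAD_PREFIX = "Downloading Model from"


def summarize_log(lines):
    """Count repeated download messages and report the last progress pair."""
    downloads = Counter()
    last_progress = None
    for line in lines:
        line = line.strip()
        if line.startswith(DOWNLOAD_PREFIX):
            downloads[line] += 1
        match = PROGRESS_RE.search(line)
        if match:
            last_progress = (int(match.group(1)), int(match.group(2)))
    # Only download messages seen more than once suggest a retry loop.
    repeated = {msg: n for msg, n in downloads.items() if n > 1}
    return {"repeated_downloads": repeated, "last_progress": last_progress}


if __name__ == "__main__":
    sample = [
        "Downloading Model from https://www.modelscope.cn to directory: C:/x",
        "end_idx=1,nu=3749",
        "end_idx=2,nu=3749",
        "end_idx=3,nu=3749",
    ]
    print(summarize_log(sample))
```

Any iterable of lines works, so an open log file can be passed directly, e.g. with open(path, encoding="utf-8", errors="replace") as fh: print(summarize_log(fh)).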

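The underlying problem is most likely a failed model download or a network timeout that makes the program retry again and again. Work through the following steps: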

  1. Check the network and proxy

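    • Make sure your network can reach modelscope.cn (the ModelScope community). If you use a proxy, enter it correctly in the "Network proxy" field on row 5 of the software's main window (for example http://127.0.0.1:7860).
    • Try temporarily disabling the proxy or switching to a different network.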
  2. Download the model manually

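    • The log shows the model being downloaded to C:/SOFT/pyvideotrans/models\models\iic\punc_ct-transformer_cn-en-common-vocab471067-large
    • You can try downloading the model yourself:

      • Open the model page: https://www.modelscope.cn/models/iic/punc_ct-transformer_cn-en-common-vocab471067-large
      • Follow the instructions on the page to download the model files and place them in the directory shown in the log (note that \models\ may be duplicated in that path; the actual path is probably C:/SOFT/pyvideotrans/models/iic/punc_ct-transformer_cn-en-common-vocab471067-large).
    • Restart the software after placing the files and try again.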
  3. Simplify the task: turn off speaker recognition

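    • This is the most direct fix. If you do not need to tell speakers apart, uncheck "Identify speakers". This avoids loading the additional models and usually resolves the problem immediately.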
  4. Use a different recognition mode

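    • If you must use speaker recognition, try switching "Recognition mode" from "Whole-file recognition" to "Batch inference". The latter may depend on model loading differently and can sometimes get around this problem.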
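
Steps 1 and 2 can be checked from a script before restarting the software. The following is a minimal Python sketch, not part of pyvideotrans: is_reachable tests whether a URL answers (optionally through a proxy such as the one entered in the "Network proxy" field), and find_model_dir looks for the model under both directory layouts mentioned in step 2. Which layout pyvideotrans actually reads is an assumption to verify. download_model is optional and relies on the modelscope package's snapshot_download; the exact folder it writes into depends on the installed modelscope version. All helper names are hypothetical.

```python
import os
import urllib.error
import urllib.request

MODEL_ID = "iic/punc_ct-transformer_cn-en-common-vocab471067-large"


def is_reachable(url, proxy=None, timeout=5.0):
    """Return True if the server sends any HTTP response.

    proxy=None means a direct connection (environment proxy variables are
    ignored); pass e.g. "http://127.0.0.1:7860" to go through a proxy.
    """
    proxies = {"http": proxy, "https": proxy} if proxy else {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered (e.g. 403/404), so the network path works.
        return True
    except (urllib.error.URLError, OSError):
        return False


def candidate_model_dirs(models_root, model_id=MODEL_ID):
    """Both layouts from the thread: with and without the extra 'models' level."""
    parts = model_id.split("/")
    return [
        os.path.join(models_root, "models", *parts),
        os.path.join(models_root, *parts),
    ]


def find_model_dir(models_root, model_id=MODEL_ID):
    """Return the first candidate directory that exists and is non-empty, else None."""
    for path in candidate_model_dirs(models_root, model_id):
        if os.path.isdir(path) and os.listdir(path):
            return path
    return None


def download_model(models_root, model_id=MODEL_ID):
    """Optional: fetch the model via ModelScope (requires `pip install modelscope`)."""
    from modelscope import snapshot_download  # lazy import; the checks above do not need it
    return snapshot_download(model_id, cache_dir=models_root)


if __name__ == "__main__":
    print("modelscope.cn reachable:", is_reachable("https://www.modelscope.cn"))
    print("model dir:", find_model_dir("C:/SOFT/pyvideotrans/models"))
```

An empty folder is deliberately not counted as "found", since an interrupted download can leave one behind.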

If none of these steps help, please share a more complete log file from the videotrans/logs/ directory for further analysis.
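
To find the file to attach, a minimal Python sketch like the following prints the last lines of the newest file in videotrans/logs/. It assumes it is run from the pyvideotrans installation folder and that the logs are UTF-8 text; latest_log and tail are hypothetical helper names, not pyvideotrans functions.

```python
import os
from collections import deque


def latest_log(log_dir="videotrans/logs"):
    """Return the most recently modified file in log_dir, or None.

    Assumption: the relative default path only works from the install folder.
    """
    if not os.path.isdir(log_dir):
        return None
    files = [os.path.join(log_dir, name) for name in os.listdir(log_dir)]
    files = [path for path in files if os.path.isfile(path)]
    return max(files, key=os.path.getmtime) if files else None


def tail(path, n=200):
    """Return the last n lines of a text file (assumed UTF-8; bad bytes replaced)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return list(deque(fh, maxlen=n))


if __name__ == "__main__":
    path = latest_log()
    if path is None:
        print("No log files found; run this from the pyvideotrans folder.")
    else:
        print(f"--- {path} ---")
        print("".join(tail(path)), end="")
```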
