#4304 ValueError: The elements of the batch contain different keys. Cannot batch them ({'attention_mask', 'input_features', 'i


```
Traceback (most recent call last):
  File "videotrans\process\stt_fun.py", line 578, in pipe_asr
  File "transformers\pipelines\pt_utils.py", line 126, in __next__
    item = next(self.iterator)
  File "transformers\pipelines\pt_utils.py", line 271, in __next__
    processed = self.infer(next(self.iterator), **self.params)
  File "torch\utils\data\dataloader.py", line 733, in __next__
    data = self._next_data()
  File "torch\utils\data\dataloader.py", line 789, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "torch\utils\data\_utils\fetch.py", line 43, in fetch
    return self.collate_fn(data)
  File "transformers\pipelines\base.py", line 182, in inner
    raise ValueError(
ValueError: The elements of the batch contain different keys. Cannot batch them ({'attention_mask', 'input_features', 'is_last'} != {'attention_mask', 'num_frames', 'input_features', 'is_last'})

Traceback (most recent call last):
  File "videotrans\task\_only_one.py", line 44, in run
  File "videotrans\task\trans_create.py", line 380, in recogn
  File "videotrans\recognition\__init__.py", line 280, in run
  File "videotrans\recognition\_base.py", line 143, in run
  File "videotrans\recognition\_huggingface.py", line 38, in _exec
  File "videotrans\recognition\_huggingface.py", line 63, in _pipe_asr
  File "videotrans\configure\_base.py", line 289, in _new_process
RuntimeError: Traceback (most recent call last):
  File "videotrans\process\stt_fun.py", line 578, in pipe_asr
  File "transformers\pipelines\pt_utils.py", line 126, in __next__
    item = next(self.iterator)
  File "transformers\pipelines\pt_utils.py", line 271, in __next__
    processed = self.infer(next(self.iterator), **self.params)
  File "torch\utils\data\dataloader.py", line 733, in __next__
    data = self._next_data()
  File "torch\utils\data\dataloader.py", line 789, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "torch\utils\data\_utils\fetch.py", line 43, in fetch
    return self.collate_fn(data)
  File "transformers\pipelines\base.py", line 182, in inner
    raise ValueError(
ValueError: The elements of the batch contain different keys. Cannot batch them ({'attention_mask', 'input_features', 'is_last'} != {'attention_mask', 'num_frames', 'input_features', 'is_last'})
```


system:Windows-10-10.0.22631-SP0
version:v3.99
frozen:True
language:zh
root_dir:D:/win-pyvideotrans-v3.99-420

1 Reply

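This error comes from the HuggingFace_ASR speech recognition channel: while it processes the audio, the elements of a data batch do not all have the same keys. In this log, one item has the keys {'attention_mask', 'input_features', 'is_last'} while another also carries num_frames, so the collate step cannot merge them into one batch and fails. This is usually related to large differences in audio segment length, inconsistent processing inside the model, or a compatibility issue.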
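The key check that fails here can be illustrated without transformers. The sketch below is a simplified stand-in written for this reply, not the library's actual collate code: `strict_collate` refuses to merge samples whose key sets differ, much like the check behind this ValueError, and the hypothetical helper `group_by_keys` shows one way to keep every batch homogeneous. The sample dicts are made-up placeholders that only mirror the key names in the log.

```python
from typing import Any

Sample = dict[str, Any]


def strict_collate(items: list[Sample]) -> dict[str, list[Any]]:
    """Merge per-sample dicts into one batch dict of lists.

    Simplified stand-in (not transformers' code): like the check in the log,
    it refuses to batch samples whose key sets differ.
    """
    keys = set(items[0])
    for item in items:
        if set(item) != keys:
            raise ValueError(
                "The elements of the batch contain different keys. "
                f"Cannot batch them ({set(item)} != {keys})"
            )
    return {key: [item[key] for item in items] for key in keys}


def group_by_keys(items: list[Sample]) -> list[list[Sample]]:
    """Hypothetical workaround: split samples into groups with identical keys."""
    groups: dict[frozenset, list[Sample]] = {}
    for item in items:
        groups.setdefault(frozenset(item), []).append(item)
    return list(groups.values())


# Placeholder samples that mirror the two key sets in the log (values are made up).
with_frames = {"input_features": [0.1], "attention_mask": [1], "is_last": True, "num_frames": 3000}
without_frames = {"input_features": [0.2], "attention_mask": [1], "is_last": True}

try:
    strict_collate([with_frames, without_frames])
except ValueError as err:
    print(err)  # same kind of message as in the traceback

for group in group_by_keys([with_frames, without_frames]):
    print(sorted(strict_collate(group)))  # each batch has one consistent key set
```

Mixing the two samples raises the error, while each group collates cleanly: the first group keeps num_frames and the second does not, but no single batch mixes the two shapes.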

Solutions:

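  1. Switch the speech recognition channel: this is the most direct fix. We recommend the more stable faster-whisper (local) with the large-v3 or large-v3-turbo model; these models rarely run into this kind of internal batching error.
  2. Adjust the HuggingFace parameters: if you want to keep using the current channel, open Menu > Tools > Advanced Options > Speech Recognition Parameters (菜单-工具-高级选项-语音识别参数), enable "Whisper pre-split audio" (Whisper预分割音频), and raise "Maximum speech duration (s)" (最长语音持续秒数) to a suitable value, for example 10 seconds. Pre-splitting cuts the audio into more evenly sized segments up front, which makes the batching error less likely.
  3. Check model integrity: make sure the corresponding HuggingFace model folder under the models directory is complete, especially its .json configuration files. If any are corrupted or missing, delete the folder and let the software download the model again.
  4. Test with CUDA acceleration disabled: if GPU acceleration is on, turn it off temporarily and see whether the error goes away. If it does, your CUDA/cuDNN version may be incompatible with the installed transformers library; update the GPU driver or use CUDA 12.8.
  5. Upgrade the software: make sure you are running the latest version, v3.99. If not, download the latest full package and extract it over the existing installation.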
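The integrity check in step 3 can be partly scripted. This is a minimal sketch using only the Python standard library, assuming a typical Hugging Face model folder; the folder path and the `EXPECTED_FILES` names are assumptions, so adjust them to the model you actually downloaded. It reports expected files that are missing and any `.json` file that fails to parse, which is a common symptom of a truncated or corrupted download.

```python
import json
from pathlib import Path

# Assumption: file names commonly found in a Hugging Face Whisper model folder.
# Adjust this tuple to the files your model actually ships with.
EXPECTED_FILES = ("config.json", "preprocessor_config.json")


def check_model_dir(model_dir: Path, expected: tuple = EXPECTED_FILES) -> list:
    """Return a list of problems found in model_dir (empty if none were found)."""
    if not model_dir.is_dir():
        return [f"missing folder: {model_dir}"]
    problems = []
    # Expected top-level files must exist.
    for name in expected:
        if not (model_dir / name).is_file():
            problems.append(f"missing file: {name}")
    # Every .json file (including nested ones) must at least parse.
    for path in sorted(model_dir.rglob("*.json")):
        try:
            with path.open(encoding="utf-8") as fh:
                json.load(fh)
        except (OSError, UnicodeDecodeError, json.JSONDecodeError) as err:
            problems.append(f"unreadable JSON: {path.relative_to(model_dir).as_posix()} ({err})")
    return problems


if __name__ == "__main__":
    # Hypothetical path: point this at the HuggingFace model folder under models/.
    folder = Path("D:/win-pyvideotrans-v3.99-420/models/your-model-folder")
    issues = check_model_dir(folder)
    print("\n".join(issues) if issues else "no problems found")
```

A clean result does not prove the weights themselves are intact; it only rules out the missing or broken configuration files mentioned in step 3.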

If none of these steps helps, please provide the complete log file for further analysis.

