#2400 Generating audio for text: '低下头,心想这下完了。...'

2406:da18* Posted at: 4 hours ago 👁5

Generating audio for text: '低下头,心想这下完了。...'
inputs:(1, 0)
inputs after padding:(1, 32000)
Traceback (most recent call last):
File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/torchaudio/_torchcodec.py", line 128, in load_with_torchcodec

audio_samples = decoder.get_all_samples()
                ^^^^^^^^^^^^^^^^^^^^^^^^^

File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/torchcodec/decoders/_audio_decoder.py", line 108, in get_all_samples

return self.get_samples_played_in_range()
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/torchcodec/decoders/_audio_decoder.py", line 137, in get_samples_played_in_range

frames, first_pts = core.get_frames_by_pts_in_range_audio(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/torch/_ops.py", line
......
too low.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/voxcpm/zipenhancer.py", line 67, in enhance

self._normalize_loudness(output_path)

File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/voxcpm/zipenhancer.py", line 38, in _normalize_loudness

audio, sr = torchaudio.load(wav_path)
            ^^^^^^^^^^^^^^^^^^^^^^^^^

File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/torchaudio/__init__.py", line 86, in load

return load_with_torchcodec(
       ^^^^^^^^^^^^^^^^^^^^^

File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/torchaudio/_torchcodec.py", line 130, in load_with_torchcodec

raise RuntimeError(f"Failed to decode audio samples: {e}") from e

RuntimeError: Failed to decode audio samples: No audio frames were decoded. This is probably because start_seconds is too high(0),or because stop_seconds(std::nullopt) is too low.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/gradio/queueing.py", line 759, in process_events

response = await route_utils.call_process_api(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/gradio/route_utils.py", line 354, in call_process_api

output = await app.get_blocks().process_api(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/gradio/blocks.py", line 2191, in process_api

result = await self.call_function(
         ^^^^^^^^^^^^^^^^^^^^^^^^^

File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/gradio/blocks.py", line 1698, in call_function

prediction = await anyio.to_thread.run_sync(  # type: ignore
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/anyio/to_thread.py", line 61, in run_sync

return await get_async_backend().run_sync_in_worker_thread(
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2525, in run_sync_in_worker_thread

return await future
       ^^^^^^^^^^^^

File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 986, in run

result = context.run(func, *args)
         ^^^^^^^^^^^^^^^^^^^^^^^^

File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/gradio/utils.py", line 915, in wrapper

response = f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^

File "/Volumes/Ventoy/video/VoxCPM/app.py", line 102, in generate_tts_audio

wav = current_model.generate(
      ^^^^^^^^^^^^^^^^^^^^^^^

File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/voxcpm/core.py", line 130, in generate

return next(self._generate(*args, streaming=False, **kwargs))
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/voxcpm/core.py", line 196, in _generate

self.denoiser.enhance(prompt_wav_path, output_path=temp_prompt_wav_path)

File "/Volumes/Ventoy/opt/miniconda3/envs/voxcpm/lib/python3.11/site-packages/voxcpm/zipenhancer.py", line 76, in enhance

raise RuntimeError(f"Audio denoising processing failed: {e}")

RuntimeError: Audio denoising processing failed: Failed to decode audio samples: No audio frames were decoded. This is probably because start_seconds is too high(0),or because stop_seconds(std::nullopt) is too low.

1 Reply

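This error indicates that the audio denoising step failed while you were using VoxCPM for voice cloning or synthesis. The core problem is that torchaudio could not decode the audio file and reported "No audio frames were decoded", which usually means the audio file itself is at fault (for example, it is corrupted, empty, or too short).

Troubleshoot with the following steps:

  1. Check the reference audio file: make sure the reference audio used for voice cloning is complete, clear, contains real speech, and is in a common format (such as WAV or MP3). Open it in another player to confirm that it plays normally.
  2. Check the audio file path: confirm that the path to the reference audio contains no Chinese characters, spaces, or special symbols. Ideally, put the file under a plain ASCII path (for example, D:/ref.wav).
  3. Check the audio duration: the reference audio must not be too short (at least 3-5 seconds is recommended) and must not be silent.
  4. Regenerate the reference audio: if possible, switch to a different, higher-quality clip of the speaker as the reference.
  5. Check the VoxCPM environment: if you deployed VoxCPM standalone, make sure its dependencies (torchaudio in particular) are at the correct versions and that the process has permission to read the audio file.
  6. Try it in pyVideoTrans: if you are using the GPT-SoVITS or VoxCPM feature inside pyVideoTrans, make sure the cloned voice is selected correctly under "Dubbing role" and that the reference audio was produced by the software's normal workflow.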
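Steps 1 and 3 can be checked in a script before the file is handed to VoxCPM. The following is a minimal sketch using only the Python standard library, assuming the reference audio is an uncompressed PCM WAV file. The function name `check_reference_wav` and both thresholds (`min_seconds`, `silence_rms`) are hypothetical choices for illustration and are not part of VoxCPM or torchaudio. The `inputs:(1, 0)` line in the log above appears consistent with a file that contains no samples, which is the first case this check looks for.

```python
import math
import os
import struct
import wave


def check_reference_wav(path, min_seconds=3.0, silence_rms=50.0):
    """Return a list of problems found in a PCM WAV file (empty list = looks usable).

    Hypothetical helper: the thresholds are illustrative assumptions only.
    """
    if not os.path.isfile(path):
        return [f"file not found: {path}"]
    if os.path.getsize(path) == 0:
        return ["file is empty (0 bytes)"]

    try:
        with wave.open(path, "rb") as wav:
            n_frames = wav.getnframes()
            rate = wav.getframerate()
            width = wav.getsampwidth()
            raw = wav.readframes(n_frames)
    except (wave.Error, EOFError) as exc:
        # The wave module only reads uncompressed PCM WAV; MP3 etc. end up here.
        return [f"not a readable PCM WAV file: {exc}"]

    if n_frames == 0 or rate == 0:
        return ["no audio frames in file"]

    problems = []
    duration = n_frames / rate
    if duration < min_seconds:
        problems.append(f"too short: {duration:.2f} s (minimum {min_seconds} s)")

    # Silence check for 16-bit samples only; other sample widths are skipped.
    if width == 2:
        count = len(raw) // 2
        if count == 0:
            problems.append("header reports frames but no sample data is present")
        else:
            samples = struct.unpack(f"<{count}h", raw[: count * 2])
            rms = math.sqrt(sum(s * s for s in samples) / count)
            if rms < silence_rms:
                problems.append("audio is silent or nearly silent")
    return problems
```

Calling `check_reference_wav("ref.wav")` on the prompt audio and printing the returned list gives a quick yes/no answer before starting generation. A file in another format would need to be converted to WAV first, or checked with a different decoder.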

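If none of these steps resolves the problem, the cause may be specific to the VoxCPM model or environment. Please provide more context about how you are running it (for example, inside pyVideoTrans or with VoxCPM standalone) so the issue can be analyzed further.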
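When reporting back, it helps to include the installed versions of the packages that appear in the traceback, which also covers the dependency check in step 5. This is a small sketch using `importlib.metadata` from the standard library. The helper name `report_versions` is hypothetical, and the distribution names are assumed to match the import names seen in the traceback paths (`torch`, `torchaudio`, `torchcodec`, `voxcpm`).

```python
from importlib import metadata


def report_versions(packages=("torch", "torchaudio", "torchcodec", "voxcpm")):
    """Map each package name to its installed version, or None if not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Run this inside the same environment that runs VoxCPM (here: the voxcpm conda env).
    for name, version in report_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting the printed lines into a reply gives maintainers the environment details without guessing.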

