#3273 Request: support for luxtts

185.238* Posted at: 8 hours ago

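luxtts is very fast: on my 2080 Ti, short sentences come out almost instantly. Its VRAM requirement is also low (reportedly as little as 1 GB), and the output quality is good. However, I don't really understand how to call its API. Here is a copy of its API documentation: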

Choose one of the following ways to interact with the API.

  1. Install the python client (docs) if you don't already have it installed.

$ pip install gradio_client

  2. Find the API endpoint below that corresponds to the function you want to call in the app. Copy the code snippet and replace the placeholder values with your own input data, or use the API Recorder to generate your API requests automatically.

API name: /infer
Total requests: 11 (100% successful) | p50/p90/p99 latency: 1.59 s / 1.71 s / 5.22 s
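from gradio_client import Client, handle_file

# Connect to the locally running luxtts WebUI (Gradio's default port 7860)
client = Client("http://127.0.0.1:7860/")
result = client.predict(
    # Text to synthesize ("Hello, this is a voice-cloning demo.")
    text="你好,这是一个语音克隆的示例效果。",
    # Reference WAV whose voice is cloned; handle_file accepts a local path or a URL
    audio_prompt=handle_file('https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav'),
    rms=0.01,             # output loudness (RMS)
    ref_duration=5,       # reference audio duration in seconds
    t_shift=0.9,          # timbre shift
    num_steps=4,          # sampling steps
    speed=0.8,            # speech rate; lower is slower and clearer
    return_smooth=False,  # enable smoothed output
    api_name="/infer",
)
# result is a 2-tuple: (path to the generated audio file, Markdown status text)
print(result)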

Accepts 8 parameters:
text str Default: "你好,这是一个语音克隆的示例效果。" ("Hello, this is a voice-cloning demo.")

The input value that is provided in the "待合成文本" (text to synthesize) Textbox component.

audio_prompt filepath Required

The input value that is provided in the "参考音频(WAV 格式)" (reference audio, WAV format) Audio component. The FileData class is a subclass of the GradioModel class that represents a file object within a Gradio interface. It is used to store file data and metadata when a file is uploaded. Attributes: path: the server file path where the file is stored. url: the normalized server URL pointing to the file. size: the size of the file in bytes. orig_name: the original filename before upload. mime_type: the MIME type of the file. is_stream: whether the file is a stream. meta: additional metadata used internally (should not be changed).

rms float Default: 0.01

The input value that is provided in the "RMS 音量(响度)" (RMS volume / loudness) Number component.

ref_duration float Default: 5

The input value that is provided in the "参考音频时长(秒)" (reference audio duration, seconds) Number component.

t_shift float Default: 0.9

The input value that is provided in the "T-Shift(音色偏移)" (T-Shift, timbre shift) Number component.

num_steps float Default: 4

The input value that is provided in the "采样步数(Steps)" (sampling steps) Slider component.

speed float Default: 0.8

The input value that is provided in the "语速(越小越慢 / 越清晰)" (speech rate; lower is slower / clearer) Slider component.

return_smooth bool Default: False

The input value that is provided in the "启用平滑输出" (enable smooth output) Checkbox component.

Returns tuple of 2 elements
[0] filepath

The output value that appears in the "生成结果" (generated result) Audio component.

[1] str

The output value that appears in the "value_21" Markdown component.
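The parameter list above maps one-to-one onto keyword arguments of `Client.predict`. Below is a minimal sketch, assuming the luxtts WebUI is running at http://127.0.0.1:7860/ and `gradio_client` is installed. The helper names `build_infer_kwargs`, `synthesize` and `INFER_DEFAULTS` are hypothetical (not part of luxtts or gradio_client). `gradio_client` is imported inside `synthesize` so the argument-building part works without the package or the server.

```python
from typing import Any, Dict, Tuple

# Defaults copied from the /infer API page above (hypothetical constant name).
INFER_DEFAULTS: Dict[str, Any] = {
    "rms": 0.01,
    "ref_duration": 5,
    "t_shift": 0.9,
    "num_steps": 4,
    "speed": 0.8,
    "return_smooth": False,
}


def build_infer_kwargs(text: str, ref_audio: str, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge the documented defaults with caller overrides.

    ref_audio is a local path or URL to a WAV file; it is wrapped with
    handle_file() later, inside synthesize().
    """
    unknown = set(overrides) - set(INFER_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown /infer parameters: {sorted(unknown)}")
    kwargs: Dict[str, Any] = {"text": text, "audio_prompt": ref_audio}
    kwargs.update(INFER_DEFAULTS)
    kwargs.update(overrides)
    return kwargs


def synthesize(
    text: str,
    ref_audio: str,
    server: str = "http://127.0.0.1:7860/",
    **overrides: Any,
) -> Tuple[str, str]:
    """Call /infer and return (generated audio filepath, Markdown status text)."""
    from gradio_client import Client, handle_file  # imported lazily

    kwargs = build_infer_kwargs(text, ref_audio, **overrides)
    kwargs["audio_prompt"] = handle_file(kwargs["audio_prompt"])
    client = Client(server)
    # The API page says /infer returns a tuple of 2 elements.
    audio_path, status = client.predict(**kwargs, api_name="/infer")
    return audio_path, status
```

Usage, with the server running: `path, info = synthesize("Hello", "ref.wav", speed=1.0)`. Rejecting unknown keyword names early catches typos such as `t_shfit` before a request is sent.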

1 Reply

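According to the API documentation you provided, luxtts is a local TTS service exposed through the Gradio Client API. To use it in pyVideoTrans, configure it as a "Custom TTS API" channel.

The configuration steps are as follows: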

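  1. Deploy the luxtts service: make sure you have started its WebUI locally (usually at http://127.0.0.1:7860) as described in the luxtts project's instructions. Keep the terminal window running.
  2. Configure pyVideoTrans

    • Open pyVideoTrans and go to the "Dubbing channel" settings.
    • In the channel list, select "Custom TTS API".
    • In the "API URL" field, enter your luxtts service address, e.g. http://127.0.0.1:7860
    • In the "API model name" field, enter the name of the inference endpoint from the API docs: /infer
    • Key step: construct the request body. According to the API docs, the request takes several parameters, so you need to write a JSON request body in the "Request data" box. Based on the docs, a basic request body looks like this (adjust the parameter values to your needs):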

      {
        "text": "{text}",
        "audio_prompt": "{audio_file}",
        "rms": 0.01,
        "ref_duration": 5,
        "t_shift": 0.9,
        "num_steps": 4,
        "speed": 0.8,
        "return_smooth": false
      }

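      Notes:

      • {text} and {audio_file} are placeholders that pyVideoTrans replaces automatically with the text to synthesize and the path of the reference audio file used for cloning, respectively. Keep both placeholders exactly as written.
      • The other parameters (such as rms and speed) can keep their default values or be tuned according to the luxtts documentation.
    • Save the configuration.
  3. Usage

    • In a video translation or standalone dubbing task, set the "Dubbing channel" to the "Custom TTS API" entry you just configured.
    • For voice cloning, choose "clone" as the "Dubbing role" and make sure the original video or the reference audio you specify can be used to extract the voice.
    • Run the task. The software sends each sentence and its corresponding reference audio file path, in the JSON format you defined, to the endpoint you configured at http://127.0.0.1:7860/infer.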
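The reply describes pyVideoTrans replacing `{text}` and `{audio_file}` in the request template before each call. The sketch below only illustrates that idea; it is not pyVideoTrans's actual code, and `fill_template` is a hypothetical name. It parses the JSON template first and then substitutes inside string values, so a subtitle line containing quotes or newlines cannot break the JSON syntax.

```python
import json
from typing import Any


def fill_template(template: str, text: str, audio_file: str) -> Any:
    """Parse a JSON request template and fill {text}/{audio_file} in string values."""

    def substitute(value: Any) -> Any:
        if isinstance(value, str):
            return value.replace("{text}", text).replace("{audio_file}", audio_file)
        if isinstance(value, dict):
            return {k: substitute(v) for k, v in value.items()}
        if isinstance(value, list):
            return [substitute(v) for v in value]
        return value  # numbers, booleans and null pass through unchanged

    return substitute(json.loads(template))


# Same body as in the reply above.
TEMPLATE = """{
  "text": "{text}",
  "audio_prompt": "{audio_file}",
  "rms": 0.01,
  "ref_duration": 5,
  "t_shift": 0.9,
  "num_steps": 4,
  "speed": 0.8,
  "return_smooth": false
}"""

body = fill_template(TEMPLATE, 'He said "hi"', "/tmp/ref.wav")
print(json.dumps(body, ensure_ascii=False))
```

A plain string replace on the raw template would produce invalid JSON as soon as the text contained a double quote, which is why the substitution happens on the parsed object here.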

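Important: configuring a custom API requires an accurate understanding of the API's input and output formats. Make sure the audio file path returned by the luxtts service can be read by pyVideoTrans. If you run into problems, check the log files in pyVideoTrans's logs directory for the exact requests and error messages.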
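Before pointing pyVideoTrans at the service, it can help to confirm that the path returned by /infer exists on the machine running pyVideoTrans and is readable audio. A minimal standard-library sketch; it assumes the output is a WAV file (the API page only says `filepath`, so the format is an assumption), and `check_wav` is a hypothetical name.

```python
import os
import wave


def check_wav(path: str) -> float:
    """Return the duration in seconds of a WAV file; raise if missing or empty."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"audio file not found: {path}")
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
    if frames == 0:
        raise ValueError(f"audio file is empty: {path}")
    return frames / rate
```

Usage: pass the first element of the tuple returned by `client.predict(...)`, e.g. `check_wav(result[0])`. A `FileNotFoundError` here usually means the file lives only on the server side, which is worth ruling out before digging through the pyVideoTrans logs.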

Please refer to the related documentation:
