Text to Speech

This model uses state-of-the-art speech synthesis to generate lifelike, engaging speech in a wide range of languages. More details about this feature can be found on the Text To Speech page.

Generate Speech

POST /getAudio

Headers

Content-Type: application/json

Authorization: Bearer <token>
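The following is a minimal sketch of how the two headers might be assembled in Python. The helper name `build_headers` is hypothetical, and `token` stands for whatever access token your account provides.

```python
def build_headers(token: str) -> dict:
    """Return request headers for POST /getAudio (hypothetical helper)."""
    if not token:
        raise ValueError("an access token is required")
    return {
        # The request body is sent as JSON.
        "Content-Type": "application/json",
        # The token is sent using the "Bearer" scheme.
        "Authorization": f"Bearer {token}",
    }
```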

Body

app_id (string)

Identifies the model to use. Each model has its own unique app_id.

prompt (string)

The text input that guides audio generation and shapes the resulting speech.

The prompt must be between 10 and 2,500 characters long.

generate_audio (bool)

Set to true to generate audio output in the form of speech.

audio_parameters (dict/map)

A dictionary of audio-related settings. Here's an example of its structure:

"audio_parameters": {
  "voice_id": "21m00Tcm4TlvDq8ikWAM",
  "stability": 0.5,
  "similarity_boost": 0.75,
  "style": null,
  "use_speaker_boost": true
}

You can customize the values within this dictionary to adjust the audio generation according to your preferences.

celery (bool)

Set to true to queue the request as a background task, which is useful for requests that take a long time to process. The response then contains a unique task_id that you can pass to the task status API to check the task's progress later.
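As a sketch, a request body could be built and checked against the documented 10 to 2,500 character prompt limit before it is sent. The helper name `build_body` is hypothetical, as are the defaults chosen here (generate_audio true, celery false, audio_parameters omitted when not supplied); the page does not state which fields are optional. Prompt length is counted in Python characters, which is assumed to match how the API counts.

```python
import json
from typing import Optional

# Documented prompt length limits.
PROMPT_MIN_CHARS = 10
PROMPT_MAX_CHARS = 2500


def build_body(
    app_id: str,
    prompt: str,
    audio_parameters: Optional[dict] = None,
    generate_audio: bool = True,  # assumed default
    celery: bool = False,         # assumed default
) -> str:
    """Serialize a /getAudio request body to JSON (hypothetical helper)."""
    # Reject prompts outside the documented length range before sending.
    if not PROMPT_MIN_CHARS <= len(prompt) <= PROMPT_MAX_CHARS:
        raise ValueError(
            f"prompt must be {PROMPT_MIN_CHARS}-{PROMPT_MAX_CHARS} characters, "
            f"got {len(prompt)}"
        )
    body = {
        "app_id": app_id,
        "prompt": prompt,
        "generate_audio": generate_audio,
        "celery": celery,
    }
    # Include audio_parameters only when the caller supplies them.
    if audio_parameters is not None:
        body["audio_parameters"] = audio_parameters
    return json.dumps(body)
```

For example, `build_body("your-app-id", "Hello and welcome to the tour.", celery=True)` produces a body that asks for the work to be queued, so the response should carry a task_id rather than finished output.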

The audio_parameters dictionary contains the following parameters:

voice_id (string)

The unique identifier of the voice used for audio generation.

stability (float)

Controls how stable the generated audio is. Higher values (up to 1) produce more stable output, while lower values allow more variation.

similarity_boost (float)

Controls how closely the generated audio matches the target voice. Higher values (up to 1) produce output closer to the target, while lower values allow more variation.

style (float)

Controls the style of the generated speech. Higher values (up to 1) make the speech follow the given voice's style more closely, possibly to the point of exaggeration, but may also make it less stable.

Setting this parameter to 0.0 (the default) significantly speeds up generation.

use_speaker_boost (bool)

When enabled, makes the synthesized speech more closely match the selected voice, at the cost of some generation speed.
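A client might sanity-check an audio_parameters dictionary before sending it. The sketch below is one way to do that under stated assumptions: the helper name `validate_audio_parameters` is hypothetical, the float fields are assumed to accept values from 0 to 1 (the page only gives the upper bound), and a null or missing field is assumed to fall back to the API's default, as `"style": null` in the example suggests.

```python
# Float fields whose documented maximum is 1; a lower bound of 0 is assumed.
UNIT_RANGE_FIELDS = ("stability", "similarity_boost", "style")


def validate_audio_parameters(params: dict) -> list:
    """Return a list of problems found in audio_parameters (hypothetical helper)."""
    problems = []
    voice_id = params.get("voice_id")
    if voice_id is not None and not isinstance(voice_id, str):
        problems.append("voice_id must be a string")
    for name in UNIT_RANGE_FIELDS:
        value = params.get(name)
        if value is None:
            continue  # missing or null: assume the API default applies
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be a number")
        elif not 0.0 <= value <= 1.0:
            problems.append(f"{name} must be between 0 and 1")
    boost = params.get("use_speaker_boost")
    if boost is not None and not isinstance(boost, bool):
        problems.append("use_speaker_boost must be a boolean")
    return problems
```

Returning a list of messages instead of raising on the first problem lets a caller report every issue at once. Stricter checks, such as requiring voice_id, are left out because the page does not say which fields are mandatory.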

Response

{
  "time_required": "",
  "error": "",
  "error_data": "",
  "input": "",
  "output": "",
  "app_id": "",
  "task_id": "",
  "status": ""
}
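The page does not describe how to read these fields, so the sketch below rests on assumptions: a non-empty error means the request failed, a non-empty output means the audio is ready, and a request sent with celery set to true initially returns only a task_id to poll through the task status API. The helper name `interpret_response` is hypothetical, and the format of output is not documented here, so it is passed through unchanged.

```python
import json


def interpret_response(raw: str) -> dict:
    """Summarize a /getAudio response body (hypothetical helper)."""
    data = json.loads(raw)
    # Assumption: a non-empty "error" field signals failure.
    if data.get("error"):
        detail = f"{data['error']} {data.get('error_data', '')}".strip()
        raise RuntimeError(f"API error: {detail}")
    status = data.get("status", "")
    # Assumption: a non-empty "output" means the audio is ready.
    if data.get("output"):
        return {"done": True, "output": data["output"], "status": status}
    # Assumption: a queued (celery) request returns only a task_id at first.
    if data.get("task_id"):
        return {"done": False, "task_id": data["task_id"], "status": status}
    raise RuntimeError("response contains neither output nor task_id")
```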

Run the API

To test this API, please use the following link:
