Text to Speech
This model uses state-of-the-art technology to generate lifelike speech across a diverse array of languages. More details about this feature can be found on the Text To Speech page.
POST /getAudio
Headers
Name | Value |
---|---|
Content-Type | application/json |
Authorization | Bearer <token> |
Body
Name | Type | Description |
---|---|---|
app_id | string | Each model is uniquely identified by its own app_id. |
prompt | string | The text input that guides the audio generation process and shapes the audio output. The minimum length is 10 characters and the maximum length is 2500 characters. |
generate_audio | bool | Enable this parameter to generate audio output in the form of speech. |
audio_parameters | dict/map | A dictionary of audio-related settings. Customize its values to adjust the audio generation to your preferences. |
celery | bool | Queues tasks that require extended processing time. When you enqueue a task, you receive a unique task_id, which lets you check the task's status later using the task status API. This is useful for managing and tracking long-running tasks. |
The audio_parameters dictionary contains the following parameters:
Name | Type | Description |
---|---|---|
voice_id | string | A unique identifier for the voice used in the audio generation process. |
stability | float | Controls the stability of the generated audio. Higher values (up to 1) result in more stable output, while lower values can lead to more variable output. |
similarity_boost | float | Adjusts the similarity boost applied to the generated audio. Higher values (up to 1) result in output that is more similar to the target, while lower values can lead to more variation. |
style | float | Controls the style of the generated speech. Higher values (up to 1) can exaggerate or more closely follow the given voice style, but may also increase instability in the generated speech. Setting this parameter to 0.0 (the default) greatly increases the generation speed. |
use_speaker_boost | bool | When enabled, boosts the similarity of the synthesized speech to the selected voice, at the cost of some generation speed. The model then tries to generate speech that aligns more closely with the chosen voice. |
Response
To test this API, please use the following link:
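The request parameters described above can be assembled into a JSON body as in the minimal Python sketch below. It is an illustration under stated assumptions, not the service's official client: the base URL, the helper names `build_audio_body` and `build_request`, and the sample values are hypothetical. The lower bound of 0.0 on the float settings and the `celery=False` default are also assumptions, since the documentation only states the upper bound of 1. The sketch builds the request object without sending it.

```python
import json
import urllib.request

# Hypothetical base URL: the documentation gives only the path /getAudio.
BASE_URL = "https://api.example.com"

# Prompt length limits stated in the parameter description.
PROMPT_MIN, PROMPT_MAX = 10, 2500


def build_audio_body(app_id, prompt, voice_id, stability, similarity_boost,
                     style=0.0, use_speaker_boost=False, celery=False):
    """Assemble the JSON body for POST /getAudio and check documented limits."""
    if not PROMPT_MIN <= len(prompt) <= PROMPT_MAX:
        raise ValueError(
            f"prompt must be {PROMPT_MIN} to {PROMPT_MAX} characters long")
    # The docs say "up to 1"; the lower bound of 0.0 is an assumption.
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost),
                        ("style", style)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
    return {
        "app_id": app_id,
        "prompt": prompt,
        "generate_audio": True,
        "audio_parameters": {
            "voice_id": voice_id,
            "stability": stability,
            "similarity_boost": similarity_boost,
            "style": style,  # 0.0 is the documented default
            "use_speaker_boost": use_speaker_boost,
        },
        "celery": celery,  # default of False is an assumption
    }


def build_request(token, body):
    """Wrap the body in a POST request carrying the two documented headers."""
    return urllib.request.Request(
        BASE_URL + "/getAudio",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; substitute a real app_id, voice_id, and token.
    body = build_audio_body("my-app-id", "Hello, this is a short test.",
                            "my-voice-id", stability=0.5,
                            similarity_boost=0.75)
    print(json.dumps(body, indent=2))
```

Passing the resulting request to `urllib.request.urlopen` would send it once `BASE_URL` and the token are replaced with real values. Validating the prompt length locally avoids a round trip for input the service would presumably reject.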