Text to Speech

This model generates lifelike, engaging speech from text in a wide range of languages. More details about this feature can be found on the Text To Speech page.

Generate Speech

POST /getAudio

Headers

Name             Value
Content-Type     application/json
Authorization    Bearer <token>

Body

Name
Type
Description

app_id

string

-> The identifier of the model to use. Each model has its own unique app_id.

prompt

string

-> The prompt parameter is the text input from which the audio is generated.

-> The prompt must be between 10 and 2500 characters long.
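The length limits above can be checked on the client before a request is sent. The following is a minimal Python sketch. The helper name validate_prompt is hypothetical, and it assumes that the limits are inclusive and that length is counted in characters as Python's len() counts them; the service may count differently.

```python
MIN_PROMPT_LENGTH = 10    # documented minimum
MAX_PROMPT_LENGTH = 2500  # documented maximum


def validate_prompt(prompt):
    """Return the prompt unchanged if its length is within the documented limits.

    Assumption: both bounds are inclusive and measured with len().
    """
    if not isinstance(prompt, str):
        raise TypeError("prompt must be a string")
    length = len(prompt)
    if length < MIN_PROMPT_LENGTH or length > MAX_PROMPT_LENGTH:
        raise ValueError(
            f"prompt must be {MIN_PROMPT_LENGTH} to {MAX_PROMPT_LENGTH} "
            f"characters long, got {length}"
        )
    return prompt
```

Calling validate_prompt before building the request body surfaces a length problem locally instead of as an API error.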

generate_audio

bool

-> Set generate_audio to true to generate audio output in the form of speech.

audio_parameters

dict/map

-> The audio_parameters parameter is a dictionary that allows you to specify various audio-related settings. Here's an example of the structure:

"audio_parameters": {
  "voice_id": "21m00Tcm4TlvDq8ikWAM",
  "stability": 0.5,
  "similarity_boost": 0.75,
  "style": null,
  "use_speaker_boost": true
}

You can customize the values within this dictionary to adjust the audio generation according to your preferences.

celery

bool

-> The celery parameter queues tasks that need extended processing time. When a task is queued, you receive a unique task_id, which you can use to check the task's status later through the task status API. This is useful for managing and tracking long-running tasks.
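To illustrate how the headers and body fields above might fit together, here is a Python sketch that assembles a POST /getAudio request with the standard library. The base URL, the token, and the app_id value are placeholders, and the helper name build_get_audio_request is made up for this example. Whether the service expects exactly this combination of fields is an assumption.

```python
import json
import urllib.request

# Placeholder only: the real base URL is not given in this section.
BASE_URL = "https://api.example.com"


def build_get_audio_request(token, app_id, prompt, audio_parameters, celery=False):
    """Assemble (but do not send) a POST /getAudio request."""
    body = {
        "app_id": app_id,
        "prompt": prompt,
        "generate_audio": True,
        "audio_parameters": audio_parameters,
        "celery": celery,
    }
    return urllib.request.Request(
        url=BASE_URL + "/getAudio",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )


request = build_get_audio_request(
    token="<token>",    # placeholder
    app_id="<app_id>",  # placeholder
    prompt="Hello, and welcome to the demo.",
    audio_parameters={
        "voice_id": "21m00Tcm4TlvDq8ikWAM",
        "stability": 0.5,
        "similarity_boost": 0.75,
        "style": None,  # serialized as JSON null, as in the example above
        "use_speaker_boost": True,
    },
    celery=True,  # ask for the task to be queued
)
# To send it: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.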

The audio_parameters dictionary contains the following parameters:

Name
Type
Description

voice_id

string

-> The voice_id parameter specifies the unique identifier of the voice to use for audio generation. Some supported voice_id values and their attributes are:

  • VoiceID: EXAVITQu4vr4xnSDxMaL Name: Sarah Attributes: american, professional, young, female, en, entertainment_tv

  • VoiceID: N2lVS1w4EtoT3dr4eOWO Name: Callum Attributes: en, middle_aged, male, characters

  • VoiceID: JBFqnCBsd6RMkjVDRZzb Name: George Attributes: british, mature, middle_aged, male, en, narrative_story

  • VoiceID: pqHfZKP75CvOlQylNhV4 Name: Bill Attributes: american, crisp, old, male, en, advertisement

  • VoiceID: NFG5qt843uXKj4pFvR7C Name: Adam Stone - late night radio Attributes: british, meditative, middle_aged, male, en, narrative_story

  • VoiceID: XrExE9yKIg1WjnnlVkGX Name: Matilda Attributes: american, upbeat, middle_aged, female, en, informative_educational

stability

float

-> The stability parameter controls the stability of the generated audio. Higher values (up to 1) result in more stable output, while lower values can lead to more variable output.

similarity_boost

float

-> The similarity_boost parameter controls how closely the generated audio matches the target voice. Higher values (up to 1) produce output that is more similar to the target, while lower values allow more variation.

style

float

-> The style parameter controls the style of the generated speech. Higher values (up to 1) can produce speech that is more exaggerated or follows the given voice style more closely, but may also make the generated speech less stable.

-> Setting this parameter to 0.0 (the default) will greatly increase the generation speed.

use_speaker_boost

bool

-> When enabled, the use_speaker_boost parameter makes the synthesized speech align more closely with the selected voice, at the cost of some generation speed.
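The value ranges described for audio_parameters can be checked on the client side. The Python sketch below is one way to do that. The helper name validate_audio_parameters is hypothetical. The lower bound of 0 is an assumption (the descriptions only state "up to 1"), and None is accepted because the example payload uses "style": null.

```python
def validate_audio_parameters(params):
    """Check an audio_parameters dict against the documented types and ranges."""
    if "voice_id" in params and not isinstance(params["voice_id"], str):
        raise TypeError("voice_id must be a string")

    for name in ("stability", "similarity_boost", "style"):
        value = params.get(name)
        if value is None:
            # Missing or null values are passed through unchanged.
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a number")
        # Assumption: valid values lie in the closed range 0 to 1.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    boost = params.get("use_speaker_boost")
    if boost is not None and not isinstance(boost, bool):
        raise TypeError("use_speaker_boost must be a bool")

    return params
```

The function returns the same dict it was given, so it can wrap the value passed as audio_parameters when the request body is built.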

Response

{
  "time_required": "",
  "error": "",
  "error_data": "",
  "input": "",
  "output": "",
  "app_id": "",
  "task_id": "",
  "status": ""
}
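A response of this shape could be read as follows. This Python sketch maps the fields shown above onto a small dataclass. The names AudioResponse and parse_audio_response are hypothetical, the field types are left open because the sample shows only empty strings, and treating a non-empty error field as a failure is an assumption about the API's behavior.

```python
import json
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class AudioResponse:
    # Field names mirror the sample response; actual value types are not documented here.
    time_required: Any = ""
    error: Any = ""
    error_data: Any = ""
    input: Any = ""
    output: Any = ""
    app_id: Any = ""
    task_id: Any = ""
    status: Any = ""


def parse_audio_response(raw):
    """Parse a JSON response body into an AudioResponse, ignoring unknown keys."""
    data = json.loads(raw)
    known = {f.name for f in fields(AudioResponse)}
    response = AudioResponse(**{k: v for k, v in data.items() if k in known})
    # Assumption: a non-empty "error" field signals a failed request.
    if response.error:
        raise RuntimeError(f"getAudio failed: {response.error} ({response.error_data})")
    return response
```

When the request was queued with celery, the task_id attribute of the parsed response is the value to pass to the task status API.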

Run the API

To test this API, please use the following link:
