# Text to Speech

Converts text into lifelike audio using Google's Gemini TTS model. Supports multilingual synthesis, a wide selection of distinct voices, and optional style prompting to control tone and delivery.

## Generate Speech

**POST** `/api/v1/studio/synthesizeSpeech`

**Headers**

| Name          | Value              |
| ------------- | ------------------ |
| Content-Type  | `application/json` |
| Authorization | `Bearer <token>`   |

**Body**

| Parameter       | Type     | Required | Description                                                                                                                                                                                                                                   |
| --------------- | -------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `text`          | `string` | **Yes**  | The input text to be converted into speech. Must be a non-empty string. This is the raw content that will be spoken aloud in the generated audio.                                                                                             |
| `voice`         | `string` | **Yes**  | The name of the voice persona to use for synthesis. Each voice has a distinct character and tone. Must be one of the supported voices (see `GET /api/v1/studio/speech/voices`). Examples: `"Zephyr"`, `"Puck"`, `"Aoede"`, `"Kore"`.          |
| `model`         | `string` | No       | The TTS model to use for generation. Defaults to `"gemini-2.5-flash-tts"`.                                                                                                                                                                    |
| `language_code` | `string` | No       | BCP-47 language code to control the language of the synthesized speech (e.g., `"en-US"`, `"fr-FR"`, `"hi-IN"`). If omitted, the model infers the language from the input text. See `GET /api/v1/studio/speech/languages` for supported codes. |
| `style_prompt`  | `string` | No       | A natural-language instruction that shapes the speaking style, tone, or emotion of the output. Examples: `"Speak in a calm, soothing tone"`, `"Sound excited and energetic"`, `"Read like a news anchor"`.                                    |
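
The parameters above can be assembled into a request as in the following minimal sketch, which uses only the Python standard library. The base URL `https://api.example.com` and the helper name `build_tts_request` are placeholders introduced for illustration and are not part of this reference; substitute the host and token for your own deployment.

```python
import json
import urllib.request

# Hypothetical base URL (an assumption); replace with the host that serves this API.
BASE_URL = "https://api.example.com"


def build_tts_request(token, text, voice, model=None, language_code=None, style_prompt=None):
    """Build (but do not send) a POST request for /api/v1/studio/synthesizeSpeech."""
    # Mirror the documented requirements: both fields are required.
    if not text:
        raise ValueError("text is required and cannot be empty")
    if not voice:
        raise ValueError("voice is required")

    payload = {"text": text, "voice": voice}
    # Optional parameters are included only when set, so server-side defaults apply.
    optional = {"model": model, "language_code": language_code, "style_prompt": style_prompt}
    payload.update({key: value for key, value in optional.items() if value is not None})

    return urllib.request.Request(
        url=BASE_URL + "/api/v1/studio/synthesizeSpeech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Building it separately keeps the payload easy to inspect before any credit is spent.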

**Response**

{% tabs %}
{% tab title="200" %}

```json
{
  "url": "",
  "mime_type": "",
  "referenceID": ""
}
```

{% endtab %}

{% tab title="403" %}

```json
{
  "message": "Forbidden: Insufficient credit"
}
```

{% endtab %}

{% tab title="500" %}

```json
{
  "message": "Internal Server Error"
}
```

{% endtab %}

{% tab title="401" %}

```json
{
  "message": "Missing authorization token"
}
```

{% endtab %}

{% tab title="400" %}

```json
{
  "message": "text is required and cannot be empty."
}
```

{% endtab %}
{% endtabs %}

| Field         | Type     | Description                                                                           |
| ------------- | -------- | ------------------------------------------------------------------------------------- |
| `url`         | `string` | A publicly accessible URL to the generated audio file.                                |
| `mime_type`   | `string` | The MIME type of the audio file (e.g., `"audio/mpeg"`, `"audio/wav"`).                |
| `referenceID` | `string` | A unique identifier for this synthesis request, used for tracking and audit purposes. |
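
As a rough illustration of consuming these fields, the sketch below validates a `200` response body and derives a local filename from `referenceID` and `mime_type`. The helper name `parse_tts_response`, the extension map, and the choice to name the file after `referenceID` are assumptions made for this example, not behavior defined by the API.

```python
import json

# Extensions for the MIME types mentioned in this reference; extend as needed.
EXTENSIONS = {"audio/mpeg": ".mp3", "audio/wav": ".wav"}


def parse_tts_response(raw_body):
    """Validate a 200 response body and suggest a local filename for the audio."""
    data = json.loads(raw_body)
    # Treat absent or empty values as missing.
    missing = [key for key in ("url", "mime_type", "referenceID") if not data.get(key)]
    if missing:
        raise ValueError("response is missing fields: " + ", ".join(missing))

    # Fall back to a generic extension for MIME types not listed above.
    extension = EXTENSIONS.get(data["mime_type"], ".bin")
    return {
        "url": data["url"],
        "reference_id": data["referenceID"],
        "filename": data["referenceID"] + extension,
    }
```

The audio itself can then be fetched from `url` with any HTTP client. Logging `referenceID` alongside the download makes a request easier to trace later.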

***

#### Error Responses

| Status | Condition                                      |
| ------ | ---------------------------------------------- |
| `400`  | `text` is missing or empty                     |
| `400`  | `voice` is missing                             |
| `400`  | `voice` is not in the list of supported voices |
| `401`  | Authorization token is missing                 |
| `403`  | Insufficient credit                            |
| `500`  | Internal server error                          |
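
One way a client might act on these statuses is sketched below. The category names and the decision to treat only `500` as retryable are assumptions made for illustration; the API itself only defines the status codes and the `message` field shown in the sample responses.

```python
import json


def classify_error(status, raw_body):
    """Map an error response to a coarse category, its message, and a retry hint."""
    try:
        message = json.loads(raw_body).get("message", "")
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object.
        message = ""

    # Category labels are hypothetical, chosen for this example.
    categories = {
        400: "invalid_request",   # missing/empty text, missing or unsupported voice
        401: "unauthenticated",   # missing authorization token
        403: "forbidden",         # e.g. insufficient credit
        500: "server_error",
    }
    return {
        "category": categories.get(status, "unknown"),
        "message": message,
        # Assumption: only server-side failures are worth retrying unchanged.
        "retryable": status == 500,
    }
```

A `400` usually means the request body needs fixing before it is sent again, whereas a `500` may succeed if the same request is retried after a short delay.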

***

#### Related Endpoints

* `GET /api/v1/studio/speech/voices` — Returns all supported voice names
* `GET /api/v1/studio/speech/languages` — Returns all supported language codes

## Run the API

To test this API, please use the following link:

{% embed url="<https://app.theneo.io/api-runner/qolaba/ml-apis/api-reference/text-to-speech-copy-1>" %}
