# Speech Models

Qolaba's Speech Generation is powered by **Google's Gemini TTS**, offered as two models with different trade-offs between speed, quality, and credit cost. Both models support single-speaker narration and multi-speaker dialogue, with customizable voice, accent, and delivery style.

***

### Available Models

| Model                | Speed           | Quality                              | Best For                                                                 |
| -------------------- | --------------- | ------------------------------------ | ------------------------------------------------------------------------ |
| **Gemini Flash TTS** | Fast            | Good                                 | Script drafting, quick iterations, testing voice and accent combinations |
| **Gemini Pro TTS** ⭐ | Slightly slower | Higher — more expressive and natural | Final production output, client-ready audio, publishing                  |

{% hint style="info" %}
Test your script, voice selection, and accent configuration with Flash TTS first. Switch to Pro TTS for the final generation only. This approach saves credits without compromising final output quality.
{% endhint %}

***

### Model Capabilities

Both models share the same core capabilities:

| Capability                     | Flash TTS        | Pro TTS          |
| ------------------------------ | ---------------- | ---------------- |
| **Single Speaker Mode**        | ✓                | ✓                |
| **Multi-Speaker Mode**         | ✓                | ✓                |
| **Voice library**              | 30+ voices       | 30+ voices       |
| **Accent & dialect selection** | ✓                | ✓                |
| **Style instructions**         | ✓                | ✓                |
| **Multi-language support**     | ✓                | ✓                |
| **Max script length**          | 5,000 characters | 5,000 characters |
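
If a script runs past the 5,000-character limit listed above, one option is to split it at sentence boundaries and generate each part separately with the same voice, accent, and style settings. The sketch below is a hypothetical client-side helper (`split_script` is not part of Qolaba). It assumes sentences end in `.`, `!`, or `?` followed by whitespace, so scripts in languages such as Mandarin or Japanese would need different boundary rules.

```python
import re

# Both models accept up to 5,000 characters per script (see the table above).
MAX_SCRIPT_CHARS = 5_000


def split_script(text: str, limit: int = MAX_SCRIPT_CHARS) -> list[str]:
    """Split `text` into chunks of at most `limit` characters,
    breaking between sentences where possible."""
    # Split after ., !, or ? when followed by whitespace; punctuation is kept.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Fallback: a single sentence longer than the limit is cut into
        # fixed-size pieces (this may break mid-word).
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the full chunk, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

For example, a 13,000-character script made of short sentences comes back as three chunks, each under the limit, which can then be generated one after another.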

***

### Voice Library

Both models provide access to a library of 30+ distinct voice profiles. Each voice has a unique combination of tone, pitch, energy, and speaking style. Click any voice in the interface to hear a preview before selecting.

**Available Voices**

| Voice             | Character                      |
| ----------------- | ------------------------------ |
| **Zephyr**        | Bright, clear, and energetic   |
| **Puck**          | Upbeat and playful             |
| **Charon**        | Informative and measured       |
| **Kore**          | Firm and confident             |
| **Fenrir**        | Excitable and expressive       |
| **Aoede**         | Smooth and warm                |
| **Leda**          | Youthful and approachable      |
| **Orus**          | Clear and neutral              |
| **Schedar**       | Gravelly and deep              |
| **Gacrux**        | Soft and gentle                |
| **Pulcherrima**   | Calm and composed              |
| **Achird**        | Conversational and natural     |
| **Zubenelgenubi** | Professional and authoritative |
| **Vindemiatrix**  | Storytelling and expressive    |
| **Sadachbia**     | Warm and personable            |
| **Sadaltager**    | Crisp and articulate           |
| **Sulafat**       | Rich and resonant              |

{% hint style="info" %}
The table above lists a selection of voices. The full library of 30+ voices is available in the [Speech Generation workspace](https://www.qolaba.ai/ai-speech-generator/text-to-speech), where each voice can be previewed.
{% endhint %}
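
When planning a project, it can help to keep the voice table as data and filter it by the trait you want, then preview only the shortlisted voices in the workspace. A minimal sketch; `VOICES` and `shortlist_voices` are hypothetical names, and the dictionary covers only the voices listed above, not the full library.

```python
# Voice names and descriptions copied from the table above (a subset of the library).
VOICES: dict[str, str] = {
    "Zephyr": "Bright, clear, and energetic",
    "Puck": "Upbeat and playful",
    "Charon": "Informative and measured",
    "Kore": "Firm and confident",
    "Fenrir": "Excitable and expressive",
    "Aoede": "Smooth and warm",
    "Leda": "Youthful and approachable",
    "Orus": "Clear and neutral",
    "Schedar": "Gravelly and deep",
    "Gacrux": "Soft and gentle",
    "Pulcherrima": "Calm and composed",
    "Achird": "Conversational and natural",
    "Zubenelgenubi": "Professional and authoritative",
    "Vindemiatrix": "Storytelling and expressive",
    "Sadachbia": "Warm and personable",
    "Sadaltager": "Crisp and articulate",
    "Sulafat": "Rich and resonant",
}


def shortlist_voices(trait: str) -> list[str]:
    """Return voice names whose description contains `trait` (case-insensitive)."""
    needle = trait.lower()
    return [name for name, desc in VOICES.items() if needle in desc.lower()]
```

For instance, `shortlist_voices("warm")` returns Aoede and Sadachbia as candidates to preview.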

***

### Accent & Dialect Support

The output language is determined automatically by the language of your input script: write your script in a supported language and the audio is generated in that language. Accent selection refines pronunciation for languages with multiple regional dialects.

#### **Available Accents by Language**

| Language       | Available Dialects                              |
| -------------- | ----------------------------------------------- |
| **English**    | United States, United Kingdom, India, Australia |
| **French**     | France, Canada                                  |
| **Spanish**    | Spain, Latin America                            |
| **Arabic**     | Egypt, Global                                   |
| **Mandarin**   | China, Taiwan                                   |
| **Hindi**      | India                                           |
| **Portuguese** | Brazil, Portugal                                |
| **German**     | Germany, Austria                                |
| **Japanese**   | Japan                                           |
| **Korean**     | Korea                                           |

{% hint style="info" %}
The output language always follows the language of your input text. Accent selection narrows the regional dialect within that language — it does not override the language itself.
{% endhint %}
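
If you prepare batches of scripts outside the interface, a small lookup built from the dialect table can catch mismatched pairs before any credits are spent, such as an English script paired with the Canada dialect, which is listed only for French. This is a sketch under that assumption; `DIALECTS` and `check_accent` are hypothetical names, and the data mirrors only the table above.

```python
# Dialects per language, copied from the table above.
DIALECTS: dict[str, list[str]] = {
    "English": ["United States", "United Kingdom", "India", "Australia"],
    "French": ["France", "Canada"],
    "Spanish": ["Spain", "Latin America"],
    "Arabic": ["Egypt", "Global"],
    "Mandarin": ["China", "Taiwan"],
    "Hindi": ["India"],
    "Portuguese": ["Brazil", "Portugal"],
    "German": ["Germany", "Austria"],
    "Japanese": ["Japan"],
    "Korean": ["Korea"],
}


def check_accent(script_language: str, dialect: str) -> None:
    """Raise ValueError if `dialect` is not listed for `script_language`.

    The language itself comes from the script text; the dialect only
    narrows pronunciation within that language.
    """
    options = DIALECTS.get(script_language)
    if options is None:
        raise ValueError(f"No dialect list for {script_language!r}")
    if dialect not in options:
        raise ValueError(
            f"{dialect!r} is not a listed dialect of {script_language}; "
            f"choose one of {options}"
        )
```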

***

### Style Instructions

Both models accept style instructions: a plain-language description of the delivery you want (tone, pace, and energy), entered in the **Style Prompt** field.

**Examples:**

| Intent                  | Style Instruction                                               |
| ----------------------- | --------------------------------------------------------------- |
| Warm and conversational | *"Speak warmly and conversationally, like talking to a friend"* |
| Professional narration  | *"Clear, professional, and authoritative tone"*                 |
| Energetic marketing     | *"Enthusiastic and high-energy delivery"*                       |
| Calm instructional      | *"Calm, slow-paced, and easy to follow"*                        |
| Storytelling            | *"Engaging narrative style with natural pauses and expression"* |
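
To keep a Flash TTS test and the Pro TTS final consistent, it helps to reuse exactly the same style prompt text for both runs. One way is to store the example instructions as named presets and append any project-specific detail. A minimal sketch; `STYLE_PRESETS` and `style_prompt` are hypothetical names, and the preset keys are illustrative labels.

```python
# Example style instructions from the table above, keyed by a short label.
STYLE_PRESETS: dict[str, str] = {
    "warm": "Speak warmly and conversationally, like talking to a friend",
    "professional": "Clear, professional, and authoritative tone",
    "marketing": "Enthusiastic and high-energy delivery",
    "instructional": "Calm, slow-paced, and easy to follow",
    "storytelling": "Engaging narrative style with natural pauses and expression",
}


def style_prompt(preset: str, extra: str = "") -> str:
    """Build the text for the Style Prompt field from a preset.

    Raises KeyError for an unknown preset. `extra` is appended as an
    additional plain-language sentence.
    """
    base = STYLE_PRESETS[preset]
    return f"{base}. {extra}" if extra else base
```

Pasting the same `style_prompt(...)` result into both the Flash TTS and Pro TTS runs removes one source of difference between the draft and the final audio.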

***

### Flash TTS vs. Pro TTS — When to Use Each

#### **Use Flash TTS when:**

* Testing a new script for the first time
* Validating voice, accent, and style combinations before final generation
* Producing audio for internal use, drafts, or non-published content
* Working at high volume where credit efficiency matters

#### **Use Pro TTS when:**

* Generating final production audio for publishing
* Delivering client-ready voiceovers, ads, or podcast content
* The naturalness and expressiveness of the voice matter to the audience
* Multi-speaker dialogue needs to sound as realistic as possible
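
The two lists above reduce to a simple rule: drafts, tests, internal, and high-volume work go to Flash TTS, while final audience-facing audio goes to Pro TTS. The sketch below encodes that rule for anyone scripting a batch workflow; `Job` and `choose_model` are hypothetical names for illustration, not part of Qolaba.

```python
from dataclasses import dataclass

FLASH = "Gemini Flash TTS"
PRO = "Gemini Pro TTS"


@dataclass
class Job:
    """Hypothetical description of one speech-generation job."""

    is_draft: bool  # first run of a script, or still testing voice/accent/style
    published: bool = False  # audio will be published or delivered to a client
    realism_critical: bool = False  # e.g. multi-speaker dialogue that must sound natural


def choose_model(job: Job) -> str:
    # Drafts and tests always use Flash to save credits.
    if job.is_draft:
        return FLASH
    # Final audio that reaches an audience, or where realism matters, uses Pro.
    if job.published or job.realism_critical:
        return PRO
    # Everything else (internal use, high-volume runs) stays on Flash.
    return FLASH
```

Checking `is_draft` first mirrors the workflow tip at the top of the page: settle the script, voice, and accent on Flash, then switch to Pro only for the final generation.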

