# Introduction

Qolaba's Speech Generation tool converts written text into high-quality, natural-sounding audio using Gemini-powered text-to-speech models. Whether you need a single narrator for a voiceover or a two-speaker dialogue for a podcast, Speech Generation gives you full control over voice, accent, language, tone, and delivery style — all from a single interface.

***

#### What You Can Do

* Generate natural-sounding audio from any written script
* Choose from a library of 30+ voice profiles with distinct tones and styles
* Write your script in any language — audio is generated in that language automatically
* Refine pronunciation with accent and dialect selection
* Guide delivery style with custom style instructions
* Produce single-speaker narration or two-speaker dialogue audio
* Download, share, and manage all generated audio from one place

***

#### Core Use Cases

* Podcast narration and introductions
* YouTube and video voiceovers
* Product walkthroughs and demos
* Marketing advertisements and announcements
* Audiobook-style storytelling
* Training scripts and instructional audio
* Conversational simulations and interview-format dialogue

***

#### How to Access

1. Go to the **left navigation panel**
2. Click **Audio & Video**
3. Select [**Text-to-Speech**](https://www.qolaba.ai/ai-speech-generator/text-to-speech)

This opens the dedicated Speech Generation workspace.

***

#### Interface Overview

The Speech Generation workspace is organized into two areas:

1. **Audio History Panel**: A persistent panel displaying all previously generated audio files. Each entry includes playback controls and a three-dot menu for download, share, and delete actions. All generated audio is saved here automatically.
2. **Configuration & Generation Area**: The primary workspace where you configure and generate audio. This is where you select your mode, choose voices, set accent and style, write your script, select a model, and generate output.

***

#### Available Models

Speech Generation is powered by Gemini and offers two models:

| Model         | Speed           | Quality                              | Credit Cost | Best For                                    |
| ------------- | --------------- | ------------------------------------ | ----------- | ------------------------------------------- |
| **Flash TTS** | Faster          | Good                                 | Lower       | Script drafting, quick iterations, testing  |
| **Pro TTS**   | Slightly slower | Higher — more expressive and natural | Higher      | Final production output, client-ready audio |

Test your script with Flash TTS first to validate voice, accent, and style choices. Switch to Pro TTS for the final generation. This approach saves credits while ensuring production-quality output.

***

#### What's in This Section

1. [Voices, Accents & Style →](/speech-generation/voices-accents-and-style.md)
2. Speech Generation Modes →
   * [Single Speaker Mode →](/speech-generation/speech-generation-modes/single-speaker-mode.md)
   * [Multi-Speaker Mode →](/speech-generation/speech-generation-modes/multi-speaker-mode.md)
3. [Managing Generated Audio →](/speech-generation/managing-generated-audios.md)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available on this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.qolaba.ai/speech-generation/introduction.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly stated on the current page, when you need clarification or additional context, or when you want to retrieve related documentation sections.
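A natural-language question contains spaces and punctuation, so it needs to be percent-encoded before it is placed in the `ask` parameter. The following is a minimal Python sketch of that step using only the standard library. The helper names `build_ask_url` and `fetch_answer` are hypothetical, and treating the response body as UTF-8 text is an assumption, since this page does not specify the response format.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Page URL taken from the GET example on this page.
PAGE_URL = "https://docs.qolaba.ai/speech-generation/introduction.md"


def build_ask_url(question: str, page_url: str = PAGE_URL) -> str:
    """Return page_url with the question attached as the `ask` query parameter."""
    question = question.strip()
    if not question:
        raise ValueError("question must be a non-empty string")
    # quote_via=quote encodes spaces as %20 instead of '+'.
    query = urlencode({"ask": question}, quote_via=quote)
    return f"{page_url}?{query}"


def fetch_answer(question: str, timeout: float = 30.0) -> str:
    """Send the GET request and return the raw response body.

    Assumption: the body is decoded as UTF-8 text; the actual response
    format is not documented on this page.
    """
    with urlopen(build_ask_url(question), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `build_ask_url("Which voices support accents?")` produces a URL ending in `?ask=Which%20voices%20support%20accents%3F`. Passing that question to `fetch_answer` would issue the request itself.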
