Introduction

An introduction to Speech Generation in Qolaba — what it does, how to access it, an overview of the interface, and the available models.

Qolaba's Speech Generation tool converts written text into high-quality, natural-sounding audio using Gemini-powered text-to-speech models. Whether you need a single narrator for a voiceover or a two-speaker dialogue for a podcast, Speech Generation gives you full control over voice, accent, language, tone, and delivery style — all from a single interface.


What You Can Do

  • Generate natural-sounding audio from any written script

  • Choose from a library of 30+ voice profiles with distinct tones and styles

  • Write your script in any language — audio is generated in that language automatically

  • Refine pronunciation with accent and dialect selection

  • Guide delivery style with custom style instructions

  • Produce single-speaker narration or two-speaker dialogue audio

  • Download, share, and manage all generated audio from one place


Core Use Cases

  • Podcast narration and introductions

  • YouTube and video voiceovers

  • Product walkthroughs and demos

  • Marketing advertisements and announcements

  • Audiobook-style storytelling

  • Training scripts and instructional audio

  • Conversational simulations and interview-format dialogue


How to Access

  1. Go to the left navigation panel

  2. Click Audio & Video

This opens the dedicated Speech Generation workspace.


Interface Overview

The Speech Generation workspace is organized into two areas:

  1. Audio History Panel: A persistent panel displaying all previously generated audio files. Each entry includes playback controls and a three-dot menu for download, share, and delete actions. All generated audio is saved here automatically.

  2. Configuration & Generation Area: The primary workspace where you configure and generate audio. This is where you select your mode, choose voices, set accent and style, write your script, select a model, and generate output.


Available Models

Speech Generation is powered by Gemini and offers two models:

Model       Speed             Quality                                Credit Cost   Best For
Flash TTS   Faster            Good                                   Lower         Script drafting, quick iterations, testing
Pro TTS     Slightly slower   Higher (more expressive and natural)   Higher        Final production output, client-ready audio

Test your script with Flash TTS first to validate your voice, accent, and style choices, then switch to Pro TTS for the final generation. Because Flash TTS has a lower credit cost, this approach saves credits during iteration and reserves Pro TTS for the production-quality final output.


What's in This Section

  1. Speech Generation Modes
