Voices, Accents & Style

How to select voices, configure language and accent, and write style instructions in Qolaba's Speech Generation workspace.

Before generating audio, configure how your output should sound — which voice delivers it, what accent and dialect it uses, and what tone and style it carries. These three settings work together to define the personality, clarity, and emotional quality of your generated audio.


Voice Library

Qolaba provides a library of more than 30 voice profiles, each with distinct characteristics in tone, pitch, energy, and speaking style. Voice selection typically has the largest effect on how your audience experiences the content, so settle on it before adjusting the other settings.

Browsing and Previewing Voices

Click any voice in the library to hear a preview before selecting it. This lets you evaluate tone and style before committing to a generation.

Voice Categories

Voices are organized by characteristic style to help you find the right fit quickly:

Category     Characteristic
Bright       Clear, positive, and energetic
Upbeat       Enthusiastic and engaging
Informative  Measured, authoritative, and clear
Firm         Confident and direct
Excitable    High energy and expressive
Youthful     Fresh, casual, and approachable
Clear        Neutral and precise
Smooth       Warm and fluid delivery
Soft         Gentle and calm
Gravelly     Deep and textured

Filtering Voices

Use the search and filter options to narrow down the library by:

  • Gender — male or female voices

  • Tone category — filter by characteristic style (Bright, Smooth, Firm, etc.)

Match the voice category to the content type — an Informative voice works well for product walkthroughs and instructional content, while an Upbeat or Excitable voice suits marketing and promotional audio.
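
The filtering behavior described above can be pictured with a small Python sketch. This only illustrates combining a gender filter with a tone-category filter; it is not Qolaba's implementation or API. The `Voice` record, the `filter_voices` helper, and the voice names are hypothetical placeholders, and only the category names come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    name: str      # hypothetical placeholder, not a real Qolaba voice name
    gender: str    # "male" or "female"
    category: str  # one of the tone categories listed in the table above


# Made-up entries for illustration only.
LIBRARY = [
    Voice("voice_a", "female", "Informative"),
    Voice("voice_b", "male", "Upbeat"),
    Voice("voice_c", "female", "Excitable"),
    Voice("voice_d", "male", "Informative"),
]


def filter_voices(library, gender=None, category=None):
    """Return the voices that match every filter that was supplied."""
    return [
        v for v in library
        if (gender is None or v.gender == gender)
        and (category is None or v.category == category)
    ]


# Narrow the library for an instructional walkthrough.
informative = filter_voices(LIBRARY, category="Informative")
female_informative = filter_voices(LIBRARY, gender="female", category="Informative")
```

Leaving a filter unset keeps it inactive, which mirrors using only one of the two filter options in the workspace.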


Language & Accent

  1. Multi-Language Support

The output language of your generated audio is determined entirely by the language of your script. Write your script in any language — English, Hindi, French, Arabic, Mandarin, or any other supported language — and the audio will be generated in that language automatically. There is no separate language setting to configure.

This makes Speech Generation natively multilingual — switch languages simply by changing the language of your input text.

  2. Accent & Dialect Selection

Many languages have multiple regional dialects with distinct pronunciation patterns. Accent selection lets you specify which regional variant the voice should follow — improving clarity, naturalness, and audience relatability for region-specific content.

Examples of available accents:

Language   Available Dialects
English    United States, United Kingdom, India, Australia
French     France, Canada
Arabic     Egypt, Global
Mandarin   China, Taiwan
Hindi      India
Spanish    Spain, Latin America

If your script is in a language with multiple regional dialects and your audience is in a specific region, selecting the matching accent improves pronunciation accuracy and makes the audio feel more natural to that audience.
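
As a rough illustration of matching an accent to an audience, the example dialects above can be written as a lookup table. The `DIALECTS` mapping and the `pick_accent` helper are hypothetical names for this sketch; the entries are copied from the examples in this section, and the actual list in the workspace may be longer.

```python
# Example dialects taken from the table above; not an exhaustive list.
DIALECTS = {
    "English": ["United States", "United Kingdom", "India", "Australia"],
    "French": ["France", "Canada"],
    "Arabic": ["Egypt", "Global"],
    "Mandarin": ["China", "Taiwan"],
    "Hindi": ["India"],
    "Spanish": ["Spain", "Latin America"],
}


def pick_accent(language, audience_region):
    """Return the dialect matching the audience region, or None if it is not listed."""
    options = DIALECTS.get(language, [])
    return audience_region if audience_region in options else None


# A French script aimed at listeners in Canada.
accent = pick_accent("French", "Canada")
```

Returning `None` for an unlisted combination is a design choice for this sketch: it signals that no matching regional variant appears in the examples, so a different listed dialect has to be chosen by hand.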


Style Instructions

Style instructions let you guide the emotional tone and delivery manner of the generated audio — going beyond voice selection to define how the voice speaks, not just which voice speaks.

How to Write Style Instructions

Enter a brief, plain-language description of the desired delivery in the Style Prompt field. The model interprets this and adjusts its delivery accordingly.

Examples:

Intent                  Style Instruction
Warm and personable     "Speak warmly and conversationally, like talking to a friend"
Professional narration  "Clear, professional, and authoritative tone"
Energetic marketing     "Enthusiastic and high-energy delivery"
Calm instructional      "Calm, slow-paced, and easy to follow"
Storytelling            "Engaging narrative style, with natural pauses and expression"

Combining Voice and Style

Voice selection and style instructions work best in combination. A Smooth voice with a "warm and conversational" style instruction produces a noticeably different output than the same voice with a "professional and authoritative" instruction.

If your first generation doesn't match the intended tone, refine the style instruction before switching voices. Often a more specific style prompt produces better results than changing the voice entirely.
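
The "refine the style before switching voices" workflow can be sketched as keeping the voice and accent fixed while only the style prompt changes between attempts. The `SpeechSettings` class and its field names are assumptions made for this illustration and do not correspond to a documented Qolaba API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SpeechSettings:
    # Hypothetical field names used only for this sketch.
    voice_category: str
    accent: str
    style_prompt: str


# First attempt: a short, generic style prompt.
first_try = SpeechSettings(
    voice_category="Smooth",
    accent="United Kingdom",
    style_prompt="Warm tone",
)

# Second attempt: same voice and accent, more specific style prompt.
second_try = replace(
    first_try,
    style_prompt="Speak warmly and conversationally, like talking to a friend",
)
```

Changing one setting at a time makes it easier to tell whether a difference in the output came from the style prompt or from the voice.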
