Single Speaker Mode

Single Speaker Mode is used when a single voice narrates the entire script. This mode is ideal for announcements, narrations, advertisements, and monologue-style content.


4.1 Voice Selection

Users can choose from a curated library of voice profiles. Each voice has distinct characteristics in terms of:

  • Tone

  • Pitch

  • Energy level

  • Speaking style

Voice categories may include styles such as:

  • Bright

  • Upbeat

  • Informative

  • Firm

  • Excitable

  • Youthful

  • Clear

  • Smooth

  • Soft

  • Gravelly

A search feature is available to quickly filter voices by attributes such as gender or tone.

Selecting the correct voice is essential for aligning audio output with the intended audience and message.


4.2 Language and Accent Configuration

The output language is automatically determined by the language of the input text. For example:

  • If the script is written in Hindi, the output audio will be in Hindi.

  • If written in English, the output will be in English.

Accent selection is available to refine pronunciation in languages with multiple dialects. Examples include:

  • Arabic (Egypt, Global)

  • Mandarin (China, Taiwan)

  • Indian regional accents (Hindi, Gujarati, Kannada, Punjabi, Sindhi, etc.)

Accent selection enhances clarity and naturalness in region-specific communication.


4.3 Style Instructions

The Style Prompt allows users to guide how the speech should be delivered. Examples of style instructions include:

  • Speak warmly and enthusiastically

  • Calm and slow-paced narration

  • Professional and authoritative tone

  • Energetic and engaging delivery

This field enables emotional and tonal customization beyond voice selection.


4.4 Script Input

Users enter or paste their content into the script input field. Single Speaker Mode supports:

  • Long-form narration

  • Promotional scripts

  • Announcements

  • Informational content

Maximum character limit: 4,000 characters.

Well-structured punctuation improves pacing and realism in output.
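For scripts that exceed the 4,000-character limit, one possible approach is to split the text into parts before pasting each part into the script input field. The sketch below is illustrative only and is not part of Speech Generator: the function name split_script is hypothetical, and it assumes the limit counts characters the way Python's len() does. It breaks at sentence-ending punctuation so that each part ends on a natural pause.

```python
import re

MAX_CHARS = 4000  # character limit stated in section 4.4


def split_script(script: str, limit: int = MAX_CHARS) -> list[str]:
    """Split a script into parts of at most `limit` characters.

    Hypothetical helper: breaks after sentence-ending punctuation
    (., !, ?, or the Devanagari danda) and joins sentences with a
    single space, so original line breaks are not preserved.
    """
    sentences = re.split(r"(?<=[.!?।])\s+", script.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would exceed the limit: start a new part.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    parts = split_script("Welcome to the show. " * 400)
    for number, part in enumerate(parts, start=1):
        print(f"Part {number}: {len(part)} characters")
```

Each part can then be generated separately with the same voice, accent, style, and model settings so the delivery stays consistent across parts.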


4.5 Model Selection

Speech Generator offers two Gemini-powered models:

  • Flash Text-to-Speech: Designed for faster generation and lower credit usage. Suitable for drafts and quick iterations.

  • Pro Text-to-Speech: Optimized for higher quality, more expressive, and natural-sounding output. Ideal for final production use.

Model selection affects both audio realism and credit consumption.


4.6 Generating Audio

Once configuration is complete:

  1. Click Generate.

  2. The system processes the script.

  3. The audio output appears below.

You can then:

  • Play the audio

  • Download it

  • Share it

The Reset option clears all selections and inputs.
