Configuring Voice

Voice providers

Sonara supports multiple text-to-speech (TTS) providers, each offering different voice styles, languages, and quality levels. Choose the provider and voice that best fit your agent’s personality and use case.

Available providers

Provider	Highlights
Cartesia	High-quality, natural-sounding voices. Great for conversational agents.
ElevenLabs	Premium voice cloning and ultra-realistic speech.
OpenAI	Solid general-purpose voices from OpenAI’s TTS models.
Google	Wide language support with region-specific accents.
Sarvam	Specialized for Indian languages and accents.

Selecting a voice

Open voice settings

Navigate to your agent’s configuration and click the Voice tab.

Choose a provider

Select a TTS provider from the dropdown. Each provider offers different voices.

Browse and preview voices

Browse the available voices for your selected provider. Each voice shows:

Name
Gender
Accent/language
Preview button to hear a sample

Click the play button to preview a voice before selecting it.

Adjust speed

Use the speed slider to adjust how fast the agent speaks. The default is 1.0x. Lower values slow the speech down; higher values speed it up.
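
To get a feel for what a given slider value means in practice, the sketch below estimates how long an utterance lasts at different speeds. It assumes the multiplier scales the speaking rate linearly, which may not hold exactly for every provider. The function name spoken_duration is illustrative and not part of Sonara.

```python
def spoken_duration(base_seconds: float, speed: float) -> float:
    """Estimate utterance length at a given speed multiplier.

    Assumption: the multiplier scales the speaking rate linearly,
    so duration is inversely proportional to speed.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return base_seconds / speed


# An utterance that takes 10 seconds at the default 1.0x:
for speed in (0.8, 1.0, 1.25):
    print(f"{speed}x -> {spoken_duration(10.0, speed):.1f} s")
```

Under this assumption, a 10-second utterance takes about 12.5 seconds at 0.8x and 8 seconds at 1.25x.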

Language support

Sonara supports 60+ languages with automatic language detection for multi-language agents.

Single language mode

Select one language for your agent. The agent will only speak and understand this language.

Multi-language mode

Enable multi-language support to let your agent automatically detect and respond in the caller’s language. You can configure:

Supported languages - Which languages the agent can use

Fallback language - The default language if detection is uncertain
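
The way these two settings interact can be sketched as follows. This is a conceptual illustration only, not Sonara's actual detection logic: the Detection type, the resolve_language function, and the 0.6 confidence threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    language: str      # language code reported by detection, e.g. "es"
    confidence: float  # 0.0 (no idea) to 1.0 (certain)


def resolve_language(
    detection: Detection,
    supported: set[str],
    fallback: str,
    min_confidence: float = 0.6,  # hypothetical threshold
) -> str:
    """Pick the language to respond in.

    Use the detected language only when it is in the supported set
    and detection is confident enough; otherwise use the fallback.
    """
    if detection.language in supported and detection.confidence >= min_confidence:
        return detection.language
    return fallback


supported = {"en", "es", "hi"}
print(resolve_language(Detection("es", 0.92), supported, fallback="en"))  # confident, supported
print(resolve_language(Detection("es", 0.30), supported, fallback="en"))  # uncertain
print(resolve_language(Detection("ja", 0.95), supported, fallback="en"))  # not supported
```

Only the first call keeps the detected language; the other two fall back to "en".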

Not all voice-language combinations are supported. If your selected voice doesn’t support a language, Sonara will automatically suggest a compatible alternative.
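
One way to picture this compatibility check is the sketch below. The voice catalogue, the voice names, and the pick_voice function are hypothetical; how Sonara actually ranks alternatives is not specified here, so the example simply takes the first compatible voice.

```python
from typing import Optional

# Hypothetical catalogue: voice name -> languages it can speak.
VOICES: dict[str, set[str]] = {
    "voice_a": {"en", "es"},
    "voice_b": {"hi", "ta", "en"},
}


def pick_voice(selected: str, language: str, voices: dict[str, set[str]]) -> Optional[str]:
    """Return the selected voice if it supports the language,
    otherwise the first compatible alternative, or None if there is none."""
    if language in voices.get(selected, set()):
        return selected
    for name, languages in voices.items():
        if language in languages:
            return name
    return None


print(pick_voice("voice_a", "es", VOICES))  # selected voice already works
print(pick_voice("voice_a", "hi", VOICES))  # a compatible alternative is suggested
print(pick_voice("voice_a", "ja", VOICES))  # nothing in the catalogue supports it
```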

Speech-to-text (STT)

The STT provider determines how your agent transcribes what callers say. Configure this in the voice settings:

Provider - Select an STT provider (Deepgram is the default)

Model - Choose the transcription model (e.g., nova-3 for highest accuracy)

Deepgram’s nova-3 model offers the best balance of speed and accuracy for real-time voice conversations.
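
If you keep a record of agent settings outside the dashboard, the choices described on this page can be captured in a small structure like the one below. This is only a note-keeping sketch: the VoiceSettings class, its field names, and the lowercase provider identifiers are assumptions, not a Sonara API or export format. The defaults mirror the ones stated above (1.0x speed, Deepgram, nova-3).

```python
from dataclasses import dataclass, asdict


@dataclass
class VoiceSettings:
    # Text-to-speech choices
    tts_provider: str           # e.g. "cartesia" (identifier spelling is assumed)
    voice: str                  # voice name as shown in the Voice tab
    speed: float = 1.0          # default speaking speed
    # Speech-to-text choices
    stt_provider: str = "deepgram"
    stt_model: str = "nova-3"


settings = VoiceSettings(tts_provider="cartesia", voice="example-voice")
print(asdict(settings))
```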
