
Voice providers

Sonara supports multiple text-to-speech (TTS) providers, each offering different voice styles, languages, and quality levels. Choose the provider and voice that best fits your agent’s personality and use case.

Available providers

| Provider | Highlights |
| --- | --- |
| Cartesia | High-quality, natural-sounding voices. Great for conversational agents. |
| ElevenLabs | Premium voice cloning and ultra-realistic speech. |
| OpenAI | Solid general-purpose voices from OpenAI’s TTS models. |
| Google | Wide language support with region-specific accents. |
| Sarvam | Specialized for Indian languages and accents. |

Selecting a voice

1. Open voice settings

Navigate to your agent’s configuration and click the Voice tab.

2. Choose a provider

Select a TTS provider from the dropdown. Each provider offers different voices.

3. Browse and preview voices

Browse the available voices for your selected provider. Each voice shows:
  • Name
  • Gender
  • Accent/language
  • Preview button to hear a sample
Click the preview button to hear a voice before selecting it.

4. Adjust speed

Use the speed slider to adjust how fast the agent speaks. The default is 1.0x. Lower values slow the speech down; higher values speed it up.
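
The choices made in these steps can be pictured as a small settings record. The sketch below is illustrative only: this page describes the dashboard, not a configuration schema, so `VoiceSettings`, its field names, the lowercase provider keys, and the voice name are all hypothetical. Only the provider list and the 1.0x default speed come from this page.

```python
from dataclasses import dataclass

# Providers listed under "Available providers"; the lowercase keys are hypothetical.
TTS_PROVIDERS = {"cartesia", "elevenlabs", "openai", "google", "sarvam"}


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container for the choices made on the Voice tab."""

    provider: str        # one of TTS_PROVIDERS
    voice_name: str      # the voice picked after previewing it
    speed: float = 1.0   # 1.0x is the documented default

    def __post_init__(self) -> None:
        if self.provider not in TTS_PROVIDERS:
            raise ValueError(f"unknown TTS provider: {self.provider!r}")
        if not self.voice_name:
            raise ValueError("voice_name must not be empty")
        # The slider's real bounds are not stated on this page, so this check
        # only rejects multipliers that could never be valid.
        if self.speed <= 0:
            raise ValueError("speed must be a positive multiplier")


# Slightly slower than the default; "example-voice" is a placeholder name.
settings = VoiceSettings(provider="cartesia", voice_name="example-voice", speed=0.9)
```

Validating at construction time mirrors what the dropdown and slider do in the UI: an invalid provider or speed never reaches the agent.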

Language support

Sonara supports 60+ languages with automatic language detection for multi-language agents.

Single language mode

Select one language for your agent. The agent will only speak and understand this language.

Multi-language mode

Enable multi-language support to let your agent automatically detect and respond in the caller’s language. You can configure:
  • Supported languages - Which languages the agent can use
  • Fallback language - The default language if detection is uncertain
Not all voice-language combinations are supported. If your selected voice doesn’t support a language, Sonara will automatically suggest a compatible alternative.
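
To make the roles of the supported-languages list and the fallback language concrete, here is a minimal sketch of how such a selection could work. It is not Sonara's actual detection logic: the function names, the confidence score, the 0.6 threshold, and the language codes are assumptions chosen for illustration.

```python
from __future__ import annotations


def resolve_language(
    detected: str,
    confidence: float,
    supported: list[str],
    fallback: str,
    min_confidence: float = 0.6,  # assumed threshold, not a documented value
) -> str:
    """Return the detected language if it is supported and detection is
    confident enough; otherwise return the fallback language."""
    if confidence >= min_confidence and detected in supported:
        return detected
    return fallback


def suggest_voice(
    language: str,
    selected_voice: str,
    voices_by_language: dict[str, list[str]],
) -> str | None:
    """Keep the selected voice if it supports the language; otherwise
    suggest the first compatible voice, or None if there is none."""
    candidates = voices_by_language.get(language, [])
    if selected_voice in candidates:
        return selected_voice
    return candidates[0] if candidates else None


supported = ["en", "es", "hi"]
print(resolve_language("es", 0.92, supported, fallback="en"))  # confident and supported
print(resolve_language("fr", 0.95, supported, fallback="en"))  # not in the supported list
print(resolve_language("hi", 0.30, supported, fallback="en"))  # detection uncertain
```

In this sketch the fallback covers both an uncertain detection and a language outside the supported list, so the agent always has a language to respond in.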

Speech-to-text (STT)

The STT provider determines how your agent transcribes what callers say. Configure this in the voice settings:
  • Provider - Select an STT provider (Deepgram is the default)
  • Model - Choose the transcription model (e.g., nova-3 for highest accuracy)
Deepgram’s nova-3 model offers the best balance of speed and accuracy for real-time voice conversations.
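
As a rough illustration, the two STT options could be represented as follows. `SttSettings` and its field names are hypothetical. Deepgram as the default provider comes from this page, while treating nova-3 as the default model is an assumption, since the page only cites it as an example.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SttSettings:
    """Hypothetical record of the speech-to-text options in voice settings."""

    provider: str = "deepgram"  # documented default provider
    model: str = "nova-3"       # assumed default; the page only gives it as an example


stt = SttSettings()
payload = asdict(stt)  # plain dict, convenient for logging or serialization
```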