Overview
Speech settings control the low-level behavior of your agent’s voice interaction: how it detects when someone is speaking, how it handles interruptions, and how it manages conversation flow.
Voice Activity Detection (VAD)
VAD determines when the agent recognizes that a caller has started or stopped speaking. This affects how responsive your agent feels.
Adjusting VAD sensitivity:
- Higher sensitivity - The agent responds faster but may cut off callers who pause mid-sentence
- Lower sensitivity - The agent waits longer before responding, which reduces the chance of cutting callers off but makes the agent feel slightly slower
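The trade-off above can be illustrated with a small sketch. It is not the product's implementation: the class name, the 0.0 to 1.0 sensitivity scale, and the mapping from sensitivity to a required silence duration are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EndOfTurnDetector:
    """Toy end-of-turn detector (hypothetical, for illustration only).

    The caller's turn is considered finished once enough continuous
    silence has followed some speech.
    """
    sensitivity: float   # assumed scale: 0.0 (patient) .. 1.0 (eager)
    frame_ms: int = 20   # assumed duration of one audio frame

    def silence_ms_needed(self) -> int:
        # Assumed mapping: 1200 ms of silence at 0.0, down to 200 ms at 1.0.
        return round(1200 - 1000 * self.sensitivity)

    def turn_end_frame(self, frames: Sequence[bool]) -> Optional[int]:
        """frames[i] is True when frame i contains speech.

        Returns the index of the frame at which the turn is declared
        over, or None if the turn never ends within the input.
        """
        silence_ms = 0
        heard_speech = False
        for i, is_speech in enumerate(frames):
            if is_speech:
                heard_speech = True
                silence_ms = 0          # any speech resets the silence timer
            elif heard_speech:
                silence_ms += self.frame_ms
                if silence_ms >= self.silence_ms_needed():
                    return i
        return None
```

Under these assumptions, a caller who speaks, pauses for 400 ms, and then continues is cut off by a detector with sensitivity 1.0 (it needs only 200 ms of silence), while a detector with sensitivity 0.0 waits through the pause and responds only after the caller has been silent for 1200 ms.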
Interruption handling
Configure how your agent responds when a caller speaks while the agent is talking:
- Allow interruptions - The agent stops speaking and listens when the caller talks over it. This feels natural for most conversations.
- Interruption sensitivity - Controls how easily the agent can be interrupted. Higher values mean even quiet sounds will interrupt the agent.
For agents that deliver important information (like legal disclaimers or appointment details), consider reducing interruption sensitivity so the agent can finish its message.
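As a rough sketch of how these two settings might interact, the example below decides whether the agent should stop talking. The field names, the 0.0 to 1.0 scales, and the loudness-threshold rule are assumptions for illustration, not the product's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterruptionSettings:
    # Hypothetical field names; the real setting names may differ.
    allow_interruptions: bool = True
    interruption_sensitivity: float = 0.5   # assumed scale: 0.0 .. 1.0


def should_stop_speaking(settings: InterruptionSettings, caller_level: float) -> bool:
    """Decide whether caller audio should interrupt the agent.

    caller_level is an assumed normalized loudness (0.0 .. 1.0) of the
    caller's audio while the agent is talking.
    """
    if not settings.allow_interruptions:
        return False
    # Higher sensitivity lowers the loudness needed to interrupt.
    threshold = 1.0 - settings.interruption_sensitivity
    return caller_level >= threshold
```

With a sensitivity of 0.9, a quiet sound at level 0.2 interrupts the agent. With a sensitivity of 0.3, the same sound is ignored and only louder speech (0.7 or above) stops the agent, which suits messages such as disclaimers that should be heard in full.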
Backchannel sounds
Backchannel sounds are subtle audio cues (like “mm-hmm” or “uh-huh”) that the agent plays while the caller is speaking. These make the conversation feel more natural by signaling that the agent is listening.
- Enable/disable - Toggle backchannel sounds on or off
- Frequency - Control how often the agent produces these sounds
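One way to picture the frequency setting is as the spacing between cues while the caller talks. The sketch below is only an illustration: the field names, the 0.0 to 1.0 frequency scale, and the interval mapping (12 seconds down to 4 seconds) are assumed, not taken from the product.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackchannelSettings:
    # Hypothetical field names and scale.
    enabled: bool = True
    frequency: float = 0.5   # assumed scale: 0.0 (never) .. 1.0 (frequent)


def backchannel_times(settings: BackchannelSettings,
                      caller_speech_seconds: float) -> List[float]:
    """Return the offsets (in seconds) at which a cue would be played
    during one uninterrupted stretch of caller speech."""
    if not settings.enabled or settings.frequency <= 0:
        return []
    # Assumed mapping: shorter gaps between cues as frequency rises.
    interval = 12.0 - 8.0 * settings.frequency
    times = []
    t = interval
    while t < caller_speech_seconds:
        times.append(t)
        t += interval
    return times
```

Under this mapping, a caller who talks for 20 seconds hears four cues at frequency 1.0 and two cues at frequency 0.5. Disabling the setting produces none.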
Conversation context
Choose how much conversation history the agent considers when generating responses:
- Full context - The agent remembers the entire conversation
- Limited context - The agent focuses on recent exchanges, reducing latency for long conversations
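The difference between the two modes can be sketched as a history-trimming step applied before each response. The function name, the message shape, and the choice to always keep system messages are assumptions for illustration; the product may select context differently.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]   # assumed shape: {"role": ..., "content": ...}


def select_context(history: List[Message],
                   max_recent: Optional[int] = None) -> List[Message]:
    """Pick the messages passed to the model for the next response.

    max_recent=None models full context. An integer models limited
    context: system messages are kept, plus the last `max_recent`
    caller/agent messages.
    """
    if max_recent is None:
        return list(history)
    system = [m for m in history if m["role"] == "system"]
    dialogue = [m for m in history if m["role"] != "system"]
    recent = dialogue[-max_recent:] if max_recent > 0 else []
    return system + recent
```

A smaller input means less text for the model to process on each turn, which is where the latency benefit in long conversations comes from. The cost is that details mentioned early in the call may no longer be available to the agent.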
Function calling behavior
Configure how the agent handles tool/function calls during speech:
- Sequential - The agent pauses speaking while executing a function call
- Parallel - The agent continues speaking while functions execute in the background
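The two modes map naturally onto awaiting a call before speaking versus running both concurrently. The following asyncio sketch uses made-up stand-ins (`speak`, `call_tool`) with short sleeps in place of real audio playback and a real function call; it illustrates the ordering only and is not the product's runtime.

```python
import asyncio
from typing import List


async def speak(log: List[str]) -> None:
    # Stand-in for playing synthesized speech.
    log.append("speak:start")
    await asyncio.sleep(0.05)
    log.append("speak:end")


async def call_tool(log: List[str]) -> None:
    # Stand-in for a function/tool call such as a calendar lookup.
    log.append("tool:start")
    await asyncio.sleep(0.01)
    log.append("tool:end")


async def run_sequential() -> List[str]:
    """The agent stays silent until the function call has finished."""
    log: List[str] = []
    await call_tool(log)
    await speak(log)
    return log


async def run_parallel() -> List[str]:
    """The agent keeps talking while the function call runs."""
    log: List[str] = []
    await asyncio.gather(speak(log), call_tool(log))
    return log
```

Running `asyncio.run(run_sequential())` records the tool call finishing before any speech starts, while `asyncio.run(run_parallel())` records the tool call starting before the speech has ended. Parallel execution hides function latency behind speech, but the agent cannot mention the function's result until the call completes.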