Quick Start Guide
Ready to set up your first AI assistant? Here’s the essential flow:
- Choose Call Direction: Inbound (answers calls) or Outbound (makes calls)
- Set Assistant Name: Internal label like “Support Bot” or “Sales Bot”
- Configure Phone Numbers: Assign platform numbers, SIP, or Caller ID
- Select Voice & Language: Choose from built-in voices or clone custom ones
- Adjust Advanced Settings: Fine-tune models, timing, and audio parameters
Always test your changes by calling the assistant or running a small campaign to confirm it behaves as expected.
Follow this page section by section to configure your assistant. Each setting includes detailed explanations and best practices to help you make the right choices.
Call Direction & Basic Setup
Assistant Type
Choose whether your assistant handles inbound or outbound calls. This fundamental choice affects which other options become available.
- Inbound (Receive calls): Handles incoming calls from customers. See Inbound calls overview.
- Outbound (Make calls): Initiates calls to leads or customers. See Outbound calls overview.
Assistant Name
A descriptive name to identify your assistant in the dashboard. Use something memorable that describes the assistant’s purpose (e.g. “Sales Qualifier”, “Support Bot”, “Appointment Scheduler”).
Phone Number Configuration
Your assistant needs a phone number to operate. The available options depend on your call direction choice.
For Outbound Assistants
You can use:
- Platform numbers: Numbers rented directly from our platform
- SIP numbers: Connect your existing VoIP/PBX system
- Caller ID only: Verify ownership of an existing number to display it on outbound calls
For Inbound Assistants
You can use:
- Platform numbers: Numbers rented directly from our platform
- SIP numbers: Connect your existing VoIP/PBX system
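As a quick reference, the two lists above can be written as a lookup table. This is only an illustrative sketch: the names `PHONE_OPTIONS` and `is_allowed` and the string keys are hypothetical and are not part of the platform's API.

```python
# Hypothetical lookup of phone number options per call direction,
# transcribed from the lists above (not an official API).
PHONE_OPTIONS = {
    "outbound": {"platform", "sip", "caller_id"},
    "inbound": {"platform", "sip"},
}


def is_allowed(direction: str, number_type: str) -> bool:
    """Return True if the number type is listed for the call direction."""
    return number_type in PHONE_OPTIONS.get(direction, set())


print(is_allowed("outbound", "caller_id"))  # True
print(is_allowed("inbound", "caller_id"))   # False
```

Caller ID only appears for outbound assistants because it only controls the number displayed on calls the assistant places.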
Pricing & Costs
- Platform numbers: Monthly rental fees starting from $3.99/month. See renting a dedicated number for detailed pricing.
- SIP integration: No monthly fee, only $0.00045/min for AI bridging. See SIP integration pricing.
- Caller ID: No monthly fee, region-based per-minute rates (e.g., $0.01/min in the US). See Caller ID pricing.
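To compare the options for a given call volume, the rates above can be plugged into a small calculator. This is a rough sketch: it covers phone-number fees only, uses the starting rental price and the US Caller ID example rate, and the function name `number_cost` is an assumption made for illustration.

```python
# Rates copied from the list above. Any per-minute usage charge for
# platform numbers is not stated there, so only the rental fee is modeled.
PLATFORM_MONTHLY_FROM = 3.99   # USD per month (starting price)
SIP_PER_MIN = 0.00045          # USD per minute of AI bridging
CALLER_ID_US_PER_MIN = 0.01    # USD per minute (US example rate)


def number_cost(option: str, minutes: int) -> float:
    """Estimate the monthly phone-number cost for one option, in USD."""
    if option == "platform":
        return PLATFORM_MONTHLY_FROM
    if option == "sip":
        return round(SIP_PER_MIN * minutes, 2)
    if option == "caller_id":
        return round(CALLER_ID_US_PER_MIN * minutes, 2)
    raise ValueError(f"unknown option: {option}")


for option in ("platform", "sip", "caller_id"):
    print(option, number_cost(option, minutes=10_000))
```

Check the linked pricing pages for the actual rates in your region before relying on an estimate like this.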
Engine Type (Voice Processing Mode)
Choose how your AI processes speech and generates responses. Each mode is optimized for different use cases. See Assistant modes for detailed comparisons.
Pipeline Mode
Traditional Speech-to-Text → LLM → Text-to-Speech pipeline. Offers maximum control over voice selection and response generation.
Best for: Complex reasoning, function calling, custom voice requirements
Speech-to-Speech Mode
Direct speech-to-speech generation without intermediate text processing. Provides the most natural conversational flow.
Best for: Quick conversations, natural back-and-forth dialogue
Dualplex Mode (Beta)
Combines fast multimodal processing with premium ElevenLabs voice output.
Best for: Most use cases - recommended default
Language Configuration
Primary Language
The main language your assistant will use for speech recognition and synthesis. This affects:
- Speech recognition accuracy
- Available voice options
- Filler audio phrases
- Voice model selection
Secondary Languages
Additional languages your assistant can understand and speak. Useful for:
- Multilingual customer support
- International businesses
- Code-switching conversations
AI Voice Selection
You can choose from existing voices, clone custom voices, or request voices from the ElevenLabs library.
Voice Options
You have three ways to get the perfect voice for your assistant:
1. Choose from existing voices:
- Professional voices: Pre-trained, high-quality options from ElevenLabs
- Multiple accents: Available for most languages
- Gender options: Male and female voices for each language
- Tone variety: From formal business to casual conversational
2. Clone a custom voice:
Audio requirements:
- Clear, high-quality audio sample (1-5 minutes recommended)
- MP3 or WAV format
- Consistent speaking pace and tone
- Minimal background noise
- Same voice used throughout
Cloning steps:
- Record yourself or a voice actor reading sample text
- Upload the audio file in assistant settings
- Wait for training to complete (a few minutes to a few hours)
- Test the cloned voice before using it in production
Use cases:
- Brand consistency with company spokesperson
- Personal touch for customer relationships
- Matching voice to specific business persona
Timezone Configuration
Timezone
Set the timezone your assistant operates in. This affects:
- Time-based variables in conversations
- Appointment scheduling functions
- “Current time” references in system prompts
- Timestamps in call logs and data extraction
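To see why the timezone matters for “current time” references, the sketch below renders the same instant in two timezones using Python's standard `zoneinfo` module. The helper name `local_time_for_prompt` is hypothetical, and how the platform injects the time into prompts is not described here.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def local_time_for_prompt(utc_now: datetime, tz_name: str) -> str:
    """Format a UTC instant in the assistant's configured timezone."""
    local = utc_now.astimezone(ZoneInfo(tz_name))
    return local.strftime("%Y-%m-%d %H:%M %Z")


# One fixed instant, rendered for two different assistant timezones.
utc_now = datetime(2024, 1, 15, 15, 0, tzinfo=timezone.utc)
print(local_time_for_prompt(utc_now, "America/New_York"))
print(local_time_for_prompt(utc_now, "Europe/Berlin"))
```

An assistant set to the wrong timezone would offer or confirm appointment slots several hours off, even though the underlying instant is the same.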
Audio Enhancement Settings
Ambient Sound
Optional background sound mixed under your assistant’s voice to mask processing delays and create a more natural audio experience. Options:
- None: No background sound (default)
- Office: Subtle office environment sounds
Turn it off or lower the volume if the assistant isn’t hearing the customer clearly.
Filler Audio
Short conversational phrases like “mhm”, “okay”, “I understand” that play during AI processing time. See Filler audio guide for full details.
Benefits
- Eliminates awkward silences during processing
- Keeps callers engaged
- Creates more natural conversation flow
- Reduces hang-up rates
Positive responses
“Great!”, “Perfect!”, “Super!”
Negative responses
“Hmm.”, “I see.”, “Okay.”
Question responses
“Right?”, “Really?”, “How so?”
Neutral responses
“Okay.”, “I understand.”, “Got it.”
Enable by default - most conversations benefit from fillers. Test with your target audience and adjust phrases to match your assistant’s personality.
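If you keep your own list of phrases while tuning them, grouping them by category makes them easier to review. The sketch below is illustrative only: `FILLERS` and `pick_filler` are hypothetical names, and the way the platform actually chooses a category during a call is not described on this page.

```python
import random

# Phrase sets copied from the categories above.
FILLERS = {
    "positive": ["Great!", "Perfect!", "Super!"],
    "negative": ["Hmm.", "I see.", "Okay."],
    "question": ["Right?", "Really?", "How so?"],
    "neutral": ["Okay.", "I understand.", "Got it."],
}


def pick_filler(category: str, rng: random.Random) -> str:
    """Pick one filler phrase, falling back to the neutral set."""
    return rng.choice(FILLERS.get(category, FILLERS["neutral"]))


rng = random.Random(0)  # seeded so repeated runs pick the same phrases
print(pick_filler("positive", rng))
print(pick_filler("unknown-category", rng))
```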
Advanced Settings
LLM Model Selection
Choose the best language model for your assistant’s mode. See LLM model selection guide for detailed recommendations.
Recommended models by mode:
Model | Strengths | Best for |
---|---|---|
GPT-5 Mini | Balanced reasoning with low latency | Pipeline mode for complex reasoning |
GPT-5 Realtime | Ultra-low-latency voice turns | Speech-to-Speech and Dualplex |
GPT-4o | Strong reasoning and multimodal understanding | Complex tasks (higher latency) |
Gemini Flash 2.0/2.5 | Ultra-fast for voice turns | Dualplex/Multimodal for minimal latency |
- Speed is critical: Use GPT-5 Realtime or Gemini Flash 2.0/2.5
- Rich reasoning needed: Use GPT-4o or GPT-5 Mini with filler audio to offset latency
LLM Temperature
Range: 0.0 - 1.0 | Default: 0.1
Adjusts how creative the AI is when generating responses. Lower values yield more reliable function calls.
Lower (0.0-0.3)
More stable: Predictable responses, better for function calling and business use cases
Higher (0.7-1.0)
More random: Creative and varied responses, good for casual conversations
Special behavior: For GPT-5 Mini and GPT-5 Nano models in Pipeline mode, temperature is automatically set to 1.0 for optimal performance.
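The temperature rules above can be summarized as a small function. This is a sketch under stated assumptions: the model and mode identifiers are hypothetical strings, and clamping out-of-range values (rather than rejecting them) is an assumption, since the page only gives the allowed range.

```python
def effective_temperature(model: str, mode: str, requested: float = 0.1) -> float:
    """Return the temperature that would apply under the rules above."""
    # Special behavior: GPT-5 Mini / Nano in Pipeline mode always use 1.0.
    if mode == "pipeline" and model in {"gpt-5-mini", "gpt-5-nano"}:
        return 1.0
    # Assumption: values outside the documented 0.0 - 1.0 range are clamped.
    return min(1.0, max(0.0, requested))


print(effective_temperature("gpt-4o", "pipeline"))            # 0.1
print(effective_temperature("gpt-5-mini", "pipeline", 0.2))   # 1.0
```

In practice this means a temperature you set for GPT-5 Mini or GPT-5 Nano in Pipeline mode has no effect.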
Duration Settings
Control timing and call limits to optimize user experience and costs:
Re-engagement Interval
Range: 7 - 600 seconds | Default: 30 seconds
The AI tries to re-engage the user if no reply is detected within this time.
Recommended: 30-60 seconds for professional calls.
Max Call Duration
Range: 20 - 1200 seconds | Default: 600 seconds (10 minutes)
The call ends automatically when this value is reached.
Recommended: 5-10 minutes for lead qualification to control costs.
Max Silence Duration
Range: 1 - 120 seconds | Default: 40 seconds
The call ends if the user doesn’t reply within this time.
Recommended: 30-45 seconds to balance patience with efficiency.
Ringing Time
Range: 1 - 60 seconds | Default: 30 seconds
How long the call rings before it is marked as unanswered. Set a lower value if you want to avoid reaching voicemail.
Cost optimization: Lower duration limits help control per-minute costs, especially important for high-volume campaigns.
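If you manage settings for several assistants, it can help to check values against the documented ranges before entering them. The sketch below is a hypothetical helper, not a platform API: the field names are invented for illustration, while the ranges and defaults are copied from the settings above.

```python
from dataclasses import dataclass

# (min, max) ranges in seconds, copied from the settings above.
RANGES = {
    "reengagement_interval": (7, 600),
    "max_call_duration": (20, 1200),
    "max_silence_duration": (1, 120),
    "ringing_time": (1, 60),
}


@dataclass
class DurationSettings:
    # Defaults match the documented defaults.
    reengagement_interval: int = 30
    max_call_duration: int = 600
    max_silence_duration: int = 40
    ringing_time: int = 30

    def errors(self) -> list[str]:
        """List every field whose value falls outside its documented range."""
        problems = []
        for name, (low, high) in RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                problems.append(f"{name}={value} is outside {low}-{high} seconds")
        return problems


print(DurationSettings().errors())  # []
print(DurationSettings(ringing_time=90).errors())
```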
Call Protection Settings
Noise Cancellation
Default: Enabled
Filters caller background noise for clearer speech recognition. Turn it off if you experience audio clipping.
End Call on Voicemail
Default: Enabled
Immediately ends the call if voicemail is detected during outbound calls (saves costs).
Record Calls
Default: Enabled
Records call audio for review and analysis. Ensure compliance with local recording laws.
Max Initial Silence
Range: 1 - 120 seconds | Default: 20 seconds (when enabled)
If enabled, ends the call if the user gives no first response within this time. Counts only from call start to the first user response.
Use case: Detect whether anyone actually answered the phone.
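The interaction between Max Initial Silence, Max Silence Duration, and Max Call Duration can be pictured as the decision function below. This is a simplified sketch, not the platform's implementation. In particular, it assumes that before the first user response only the initial-silence limit applies when it is enabled, and that the regular silence limit is measured from call start when it is disabled.

```python
from typing import Optional


def end_reason(
    seconds_since_start: float,
    seconds_since_user_spoke: Optional[float],
    max_initial_silence: Optional[float] = 20,
    max_silence: float = 40,
    max_call_duration: float = 600,
) -> Optional[str]:
    """Return why the call should end, or None to keep it going.

    seconds_since_user_spoke is None until the caller says something.
    max_initial_silence is None when that setting is disabled.
    """
    if seconds_since_start >= max_call_duration:
        return "max_call_duration"
    if seconds_since_user_spoke is None:
        if max_initial_silence is not None:
            # Assumption: only the initial limit applies before the first reply.
            if seconds_since_start >= max_initial_silence:
                return "max_initial_silence"
            return None
        # Assumption: with the setting disabled, silence counts from call start.
        seconds_since_user_spoke = seconds_since_start
    if seconds_since_user_spoke >= max_silence:
        return "max_silence"
    return None


print(end_reason(25, None))   # max_initial_silence
print(end_reason(100, 5))     # None
```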
Synthesizer Settings
Configure text-to-speech voice parameters for natural-sounding conversations. Available for: Pipeline and Dualplex modes only. Speech-to-Speech mode uses native voice generation.
Voice Tuning Parameters
Fine-tune your assistant’s voice characteristics for optimal performance:
Voice Stability
Range: 0.0 - 1.0 | Default: 0.7
Lower settings make the voice more expressive but less predictable, while higher settings make it steadier but less emotional.
More Expressive (0.0-0.3)
Dynamic and varied delivery but less predictable
More Stable (0.7-1.0)
Consistent and steady but less emotional range
Voice Similarity
Range: 0.0 - 1.0 | Default: 0.5
Determines how closely the AI matches the original voice. Higher settings may include unwanted noise from the original recording.
More Stable (0.0-0.4)
Cleaner audio but less accurate to original voice
More Similar (0.6-1.0)
Accurate to original but may include background noise
For cloned voices: Start at 0.5 and increase gradually. Higher similarity can introduce unwanted artifacts from the original recording.
Speech Speed
Range: 0.7 - 1.2 | Default: 1.0
Adjust the speed of the AI’s speech for optimal comprehension and user experience.
Slower (0.7-0.85)
Better for complex information or older demographics
Normal (0.9-1.1)
Standard conversational pace for most use cases
Faster (1.15-1.2)
Quick conversations or time-sensitive scenarios
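The three voice parameters and their ranges can be captured in one small, validated structure. This is a hypothetical sketch for keeping your own presets: the class name `VoiceSettings` and its field names are assumptions and do not correspond to a documented API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSettings:
    # Defaults and ranges copied from the settings above.
    stability: float = 0.7    # 0.0 - 1.0
    similarity: float = 0.5   # 0.0 - 1.0
    speed: float = 1.0        # 0.7 - 1.2

    def __post_init__(self) -> None:
        limits = {
            "stability": (0.0, 1.0),
            "similarity": (0.0, 1.0),
            "speed": (0.7, 1.2),
        }
        for name, (low, high) in limits.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}, got {value}")


# A slower, steadier preset for conveying complex information.
careful = VoiceSettings(stability=0.8, speed=0.8)
print(careful)
```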
Transcriber Settings
Configure speech-to-text recognition for optimal accuracy and speed. Available for: Pipeline mode only. Speech-to-Speech and Dualplex modes use integrated transcription.
Provider Selection
Choose the provider that transcribes the caller’s speech, based on your language and use case.
Azure
Accuracy: ⭐⭐⭐⭐
Latency: Slower
Best for highest transcription fidelity when accuracy is critical.
Gladia
Accuracy: ⭐⭐⭐
Latency: Faster
Good all-rounder for most languages. Supports multilingual configurations.
Deepgram
Accuracy: ⭐⭐⭐
Latency: Faster
Solid choice for English and major languages.
Different languages, accents, or background noise can impact each provider differently. Test which performs better for your specific language and audio setup.
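The availability notes for the Synthesizer and Transcriber settings can be summarized as a lookup by engine mode. The identifiers below (`AVAILABLE`, `applies`, and the mode keys) are hypothetical and used only for illustration.

```python
# Which settings groups apply to each engine mode, per the
# "Available for" notes in the Synthesizer and Transcriber sections.
AVAILABLE = {
    "pipeline": {"synthesizer", "transcriber"},
    "dualplex": {"synthesizer"},
    "speech_to_speech": set(),
}


def applies(mode: str, group: str) -> bool:
    """Return True if the settings group is configurable in this mode."""
    return group in AVAILABLE[mode]


print(applies("dualplex", "transcriber"))  # False
print(applies("pipeline", "transcriber"))  # True
```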
Endpoint Configuration
AI Turn Detection
Uses AI to intelligently detect when the caller has finished speaking
Voice Activity Detection (VAD)
Default: Traditional voice activity detection
Choose how the AI will detect the end of the user phrase.
Voice Activity Detection (VAD)
Control when your assistant starts and stops talking. See Handling interruptions guide for detailed VAD configuration.
Fine-tune these settings if experiencing interruption issues or sluggish responses.
Endpoint Sensitivity
Range: 0 - 5 seconds | Default: 0.5 seconds
Adjust how long the AI waits after the user’s last word before responding. Lower values make the AI respond faster; higher values are better for long user phrases.
- 0 (Faster): Quick responses but may cut off callers
- 5 (Slower): Waits longer, reduces interruptions
Interrupt Sensitivity
How easily the assistant stops when the caller talks over it. Controls the sensitivity for detecting when a caller is trying to interrupt.
Minimum Interrupt Words
Requires at least N caller words before the assistant is interrupted.
Use: Prevents false triggers from background noise or brief sounds.
Pro tip: Start with default VAD settings and adjust based on real call testing. Increase endpoint sensitivity if callers get cut off, decrease if responses feel slow.
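The two timing ideas in this section can be reduced to a pair of simple checks. This is a deliberately simplified sketch of the concepts, not the platform's detection logic: real VAD works on audio frames, and the function names here are hypothetical. No default is listed for Minimum Interrupt Words, so the sketch requires the value explicitly.

```python
def turn_ended(silence_seconds: float, endpoint_sensitivity: float = 0.5) -> bool:
    """Treat the caller's turn as finished once silence reaches the wait time."""
    return silence_seconds >= endpoint_sensitivity


def should_interrupt(caller_text: str, min_interrupt_words: int) -> bool:
    """Stop the assistant only after the caller has said enough words."""
    return len(caller_text.split()) >= min_interrupt_words


# A 0.3 s pause is not yet an endpoint at the default 0.5 s wait time.
print(turn_ended(0.3))                       # False
print(turn_ended(0.6))                       # True
# A single "uh" does not interrupt when two words are required.
print(should_interrupt("uh", 2))             # False
print(should_interrupt("wait a second", 2))  # True
```

Raising the wait time trades responsiveness for fewer cut-offs, and raising the word threshold trades interruptibility for fewer false triggers.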