Configure the fundamental settings for your AI assistant including call direction, phone numbers, voice selection, and technical parameters.

Quick Start Guide

Ready to set up your first AI assistant? Here’s the essential flow:
  1. Choose Call Direction: Inbound (answers calls) or Outbound (makes calls)
  2. Set Assistant Name: Internal label like “Support Bot” or “Sales Bot”
  3. Configure Phone Numbers: Assign platform numbers, SIP, or Caller ID
  4. Select Voice & Language: Choose from built-in voices or clone custom ones
  5. Adjust Advanced Settings: Fine-tune models, timing, and audio parameters
Always test your changes by calling the assistant or running a small campaign to confirm it behaves as expected.
Follow this page section by section to configure your assistant. Each setting includes detailed explanations and best practices to help you make the right choices.
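To see how these choices fit together, here is a minimal sketch of an assistant configuration expressed as a Python dictionary. The field names are illustrative assumptions, not the platform's actual API; you set these values in the dashboard sections described below.

```python
# Illustrative sketch only: field names are assumptions, not the platform's real API.
assistant_config = {
    "name": "Support Bot",             # internal label shown in the dashboard
    "direction": "inbound",            # "inbound" (answers calls) or "outbound" (makes calls)
    "phone_number": "+15551234567",    # platform number, SIP, or (outbound only) verified Caller ID
    "voice": {"provider": "elevenlabs", "voice_id": "example-voice-id"},
    "language": {"primary": "en-US", "secondary": ["es-ES"]},
    "engine": "dualplex",              # "pipeline", "speech_to_speech", or "dualplex"
}
```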

Call Direction & Basic Setup

Assistant Type

Choose whether your assistant handles inbound or outbound calls. This fundamental choice affects which other options become available.
  • Inbound (Receive calls): Handles incoming calls from customers. See Inbound calls overview.
  • Outbound (Make calls): Initiates calls to leads or customers. See Outbound calls overview.

Assistant Name

A descriptive name to identify your assistant in the dashboard. Use something memorable that describes the assistant’s purpose (e.g. “Sales Qualifier”, “Support Bot”, “Appointment Scheduler”).

Phone Number Configuration

Your assistant needs a phone number to operate. The available options depend on your call direction choice.

For Outbound Assistants

You can use:
  • Platform numbers: Numbers rented directly from our platform
  • SIP numbers: Connect your existing VOIP/PBX system
  • Caller ID only: Verify ownership of an existing number to display it on outbound calls

For Inbound Assistants

You can use:
  • Platform numbers: Numbers rented directly from our platform
  • SIP numbers: Connect your existing VOIP/PBX system
Note: Caller ID-only numbers cannot handle inbound calls - they are only displayed on outbound calls.

Pricing & Costs

See Phone number types for detailed explanations and the SIP integration guide for VOIP setup.
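The rule that Caller ID-only numbers work for outbound but not inbound calls can be expressed as a simple validation. This is an illustrative sketch; the type names are assumptions, not platform identifiers.

```python
# Illustrative sketch of the rule above; type names are assumptions, not the platform's API.
ALLOWED_NUMBER_TYPES = {
    "outbound": {"platform", "sip", "caller_id_only"},
    "inbound": {"platform", "sip"},  # Caller ID-only numbers cannot receive calls
}

def validate_number(direction: str, number_type: str) -> None:
    if number_type not in ALLOWED_NUMBER_TYPES[direction]:
        raise ValueError(f"{number_type!r} numbers cannot be used for {direction} assistants")

validate_number("outbound", "caller_id_only")    # fine
# validate_number("inbound", "caller_id_only")   # would raise ValueError
```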

Engine Type (Voice Processing Mode)

Choose how your AI processes speech and generates responses. Each mode is optimized for different use cases. See Assistant modes for detailed comparisons.

Pipeline Mode

Traditional Speech-to-Text → LLM → Text-to-Speech pipeline. Offers maximum control over voice selection and response generation. Best for: Complex reasoning, function calling, custom voice requirements
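Conceptually, Pipeline mode chains three independent stages, which is what gives it its flexibility and its slightly higher latency. The stub functions below are placeholders standing in for real providers, not SDK calls; this is only a schematic sketch.

```python
# Schematic of Pipeline mode with stub stages; real deployments swap in actual providers.
def transcribe(audio: bytes) -> str:       # Speech-to-Text stage (e.g. Azure, Gladia, Deepgram)
    return "stub transcript"

def generate_reply(text: str) -> str:      # LLM stage (e.g. GPT-5 Mini)
    return f"Reply to: {text}"

def synthesize(text: str) -> bytes:        # Text-to-Speech stage (e.g. an ElevenLabs voice)
    return text.encode()

def handle_turn(caller_audio: bytes) -> bytes:
    return synthesize(generate_reply(transcribe(caller_audio)))
```

Speech-to-Speech mode, described next, skips the intermediate text steps entirely, which is why it feels more natural but offers less control.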

Speech-to-Speech Mode

Direct speech-to-speech generation without intermediate text processing. Provides the most natural conversational flow. Best for: Quick conversations, natural back-and-forth dialogue

Dualplex Mode (Beta)

Combines fast multimodal processing with premium ElevenLabs voice output. Best for: Most use cases - recommended default

Language Configuration

Primary Language

The main language your assistant will use for speech recognition and synthesis. This affects:
  • Speech recognition accuracy
  • Available voice options
  • Filler audio phrases
  • Voice model selection
See Language support for all available languages and accents.

Secondary Languages

Additional languages your assistant can understand and speak. Useful for:
  • Multilingual customer support
  • International businesses
  • Code-switching conversations
Note: The AI can detect which language the customer is speaking and respond appropriately.
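As a rough illustration of how primary and secondary languages interact, the sketch below picks a reply language from the caller's detected language and falls back to the primary when the detected language isn't configured. The names are assumptions for illustration only.

```python
# Illustrative sketch: reply in the caller's detected language when it is configured,
# otherwise fall back to the primary language. Names are assumptions.
PRIMARY_LANGUAGE = "en-US"
SECONDARY_LANGUAGES = ["es-ES", "fr-FR"]

def reply_language(detected: str) -> str:
    if detected == PRIMARY_LANGUAGE or detected in SECONDARY_LANGUAGES:
        return detected
    return PRIMARY_LANGUAGE  # unsupported languages fall back to the primary

print(reply_language("es-ES"))  # es-ES
print(reply_language("de-DE"))  # en-US
```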

AI Voice Selection

Your assistant can choose from existing voices, clone custom voices, or request voices from the ElevenLabs library.

Voice Options

You have three ways to get the perfect voice for your assistant:
1. Choose from existing voices
  • Professional voices: Pre-trained, high-quality options from ElevenLabs
  • Multiple accents: Available for most languages
  • Gender options: Male and female voices for each language
  • Tone variety: From formal business to casual conversational
2. Clone a custom voice
Create a custom voice by uploading audio samples.
Requirements:
  • Clear, high-quality audio sample (1-5 minutes recommended)
  • MP3 or WAV format
  • Consistent speaking pace and tone
  • Minimal background noise
  • Same voice used throughout
Process (an illustrative upload sketch follows this list):
  1. Record yourself or a voice actor reading sample text
  2. Upload the audio file in assistant settings
  3. Wait for training to complete (a few minutes to a few hours)
  4. Test the cloned voice before using it in production
Use cases:
  • Brand consistency with a company spokesperson
  • Personal touch for customer relationships
  • Matching the voice to a specific business persona
3. Request from the ElevenLabs library
You can request specific voices from the ElevenLabs public library; contact support to add them to your account. Browse the ElevenLabs Voice Library to discover thousands of professional voices across different languages, accents, and use cases.
See Voice selection guide for detailed setup instructions.
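For teams that script the cloning upload rather than using the settings UI, the step usually amounts to posting the audio sample. The sketch below is hypothetical: the endpoint URL, headers, and field names are placeholders, not the platform's documented API.

```python
import requests

# Hypothetical sketch only: URL, headers, and field names are placeholders, not the real API.
# The sample should be 1-5 minutes of clean MP3/WAV audio from a single speaker.
with open("spokesperson_sample.wav", "rb") as sample:
    response = requests.post(
        "https://api.example.com/v1/voices/clone",           # placeholder endpoint
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"audio": sample},
        data={"name": "Brand Spokesperson"},
    )
response.raise_for_status()
print(response.json())  # wait for training to finish, then test before production use
```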

Timezone Configuration

Timezone

Set the timezone your assistant operates in. This affects:
  • Time-based variables in conversations
  • Appointment scheduling functions
  • “Current time” references in system prompts
  • Timestamps in call logs and data extraction
Important: Choose the timezone where your business operates or where most customers are located. The assistant will use this for any time-related calculations or scheduling.
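For example, the “current time” the assistant references is resolved in the configured timezone rather than the server's. A minimal Python illustration using only the standard library:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

assistant_timezone = "America/New_York"  # example value; use the timezone where your business operates
now = datetime.now(ZoneInfo(assistant_timezone))
print(now.strftime("%A %H:%M"))  # what "current time" means in prompts, scheduling, and logs
```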

Audio Enhancement Settings

Ambient Sound

Optional background sound mixed under your assistant’s voice to mask processing delays and create a more natural audio experience. Options:
  • None: No background sound (default)
  • Office: Subtle office environment sounds
Volume control: Adjust the level of ambient sound relative to the voice. Lower values are usually better - too much background sound can interfere with speech recognition.
Turn off or lower volume if the assistant isn’t hearing the customer clearly.

Filler Audio

Short conversational phrases like “mhm”, “okay”, “I understand” that play during AI processing time. See Filler audio guide for full details.

Benefits

  • Eliminates awkward silences during processing
  • Keeps callers engaged
  • Creates more natural conversation flow
  • Reduces hang-up rates
Language-aware configuration: Filler phrases are automatically set for your selected language and grouped into categories, for example:
  • “Great!”, “Perfect!”, “Super!”
  • “Hmm.”, “I see.”, “Okay.”
  • “Right?”, “Really?”, “How so?”
  • “Okay.”, “I understand.”, “Got it.”
Customization: You can edit the default phrases for each category to match your brand voice or regional preferences.
Enable by default - most conversations benefit from fillers. Test with your target audience and adjust phrases to match your assistant’s personality.
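If you do customize the phrases, it can help to think of them as small per-category lists. The category names below are illustrative assumptions, not the platform's exact labels.

```python
# Illustrative grouping; category names are assumptions. Edit phrases to match your brand voice.
filler_phrases = {
    "affirmation": ["Great!", "Perfect!", "Super!"],
    "thinking": ["Hmm.", "I see.", "Okay."],
    "engagement": ["Right?", "Really?", "How so?"],
    "acknowledgement": ["Okay.", "I understand.", "Got it."],
}
```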

Advanced Settings

LLM Model Selection

Choose the best language model for your assistant’s mode. See LLM model selection guide for detailed recommendations. Recommended models by mode:
| Model | Strengths | Best for |
| --- | --- | --- |
| GPT-5 Mini | Balanced reasoning with low latency | Pipeline mode for complex reasoning |
| GPT-5 Realtime | Ultra-low-latency voice turns | Speech-to-Speech and Dualplex |
| GPT-4o | Strong reasoning and multimodal understanding | Complex tasks (higher latency) |
| Gemini Flash 2.0/2.5 | Ultra-fast for voice turns | Dualplex/Multimodal for minimal latency |
Quick selection guide:
  • Speed is critical: Use GPT-5 Realtime or Gemini Flash 2.0/2.5
  • Rich reasoning needed: Use GPT-4o or GPT-5 Mini with filler audios to offset latency
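The quick selection guide can be condensed into a small rule of thumb. This is not an SDK call; the model identifiers simply mirror the table above.

```python
# Rule-of-thumb summary of the guidance above; not an API call.
def recommend_model(mode: str, needs_rich_reasoning: bool) -> str:
    if mode in ("speech_to_speech", "dualplex"):
        return "GPT-5 Realtime"       # or Gemini Flash 2.0/2.5 for minimal latency
    if needs_rich_reasoning:
        return "GPT-4o"               # pair with filler audio to offset the higher latency
    return "GPT-5 Mini"               # balanced reasoning with low latency in Pipeline mode

print(recommend_model("pipeline", needs_rich_reasoning=False))  # GPT-5 Mini
```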

LLM Temperature

Range: 0.0 - 1.0 | Default: 0.1. Adjusts how creative the AI is when generating responses. A lower value yields more reliable function-call results.

Lower (0.0-0.3)

More stable: Predictable responses, better for function calling and business use cases

Higher (0.7-1.0)

More random: Creative and varied responses, good for casual conversations
Special behavior: For GPT-5 Mini and GPT-5 Nano models in Pipeline mode, temperature is automatically set to 1.0 for optimal performance.
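Because of that special behavior, the effective temperature can differ from the value you set. A sketch of the rule, with hypothetical model and mode identifiers:

```python
# Illustrative only: the model/mode strings are assumptions about how the rule is keyed.
def effective_temperature(model: str, mode: str, configured: float = 0.1) -> float:
    configured = min(max(configured, 0.0), 1.0)           # valid range is 0.0 - 1.0
    if mode == "pipeline" and model in ("gpt-5-mini", "gpt-5-nano"):
        return 1.0                                        # automatically forced for these models
    return configured

print(effective_temperature("gpt-5-mini", "pipeline", 0.1))  # 1.0
print(effective_temperature("gpt-4o", "pipeline", 0.1))      # 0.1
```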

Duration Settings

Control timing and call limits to optimize user experience and costs:
  • Range: 7 - 600 seconds | Default: 30 seconds. The AI will try to re-engage the user if no reply is detected within this time. Recommended: 30-60 seconds for professional calls.
  • Range: 20 - 1200 seconds | Default: 600 seconds (10 minutes). The call automatically ends when this limit is reached. Recommended: 5-10 minutes for lead qualification to control costs.
  • Range: 1 - 120 seconds | Default: 40 seconds. The call ends if the user doesn't reply within this time. Recommended: 30-45 seconds to balance patience with efficiency.
  • Range: 1 - 60 seconds | Default: 30 seconds. How long the call rings before being marked as unanswered. Set a lower value if you want to avoid reaching voicemail.
Cost optimization: Lower duration limits help control per-minute costs, especially important for high-volume campaigns.
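A compact way to keep the four duration settings in view is a table of (min, max, default) values. The key names below are illustrative; the ranges and defaults come from the settings above.

```python
# Key names are illustrative; ranges and defaults are taken from the settings above.
DURATION_SETTINGS = {
    # name: (min_seconds, max_seconds, default_seconds)
    "re_engage_after_silence": (7, 600, 30),   # AI re-engages if no reply within this time
    "max_call_duration": (20, 1200, 600),      # call ends automatically at this limit
    "hang_up_after_silence": (1, 120, 40),     # call ends if the user never replies
    "ring_timeout": (1, 60, 30),               # how long to ring before marking unanswered
}

def clamp(setting: str, value: int) -> int:
    lo, hi, _default = DURATION_SETTINGS[setting]
    return max(lo, min(hi, value))

print(clamp("max_call_duration", 3000))  # 1200: capped at the allowed maximum
```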

Call Protection Settings

  • Default: Enabled. Filters caller background noise for clearer speech recognition. Turn OFF if you experience audio clipping.
  • Default: Enabled. Immediately ends the call if voicemail is detected during outbound calls (saves costs).
  • Default: Enabled. Records call audio for review and analysis. Ensure compliance with local recording laws.
  • Range: 1 - 120 seconds | Default: 20 seconds (when enabled). If enabled, the call ends if there is no first user response within this time, counted from call start to the first user response. Use case: Detect whether anyone actually answered the phone.
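Seen together, the protection settings ship with these defaults. The key names are illustrative, not the platform's exact labels.

```python
# Key names are illustrative; defaults are taken from the settings above.
call_protection_defaults = {
    "background_noise_filter": True,    # turn off if you experience audio clipping
    "end_call_on_voicemail": True,      # saves cost on outbound campaigns
    "record_calls": True,               # check local call-recording laws
    "no_first_response_timeout": 20,    # 1 - 120 seconds: hang up if nobody ever speaks
}
```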

Synthesizer Settings

Configure text-to-speech voice parameters for natural-sounding conversations. Available for: Pipeline and Dualplex modes only. Speech-to-Speech mode uses native voice generation.

Voice Tuning Parameters

Fine-tune your assistant’s voice characteristics for optimal performance:
Stability (Range: 0.0 - 1.0 | Default: 0.7): Lower settings make the voice more expressive but less predictable, while higher settings make it steadier but less emotional.

More Expressive (0.0-0.3)

Dynamic and varied delivery but less predictable

More Stable (0.7-1.0)

Consistent and steady but less emotional range
Similarity (Range: 0.0 - 1.0 | Default: 0.5): Determines how closely the AI matches the original voice. Higher settings can carry over unwanted noise from the original recording.

More Stable (0.0-0.4)

Cleaner audio but less accurate to original voice

More Similar (0.6-1.0)

Accurate to original but may include background noise
For cloned voices: Start at 0.5 and increase gradually. Higher similarity can introduce unwanted artifacts from the original recording.
Speed (Range: 0.7 - 1.2 | Default: 1.0): Adjusts the speed of the AI’s speech for optimal comprehension and user experience.

Slower (0.7-0.85)

Better for complex information or older demographics

Normal (0.9-1.1)

Standard conversational pace for most use cases

Faster (1.15-1.2)

Quick conversations or time-sensitive scenarios
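The three tuning parameters are easiest to reason about side by side. The keys below are illustrative; the ranges and defaults are the ones documented above.

```python
# Parameter keys are illustrative; ranges and defaults are taken from the settings above.
voice_tuning = {
    "stability": 0.7,    # 0.0 - 1.0: lower = more expressive, higher = steadier
    "similarity": 0.5,   # 0.0 - 1.0: higher matches the original voice more closely
    "speed": 1.0,        # 0.7 - 1.2: pace of the synthesized speech
}

# For cloned voices, raise similarity gradually and re-test for artifacts after each change.
voice_tuning["similarity"] = min(voice_tuning["similarity"] + 0.1, 1.0)
```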

Transcriber Settings

Configure speech-to-text recognition for optimal accuracy and speed. Available for: Pipeline mode only. Speech-to-Speech and Dualplex modes use integrated transcription.

Provider Selection

Choose the transcription provider best suited to your language and use case; this is the engine that transcribes the user’s speech.

Azure

Accuracy: ⭐⭐⭐⭐ | Latency: Slower. Best for the highest transcription fidelity when accuracy is critical.

Gladia

Accuracy: ⭐⭐⭐ | Latency: Faster. A good all-rounder for most languages; supports multilingual configurations.

Deepgram

Accuracy: ⭐⭐⭐ | Latency: Faster. A solid choice for English and other major languages.
Different languages, accents, or background noise can impact each provider differently. Test which performs better for your specific language and audio setup.

Endpoint Configuration

Choose how the AI will detect the end of the user’s phrase:
  • AI Turn Detection: Uses AI to intelligently detect when the caller has finished speaking
  • Voice Activity Detection (VAD): Default. Traditional voice activity detection

Voice Activity Detection (VAD)

Control when your assistant starts and stops talking. See Handling interruptions guide for detailed VAD configuration.
Fine-tune these settings if experiencing interruption issues or sluggish responses.
Range: 0 - 5 seconds | Default: 0.5. Adjust how long the AI waits for the user to speak after their last word. Lower values make the AI faster; higher values are better for long user phrases.
  • 0 (Faster): Quick responses but may cut off callers
  • 5 (Slower): Waits longer, reduces interruptions
How easily the assistant stops when the caller talks over it: controls the sensitivity for detecting when a caller is trying to interrupt.
Require at least N caller words before interrupting the assistant. Use: Prevents false triggers from background noise or brief sounds.
Pro tip: Start with default VAD settings and adjust based on real call testing. Increase endpoint sensitivity if callers get cut off, decrease if responses feel slow.
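One practical way to apply that tip is to treat the VAD options as a small set of knobs and nudge them based on what you hear in test calls. The knob names and starting values below are illustrative assumptions, not the platform's exact field names.

```python
# Illustrative knob names and values; adjust from defaults only after testing real calls.
vad_settings = {
    "end_of_phrase_wait_seconds": 0.5,   # 0 - 5: lower = snappier replies, higher = fewer cut-offs
    "interruption_sensitivity": 0.5,     # how easily the assistant yields when talked over
    "min_interrupt_words": 2,            # ignore brief sounds and background noise
}

def callers_get_cut_off(settings: dict) -> dict:
    # If callers report being cut off, wait a little longer before the assistant replies.
    settings["end_of_phrase_wait_seconds"] = min(settings["end_of_phrase_wait_seconds"] + 0.2, 5.0)
    return settings
```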