Quick guide to fine-tuning the mode, transcriber, model, and other settings for the best call experience.

Last updated: June 29, 2025

Getting great results often comes down to picking the right engine settings. Use this checklist when configuring an assistant:

Mode | Why choose it? | Notes |
---|---|---|
Speech-to-Speech (Multimodal) | Fastest turn-taking & most natural flow | We recommend starting here. Try the Gemini 2.5 engine (beta) for the lowest latency—but be aware it’s still experimental and may be less stable. |
Pipeline | Maximum control over voice & long-form replies | If you select Pipeline, continue to the Transcriber step below. |

Transcriber | Accuracy | Latency | Best for |
---|---|---|---|
Azure | ⭐⭐⭐⭐ | ⏱️⏱️⏱️ (slower) | When you need the highest transcription fidelity. |
Gladia | ⭐⭐⭐ | ⏱️ (faster) | Good all-rounder for most languages. |
Deepgram | ⭐⭐⭐ | ⏱️ (faster) | Another solid choice—test which performs better for your language & audio setup. |

Tip: Different languages, accents, or background noise can impact each engine differently. Run a quick A/B test and keep the best performer.

Model | Strengths | Trade-offs |
---|---|---|
GPT-4o | Smartest reasoning, handles complex prompts | Slightly higher latency and cost. |
Gemini 2.5-Flash-Lite | Blazing-fast, still highly capable | May miss nuance in very complex tasks; test it for your use case. |

Parameter | Recommended | Why |
---|---|---|
Re-engagement | ≈ 30 s | Gives callers enough time to think. Lower values can feel pushy. |
Max silence duration | ≈ 60 s | Prevents premature hang-ups while still ending truly silent calls. |
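
The two timing parameters work together. The sketch below shows one plausible reading, under assumptions the table does not confirm: both timers run from the caller's last utterance, the assistant re-engages at most once (for example, "Are you still there?"), and the call ends once max silence duration is reached. The function and enum names are hypothetical.

```python
from enum import Enum

# Recommended values from the table above, in seconds.
RE_ENGAGEMENT_S = 30.0
MAX_SILENCE_S = 60.0


class SilenceAction(Enum):
    WAIT = "wait"
    RE_ENGAGE = "re-engage"  # e.g. the assistant asks "Are you still there?"
    HANG_UP = "hang up"


def silence_action(silent_for_s: float, already_reengaged: bool,
                   re_engagement_s: float = RE_ENGAGEMENT_S,
                   max_silence_s: float = MAX_SILENCE_S) -> SilenceAction:
    """Decide what to do after `silent_for_s` seconds without caller speech.

    Assumptions (hypothetical, not the platform's documented behavior):
    both timers start at the caller's last utterance, and the assistant
    re-engages at most once before hanging up.
    """
    if max_silence_s <= re_engagement_s:
        # Otherwise the call would end before the assistant ever re-engages.
        raise ValueError("max silence must be longer than re-engagement")
    if silent_for_s >= max_silence_s:
        return SilenceAction.HANG_UP
    if silent_for_s >= re_engagement_s and not already_reengaged:
        return SilenceAction.RE_ENGAGE
    return SilenceAction.WAIT
```

Under this reading, the recommended pair (≈ 30 s, then ≈ 60 s) gives the caller a second 30-second window to answer the re-engagement prompt before the call is ended.
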

Mode | How the greeting is used | Best practice |
---|---|---|
Pipeline | Read aloud exactly as written (converted to speech by TTS). | Write the greeting verbatim: “Hello, this is Alex from …”. |
Speech-to-Speech | Interpreted by the model as a prompt. | Phrase it as an instruction, such as “Greet the customer and say …”, or prepend “say exactly:” to ensure literal output. |
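
The rule in the table can be captured in a small helper: pass the greeting through unchanged for Pipeline, and add the “say exactly:” prefix for Speech-to-Speech. This is a minimal sketch; the function name and the mode strings are hypothetical labels, not the platform's setting values.

```python
def build_first_message(mode: str, greeting: str) -> str:
    """Return the greeting text to configure for the given mode.

    `mode` uses hypothetical labels ("pipeline", "speech-to-speech"),
    not the platform's actual setting values.
    """
    if mode == "pipeline":
        # Pipeline: TTS reads the text verbatim, so write the exact words.
        return greeting
    if mode == "speech-to-speech":
        # Speech-to-Speech: the model treats the text as a prompt, so prefix
        # "say exactly:" to ask for literal output.
        return f"say exactly: {greeting}"
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `build_first_message("speech-to-speech", "Hello, this is Alex.")` returns `"say exactly: Hello, this is Alex."`.
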

Setting | Effect | Use when |
---|---|---|
Lower sensitivity | Assistant responds faster after the caller stops speaking. | You want snappy, quick-turn conversations. |
Higher sensitivity | Assistant waits longer before responding. | Callers give longer, more detailed replies. |
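
The trade-off between the two settings can be illustrated with a simple silence-based end-of-turn detector: the assistant replies once the caller has been silent for a set wait time. This is a sketch under assumptions: the platform's actual detector is not described here, and the wait times (0.5 s for lower, 1.5 s for higher sensitivity) and the 100 ms frame length are invented for illustration.

```python
from typing import Optional, Sequence

# Hypothetical seconds of caller silence before the assistant replies.
# Illustrative values only; they are not the platform's real settings.
WAIT_BEFORE_REPLY_S = {"lower": 0.5, "higher": 1.5}

FRAME_S = 0.1  # assumed length of one voice-activity frame, in seconds


def end_of_turn_time(frames: Sequence[bool], sensitivity: str) -> Optional[float]:
    """Return the time (s) at which the assistant would start replying, or None.

    `frames` holds one voice-activity flag per frame (True = caller speaking).
    The turn ends once the caller has been silent for the configured wait.
    """
    needed = round(WAIT_BEFORE_REPLY_S[sensitivity] / FRAME_S)
    silent_run = 0
    heard_speech = False
    for i, speaking in enumerate(frames):
        if speaking:
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return round((i + 1) * FRAME_S, 2)
    return None  # caller never finished a turn within these frames


if __name__ == "__main__":
    # Caller speaks 1.0 s, pauses 0.8 s mid-answer, speaks 0.5 s more, then stops.
    frames = [True] * 10 + [False] * 8 + [True] * 5 + [False] * 20
    for setting in ("lower", "higher"):
        print(setting, end_of_turn_time(frames, setting))
```

With this input, the lower setting replies at 1.5 s, cutting in during the caller's mid-answer pause, while the higher setting waits until 3.8 s, after the caller has actually finished. This is why higher sensitivity suits callers who give longer, more detailed replies.
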