The Comeback
Voice AI agents handle customer inquiries, process orders, upsell products, and resolve account issues. Phone, long presumed dying, has become a preferred channel for urgent and high-intent interactions because AI removed the wait-time problem that damaged its reputation. The 2025 Forrester CX wave noted that voice’s share of the support mix climbed from 18% to 27% in voice-AI-enabled enterprises while overall ticket volume held steady. Sierra, Decagon, Cresta, PolyAI, Vapi, and Retell AI are the most frequently named voice-native platforms; the hyperscalers (AWS Connect with Bedrock, Azure Communication Services with Copilot, Google Dialogflow CX with Vertex AI) cover the build-it-yourself path.
Where Voice Wins
Voice wins in three situations. The first is urgent problems: service outages, billing surprises, account lockouts, anything where the customer wants resolution within minutes. The second is high-intent transactions: flight changes, medical scheduling, insurance claim opening, mortgage payoff requests. The third is populations underserved by chat: older customers with mobility or vision issues, customers in vehicles, customers in environments where typing is impractical, and anyone covered by ADA or similar accessibility frameworks. As of 2026, the leading platforms handle 50+ languages comfortably; PolyAI is particularly strong on accent variation in English markets.
Where Voice Struggles
Multi-step workflows requiring form completion compress poorly to voice. Complex data consumption (comparing five plan options, reading a long policy document, scanning a list of charges) works far better in a visual interface. The same holds for detail-heavy tasks where the customer needs to process a list, and for anything that benefits from screenshots, links, or attachments. The right pattern is often hybrid: voice for the urgent open, chat or email for the document-heavy follow-up. Build the handoff explicitly, with session-store continuity, rather than asking the customer to repeat themselves.
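A minimal sketch of what session-store continuity can look like, assuming a shared key-value store that both the voice agent and the follow-up channel can reach. The names here (HandoffContext, InMemorySessionStore, the field names, the one-hour expiry) are hypothetical and chosen for illustration; a real deployment would back this with whatever store the platform already provides.

```python
from __future__ import annotations

import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class HandoffContext:
    """State carried from the voice call into the follow-up channel (hypothetical fields)."""
    customer_id: str
    issue_summary: str
    verified: bool
    collected_fields: dict = field(default_factory=dict)
    created_at: float = field(default_factory=time.time)


class InMemorySessionStore:
    """Stand-in for a shared store such as a cache or database table."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self._ttl = ttl_seconds
        self._data = {}

    def save(self, ctx: HandoffContext) -> str:
        # The token is what gets embedded in the chat link or email sent to the customer.
        token = uuid.uuid4().hex
        self._data[token] = json.dumps(asdict(ctx))
        return token

    def load(self, token: str) -> HandoffContext | None:
        raw = self._data.get(token)
        if raw is None:
            return None
        ctx = HandoffContext(**json.loads(raw))
        if time.time() - ctx.created_at > self._ttl:
            # Expired context: drop it so the follow-up channel starts fresh.
            del self._data[token]
            return None
        return ctx


store = InMemorySessionStore()
token = store.save(
    HandoffContext(
        customer_id="cust-123",
        issue_summary="Disputed charge on latest invoice",
        verified=True,
        collected_fields={"invoice_id": "INV-0042"},
    )
)
resumed = store.load(token)
print(resumed.issue_summary)
```

The point of the design is that the follow-up channel receives only a token and loads the verified identity, the issue summary, and any fields already collected, so the customer does not restate them.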
Implementation Reality
Voice AI requires sub-second time-to-first-audio (Sierra and Decagon design to 500-800ms targets), barge-in support (the customer interrupts mid-utterance and the agent yields gracefully), end-of-utterance detection that handles “umm” and pauses without hanging up, prosody control, and TTS quality that does not feel robotic. The integration burden is substantial: telephony (SIP trunking, IVR sit-alongside or replacement), CRM screen-pop and post-call wrap, workforce management (NICE, Verint, Calabrio), and recording and compliance (two-party consent in California, Florida, Pennsylvania, Massachusetts, and several other states, plus the EU AI Act Article 50 disclosure requirement). Budget six months, and budget the integration cost at parity with the platform cost.
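To make two of these requirements concrete, here is a rough sketch of end-of-utterance detection that tolerates fillers, plus a barge-in check. It is a simplification under stated assumptions: the filler list, the 700 ms and 2000 ms silence thresholds, and the 200 ms minimum speech duration are all hypothetical values, and production systems often rely on acoustic or model-based endpointing rather than a transcript heuristic like this one.

```python
# Hypothetical filler list; a real system would tune this per language.
FILLERS = {"um", "umm", "uh", "er", "hmm"}


def end_of_utterance(
    partial_transcript: str,
    silence_ms: int,
    base_ms: int = 700,
    filler_ms: int = 2000,
) -> bool:
    """Decide whether the caller has finished speaking.

    A trailing filler ("umm") signals the caller is still thinking,
    so the required silence window is extended instead of ending the turn.
    """
    words = partial_transcript.lower().strip(" .,!?").split()
    if not words:
        return False  # nothing said yet; keep listening
    trailing_filler = words[-1].strip(".,!?") in FILLERS
    threshold = filler_ms if trailing_filler else base_ms
    return silence_ms >= threshold


def should_yield(agent_speaking: bool, caller_speech_ms: int, min_speech_ms: int = 200) -> bool:
    """Barge-in: stop playback once caller speech lasts long enough to rule out a cough or click."""
    return agent_speaking and caller_speech_ms >= min_speech_ms


print(end_of_utterance("I need to change my flight", silence_ms=800))
print(end_of_utterance("I need to, umm", silence_ms=800))
print(should_yield(agent_speaking=True, caller_speech_ms=250))
```

The first call ends the turn because 800 ms of silence exceeds the base threshold; the second keeps listening because the trailing filler raises the threshold to 2000 ms.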
Voice deployment unit economics
Platform fee (Sierra, Decagon): $0.85 - $4.50 per resolution
Telephony (Twilio, Vonage): $0.012 - $0.025 per minute
LLM tokens (input + output): $0.04 - $0.18 per call
Recording storage + compliance: $0.002 per minute
ASR (Deepgram, AssemblyAI): $0.005 - $0.015 per minute
TTS (ElevenLabs, OpenAI, Cartesia): $0.005 - $0.022 per minute
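As a worked example, the component lines above can be rolled up into a per-call range. This sketch assumes a 4-minute call (a hypothetical length) and, as a simplification, bills ASR and TTS on every minute of the call. The per-resolution platform fee is left out because the figures above do not say whether it bundles the other components.

```python
# Per-minute ranges (low, high) in USD, copied from the unit economics above.
PER_MINUTE = {
    "telephony": (0.012, 0.025),
    "recording_compliance": (0.002, 0.002),
    "asr": (0.005, 0.015),
    "tts": (0.005, 0.022),
}
# LLM tokens are quoted per call rather than per minute.
LLM_PER_CALL = (0.04, 0.18)


def component_cost_range(call_minutes: float) -> tuple[float, float]:
    """Return the (low, high) component cost for one call, excluding any platform fee."""
    low = sum(lo for lo, _ in PER_MINUTE.values()) * call_minutes + LLM_PER_CALL[0]
    high = sum(hi for _, hi in PER_MINUTE.values()) * call_minutes + LLM_PER_CALL[1]
    return round(low, 3), round(high, 3)


low, high = component_cost_range(4)  # 4 minutes is an assumed call length
print(f"4-minute call: ${low:.3f} to ${high:.3f}")
# prints: 4-minute call: $0.136 to $0.436
```

Under these assumptions the per-minute lines sum to $0.024 to $0.064 per minute, so call length dominates the low end while the LLM line dominates short calls at the high end.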
What Changed in 2026
Three shifts: voice-native vendors became distinct from chat-first vendors who retrofitted voice; the EU AI Act Article 50 disclosure mandate (“you are speaking with an AI”) became enforced practice in EU markets; and outcome-based pricing reshaped procurement so the conversation moved from per-minute to per-resolved-call.
Common Failure Modes
The recurring failures: deploying voice AI behind a 90-second IVR menu that the caller must sit through before the agent ever speaks; missing the AI-disclosure requirement and accruing CSAT damage and regulatory risk; under-investing in human handoff so the customer repeats the entire problem when escalated; and not running the full scenario library against the agent before launch. Voice failure modes (mishearing, talkover) do not appear in chat eval sets.
Cost Considerations
Voice deployments cost 3-5x chat per interaction but often resolve faster, so cost per resolution can come out comparable. Budget the integration line at parity with the platform line. Plan for compliance attestation (HIPAA BAA, PCI scope if payment is collected) before procurement.
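The per-resolution arithmetic can be sketched as follows. This models "resolves faster" as a higher share of interactions ending resolved, which is an assumption about what drives the comparison, and every number in the example (the $0.50 chat cost, the 4x multiple, the 20% and 80% resolution rates) is hypothetical and chosen only to show the break-even relationship.

```python
def cost_per_resolution(cost_per_interaction: float, resolution_rate: float) -> float:
    """Cost of one resolved issue, given the share of interactions that end resolved."""
    if not 0 < resolution_rate <= 1:
        raise ValueError("resolution_rate must be in (0, 1]")
    return cost_per_interaction / resolution_rate


def breakeven_voice_rate(chat_rate: float, cost_multiple: float) -> float:
    """Resolution rate voice needs in order to match chat's cost per resolution."""
    return chat_rate * cost_multiple


# Hypothetical inputs for illustration only.
chat = cost_per_resolution(cost_per_interaction=0.50, resolution_rate=0.20)
voice = cost_per_resolution(cost_per_interaction=0.50 * 4, resolution_rate=0.80)
needed = breakeven_voice_rate(chat_rate=0.20, cost_multiple=4)
print(f"chat ${chat:.2f}, voice ${voice:.2f}, break-even voice rate {needed:.0%}")
```

The useful takeaway is the ratio: at a cost multiple of m, voice reaches parity only when its resolution rate is m times that of chat, so the comparison should be run on measured rates from your own queues.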
What to Do This Week
Pull a week of recorded calls from your highest-volume queue. Tag each as voice-suited, chat-suited, or either. The mix decides the rollout sequence and whether voice AI is a deflection play or a copilot play.
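Once the calls are tagged, the mix is a simple tally. A small sketch, assuming each call has been hand-labeled with one of three tags; the tag names and the sample data are hypothetical.

```python
from collections import Counter

VALID_TAGS = {"voice", "chat", "either"}


def channel_mix(tags):
    """Return each tag's share of the tagged calls, as fractions that sum to 1."""
    unknown = set(tags) - VALID_TAGS
    if unknown:
        raise ValueError(f"unexpected tags: {sorted(unknown)}")
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: counts[tag] / total for tag in sorted(VALID_TAGS)}


# Hypothetical week of tagged calls.
sample = ["voice"] * 5 + ["chat"] * 3 + ["either"] * 2
mix = channel_mix(sample)
for tag, share in mix.items():
    print(f"{tag}: {share:.0%}")
```

Rejecting unknown tags up front keeps labeling typos from silently skewing the shares that drive the rollout decision.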