
Chat Isn’t One Thing

Synchronous chat (live web or in-app, where users expect a reply in seconds) differs fundamentally from asynchronous chat (SMS, WhatsApp, email, where minutes to hours is acceptable). Voice differs from both: turn detection must be near-instant, errors are more conspicuous, and accessibility needs differ. Multimodal chat (with an image, video, screenshare, or document attachment) differs again, because interpreting the attached artifact is a different design problem than interpreting words. Design per channel, not per “conversational” abstraction.

A useful 2x2: synchronous vs asynchronous, voice vs text. Each quadrant has its own affordances:

Mode | Synchronous | Asynchronous
Text | Live chat, in-app | SMS, WhatsApp, email
Voice | Phone, voice agent | Voicemail, async voice notes

Multimodal cuts across all four.

Turn-Taking Design

Make turn boundaries clear: the user must always know whether it’s their move or the system’s. In sync chat, show typing indicators and “thinking” animations during model inference. In voice, use audible micro-cues (a subtle tone, a brief breathing pause) to signal that the agent is processing without leaving dead air. In async, show status indicators (delivered, read, agent typing) and set realistic expectations for response time.
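
For the async case, “realistic time expectations” can come from live queue data rather than a fixed promise. A minimal sketch, assuming a hypothetical queue snapshot (QueueSnapshot, with the user's position, active agent count, and recent average handle time); the estimate is a deliberately crude approximation, rounded up to a human-friendly bucket so the promise errs long rather than short.

```python
import math
from dataclasses import dataclass


@dataclass
class QueueSnapshot:
    """Hypothetical live queue stats; field names are illustrative."""
    position: int              # conversations ahead of this user
    active_agents: int         # agents currently working the async queue
    avg_handle_minutes: float  # recent average time per conversation


# Promises we are willing to make, in minutes, checked from shortest to longest.
BUCKETS = [(15, "within 15 minutes"), (60, "within an hour"),
           (240, "within 4 hours"), (1440, "within 24 hours")]


def expected_wait_minutes(q: QueueSnapshot) -> float:
    """Work ahead of the user (+1 for their own conversation) spread across agents."""
    if q.active_agents <= 0:
        return math.inf
    return (q.position + 1) * q.avg_handle_minutes / q.active_agents


def status_message(q: QueueSnapshot) -> str:
    """Round the estimate UP to the next bucket so the promise errs on the long side."""
    wait = expected_wait_minutes(q)
    for limit, phrase in BUCKETS:
        if wait <= limit:
            return f"Message received. A specialist will reply {phrase}."
    return "Message received. A specialist will reply by the next business day."
```

For example, three conversations ahead, two agents, and a 10-minute average handle time give an estimate of 20 minutes, which is promised as “within an hour.”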

Voice-specific concerns:

  • Endpointing (detecting when the user has stopped speaking) is the hardest UX problem in voice. Tune the trade-off between under-cutting (interrupting too soon) and over-waiting (long, awkward silences) per language and user population.
  • Barge-in: let the user interrupt the agent mid-utterance. This is critical for accessibility and for impatient users.
  • Filler detection: “um” and “uh” should not trigger turn-taking.
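
The three concerns interact, so they are easiest to reason about as one small decision function. A minimal sketch, assuming a hypothetical upstream voice-activity detector (VAD) that reports whether the user is speaking and how long they have been silent, and a speech recognizer that keeps disfluencies like “um” in its partial transcripts (some engines drop them by default). The thresholds are placeholders to tune per language and population, not recommendations.

```python
from dataclasses import dataclass

# Placeholder thresholds in milliseconds; tune per language and user population.
BASE_SILENCE_MS = 700     # silence that ends a turn after an ordinary word
FILLER_SILENCE_MS = 1500  # longer grace period when the turn trails off in a filler

FILLERS = {"um", "uh", "erm", "hmm"}


@dataclass
class TurnState:
    transcript: str       # partial transcript of the user's current turn
    silence_ms: int       # ms since the VAD last detected user speech
    agent_speaking: bool  # is TTS playback active?
    user_speaking: bool   # is the VAD currently detecting user speech?


def ends_with_filler(transcript: str) -> bool:
    words = transcript.lower().strip().rstrip(".,?!").split()
    return bool(words) and words[-1] in FILLERS


def decide(state: TurnState) -> str:
    """Return 'barge_in', 'end_turn', or 'keep_listening'."""
    if state.user_speaking:
        # Barge-in: user speech during playback should cut the agent off at once.
        return "barge_in" if state.agent_speaking else "keep_listening"
    if state.agent_speaking or not state.transcript.strip():
        # Agent holds the turn, or the user hasn't said anything yet
        # (no-input reprompts belong to a separate timer, not endpointing).
        return "keep_listening"
    # Filler detection: a trailing "um" means the user is probably still thinking.
    limit = FILLER_SILENCE_MS if ends_with_filler(state.transcript) else BASE_SILENCE_MS
    return "end_turn" if state.silence_ms >= limit else "keep_listening"
```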

The user should always know what the system is doing — the most common voice-agent failure is leaving the user uncertain whether the agent heard them.

Error Recovery

When the AI fails, make recovery obvious. “I didn’t quite catch that — could you rephrase?” is better than a cryptic error or silent retry. Offer concrete alternatives:

  • Human escalation in one tap or one phrase (“connect me to a person”).
  • Reformulated input (“you can try saying it differently, or pick from these options: …”).
  • Related capability (“I can’t book that flight, but I can help you check your itinerary”).
  • Async fallback (“I’ll have a specialist email you within 24 hours”).

Never loop. After two failed clarification turns, escalate; don’t ask a third time.
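
The two-strike rule is simple enough to enforce in code rather than leaving it to prompt instructions. A minimal sketch, assuming a hypothetical per-conversation RecoveryState and an upstream step that has already decided whether the last turn was understood; the prompt wording is illustrative and mirrors the recovery options listed above.

```python
from dataclasses import dataclass

MAX_FAILED_CLARIFICATIONS = 2  # two misses, then escalate; never ask a third time

# Illustrative wording; the clarify prompt offers concrete alternatives, not a bare retry.
PROMPTS = {
    "clarify": ("I didn't quite catch that. You can say it differently, pick from "
                "the options below, or say \"connect me to a person\"."),
    "escalate": "Let me connect you with a person who can help.",
}


@dataclass
class RecoveryState:
    failed_clarifications: int = 0
    escalated: bool = False


def next_action(state: RecoveryState, understood: bool, asked_for_human: bool) -> str:
    """Return 'continue', 'clarify', or 'escalate' for the latest user turn."""
    # One phrase always reaches a human, regardless of the failure count.
    if asked_for_human:
        state.escalated = True
        return "escalate"
    if understood:
        state.failed_clarifications = 0  # a successful turn resets the count
        return "continue"
    state.failed_clarifications += 1
    if state.failed_clarifications >= MAX_FAILED_CLARIFICATIONS:
        state.escalated = True
        return "escalate"
    return "clarify"
```

Resetting the counter after a successful turn is a design choice; a stricter variant counts misses across the whole conversation.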

Multimodal Patterns

When the user attaches a screenshot, image, or document, acknowledge receipt explicitly and reflect what was understood (“I see your invoice from March 15 for $1,247 — is this the one you’re asking about?”). Misinterpretation is more confusing in multimodal because users expect the AI to “see” what they see.
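
One way to make the “reflect what was understood” step systematic is to build the confirmation from the fields that were actually extracted, and to name anything that could not be read instead of guessing. A minimal sketch, assuming a hypothetical extraction step that returns a dict of fields with None for unreadable values; the field names date and total are illustrative.

```python
from typing import Optional


def confirmation_message(doc_type: str, fields: dict[str, Optional[str]]) -> str:
    """Echo back what was extracted, name what wasn't, then ask before acting."""
    missing = [name for name, value in fields.items() if not value]
    msg = f"I see your {doc_type}"
    if fields.get("date"):
        msg += f" from {fields['date']}"
    if fields.get("total"):
        msg += f" for {fields['total']}"
    msg += "."
    if missing:
        # Say what couldn't be read instead of silently guessing.
        msg += f" I couldn't read the {', '.join(missing)}."
    return msg + " Is this the one you're asking about?"
```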

Image generation in conversation (chart of account history, visual confirmation of an action) carries its own design weight: alt text mandatory, contrast tested, no critical information conveyed only visually.
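
“Contrast tested” can be automated. The sketch below implements the WCAG 2.x relative-luminance and contrast-ratio formulas and applies them, together with an alt-text presence check, to a hypothetical ChartSpec structure assumed for illustration. The 3:1 default is the WCAG minimum for graphical objects such as chart lines and bars (Success Criterion 1.4.11, Non-text Contrast); text inside the image would need 4.5:1 at level AA (SC 1.4.3).

```python
from dataclasses import dataclass

RGB = tuple[int, int, int]


def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light.

    WCAG 2.0/2.1 use 0.03928 here; the sRGB value 0.04045 gives identical
    results for 8-bit channels, since no k/255 falls between the two.
    """
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(a: RGB, b: RGB) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05); ranges from 1 to 21."""
    la, lb = relative_luminance(a), relative_luminance(b)
    return (max(la, lb) + 0.05) / (min(la, lb) + 0.05)


@dataclass
class ChartSpec:  # hypothetical structure for a generated chart
    alt_text: str
    background: RGB
    series_colors: list[RGB]


def accessibility_problems(chart: ChartSpec, min_ratio: float = 3.0) -> list[str]:
    """List problems that should block sending the chart; empty means it passes."""
    problems = []
    if not chart.alt_text.strip():
        problems.append("missing alt text")
    for color in chart.series_colors:
        ratio = contrast_ratio(color, chart.background)
        if ratio < min_ratio:
            problems.append(f"series color {color} has contrast {ratio:.2f}:1")
    return problems
```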

Accessibility (Non-Optional)

Voice interfaces must work for users with speech differences, accents, dysarthria, and stuttering. Always provide a parallel text path. Chat must work with screen readers: ARIA live regions throttled per sentence, focus management on dynamic content, and no decorative-only elements that require sight. Video must have captions and transcripts. Generated images need alt text.
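
Streaming model output token by token into an ARIA live region tends to produce choppy, repeated, or interrupted screen-reader announcements; buffering to sentence boundaries keeps each announcement coherent. A minimal sketch, assuming the client appends each released sentence to a polite live region (for example, an element with aria-live="polite"). SentenceBuffer is a hypothetical helper, and its regex splitter is deliberately naive.

```python
import re

# A sentence ends at ., !, or ? followed by whitespace. Deliberately naive:
# abbreviations such as "e.g. " will also be treated as sentence ends.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class SentenceBuffer:
    """Accumulate streamed tokens and release only complete sentences."""

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, token: str) -> list[str]:
        """Add a token; return any sentences that are now complete."""
        self._pending += token
        parts = _SENTENCE_END.split(self._pending)
        # The last part has no terminator-plus-whitespace after it yet; keep buffering it.
        self._pending = parts.pop()
        return [p for p in parts if p.strip()]

    def flush(self) -> list[str]:
        """Release whatever remains when the stream ends."""
        rest, self._pending = self._pending.strip(), ""
        return [rest] if rest else []
```

Decimal points such as “$40.50” are not split because no whitespace follows the period; a production splitter would need a proper sentence-segmentation library for abbreviations and other languages.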

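Article 16 of the EU AI Act requires providers of high-risk AI systems to ensure those systems comply with EU accessibility requirements. WCAG 2.2 AA is the current W3C standard, though many legal frameworks still reference WCAG 2.1 AA. The DOJ’s April 2024 ADA Title II rule adopts WCAG 2.1 AA for state and local government web content and mobile apps, which covers public-facing AI delivered through them; compliance is due in April 2026 for larger entities (populations of 50,000 or more) and April 2027 for smaller ones. Accessibility is a design discipline, not a post-launch retrofit.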

Common Failure Modes

  • Designing only for the happy path and treating failure UX as an afterthought, even though failures are what define the brand.
  • Voice agents with no barge-in — users feel trapped.
  • “Talk to human” buried three menus deep.
  • Async chat with no SLA expectation set — user thinks they were ignored.
  • Multimodal that ignores the modality — generates text descriptions of images the user already sent.

What to Do This Week

Use your own conversational CRM surface for one real customer task. Note every moment of friction. That list is your design backlog.
