The Virtual Agent project shipped 47 topics, and the containment rate plateaued at 19%. The post-mortem traced the problem to pre-work the team had skipped: topic priorities were guesses, NLU training was thin, and the handoff confused users instead of helping them. Building a Virtual Agent that actually deflects work is straightforward when the foundations are right and impossible when they are not.
The Pre-Work You Can’t Skip
Before building topics, decide which channel surfaces the agent (Agent Chat, Microsoft Teams, Slack, Service Portal), which NLU model handles intent classification, who owns ongoing training data and review, and what “success” means. Deflection rate is the obvious metric and the misleading one: containment (resolved without a human) and CSAT (user satisfied with the resolution) matter more. A high deflection number with low CSAT means users gave up, not that the agent solved their problem.
Run a two-week ticket sample analysis. The top five intents usually account for 40% or more of volume. Build those first; resist the urge to ship a topic for every category.
Pre-build checklist:
- sampled 2 weeks of inbound tickets
- ranked intents by volume
- identified top 5 (covers 40%+ of volume)
- confirmed channel(s) and SSO/auth model
- assigned NLU training data owner
- defined success metrics with baseline
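The ranking step can be sketched in plain JavaScript. This assumes the two-week sample has already been exported and each ticket carries an intent label (for example, assigned during manual review); the `intent` field and the `rankIntents` helper are hypothetical, not part of any platform API.

```javascript
// Ranks intents by ticket volume and reports the share of the sample the top N cover.
// Assumes each ticket has a hypothetical `intent` label assigned during review.
function rankIntents(tickets, topN = 5) {
  if (tickets.length === 0) return { ranked: [], top: [], topShare: 0 };
  const counts = new Map();
  for (const t of tickets) {
    counts.set(t.intent, (counts.get(t.intent) || 0) + 1);
  }
  // Highest volume first; ties keep first-seen order because Array.prototype.sort is stable.
  const ranked = [...counts.entries()]
    .map(([intent, count]) => ({ intent, count, share: count / tickets.length }))
    .sort((a, b) => b.count - a.count);
  const top = ranked.slice(0, topN);
  const topCount = top.reduce((sum, r) => sum + r.count, 0);
  return { ranked, top, topShare: topCount / tickets.length };
}
```

Run it over the sampled tickets and compare `topShare` for the top five against the 40% rule of thumb before committing to a build order.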
Topics, Not Conversations
A topic is a unit of work (request software, reset password, check ticket status, look up PTO balance). Keep topics narrow — one task per topic. Long conversational topics that try to handle multiple workflows become unmaintainable and produce confusing transitions when the user pivots mid-conversation. Include a clear escape hatch at every step so users can reach a human without abandoning the session.
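One way to guarantee an escape hatch at every step is to check each user reply for a request to reach a person before the topic interprets it. The sketch below is plain JavaScript, not the Virtual Agent Designer data model; the topic structure, the `ESCAPE_REQUEST` pattern, and the `runTopic` helper are all hypothetical.

```javascript
// A narrow topic: one task, a short list of steps, and an escape check on every reply.
// Hypothetical structure for illustration; not the Virtual Agent Designer data model.
const ESCAPE_REQUEST = /\b(agents?|humans?|representatives?)\b/i;

const requestSoftware = {
  name: "Request software",
  steps: [
    { slot: "software", prompt: "Which software do you need?" },
    { slot: "justification", prompt: "What do you need it for?" },
  ],
};

// Walks scripted user replies through the topic. Before a reply is used at any step,
// it is checked for a request to reach a person; an empty reply is treated as a dead end.
function runTopic(topic, replies) {
  const slots = {};
  for (let i = 0; i < topic.steps.length; i++) {
    const step = topic.steps[i];
    const reply = (replies[i] || "").trim();
    if (ESCAPE_REQUEST.test(reply)) return { outcome: "handoff", reason: "user asked", at: step.slot, slots };
    if (reply === "") return { outcome: "handoff", reason: "dead end", at: step.slot, slots };
    slots[step.slot] = reply;
  }
  return { outcome: "completed", slots };
}
```

Because the handoff result carries the slots already collected, a live agent picking up the conversation can see what the user has answered so far.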
NLU Training
Give each topic 20-40 example utterances. Include misspellings, slang, abbreviations, alternative phrasings, and the way users actually talk in your organization (jargon, internal product names, shorthand). The pre-trained model does the heavy lifting on grammar and general intent; your examples teach it the vocabulary and phrasing specific to your environment. Re-train monthly against the previous month’s actual user inputs.
Sample utterances for "Reset Password":
reset my password
i need to change my password
forgot password help
pwd reset pls
cant log in, need new password
password expired what now
unlock my account and reset password
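A small pre-training check can catch topics that fall outside the 20-40 range or that pad the count with near-duplicates. This is a sketch in plain JavaScript; the normalization rule (lowercase, strip punctuation, collapse whitespace, which assumes English ASCII text) and the `checkUtterances` helper are assumptions, not part of the NLU tooling.

```javascript
// Flags a topic's training utterances that are too few, too many, or duplicated
// after simple normalization. Bounds default to the 20-40 guideline.
function normalize(u) {
  // Assumes ASCII text: drops punctuation (so "can't" and "cant" collide) and extra spaces.
  return u.toLowerCase().replace(/[^a-z0-9\s]/g, "").replace(/\s+/g, " ").trim();
}

function checkUtterances(utterances, min = 20, max = 40) {
  const seen = new Map();
  const duplicates = [];
  for (const u of utterances) {
    const key = normalize(u);
    if (seen.has(key)) duplicates.push([seen.get(key), u]);
    else seen.set(key, u);
  }
  const unique = seen.size;
  const problems = [];
  if (unique < min) problems.push(`only ${unique} unique utterances (minimum ${min})`);
  if (unique > max) problems.push(`${unique} unique utterances (maximum ${max})`);
  if (duplicates.length) problems.push(`${duplicates.length} near-duplicate(s)`);
  return { unique, duplicates, problems };
}
```

Run against the seven "Reset Password" samples above, it would report that the topic still needs more examples before it reaches the 20-utterance floor.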
Human Handoff
Handoff rules matter as much as the topics themselves. Configure handoff when confidence is below a threshold (start at 0.6, tune from data), when the user says “agent” or “human” or equivalents, when the same intent fires twice without progress, and when the topic explicitly hits a dead end. The handoff carries full conversation context so the live agent does not start cold and ask the user to repeat everything.
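```javascript
// Handoff condition example. Field names on `conversation` are illustrative.
const CONFIDENCE_THRESHOLD = 0.6; // starting point; tune from observed data
// Whole words only, so e.g. "Humanities" does not trigger a handoff.
const HUMAN_REQUEST = /\b(agents?|humans?|representatives?)\b/i;

function shouldHandoff(conversation) {
  if (conversation.last_intent_confidence < CONFIDENCE_THRESHOLD) return true;
  if (HUMAN_REQUEST.test(conversation.last_user_input || "")) return true;
  if (conversation.repeated_intent_no_progress >= 2) return true;
  if (conversation.topic_dead_end) return true; // topic explicitly hit a dead end
  return false;
}
```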
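“Tune from data” can be made concrete with reviewed transcripts: for each candidate threshold, measure how often the conversations the agent would keep were classified correctly, and how many it would hand off. The sketch assumes a hypothetical log of `{ confidence, correct }` records produced by reviewing transcripts; the candidate list and the 0.9 target accuracy are illustrative choices, not platform defaults.

```javascript
// Picks the lowest confidence threshold whose kept conversations meet a target accuracy.
// `logs` is a hypothetical array of { confidence: number, correct: boolean } records
// produced by reviewing transcripts and marking whether the top intent was right.
function tuneThreshold(logs, targetAccuracy = 0.9, candidates = [0.4, 0.5, 0.6, 0.7, 0.8]) {
  const report = candidates.map(threshold => {
    // Conversations at or above the threshold stay with the agent (handoff is `< threshold`).
    const kept = logs.filter(l => l.confidence >= threshold);
    const correct = kept.filter(l => l.correct).length;
    return {
      threshold,
      accuracy: kept.length ? correct / kept.length : 0,
      handoffRate: logs.length ? (logs.length - kept.length) / logs.length : 0,
    };
  });
  // Candidates are ascending, so the first match keeps the most conversations in the agent.
  const best = report.find(r => r.accuracy >= targetAccuracy);
  return { best: best ? best.threshold : null, report };
}
```

Picking the lowest acceptable threshold addresses the failure mode of a threshold set too high, where the agent escalates more often than the data justifies.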
Measuring What Matters
Track three numbers and publish them weekly to operations leadership:
- containment: conversations resolved without a human handoff
- deflection: conversations resolved without a ticket being created
- CSAT: user satisfaction measured after the conversation
A 60% containment rate is respectable; 80%+ is excellent but rarely achievable uniformly across topics. Some topics (password reset) can hit 90%+; others (complex troubleshooting) plateau lower, and that is fine. The metric exists to drive the operations conversation, not to serve as a target for shaming individual topics.
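The three metrics can be computed per topic from exported conversation and survey records. The record shape below (`topic`, `resolved`, `handedOff`, `ticketCreated`, `csat`) is hypothetical; map it onto whatever your conversation and survey data actually contain. Following the definitions in this section, containment counts resolved conversations with no handoff, and deflection counts resolved conversations with no ticket.

```javascript
// Per-topic containment, deflection, and average CSAT from conversation records.
// Record fields are hypothetical: { topic, resolved, handedOff, ticketCreated, csat (number or null) }.
function topicMetrics(conversations) {
  const byTopic = new Map();
  for (const c of conversations) {
    if (!byTopic.has(c.topic)) byTopic.set(c.topic, []);
    byTopic.get(c.topic).push(c);
  }
  const rows = [];
  for (const [topic, convs] of byTopic) {
    const n = convs.length;
    const contained = convs.filter(c => c.resolved && !c.handedOff).length;
    const deflected = convs.filter(c => c.resolved && !c.ticketCreated).length;
    const scores = convs.map(c => c.csat).filter(s => typeof s === "number");
    rows.push({
      topic,
      conversations: n,
      containment: contained / n,
      deflection: deflected / n,
      // Average survey score, or null when nobody answered the survey.
      csat: scores.length ? scores.reduce((a, b) => a + b, 0) / scores.length : null,
    });
  }
  return rows;
}
```

Reporting per topic rather than as one blended number keeps a 90% password-reset topic from hiding a troubleshooting topic that needs work.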
Common Failure Modes
- Topics shipped without a tested handoff path: the user reaches a dead end and disengages. Always ship a tested handoff.
- Topics that ask for information the system already knows about the user: this frustrates users. Pre-fill from the user’s profile instead.
- Confidence thresholds set too high: the agent escalates too readily, and users conclude it is useless. Tune thresholds from observed data.
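Pre-filling mostly comes down to asking only for what the profile cannot answer. The sketch below assumes a hypothetical `profile` object already loaded for the signed-in user and a topic that declares which inputs it needs; the `planInputs` helper and all field names are illustrative.

```javascript
// Splits a topic's required inputs into values pre-filled from the user's profile
// and the questions that still need to be asked. Field names are hypothetical.
function planInputs(requiredFields, profile) {
  const prefilled = {};
  const toAsk = [];
  for (const field of requiredFields) {
    const value = profile[field];
    // Treat missing, null, and empty-string values as unknown, so the topic asks for them.
    if (value !== undefined && value !== null && value !== "") prefilled[field] = value;
    else toAsk.push(field);
  }
  return { prefilled, toAsk };
}
```

A topic built this way only prompts for `toAsk`, and can show the pre-filled values for confirmation rather than asking for them again.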
What Changed in 2026
The 2026 release added multi-turn context retention across sessions, so longer workflows that previously had to restart on every visit are now usable. NLU re-ranking via Now Assist (where licensed) refines the intent as the conversation progresses rather than committing to the initial classification. The Service Operations Workspace integration shows live conversations to supervising agents, who can intervene proactively.
Implementation Sequence
Pilot one channel with the top three topics for 30 days. Measure containment, deflection, CSAT. Refine the topics based on actual conversation logs (read the transcripts; the data is in sys_cs_conversation). Add the next two topics. Expand to additional channels only after the first channel is stable. The “ship 50 topics across 4 channels in the first quarter” approach produces a Virtual Agent nobody trusts.
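The “expand only when stable” rule is easier to apply consistently if the gate is written down. The sketch below encodes one possible gate: at least four weekly snapshots (roughly the 30-day pilot), with containment and CSAT above chosen floors and little week-over-week swing in containment. The `readyToExpand` helper, the snapshot shape, and every default threshold are illustrative assumptions; set your own from the baseline you defined before building.

```javascript
// Decides whether a pilot channel is stable enough to add topics or channels.
// `weeks` is a hypothetical array of weekly snapshots: { containment, csat } (CSAT on a 1-5 scale).
// All default thresholds are illustrative assumptions; set your own from the baseline.
function readyToExpand(weeks, opts = {}) {
  const {
    minWeeks = 4,        // roughly the 30-day pilot
    minContainment = 0.6,
    minCsat = 4.0,
    maxSwing = 0.1,      // largest allowed week-over-week change in containment
  } = opts;
  if (weeks.length < minWeeks) return { ready: false, reason: "pilot too short" };
  const recent = weeks.slice(-minWeeks);
  if (recent.some(w => w.containment < minContainment)) return { ready: false, reason: "containment below floor" };
  if (recent.some(w => w.csat < minCsat)) return { ready: false, reason: "CSAT below floor" };
  for (let i = 1; i < recent.length; i++) {
    if (Math.abs(recent[i].containment - recent[i - 1].containment) > maxSwing) {
      return { ready: false, reason: "containment not yet stable" };
    }
  }
  return { ready: true, reason: "stable" };
}
```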
What to do this week: pick the single top intent from your last two weeks of tickets and design one Virtual Agent topic for exactly that intent; everything else can wait until the first one ships.