
Two years ago every CRM agent was a monolith. In 2026 the agent count per workflow is closer to five — a planner, a retriever, a writer, a verifier, a notifier — and the orchestration shape decides whether the system is reliable or theatrical. Three patterns dominate: supervisor, swarm, pipeline. Each has a specific failure mode. Picking the wrong one is the most common architectural mistake we see this year.

The three patterns, named

Supervisor. One agent (the supervisor) routes work to specialist sub-agents and synthesizes results. Examples: LangGraph’s supervisor pattern, AutoGen’s GroupChat with a moderator, CrewAI’s hierarchical mode, OpenAI’s Assistants API multi-tool agent.

Swarm. Peer agents pass work to each other via handoff, with no central coordinator. Handoff is explicit but distributed. Examples: OpenAI Swarm (the framework), Anthropic’s multi-agent research demo, the Agentforce 360 “Agent Network” announcement.

Pipeline. A linear or DAG-shaped sequence of agents, each with a fixed role. The output of agent N is the input of agent N+1, like a Unix pipe. Examples: Inngest, Temporal, Step Functions, dbt-style agent orchestration.

These map to organizational shapes. Supervisor = manager + team. Swarm = peer engineers. Pipeline = assembly line. The right shape depends on whether the work is decomposable upfront, dynamic, or fixed.

Supervisor: when the planner is the bottleneck

The supervisor decomposes incoming work, picks a specialist, hands off, evaluates, possibly delegates again, and synthesizes.

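# supervisor.py (pseudo)
# supervisor_llm, the *_agent objects, and escalate() are placeholders.
MAX_STEPS = 8  # hard cap so a confused planner cannot loop forever

specialists = {
    "research": research_agent,
    "draft":    draft_agent,
    "verify":   verify_agent,
    "send":     notify_agent,
}

def supervisor(task):
    state = {"task": task, "history": [], "result": None}
    for _ in range(MAX_STEPS):
        decision = supervisor_llm.plan(state)   # which specialist next?
        if decision.action == "complete":
            return state["result"]
        result = specialists[decision.specialist].run(decision.subtask, state)
        state["history"].append((decision, result))
        state["result"] = result                # latest output; a real supervisor would synthesize here
    return escalate(state)                      # step budget exhausted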

When it works. When the task space is varied, sub-tasks are loosely defined, and a smart router adds real value (account research, customer support investigations, opportunity strategy).

Failure mode. The supervisor is a single point of intelligence. If it picks wrong, the whole chain wastes tokens. Supervisor LLM cost dominates — every step pays the planning tax. Latency stacks (supervisor → specialist → supervisor → specialist).

Cost shape. Highest per-task cost of the three patterns. Best per-task quality on novel work.

Swarm: when handoffs are explicit and local

Each agent has a defined role and an explicit set of handoffs it can make. No central planner. The agent currently holding the task decides who to hand it to (or to finish).

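# swarm.py (pseudo)
from dataclasses import dataclass, field

@dataclass
class Agent:
    role: str
    handoffs: list["Agent"] = field(default_factory=list)

    def run(self, task):
        # self.llm is a placeholder for this agent's model client
        result, next_agent = self.llm.decide(task, self.handoffs)
        if next_agent is None:
            return result
        return next_agent.run(result)   # no hop limit here; production code needs one

# Define agents before anything that hands off to them.
support = Agent("support")
escalation = Agent("escalation")
sales = Agent("sales")
billing = Agent("billing", handoffs=[support, escalation])
triage = Agent("triage", handoffs=[billing, support, sales])
# ...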

When it works. Customer service flows with clear specializations. The triage agent hands to billing, billing might hand back or escalate, support handles the catch-all. Each handoff is a local decision, no global plan.

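Failure mode. Looping. Without a circuit breaker, agents can ping-pong indefinitely. The second problem is discovery: when a new specialist is added, every existing agent that should be able to reach it needs its handoff list updated. With n agents there are up to n(n-1) directed handoff edges to maintain, so the wiring grows faster than the agent count.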
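A circuit breaker for the looping failure can be sketched as below. This is a minimal illustration, not any framework's API: `run_swarm`, `HandoffLimitExceeded`, and the `MAX_HOPS` value are hypothetical names, and each agent's `decide` callable stands in for a model call. The key design choice is driving handoffs from a loop rather than recursion, so hops can be counted in one place.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_HOPS = 6  # assumed budget; tune per workflow


class HandoffLimitExceeded(Exception):
    """Raised when a task is handed off more times than the budget allows."""


@dataclass
class Agent:
    role: str
    # decide(task) -> (result, role of the next agent, or None to finish)
    decide: Callable[[str], tuple[str, Optional[str]]]


def run_swarm(agents: dict[str, Agent], start: str, task: str,
              max_hops: int = MAX_HOPS) -> str:
    """Run handoffs in a loop and stop once max_hops decisions have been made."""
    current, payload, path = start, task, [start]
    for _ in range(max_hops):
        payload, next_role = agents[current].decide(payload)
        if next_role is None:
            return payload              # the holding agent chose to finish
        path.append(next_role)
        current = next_role
    # Budget spent: surface the path so a human can see where it looped.
    raise HandoffLimitExceeded(" -> ".join(path))
```

The caller catches `HandoffLimitExceeded` and routes the task to a human queue; the path in the exception message shows which agents were passing the task around.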

Cost shape. Lower than supervisor on simple flows. Comparable on complex flows. Latency is the dominant complaint.

Pipeline: when the shape of work is fixed

The flow is known in advance: a lead comes in → enrich → score → assign → notify. Each step is an agent (or a tool call). The shape is a DAG, not an arbitrary graph: no cycles and no runtime rerouting.

# pipeline.yaml
pipeline: lead_to_meeting
steps:
  - id: enrich
    agent: enrichment_agent
    inputs: [lead]
    outputs: [enriched_lead]
    timeout_s: 30
  - id: score
    agent: scoring_agent
    inputs: [enriched_lead]
    outputs: [score, band]
    timeout_s: 5
  - id: route
    agent: routing_agent
    inputs: [enriched_lead, band]
    outputs: [assigned_ae]
    timeout_s: 10
  - id: outreach_draft
    agent: writer_agent
    inputs: [enriched_lead, assigned_ae]
    outputs: [draft_email]
    timeout_s: 20
  - id: human_review
    type: approval
    approvers: [assigned_ae]
    timeout_s: 86400
  - id: send
    agent: send_agent
    inputs: [draft_email]

When it works. Repeatable workflows. Compliance-bound flows where every step needs to be auditable. High-volume operations where determinism matters more than intelligence.

Failure mode. Inflexibility. The first time a lead requires an extra step (verification, dedup, multi-language translation), the pipeline either skips that step or breaks. Pipelines accumulate special cases until they become hairballs.

Cost shape. Cheapest per task. Most predictable latency. Lowest quality on tasks that don’t fit the shape.
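Because the shape is fixed, a pipeline config can be checked before it ever runs. The sketch below assumes the step format from the YAML above, already loaded into Python dicts; `validate_pipeline` is a hypothetical helper, not part of any orchestration tool. It confirms that every input a step declares is produced by an earlier step or supplied at the start.

```python
def validate_pipeline(steps: list[dict], initial_inputs: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the wiring is consistent."""
    available = set(initial_inputs)
    seen_ids: set[str] = set()
    problems: list[str] = []
    for step in steps:
        step_id = step["id"]
        if step_id in seen_ids:
            problems.append(f"duplicate step id: {step_id}")
        seen_ids.add(step_id)
        for name in step.get("inputs", []):
            if name not in available:
                problems.append(
                    f"{step_id}: input '{name}' is not produced by an earlier step"
                )
        # Outputs become available to every later step.
        available.update(step.get("outputs", []))
    return problems


steps = [
    {"id": "enrich", "inputs": ["lead"], "outputs": ["enriched_lead"]},
    {"id": "score", "inputs": ["enriched_lead"], "outputs": ["score", "band"]},
    {"id": "route", "inputs": ["enriched_lead", "band"], "outputs": ["assigned_ae"]},
]
```

Running a check like this in CI is one way to catch a special-case step that was bolted on without its inputs being wired.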

Comparison

Dimension            Supervisor       Swarm                  Pipeline
Task variety         High             Medium                 Low
Cost / task          High             Medium                 Low
Latency              High             Medium                 Low
Determinism          Low              Medium                 High
Debuggability        Hard             Medium                 Easy
Audit fitness        OK               OK                     Excellent
Add new capability   Add specialist   Update handoff lists   Add step + rewire
Failure mode         Bad planning     Infinite handoff       Special-case bloat

The hybrid most teams land on

Pure patterns are rare in production. Common compositions:

  • Pipeline of supervisors. Fixed phases (intake → execute → close), supervisor within each phase. Combines auditability with flexibility.
  • Swarm under supervisor. Supervisor routes to a swarm of equivalent specialists; the swarm handles handoff among themselves. Used in customer service.
  • Pipeline with optional supervisor branch. Default fast path, escalate to supervisor when confidence drops. The escape hatch.

Match the shape to the workflow. Customer service tier-1: pipeline with supervisor escape. Account research: supervisor. Inbound triage: swarm.
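The pipeline-with-escape-hatch composition can be sketched in a few lines. Everything here is an assumption for illustration: steps are plain callables that return a confidence score alongside their output, `CONFIDENCE_FLOOR` is an arbitrary threshold, and `supervisor` is whatever callable takes over when the fast path gives up.

```python
from typing import Callable

CONFIDENCE_FLOOR = 0.7  # hypothetical threshold; calibrate against real outcomes

# A step takes the workflow state and returns (new_state, confidence in [0, 1]).
Step = Callable[[dict], tuple[dict, float]]


def run_with_escape(steps: list[tuple[str, Step]], state: dict,
                    supervisor: Callable[[str, dict], dict],
                    floor: float = CONFIDENCE_FLOOR) -> dict:
    """Run the fixed fast path; hand off to the supervisor on low confidence."""
    for name, step in steps:
        new_state, confidence = step(state)
        if confidence < floor:
            # Discard the low-confidence output and give the supervisor the
            # last trusted state plus the name of the step that balked.
            return supervisor(name, state)
        state = new_state
    return state
```

Passing the supervisor the state from before the weak step, rather than after it, keeps a doubtful output from contaminating the slow path. Whether that is the right call depends on the workflow.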

State management: the part nobody talks about

All three patterns require shared state between agents. The state object grows, accumulates context, and becomes the single most expensive thing in every prompt.

Three approaches:

  • Pass-everything. Each agent gets the full history. Simple, expensive, hits context limits fast.
  • Curated state. A state-management layer summarizes between hops. Cheaper, lossy, requires its own model calls.
  • Append-only event log + selective replay. Each agent reads only the events relevant to it. Most efficient at scale; most complex to build.

For workflows under 10 steps, pass-everything works. Above that, curate.
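The third approach, an append-only log with selective replay, might look like the following in its simplest in-memory form. `Event`, `EventLog`, and the event kinds are hypothetical names; a production version would sit on a durable store and likely filter on more than the event kind.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Event:
    seq: int        # position in the log; never reused
    kind: str       # e.g. "enrichment", "score", "draft"
    payload: dict


@dataclass
class EventLog:
    _events: list[Event] = field(default_factory=list)

    def append(self, kind: str, payload: dict) -> Event:
        """Add an event at the end; existing events are never modified."""
        event = Event(len(self._events), kind, payload)
        self._events.append(event)
        return event

    def replay(self, kinds: set[str], last_n: Optional[int] = None) -> list[Event]:
        """Return only the events an agent subscribes to, oldest first."""
        selected = [e for e in self._events if e.kind in kinds]
        if last_n is None:
            return selected
        return selected[max(0, len(selected) - last_n):]
```

A writer agent might call `log.replay({"enrichment", "score"}, last_n=5)` and never see verifier chatter, which is where the token savings come from.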

Observability across patterns

Whatever shape you pick, instrument identically:

  • One trace per workflow invocation.
  • Each agent execution = one span.
  • Each tool call = one child span.
  • Decision points (supervisor routing, swarm handoff, pipeline branch) = annotated events.
  • Outcome (completed, escalated, failed, timed out) = root span status.

Use the OpenTelemetry GenAI conventions, the same ones used for single-agent evaluation. The multi-agent topology is visible in the parent-child relationships between spans. Without traces, debugging a multi-agent failure is like reading a Ouija board.
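To make the span layout concrete, here is a stdlib-only stand-in. It is deliberately not the OpenTelemetry SDK: `Span` and `Tracer` are toy classes written for this illustration, and the status strings are assumptions. It shows only the nesting described in the list above: one root span per workflow, agent spans beneath it, tool spans beneath those, and routing decisions as events.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    kind: str                      # "workflow", "agent", or "tool"
    status: str = "unset"
    events: list[str] = field(default_factory=list)
    children: list["Span"] = field(default_factory=list)


class Tracer:
    """Toy tracer: builds one span tree per workflow invocation."""

    def __init__(self) -> None:
        self.root = None
        self._stack: list[Span] = []

    @contextmanager
    def span(self, name: str, kind: str):
        s = Span(name, kind)
        if self._stack:
            self._stack[-1].children.append(s)   # nest under the active span
        else:
            self.root = s                        # first span is the workflow root
        self._stack.append(s)
        try:
            yield s
            if s.status == "unset":              # keep a status set by the caller
                s.status = "completed"
        except Exception:
            s.status = "failed"
            raise
        finally:
            self._stack.pop()


tracer = Tracer()
with tracer.span("lead_to_meeting", "workflow") as wf:
    with tracer.span("supervisor", "agent") as sup:
        sup.events.append("route -> research")   # decision point as an event
        with tracer.span("crm_lookup", "tool"):
            pass
    wf.status = "escalated"                      # outcome lives on the root span
```

In a real system the same structure would be emitted through an OpenTelemetry tracer so that existing backends can render it.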

Vendor implementations in 2026

  • Salesforce Agentforce Atlas Reasoning Engine. Supervisor pattern under the hood; “Agent Network” feature is swarm-flavored handoff.
  • Microsoft Copilot Studio multi-agent. Supervisor with deterministic plug-in routing; pipeline via Power Automate.
  • ServiceNow Now Assist Workflow Studio. Pipeline-shaped, deterministic, opinionated.
  • LangGraph. All three, you pick.
  • OpenAI Swarm / Assistants v2 multi-agent. Swarm-shaped, lightweight.

The vendor’s preferred shape will be the shape that’s cheapest to support, not necessarily the shape that fits your workflow. Don’t let tooling pick architecture.

What breaks first in production

  • Supervisor. Token cost. The planner becomes 60% of inference spend. Fix: cache plans for similar tasks; use a smaller model for planning.
  • Swarm. Loops. Agent A and B pass forever. Fix: handoff count limit, circuit breaker, escalate-to-human after N hops.
  • Pipeline. Special cases. Step 4 needs a branch for legal review. Then step 7 needs another. Fix: refactor early; introduce a supervisor branch rather than nesting if/else in pipeline config.
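The plan-caching fix for the supervisor can be sketched as follows. The notion of "similar" is the hard part and is entirely an assumption here: this version treats two tasks as similar when they share a `type` and the same set of field names. `PlanCache` and the `planner` callable are hypothetical.

```python
import hashlib
import json
from typing import Callable


class PlanCache:
    """Reuse a plan when a new task has the same shape as an earlier one."""

    def __init__(self, planner: Callable[[dict], list[str]]) -> None:
        self._planner = planner          # the expensive supervisor call
        self._plans: dict[str, list[str]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(task: dict) -> str:
        # Assumption: same task type and same field names imply the same plan.
        shape = {"type": task.get("type"), "fields": sorted(task)}
        return hashlib.sha256(json.dumps(shape, sort_keys=True).encode()).hexdigest()

    def plan(self, task: dict) -> list[str]:
        k = self.key(task)
        if k in self._plans:
            self.hits += 1
        else:
            self.misses += 1
            self._plans[k] = self._planner(task)
        return self._plans[k]
```

Tracking hits and misses matters: if the hit rate is low, the similarity key is too strict, and if plan quality drops, it is too loose.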

Governance and audit across patterns

Pipelines are easiest to audit — each step has a known role, known inputs, known outputs. Supervisor patterns are harder because the planner’s reasoning is itself a model call; you need to capture its rationale, not just its decision. Swarms are hardest — the handoff graph at runtime may not match the design-time graph.

For regulated workloads (financial, healthcare, employment decisions under EU AI Act human oversight rules), default to pipeline. Use supervisor only where you can capture the plan as a structured artifact.
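Capturing the plan as a structured artifact could be as simple as one JSON line per supervisor decision. The field names below are illustrative assumptions, not a compliance schema; what a given regulator expects should come from your own legal review.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PlanRecord:
    workflow_id: str
    step: int
    chosen_specialist: str
    subtask: str
    rationale: str                        # the planner's stated reason, verbatim
    alternatives_considered: tuple[str, ...]
    model: str                            # which model produced the plan
    decided_at: str                       # ISO 8601 timestamp, supplied by the caller


def to_audit_line(record: PlanRecord) -> str:
    """Serialize one decision as a single JSON line for an append-only audit log."""
    return json.dumps(asdict(record), sort_keys=True)
```

The point is that the rationale is stored next to the decision at the moment it is made, rather than reconstructed from prompts after an incident.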

Concurrency: do you actually need it

Multi-agent does not automatically mean concurrent. Many “multi-agent” systems are sequential — one agent at a time, just specialized. That’s fine and often correct.

Concurrent multi-agent (multiple agents running in parallel) is rarely worth the complexity. Race conditions on shared state, duplicate tool calls, conflicting writes — all real problems. Only go concurrent when:

  • The sub-tasks are genuinely independent (parallel retrievals across stores).
  • The tool calls are idempotent or coordinated.
  • The latency win is large enough to justify the complexity.

In our experience, 80% of “concurrent multi-agent” systems would be cheaper, simpler, and just as fast if they ran their agents sequentially and parallelized only the retrieval.
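Sequential agents with parallel retrieval is straightforward with `asyncio.gather`. In this sketch `retrieve` is a stand-in for a real store client and the store names are made up; the agents themselves still run one at a time.

```python
import asyncio


async def retrieve(store: str, query: str) -> list[str]:
    """Stand-in for a read-only lookup against one store."""
    await asyncio.sleep(0.01)            # simulates network I/O
    return [f"{store}:{query}"]


async def gather_context(query: str, stores: list[str]) -> list[str]:
    """Fan out independent reads, then flatten results in store order."""
    results = await asyncio.gather(*(retrieve(s, query) for s in stores))
    return [doc for docs in results for doc in docs]


async def run_workflow(query: str) -> str:
    # Only the retrieval is concurrent; the agent steps stay sequential.
    context = await gather_context(query, ["crm", "tickets", "docs"])
    draft = f"draft based on {len(context)} documents"
    return draft


if __name__ == "__main__":
    print(asyncio.run(run_workflow("acme renewal")))
```

Reads are safe to parallelize because they are independent and have no side effects. That is exactly the condition the list above asks for.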

Cost optimization across patterns

A few moves that generalize:

  • Smallest competent model per role. The supervisor doesn’t need an Opus-class model, and the verifier rarely does. Reserve the big model for the synthesis step.
  • Caching at handoff boundaries. Common sub-task results cache well.
  • Token budgets per agent. Each role has a max-token cap; exceeding triggers escalation, not a bigger context.
  • Batch handoffs. Instead of one agent → one agent, queue multiple work items and let specialists process batches.

These compound. A multi-agent system without cost engineering becomes the highest line item in the AI budget within two quarters.
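The per-agent token budget is the easiest of these to enforce in code. A minimal sketch, with hypothetical names and made-up caps:

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Signals that a role hit its cap and the task should be escalated."""


@dataclass
class TokenBudget:
    role: str
    max_tokens: int
    used: int = 0

    def charge(self, tokens: int) -> None:
        """Record usage, refusing any call that would exceed the cap."""
        if self.used + tokens > self.max_tokens:
            raise BudgetExceeded(
                f"{self.role} would use {self.used + tokens} of {self.max_tokens} tokens"
            )
        self.used += tokens


# Illustrative caps only; real numbers come from measuring each role.
BUDGETS = {
    "planner": TokenBudget("planner", 4_000),
    "writer": TokenBudget("writer", 12_000),
}
```

The orchestrator calls `charge` before each model call and treats `BudgetExceeded` as an escalation signal, not as a reason to retry with a larger context.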

The pattern that works

  • Pick by workflow shape, not by framework. Supervisor for variable tasks, pipeline for fixed flows, swarm for specialist routing.
  • Most production stacks are hybrids — usually pipeline with a supervisor escape hatch.
  • State curation between hops is the difference between viable and unaffordable at scale.
  • Trace every workflow with OTel GenAI conventions. Multi-agent without traces is undebuggable.
  • Cap iterations, handoffs, and token budget. Every pattern has a runaway failure mode.