Multi-Agent Patterns That Actually Work
Not everything needs multiple agents. But when it does, here are the patterns I've found useful.

"Let's use multiple AI agents working together!"
Usually, this is a solution looking for a problem. Single agents with good tools solve most use cases.
But sometimes — sometimes — multiple agents genuinely work better. Here are the patterns I've found valuable.
First, the negative space. Don't reach for multi-agent by default: multi-agent systems are harder to debug, more expensive to run, and more complex to maintain than a single agent. Use them only when the benefits outweigh these costs: when distinct domains need different instructions or tools, when independent subtasks can genuinely run concurrently, or when the work needs oversight and review.
With that filter, here are patterns that actually work.
Pattern 1: The Router

The simplest multi-agent pattern: one agent routes each query to a specialist.
User Query
↓
[Router Agent]
├─→ Sales Agent (if sales inquiry)
├─→ Support Agent (if support issue)
└─→ Technical Agent (if technical question)
When to use: When you have distinct domains that need different instructions, tools, or knowledge bases.
Key insight: The router should be fast and cheap. Use a smaller model. Route based on clear signals.
Failure mode: Over-routing. If every query goes through a router before doing anything, you've added latency and cost for no benefit.
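A minimal sketch in Python, assuming a hypothetical call_llm(model, prompt) helper in place of a real client. Note the small model for routing, the constrained label set, and the safe default:

```python
def call_llm(model: str, prompt: str) -> str:
    # Stand-in for a real LLM client call (assumption).
    return "support"

SPECIALISTS = {
    "sales": "You are a sales agent for this product line.",
    "support": "You are a support agent; resolve the user's issue.",
    "technical": "You are a technical agent; answer precisely.",
}

def route(query: str) -> str:
    """Classify cheaply into a fixed label set; never free-form."""
    label = call_llm(
        "small-model",  # the router should be fast and cheap
        f"Classify this query as one of {sorted(SPECIALISTS)}. "
        f"Reply with the label only.\nQuery: {query}",
    ).strip().lower()
    return label if label in SPECIALISTS else "support"  # safe default

def handle(query: str) -> str:
    system_prompt = SPECIALISTS[route(query)]
    return call_llm("large-model", f"{system_prompt}\n\nUser: {query}")
```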
Pattern 2: Parallel Specialists

Multiple agents work simultaneously, and their results are merged.
Research Query
↓
[Coordinator]
├─→ [Web Research Agent]
├─→ [Database Query Agent]
└─→ [Document Search Agent]
↓ (all complete)
[Synthesizer Agent]
↓
Final Answer
When to use: When you need information from multiple sources and can query them concurrently.
Key insight: Parallelism only helps if the subtasks are truly independent. If Agent B needs Agent A's output, it's not parallelism.
Failure mode: Fan-out explosion. Spawning 20 parallel agents for a simple query is wasteful. Limit concurrency.
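A sketch of the fan-out with asyncio, using stand-in agent calls; the semaphore is the concurrency cap that prevents fan-out explosion:

```python
import asyncio

async def call_agent(name: str, query: str) -> str:
    # Stand-in for a real async LLM/tool call (assumption).
    await asyncio.sleep(0.1)
    return f"[{name}] findings for: {query}"

async def research(query: str, max_concurrency: int = 3) -> str:
    agents = ["web_research", "database_query", "document_search"]
    sem = asyncio.Semaphore(max_concurrency)  # cap the fan-out

    async def bounded(name: str) -> str:
        async with sem:
            return await call_agent(name, query)

    # Subtasks are independent, so they genuinely run concurrently.
    results = await asyncio.gather(*(bounded(a) for a in agents))
    # In practice the synthesizer is one more LLM call that merges these.
    return "\n".join(results)

print(asyncio.run(research("What changed in Q3?")))
```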
Pattern 3: Supervisor and Workers

A supervisor decomposes the task, assigns subtasks to workers, and aggregates the results.
Complex Task
↓
[Supervisor Agent]
↓ (breaks into subtasks)
├─→ [Worker A] → subtask 1
├─→ [Worker B] → subtask 2
└─→ [Worker C] → subtask 3
↓ (workers complete)
[Supervisor Agent]
↓ (aggregates and refines)
Final Output
When to use: Complex tasks that naturally decompose. When you need oversight over specialist work.
Key insight: The supervisor needs to be genuinely smarter/more capable than workers. Otherwise, why have hierarchy?
Failure mode: Over-decomposition. Breaking simple tasks into subtasks adds overhead. Let the supervisor handle simple things directly.
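A sketch of the supervisor loop with stand-in functions for the plan, work, and aggregate steps; the early exit keeps simple tasks out of the hierarchy:

```python
def supervisor_plan(task: str) -> list[str]:
    # Stand-in for an LLM call that decomposes the task (assumption).
    return [f"{task}: part {i}" for i in (1, 2, 3)]

def worker(subtask: str) -> str:
    # Stand-in for a specialist agent run (assumption).
    return f"result({subtask})"

def supervisor_aggregate(task: str, results: list[str]) -> str:
    # Stand-in for the aggregate-and-refine LLM call (assumption).
    return f"{task} -> " + "; ".join(results)

def run(task: str) -> str:
    subtasks = supervisor_plan(task)
    if len(subtasks) <= 1:
        return worker(task)  # simple task: skip the hierarchy entirely
    results = [worker(s) for s in subtasks]
    return supervisor_aggregate(task, results)
```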
Pattern 4: Generator and Critic

One agent generates, another critiques, and they iterate until the output is good enough.
Task
↓
[Generator Agent]
↓
Draft Output
↓
[Critic Agent]
↓ (if not good enough)
Feedback → Generator → Draft → Critic → ... (loop)
↓ (if good enough)
Final Output
When to use: When quality matters more than speed. Code generation. Writing. Analysis.
Key insight: The critic must have clear criteria. "Is this good?" is useless. "Does this handle edge case X?" is useful.
Failure mode: Infinite loops. Always have a max iteration limit. Sometimes "good enough" after 3 iterations beats "perfect" after 10.
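A sketch of the loop with a hard iteration cap; generate and critique stand in for the two LLM calls, and the critic returns a verdict plus concrete feedback:

```python
MAX_ITERS = 3  # hard cap: "good enough" after 3 beats "perfect" after 10

def generate(task: str, feedback: str | None = None) -> str:
    # Stand-in for the generator LLM call (assumption).
    suffix = f" [revised for: {feedback}]" if feedback else ""
    return f"draft for {task}{suffix}"

def critique(draft: str) -> tuple[bool, str]:
    # Stand-in for the critic LLM call (assumption). Real criteria are
    # concrete: "does this handle edge case X?", not "is this good?".
    return "revised" in draft, "handle the empty-input edge case"

def refine(task: str) -> str:
    draft = generate(task)
    for _ in range(MAX_ITERS):
        good_enough, feedback = critique(draft)
        if good_enough:
            break
        draft = generate(task, feedback)
    return draft  # best draft within the iteration budget
```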
Pattern 5: Human in the Loop

The agent proposes, a human approves, and the agent continues.
Request
↓
[Agent proposes action]
↓
[Human Review Gate]
├─ Approved → Execute → Continue
└─ Rejected → Agent revises → Human Review → ...
When to use: High-stakes decisions. Actions with real-world consequences. Anything involving money, customer communication, irreversible changes.
Key insight: The human gate should be at decision points, not everywhere. Too many approvals = approval fatigue = rubber stamping.
Failure mode: Blocking too often. If humans rubber-stamp 99% of proposals, the gate is probably in the wrong place.
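A sketch of the gate as a blocking console prompt; in production the review step would be a ticket queue or UI, but the control flow is the same:

```python
def propose(request: str, feedback: str | None = None) -> str:
    # Stand-in for the agent's proposal step (assumption).
    revised = f" [revised after: {feedback}]" if feedback else ""
    return f"proposed action for {request!r}{revised}"

def human_review(action: str) -> tuple[bool, str]:
    # Blocking console gate; a real system would queue this for review.
    answer = input(f"{action}\nApprove? [y/N] ")
    return answer.strip().lower() == "y", answer

def run(request: str, max_rounds: int = 3) -> None:
    feedback = None
    for _ in range(max_rounds):
        action = propose(request, feedback)
        approved, feedback = human_review(action)
        if approved:
            print(f"executing: {action}")
            return
    print("no approval after revisions; escalating to a human operator")
```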
Pattern 6: Debate and Judge

Multiple agents assess the question independently, then a judge reconciles their views.
Decision Needed
↓
[Agent A: Perspective 1]
[Agent B: Perspective 2]
[Agent C: Perspective 3]
↓
[Judge Agent]
↓
Final Decision (with reasoning)
When to use: Decisions where diverse perspectives genuinely help. Risk assessment. Ethical considerations. Complex tradeoffs.
Key insight: The perspectives must actually differ. Same model three times isn't debate. Different instructions, different priors, different emphases.
Failure mode: Artificial disagreement. If agents always agree, you don't need a committee. If they always disagree, the judge is just picking randomly.
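A sketch of the committee where the differing priors live in explicit per-persona instructions; assess and judge stand in for real LLM calls:

```python
PERSPECTIVES = {
    "optimist": "Argue the upside and the opportunities.",
    "skeptic": "Argue the risks and the failure modes.",
    "pragmatist": "Weigh cost, effort, and timelines.",
}

def assess(persona: str, instructions: str, question: str) -> str:
    # Stand-in for one perspective agent (assumption). Same base model
    # is fine; the instructions are what make the perspectives differ.
    return f"[{persona}] on {question!r}: {instructions}"

def judge(question: str, views: list[str]) -> str:
    # Stand-in for the judge LLM call (assumption). A useful judge must
    # explain which arguments it accepted, not just pick a winner.
    return f"decision on {question!r}, reconciling {len(views)} views"

def decide(question: str) -> str:
    views = [assess(p, instr, question) for p, instr in PERSPECTIVES.items()]
    return judge(question, views)
```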
Pattern 7: The Pipeline

Each agent handles one stage and passes its output to the next.
Raw Input
↓
[Extraction Agent] → Structured Data
↓
[Validation Agent] → Verified Data
↓
[Enrichment Agent] → Enhanced Data
↓
[Output Agent] → Final Response
When to use: When stages are truly sequential and benefit from specialization. Data processing. Document workflows.
Key insight: Each stage should add clear value. If a stage is just "check and pass through," eliminate it.
Failure mode: Too many stages. Each handoff adds latency and potential for error. Fewer stages is usually better.
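A sketch of the pipeline as plain composed functions, each standing in for one agent; every stage's signature doubles as its contract with the next:

```python
def extract(raw: str) -> dict:
    # Stand-in for the extraction agent (assumption).
    return {"text": raw.strip()}

def validate(data: dict) -> dict:
    # A stage must add value; validation rejects bad data early.
    if not data["text"]:
        raise ValueError("empty input")
    return data

def enrich(data: dict) -> dict:
    # Stand-in for the enrichment agent (assumption).
    return {**data, "length": len(data["text"])}

def respond(data: dict) -> str:
    return f"processed {data['length']} characters"

STAGES = [extract, validate, enrich, respond]

def run(raw: str) -> str:
    result = raw
    for stage in STAGES:
        result = stage(result)  # every handoff adds latency: keep stages few
    return result

print(run("  some raw document text  "))
```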
Beyond the patterns, a few cross-cutting practices matter.

Contracts. Each agent should have a defined input schema, a defined output schema, and documented failure behavior. Without contracts, agents can't reliably communicate.
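One lightweight way to pin a contract down is a typed schema per agent, sketched here with dataclasses (a validation library like pydantic would also fit); the names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class ExtractionInput:
    raw_text: str

@dataclass
class ExtractionOutput:
    fields: dict[str, str]
    confidence: float  # 0.0 to 1.0; downstream agents can gate on this

def extraction_agent(inp: ExtractionInput) -> ExtractionOutput:
    # Stand-in implementation (assumption); the point is the signature:
    # every caller knows exactly what goes in and what comes out.
    return ExtractionOutput(fields={"title": inp.raw_text[:40]}, confidence=0.9)
```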
Failure handling. One agent failing shouldn't crash the whole system. Handle failures gracefully with retries, backoff, and fallbacks, as in the sketch below.
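A sketch of a retry-with-fallback wrapper; the backoff schedule and fallback value are illustrative choices:

```python
import time

def call_with_fallback(agent, payload, *, attempts: int = 3, fallback=None):
    """Run an agent call with retries and a fallback instead of crashing."""
    for attempt in range(attempts):
        try:
            return agent(payload)
        except Exception:
            time.sleep(2 ** attempt)  # exponential backoff between tries
    # Degrade gracefully: one failing agent returns a fallback value
    # instead of taking the whole system down.
    return fallback
```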
Cost. Multi-agent = multi-cost. Track spending per agent, per pattern, and per execution, then optimize the expensive parts. (See The Cost Problem in AI Nobody Talks About for more on this.)
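A minimal per-agent cost ledger; the prices are made up and stand in for your provider's actual rates:

```python
from collections import defaultdict

PRICE_PER_1K_TOKENS = {"small-model": 0.0005, "large-model": 0.01}  # made up

class CostLedger:
    def __init__(self) -> None:
        self.by_agent: dict[str, float] = defaultdict(float)

    def record(self, agent: str, model: str, tokens: int) -> None:
        self.by_agent[agent] += tokens / 1000 * PRICE_PER_1K_TOKENS[model]

    def report(self) -> dict[str, float]:
        # Sorted descending so the expensive parts surface first.
        return dict(sorted(self.by_agent.items(), key=lambda kv: -kv[1]))

ledger = CostLedger()
ledger.record("router", "small-model", 300)
ledger.record("synthesizer", "large-model", 4000)
print(ledger.report())  # {'synthesizer': 0.04, 'router': 0.00015}
```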
Observability. Debugging multi-agent systems is hard. Log everything: which agent was called, with what input, what it returned, how long it took, and what it cost (see A Manifest for Better Logging).
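A sketch of a structured-logging wrapper that emits one JSON record per agent call, with a trace ID to correlate every call within a single run:

```python
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("agents")

def logged_call(agent_name: str, agent, payload, trace_id: str | None = None):
    """Log who was called, with what, how long it took, and how it ended."""
    trace_id = trace_id or uuid.uuid4().hex  # correlates calls in one run
    start = time.perf_counter()
    status = "ok"
    try:
        return agent(payload)
    except Exception:
        status = "error"
        raise
    finally:
        log.info(json.dumps({
            "trace_id": trace_id,
            "agent": agent_name,
            "input_preview": str(payload)[:200],
            "status": status,
            "duration_ms": round((time.perf_counter() - start) * 1000, 1),
        }))
```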
Start simple. Begin with a single agent. Add complexity only when you have evidence it helps.
Most systems I've seen start too complex. The best multi-agent systems evolved from single-agent systems that hit clear limitations.
Multi-agent systems aren't magic. They're architecture. Use them when the architecture genuinely serves the problem. Otherwise, a good single agent with the right tools will serve you better.