Multi-Agent Patterns That Actually Work
Not everything needs multiple agents. But when it does, here are the patterns I've found useful.

"Let's use multiple AI agents working together!"
Usually, this is a solution looking for a problem. Single agents with good tools solve most use cases.
But sometimes — sometimes — multiple agents genuinely work better. Here are the patterns I've found valuable.
First, the negative space. Don't reach for multi-agent by default: multi-agent systems are harder to debug, more expensive to run, and more complex to maintain than a single agent. Use them only when the benefits outweigh these costs: when distinct domains need different instructions or tools, when independent subtasks can genuinely run concurrently, or when the work needs oversight and review.
With that filter, here are patterns that actually work.
Pattern 1: The Router

The simplest multi-agent pattern: one agent routes each query to a specialist.
User Query
↓
[Router Agent]
├─→ Sales Agent (if sales inquiry)
├─→ Support Agent (if support issue)
└─→ Technical Agent (if technical question)
When to use: When you have distinct domains that need different instructions, tools, or knowledge bases.
Key insight: The router should be fast and cheap. Use a smaller model. Route based on clear signals.
Failure mode: Over-routing. If every query goes through a router before doing anything, you've added latency and cost for no benefit.
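A minimal sketch in Python, assuming a hypothetical call_llm(model, prompt) helper in place of a real client. Note the small model for routing, the constrained label set, and the safe default:

```python
def call_llm(model: str, prompt: str) -> str:
    # Stand-in for a real LLM client call (assumption).
    return "support"

SPECIALISTS = {
    "sales": "You are a sales agent for this product line.",
    "support": "You are a support agent; resolve the user's issue.",
    "technical": "You are a technical agent; answer precisely.",
}

def route(query: str) -> str:
    """Classify cheaply into a fixed label set; never free-form."""
    label = call_llm(
        "small-model",  # the router should be fast and cheap
        f"Classify this query as one of {sorted(SPECIALISTS)}. "
        f"Reply with the label only.\nQuery: {query}",
    ).strip().lower()
    return label if label in SPECIALISTS else "support"  # safe default

def handle(query: str) -> str:
    system_prompt = SPECIALISTS[route(query)]
    return call_llm("large-model", f"{system_prompt}\n\nUser: {query}")
```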
Pattern 2: Parallel Specialists

Multiple agents work simultaneously, and their results are merged.
Research Query
↓
[Coordinator]
├─→ [Web Research Agent]
├─→ [Database Query Agent]
└─→ [Document Search Agent]
↓ (all complete)
[Synthesizer Agent]
↓
Final Answer
When to use: When you need information from multiple sources and can query them concurrently.
Key insight: Parallelism only helps if the subtasks are truly independent. If Agent B needs Agent A's output, it's not parallelism.
Failure mode: Fan-out explosion. Spawning 20 parallel agents for a simple query is wasteful. Limit concurrency.
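A sketch of the fan-out with asyncio, using stand-in agent calls; the semaphore is the concurrency cap that prevents fan-out explosion:

```python
import asyncio

async def call_agent(name: str, query: str) -> str:
    # Stand-in for a real async LLM/tool call (assumption).
    await asyncio.sleep(0.1)
    return f"[{name}] findings for: {query}"

async def research(query: str, max_concurrency: int = 3) -> str:
    agents = ["web_research", "database_query", "document_search"]
    sem = asyncio.Semaphore(max_concurrency)  # cap the fan-out

    async def bounded(name: str) -> str:
        async with sem:
            return await call_agent(name, query)

    # Subtasks are independent, so they genuinely run concurrently.
    results = await asyncio.gather(*(bounded(a) for a in agents))
    # In practice the synthesizer is one more LLM call that merges these.
    return "\n".join(results)

print(asyncio.run(research("What changed in Q3?")))
```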
Pattern 3: Supervisor and Workers

A supervisor decomposes the task, assigns subtasks to workers, and aggregates the results.
Complex Task
↓
[Supervisor Agent]
↓ (breaks into subtasks)
├─→ [Worker A] → subtask 1
├─→ [Worker B] → subtask 2
└─→ [Worker C] → subtask 3
↓ (workers complete)
[Supervisor Agent]
↓ (aggregates and refines)
Final Output
When to use: Complex tasks that naturally decompose. When you need oversight over specialist work.
Key insight: The supervisor needs to be genuinely smarter/more capable than workers. Otherwise, why have hierarchy?
Failure mode: Over-decomposition. Breaking simple tasks into subtasks adds overhead. Let the supervisor handle simple things directly.
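A sketch of the supervisor loop with stand-in functions for the plan, work, and aggregate steps; the early exit keeps simple tasks out of the hierarchy:

```python
def supervisor_plan(task: str) -> list[str]:
    # Stand-in for an LLM call that decomposes the task (assumption).
    return [f"{task}: part {i}" for i in (1, 2, 3)]

def worker(subtask: str) -> str:
    # Stand-in for a specialist agent run (assumption).
    return f"result({subtask})"

def supervisor_aggregate(task: str, results: list[str]) -> str:
    # Stand-in for the aggregate-and-refine LLM call (assumption).
    return f"{task} -> " + "; ".join(results)

def run(task: str) -> str:
    subtasks = supervisor_plan(task)
    if len(subtasks) <= 1:
        return worker(task)  # simple task: skip the hierarchy entirely
    results = [worker(s) for s in subtasks]
    return supervisor_aggregate(task, results)
```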
Pattern 4: Generator and Critic

One agent generates, another critiques, and they iterate until the output is good enough.
Task
↓
[Generator Agent]
↓
Draft Output
↓
[Critic Agent]
↓ (if not good enough)
Feedback → Generator → Draft → Critic → ... (loop)
↓ (if good enough)
Final Output
When to use: When quality matters more than speed. Code generation. Writing. Analysis.
Key insight: The critic must have clear criteria. "Is this good?" is useless. "Does this handle edge case X?" is useful.
Failure mode: Infinite loops. Always have a max iteration limit. Sometimes "good enough" after 3 iterations beats "perfect" after 10.
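A sketch of the loop with a hard iteration cap; generate and critique stand in for the two LLM calls, and the critic returns a verdict plus concrete feedback:

```python
MAX_ITERS = 3  # hard cap: "good enough" after 3 beats "perfect" after 10

def generate(task: str, feedback: str | None = None) -> str:
    # Stand-in for the generator LLM call (assumption).
    suffix = f" [revised for: {feedback}]" if feedback else ""
    return f"draft for {task}{suffix}"

def critique(draft: str) -> tuple[bool, str]:
    # Stand-in for the critic LLM call (assumption). Real criteria are
    # concrete: "does this handle edge case X?", not "is this good?".
    return "revised" in draft, "handle the empty-input edge case"

def refine(task: str) -> str:
    draft = generate(task)
    for _ in range(MAX_ITERS):
        good_enough, feedback = critique(draft)
        if good_enough:
            break
        draft = generate(task, feedback)
    return draft  # best draft within the iteration budget
```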
Pattern 5: Human in the Loop

The agent proposes, a human approves, and the agent continues.
Request
↓
[Agent proposes action]
↓
[Human Review Gate]
├─ Approved → Execute → Continue
└─ Rejected → Agent revises → Human Review → ...
When to use: High-stakes decisions. Actions with real-world consequences. Anything involving money, customer communication, irreversible changes.
Key insight: The human gate should be at decision points, not everywhere. Too many approvals = approval fatigue = rubber stamping.
Failure mode: Blocking too often. If humans rubber-stamp 99% of proposals, the gate is probably in the wrong place.
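A sketch of the gate as a blocking console prompt; in production the review step would be a ticket queue or UI, but the control flow is the same:

```python
def propose(request: str, feedback: str | None = None) -> str:
    # Stand-in for the agent's proposal step (assumption).
    revised = f" [revised after: {feedback}]" if feedback else ""
    return f"proposed action for {request!r}{revised}"

def human_review(action: str) -> tuple[bool, str]:
    # Blocking console gate; a real system would queue this for review.
    answer = input(f"{action}\nApprove? [y/N] ")
    return answer.strip().lower() == "y", answer

def run(request: str, max_rounds: int = 3) -> None:
    feedback = None
    for _ in range(max_rounds):
        action = propose(request, feedback)
        approved, feedback = human_review(action)
        if approved:
            print(f"executing: {action}")
            return
    print("no approval after revisions; escalating to a human operator")
```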
Pattern 6: Debate and Judge

Multiple agents assess the question independently, then a judge reconciles their views.
Decision Needed
↓
[Agent A: Perspective 1]
[Agent B: Perspective 2]
[Agent C: Perspective 3]
↓
[Judge Agent]
↓
Final Decision (with reasoning)
When to use: Decisions where diverse perspectives genuinely help. Risk assessment. Ethical considerations. Complex tradeoffs.
Key insight: The perspectives must actually differ. Same model three times isn't debate. Different instructions, different priors, different emphases.
Failure mode: Artificial disagreement. If agents always agree, you don't need a committee. If they always disagree, the judge is just picking randomly.
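A sketch of the committee where the differing priors live in explicit per-persona instructions; assess and judge stand in for real LLM calls:

```python
PERSPECTIVES = {
    "optimist": "Argue the upside and the opportunities.",
    "skeptic": "Argue the risks and the failure modes.",
    "pragmatist": "Weigh cost, effort, and timelines.",
}

def assess(persona: str, instructions: str, question: str) -> str:
    # Stand-in for one perspective agent (assumption). Same base model
    # is fine; the instructions are what make the perspectives differ.
    return f"[{persona}] on {question!r}: {instructions}"

def judge(question: str, views: list[str]) -> str:
    # Stand-in for the judge LLM call (assumption). A useful judge must
    # explain which arguments it accepted, not just pick a winner.
    return f"decision on {question!r}, reconciling {len(views)} views"

def decide(question: str) -> str:
    views = [assess(p, instr, question) for p, instr in PERSPECTIVES.items()]
    return judge(question, views)
```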
Pattern 7: The Pipeline

Each agent handles one stage and passes its output to the next.
Raw Input
↓
[Extraction Agent] → Structured Data
↓
[Validation Agent] → Verified Data
↓
[Enrichment Agent] → Enhanced Data
↓
[Output Agent] → Final Response
When to use: When stages are truly sequential and benefit from specialization. Data processing. Document workflows.
Key insight: Each stage should add clear value. If a stage is just "check and pass through," eliminate it.
Failure mode: Too many stages. Each handoff adds latency and potential for error. Fewer stages is usually better.
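A sketch of the pipeline as plain composed functions, each standing in for one agent; every stage's signature doubles as its contract with the next:

```python
def extract(raw: str) -> dict:
    # Stand-in for the extraction agent (assumption).
    return {"text": raw.strip()}

def validate(data: dict) -> dict:
    # A stage must add value; validation rejects bad data early.
    if not data["text"]:
        raise ValueError("empty input")
    return data

def enrich(data: dict) -> dict:
    # Stand-in for the enrichment agent (assumption).
    return {**data, "length": len(data["text"])}

def respond(data: dict) -> str:
    return f"processed {data['length']} characters"

STAGES = [extract, validate, enrich, respond]

def run(raw: str) -> str:
    result = raw
    for stage in STAGES:
        result = stage(result)  # every handoff adds latency: keep stages few
    return result

print(run("  some raw document text  "))
```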
Beyond the patterns, a few cross-cutting practices matter.

Contracts. Each agent should have a defined input schema, a defined output schema, and documented failure behavior. Without contracts, agents can't reliably communicate.
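One lightweight way to pin a contract down is a typed schema per agent, sketched here with dataclasses (a validation library like pydantic would also fit); the names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class ExtractionInput:
    raw_text: str

@dataclass
class ExtractionOutput:
    fields: dict[str, str]
    confidence: float  # 0.0 to 1.0; downstream agents can gate on this

def extraction_agent(inp: ExtractionInput) -> ExtractionOutput:
    # Stand-in implementation (assumption); the point is the signature:
    # every caller knows exactly what goes in and what comes out.
    return ExtractionOutput(fields={"title": inp.raw_text[:40]}, confidence=0.9)
```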
Failure handling. One agent failing shouldn't crash the whole system. Handle failures gracefully with retries, backoff, and fallbacks, as in the sketch below.
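A sketch of a retry-with-fallback wrapper; the backoff schedule and fallback value are illustrative choices:

```python
import time

def call_with_fallback(agent, payload, *, attempts: int = 3, fallback=None):
    """Run an agent call with retries and a fallback instead of crashing."""
    for attempt in range(attempts):
        try:
            return agent(payload)
        except Exception:
            time.sleep(2 ** attempt)  # exponential backoff between tries
    # Degrade gracefully: one failing agent returns a fallback value
    # instead of taking the whole system down.
    return fallback
```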
Cost. Multi-agent = multi-cost. Track spending per agent, per pattern, and per execution, then optimize the expensive parts. (See The Cost Problem in AI Nobody Talks About for more on this.)
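A minimal per-agent cost ledger; the prices are made up and stand in for your provider's actual rates:

```python
from collections import defaultdict

PRICE_PER_1K_TOKENS = {"small-model": 0.0005, "large-model": 0.01}  # made up

class CostLedger:
    def __init__(self) -> None:
        self.by_agent: dict[str, float] = defaultdict(float)

    def record(self, agent: str, model: str, tokens: int) -> None:
        self.by_agent[agent] += tokens / 1000 * PRICE_PER_1K_TOKENS[model]

    def report(self) -> dict[str, float]:
        # Sorted descending so the expensive parts surface first.
        return dict(sorted(self.by_agent.items(), key=lambda kv: -kv[1]))

ledger = CostLedger()
ledger.record("router", "small-model", 300)
ledger.record("synthesizer", "large-model", 4000)
print(ledger.report())  # {'synthesizer': 0.04, 'router': 0.00015}
```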
Observability. Debugging multi-agent systems is hard. Log everything: which agent was called, with what input, what it returned, how long it took, and what it cost (see A Manifest for Better Logging).
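A sketch of a structured-logging wrapper that emits one JSON record per agent call, with a trace ID to correlate every call within a single run:

```python
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("agents")

def logged_call(agent_name: str, agent, payload, trace_id: str | None = None):
    """Log who was called, with what, how long it took, and how it ended."""
    trace_id = trace_id or uuid.uuid4().hex  # correlates calls in one run
    start = time.perf_counter()
    status = "ok"
    try:
        return agent(payload)
    except Exception:
        status = "error"
        raise
    finally:
        log.info(json.dumps({
            "trace_id": trace_id,
            "agent": agent_name,
            "input_preview": str(payload)[:200],
            "status": status,
            "duration_ms": round((time.perf_counter() - start) * 1000, 1),
        }))
```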
Start simple. Begin with a single agent. Add complexity only when you have evidence it helps.
Most systems I've seen start too complex. The best multi-agent systems evolved from single-agent systems that hit clear limitations.
Multi-agent systems aren't magic. They're architecture. Use them when the architecture genuinely serves the problem. Otherwise, a good single agent with the right tools will serve you better.