How to Run a Post-Mortem That Actually Improves Things
Most post-mortems are theater. Here's how I run them so they actually prevent the next incident.

I've been in the room for post-mortems that changed how teams operate. I've also sat through post-mortems that were complete wastes of time — checkbox exercises that made everyone feel better but changed nothing.
The difference? How you run them.
The Point of a Post-Mortem
It's not to assign blame. It's not to document what happened for compliance. It's to answer one question: How do we make sure this specific thing never happens again?
That's it. Everything else is noise.
A Real Example
Let me walk through an actual incident I dealt with.
What happened: An e-commerce platform went down during a flash sale. 2 hours of downtime. Rough estimate: $150K in lost revenue.
Timeline:
- 14:30 — Traffic spike starts as sale goes live
- 14:35 — Response times degrade
- 14:42 — Database connections maxed out
- 14:45 — Site goes unresponsive
- 14:50 — On-call gets paged
- 15:10 — Root cause identified (connection pool exhaustion)
- 15:30 — Temporary fix deployed (increased pool size)
- 16:45 — Full recovery confirmed
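A quick way to see where the time went is to turn the timeline into gaps between milestones. Here's a minimal Python sketch of that arithmetic. The timestamps are the ones listed above; the key names and helper functions are labels I made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the timeline above (same day, 24-hour clock).
# The key names are illustrative labels, not from any tool.
TIMELINE = {
    "spike_start": "14:30",
    "unresponsive": "14:45",
    "paged": "14:50",
    "root_cause": "15:10",
    "temp_fix": "15:30",
    "recovered": "16:45",
}


def minutes_between(start, end):
    """Whole minutes from start to end, both 'HH:MM' on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def gaps(timeline):
    """Minutes spent in each phase of the incident."""
    t = timeline
    return {
        "spike_to_page": minutes_between(t["spike_start"], t["paged"]),
        "page_to_root_cause": minutes_between(t["paged"], t["root_cause"]),
        "root_cause_to_temp_fix": minutes_between(t["root_cause"], t["temp_fix"]),
        "temp_fix_to_recovery": minutes_between(t["temp_fix"], t["recovered"]),
        "unresponsive_to_recovery": minutes_between(t["unresponsive"], t["recovered"]),
    }


if __name__ == "__main__":
    for phase, minutes in gaps(TIMELINE).items():
        print(f"{phase}: {minutes} min")
```

For this incident the phases come out to 20, 20, 20, and 75 minutes, with 120 minutes from unresponsive to full recovery. The page arrived 20 minutes after the spike began, and more than half of the outage fell after the temporary fix was deployed.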
What actually went wrong:
- No load testing done for the flash sale scenario
- Connection pool was sized for normal traffic, not a 10x spike
- Monitoring didn't alert on connection pool saturation (see A Manifest for Better Logging)
- Runbook for "database overwhelmed" didn't exist
The Post-Mortem Format I Use
Skip the 20-page document. I use a simple template:
INCIDENT: [Name]
DATE: [When]
DURATION: [How long]
IMPACT: [What broke, who was affected]
TIMELINE:
[Bullet points of what happened when]
ROOT CAUSE:
[One sentence. Not symptoms — the actual cause.]
CONTRIBUTING FACTORS:
[What made it worse or slower to fix]
ACTION ITEMS:
[Specific, assigned, with deadlines]
That's it. One page.
The Meeting
Keep it short. 30-45 minutes max.
Who's in the room:
- People who were directly involved
- The person who'll own the action items
- One person to take notes
That's it. No executives unless they were hands-on-keyboard. No one "attending to stay informed."
How it runs:
- Walk through the timeline together (10 min)
- Identify root cause vs symptoms (10 min)
- Generate action items (15 min)
- Assign owners and deadlines (5 min)
No blame. No "who made the mistake." Systems fail, not people. If one person's error can take down production, you have a system problem.
Action Items That Actually Get Done
Most post-mortem action items never happen. They go into a backlog and die.
My rules:
- Maximum 3 action items per incident
- Each one has an owner (a person, not a team)
- Each one has a deadline (within 2 weeks, or it won't happen)
- Each one gets tracked in whatever system the team actually looks at
Example from that flash sale incident:
- Add connection pool monitoring — Owner: Platform lead — Due: 3 days
- Create flash sale load test scenario — Owner: QA lead — Due: 1 week
- Update runbook for DB saturation — Owner: On-call rotation lead — Due: 1 week
All three got done. Next flash sale? No issues.
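Most of these rules are mechanical enough to check automatically. The sketch below tests the count limit, the presence of an owner, and the two-week deadline. The `ActionItem` type, the `check_action_items` function, and the meeting date are all hypothetical. Whether the owner is a real person rather than a team still needs a human eye.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

MAX_ITEMS = 3                       # "Maximum 3 action items per incident"
MAX_LEAD_TIME = timedelta(days=14)  # "within 2 weeks"


@dataclass
class ActionItem:
    description: str
    owner: str  # should name a person, not a team (not checkable here)
    due: date


def check_action_items(items: List[ActionItem], meeting_day: date) -> List[str]:
    """Return a list of rule violations; an empty list means the plan passes."""
    problems = []
    if len(items) > MAX_ITEMS:
        problems.append(f"{len(items)} action items; the limit is {MAX_ITEMS}")
    for item in items:
        if not item.owner.strip():
            problems.append(f"'{item.description}' has no owner")
        if item.due > meeting_day + MAX_LEAD_TIME:
            problems.append(f"'{item.description}' is due more than 2 weeks out")
    return problems


if __name__ == "__main__":
    meeting_day = date(2024, 1, 8)  # hypothetical date, for illustration only
    plan = [
        ActionItem("Add connection pool monitoring", "Platform lead",
                   meeting_day + timedelta(days=3)),
        ActionItem("Create flash sale load test scenario", "QA lead",
                   meeting_day + timedelta(days=7)),
        ActionItem("Update runbook for DB saturation", "On-call rotation lead",
                   meeting_day + timedelta(days=7)),
    ]
    print(check_action_items(plan, meeting_day))  # prints []
```

The flash sale plan passes cleanly. A plan with a fourth item, a blank owner, or a deadline a month out would come back with one message per violation.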
Common Mistakes
The blame game: "John should have caught this in code review." Cool. Now John feels terrible and nothing improved. Focus on the system: why didn't tests catch it? Why didn't monitoring alert?
Too many action items: 15 action items = 0 action items. Pick the 2-3 that would have the biggest impact.
Vague actions: "Improve monitoring" is not an action item. "Add alert for connection pool utilization > 80%" is an action item.
No follow-up: If you don't check whether actions got done, they won't. I add a calendar reminder for 2 weeks after every post-mortem.
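Part of what makes the connection pool alert a good action item is that it fits in a few lines of logic. Here's a rough sketch, assuming your pool or metrics system can report how many connections are in use and the configured maximum. The function names and the sample numbers are hypothetical. In practice you'd express the same rule in whatever alerting tool you already run.

```python
ALERT_THRESHOLD = 0.80  # the 80% from the example action item


def pool_utilization(in_use: int, max_size: int) -> float:
    """Fraction of the pool currently checked out."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return in_use / max_size


def should_alert(in_use: int, max_size: int,
                 threshold: float = ALERT_THRESHOLD) -> bool:
    """True when utilization is strictly above the threshold."""
    return pool_utilization(in_use, max_size) > threshold


if __name__ == "__main__":
    # Hypothetical pool of 100 connections.
    print(should_alert(40, 100))  # prints False
    print(should_alert(85, 100))  # prints True
```

Compare that with "Improve monitoring," which gives you nothing you could write down, test, or mark as done.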
When to Skip the Post-Mortem
Not every incident needs a formal review. If the fix was obvious and already applied, a quick Slack thread is fine.
Post-mortem when:
- Multiple people were involved in the response
- Impact was significant
- Root cause wasn't immediately obvious
- Same or similar incident happened before
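If you want the triage to be consistent across on-call rotations, the checklist can be written down as a small function. This sketch assumes that any single criterion is enough to trigger a formal review, which is my reading of the list rather than a hard rule. The `Incident` type is hypothetical, and "significant" remains a judgment call that someone has to make before setting the flag.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    multiple_responders: bool  # more than one person worked the response
    significant_impact: bool   # a human judgment, not computed here
    root_cause_obvious: bool   # was the cause clear right away?
    seen_before: bool          # same or similar incident happened before


def needs_post_mortem(incident: Incident) -> bool:
    # Assumption: any one criterion justifies a formal review.
    return (
        incident.multiple_responders
        or incident.significant_impact
        or not incident.root_cause_obvious
        or incident.seen_before
    )
```

An incident with none of these flags set is the "quick Slack thread" case.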
The goal is learning, not bureaucracy. If you're running post-mortems for trivial incidents, people will start treating all post-mortems as trivial.