PM Crisis Management
(2026 Edition)
Five moves for the first 60 minutes, five for hours 1–4, five post-incident steps, and five communication rules for handling production incidents.
First 60 Minutes
Assess severity (P0: users blocked, P1: degraded, P2: minor)
Assemble the right team in one chat (eng lead, ops, CS, comms)
Assign a single incident commander — not you unless necessary
Communicate early: 'We're aware, investigating, will update in 30 min'
Start a log of decisions and timestamps in the incident channel
Hours 1–4
Roll back if a deploy caused the incident: fast recovery beats root-cause hunting
Keep the communication loop open with customer-facing teams, updating them every 30 min
Don't speculate publicly about the cause: 'investigating' is better than a wrong hypothesis
Protect engineers from stakeholder interruptions — they're fixing, you're communicating
Track user impact where possible (affected users, revenue blocked); this is data for the post-mortem
Post-Incident (24–72 hours)
Resolve the incident and confirm recovery with real user data
Communicate to customers: what happened, what we did, and what we'll do to prevent a recurrence
Run a blameless post-mortem within 48 hours
Assign each prevention action a named owner and a deadline: specific, not generic
Share the post-mortem broadly: it signals that you learned from the incident rather than hid it
5 Communication Rules
Acknowledge first, diagnose second — users want to know you know
Be specific about impact, vague about root cause (until certain)
Set expectations: 'next update in 30 min' — then hit that mark
Apologise if you caused it, but don't over-apologise: professionalism matters
Never blame individuals publicly — blame systems, fix systems
FAQ
What's the PM's role during a production incident?
Communication and coordination, NOT engineering. Your job: assemble the team, make sure the right people are engaged, keep affected parties (customers, leadership, CS) informed, and track decisions. Engineering owns the fix; PMs who try to engineer it themselves get in the way. The discipline is staying in your lane while providing air cover.
How do PMs rebuild trust after a major incident?
Three things: (1) a thorough, honest post-mortem shared publicly; (2) concrete prevention measures shipped within 30 days; (3) consistent behaviour going forward, with no repeat of the same mistake. Trust lost in a single incident can take 6–12 months of consistent reliability to rebuild fully. PMs who handle incidents well often come out stronger than before, because the incident becomes evidence of their judgment under pressure.