Incident Response
The process of detecting, responding to, and resolving system incidents or outages.
## "Put the Fire Out"

**Incident Response** is the tactical, immediate action taken when things go wrong. It is the firefighter kicking down the door.

### The Lifecycle of Response

1. **Detect (T0)**: Monitoring alerts the team.
2. **Acknowledge**: On-call engineer responds.
3. **Mobilize**: Incident Commander assigned. Channel created.
4. **Triage**: Assess severity (SEV level).
5. **Mitigate**: Stop the bleeding (rollback, scaling).
6. **Resolve**: Restore full service.

### Speed vs. Accuracy

The goal of incident response is **Mitigation**, not necessarily fixing the root cause. If a server is crashing, reboot it to get users back online. Debug *why* it crashed later (in the Post-Incident Review).

**Want to understand the difference between incident response and incident management?** [Read our deep dive: Incident Management vs Incident Response — What Teams Get Wrong](/blog/incident-management-vs-incident-response).
Example: The Black Friday Crash
"Traffic spiked 10x on Black Friday, crashing the checkout service."
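To make the lifecycle concrete, here is a minimal sketch that walks a checkout crash like this one through the six response stages and measures time to mitigation. Everything in it is an assumption made for illustration: the `Stage` and `Incident` names, the `SEV1` label, and the timestamps are hypothetical and do not come from a real incident or a specific tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """The six response stages, in lifecycle order."""
    DETECT = 1
    ACKNOWLEDGE = 2
    MOBILIZE = 3
    TRIAGE = 4
    MITIGATE = 5
    RESOLVE = 6


@dataclass
class Incident:
    title: str
    severity: Optional[str] = None  # hypothetical label, set during triage
    timeline: dict = field(default_factory=dict)  # maps Stage -> datetime

    def advance(self, stage: Stage, at: datetime) -> None:
        """Record a stage, refusing to skip ahead or repeat one."""
        if len(self.timeline) == len(Stage):
            raise ValueError("incident is already resolved")
        expected = Stage(len(self.timeline) + 1)
        if stage != expected:
            raise ValueError(f"expected {expected.name}, got {stage.name}")
        self.timeline[stage] = at

    def time_to_mitigate(self) -> timedelta:
        """Elapsed time from detection (T0) until the bleeding stopped."""
        return self.timeline[Stage.MITIGATE] - self.timeline[Stage.DETECT]


# Illustrative timestamps only, not real incident data.
t0 = datetime(2024, 11, 29, 9, 0)
incident = Incident("Checkout service crash")
incident.advance(Stage.DETECT, t0)
incident.advance(Stage.ACKNOWLEDGE, t0 + timedelta(minutes=2))
incident.advance(Stage.MOBILIZE, t0 + timedelta(minutes=5))
incident.severity = "SEV1"
incident.advance(Stage.TRIAGE, t0 + timedelta(minutes=8))
incident.advance(Stage.MITIGATE, t0 + timedelta(minutes=15))  # e.g. scale out

print(incident.time_to_mitigate())  # 0:15:00
```

The incident is deliberately left unresolved at the end: service is back after mitigation, while full resolution and the root-cause investigation can follow later.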
Why Incident Response Matters
Good incident response minimizes downtime, customer impact, and team stress.
Every organization will face incidents. The difference is how well you respond.