Runbook
A step-by-step guide for handling specific operational tasks or incidents.
## The "Checklist"

A **Runbook** is a recipe. It assumes the reader is smart but stressed. It focuses on **Action**.

### Elements of a Good Runbook

1. **Triggers**: "Use this when Alert X fires."
2. **Impact**: "This issue causes 500 errors on checkout."
3. **Steps**:
   1. Check Dashboard Y.
   2. If CPU > 90%, run command Z.
   3. If not, escalate to the Database Team.
4. **Verification**: "How do I know it's fixed?"

### Runbook vs. Documentation

* **Docs**: "Here is how the system works." (Read this on Tuesday morning.)
* **Runbook**: "Here is how to fix the system." (Read this at 3 AM on Saturday.)
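Some teams keep runbooks as structured data so they can be rendered as a checklist or linked from an alert. The following is a minimal sketch of the four elements above, not a standard format: the `Runbook` class, the `decide_action` helper, and the verification text are hypothetical, and "Alert X", "Dashboard Y", and "command Z" are the placeholders from the example.

```python
from dataclasses import dataclass, field


@dataclass
class Runbook:
    """Hypothetical structure mirroring the four elements of a good runbook."""
    trigger: str                                    # when to open this runbook
    impact: str                                     # what users experience
    steps: list[str] = field(default_factory=list)  # ordered actions
    verification: str = ""                          # how to confirm the fix

    def render(self) -> str:
        """Render the runbook as a numbered checklist for the on-call reader."""
        lines = [f"Trigger: {self.trigger}", f"Impact: {self.impact}", "Steps:"]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.steps, start=1)]
        lines.append(f"Verification: {self.verification}")
        return "\n".join(lines)


def decide_action(cpu_percent: float, threshold: float = 90.0) -> str:
    """Branch from steps 2 and 3: act only when CPU is strictly above the threshold."""
    if cpu_percent > threshold:
        return "run command Z"
    return "escalate to the Database Team"


checkout_runbook = Runbook(
    trigger="Alert X fires",
    impact="500 errors on checkout",
    steps=[
        "Check Dashboard Y",
        "If CPU > 90%, run command Z",
        "If not, escalate to the Database Team",
    ],
    # Hypothetical verification criterion; a real runbook names a concrete signal.
    verification="Checkout stops returning 500 errors",
)

if __name__ == "__main__":
    print(checkout_runbook.render())
    print(decide_action(cpu_percent=95.0))
```

Keeping the branch in a small function makes the "if / if not" decision explicit, so the stressed reader does not have to interpret it at 3 AM.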
Example: The "Restart" Runbook
"A complex microservice required a specific restart order (DB -> Cache -> App)."
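A restart order like DB -> Cache -> App is a dependency ordering, so one way to encode it is a topological sort. The sketch below uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+). The `DEPENDS_ON` map, the `restart` and `is_healthy` callbacks, and the stop-on-failure rule are assumptions for illustration, not details from the original example.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what must be running before it.
DEPENDS_ON = {
    "Cache": {"DB"},
    "App": {"Cache"},
}


def restart_order(deps: dict[str, set[str]]) -> list[str]:
    """Return services with every dependency placed before its dependents."""
    return list(TopologicalSorter(deps).static_order())


def restart_all(deps, restart, is_healthy) -> list[str]:
    """Restart services in dependency order, halting at the first unhealthy one."""
    done = []
    for service in restart_order(deps):
        restart(service)
        if not is_healthy(service):
            # Stop here instead of restarting dependents on a broken base.
            raise RuntimeError(f"{service} unhealthy after restart; stop and escalate")
        done.append(service)
    return done


if __name__ == "__main__":
    print(restart_order(DEPENDS_ON))
```

Checking health before moving to the next service mirrors the runbook's verification element: each step is confirmed before the next one starts.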
Why Runbooks Matter
Runbooks reduce cognitive load during incidents: responders follow known steps instead of working out a fix live.
Good runbooks help on-call engineers succeed and resolve incidents faster.