On-Call Rotation
A schedule that determines which engineer is responsible for answering alerts during a specific time period.
A schedule that determines which engineer is responsible for answering alerts during a specific time period.
## The Watchtower of Reliability An **On-Call Rotation** is the mechanism that ensures software is reliable 24/7. It is a schedule where engineers take turns being "on duty," carrying a pager (or phone) to respond to system alerts. ### The "You Build It, You Run It" Philosophy In modern DevOps culture, the developers who write the code are also on-call for it. This creates a powerful feedback loop: 1. Dev ships buggy code. 2. Dev gets woken up at 3am. 3. Dev fixes the bug *and* adds safeguards so they sleep next time. 4. System reliability improves. ### Types of Rotations - **Weekly**: One person is primary for 7 days. Simple, but high burnout risk if the week is bad. - **Daily**: Shifts rotate every 24 hours. Good for high-stress services. - **Follow-the-Sun**: Teams in London, NY, and Sydney hand off shifts so everyone works during *their* daytime.
ExThe Burnout Spiral
"A team of 4 had a weekly rotation. One engineer quit. The remaining 3 were on-call every 3 weeks."
Why On-Call Rotation Matters
Ensures 24/7 coverage without burning out a single individual.
Creates clear accountability for who owns the system health at any given moment.
Incentivizes engineers to build reliable systems (since they are the ones woken up).