Error Budget
The amount of unreliability a service can have before violating its SLO.
## "Permission to Fail"

An **Error Budget** is the amount of downtime you can tolerate in a month without making your users unhappy.

### The Philosophy

100% reliability is expensive and slows you down. If your users are happy with 99.9% reliability, then aim for 99.9%. The remaining **0.1% is your budget**. You can spend this budget on:

* Risky feature launches.
* System experiments.
* Chaos engineering.

### The Error Budget Policy

The real power comes from the **policy**: what happens when you run out of budget?

* **Budget > 0**: Ship features fast.
* **Budget ≤ 0**: Stop feature work. Focus on reliability (reliability sprints, feature freezes) until the budget refills.
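The policy can be sketched as a small decision function. This is only an illustration: the names `BudgetStatus` and `policy_action` are hypothetical, downtime is assumed to be tracked in minutes, and a budget of exactly zero is treated as exhausted.

```python
from dataclasses import dataclass


@dataclass
class BudgetStatus:
    """Hypothetical snapshot of a service's error budget for one window."""
    slo_percent: float       # e.g. 99.9
    window_minutes: float    # length of the measurement window
    downtime_minutes: float  # downtime observed so far in the window

    @property
    def budget_minutes(self) -> float:
        # Total allowed downtime: (100% - SLO) of the window.
        return self.window_minutes * (100.0 - self.slo_percent) / 100.0

    @property
    def remaining_minutes(self) -> float:
        # Negative when the budget has been overspent.
        return self.budget_minutes - self.downtime_minutes


def policy_action(status: BudgetStatus) -> str:
    """Apply the two-branch policy: ship while budget remains, else freeze."""
    if status.remaining_minutes > 0:
        return "ship features"
    # Assumption: a budget of exactly zero counts as exhausted.
    return "freeze features, focus on reliability"


if __name__ == "__main__":
    thirty_days = 30 * 24 * 60  # 43,200 minutes
    healthy = BudgetStatus(99.9, thirty_days, downtime_minutes=10)
    burned = BudgetStatus(99.9, thirty_days, downtime_minutes=240)
    print(policy_action(healthy))
    print(policy_action(burned))
```

In practice a team would feed real monitoring data into a check like this; the sketch only shows how the remaining budget drives the ship-or-freeze decision.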
Example: The Feature Freeze
"A team pushed bad code and caused a 4-hour outage, burning 100% of their quarterly Error Budget."
Why Error Budget Matters
Error budgets prevent perfectionism. If you have budget left, you can take risks. If not, stabilize.
Error budgets turn reliability into a strategic tradeoff, not a binary goal.
The Formula
Error Budget = 100% - SLO
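To make the formula concrete, the sketch below converts an SLO into allowed downtime for a given window. The function names and the 30-day window are assumptions chosen for illustration; the source only specifies the formula itself.

```python
from datetime import timedelta


def error_budget_fraction(slo_percent: float) -> float:
    """Error Budget = 100% - SLO, returned as a fraction of the window."""
    if not 0 < slo_percent <= 100:
        raise ValueError("SLO must be in the range (0, 100]")
    return (100.0 - slo_percent) / 100.0


def allowed_downtime(slo_percent: float, window: timedelta) -> timedelta:
    """Downtime the service can absorb in `window` without violating the SLO."""
    return window * error_budget_fraction(slo_percent)


if __name__ == "__main__":
    # Assumed 30-day month; other window lengths work the same way.
    month = timedelta(days=30)
    for slo in (99.0, 99.9, 99.99):
        minutes = allowed_downtime(slo, month).total_seconds() / 60
        print(f"SLO {slo}% allows about {minutes:.1f} minutes of downtime")
```

With a 99.9% SLO over a 30-day window, the budget works out to roughly 43.2 minutes of downtime.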