Chaos Engineering
The practice of intentionally injecting failures into systems to build resilience.
## "Breaking Things on Purpose"

**Chaos Engineering** is the discipline of experimenting on a system to build confidence in its capability to withstand turbulent conditions.

### The Scientific Method

It is not just "breaking stuff." It follows a process:

1. **Hypothesis**: "If we kill the checkout service, the site will still serve the homepage."
2. **Experiment**: Kill the checkout service.
3. **Observation**: Did the homepage load? Or did the whole site crash?
4. **Learning**: Fix the dependency.

### Principles

* **Minimize Blast Radius**: Don't take down the whole site. Start with 1% of users.
* **Stop Button**: Always have a big red button to stop the experiment instantly.
* **Production**: Staging is not Production. Eventually, you must test in Prod.
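
The hypothesis, blast radius, and stop button can be sketched together in a toy, in-memory model. This is only an illustration under stated assumptions: `Site`, `run_experiment`, and the `max_failures` abort threshold are hypothetical names invented here, and no real service is killed. A real experiment would use a fault-injection tool and real monitoring.

```python
import random
import threading
from dataclasses import dataclass


@dataclass
class Site:
    """Toy site (hypothetical). The flag models a hidden dependency."""
    homepage_needs_checkout: bool

    def serve_homepage(self, checkout_up: bool) -> bool:
        # The homepage loads if checkout is up, or if it never needed checkout.
        return checkout_up or not self.homepage_needs_checkout


def run_experiment(site, users, blast_radius=0.01, stop=None,
                   max_failures=5, seed=0):
    """Hypothesis: the homepage still loads when checkout is down."""
    rng = random.Random(seed)
    if stop is None:
        stop = threading.Event()  # the "big red button"
    exposed = failures = 0
    for _ in range(users):
        if stop.is_set():
            break  # experiment halted, stop injecting faults
        if rng.random() >= blast_radius:
            continue  # user is outside the blast radius and is unaffected
        exposed += 1
        # Experiment: this user sees checkout as down.
        if not site.serve_homepage(checkout_up=False):
            failures += 1
            if failures >= max_failures:
                stop.set()  # automatic abort (assumed threshold)
    # Observation: compare what happened against the hypothesis.
    return {
        "exposed": exposed,
        "failures": failures,
        "hypothesis_held": failures == 0,
        "aborted": stop.is_set(),
    }


resilient = run_experiment(Site(homepage_needs_checkout=False), users=10_000)
fragile = run_experiment(Site(homepage_needs_checkout=True), users=10_000)
```

In this sketch the fragile site trips the stop button after a handful of failed requests, so the damage stays inside the small exposed group. The "Learning" step is then to remove the homepage's dependency on checkout and rerun the experiment.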
Example: Chaos Monkey
"Netflix created Chaos Monkey to randomly kill servers in production."
Why Chaos Engineering Matters
Systems will fail. Better to fail on your terms than during a real incident.
Chaos engineering exposes weaknesses before customers do.