Observability
The measure of how well internal states of a system can be inferred from knowledge of its external outputs.
## "Why is it broken?"

**Observability (o11y)** is the ability to answer *new* questions about your system without shipping new code.

### Monitoring vs. Observability

* **Monitoring**: Tells you **when** something is wrong. ("Alert: CPU is 99%").
* **Observability**: Tells you **why** it is wrong. ("It's because User 123 sent a malformed JSON payload that triggered a regex loop in the Search Service").

### The Three Pillars

1. **Metrics**: Aggregates. "Are we slow?" (Cheap, fast).
2. **Logs**: Events. "What happened?" (Detailed, expensive).
3. **Traces**: Context. "Where did the time go?" (Connects the dots).
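
To make the three pillars concrete, here is a minimal sketch of a single request handler that emits all three signals. It uses only the Python standard library, and in-memory containers stand in for real metric, log, and trace backends. The names `handle_search`, `span`, `search.requests`, and `search.errors` are hypothetical and chosen for illustration only.

```python
import json
import time
import uuid
from collections import Counter
from contextlib import contextmanager

# In-memory stand-ins for real backends (assumption for this sketch).
metrics = Counter()  # metrics: cheap aggregates
logs = []            # logs: detailed per-event records
spans = []           # traces: timed steps that share a trace_id


@contextmanager
def span(name, trace_id):
    """Record how long a named step took within one request's trace."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append({
            "trace_id": trace_id,
            "name": name,
            "duration_ms": (time.perf_counter() - start) * 1000,
        })


def handle_search(user_id, payload):
    """Hypothetical handler instrumented with metrics, logs, and traces."""
    trace_id = uuid.uuid4().hex
    metrics["search.requests"] += 1  # metric: "Are we slow/failing?"

    with span("parse_payload", trace_id):
        try:
            query = json.loads(payload)["q"]
        except (ValueError, KeyError, TypeError) as exc:
            metrics["search.errors"] += 1
            # log: "What happened?" with enough context to find the cause
            logs.append({
                "trace_id": trace_id,
                "user_id": user_id,
                "event": "malformed_payload",
                "error": type(exc).__name__,
            })
            return None

    with span("run_query", trace_id):  # trace: "Where did the time go?"
        return query.upper()  # stand-in for the real search


handle_search(123, '{"q": "shoes"}')
handle_search(123, '{"q": ')  # malformed JSON, as in the User 123 example
```

After these two calls, the counter shows that one of two requests failed, the log entry says which user sent what kind of bad input, and the shared `trace_id` ties that log entry to the spans recorded for the same request.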
Example: The Mystery Latency
A checkout API was intermittently slow (5 s response time) for 1% of requests. Metrics showed "High Average Latency" but not why.
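
A short sketch shows why an average cannot explain this kind of problem. The 5 s figure and the 1% share come from the example above. The 100 ms baseline for the remaining requests and the 1,000-request sample size are assumptions made purely for illustration.

```python
import statistics

# Assumed traffic: 990 fast requests at 100 ms (invented baseline)
# and 10 slow requests at 5,000 ms (the 1% from the example).
latencies_ms = [100] * 990 + [5000] * 10

mean_ms = statistics.mean(latencies_ms)      # 149 ms: elevated, but looks mild
median_ms = statistics.median(latencies_ms)  # 100 ms: the typical request is fine
max_ms = max(latencies_ms)                   # 5,000 ms: the real outlier
slow_share = sum(1 for v in latencies_ms if v >= 1000) / len(latencies_ms)

print(f"mean={mean_ms:.0f} ms, median={median_ms:.0f} ms, max={max_ms} ms")
print(f"share of requests at or above 1 s: {slow_share:.1%}")
```

Under these assumptions the mean rises by about half while the median does not move at all, so the average signals that something is off without identifying which requests are slow. Answering that question requires per-request data, which is what logs and traces provide.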
Why Observability Matters
Observability goes beyond monitoring. It helps you ask questions you didn't know to ask.
Good observability lowers mean time to detect (MTTD) because you can see what is happening in your systems in real time.