The Four Golden Signals
The four key metrics that represent the health of a system: Latency, Traffic, Errors, and Saturation.
## "The Vital Signs"

Just as a doctor checks heart rate and blood pressure, an SRE checks the **Four Golden Signals**.

### 1. Latency

The time it takes to service a request.

* *Tip*: Distinguish between success latency (fast) and error latency (could be very fast or very slow).

### 2. Traffic

A measure of how much demand is being placed on your system.

* Web: Requests per second (RPS).
* Audio: Concurrent streams.

### 3. Errors

The rate of requests that fail.

* Explicit: HTTP 500s.
* Implicit: HTTP 200s with "Success: False" body (content errors).

### 4. Saturation

How "full" your service is.

* CPU usage, Memory, Disk I/O.
* Once saturation hits 100%, performance degrades rapidly (latency spikes).
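
As a rough illustration of the four definitions above, the sketch below computes each signal from a batch of request records. The `Request` record, the `golden_signals` function, and the `utilization` mapping are hypothetical names introduced for this example, not part of any monitoring library. It assumes that errors are HTTP 5xx responses or HTTP 200s whose body reports failure, and that saturation is the utilization of the busiest resource.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request record; field names are assumptions for this sketch."""
    duration_ms: float
    status: int
    body_ok: bool = True  # False models an HTTP 200 whose body says "Success: False"


def is_error(request):
    # Explicit errors (HTTP 5xx) or implicit content errors.
    return request.status >= 500 or not request.body_ok


def mean(values):
    return sum(values) / len(values) if values else None


def golden_signals(requests, window_seconds, utilization):
    """Summarize one observation window.

    utilization maps a resource name to a fraction between 0.0 and 1.0.
    """
    ok = [r.duration_ms for r in requests if not is_error(r)]
    bad = [r.duration_ms for r in requests if is_error(r)]
    return {
        # Latency, tracked separately for successes and errors.
        "success_latency_ms": mean(ok),
        "error_latency_ms": mean(bad),
        # Traffic: requests per second over the window.
        "traffic_rps": len(requests) / window_seconds,
        # Errors: fraction of requests that failed.
        "error_rate": len(bad) / len(requests) if requests else 0.0,
        # Saturation: the fullest resource drives the signal.
        "saturation": max(utilization.values(), default=0.0),
    }


sample = [
    Request(100, 200),
    Request(200, 200),
    Request(5, 500),                   # explicit error that fails fast
    Request(300, 200, body_ok=False),  # implicit error that is slow
]
signals = golden_signals(
    sample,
    window_seconds=2,
    utilization={"cpu": 0.2, "memory": 0.3, "disk_io": 0.9},
)
print(signals)
```

Averages keep the sketch short. In practice, latency is often reported as percentiles (for example p50 and p99), because a mean can hide a slow tail.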
Example: The Slow Disk
"A service was slow, but CPU and Memory were low. No errors were firing."
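The scenario does not spell out the cause, but its title suggests a saturated disk that a CPU-and-memory-only dashboard would miss. The sketch below assumes that reading. The `saturated_resources` function, the 0.8 threshold, and the utilization numbers are all illustrative assumptions, not measurements from the incident.

```python
def saturated_resources(utilization, threshold=0.8):
    """Return names of resources at or above the threshold, most saturated first."""
    hot = {name: value for name, value in utilization.items() if value >= threshold}
    return sorted(hot, key=hot.get, reverse=True)


# Hypothetical snapshot: CPU and memory look healthy, disk I/O does not.
full_snapshot = {"cpu": 0.15, "memory": 0.30, "disk_io": 0.97}

# A dashboard that only tracks CPU and memory sees nothing wrong.
partial_snapshot = {"cpu": 0.15, "memory": 0.30}

print(saturated_resources(partial_snapshot))
print(saturated_resources(full_snapshot))
```

The first call returns an empty list and the second flags `disk_io`, which is why saturation is best measured across every resource a service depends on rather than CPU and memory alone.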
Why the Four Golden Signals Matter
Popularized by Google's Site Reliability Engineering (SRE) book, these signals give you a high-level view of any system's health.
Monitoring these four signals is often enough to detect most user-facing incidents.