Subject Matter Expert (SME)
The technical specialist responsible for diagnosing and fixing the specific service or component causing the incident.
## The Hands on the Keyboard

If the **Incident Commander** (IC) is the conductor, the **Subject Matter Expert (SME)** is the soloist. They are the engineer who knows *exactly* why the Redis queue is backing up or why the load balancer is throwing 502s.

### Responsibilities

- **Diagnosis**: finding the root cause.
- **Mitigation**: stopping the bleeding (e.g., adding capacity, rolling back).
- **Communication**: telling the IC what they are doing *before* they do it.

### How to be a Great SME

The best SMEs don't just fix things; they **communicate**.

* **Bad SME**: Goes silent for 20 minutes, then says "Fixed it."
* **Good SME**: "I suspect a bad migration. I am going to verify the schema version. (5 mins later) Verified. I am requesting permission to roll back."

### One Area of Focus

An SME should focus on **one** thing. If the incident spans multiple services (e.g., Database + Frontend), you need multiple SMEs. The IC coordinates them; the SMEs fix the systems.
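
The one-SME-per-area rule can be made concrete with a small tracking structure. The sketch below is only an illustration, not part of any real incident tool: the `Incident` class, its fields, and the names `alice` and `bob` are all hypothetical. It shows one way an IC might record assignments so that no SME is given a second area and uncovered services stay visible.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Incident:
    """Hypothetical model: which SME owns which affected service."""
    affected_services: List[str]
    assignments: Dict[str, str] = field(default_factory=dict)  # service -> SME

    def assign(self, service: str, sme: str) -> None:
        if service not in self.affected_services:
            raise ValueError(f"{service} is not part of this incident")
        if sme in self.assignments.values():
            # Enforce "one area of focus": an SME owns at most one service.
            raise ValueError(f"{sme} already has an area of focus")
        self.assignments[service] = sme

    def uncovered(self) -> List[str]:
        """Services that still need an SME."""
        return [s for s in self.affected_services if s not in self.assignments]


incident = Incident(affected_services=["database", "frontend"])
incident.assign("database", "alice")
print(incident.uncovered())  # ['frontend']
```

Rejecting a second assignment for the same person, rather than silently allowing it, forces the IC to pull in another SME when an incident spans several systems.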
Example: The Hero Trap
"A senior engineer tried to fix a complex cascading failure alone. They worked for 12 hours straight without sleep."
Why the Subject Matter Expert Matters
SMEs have the deep context that the IC lacks.
They are the only ones allowed to touch production during an incident.
They translate "it's broken" into "the connection pool is exhausted".