404 — Incident Response with Observability

Advanced

Learn to use observability tools for rapid incident detection, triage, investigation, and resolution. Build runbooks, practice structured postmortems, and foster a blameless culture.

Learning Objectives

Follow a structured incident response workflow

Use observability tools for rapid investigation

Write effective postmortems

Create incident response runbooks

Step 1

Incident response workflow overview

Understand the five phases of incident response: detect, triage, investigate, mitigate, and review. Every incident follows this cycle.

Commands to Run

cat <<'EOF'
=== INCIDENT RESPONSE WORKFLOW ===

  ┌───────────┐
  │  DETECT   │ ← Alerts fire, users report issues
  └────┬──────┘
       ▼
  ┌───────────┐
  │  TRIAGE   │ ← How bad? Who is affected? What severity?
  └────┬──────┘
       ▼
  ┌──────────────┐
  │ INVESTIGATE  │ ← Metrics → Logs → Traces (drill down)
  └────┬─────────┘
       ▼
  ┌───────────┐
  │ MITIGATE  │ ← Rollback, scale up, toggle feature flag
  └────┬──────┘
       ▼
  ┌───────────┐
  │  REVIEW   │ ← Postmortem, action items, learn
  └───────────┘
EOF

What This Does

A common and costly mistake during incidents is skipping triage and jumping straight to investigation. Triage determines severity, identifies who needs to be involved, and sets the communication strategy. A P1 incident (customer-facing outage) calls for different actions than a P3 (degraded internal tool). Following the workflow keeps the response coordinated and makes it much less likely that a step is missed.
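The triage decision described above can be sketched as a small shell helper. This is only an illustration: the function names `triage_severity` and `response_plan`, the P2 level, the two yes/no questions, and the response actions are all assumptions made for this example, not a standard. Only the P1 and P3 meanings come from the text.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper: maps two yes/no answers to a severity label.
# P1 and P3 follow the text; P2 and the decision rules are assumptions.
triage_severity() {
  local customer_facing="$1"   # "yes" if customers are affected
  local full_outage="$2"       # "yes" if the service is down, "no" if degraded

  if [ "$customer_facing" = "yes" ] && [ "$full_outage" = "yes" ]; then
    echo "P1"
  elif [ "$customer_facing" = "yes" ]; then
    echo "P2"
  else
    echo "P3"
  fi
}

# Hypothetical response plan per severity (who to involve, how to communicate).
response_plan() {
  case "$1" in
    P1) echo "page on-call now, open incident channel, post status update" ;;
    P2) echo "notify on-call, open incident channel" ;;
    P3) echo "file ticket, handle in business hours" ;;
    *)  echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

# Example: a customer-facing full outage.
sev="$(triage_severity yes yes)"
echo "$sev: $(response_plan "$sev")"
```

Answering two fixed questions before touching any dashboards is the point of the sketch: the severity label, not the investigator's first hunch, decides who gets paged and how the incident is communicated. A real team would substitute its own severity definitions.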

Expected Outcome

You see the five-phase incident response workflow displayed as a flowchart.

Pro Tips

1. Print this workflow and keep it near your desk. During high-stress incidents, having a checklist prevents mistakes.
2. The first goal in mitigation is to STOP THE BLEEDING, not to find the root cause.
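One way to turn the workflow into a working checklist is to record a timestamp as each phase is reached, which also gives the postmortem a ready-made timeline. The sketch below assumes a hypothetical `log_phase` function and an `INCIDENT_LOG` file path. Neither is part of any standard tool, and the sample notes are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical checklist helper: appends one line per phase to a timeline file.
# INCIDENT_LOG and log_phase are illustrative names, not part of any tool.
INCIDENT_LOG="${INCIDENT_LOG:-${TMPDIR:-/tmp}/incident-timeline.log}"

log_phase() {
  local phase="$1"; shift
  # Accept only the five phases from the workflow.
  case "$phase" in
    detect|triage|investigate|mitigate|review) ;;
    *) echo "unknown phase: $phase" >&2; return 1 ;;
  esac
  # UTC timestamp, then the phase, then a free-text note.
  printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$phase" "$*" >> "$INCIDENT_LOG"
}

# Example entries (made-up notes for illustration).
log_phase detect "error-rate alert fired"
log_phase triage "customer-facing, declared P1"
log_phase mitigate "rolled back latest deploy"
```

Rejecting unknown phase names keeps the log aligned with the five-phase workflow, so gaps (for example, a `mitigate` entry with no `triage` before it) stand out during the review.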