204 — Alerting with Alertmanager

Intermediate

Set up Prometheus Alertmanager for production alerting. Write alerting rules based on symptoms, configure routing and receivers, understand grouping, inhibition, and silencing, and learn best practices to avoid alert fatigue.

Learning Objectives

Write effective Prometheus alerting rules

Configure Alertmanager routing and receivers

Apply symptom-based alerting to reduce fatigue

Test and verify alert delivery

Step 1

Alertmanager architecture

Understand how alerting works in the Prometheus ecosystem. Prometheus evaluates alert rules, Alertmanager handles notification routing, deduplication, grouping, and silencing.

Commands to Run

cat <<'EOF'
=== ALERTING ARCHITECTURE ===

  Prometheus              Alertmanager           Receivers
  +-----------+          +---------------+      +--------+
  | Alert     |  fires   | Route         |      | Slack  |
  | Rules     | -------> | Group         | ---> | Email  |
  | (PromQL)  |          | Deduplicate   |      | PagerD |
  +-----------+          | Silence       |      | Webhook|
                         | Inhibit       |      +--------+
                         +---------------+

=== ALERT STATES ===
inactive  -> Alert condition is false
pending   -> Alert condition is true, waiting for "for" duration
firing    -> Alert has been true for the full "for" duration -> sent to Alertmanager
EOF

What This Does

Prometheus and Alertmanager have separate responsibilities. Prometheus evaluates PromQL conditions and decides when to fire alerts. Alertmanager decides who to notify, how to group related alerts, when to suppress duplicates, and when to silence known issues. This separation means you can restart Alertmanager without losing alert state in Prometheus, and vice versa.

Expected Outcome

You see the architecture diagram showing the flow from Prometheus alerting rules through Alertmanager to receivers, plus the three alert states.

Pro Tips

1
The 'for' duration prevents flapping alerts — the condition must be true continuously for the full duration
2
Alertmanager runs as a separate service — it can be shared across multiple Prometheus instances
3
Alertmanager has its own web UI for viewing and silencing alerts

Was this step helpful?