104 — Your First Metrics & Dashboard

Beginner

Learn Prometheus metric types and PromQL, then build a Grafana dashboard from scratch. You'll create panels for request rate, error rate, and latency percentiles, and assemble them into a Four Golden Signals dashboard. Starting state: ~/observability-lab/ running with the demo app from Lesson 103. After this lesson: a Four Golden Signals dashboard in Grafana, completing Module 1.

Learning Objectives

Understand the four Prometheus metric types (counter, gauge, histogram, summary)

Write basic PromQL queries including rate() and histogram_quantile()

Build a Grafana dashboard with panels for request rate, error rate, and latency

Visualize the Four Golden Signals on a single dashboard

Step 1

Understand Prometheus metric types

Prometheus has four core metric types. Understanding them is essential for writing correct queries.

Commands to Run

cat <<'EOF'
=== PROMETHEUS METRIC TYPES ===

1. COUNTER — only goes up (resets on restart)
   Example: http_server_request_duration_seconds_count = 15,234
   Use for: request counts, error counts, bytes sent
   Query with: rate() to get per-second change

2. GAUGE — goes up and down
   Example: temperature_celsius = 21.5
   Use for: CPU usage, memory, queue depth, active connections
   Query directly: no rate() needed

3. HISTOGRAM — counts observations in configurable buckets
   Example: http_server_request_duration_seconds_bucket{le="0.5"} = 4,891
   Use for: latency distributions, request sizes
   Query with: histogram_quantile() for percentiles

4. SUMMARY — similar to histogram but calculates quantiles client-side
   Example: rpc_duration_seconds{quantile="0.99"} = 0.23
   Use for: pre-calculated percentiles (less flexible than histograms)
   Generally prefer histograms over summaries
EOF

What This Does

Counters and histograms are the most important types for monitoring services.

Counters track things that accumulate (requests, errors, bytes).

Histograms track distributions (latency, sizes) by bucketing observations.

The key rule: never apply rate() to a gauge, and always apply rate() to a counter before graphing it — a raw counter value (15,234 total requests) isn't useful, but the rate (120 requests/second) is.
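To make the rule concrete, here is a minimal shell sketch of the arithmetic behind a per-second rate taken from two counter samples, including the reset case. The helper name counter_rate and the sample values are hypothetical, chosen to match the 120 requests/second figure above. Prometheus's real rate() also uses every sample in the range window and extrapolates to the window edges, which this sketch leaves out.

```shell
# Hypothetical helper (not a Prometheus tool): per-second rate between
# two samples of the same counter.
# Usage: counter_rate PREVIOUS CURRENT ELAPSED_SECONDS
counter_rate() {
  awk -v prev="$1" -v curr="$2" -v secs="$3" 'BEGIN {
    # A drop in value means the counter was reset (e.g. process restart),
    # so the current value is treated as the whole increase since the reset.
    increase = (curr + 0 >= prev + 0) ? curr - prev : curr
    printf "%.2f\n", increase / secs
  }'
}

counter_rate 15234 22434 60   # normal case: 7200 new requests in 60 s, prints 120.00
counter_rate 15234 300 60     # reset case: only the 300 post-reset requests count, prints 5.00
```

The second call shows why graphing the raw counter would be misleading: the value falls from 15,234 to 300, yet the service kept handling requests the whole time.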

Expected Outcome

You see all four metric types printed with examples and usage guidelines.

Pro Tips

1. When in doubt, use a histogram for anything related to latency or size
2. Counters reset to 0 on process restart; rate() handles this automatically
3. Gauges can be graphed as-is because the raw value is directly meaningful; counters and histogram buckets need rate() first
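To see roughly what histogram_quantile() does with bucketed data, here is a small shell sketch that estimates a percentile from cumulative bucket counts by linear interpolation inside the matching bucket. The helper name estimate_quantile and the bucket values are made up for illustration (only the le="0.5" count of 4,891 comes from the example earlier in this step). It ignores the +Inf bucket and edge cases such as empty histograms, so treat it as an approximation of the idea and not as Prometheus's implementation.

```shell
# Hypothetical helper: estimate a quantile from cumulative histogram buckets.
# Reads "upper_bound cumulative_count" pairs on stdin, sorted by upper bound.
# Usage: estimate_quantile Q   (Q between 0 and 1, e.g. 0.95)
estimate_quantile() {
  awk -v q="$1" '
    { le[NR] = $1; count[NR] = $2 }
    END {
      rank = q * count[NR]     # target rank among all observations
      lower = 0; below = 0     # upper bound and count of the previous bucket
      for (i = 1; i <= NR; i++) {
        if (count[i] >= rank) {
          # Assume observations are spread evenly inside this bucket.
          width = le[i] - lower
          printf "%.3f\n", lower + width * (rank - below) / (count[i] - below)
          exit
        }
        lower = le[i]; below = count[i]
      }
    }'
}

# Made-up latency buckets: le (seconds) and cumulative count.
printf '%s\n' "0.1 2000" "0.5 4891" "1.0 5000" | estimate_quantile 0.95   # prints 0.480
```

The estimate can only be as precise as the bucket boundaries allow, which is why bucket layout matters when you care about a specific latency threshold.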
