403 — Cardinality, Sampling & Retention

Advanced

Learn to manage observability costs and performance by controlling metric cardinality, configuring trace and log sampling, setting retention policies, and planning capacity for your monitoring infrastructure.

Learning Objectives

Identify and fix high-cardinality metrics

Configure trace and log sampling

Set appropriate retention policies

Optimize observability infrastructure costs

Step 1

What is cardinality and why it matters

Cardinality is the number of unique time series a metric produces. High cardinality is one of the most common causes of Prometheus performance problems and observability cost overruns.

Commands to Run

cat <<'EOF'
=== UNDERSTANDING CARDINALITY ===

A metric with labels creates one time series per unique label combination:

http_requests_total{method="GET", status="200", endpoint="/api/users"}
http_requests_total{method="POST", status="201", endpoint="/api/users"}
http_requests_total{method="GET", status="404", endpoint="/api/orders"}

Maximum cardinality = methods(4) × statuses(5) × endpoints(20) = 400 series
(an upper bound: only label combinations that actually occur create series)

Now add a high-cardinality label:
http_requests_total{..., user_id="abc123"}

With 100,000 users:
Maximum cardinality = 4 × 5 × 20 × 100,000 = 40,000,000 series!

That single label turned 400 series into 40 MILLION.
EOF
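The multiplication in the heredoc is easy to reproduce. Below is a minimal Bash sketch; `max_series` is a hypothetical helper name, and the label counts are the illustrative figures from the example, not measurements. It computes the worst case, where every label combination occurs.

```shell
#!/usr/bin/env bash
# Worst-case series count for one metric: the product of the number of
# distinct values each label can take. The counts passed in below are the
# illustrative figures from the example, not measured data.

max_series() {
  local total=1 n
  for n in "$@"; do
    total=$(( total * n ))
  done
  echo "$total"
}

base=$(max_series 4 5 20)              # method x status x endpoint
with_user=$(max_series 4 5 20 100000)  # ... x user_id

echo "Without user_id: $base series"
echo "With user_id:    $with_user series"
echo "Growth factor:   $(( with_user / base ))x"
```

Because the result is a product, one unbounded label dominates: everything the other labels contribute gets multiplied by 100,000.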

What This Does

Cardinality grows multiplicatively: the maximum number of series is the product of the number of distinct values of each label.

Each time series consumes memory in Prometheus (about 1-2 KB per series).

At 40 million series, that is 40-80 GB of RAM just for one metric.
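The 40-80 GB figure comes from multiplying the series count by the per-series estimate. A minimal Bash sketch of that arithmetic follows, using the 1-2 KB rule of thumb above; `estimate_ram_gb` is a hypothetical helper, and it counts only series memory, ignoring everything else Prometheus allocates.

```shell
#!/usr/bin/env bash
# Rough RAM estimate for a number of active series, using the 1-2 KB per
# series rule of thumb from the text. Real usage varies with label sizes
# and Prometheus version; this ignores all non-series memory.

estimate_ram_gb() {
  local series=$1 kb_per_series=$2
  # series x KB / 1,000,000 = GB (decimal units, matching the text)
  echo $(( series * kb_per_series / 1000000 ))
}

series=40000000
echo "Low estimate:  $(estimate_ram_gb "$series" 1) GB"
echo "High estimate: $(estimate_ram_gb "$series" 2) GB"
```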

High cardinality causes slow queries, high memory usage, and eventually out-of-memory crashes.

The rule of thumb: never use unbounded values (user IDs, request IDs, email addresses) as metric labels.
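To find which label is driving cardinality, you can count distinct values per label. The Bash/awk sketch below assumes the series are available as text lines in the `name{label="value", ...}` form used above. `count_label_values` and `sample_series` are hypothetical names, and the sample data is made up. Prometheus itself reports similar per-label statistics through its TSDB status API (`/api/v1/status/tsdb`).

```shell
#!/usr/bin/env bash
# Count distinct values per label across a list of series, to spot labels
# whose value count keeps growing. Assumes label values contain no commas,
# equals signs, or braces (a simplification for this sketch).

count_label_values() {
  awk '
    {
      # Keep only the text between the braces: method="GET", status="200", ...
      start = index($0, "{"); stop = index($0, "}")
      if (start == 0 || stop <= start) next
      inner = substr($0, start + 1, stop - start - 1)
      n = split(inner, pairs, ",")
      for (i = 1; i <= n; i++) {
        split(pairs[i], kv, "=")
        gsub(/^ +| +$/, "", kv[1])          # trim spaces around the label name
        key = kv[1] SUBSEP kv[2]
        if (!(key in seen)) { seen[key] = 1; distinct[kv[1]]++ }
      }
    }
    END { for (label in distinct) print distinct[label], label }
  ' | sort -rn
}

# Made-up sample input for illustration only.
sample_series() {
  cat <<'EOF'
http_requests_total{method="GET", status="200", endpoint="/api/users", user_id="u1"}
http_requests_total{method="GET", status="200", endpoint="/api/users", user_id="u2"}
http_requests_total{method="POST", status="201", endpoint="/api/users", user_id="u3"}
http_requests_total{method="GET", status="404", endpoint="/api/orders", user_id="u4"}
http_requests_total{method="GET", status="200", endpoint="/api/orders", user_id="u5"}
EOF
}

sample_series | count_label_values
```

In this sample, `user_id` has a new value on every line, which is the pattern to look for: a label whose distinct-value count tracks the number of series.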

Expected Outcome

You see a worked example showing how adding a user_id label explodes cardinality from 400 to 40 million series.

Pro Tips

1. Keep total active series under 1 million for a single Prometheus instance.
2. Use labels with bounded, low-cardinality values: method, status_code, region, service_name.
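Tip 1 can be turned into a quick check. Below is a Bash sketch in which `check_series_budget` is a hypothetical helper and the 80% warning level is an arbitrary assumption added for illustration. On a live server, the current count of active series is available from Prometheus's own `prometheus_tsdb_head_series` metric.

```shell
#!/usr/bin/env bash
# Compare an active-series count against a budget (default: the ~1 million
# guideline from the tip). Both the budget and the 80% warning level are
# rules of thumb, not hard limits; the 80% level is an assumption.

check_series_budget() {
  local series=$1 budget=${2:-1000000}
  local pct=$(( series * 100 / budget ))
  if (( series > budget )); then
    echo "OVER: $series series is ${pct}% of the $budget budget"
  elif (( pct >= 80 )); then
    echo "WARN: $series series is ${pct}% of the $budget budget"
  else
    echo "OK: $series series is ${pct}% of the $budget budget"
  fi
}

check_series_budget 400
check_series_budget 850000
check_series_budget 40000000
```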
