Learn to manage observability costs and performance by controlling metric cardinality, configuring trace and log sampling, setting retention policies, and planning capacity for your monitoring infrastructure.
Cardinality is the number of unique time series a metric produces. High cardinality is the number one cause of Prometheus performance problems and observability cost overruns.
cat <<'EOF'
=== UNDERSTANDING CARDINALITY ===
A metric with labels creates one time series per unique label combination:
http_requests_total{method="GET", status="200", endpoint="/api/users"}
http_requests_total{method="POST", status="201", endpoint="/api/users"}
http_requests_total{method="GET", status="404", endpoint="/api/orders"}
Cardinality = methods(4) × statuses(5) × endpoints(20) = 400 series
Now add a high-cardinality label:
http_requests_total{..., user_id="abc123"}
With 100,000 users:
Cardinality = 4 × 5 × 20 × 100,000 = 40,000,000 series!
That single label turned 400 series into 40 MILLION.
EOF

Cardinality grows as the product of all unique label values. Each time series consumes memory in Prometheus (about 1-2 KB per series). At 40 million series, that is 40-80 GB of RAM just for one metric. High cardinality causes slow queries, high memory usage, and eventually out-of-memory crashes. The rule of thumb: never use unbounded values (user IDs, request IDs, email addresses) as metric labels.
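The multiplication above can be checked directly in the shell. This sketch redoes the arithmetic from the worked example, using the 2 KB figure as the upper bound of the 1-2 KB per-series estimate:

```shell
# Cardinality multiplies across labels; each series costs roughly 1-2 KB of RAM.
methods=4
statuses=5
endpoints=20
base=$((methods * statuses * endpoints))    # bounded labels only
users=100000
exploded=$((base * users))                  # after adding the user_id label
ram_gb=$((exploded * 2 / 1000 / 1000))      # upper bound at 2 KB per series
echo "base=$base exploded=$exploded ram<=${ram_gb}GB"
```

Running it prints `base=400 exploded=40000000 ram<=80GB`, matching the figures in the paragraph above.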
You see a worked example showing how adding a user_id label explodes cardinality from 400 to 40 million series.
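If an application already emits an unbounded label like user_id, one defense is to strip it at scrape time with Prometheus's `metric_relabel_configs`, which runs before samples are ingested. A minimal sketch (the job name `api` and target address are placeholders for illustration):

```yaml
scrape_configs:
  - job_name: api
    static_configs:
      - targets: ['localhost:8080']
    metric_relabel_configs:
      # Drop the unbounded user_id label before ingestion,
      # collapsing the per-user series back into bounded ones.
      - action: labeldrop
        regex: user_id
```

Dropping the label merges series that differ only in user_id, so the fix works only for counters and gauges where aggregating across users is acceptable; the better long-term fix is to remove the label at the instrumentation source.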