303: Distributed Tracing with Grafana Tempo

Intermediate

Deploy Grafana Tempo to collect and query distributed traces, configure the OTel Collector to export traces via OTLP, learn to read span waterfalls, apply sampling strategies, and troubleshoot slow requests using trace data.

Learning Objectives

1. Deploy and configure Grafana Tempo
2. Search and filter traces effectively
3. Read span waterfalls and identify bottlenecks
4. Apply sampling strategies for production

Step 1

Distributed tracing concepts

Understand the core concepts of distributed tracing (traces, spans, and context propagation) and why they matter when debugging microservices.

Commands to Run

cat <<'EOF'
=== DISTRIBUTED TRACING CONCEPTS ===

A TRACE is the full journey of a single request across services.
A SPAN is one unit of work within that trace (one function, one HTTP call).

=== ANATOMY OF A TRACE ===

  User clicks "Place Order"
  │
  ├─ SPAN: api-gateway /checkout          [200ms total]
  │   ├─ SPAN: order-api.createOrder      [150ms]
  │   │   ├─ SPAN: postgres INSERT        [ 20ms]
  │   │   └─ SPAN: payment-api.charge     [100ms]  ← slow!
  │   │       └─ SPAN: stripe-sdk.request [ 95ms]
  │   └─ SPAN: notification.sendEmail     [ 30ms]
  │
  Trace ID: 4bf92f3577b34da6a3ce929d0e0e4736

=== CONTEXT PROPAGATION ===

How does Service B know it is part of the same trace as Service A?

  Service A  ──HTTP──>  Service B  ──gRPC──>  Service C
      │                     │                     │
  Injects               Extracts, then        Extracts
  traceparent           re-injects            traceparent
  header                traceparent header    header

W3C Trace Context header:
  traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
               version-trace_id-parent_span_id-trace_flags

With OTel instrumentation in place, the SDK's propagator injects and extracts this header for you.
EOF

What This Does

Distributed tracing solves the core problem of debugging microservices: when a request crosses 5 or 10 services, which one is slow? A trace gives you the complete timeline. Each service records spans (units of work), and OpenTelemetry carries the trace ID from service to service in the W3C Trace Context traceparent header, so every span from the same request can be assembled into a single trace. Once your services are instrumented with OpenTelemetry, this propagation happens without extra code.
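To make the header format concrete, here is a minimal sketch that splits a traceparent value into its four fields and checks the basic rules from the W3C Trace Context spec (fixed field widths, lowercase hex, no all-zero IDs). parse_traceparent is a hypothetical helper written for this lesson; in a real service the OTel propagator does this parsing for you.

```shell
# Minimal sketch: split a W3C traceparent value into its four fields.
# parse_traceparent is a hypothetical helper, not part of any OTel tool.
parse_traceparent() {
  tp=$1
  version=$(printf '%s' "$tp" | cut -d- -f1)
  trace_id=$(printf '%s' "$tp" | cut -d- -f2)
  parent_id=$(printf '%s' "$tp" | cut -d- -f3)
  flags=$(printf '%s' "$tp" | cut -d- -f4)

  # Field widths in hex characters are 2-32-16-2, joined by three dashes (55 total).
  if [ "${#tp}" -ne 55 ] || [ "${#version}" -ne 2 ] || [ "${#trace_id}" -ne 32 ] \
     || [ "${#parent_id}" -ne 16 ] || [ "${#flags}" -ne 2 ]; then
    echo "invalid traceparent: bad field lengths" >&2
    return 1
  fi

  # Every field must be lowercase hex.
  case "$version$trace_id$parent_id$flags" in
    *[!0-9a-f]*) echo "invalid traceparent: non-hex characters" >&2; return 1 ;;
  esac

  # All-zero trace IDs and parent IDs are invalid.
  if [ "$trace_id" = "00000000000000000000000000000000" ] \
     || [ "$parent_id" = "0000000000000000" ]; then
    echo "invalid traceparent: all-zero ID" >&2
    return 1
  fi

  # Bit 0 of trace-flags is the "sampled" flag.
  sampled=$(( 0x$flags & 1 ))

  printf 'version=%s\ntrace_id=%s\nparent_id=%s\nsampled=%s\n' \
    "$version" "$trace_id" "$parent_id" "$sampled"
}

parse_traceparent "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
```

For the example header this prints version=00, trace_id=4bf92f3577b34da6a3ce929d0e0e4736, parent_id=00f067aa0ba902b7, and sampled=1. The sampled bit is what downstream services consult when deciding whether to record their own spans, which becomes important once you apply sampling strategies.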

Expected Outcome

You see a visual representation of a trace as a tree of spans, the W3C Trace Context header format, and how context propagation links spans across services.
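When you read a tree like the one above, a span's total duration includes the time spent in its children. Its "self time" (total minus children) shows where the work actually happens. The sketch below computes self time for the example trace. self_times is a hypothetical helper, the durations are copied from the example tree, and it assumes child spans run one after another. If children overlap (run in parallel), subtracting their summed durations understates self time.

```shell
# Sketch: compute each span's self time from the example trace above.
# self_times is a hypothetical helper. Input lines are
# "name|parent|duration_ms"; the root span uses "-" as its parent.
# Assumption: children run sequentially (no overlap), so
# self time = own duration minus the sum of direct children's durations.
self_times() {
  awk -F'|' '
    BEGIN { max_self = -1 }
    {
      name[NR] = $1
      dur[$1] = $3
      child_total[$2] += $3   # add this span to its parent total
    }
    END {
      for (i = 1; i <= NR; i++) {
        n = name[i]
        self = dur[n] - child_total[n]
        printf "%-24s total=%dms self=%dms\n", n, dur[n], self
        if (self > max_self) { max_self = self; hot = n }
      }
      printf "largest self time: %s (%dms)\n", hot, max_self
    }'
}

# Durations copied from the example trace tree.
spans='api-gateway /checkout|-|200
order-api.createOrder|api-gateway /checkout|150
postgres INSERT|order-api.createOrder|20
payment-api.charge|order-api.createOrder|100
stripe-sdk.request|payment-api.charge|95
notification.sendEmail|api-gateway /checkout|30'

printf '%s\n' "$spans" | self_times
```

On this trace, payment-api.charge is flagged as slow, but its own self time is only 5ms. The largest self time belongs to stripe-sdk.request (95ms), so the bottleneck is the call to Stripe rather than the payment service's own code. The self times (20 + 30 + 20 + 5 + 95 + 30) add back up to the 200ms root span.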

Pro Tips

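  • Every span records a start time, a duration, a service name, an operation name, and optional attributes, along with its trace ID, its own span ID, and (for every span except the root) its parent's span ID
  • The W3C Trace Context standard (the traceparent header) is supported by OpenTelemetry and by most major observability vendors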
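The span fields from the first tip map directly onto the OTLP data model that the OTel Collector receives. The sketch below prints one span from the example trace as an OTLP/JSON payload. make_span_json is a hypothetical helper, and the span ID, timestamps, and "payment.provider" attribute are made-up example values. Sending the payload assumes an OTel Collector with an OTLP/HTTP receiver listening on its default port 4318, so that step is left commented out.

```shell
# Sketch: print one span as an OTLP/JSON payload (resourceSpans > scopeSpans > spans).
# make_span_json is a hypothetical helper; all IDs, timestamps, and the
# "payment.provider" attribute are example values chosen for this lesson.
make_span_json() {
  trace_id=$1 span_id=$2 parent_id=$3 service=$4 op=$5 start_ns=$6 duration_ms=$7
  end_ns=$(( start_ns + duration_ms * 1000000 ))   # convert ms to ns
  # OTLP/JSON encodes trace and span IDs as hex strings and enums as
  # integers (kind 2 = SPAN_KIND_SERVER).
  cat <<EOF
{"resourceSpans":[{
  "resource":{"attributes":[{"key":"service.name","value":{"stringValue":"$service"}}]},
  "scopeSpans":[{
    "scope":{"name":"lesson-303-sketch"},
    "spans":[{
      "traceId":"$trace_id","spanId":"$span_id","parentSpanId":"$parent_id",
      "name":"$op","kind":2,
      "startTimeUnixNano":"$start_ns","endTimeUnixNano":"$end_ns",
      "attributes":[{"key":"payment.provider","value":{"stringValue":"stripe"}}]
    }]
  }]
}]}
EOF
}

make_span_json 4bf92f3577b34da6a3ce929d0e0e4736 b7ad6b7169203331 00f067aa0ba902b7 \
  payment-api charge 1700000000000000000 100

# Assumption: an OTel Collector with an OTLP/HTTP receiver on its default port 4318.
# make_span_json ... | curl -sS -X POST -H 'Content-Type: application/json' \
#   --data-binary @- http://localhost:4318/v1/traces
```

Note that the service name lives on the resource rather than on the span itself. That is why every span a service emits shares the same service.name, while the operation name and attributes vary per span.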