501 — Full Observability Stack Project

Advanced

Build a complete 7-service observability stack from scratch — OpenTelemetry Collector, Prometheus, Grafana, Loki, Tempo, Alertmanager, and a demo app — then debug a simulated production incident using metrics, logs, and traces together.

Learning Objectives

1. Build a complete 7-service observability stack
2. Instrument an application with OpenTelemetry
3. Configure Prometheus, Loki, and Tempo backends
4. Build Golden Signals dashboards in Grafana
5. Debug a production incident using all three observability pillars
6. Apply the investigation workflow: metrics → logs → traces

Step 1

Project setup and architecture review

Create the project directory structure and review the architecture diagram showing how all 7 services connect. The OpenTelemetry Collector sits at the center: it receives telemetry from the demo app and exports it to Prometheus (metrics), Loki (logs), and Tempo (traces). Alertmanager handles notifications and Grafana handles visualization.

Commands to Run

mkdir -p ~/observability-capstone/{app,otel,prometheus,grafana/{provisioning/datasources,provisioning/dashboards,provisioning/plugins,provisioning/notifiers,provisioning/alerting,dashboards},loki,tempo,alertmanager}
cd ~/observability-capstone && find . -type d | sort
cat << 'EOF'
=== ARCHITECTURE DIAGRAM ===

  ┌──────────────┐
  │  Demo App    │  (Node.js/Express)
  │  :4000       │
  └──────┬───────┘
         │ OTLP (gRPC :4317)
         v
  ┌─────────────────────┐
  │  OTel Collector     │
  │  :4317 / :4318      │
  └──┬───────┬───────┬──┘
     │       │       │
     v       v       v
  ┌─────┐ ┌─────┐ ┌─────┐
  │Prom │ │Loki │ │Tempo│
  │:9090│ │:3100│ │:3200│
  └──┬──┘ └──┬──┘ └──┬──┘
     │       │       │
     v       v       v
  ┌─────────────────────┐
  │      Grafana        │
  │      :3001          │
  └─────────────────────┘
         ^
         │
  ┌──────────────┐
  │ Alertmanager │
  │ :9093        │
  └──────────────┘
EOF

What This Does

The architecture follows the OpenTelemetry best practice: your app sends all telemetry to a single OTel Collector via OTLP (OpenTelemetry Protocol). The Collector then routes metrics to Prometheus, logs to Loki, and traces to Tempo. Grafana connects to all three backends for unified dashboards. Alertmanager receives firing alerts from Prometheus and handles notification routing.

Expected Outcome

You see the directory tree with folders for each service, and the architecture diagram showing how all 7 services connect via network ports.
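
To confirm the layout without reading the tree by eye, a small check along these lines may help. It is a minimal sketch: `check_layout` is a hypothetical helper name, not part of the lab, and its directory list simply mirrors the mkdir command from this step.

```shell
# check_layout ROOT: list any expected directory that is missing under ROOT.
# Runs in a subshell, so its variables do not leak into the caller.
check_layout() (
  root="$1"
  missing=0
  for d in app otel prometheus loki tempo alertmanager \
           grafana/dashboards \
           grafana/provisioning/datasources \
           grafana/provisioning/dashboards \
           grafana/provisioning/plugins \
           grafana/provisioning/notifiers \
           grafana/provisioning/alerting; do
    if [ ! -d "$root/$d" ]; then
      echo "missing: $d"
      missing=$((missing + 1))
    fi
  done
  echo "$missing missing"
  # Succeed only when every directory was found.
  [ "$missing" -eq 0 ]
)
```

Run it as `check_layout ~/observability-capstone`. A non-zero exit status means at least one directory still needs to be created.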

Pro Tips

  1. Keep this architecture diagram handy: every configuration file you write maps to an arrow in it.
  2. All services communicate over a shared Docker network, so you can use service names as hostnames.
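
As a rough illustration of the second tip, the sketch below builds readiness URLs from service names alone. The service names (`prometheus`, `loki`, `tempo`, `alertmanager`) are assumptions about how the services will be named later in the project, and `ready_url` is a hypothetical helper. The ports come from the architecture diagram, and the paths are the standard readiness endpoints of each tool.

```shell
# ready_url SERVICE: print the readiness URL for a backend, using the
# service name as the hostname. Service names are assumed, not confirmed
# by the lab; ports are taken from the architecture diagram.
ready_url() {
  case "$1" in
    prometheus)   echo "http://prometheus:9090/-/ready" ;;
    loki)         echo "http://loki:3100/ready" ;;
    tempo)        echo "http://tempo:3200/ready" ;;
    alertmanager) echo "http://alertmanager:9093/-/ready" ;;
    *)            echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Print the URL for each backend in the diagram.
for svc in prometheus loki tempo alertmanager; do
  ready_url "$svc"
done
```

Once the stack is running, a container on the same Docker network could probe a backend with something like `curl -fsS "$(ready_url loki)"`. From the host, these names will not resolve unless ports are published and `localhost` is used instead.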