Monitoring & Observability

From first metric to full observability stack

Master modern observability with OpenTelemetry, Prometheus, Grafana, Loki, and Tempo. Learn to instrument applications, build dashboards, correlate signals, define SLOs, and debug production systems.

5 modules
18 lessons
186 steps
540 min total
What You Need

All lessons run locally using Docker containers. No cloud accounts or paid services required.

  • ✔ Docker Desktop installed and running
  • ✔ Basic terminal / command-line knowledge
  • ✔ A code editor (VS Code recommended)
Stack Overview
  • OpenTelemetry: instrumentation
  • Prometheus: metrics
  • Grafana: dashboards
  • Loki: logs
  • Tempo: traces
Time Estimate
  • ~9 hours across 18 lessons
  • Work at your own pace, one module at a time
  • Each lesson includes commands, explanations & expected outcomes

Learning Path

Progress through these modules to master monitoring and observability.

1

101 - Introduction to Observability

Understand the three pillars of observability (metrics, logs, traces), learn the Four Golden Signals, RED and USE methods, and see why observability matters for modern systems. Starting state: nothing required. After this lesson: conceptual foundation for the course, plus a quick taste of real Prometheus metrics.

Beginner • 20 minutes • 9 steps
Required
  • Basic understanding of web applications
  • Familiarity with the terminal
2

102 - Setting Up Your Observability Lab

Build a local observability stack with Docker Compose. You'll run Prometheus for metrics collection, Grafana for visualization, and an OpenTelemetry Collector to receive and route telemetry data. Starting state: Docker installed, no prior lab. After this lesson: ~/observability-lab/ running with Prometheus (:9090), Grafana (:3001), and OTel Collector (:4317/:4318/:8889). This lab directory is used for every lesson from 102 through 405.

Beginner • 25 minutes • 10 steps
Required
  • Docker installed and running
  • Lesson 101 complete
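
Once the lab from Lesson 102 is up, one quick sanity check is to confirm that the listed ports accept TCP connections. The following is a minimal sketch rather than part of the course material: the port numbers come from the lesson description, while the `localhost` host and the `port_open` helper name are assumptions made for illustration.

```python
import socket

# Ports as listed in the Lesson 102 description.
# The host ("localhost") is an assumption about how the lab is exposed.
LAB_PORTS = {
    "Prometheus": [9090],
    "Grafana": [3001],
    "OTel Collector": [4317, 4318, 8889],
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, ports in LAB_PORTS.items():
        for port in ports:
            status = "up" if port_open("localhost", port) else "DOWN"
            print(f"{name:15} :{port:<5} {status}")
```

A successful TCP connect only shows that something is listening on the port; it does not prove the service behind it is healthy.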
3

103 - OpenTelemetry Fundamentals

Instrument a Node.js application with the OpenTelemetry SDK. You'll learn the OTel architecture, add auto-instrumentation for traces and metrics, configure exporters to send data to your OTel Collector, and see real telemetry flowing through your observability stack. Starting state: ~/observability-lab/ running from Lesson 102 (Prometheus, Grafana, OTel Collector). After this lesson: a demo Node.js app at ~/observability-lab/app/ sending metrics and traces to your stack.

Beginner • 25 minutes • 10 steps
Required
  • Lesson 102 complete (observability lab running)
  • Node.js 18+ installed
4

104 - Your First Metrics & Dashboard

Learn Prometheus metric types and PromQL, then build a Grafana dashboard from scratch. You'll create panels for request rate, error rate, and latency percentiles, and assemble a Four Golden Signals dashboard. Starting state: ~/observability-lab/ running with demo app from Lesson 103. After this lesson: a Four Golden Signals dashboard in Grafana, completing Module 1.

Beginner • 25 minutes • 11 steps
Required
  • Lesson 103 complete (instrumented app has generated metrics)
1

201 - PromQL Queries & Metric Types

Master the Prometheus Query Language from selectors and matchers to advanced aggregations. Learn the difference between instant and range vectors, use rate() and histogram_quantile(), and build production-ready dashboard queries. Starting state: ~/observability-lab/ running with Prometheus, Grafana, OTel Collector, and demo app from Module 1. After this lesson: you can write PromQL queries for any metric in your stack.

Intermediate • 30 minutes • 11 steps
Required
  • Monitoring Module 1 complete
  • Lab stack running (Prometheus, Grafana, demo app)
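
To give a feel for what `histogram_quantile()` in Lesson 201 computes, here is a simplified Python sketch of quantile estimation over cumulative histogram buckets using linear interpolation inside the matching bucket. It is an illustration under stated assumptions, not Prometheus's actual implementation: the function works on plain `(upper_bound, cumulative_count)` pairs, the sample bucket values are made up, and in real PromQL the input would normally be a `rate()` over `_bucket` series.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplifying assumptions: buckets are sorted by upper bound, the last
    bucket is +Inf, counts are cumulative, and 0 < q <= 1.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or buckets[-1][1] == 0:
        return math.nan  # not enough data to estimate anything

    rank = q * buckets[-1][1]  # position of the quantile among all observations
    prev_bound, prev_count = 0.0, 0.0
    for i, (bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(bound):
                # Quantile lands in the +Inf bucket: report the highest finite bound.
                return buckets[i - 1][0]
            # Linear interpolation between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


if __name__ == "__main__":
    # Made-up latency buckets in seconds: (le, cumulative count).
    sample = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
    print(round(histogram_quantile(0.95, sample), 3))  # about 0.778
    print(histogram_quantile(0.5, sample))             # 0.1
```

Because the result is interpolated, its accuracy depends on how well the bucket boundaries match the real latency distribution.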
2

202 - Instrumentation Patterns & Custom Metrics

Add custom application metrics to your existing demo app using the OpenTelemetry SDK. You'll create counters, gauges, and histograms for business logic, verify them in Prometheus, and build Grafana dashboards. Starting state: ~/observability-lab/ running from Module 1 (Prometheus, Grafana, OTel Collector, demo app code at ~/observability-lab/app/). After this lesson: demo app enhanced with custom metrics (app_http_requests_total, app_orders_processed_total, app_active_connections, app_http_request_duration_seconds) and new endpoints (/order, /slow, /error).

Intermediate • 30 minutes • 10 steps
Required
  • Module 1 complete (Lessons 101-104)
  • Lab stack running: cd ~/observability-lab && docker compose up -d
3

203 - Exporters, Blackbox Monitoring & Recording Rules

Deploy Node Exporter for host metrics and Blackbox Exporter for synthetic endpoint monitoring. Write Prometheus recording rules to pre-compute expensive queries for fast dashboards and reliable alerting.

Intermediate • 30 minutes • 10 steps
Required
  • Lesson 202 complete
  • Lab stack running (Prometheus, Grafana, demo app)
4

204 - Alerting with Alertmanager

Set up Prometheus Alertmanager for production alerting. Write alerting rules based on symptoms, configure routing and receivers, understand grouping, inhibition, and silencing, and learn best practices to avoid alert fatigue.

Intermediate • 30 minutes • 10 steps
Required
  • Lesson 203 complete
  • Lab stack running (Prometheus, Grafana, Node Exporter, Blackbox Exporter)
1

301 - Structured Logging for Observability

Learn why structured JSON logs are essential for observability, how to include trace context for correlation, and how to configure logging levels, context propagation, and the OpenTelemetry log bridge API.

Intermediate • 30 minutes • 10 steps
Required
  • Monitoring Module 2 complete
  • Node.js installed locally
  • Observability lab running (docker compose up)
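
As a rough preview of the idea behind Lesson 301, the sketch below emits one JSON object per log line and attaches trace context so a log entry can later be matched to a trace. It is written in Python with only the standard library for illustration (the course's demo app is Node.js). The `JsonFormatter` and `make_logger` names, the field layout, and the randomly generated IDs are assumptions; in an instrumented app the IDs would come from the active OpenTelemetry span rather than from `secrets`.

```python
import json
import logging
import secrets
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical field layout)."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            # Trace context is optional; None when the caller did not supply it.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(entry)


def make_logger(name="demo-app"):
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Stand-in IDs with W3C Trace Context sizes: 16-byte trace ID, 8-byte span ID.
    trace_context = {
        "trace_id": secrets.token_hex(16),
        "span_id": secrets.token_hex(8),
    }
    make_logger().info("order processed", extra=trace_context)
```

Keeping `trace_id` as a top-level field is what makes it easy for a log backend to extract it and link the entry to the matching trace.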
2

302 - Centralized Logging with Grafana Loki

Deploy Grafana Loki for centralized log aggregation, configure the OTel Collector to export logs to Loki, learn the LogQL query language, and build log exploration dashboards in Grafana.

Intermediate • 30 minutes • 10 steps
Required
  • Lesson 301 complete
  • Docker and docker compose installed
  • Observability lab running (docker compose up)
3

303 - Distributed Tracing with Grafana Tempo

Deploy Grafana Tempo to collect and query distributed traces, configure the OTel Collector to export traces via OTLP, learn to read span waterfalls, apply sampling strategies, and troubleshoot slow requests using trace data.

Intermediate • 30 minutes • 10 steps
Required
  • Lesson 302 complete
  • Docker and docker compose installed
  • Observability lab running (docker compose up)
4

304 - Signal Correlation: Metrics, Logs & Traces

Master cross-signal correlation in Grafana: link metrics to traces via exemplars, navigate from logs to traces via trace_id, and debug incidents using all three observability pillars together.

Advanced • 35 minutes • 10 steps
Required
  • Lesson 303 complete
  • Prometheus, Loki, Tempo, and Grafana running
  • OTel Collector configured for all three signals
1

401 - SLIs, SLOs & Error Budgets

Learn to define Service Level Indicators (SLIs), set Service Level Objectives (SLOs), and manage error budgets to make data-driven reliability decisions.

Advanced • 25 minutes • 10 steps
Required
  • Monitoring Module 3 complete
  • Familiarity with Prometheus and PromQL
  • Understanding of metrics, logs, and traces
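
The arithmetic behind an error budget in Lesson 401 is small enough to sketch directly. The example below assumes a request-based availability SLI (good requests divided by total requests); the function names and the sample numbers are hypothetical and chosen only to make the calculation concrete.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Summarize error-budget usage for a request-based availability SLO.

    slo_target is a fraction, e.g. 0.999 for "99.9% of requests succeed".
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    sli = 1 - failed_requests / total_requests          # measured success ratio
    allowed_failures = (1 - slo_target) * total_requests  # the whole budget
    budget_remaining = 1 - failed_requests / allowed_failures
    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_remaining": budget_remaining,  # negative once the budget is blown
    }


def allowed_downtime_minutes(slo_target, window_days=30):
    """Time-based view of the same budget: minutes of full outage the SLO tolerates."""
    return (1 - slo_target) * window_days * 24 * 60


if __name__ == "__main__":
    report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    print(round(report["sli"], 4))               # 0.9996
    print(round(report["allowed_failures"]))     # 1000
    print(round(report["budget_remaining"], 2))  # 0.6
    print(round(allowed_downtime_minutes(0.999), 1))  # 43.2
```

In this made-up case, 400 failures against a budget of 1,000 leaves 60% of the budget, which is the kind of number a team can use to decide whether to ship risky changes or prioritize reliability work.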
2

402 - Dashboard Design & Alert Quality

Learn dashboard design principles using Google's Golden Signals and Brendan Gregg's USE method layouts, choose the right visualizations, measure alert quality, and provision dashboards as code.

Advanced • 25 minutes • 10 steps
Required
  • Lesson 401 complete
  • Familiarity with Grafana basics
  • Understanding of SLIs and SLOs
3

403 - Cardinality, Sampling & Retention

Learn to manage observability costs and performance by controlling metric cardinality, configuring trace and log sampling, setting retention policies, and planning capacity for your monitoring infrastructure.

Advanced • 25 minutes • 10 steps
Required
  • Lesson 402 complete
  • Experience with Prometheus and Grafana
  • Understanding of OTel Collector basics
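
Metric cardinality, the first topic in Lesson 403, grows multiplicatively: in the worst case, one metric produces a time series for every combination of its label values. The sketch below shows that upper bound; the `max_series` name, the label names, and the counts are made up for illustration.

```python
import math


def max_series(label_value_counts):
    """Upper bound on time series for one metric.

    label_value_counts maps each label name to its number of distinct values.
    A metric with no labels yields exactly one series.
    """
    return math.prod(label_value_counts.values())


if __name__ == "__main__":
    labels = {"method": 5, "status_code": 10, "route": 20}
    print(max_series(labels))  # 1000

    # Adding one unbounded label (for example a per-user ID) multiplies everything.
    labels["user_id"] = 10_000
    print(max_series(labels))  # 10000000
```

Real series counts are usually lower than this bound because not every combination occurs, but the bound explains why a single high-cardinality label can dominate storage and query cost.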
4

404 - Incident Response with Observability

Learn to use observability tools for rapid incident detection, triage, investigation, and resolution. Build runbooks, practice structured postmortems, and foster a blameless culture.

Advanced • 25 minutes • 10 steps
Required
  • Lesson 403 complete
  • Experience with Grafana dashboards and Prometheus alerts
  • Understanding of metrics, logs, and traces correlation
5

405 - Observability as Code

Learn to version-control all monitoring configuration (Grafana dashboards, Prometheus rules, Alertmanager routing, and OTel Collector pipelines) using GitOps workflows and Terraform.

Advanced • 25 minutes • 10 steps
Required
  • Lesson 404 complete
  • Familiarity with Git version control
  • Understanding of Prometheus, Grafana, and OTel Collector configuration

Ready to Master Observability?

Work through these lessons at your own pace. Each step includes commands, explanations, and expected outcomes.