Observability in Modern DevOps: Logs, Metrics, and Traces

The Three Pillars

Observability is the ability to understand a system's internal state from the outputs it emits. The three pillars are logs, metrics, and traces, and each provides a different lens on your application's behavior.

Structured Logging

Plain text logs are hard to query. Use structured JSON logging:

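// Assumes `logger` is a structured logger and `start` holds Date.now() from when the request began.
logger.info({
  event: "user_login",
  userId: user.id,
  duration_ms: Date.now() - start,
  ip: req.ip
});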

This makes logs searchable and alertable in tools like Loki, Elasticsearch, or Cloud Logging.
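
To make the idea concrete, here is a minimal sketch of a logger that writes one JSON object per line. The names formatLogLine and createLogger are hypothetical helpers for illustration, not part of any real library, and the field values are made up. In practice you would probably use an established logging library.

```javascript
// Build one log line as a JSON string.
// Core keys go first; caller-supplied fields are merged after them.
function formatLogLine(level, fields, now = new Date()) {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    ...fields,
  });
}

// `write` is injectable so the output can go to stdout, a file, or a test buffer.
function createLogger(write = (line) => process.stdout.write(line + "\n")) {
  return {
    info: (fields) => write(formatLogLine("info", fields)),
    error: (fields) => write(formatLogLine("error", fields)),
  };
}

// Usage with illustrative values.
const exampleLogger = createLogger();
exampleLogger.info({
  event: "user_login",
  userId: 42,
  duration_ms: 17,
  ip: "203.0.113.7",
});
```

Because every line is a self-contained JSON object, a log backend can index fields such as event or userId without any custom parsing rules.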

Metrics with Prometheus

Track both business and technical metrics: request rates, error rates, and latency percentiles (p50, p95, p99). Expose them on a /metrics endpoint and configure Prometheus to scrape it.
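
As a rough illustration of what a /metrics endpoint returns, the sketch below keeps a counter in memory and renders it in the Prometheus text exposition format. The Counter class and the metric name http_requests_total are assumptions made for this example. A real service would normally rely on a Prometheus client library rather than hand-rolled code, and this sketch does not escape special characters in label values.

```javascript
// Hypothetical in-memory counter, for illustration only.
class Counter {
  constructor(name, help) {
    this.name = name;
    this.help = help;
    this.values = new Map(); // serialized label set -> running count
  }

  inc(labels = {}, amount = 1) {
    // Sort label names so the same labels always map to the same series.
    const key = Object.keys(labels)
      .sort()
      .map((k) => `${k}="${labels[k]}"`)
      .join(",");
    this.values.set(key, (this.values.get(key) || 0) + amount);
  }

  // Render HELP and TYPE lines followed by one sample line per label set.
  render() {
    const lines = [
      `# HELP ${this.name} ${this.help}`,
      `# TYPE ${this.name} counter`,
    ];
    for (const [key, value] of this.values) {
      lines.push(key ? `${this.name}{${key}} ${value}` : `${this.name} ${value}`);
    }
    return lines.join("\n") + "\n";
  }
}

const requests = new Counter("http_requests_total", "Total HTTP requests handled.");
requests.inc({ method: "GET", status: "200" });
requests.inc({ method: "GET", status: "200" });
requests.inc({ method: "POST", status: "500" });
console.log(requests.render());
```

An HTTP handler for /metrics would return the rendered string as plain text, and Prometheus would derive request and error rates from the counter at query time.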

Distributed Tracing

In a microservices architecture, a single request can touch many services. Distributed tracing, with tools such as OpenTelemetry and Jaeger, shows the request's full journey and where time is spent.
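
The mechanism that ties spans from different services together is context propagation: each service forwards a trace ID with its outgoing calls. The sketch below shows the idea using the W3C Trace Context traceparent header format (version, trace ID, parent span ID, flags). The functions startTrace, toHeader, and continueTrace are hypothetical names for this example. OpenTelemetry SDKs handle propagation for you, so treat this only as an illustration of what happens underneath.

```javascript
const { randomBytes } = require("node:crypto");

// traceparent layout: 2-hex version, 32-hex trace ID, 16-hex span ID, 2-hex flags.
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Start a new trace at the edge of the system (no incoming context).
function startTrace() {
  return {
    traceId: randomBytes(16).toString("hex"),
    spanId: randomBytes(8).toString("hex"),
    flags: "01", // "sampled"
  };
}

// Serialize the current context for an outgoing request header.
function toHeader(ctx) {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.flags}`;
}

// In a downstream service: keep the caller's trace ID, create a new span ID,
// and remember the caller's span as the parent. Fall back to a new trace
// when the header is missing or malformed.
function continueTrace(header) {
  const match = TRACEPARENT_RE.exec(header || "");
  if (!match) return startTrace();
  return {
    traceId: match[1],
    parentSpanId: match[2],
    spanId: randomBytes(8).toString("hex"),
    flags: match[3],
  };
}

// Usage: service A starts the trace, service B continues it.
const rootSpan = startTrace();
const downstreamSpan = continueTrace(toHeader(rootSpan));
console.log(downstreamSpan.traceId === rootSpan.traceId); // true
```

Because every span carries the same trace ID plus a pointer to its parent, a tracing backend can reassemble the spans into a single tree for the request.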

Alerting Strategy

Alert on symptoms, not causes. "Error rate > 1%" is a good alert because it reflects what users actually experience. "CPU > 80%" is not: high CPU usage is not necessarily a problem for users.
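
A symptom-based rule like the error-rate example can be sketched in a few lines. The functions errorRate and shouldAlert, and the shape of the window object, are assumptions for illustration. In a real setup this logic would usually live in your alerting system's rule configuration rather than in application code.

```javascript
// window: { total: number of requests, errors: number of failed requests }
function errorRate(window) {
  if (window.total === 0) return 0; // no traffic, so nothing to alert on
  return window.errors / window.total;
}

// Fire only when the user-facing error rate exceeds the threshold (1% by default).
function shouldAlert(window, threshold = 0.01) {
  return errorRate(window) > threshold;
}

console.log(shouldAlert({ total: 10000, errors: 150 })); // true  (1.5% of requests failed)
console.log(shouldAlert({ total: 10000, errors: 50 }));  // false (0.5% of requests failed)
```

Note that CPU usage does not appear anywhere in the rule: it may help explain an alert after the fact, but it does not decide whether anyone gets paged.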

Golden Signals

Google's Site Reliability Engineering book recommends monitoring four golden signals: latency, traffic, errors, and saturation. Start with these before adding more complexity.
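
As a rough sketch, all four signals can be derived from a window of request records plus one capacity measure. Everything here is an assumption for illustration: the record shape ({ durationMs, status }), counting only 5xx responses as errors, and using in-flight requests divided by capacity as the saturation measure. The percentile uses the simple nearest-rank method.

```javascript
// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// requests: array of { durationMs, status } observed during the window.
// inFlight and capacity are hypothetical stand-ins for a saturation measure.
function goldenSignals(requests, windowSeconds, inFlight, capacity) {
  const errors = requests.filter((r) => r.status >= 500).length;
  return {
    latencyP99Ms: percentile(requests.map((r) => r.durationMs), 99),
    trafficRps: requests.length / windowSeconds,
    errorRate: requests.length ? errors / requests.length : 0,
    saturation: inFlight / capacity,
  };
}

// Usage with made-up sample data: four requests seen over a 2-second window.
const sample = [
  { durationMs: 10, status: 200 },
  { durationMs: 20, status: 200 },
  { durationMs: 30, status: 200 },
  { durationMs: 400, status: 500 },
];
console.log(goldenSignals(sample, 2, 5, 10));
```

Which resource best represents saturation depends on the service. Queue depth, connection pool usage, or memory may be a better fit than in-flight requests.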