What is observability?
Observability is the ability to understand a system’s internal state from its external outputs. It is commonly described in terms of three pillars: metrics (numerical data), logs (event records), and traces (request paths).
What are metrics in monitoring?
Metrics are numerical measurements collected over time, such as CPU usage, request rate, and error rate. They enable alerting, trend analysis, and capacity planning.
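As a minimal sketch, assume two monotonically increasing counters (total requests and total errors) are sampled 60 seconds apart. Request rate and error rate can then be derived from the difference between the two samples. The `Sample` type, the `rates` helper, and the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of two monotonically increasing counters at time t (seconds)."""
    t: float
    requests_total: int
    errors_total: int


def rates(prev: Sample, curr: Sample) -> tuple:
    """Return (requests per second, errors as a fraction of requests) between two samples.

    Assumes neither counter was reset between the samples.
    """
    dt = curr.t - prev.t
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    d_req = curr.requests_total - prev.requests_total
    d_err = curr.errors_total - prev.errors_total
    request_rate = d_req / dt
    error_rate = d_err / d_req if d_req else 0.0
    return request_rate, error_rate


a = Sample(t=0.0, requests_total=1_000, errors_total=10)
b = Sample(t=60.0, requests_total=7_000, errors_total=40)
print(rates(a, b))  # (100.0, 0.005)
```

Storing cumulative counters and computing rates at query time is roughly the idea behind functions such as Prometheus's `rate()`, which additionally handles counter resets.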
What is logging?
Logging records events and state changes in the system. Logs help debug issues, audit activity, and understand system behavior. Centralized logging aggregates logs from multiple sources.
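Structured logs, where each event is a machine-parseable record rather than free text, are easier to search once they are centralized. The following is a minimal sketch using Python's standard `logging` module that emits one JSON object per line; passing extra fields under a `fields` key in `extra` is a convention assumed for this example, not part of the library.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Structured fields passed as extra={"fields": {...}} (assumed convention).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def make_logger(name: str, stream=sys.stdout) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]   # replace any previously attached handlers
    logger.propagate = False      # avoid duplicate output via the root logger
    return logger


log = make_logger("checkout")
log.info("order placed", extra={"fields": {"order_id": "A123", "amount_cents": 4999}})
log.warning("payment retry", extra={"fields": {"order_id": "A123", "attempt": 2}})
```

Because every line is valid JSON, a log aggregator can index fields like `order_id` directly instead of parsing free-form text.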
What is distributed tracing?
Distributed tracing tracks requests across multiple services, showing the complete path and timing. Tools like Jaeger or Zipkin help identify bottlenecks in microservices.
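Tracing client APIs differ between tools, so instead of guessing at the Jaeger or Zipkin libraries, the sketch below hand-rolls the core idea. Every span records a shared trace ID, its own span ID, its parent's span ID, and start and end times. The `span` helper, the in-memory `FINISHED` list (standing in for an exporter), and the single-threaded span stack are assumptions of this sketch. `traceparent` formats the context in the W3C Trace Context header layout, which is one way the IDs can travel to the next service.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in one request
    span_id: str              # unique to this unit of work
    parent_id: Optional[str]  # None for the root span
    name: str
    start: float
    end: float = 0.0

    @property
    def duration_ms(self) -> float:
        return (self.end - self.start) * 1000


FINISHED: List[Span] = []  # stands in for an exporter to a tracing backend
_stack: List[Span] = []    # currently open spans (single-threaded sketch)


@contextmanager
def span(name: str):
    parent = _stack[-1] if _stack else None
    s = Span(
        trace_id=parent.trace_id if parent else uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        parent_id=parent.span_id if parent else None,
        name=name,
        start=time.perf_counter(),
    )
    _stack.append(s)
    try:
        yield s
    finally:
        s.end = time.perf_counter()
        _stack.pop()
        FINISHED.append(s)


def traceparent(s: Span) -> dict:
    """Context for a downstream call (W3C Trace Context layout, sampled flag set)."""
    return {"traceparent": f"00-{s.trace_id}-{s.span_id}-01"}


with span("GET /checkout") as root:
    with span("auth.verify"):
        time.sleep(0.01)
    with span("db.query"):  # the slower child: the bottleneck in this trace
        time.sleep(0.02)

for s in FINISHED:
    print(f"{s.name:<15} parent={s.parent_id} {s.duration_ms:6.1f} ms")
```

Children finish before their parent, so the root span is recorded last; comparing child durations against the root shows where the request spent its time.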
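What are SLIs and SLOs?
An SLI (service level indicator) is a measured metric of service quality, such as the fraction of requests that succeed or complete within a latency bound. An SLO (service level objective) is a target value for an SLI over a time window, for example 99.9% of requests succeeding over 30 days.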
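A minimal worked example, assuming an availability SLI (good requests divided by total requests) and an SLO of 99.9% over one window; the request counts are invented. The error budget here is the number of failures the SLO tolerates, (1 - SLO) × total, and the function reports how much of it is still unspent.

```python
def availability_sli(good: int, total: int) -> float:
    """SLI: fraction of requests that were 'good' (e.g. not 5xx and under a latency bound)."""
    return good / total if total else 1.0


def error_budget_remaining(good: int, total: int, slo: float) -> float:
    """Fraction of the error budget still unspent; a negative value means the SLO is breached."""
    if not 0 < slo < 1:
        raise ValueError("slo must be between 0 and 1, exclusive")
    if total == 0:
        return 1.0
    allowed_bad = (1 - slo) * total  # failures the SLO tolerates in this window
    actual_bad = total - good
    return 1 - actual_bad / allowed_bad


# One 30-day window: 1,000,000 requests, 999,400 of them good; SLO = 99.9%.
sli = availability_sli(999_400, 1_000_000)
left = error_budget_remaining(999_400, 1_000_000, slo=0.999)
print(f"SLI = {sli:.2%}, error budget left = {left:.0%}")
```

With these numbers the SLO allows 1,000 failed requests; 600 occurred, so about 40% of the budget remains.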
What is alerting?
Alerting automatically notifies teams when metrics exceed thresholds or anomalies occur. Good alerts are actionable, low-noise, and prioritized by severity.
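One common way to keep alerts low-noise is to fire only after a threshold has been breached for several consecutive evaluations, similar in spirit to the `for` duration in Prometheus alerting rules. The sketch below is illustrative only: `AlertRule`, its fields, the severity labels, and the sample error rates are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AlertRule:
    name: str
    threshold: float  # fire when the metric is above this value...
    for_evals: int    # ...for this many consecutive evaluations
    severity: str     # e.g. "page" (wake someone up) or "ticket" (next business day)
    _breaches: int = field(default=0, init=False, repr=False)

    def evaluate(self, value: float) -> str:
        """Return 'ok', 'pending' (breaching, but not for long enough) or 'firing'."""
        self._breaches = self._breaches + 1 if value > self.threshold else 0
        if self._breaches >= self.for_evals:
            return "firing"
        return "pending" if self._breaches else "ok"


rule = AlertRule("HighErrorRate", threshold=0.05, for_evals=3, severity="page")
error_rates = [0.01, 0.08, 0.02, 0.07, 0.09, 0.11, 0.03]  # one value per evaluation
states = [rule.evaluate(v) for v in error_rates]

for value, state in zip(error_rates, states):
    if state == "firing":
        print(f"[{rule.severity}] {rule.name}: error rate {value:.0%}")
```

The isolated spike at 0.08 only reaches "pending" and never notifies anyone; the sustained run near the end does.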
What is the difference between push and pull monitoring?
Push: services actively send metrics to the monitoring system. Pull: the monitoring system scrapes metrics from the services. Prometheus uses pull; StatsD uses push.
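The two models can be contrasted in one small, self-contained script. For push, the service sends a StatsD-style counter line (`name:value|c`) over UDP to a collector whenever it chooses; for pull, it serves its current values at `/metrics` in a Prometheus-style plain-text format and waits for the collector to fetch them. The metric names, the loopback addresses, and the stripped-down output (no `# HELP` or `# TYPE` lines) are simplifications for illustration.

```python
import socket
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

COUNTERS = {"http_requests_total": 0}  # hypothetical in-process metric


def push_statsd(name: str, value: int, addr: tuple) -> None:
    """Push model: the service sends each update to the collector (StatsD counter line)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(f"{name}:{value}|c".encode(), addr)


class MetricsHandler(BaseHTTPRequestHandler):
    """Pull model: the service only exposes current values; the collector scrapes them."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = "".join(f"{k} {v}\n" for k, v in COUNTERS.items()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging for the demo
        pass


def scrape(url: str) -> str:
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.read().decode()


# Push: a UDP "collector" receives whatever the service sends, whenever it sends it.
collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
collector.bind(("127.0.0.1", 0))
collector.settimeout(2)
push_statsd("http_requests", 1, collector.getsockname())
pushed, _ = collector.recvfrom(1024)
collector.close()

# Pull: the collector decides when to fetch /metrics from the service.
COUNTERS["http_requests_total"] += 1
server = ThreadingHTTPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
scraped = scrape(f"http://127.0.0.1:{server.server_address[1]}/metrics")
server.shutdown()
server.server_close()

print(pushed.decode())  # http_requests:1|c
print(scraped, end="")  # http_requests_total 1
```

The practical difference is who holds the schedule and the address list: in push, each service must know where the collector is; in pull, the collector must know where every service is.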
What is a health check endpoint?
A health check endpoint (such as /health) returns the service’s current status, often as an HTTP status code such as 200 (healthy) or 503 (unhealthy). Load balancers and orchestrators use it to decide whether a service is ready to receive traffic.
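A minimal sketch of a `/health` endpoint using only the Python standard library. The dependency checks in `CHECKS` are hypothetical placeholders, and returning 200 for healthy and 503 for unhealthy is a common convention assumed here rather than a universal rule. `probe` plays the role of a load balancer that only looks at the status code.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical dependency checks; a real service might ping its database or cache.
CHECKS = {"database": lambda: True, "cache": lambda: True}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        results = {name: bool(check()) for name, check in CHECKS.items()}
        healthy = all(results.values())
        body = json.dumps({"status": "ok" if healthy else "unavailable", "checks": results}).encode()
        # 200 tells the load balancer to route traffic here; 503 tells it not to.
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging for the demo
        pass


def probe(url: str) -> int:
    """What a load balancer does: GET the endpoint and look at the status code."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # raised for 4xx/5xx responses
        return err.code


server = ThreadingHTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/health"

status_ok = probe(url)
CHECKS["database"] = lambda: False  # simulate a failed dependency
status_down = probe(url)

server.shutdown()
server.server_close()
print(status_ok, status_down)  # 200 503
```

Orchestrators often distinguish "is the process alive" from "is it ready for traffic"; this sketch only models the readiness-style check that the text describes.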
What is log aggregation?
Log aggregation collects logs from multiple sources into a central system for searching and analysis. Tools: ELK stack (Elasticsearch, Logstash, Kibana), Splunk, or Grafana Loki.
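Real aggregators index logs for full-text search at scale; the sketch below only illustrates the core idea with the standard library. It assumes each source emits lines as `<ISO timestamp> <LEVEL> <message>` and that each source's lines are already in time order, merges them into one stream with `heapq.merge`, and then filters the result. The sources, log lines, and `id=` field are invented.

```python
import heapq
from datetime import datetime

# Hypothetical raw lines from three services; each source is already in time order.
SOURCES = {
    "api": [
        "2024-05-01T10:00:01 INFO request accepted id=1",
        "2024-05-01T10:00:04 ERROR upstream timeout id=2",
    ],
    "worker": [
        "2024-05-01T10:00:02 INFO job started id=1",
        "2024-05-01T10:00:05 ERROR job failed id=2",
    ],
    "db": ["2024-05-01T10:00:03 WARN slow query 1200ms"],
}


def parse(source: str, line: str) -> dict:
    ts, level, msg = line.split(" ", 2)
    return {"ts": datetime.fromisoformat(ts), "source": source, "level": level, "msg": msg}


def aggregate(sources: dict) -> list:
    """Merge the per-source streams into one time-ordered list (the 'central store')."""
    streams = [[parse(src, line) for line in lines] for src, lines in sources.items()]
    return list(heapq.merge(*streams, key=lambda e: e["ts"]))


def search(entries: list, level=None, text=None) -> list:
    """Filter by exact level and/or substring, like a simple query in a log UI."""
    return [
        e for e in entries
        if (level is None or e["level"] == level) and (text is None or text in e["msg"])
    ]


central = aggregate(SOURCES)
# Correlate one request across services by searching for its id.
for e in search(central, text="id=2"):
    print(e["ts"].isoformat(), e["source"], e["level"], e["msg"])
```

Searching one central, time-ordered stream is what makes it possible to follow a single request across services without logging into each machine.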
What is anomaly detection in monitoring?
Anomaly detection uses statistical methods or machine learning to identify unusual patterns in metrics automatically, rather than relying only on static thresholds.
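As a minimal statistical example, a rolling z-score flags any point that lies more than a chosen number of standard deviations from the mean of the preceding window. The window size, the threshold of 3, and the latency series are arbitrary assumptions for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def zscore_anomalies(values, window=10, threshold=3.0):
    """Return (index, value) pairs more than `threshold` std devs from the trailing-window mean."""
    history = deque(maxlen=window)
    anomalies = []
    for i, v in enumerate(values):
        if len(history) == window:  # wait until the baseline window is full
            mu, sigma = mean(history), pstdev(history)
            if sigma > 0 and abs(v - mu) / sigma > threshold:
                anomalies.append((i, v))
        history.append(v)  # note: anomalies also enter the baseline
    return anomalies


# Hypothetical request latencies in milliseconds, one per minute.
latencies_ms = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98, 250, 101, 99]
print(zscore_anomalies(latencies_ms))  # [(10, 250)]
```

A static alert at, say, 500 ms would miss the jump to 250 ms, while the z-score flags it because it is far outside the recent spread. Keeping the anomalous value in the window temporarily widens the baseline, which is one of the trade-offs a real detector would need to tune.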