Monitoring and Operations Flashcards

Question 1

Q

What is observability?

Answer

A

Observability is the ability to understand a system’s internal state from external outputs. It includes three pillars: metrics (numerical data), logs (event records), and traces (request paths).

Question 2

Q

What are metrics in monitoring?

Answer

A

Metrics are numerical measurements collected over time, such as CPU usage, request rate, and error rate. They enable alerting, trend analysis, and capacity planning.
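
A minimal Python sketch of how a request rate and an error rate are derived from cumulative counters, assuming two samples taken 60 seconds apart. The counter names `requests_total` and `errors_total` and the sample values are hypothetical, and counter resets (for example after a process restart) are not handled.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of two cumulative (monotonically increasing) counters."""
    timestamp: float     # seconds
    requests_total: int  # hypothetical counter: requests served so far
    errors_total: int    # hypothetical counter: failed requests so far


def request_rate(earlier: Sample, later: Sample) -> float:
    """Requests per second between two samples."""
    elapsed = later.timestamp - earlier.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (later.requests_total - earlier.requests_total) / elapsed


def error_rate(earlier: Sample, later: Sample) -> float:
    """Fraction of requests in the interval that failed (0.0 when there was no traffic)."""
    requests = later.requests_total - earlier.requests_total
    errors = later.errors_total - earlier.errors_total
    return errors / requests if requests else 0.0


if __name__ == "__main__":
    t0 = Sample(timestamp=0.0, requests_total=1000, errors_total=10)
    t1 = Sample(timestamp=60.0, requests_total=1600, errors_total=16)
    print(f"request rate: {request_rate(t0, t1):.1f} req/s")  # 600 requests / 60 s
    print(f"error rate: {error_rate(t0, t1):.2%}")            # 6 errors / 600 requests
```

Storing cumulative counters and computing rates at query time means a missed sample only lowers resolution; it does not lose counts.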

Question 3

Q

What is logging?

Answer

A

Logging records events and state changes in the system. Logs help debug issues, audit activity, and understand system behavior. Centralized logging aggregates logs from multiple sources.
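
A minimal sketch of structured logging with Python's standard `logging` module, assuming JSON-lines output (one JSON object per event) is what the downstream log system expects. The `JsonFormatter` class, the `context` field, and the logger name are illustrative choices, not a standard.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed as extra={"context": {...}} become an attribute on the record.
        context = getattr(record, "context", None)
        if isinstance(context, dict):
            entry.update(context)
        return json.dumps(entry)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Create a logger that writes JSON lines to `stream` (sys.stderr when None)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace, so calling twice does not duplicate output
    logger.propagate = False     # keep records out of the root logger's handlers
    return logger


if __name__ == "__main__":
    log = build_logger("checkout")
    log.info("order placed", extra={"context": {"order_id": "A-1001", "amount_cents": 4599}})
    log.warning("payment retry", extra={"context": {"order_id": "A-1001", "attempt": 2}})
```

Emitting fields such as `order_id` as keys rather than embedding them in the message text lets a centralized log system filter on them directly.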

Question 4

Q

What is distributed tracing?

Answer

A

Distributed tracing tracks requests across multiple services, showing the complete path and timing. Tools like Jaeger or Zipkin help identify bottlenecks in microservices.
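
A simplified, hypothetical model of trace data, not the Jaeger, Zipkin, or OpenTelemetry API: each span records its trace, its parent span, and its start and end times. Rebuilding the parent/child tree shows the request path, and the span with the largest self time (its duration minus its children's) is a reasonable first candidate for a bottleneck. The sketch assumes a span's direct children do not overlap in time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """One timed operation within a trace (simplified, hypothetical model)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    operation: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def print_trace(spans: List[Span]) -> None:
    """Print spans as an indented tree: the request path with per-hop timing."""
    children: Dict[Optional[str], List[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            print(f"{'  ' * depth}{span.service}:{span.operation} {span.duration_ms:.0f} ms")
            walk(span.span_id, depth + 1)

    walk(None, 0)


def self_time_ms(span: Span, spans: List[Span]) -> float:
    """Duration minus time spent in direct children (assumes children do not overlap)."""
    return span.duration_ms - sum(s.duration_ms for s in spans if s.parent_id == span.span_id)


def bottleneck(spans: List[Span]) -> Span:
    """The span with the most self time is a reasonable first place to look."""
    return max(spans, key=lambda s: self_time_ms(s, spans))


if __name__ == "__main__":
    spans = [  # one hypothetical checkout request crossing four services
        Span("t1", "s1", None, "gateway", "GET /checkout", 0, 250),
        Span("t1", "s2", "s1", "orders", "create_order", 10, 240),
        Span("t1", "s3", "s2", "inventory", "reserve", 15, 45),
        Span("t1", "s4", "s2", "payments", "charge", 50, 230),
    ]
    print_trace(spans)
    slow = bottleneck(spans)
    print(f"bottleneck: {slow.service}:{slow.operation} ({self_time_ms(slow, spans):.0f} ms self time)")
```

Run as a script, it prints the four spans as a tree (250, 230, 30, and 180 ms) and reports payments:charge, with 180 ms of self time, as the bottleneck.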

Question 5

Q

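What are SLIs?

Answer

A

SLIs (Service Level Indicators) are quantitative measures of a service’s behavior as its users experience it, such as request latency, error rate, or availability. They are the measurements that SLOs (Service Level Objectives) set targets for.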

Question 6

Q

What is alerting?

Answer

A

Alerting automatically notifies teams when metrics exceed thresholds or anomalies occur. Good alerts are actionable, low-noise, and prioritized by severity.
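
A minimal sketch of two common noise-reduction techniques for threshold alerts: the condition must hold for several consecutive evaluations before the alert fires (similar in spirit to the `for` duration in Prometheus alerting rules), and a notification is sent only when the alert changes state. The class name, fields, and message format are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThresholdAlert:
    """Threshold alert that must be breached for several consecutive evaluations
    before it fires, and that notifies only when its state changes."""
    name: str
    threshold: float
    for_evaluations: int        # consecutive breaches required before firing
    severity: str = "warning"
    firing: bool = field(default=False, init=False)
    _breaches: int = field(default=0, init=False)

    def evaluate(self, value: float) -> Optional[str]:
        """Feed one new metric value; return a notification on a state change, else None."""
        self._breaches = self._breaches + 1 if value > self.threshold else 0
        should_fire = self._breaches >= self.for_evaluations

        if should_fire and not self.firing:
            self.firing = True
            return f"[{self.severity.upper()}] {self.name} FIRING: {value} > {self.threshold}"
        if self.firing and not should_fire:
            self.firing = False
            return f"[{self.severity.upper()}] {self.name} RESOLVED: {value}"
        return None  # no change: stay quiet even while the alert is still firing


if __name__ == "__main__":
    alert = ThresholdAlert("HighErrorRate", threshold=0.05, for_evaluations=3, severity="critical")
    for error_rate in [0.02, 0.30, 0.01, 0.08, 0.09, 0.07, 0.10, 0.03]:
        message = alert.evaluate(error_rate)
        if message:
            print(message)
```

With `for_evaluations=3`, the single spike to 0.30 never notifies anyone, while the sustained breach that follows produces exactly one FIRING message and, later, one RESOLVED message.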

Question 7

Q

What is the difference between push and pull monitoring?

Answer

A

Push: services actively send metrics to the monitoring system. Pull: the monitoring system scrapes metrics from the services. Prometheus uses pull; StatsD uses push.
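
A minimal standard-library sketch of both models. The push side formats a StatsD-style `name:value|type` line and sends it over UDP (8125 is the conventional StatsD port); the pull side exposes current counter values over HTTP at `/metrics` in a simplified Prometheus-style text format, and `scrape` plays the role of the monitoring server. Metric names are hypothetical, and a real service would normally use a client library rather than hand-rolling either format.

```python
import socket
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


# --- Push: the service sends each measurement to the collector as it happens.
def statsd_line(name: str, value: float, kind: str = "c") -> bytes:
    """Format a StatsD-style line; kind 'c' = counter, 'g' = gauge, 'ms' = timer."""
    return f"{name}:{value}|{kind}".encode("ascii")


def push(name: str, value: float, kind: str = "c",
         host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send: the service does not wait for any reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(statsd_line(name, value, kind), (host, port))


# --- Pull: the service only exposes current values; the monitoring system fetches them.
COUNTERS = {"http_requests_total": 0}  # hypothetical in-process counter


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Simplified Prometheus-style text: a TYPE comment, then "name value".
        body = "".join(f"# TYPE {k} counter\n{k} {v}\n" for k, v in COUNTERS.items()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # keep the demo output quiet
        pass


def scrape(url: str) -> str:
    """One scrape, as the monitoring system would perform on every interval."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode()


if __name__ == "__main__":
    push("checkout.requests", 1)  # push: sent even if nothing is listening
    server = ThreadingHTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    COUNTERS["http_requests_total"] += 3
    print(scrape(f"http://127.0.0.1:{server.server_port}/metrics"))
    server.shutdown()
    server.server_close()
```

The design difference shows in who initiates: with push, the service must know where the collector is; with pull, the monitoring system must know where every service is, but it also learns immediately when a target stops answering.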

Question 8

Q

What is a health check endpoint?

Answer

A

A health check endpoint (like /health) returns the service’s status. Load balancers and orchestrators use it to determine if a service is ready to receive traffic.
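
A minimal health check endpoint using Python's standard `http.server`, assuming the service's health is the conjunction of a few dependency checks (the `database` and `cache` checks here are hypothetical placeholders). It returns 200 when every check passes and 503 otherwise, because many load balancers and orchestrators judge health by status code alone; Kubernetes HTTP probes, for example, treat any status from 200 to 399 as success.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, Dict

# Hypothetical dependency checks; a real service might ping its database or cache here.
CHECKS: Dict[str, Callable[[], bool]] = {
    "database": lambda: True,
    "cache": lambda: True,
}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        results = {}
        for name, check in CHECKS.items():
            try:
                results[name] = "ok" if check() else "failing"
            except Exception:  # a check that crashes counts as failing
                results[name] = "failing"
        healthy = all(state == "ok" for state in results.values())
        body = json.dumps({"status": "ok" if healthy else "unavailable",
                           "checks": results}).encode()
        # The status code is what callers act on: 200 = route traffic here, 503 = don't.
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # keep the demo output quiet
        pass


def probe(url: str) -> int:
    """Return the HTTP status code a load balancer or orchestrator would see."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # urlopen raises for 4xx/5xx responses
        return err.code


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/health"
    print(probe(url))                # 200 while every check passes
    CHECKS["cache"] = lambda: False  # simulate a failing dependency
    print(probe(url))                # 503
    server.shutdown()
    server.server_close()
```

The JSON body is for humans debugging an outage; automated callers typically act on the status code alone.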

Question 9

Q

What is log aggregation?

Answer

A

Log aggregation collects logs from multiple sources into a central system for searching and analysis. Tools: ELK stack (Elasticsearch, Logstash, Kibana), Splunk, or Grafana Loki.
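
A toy, in-memory illustration of what an aggregator does (not how Elasticsearch, Splunk, or Loki are implemented): tag each entry with its source, merge the per-source streams into one timeline, and search across all of them at once. It assumes JSON-lines input with a numeric `ts` field (epoch seconds) and that each source's lines are already in time order; the service names and log lines are hypothetical.

```python
import heapq
import json
from typing import Dict, Iterable, Iterator, List


def parse(source: str, lines: Iterable[str]) -> Iterator[dict]:
    """Parse one source's JSON log lines and tag each entry with its origin."""
    for line in lines:
        entry = json.loads(line)
        entry["source"] = source
        yield entry


def aggregate(sources: Dict[str, List[str]]) -> List[dict]:
    """Merge per-source streams (each already in time order) into one timeline."""
    streams = [parse(name, lines) for name, lines in sources.items()]
    return list(heapq.merge(*streams, key=lambda entry: entry["ts"]))


def search(entries: List[dict], **criteria) -> List[dict]:
    """Return entries whose fields equal every given value, e.g. level="ERROR"."""
    return [e for e in entries if all(e.get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    sources = {  # hypothetical services and log lines
        "web-1": ['{"ts": 1, "level": "INFO", "msg": "GET /cart"}',
                  '{"ts": 4, "level": "ERROR", "msg": "upstream timeout"}'],
        "web-2": ['{"ts": 2, "level": "INFO", "msg": "GET /home"}'],
        "payments": ['{"ts": 3, "level": "ERROR", "msg": "card declined"}'],
    }
    logs = aggregate(sources)
    for entry in search(logs, level="ERROR"):
        print(entry["ts"], entry["source"], entry["msg"])
```

A single ERROR search here surfaces problems from two different services in time order, which is the main payoff of aggregation over reading each machine's logs separately.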

Question 10

Q

What is anomaly detection in monitoring?

Answer

A

Anomaly detection uses statistical methods or machine learning to identify unusual patterns in metrics automatically, rather than relying only on static thresholds.
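
A minimal sketch of one simple statistical method, a rolling z-score: a point is flagged when it lies more than `threshold` standard deviations from the mean of the preceding `window` points. The window size, threshold, and sample latencies are illustrative assumptions; more robust variants exist, such as seasonal baselines or median-based statistics.

```python
import statistics
from collections import deque
from typing import Iterable, Iterator, Tuple


def zscore_anomalies(values: Iterable[float], window: int = 20,
                     threshold: float = 3.0) -> Iterator[Tuple[int, float, float]]:
    """Yield (index, value, z) for each point more than `threshold` standard
    deviations from the mean of the `window` points before it."""
    history = deque(maxlen=window)  # sliding baseline of recent values
    for i, value in enumerate(values):
        if len(history) == window:  # wait until the baseline is full
            mean = statistics.fmean(history)
            stdev = statistics.pstdev(history)
            if stdev > 0:  # a perfectly flat baseline gives no usable scale
                z = (value - mean) / stdev
                if abs(z) > threshold:
                    yield i, value, z
        history.append(value)


if __name__ == "__main__":
    latencies_ms = [100, 102, 98, 101, 99] * 4 + [100, 250, 101]  # illustrative data
    for index, value, z in zscore_anomalies(latencies_ms):
        print(f"point {index}: {value} ms (z = {z:.1f})")
```

Flagged points are still added to the history, so a spike temporarily widens the baseline; excluding flagged points is a reasonable alternative when anomalies should not influence what counts as normal.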

(10 cards)