Monitoring and Operations Flashcards

(10 cards)

1
Q

What is observability?

A

Observability is the ability to understand a system’s internal state from its external outputs. It rests on three pillars: metrics (numerical data), logs (event records), and traces (request paths).

2
Q

What are metrics in monitoring?

A

Metrics are numerical measurements collected over time, such as CPU usage, request rate, and error rate. They enable alerting, trend analysis, and capacity planning.
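
As a rough illustration of how a request count and an error rate can be derived from raw events, here is a minimal in-memory sketch. The `RequestMetrics` class and the metric names are hypothetical and not taken from any particular monitoring library; a real system would export these values to a time-series store.

```python
from collections import Counter


class RequestMetrics:
    """Hypothetical in-memory metrics store, for illustration only."""

    def __init__(self):
        self.counts = Counter()

    def record(self, status_code):
        # Count every request; count 5xx responses as errors.
        self.counts["requests_total"] += 1
        if status_code >= 500:
            self.counts["errors_total"] += 1

    def error_rate(self):
        # Fraction of requests that failed; 0.0 when nothing was recorded.
        total = self.counts["requests_total"]
        return self.counts["errors_total"] / total if total else 0.0


metrics = RequestMetrics()
for status in (200, 200, 500, 200):
    metrics.record(status)
print(metrics.error_rate())  # 0.25
```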

3
Q

What is logging?

A

Logging records events and state changes in the system. Logs help debug issues, audit activity, and understand system behavior. Centralized logging aggregates logs from multiple sources.
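
Centralized systems usually work best with structured log lines. The sketch below uses only Python's standard `logging` and `json` modules; the `JsonFormatter` class, the `build_logger` helper, and the field names are assumptions chosen for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def build_logger(name):
    # Attach a stream handler that emits JSON lines to stderr.
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


logger = build_logger("checkout")
logger.info("order placed: id=%s", 42)
```

One JSON object per line keeps each event easy for a downstream aggregator to parse and index.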

4
Q

What is distributed tracing?

A

Distributed tracing tracks requests across multiple services, showing the complete path and timing. Tools like Jaeger or Zipkin help identify bottlenecks in microservices.
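
To make the idea of finding a bottleneck concrete, the following toy model represents a trace as a list of spans and computes each span's self time (its duration minus time spent in direct children). The `Span` fields and helper names are hypothetical and do not reflect the Jaeger or Zipkin data models; the sketch also assumes child spans run sequentially, without overlap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service: str
    start_ms: int
    end_ms: int


def self_times(spans):
    """Duration of each span minus the durations of its direct children."""
    result = {s.span_id: s.end_ms - s.start_ms for s in spans}
    for s in spans:
        if s.parent_id in result:
            result[s.parent_id] -= s.end_ms - s.start_ms
    return result


def bottleneck(spans):
    # The span with the largest self time is where the request spent longest.
    times = self_times(spans)
    return max(spans, key=lambda s: times[s.span_id])


spans = [
    Span("t1", "a", None, "gateway", 0, 120),
    Span("t1", "b", "a", "auth", 5, 20),
    Span("t1", "c", "a", "orders-db", 25, 110),
]
print(bottleneck(spans).service)  # orders-db
```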

5
Q

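What are SLIs and SLOs?

A

An SLI (service level indicator) is a measured value that describes service behavior, such as the fraction of requests that succeed. An SLO (service level objective) is the target set for an SLI, such as 99.9% of requests succeeding over 30 days.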
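
A small worked example may help relate an SLI to an SLO. The sketch below computes an availability SLI and the share of the error budget still unspent; the function names and the sample numbers are illustrative assumptions, not values from any real service.

```python
def availability_sli(good_requests, total_requests):
    """SLI: fraction of requests that succeeded."""
    return good_requests / total_requests if total_requests else 1.0


def error_budget_remaining(sli, slo):
    """Share of the error budget (1 - SLO) that has not been spent yet."""
    budget = 1.0 - slo
    spent = 1.0 - sli
    return 1.0 - spent / budget


# Hypothetical numbers: 9,995 good requests out of 10,000 against a 99.9% SLO.
sli = availability_sli(9_995, 10_000)
remaining = error_budget_remaining(sli, slo=0.999)
print(f"SLI={sli:.4f}, error budget remaining={remaining:.0%}")
```

A negative result from `error_budget_remaining` would mean the SLO has been missed for the measured window.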

6
Q

What is alerting?

A

Alerting automatically notifies teams when metrics exceed thresholds or anomalies occur. Good alerts are actionable, low noise, and prioritized by severity.
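
One common way to keep alerts low noise is to require a threshold to be breached for several consecutive samples before firing. The following is a minimal sketch of that idea with two severity levels; the `evaluate` function, its parameters, and the thresholds are hypothetical.

```python
def evaluate(samples, warn, crit, min_consecutive=3):
    """Return a severity only if the most recent samples all breach a threshold.

    Requiring `min_consecutive` breaches suppresses one-off spikes.
    """
    recent = samples[-min_consecutive:]
    if len(recent) < min_consecutive:
        return None
    if all(value > crit for value in recent):
        return "critical"
    if all(value > warn for value in recent):
        return "warning"
    return None


# CPU utilization samples (0.0 to 1.0), newest last.
print(evaluate([0.50, 0.95, 0.60], warn=0.8, crit=0.9))  # None (single spike)
print(evaluate([0.85, 0.86, 0.95], warn=0.8, crit=0.9))  # warning
print(evaluate([0.91, 0.95, 0.99], warn=0.8, crit=0.9))  # critical
```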

7
Q

What is the difference between push and pull monitoring?

A

Push: services actively send metrics to the monitoring system. Pull: the monitoring system scrapes metrics from services. Prometheus uses pull; StatsD uses push.
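
The difference comes down to which side initiates the transfer. The in-process toy model below shows both directions; the `Collector` and `Service` classes and their method names are hypothetical and are not the Prometheus or StatsD APIs, which communicate over the network.

```python
class Collector:
    """Hypothetical monitoring system that stores (name, value) samples."""

    def __init__(self):
        self.samples = []

    def receive(self, name, value):
        # Push path: the service calls in with a new sample.
        self.samples.append((name, value))

    def scrape(self, target):
        # Pull path: the collector asks the service for its current values.
        for name, value in target.expose().items():
            self.samples.append((name, value))


class Service:
    def __init__(self):
        self.requests = 0

    def handle(self, collector=None):
        self.requests += 1
        if collector is not None:
            collector.receive("requests_total", self.requests)  # push

    def expose(self):
        # Current metric values, read by a collector on its own schedule.
        return {"requests_total": self.requests}


pushed, pulled = Collector(), Collector()

push_service = Service()
push_service.handle(pushed)
push_service.handle(pushed)

pull_service = Service()
pull_service.handle()
pull_service.handle()
pulled.scrape(pull_service)

print(pushed.samples)  # [('requests_total', 1), ('requests_total', 2)]
print(pulled.samples)  # [('requests_total', 2)]
```

In the push case the service emits a sample per event, while in the pull case the collector only sees the value present at scrape time.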

8
Q

What is a health check endpoint?

A

A health check endpoint (like /health) returns the service’s status. Load balancers and orchestrators use it to determine if a service is ready to receive traffic.
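
A minimal sketch of such an endpoint, using only Python's standard `http.server` module, might look like this. The `dependency_checks` probes, the response body shape, and the port are assumptions made for illustration; the convention of returning 503 when a dependency is down is a common choice rather than a requirement.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_status(checks):
    """Map dependency check results (name -> bool) to an HTTP status and body."""
    healthy = all(checks.values())
    body = {"status": "ok" if healthy else "unavailable", "checks": checks}
    return (200 if healthy else 503), body


def dependency_checks():
    # Hypothetical probes; replace with real database or cache pings.
    return {"database": True, "cache": True}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        code, body = health_status(dependency_checks())
        payload = json.dumps(body).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    # Blocks forever; call this from the service's entry point.
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Calling `serve()` and requesting `http://127.0.0.1:8080/health` would return the JSON body with status 200 while every check passes.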

9
Q

What is log aggregation?

A

Log aggregation collects logs from multiple sources into a central system for searching and analysis. Tools: ELK stack (Elasticsearch, Logstash, Kibana), Splunk, or Grafana Loki.
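
The core idea can be sketched without any of those tools: merge per-source streams into one time-ordered view, then search it. The example below is a toy stand-in, not how the ELK stack, Splunk, or Loki work internally; the tuple layout `(timestamp, source, message)` and the sample entries are assumptions, and each source is assumed to be sorted by timestamp already.

```python
import heapq


def aggregate(*sources):
    """Merge log streams, each already sorted by timestamp, into one list."""
    return list(heapq.merge(*sources, key=lambda entry: entry[0]))


def search(entries, term):
    """Return entries whose message contains the search term."""
    return [entry for entry in entries if term in entry[2]]


web = [(1, "web", "GET /cart 200"), (4, "web", "GET /pay 500")]
db = [(2, "db", "slow query 900ms"), (3, "db", "connection reset")]

central = aggregate(web, db)
print(search(central, "500"))  # [(4, 'web', 'GET /pay 500')]
```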

10
Q

What is anomaly detection in monitoring?

A

Anomaly detection uses statistical methods or machine learning to identify unusual patterns in metrics automatically, rather than relying only on static thresholds.
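
One of the simplest statistical methods is a z-score test: flag a value that lies more than a chosen number of standard deviations from the historical mean. The sketch below uses Python's standard `statistics` module; the `is_anomaly` function, the threshold of 3.0, and the sample latencies are illustrative assumptions, and production systems typically also account for seasonality and trend.

```python
from statistics import mean, stdev


def is_anomaly(history, value, z_threshold=3.0):
    """Flag `value` if it is more than `z_threshold` std devs from the mean."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold


latencies_ms = [100, 102, 98, 101, 99]
print(is_anomaly(latencies_ms, 101))  # False
print(is_anomaly(latencies_ms, 130))  # True
```

Unlike a static threshold, the cutoff here moves with the observed mean and spread of the metric.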
