Why is Amazon CloudWatch essential for managing AWS resources, and what key problem does it solve?
It provides visibility into AWS resources and applications by collecting metrics in near real time. It helps answer: • When to scale (e.g., launch more EC2 instances) • Whether performance or availability is degrading • How resources are being used
What are CloudWatch metrics, and how are they used in practice?
Metrics are measurable variables for resources and applications, recorded as time-ordered data points. Examples: • EC2 CPU utilisation • Load balancer latency • SQS queue length • Estimated billing charges. They are used to monitor performance and, through alarms, to trigger actions.
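To make the idea of a metric concrete, here is a minimal sketch in plain Python of how data points might be grouped into fixed periods and reduced to statistics such as Average and Maximum. The `DataPoint` type and `summarize` function are hypothetical names used only for illustration; they are not part of any AWS SDK.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from statistics import mean


@dataclass(frozen=True)
class DataPoint:
    """One observation of a metric (hypothetical type, for illustration only)."""
    namespace: str       # e.g. "AWS/EC2"
    metric_name: str     # e.g. "CPUUtilization"
    timestamp: datetime
    value: float
    unit: str            # e.g. "Percent"


def summarize(points, period_seconds=300):
    """Group data points into fixed periods and compute per-period statistics."""
    buckets = {}
    for p in points:
        # Align each timestamp to the start of the period it falls in.
        start = int(p.timestamp.timestamp()) // period_seconds * period_seconds
        buckets.setdefault(start, []).append(p.value)
    return {
        start: {
            "Average": mean(values),
            "Maximum": max(values),
            "SampleCount": len(values),
        }
        for start, values in sorted(buckets.items())
    }


def cpu_point(minute, value):
    """Build a sample CPU data point at 12:<minute> UTC on an arbitrary day."""
    ts = datetime(2024, 1, 1, 12, minute, tzinfo=timezone.utc)
    return DataPoint("AWS/EC2", "CPUUtilization", ts, value, "Percent")


# Two samples fall in the 12:00 period, one in the 12:05 period.
stats = summarize([cpu_point(0, 40.0), cpu_point(1, 60.0), cpu_point(5, 90.0)])
for start, summary in stats.items():
    print(start, summary)
```

Period-level statistics like these are what an alarm compares against its threshold, rather than each raw sample.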
Explain how CloudWatch alarms work, including thresholds and evaluation.
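• An alarm watches a single metric or a metric math expression • A threshold is defined (e.g., CPU > 80%) • The metric is aggregated per period and checked over a set number of evaluation periods • The alarm enters the ALARM state when the condition is met for the required number of data points ("datapoints to alarm") within those periods. Supports: • Static thresholds • Anomaly detection • Metric math expressions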
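The "M out of N" evaluation can be sketched in a few lines of plain Python. This is a simplified model for a static "greater than" threshold only: it ignores how the real service treats missing data, and `evaluate_alarm` is a hypothetical helper, not an AWS API.

```python
def evaluate_alarm(values, threshold, evaluation_periods, datapoints_to_alarm):
    """Return "ALARM", "OK", or "INSUFFICIENT_DATA" for a '> threshold' alarm.

    values: one aggregated value per period, oldest first.
    """
    if len(values) < evaluation_periods:
        # Simplification: not enough periods have been observed yet.
        return "INSUFFICIENT_DATA"
    # Only the most recent N periods are considered.
    window = values[-evaluation_periods:]
    breaching = sum(1 for v in window if v > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


cpu_averages = [70.0, 85.0, 90.0, 75.0, 95.0]

# Last 3 periods are [90, 75, 95]; two of them exceed 80.
print(evaluate_alarm(cpu_averages, 80.0, evaluation_periods=3, datapoints_to_alarm=2))
# Requiring all 3 periods to breach keeps the alarm quiet.
print(evaluate_alarm(cpu_averages, 80.0, evaluation_periods=3, datapoints_to_alarm=3))
```

Requiring several breaching data points instead of one is a common way to avoid alarming on a single short spike.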
What actions can CloudWatch alarms trigger, and why is this important for system reliability?
Actions include: • Sending notifications (SNS) • Triggering Auto Scaling • Executing automated responses (e.g., EC2 stop, terminate, reboot, or recover actions). This lets the system react to issues automatically, improving reliability and reducing downtime.
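As an illustration of wiring an action to an alarm, the sketch below builds the keyword arguments that boto3's CloudWatch `put_metric_alarm` call accepts, with an SNS topic as the alarm action. The instance ID, account number, topic name, and the `build_cpu_alarm` helper are all made-up placeholders; the actual API call is left in a comment because it needs boto3 and AWS credentials.

```python
def build_cpu_alarm(instance_id, sns_topic_arn, threshold=80.0):
    """Build a put_metric_alarm request for high average CPU on one instance."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,               # seconds per data point
        "EvaluationPeriods": 3,      # look at the last 3 periods
        "DatapointsToAlarm": 2,      # alarm if 2 of those 3 breach
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notified when the state becomes ALARM
    }


request = build_cpu_alarm(
    "i-0123456789abcdef0",                             # placeholder instance ID
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",   # placeholder topic ARN
)
print(request["AlarmName"])

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**request)
```

An Auto Scaling policy ARN or an EC2 action ARN could be listed in `AlarmActions` in the same way.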
Explain the difference between CloudWatch metrics, alarms, and events, and how they work together.
• Metrics → raw performance data • Alarms → evaluate metrics against conditions • Events (CloudWatch Events, now Amazon EventBridge) → detect changes in the AWS environment and trigger workflows. Event rules can route matching events to targets such as: • Lambda • SNS • SQS • ECS • Step Functions. Together they form a monitoring → detection → automated response pipeline.
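The event side of the pipeline can be illustrated with an event pattern and a toy matcher. The pattern follows the shape of an EC2 instance state-change rule; the `matches` function is a deliberately simplified stand-in written for this example (exact-value matching only), not the service's real matching logic.

```python
import json

# Pattern: react when an EC2 instance stops or is terminated.
pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}


def matches(pattern, event):
    """Simplified matcher: each pattern field must exist in the event and the
    event's value must be one of the listed values; nested dicts recurse."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


stopped_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
running_event = {**stopped_event, "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"}}

print(matches(pattern, stopped_event))
print(matches(pattern, running_event))

# A real rule takes the pattern as a JSON string, e.g. with boto3:
#   events = boto3.client("events")
#   events.put_rule(Name="ec2-stopped", EventPattern=json.dumps(pattern))
#   events.put_targets(Rule="ec2-stopped", Targets=[{"Id": "1", "Arn": "<target ARN>"}])
pattern_json = json.dumps(pattern)
```

The rule name and target placeholder in the comments are illustrative; the target ARN would point at a Lambda function, SNS topic, SQS queue, or another supported target.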