reliability
Reliability is the ability of a system to function correctly, without failure, for a specified period. All components (hardware, software, network) must work together, like a car: if one essential component fails (e.g., ignition), the whole system fails.
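A minimal sketch of the "one component fails, the system fails" idea. It assumes the components fail independently and that all of them are required, so the system reliability is the product of the component reliabilities. The function name and the three figures are hypothetical, chosen only for illustration.

```python
import math

def series_reliability(component_reliabilities):
    """Reliability of a system that needs every component to work.

    Assumes independent failures; each value is a probability in [0, 1].
    """
    for r in component_reliabilities:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each reliability must be between 0 and 1")
    return math.prod(component_reliabilities)

# Hypothetical figures for hardware, software, and network.
system = series_reliability([0.99, 0.98, 0.995])
print(f"System reliability: {system:.4f}")
```

The result is always lower than the weakest single component, which is why one unreliable part drags down the whole system.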
MTBF and MTTR
• MTBF (Mean Time Between Failures): average time a system runs between one failure and the next
• MTTR (Mean Time To Repair): average time taken to recover from a failure
Higher MTBF + lower MTTR = better reliability (fewer failures, faster recovery)
MTBF and MTTR calculation
MTBF = 96 hours
MTTR = 72 hours
Indicates poor reliability: each failure takes 72 hours to repair, almost as long as the system runs between failures.
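The two figures above can be turned into a single number with the common steady-state formula availability = MTBF / (MTBF + MTTR). This is a sketch that assumes MTBF is the average uptime between failures; the function name is hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, from MTBF and MTTR.

    Assumes MTBF counts only uptime between failures.
    """
    if mtbf_hours < 0 or mttr_hours < 0 or mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)

a = availability(96, 72)
print(f"Availability: {a:.1%}")  # 96 / 168, about 57.1%
```

With these values the system is usable only a little over half the time, which matches the "poor reliability" reading.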
availability
Availability = percentage of time the system is operational
Measured as uptime divided by total time (uptime + downtime), e.g., 99.9% availability.
five nines availability
99.999% availability
Means extremely low downtime (about 5 minutes per year); used for critical systems.
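A short sketch showing how an availability percentage translates into allowed downtime per year. It assumes a 365-day year; the function and constant names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year

def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum downtime per year allowed by a given availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR

# Compare two nines through five nines.
for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.2f} minutes/year")
```

Each extra nine cuts the downtime budget by a factor of ten; five nines leaves roughly 5.26 minutes per year.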
high availability
High availability is achieved using:
• Redundancy
• Failover systems
• Distributed resources
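A rough sketch of why redundancy helps. It assumes the replicas fail independently and that failover is instant and always succeeds, so the service is down only when every replica is down at once. Real systems share failure causes, so treat the result as an upper bound. The function name and figures are hypothetical.

```python
def redundant_availability(replica_availability: float, replicas: int) -> float:
    """Availability of N redundant replicas where any one can serve.

    Assumes independent failures and perfect failover.
    """
    if not 0.0 <= replica_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("at least one replica is required")
    all_down = (1 - replica_availability) ** replicas
    return 1 - all_down

# One, two, and three replicas that are each 99% available.
for n in (1, 2, 3):
    print(f"{n} replica(s): {redundant_availability(0.99, n):.6f}")
```

Under these assumptions, two 99% replicas reach about 99.99%, and a third pushes the figure close to six nines.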
high availability with downtime
Availability cannot be perfect. It is influenced by:
• Fault tolerance (redundancy)
• Scalability (handling load)
• Recoverability (restoring quickly after failure)