What is the fundamental goal of fault tolerance in a system?
To allow the system to continue operating correctly and provide its services even in the presence of hardware or software faults.
Differentiate between Availability and Reliability with an example.
Availability measures whether a system is ready for use at a given moment (e.g., 99.99% uptime). Reliability measures continuous operation without failure over a period of time. For example, a server that never crashes but is regularly taken down for maintenance is reliable but not 100% available.
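A common way to quantify availability is A = MTBF / (MTBF + MTTR), where MTBF (mean time between failures) is itself a reliability measure and MTTR is the mean time to repair. This minimal sketch assumes a 365-day year. It converts an availability target into permitted yearly downtime and computes availability from hypothetical MTBF/MTTR figures:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(availability_fraction: float) -> float:
    """Maximum yearly downtime allowed by a given availability."""
    return (1 - availability_fraction) * MINUTES_PER_YEAR


# "Four nines" (99.99%) allows roughly 52.6 minutes of downtime per year.
print(round(downtime_minutes_per_year(0.9999), 1))  # 52.6

# Hypothetical server: fails every 1,000 hours on average, 1 hour to repair.
print(round(availability(1000, 1), 5))  # 0.999
```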
A system controlling a nuclear power plant fails. Which dependability attribute—Safety or Maintainability—is most critical in this context, and why?
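Safety. Safety is the absence of catastrophic consequences when the system fails. In a nuclear plant, a failure must not lead to a catastrophic event, so the system must fail into a safe state (e.g., shutting the reactor down) even if that means it stops operating. Ease of repair (maintainability) matters far less than preventing harm.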
Describe the relationship between a Fault, an Error, and a Failure.
A Fault (e.g., a software bug) is the cause. It leads to an Error (an internal, incorrect system state). The error may result in a Failure (the external, observable deviation from the specified behavior).
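The chain can be illustrated with a deliberately buggy function. The `average` function below is a hypothetical example. It also shows that a fault can stay dormant: for some inputs, the wrong internal state never becomes a visible failure.

```python
def average(values: list[float]) -> float:
    total = 0.0
    # FAULT: off-by-one bug, the loop never adds the last element.
    for i in range(len(values) - 1):
        total += values[i]
    # ERROR: for most inputs, `total` now holds an incorrect internal state.
    return total / len(values)


# FAILURE: the incorrect state reaches the caller as a wrong result.
print(average([10.0, 20.0, 30.0]))  # 10.0 instead of the specified 20.0

# Dormant fault: the skipped element is 0.0, so no failure is observed.
print(average([5.0, 0.0]))  # 2.5 (correct by coincidence)
```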
What is the key difference between Fault Avoidance and Fault Removal?
Fault Avoidance is proactive, preventing faults from being introduced during the design phase. Fault Removal is reactive, finding and removing faults before the system enters service.
Explain the fault tolerance level known as Graceful Degradation.
The system continues to operate in the presence of faults, but with a reduced level of functionality or performance. It’s a “fail-soft” approach.
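This is a minimal sketch of fail-soft behavior. It assumes a hypothetical web page whose personalized recommendations come from a remote service. `fetch_personalized` and `POPULAR_ITEMS` are illustrative names, not a real API. When the service is down, the page still works but serves generic results.

```python
import logging

# Hypothetical generic fallback used when personalization is unavailable.
POPULAR_ITEMS = ["item-1", "item-2", "item-3"]


def fetch_personalized(user_id: str) -> list[str]:
    """Stand-in for a remote recommendation service; always fails here to simulate an outage."""
    raise ConnectionError("recommendation service unreachable")


def recommendations(user_id: str) -> list[str]:
    try:
        return fetch_personalized(user_id)
    except ConnectionError:
        # Degrade gracefully: reduced functionality instead of a failed page.
        logging.warning("recommendations degraded for user %s", user_id)
        return POPULAR_ITEMS


print(recommendations("alice"))  # ['item-1', 'item-2', 'item-3']
```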
What is the primary purpose of RAID technology?
To combine multiple physical disk drives into a single logical unit to improve performance, provide fault tolerance, or both.
Describe the Mirroring technique used in RAID.
Mirroring writes an identical copy of all data to two or more disks. It provides high data redundancy but reduces usable capacity: by half for a two-disk mirror, and to 1/n of the raw capacity with n copies.
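The following toy model shows how mirroring works. Every write goes to all healthy member disks, and a read can be served by any surviving copy. The `Mirror` class is hypothetical and keeps blocks in memory rather than on real devices.

```python
class Mirror:
    """Toy RAID 1: writes go to every healthy disk; reads use any healthy one."""

    def __init__(self, n_disks: int = 2):
        self.disks = [dict() for _ in range(n_disks)]  # block number -> data
        self.failed = set()                            # indices of failed disks

    def write(self, block: int, data: bytes) -> None:
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block] = data  # identical copy on every healthy disk

    def read(self, block: int) -> bytes:
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return disk[block]
        raise OSError("all mirror members have failed")

    def usable_fraction(self) -> float:
        return 1 / len(self.disks)  # n copies -> 1/n of raw capacity


m = Mirror()
m.write(0, b"payroll")
m.failed.add(0)             # simulate losing the first disk
print(m.read(0))            # b'payroll', served from the surviving copy
print(m.usable_fraction())  # 0.5 for a two-way mirror
```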
Describe the Striping technique used in RAID.
Data is split into blocks (stripes) and written across multiple disks in parallel. This improves performance but, on its own, provides no redundancy.
Why does RAID 0 offer no fault tolerance?
Because it uses only striping, without any form of redundancy (such as mirroring or parity). Data is spread across all member disks, so the failure of any single disk removes blocks from most files, and there is nothing from which to reconstruct them. In practice, the entire array is lost.
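This toy sketch of striping assumes a simple round-robin layout with fixed-size blocks. `stripe` and `unstripe` are illustrative helpers, not a real controller's logic. The example also shows why losing one RAID 0 disk corrupts the data:

```python
def stripe(data: bytes, n_disks: int, block_size: int) -> list[list[bytes]]:
    """Split data into fixed-size blocks and deal them round-robin across disks."""
    disks = [[] for _ in range(n_disks)]
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for index, block in enumerate(blocks):
        disks[index % n_disks].append(block)
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble data by reading one block from each disk in turn."""
    out = []
    for row in range(max(len(d) for d in disks)):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)


disks = stripe(b"ABCDEFGHIJKL", n_disks=3, block_size=2)
print(disks)            # [[b'AB', b'GH'], [b'CD', b'IJ'], [b'EF', b'KL']]
print(unstripe(disks))  # b'ABCDEFGHIJKL'

# No redundancy: if disk 1 fails, its blocks (CD and IJ) are simply gone.
disks[1] = []
print(unstripe(disks))  # b'ABEFGHKL' (the file is corrupted)
```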
What is the main disadvantage of RAID 1 (Mirroring)?
A 50% loss of usable storage capacity, as all data is duplicated.
How does Parity provide fault tolerance in RAID?
Parity is an extra block computed from the data blocks in a stripe (typically with XOR). If one disk fails, its missing data can be mathematically reconstructed from the remaining data blocks and the parity block.
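The reconstruction works because XOR is its own inverse. This sketch assumes simple XOR parity, the scheme used by RAID 5. RAID 6's second parity block uses a different code. The block contents below are arbitrary example bytes.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


d0, d1, d2 = b"\x0f\xa0", b"\x33\x01", b"\xc4\x5e"  # data blocks on three disks
parity = xor_blocks(d0, d1, d2)                    # parity block on a fourth disk

# The disk holding d1 fails: XOR the survivors with the parity block to rebuild it.
rebuilt = xor_blocks(d0, d2, parity)
print(rebuilt == d1)  # True
```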
A RAID 5 array with four 1TB disks has a total raw storage of 4TB. What is its usable capacity?
3TB. In RAID 5, the usable capacity is (n-1) * disk_size, where n is the number of disks. One disk’s worth of space is used for parity.
What key advantage does RAID 6 have over RAID 5?
RAID 6 can withstand the simultaneous failure of two disks without data loss, whereas RAID 5 can only withstand a single disk failure.
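The capacity rules for the RAID levels above fit in a small helper. This is a simplified sketch, and `usable_capacity_tb` is a hypothetical name. It assumes identical disks, ignores metadata overhead, and uses the commonly cited minimum disk counts.

```python
def usable_capacity_tb(level: str, n_disks: int, disk_tb: float) -> float:
    """Usable capacity of an array of identical disks (simplified model)."""
    if level == "RAID0":
        return n_disks * disk_tb          # striping only, no redundancy
    if level == "RAID1":
        return disk_tb                    # every disk holds the same data
    if level == "RAID5":
        if n_disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (n_disks - 1) * disk_tb    # one disk's worth of parity
    if level == "RAID6":
        if n_disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (n_disks - 2) * disk_tb    # two disks' worth of parity
    if level == "RAID10":
        if n_disks < 4 or n_disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return n_disks // 2 * disk_tb     # half the disks hold mirror copies
    raise ValueError(f"unknown level {level!r}")


for level in ("RAID0", "RAID5", "RAID6", "RAID10"):
    print(level, usable_capacity_tb(level, 4, 1.0))
# RAID0 4.0, RAID5 3.0, RAID6 2.0, RAID10 2.0
```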
What is the function of a Hot Spare in a RAID configuration?
A hot spare is an extra, unused disk that automatically takes over and begins rebuilding data immediately when an active disk in the array fails.
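This simplified sketch shows the hot-spare takeover. The `Array` class is hypothetical and only tracks disk names, with no actual data or rebuild I/O.

```python
class Array:
    """Toy RAID array that promotes a hot spare when an active disk fails."""

    def __init__(self, active, spares):
        self.active = list(active)    # disks currently serving the array
        self.spares = list(spares)    # idle hot spares, used in order
        self.rebuilding = []          # spares currently being rebuilt onto
        self.degraded = False

    def disk_failed(self, name: str) -> str:
        slot = self.active.index(name)
        if not self.spares:
            self.degraded = True      # no spare: runs degraded until replaced
            return f"{name} failed; no spare available, array degraded"
        spare = self.spares.pop(0)
        self.active[slot] = spare     # spare takes over the failed disk's slot
        self.rebuilding.append(spare) # rebuild starts with no operator action
        return f"{name} failed; rebuilding onto {spare}"


array = Array(["d0", "d1", "d2", "d3"], spares=["s0"])
print(array.disk_failed("d2"))  # d2 failed; rebuilding onto s0
print(array.active)             # ['d0', 'd1', 's0', 'd3']
```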
Differentiate between RAID 10 and RAID 0+1.
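RAID 10 is a “stripe of mirrors” (data is striped across multiple mirrored pairs). RAID 0+1 is a “mirror of stripes” (two RAID 0 stripe sets are mirrored). Their fault tolerance differs. In RAID 10, one disk from each mirrored pair can fail without data loss. In RAID 0+1, a single disk failure takes its entire stripe set offline, so a second failure in the other stripe set destroys the array. RAID 10 therefore tolerates multiple disk failures better.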
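Counting which two-disk failures each layout survives makes the difference concrete. This sketch assumes four disks numbered 0 to 3. It also assumes, as is typical, that a RAID 0+1 controller takes a whole stripe set offline once any of its disks fails.

```python
from itertools import combinations

RAID10_PAIRS = [{0, 1}, {2, 3}]  # data striped across two mirrored pairs
RAID01_SETS = [{0, 1}, {2, 3}]   # two two-disk stripe sets, mirrored


def raid10_survives(failed: set[int]) -> bool:
    # Data is lost only if both disks of the same mirrored pair fail.
    return all(not pair <= failed for pair in RAID10_PAIRS)


def raid01_survives(failed: set[int]) -> bool:
    # Any failure takes its whole stripe set offline; one set must stay intact.
    return any(not (stripe_set & failed) for stripe_set in RAID01_SETS)


pairs = [set(c) for c in combinations(range(4), 2)]
print(sum(raid10_survives(f) for f in pairs), "of", len(pairs))  # 4 of 6
print(sum(raid01_survives(f) for f in pairs), "of", len(pairs))  # 2 of 6
```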
Why is it critical to perform tape backups even when using a fault-tolerant RAID system?
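RAID protects only against physical disk failure. It immediately applies every write to all member disks, including accidental deletions and corrupted data, so it cannot undo them. Tape backups protect against these other data loss scenarios, such as human error, software bugs, data corruption, or multiple simultaneous disk failures that exceed the RAID’s redundancy.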
What is a primary disadvantage of Software RAID compared to Hardware RAID?
It consumes host system resources (CPU, memory, and bus bandwidth) to perform RAID calculations, which can impact the performance of other applications running on the same system.
In storage management, what is the problem with over-provisioning storage?
It leads to extremely low storage utilization, wasting capital on purchased but unused storage and increasing administrative overhead to manage this unused capacity.
What is the key reason Allocation-Based storage capacity management often fails?
It assumes that the application will fully use all of the storage allocated to it (e.g., a LUN), which is almost never true in practice. The result is low utilization rates (~36%).
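Utilization is simply used capacity divided by allocated capacity. This illustrative calculation uses hypothetical LUN sizes, not measured data. An allocation-based view would count all 1,800 GB as consumed.

```python
# Hypothetical LUNs: (allocated GB, actually used GB); illustrative numbers only.
luns = {"db": (500, 200), "mail": (300, 150), "files": (1000, 350)}

allocated = sum(a for a, _ in luns.values())
used = sum(u for _, u in luns.values())
print(f"allocated {allocated} GB, used {used} GB, utilization {used / allocated:.0%}")
# allocated 1800 GB, used 700 GB, utilization 39%
```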
What is one financial downside of purchasing storage too far in advance of need?
It fails to take advantage of constantly falling storage prices (approx. 25% per year), meaning you pay more for the same capacity than if you had waited.
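This minimal sketch estimates the cost of buying early. It assumes the approx. 25% yearly price decline holds constantly and compounds, and it ignores financing and inflation. The purchase price is hypothetical.

```python
def future_price(price_now: float, years: float, annual_decline: float = 0.25) -> float:
    """Price of the same capacity after `years`, assuming a constant yearly decline."""
    return price_now * (1 - annual_decline) ** years


# Hypothetical: capacity costing 100,000 today that is not needed for two years.
cost_now = 100_000
cost_later = future_price(cost_now, years=2)
print(round(cost_later))             # 56250
print(round(cost_now - cost_later))  # 43750 overspent by buying early
```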
Differentiate between Lossless and Lossy compression.
Lossless compression reduces file size without losing any original data (e.g., ZIP, PNG). Lossy compression permanently removes some data to achieve a smaller size, sacrificing quality (e.g., JPEG, MP3).
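Python's `zlib` module implements DEFLATE, the lossless algorithm used by ZIP and PNG, and this sketch uses it to show an exact round trip. A toy quantization step then illustrates the lossy idea of discarding detail. The quantization only illustrates the principle. It is not how JPEG or MP3 actually encode data.

```python
import zlib

text = b"AAAAABBBBBCCCCC" * 100

# Lossless: decompression returns exactly the original bytes.
packed = zlib.compress(text)
print(zlib.decompress(packed) == text)  # True
print(len(text), "->", len(packed))

# Lossy (toy): keep only the top 4 bits of 8-bit samples; the rest is gone for good.
samples = [13, 200, 77, 148]
quantized = [s // 16 * 16 for s in samples]
print(quantized)  # [0, 192, 64, 144]
```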
Why is it generally ineffective to try to compress a JPEG or MP3 file further using a tool like WinZip?
These formats are already compressed using lossy algorithms. They contain very little redundant data left for a standard compression tool to exploit, resulting in minimal size reduction.
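The effect can be demonstrated with Python's standard `zlib` module, which uses DEFLATE, the algorithm ZIP archives commonly use. Random bytes stand in for already-compressed JPEG or MP3 data. This is an approximation: such files look nearly random to a general-purpose compressor.

```python
import os
import zlib

# Random bytes approximate already-compressed data; the text is highly redundant.
already_compressed = os.urandom(100_000)
repetitive = b"log line: status=OK\n" * 5_000

for name, data in (("random", already_compressed), ("repetitive", repetitive)):
    ratio = len(zlib.compress(data, 9)) / len(data)
    print(f"{name}: compressed to {ratio:.1%} of original size")
```

Expect the random input to come out at about 100% of its original size (often slightly larger), while the repetitive text shrinks dramatically.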
Besides saving space, name one significant advantage of file compression.
It can add security: many archive formats (e.g., ZIP) support password protection, adding a layer of protection when storing or transferring sensitive data.