What is data compression? What benefit does it provide?
Data compression is the process of encoding information using fewer bits. It reduces storage requirements and network transmission overhead.
What are the two main categories of data compression types? How do they differ?
Lossless and lossy. Lossless compression loses no data, so the original can be reconstructed exactly (e.g. ZIP). Lossy compression discards some data, causing loss or degradation, and is most common with media files.
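The difference can be sketched in a few lines of Python. The lossless half uses the standard-library zlib module (DEFLATE, the same family of algorithm ZIP commonly uses). The lossy half is only a toy: quantize and dequantize are hypothetical helpers invented for this illustration, not how any real media codec works.

```python
import zlib

# Lossless: every byte comes back after a compress/decompress round trip.
original = b"abc" * 1000
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(restored == original)             # True
print(len(compressed) < len(original))  # True for this repetitive input


# Lossy (toy illustration, assumed for this sketch): keep only the top
# 4 bits of each 8-bit sample, so each code would need half the bits.
def quantize(samples: bytes) -> bytes:
    """Discard the low 4 bits of every sample."""
    return bytes(s >> 4 for s in samples)


def dequantize(codes: bytes) -> bytes:
    """Scale codes back to the 8-bit range; the discarded bits stay lost."""
    return bytes(c << 4 for c in codes)


samples = bytes([0, 17, 100, 255])
approx = dequantize(quantize(samples))
print(list(approx))  # [0, 16, 96, 240]: close to the input, but not equal
```

The lossy result approximates the input well enough for some uses, but the original values cannot be recovered from it.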
What are some downsides of using compression?
Compression and decompression both add computational overhead, and lossy compression can degrade quality.
What’s data deduplication?
The process of identifying and removing duplicate copies of data within a system.
What types of systems most commonly implement deduplication processes?
Typically storage-oriented systems (e.g. backup software).
What are the two common phases of the deduplication process?
Identification, where duplicate copies are found, and elimination, where redundant copies are removed and typically replaced with references to the original copy.
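A minimal sketch of the two phases, assuming file-level deduplication keyed by a SHA-256 content hash. The names deduplicate, restore, store and refs are hypothetical and chosen for this example; real systems differ in how they detect and track duplicates.

```python
import hashlib
from typing import Dict, Tuple


def deduplicate(files: Dict[str, bytes]) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Return (store, refs): one stored copy per unique content,
    plus a reference from every file name to its stored copy."""
    store: Dict[str, bytes] = {}  # content hash -> the single retained copy
    refs: Dict[str, str] = {}     # file name -> content hash (the reference)
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        # Phase 1: identify. Content with a known hash is a duplicate.
        if digest not in store:
            store[digest] = content
        # Phase 2: eliminate. Keep a reference instead of another copy.
        refs[name] = digest
    return store, refs


def restore(store: Dict[str, bytes], refs: Dict[str, str], name: str) -> bytes:
    """Follow the reference back to the retained copy."""
    return store[refs[name]]


files = {"a.txt": b"hello", "b.txt": b"hello", "c.txt": b"world"}
store, refs = deduplicate(files)
print(len(store))                      # 2: "hello" is stored once
print(refs["a.txt"] == refs["b.txt"])  # True: both point at the same copy
```

Hashing every file is where the identification cost comes from, and only byte-for-byte identical content produces the same hash.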
What are two pros of deduplication processes? What about cons?
PROS
Significantly reduces storage costs in environments with lots of redundant files.
Improves the efficiency of storage-oriented processes like backups.
CONS
Deduplication only catches exact matches, and identifying them can be computationally expensive.
What are the key differences between compression and deduplication with regard to: method, scope, and restoration?
METHOD
Compression eliminates redundant information within a file to reduce size. Deduplication removes redundant files or data blocks entirely.
SCOPE
Compression typically operates on a single file, while deduplication is typically applied across a larger dataset or storage system.
RESTORATION
Compressed data can typically be restored to its original form (except under lossy compression), while deduplicated data is restored by following references to the retained copy, so both the references and that copy must remain intact.
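The contrast in scope and restoration can be sketched as follows. This assumes fixed-size block deduplication with a deliberately tiny, hypothetical BLOCK_SIZE; dedup_blocks and rebuild are names made up for this illustration.

```python
import hashlib
import zlib
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical and unrealistically small, for readability


def dedup_blocks(data: bytes, store: Dict[str, bytes]) -> List[str]:
    """Split data into fixed-size blocks, keep each unique block once in
    the shared store, and return the list of references (a 'recipe')."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # only the first copy is stored
        recipe.append(digest)
    return recipe


def rebuild(recipe: List[str], store: Dict[str, bytes]) -> bytes:
    """Restoration depends on every referenced block still being present."""
    return b"".join(store[d] for d in recipe)


file_a = b"AAAABBBBCCCC"
file_b = b"AAAABBBBDDDD"

# Compression: scoped to one file, restored from the compressed bytes alone.
assert zlib.decompress(zlib.compress(file_a)) == file_a

# Deduplication: scoped to the whole dataset through a shared block store.
store: Dict[str, bytes] = {}
recipe_a = dedup_blocks(file_a, store)
recipe_b = dedup_blocks(file_b, store)
print(len(store))                          # 4 unique blocks instead of 6
print(rebuild(recipe_b, store) == file_b)  # True
```

If a block were deleted from store, rebuild would raise a KeyError for every file whose recipe references it, which is the sense in which deduplicated data relies on existing references.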