What is a key-value store?
A distributed data store that supports two core operations: put(key, value) and get(key), optimized for speed, scalability, and simplicity.
Core functional operations of a key-value store?
Put: Store a value for a given key
Get: Retrieve the value associated with a key
What does configurable service mean in a key-value store?
The system allows applications to choose different consistency models (e.g., strong vs eventual) at store creation time, balancing consistency, availability, cost, and performance.
Can consistency configuration change at runtime?
❌ No. Consistency settings are fixed when a key-value store instance is created.
What does “ability to always write” mean?
The system accepts write operations at all times, even during failures or network partitions, prioritizing availability over consistency.
Which CAP property does “always write” favor?
Availability (A) over Consistency (C).
The CAP theorem states that a distributed system can only guarantee two out of three properties simultaneously: Consistency, Availability, and Partition tolerance.
Why is “always write” sometimes a functional requirement?
In applications like shopping carts, writes must succeed for correct behavior (e.g., users must always add items), as seen in systems inspired by Amazon.
What is hardware heterogeneity?
The ability to add servers with different capacities without upgrading existing ones, while still balancing load correctly.
What architecture supports hardware heterogeneity?
A peer-to-peer architecture with no distinguished nodes.
Key non-functional requirements of a key-value store?
Scalability
Fault tolerance
What does scalability mean in this context?
Operate on tens of thousands of servers
Support incremental scaling
Add/remove nodes with minimal service disruption
What does fault tolerance mean?
The system continues operating despite server or component failures.
Key differences between key-value stores and traditional databases?
-Limited queries (no joins or filters)
-Optimized for speed and scale
-Often eventually consistent
-Handle unstructured data
When are key-value stores most advantageous?
-Caching
-Session management
-Shopping carts
-Real-time analytics
-Massive, fast lookups
Why not run a key-value store on a single server?
Limited scalability
Single point of failure
Poor availability
Basic assumptions in this design?
Trusted data centers
Auth handled externally
Communication over HTTPS
What does the get(key) API do?
Retrieves the value(s) associated with a key. Under weak consistency, multiple versions may be returned.
What does the put(key, value) API do?
Stores the value for a key. The system decides data placement and maintains metadata (e.g., versions).
Why store metadata with values?
To support versioning, replication, and conflict resolution.
When should hashes be generated for data integrity checks?
After compression, before encryption
➡️ Compress → Hash → Encrypt
Why not hash encrypted data?
Encryption introduces randomness (IVs/salts), making hashes unreliable for integrity checks.
What data types do key-value stores support?
Key: Unique identifier (often hashed)
Value: Arbitrary binary data
How does Dynamo-style systems assign data to nodes?
By hashing the key (e.g., MD5) to produce an identifier used for data partitioning across nodes.