CAP Theorem
Can only have 2 of the following
What does consistency in the CAP theorem mean?
Every read receives the most recent write or an error
This is different from the consistncy in ACID. ACID consistency means every transaction brings the database from one valid state to another.
What does availability mean?
Every request receives a non-error response
What is partition tolerance?
The ability of a distributed database to continue processing data even if a network partition causes communication errors between subsystems.
5 Nines Uptime
About 5 minutes downtime per year
3 Nines Uptimes
< 9 hours of downtime per year
System Design
Redundancy
Replication
Process of synchronising state between nodes
Active replication - each message goes to each node
Passive replication - leader/follower relationship (eventual consistency). All writes go to the leader but reads are allowed from the followers. If the leader goes down, a follower gets promoted.
What’s wrong with key % n?
What is Consistent Hashing?
How does Consistent Hashing Work?
As a typical hash function, consistent hashing maps a key to an integer. Suppose the output of the hash function is in the range of [0, 256). Imagine that the integers in the range are placed on a ring such that the values are wrapped around.
Here’s how consistent hashing works:
Given a list of cache servers, hash them to integers in the range.
To map a key to a server,
Hash it to a single integer.
Move clockwise on the ring until finding the first cache it encounters.
To add a new server, say D, keys that were originally residing at C will be split. Some of them will be shifted to D, while other keys will not be touched.
To remove a cache or, if a cache fails, say A, all keys that were originally mapped to A will fall into B, and only those keys need to be moved to B; other keys will not be affected.
For load balancing, as we discussed in the beginning, the real data is essentially randomly distributed and thus may not be uniform. It may make the keys on caches unbalanced.
To handle this issue, we add “virtual replicas” for caches. Instead of mapping each cache to a single point on the ring, we map it to multiple points on the ring, i.e. replicas. This way, each cache is associated with multiple portions of the ring.
If the hash function “mixes well,” as the number of replicas increases, the keys will be more balanced.