Why do distributed systems need data replication?
A single node is a single point of failure and cannot by itself deliver high availability, scalability, and low latency. Replication spreads copies of the data across nodes so the system stays fast and keeps serving requests when some nodes fail.
What is data replication?
Keeping multiple copies of the same data across different nodes (often geographically distributed) to improve availability, performance, and read scalability.
What are the main benefits of replication?
Lower latency (data closer to users)
Higher availability (survives node failures)
Higher read throughput (more replicas serve reads)
What problems does replication introduce?
Keeping replicas consistent
Handling failed nodes
Choosing sync vs async replication
Managing replication lag
Handling concurrent writes
Picking a consistency model
How does synchronous replication work?
The primary waits for acknowledgments from all replicas before confirming the write to the client.
Trade-off: Strong consistency ✅, higher write latency ❌, and writes stall if any replica is slow or unreachable ❌
How does asynchronous replication work?
The primary does not wait for replicas before responding to the client.
Trade-off: Low latency & high availability ✅, possible data loss ❌
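The difference between the two modes can be sketched in a few lines. This is a minimal in-memory model, not a real replication protocol: the Primary and Replica classes and the flush() method (a stand-in for background log shipping) are hypothetical names chosen for illustration.

```python
class Replica:
    def __init__(self):
        self.log = []

    def apply(self, entry):
        self.log.append(entry)
        return True  # acknowledgment sent back to the primary


class Primary:
    def __init__(self, replicas, synchronous):
        self.log = []
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending = []  # entries confirmed to the client but not yet shipped

    def write(self, entry):
        self.log.append(entry)
        if self.synchronous:
            # Synchronous: confirm only after every replica has acknowledged.
            acks = [replica.apply(entry) for replica in self.replicas]
            if not all(acks):
                raise RuntimeError("a replica did not acknowledge the write")
        else:
            # Asynchronous: confirm immediately, replicas catch up later.
            self.pending.append(entry)
        return "ok"

    def flush(self):
        # Stand-in for the background shipping an async primary performs.
        for entry in self.pending:
            for replica in self.replicas:
                replica.apply(entry)
        self.pending.clear()
```

In the asynchronous case, anything still in pending when the primary crashes is exactly the data that can be lost.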
What is primary-secondary replication?
One primary handles all writes; secondaries replicate its data and serve reads.
Best for: Read-heavy workloads
Weakness: Primary bottleneck + write scalability limits
What happens if the primary node fails?
A secondary is promoted via:
- Manual failover (an operator decides)
- Automatic leader election
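One common selection rule during failover is to promote the healthy secondary that has replicated the most data, which minimizes lost writes. The sketch below shows only that rule; the Secondary type and its fields are hypothetical, and a real automatic election would also need the nodes to agree on the result (typically through a consensus protocol), which is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Secondary:
    name: str
    replicated_position: int  # last log position applied from the old primary
    healthy: bool = True


def choose_new_primary(secondaries):
    """Pick the healthy secondary with the highest replicated position."""
    candidates = [s for s in secondaries if s.healthy]
    if not candidates:
        raise RuntimeError("no healthy secondary available for promotion")
    return max(candidates, key=lambda s: s.replicated_position)
```

For example, given secondaries at positions 90, 120 (unhealthy), and 100, the function selects the one at position 100.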
What are common replication techniques?
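Statement-based: Replicates SQL statements (simple, but risky with nondeterminism: functions such as NOW() or RAND() can yield a different result on each replica)
WAL shipping: Ships the primary's write-ahead log (WAL) to replicas. Because the log records the actual changes rather than the SQL that produced them, replicas stay consistent even with nondeterministic functions. It is durable but tightly coupled to the storage engine's log format.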
Logical (row-based): Replicates row changes (flexible, portable)
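The nondeterminism risk is easy to demonstrate. In this toy sketch, dictionaries stand in for databases and run_statement is a hypothetical stand-in for a statement like UPDATE ... SET token = RAND(); the differently seeded generators model each node evaluating the function on its own.

```python
import random


def run_statement(db, rng):
    # Stand-in for a nondeterministic SQL statement that stores a random value.
    db["token"] = rng.random()


primary, stmt_replica, row_replica = {}, {}, {}

# The primary executes the statement.
run_statement(primary, random.Random(1))

# Statement-based: the replica re-executes the same statement on its own,
# so the nondeterministic function is evaluated a second time.
run_statement(stmt_replica, random.Random(2))

# Row-based (logical): the primary ships the resulting row change instead.
column, value = "token", primary["token"]
row_replica[column] = value

print(stmt_replica == primary)  # False: the replica diverged
print(row_replica == primary)   # True: the replica matches
```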
What issues arise with async replication?
Lost writes if primary crashes
Read-after-write inconsistency
Mitigation for read-after-write inconsistency: Serve reads of data the user may have modified from the leader (this does not prevent lost writes)
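One way to apply this mitigation is to route a user's reads to the leader for a short period after that user writes. The sketch below assumes a time-window heuristic; the ReadRouter class, its method names, and the 60-second window are illustrative choices, not part of any particular database.

```python
import time

RECENT_WRITE_WINDOW_SECONDS = 60.0  # hypothetical value; tune to observed lag


class ReadRouter:
    def __init__(self, window=RECENT_WRITE_WINDOW_SECONDS, clock=time.monotonic):
        self.window = window
        self.clock = clock
        self.last_write_at = {}  # user_id -> time of that user's latest write

    def record_write(self, user_id):
        self.last_write_at[user_id] = self.clock()

    def choose_node(self, user_id):
        # Users who wrote recently read from the leader so they see their
        # own writes; everyone else can read from a (possibly lagging) follower.
        last = self.last_write_at.get(user_id)
        if last is not None and self.clock() - last < self.window:
            return "leader"
        return "follower"
```

The window should be at least as long as the replication lag you expect; if lag can exceed it, a user may still read stale data from a follower.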
What is multi-leader replication?
Multiple nodes accept writes and replicate to each other.
Pros: Better write scalability & offline support
Cons: Concurrent writes to the same data on different leaders conflict and must be detected and resolved
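As an illustration of conflict resolution, here is a minimal sketch of last-write-wins (LWW), one of several possible strategies and not one prescribed above. It assumes each write carries a timestamp that is comparable across leaders, which is a strong assumption in practice because clocks drift; the Write type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp: int  # assumed comparable across leaders
    leader_id: str  # tie-breaker so every leader picks the same winner


def resolve(a, b):
    """Return the winning write; the other one is silently discarded."""
    return max(a, b, key=lambda w: (w.timestamp, w.leader_id))
```

Because the comparison is deterministic, every leader converges on the same value regardless of the order in which it sees the two writes. The cost is that the losing write is dropped, so LWW only suits data where that loss is acceptable.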
How does leaderless replication maintain consistency?
All nodes accept reads/writes. Consistency comes from quorums:
Write to w nodes
Read from r nodes
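If w + r > n (where n is the number of replicas), every read set overlaps every write set, so a read contacts at least one node holding the latest acknowledged write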
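The overlap argument can be checked with a small simulation. This sketch assumes each value carries a version number so the reader can pick the newest copy; the Node class and both functions are hypothetical. To model the worst case, the write reaches only the first w nodes and the read contacts only the last r nodes.

```python
class Node:
    def __init__(self):
        self.version = 0
        self.value = None


def quorum_write(nodes, w, value, version):
    # Worst case: only the first w nodes receive the write.
    for node in nodes[:w]:
        node.version, node.value = version, value


def quorum_read(nodes, r):
    # Worst case: contact the last r nodes (r >= 1), then return the value
    # with the highest version among them.
    contacted = nodes[-r:]
    newest = max(contacted, key=lambda node: node.version)
    return newest.value


n = 3
nodes = [Node() for _ in range(n)]
quorum_write(nodes, w=2, value="v1", version=1)
print(quorum_read(nodes, r=2))  # w + r = 4 > 3: the read sees "v1"

stale = [Node() for _ in range(n)]
quorum_write(stale, w=1, value="v1", version=1)
print(quorum_read(stale, r=1))  # w + r = 2 <= 3: the read can miss it (None)
```

With n = 3, w = 2, r = 2, the written nodes {0, 1} and the read nodes {1, 2} share node 1, so the read returns the new value. With w = 1 and r = 1 the two sets can be disjoint and the read returns stale data.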