Database Replication Flashcards

Question 1

Q

Why do distributed systems need data replication?

Answer

A

A single node is a single point of failure, has limited capacity, and cannot be close to every user. Replicating data across nodes keeps the system available when nodes fail, spreads read load, and serves users from nearby copies.

Question 2

Q

What is data replication?

Answer

A

Keeping multiple copies of the same data across different nodes (often geographically distributed) to improve availability, performance, and read scalability.

Question 3

Q

What are the main benefits of replication?

Answer

A

Lower latency (data closer to users)
Higher availability (survives node failures)
Higher read throughput (more replicas serve reads)

Question 4

Q

What problems does replication introduce?

Answer

A

Keeping replicas consistent
Handling failed nodes
Choosing sync vs async replication
Managing replication lag
Handling concurrent writes
Picking a consistency model

Question 5

Q

How does synchronous replication work?

Answer

A

The primary waits for acknowledgments from all replicas before confirming the write to the client.
Trade-off: Strong consistency and no loss of confirmed writes ✅, high write latency and writes stall if any replica is slow or down ❌
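A minimal in-memory sketch of fully synchronous replication, assuming hypothetical `Replica` and `SyncPrimary` classes (no real database API). The primary applies the write locally, sends it to every replica, and confirms to the client only if all replicas acknowledge. A missing acknowledgment is modeled as returning `False` rather than blocking.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """In-memory stand-in for a replica node (hypothetical)."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def apply(self, key, value) -> bool:
        # A replica that is down cannot acknowledge the write.
        if not self.up:
            return False
        self.data[key] = value
        return True


class SyncPrimary:
    """Primary that confirms a write only after every replica acknowledges it."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value) -> bool:
        self.data[key] = value  # apply locally first
        acks = sum(r.apply(key, value) for r in self.replicas)
        # Fully synchronous: confirm to the client only if all replicas acked.
        # A real system would keep waiting or retrying; here a missing ack is
        # modeled as returning False (the client gets no confirmation).
        return acks == len(self.replicas)


replicas = [Replica("r1"), Replica("r2")]
primary = SyncPrimary(replicas)
print(primary.write("x", 1))  # True: both replicas acknowledged
replicas[1].up = False
print(primary.write("x", 2))  # False: r2 is down, so the write is not confirmed
```

The second call shows the availability cost: one unreachable replica is enough to stop writes from being confirmed.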

Question 6

Q

How does asynchronous replication work?

Answer

A

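The primary commits the write, responds to the client immediately, and ships the change to replicas in the background.
Trade-off: Low latency & high availability ✅, possible data loss if the primary fails before changes reach replicas, and stale reads from lagging replicas ❌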
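A small sketch of asynchronous replication, using a hypothetical `AsyncPrimary` class with a queue of unshipped changes and a single in-memory replica. Writes are confirmed at once, and `replicate_one` stands in for the background shipping process.

```python
from collections import deque


class AsyncPrimary:
    """Hypothetical async primary: confirms writes before replicas see them."""

    def __init__(self):
        self.data = {}
        self.pending = deque()  # changes not yet shipped to the replica
        self.replica = {}       # a single in-memory replica

    def write(self, key, value) -> bool:
        self.data[key] = value
        self.pending.append((key, value))
        return True  # the client is answered immediately

    def replicate_one(self):
        # Background shipping: apply the oldest pending change to the replica.
        if self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


p = AsyncPrimary()
p.write("a", 1)
p.write("b", 2)
p.replicate_one()  # only "a" has reached the replica so far
# If the primary crashed now and the replica were promoted, "b" would be lost.
print(p.replica)   # {'a': 1}
```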

Question 7

Q

What is primary-secondary replication?

Answer

A

One primary handles all writes; secondaries replicate its data and serve reads.
Best for: Read-heavy workloads
Weakness: All writes go through the primary, so it is a bottleneck and write throughput cannot scale beyond one node
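A sketch of how a client-side router might split traffic in primary-secondary replication. The `Router` class and its naive rule (anything starting with `SELECT` is a read) are illustrative assumptions; reads rotate round-robin across secondaries.

```python
import itertools


class Router:
    """Hypothetical router for primary-secondary replication."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self._reads = itertools.cycle(secondaries)  # round-robin over secondaries

    def route(self, query: str):
        # Naive classification (assumption): only SELECT statements are reads.
        if query.lstrip().upper().startswith("SELECT"):
            return next(self._reads)
        return self.primary


r = Router("primary", ["sec1", "sec2"])
print(r.route("INSERT INTO t VALUES (1)"))  # primary
print(r.route("SELECT * FROM t"))           # sec1
print(r.route("SELECT * FROM t"))           # sec2
```

Adding secondaries raises read capacity, but every write still lands on the single primary.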

Question 8

Q

What happens if the primary node fails?

Answer

A

A secondary is promoted to primary via:
- Manual failover (an operator decides)
- Automatic leader election
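One common rule when choosing whom to promote is to pick the healthy secondary that has applied the most of the primary's log, so the fewest writes are lost. The sketch below shows only that selection step, with a hypothetical `Node` type and `log_position` field; a real automatic election also needs agreement among nodes (e.g., a consensus protocol) so that two nodes never both act as primary.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool
    log_position: int  # how far this node has applied the primary's log (hypothetical)


def choose_new_primary(secondaries):
    """Pick the healthy secondary that is furthest along in the log."""
    candidates = [n for n in secondaries if n.healthy]
    if not candidates:
        raise RuntimeError("no healthy secondary to promote")
    return max(candidates, key=lambda n: n.log_position)


nodes = [Node("s1", True, 120), Node("s2", True, 135), Node("s3", False, 140)]
print(choose_new_primary(nodes).name)  # s2: s3 is further ahead but unreachable
```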

Question 9

Q

What are common replication techniques?

Answer

A

Statement-based: Replicates SQL statements (simple, but risky with nondeterministic functions such as NOW() or RAND())
WAL shipping: Replicates the write-ahead log (WAL), the storage engine's record of the actual changes (durable, but tightly coupled to the storage format and version). Because it ships the changes themselves rather than SQL, it preserves consistency and is safe with nondeterministic functions.
Logical (row-based): Replicates row-level changes, independent of the storage format (flexible, portable)
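A toy illustration of why statement-based replication is risky with nondeterminism. Here `run_statement` is a hypothetical stand-in for a statement like `UPDATE t SET token = RAND() WHERE id = 1`, and separately seeded `random.Random` instances model each node's independent randomness. Row-based replication ships the resulting value instead; WAL shipping would also ship the result, at the storage level, which this sketch does not model.

```python
import random


def run_statement(db, rng):
    # Stand-in for a nondeterministic statement such as
    # UPDATE t SET token = RAND() WHERE id = 1 (hypothetical table).
    db[1] = rng.random()


primary, stmt_replica, row_replica = {}, {}, {}

# The primary executes the statement with its own random source.
run_statement(primary, random.Random(1))

# Statement-based: the replica re-executes the statement; its random
# source (modeled with a different seed) produces a different value.
run_statement(stmt_replica, random.Random(2))

# Row-based (logical): the replica receives the row value the primary produced.
row_replica[1] = primary[1]

print(stmt_replica == primary)  # False: the replica has diverged
print(row_replica == primary)   # True
```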

Question 10

Q

What issues arise with async replication?

Answer

A

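Lost writes if the primary crashes before replicating them
Read-after-write inconsistency (a user reads from a lagging replica and does not see their own write)
Mitigation: Read data the user may have modified from the leader (read-your-writes)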
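One way to apply this mitigation is time-based: after a user writes, send that user's reads to the leader for a window longer than the expected replication lag. The sketch below assumes a hypothetical `ReadYourWritesRouter` and a fixed `lag_bound`; the injectable `clock` is only there to make the example testable.

```python
import time


class ReadYourWritesRouter:
    """Routes a user's reads to the leader for lag_bound seconds after that user writes.
    lag_bound is an assumed upper bound on replication lag."""

    def __init__(self, lag_bound=1.0, clock=time.monotonic):
        self.lag_bound = lag_bound
        self.clock = clock
        self.last_write = {}  # user_id -> time of that user's most recent write

    def record_write(self, user_id):
        self.last_write[user_id] = self.clock()

    def target_for_read(self, user_id):
        t = self.last_write.get(user_id)
        if t is not None and self.clock() - t < self.lag_bound:
            return "leader"  # a replica might not have this user's write yet
        return "replica"


now = [0.0]
router = ReadYourWritesRouter(lag_bound=1.0, clock=lambda: now[0])
router.record_write("alice")
print(router.target_for_read("alice"))  # leader: alice just wrote
print(router.target_for_read("bob"))    # replica: no recent write
now[0] = 5.0
print(router.target_for_read("alice"))  # replica: the window has passed
```

If real lag exceeds `lag_bound`, the guarantee breaks, so this is a heuristic rather than a strict one.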

Question 11

Q

What is multi-leader replication?

Answer

A

Multiple nodes accept writes and replicate them to each other.
Pros: Better write scalability & offline support
Cons: Concurrent writes to the same data on different leaders conflict and must be detected and resolved
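Last-write-wins (LWW) is one common conflict-resolution strategy, sketched here with a hypothetical `Write` record. Each write carries a timestamp from the leader that accepted it, and the leader id breaks ties so every node picks the same winner regardless of arrival order. LWW silently discards the losing write and depends on reasonably synchronized clocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp: float  # wall-clock time assigned by the accepting leader (assumption)
    leader_id: str    # tie-breaker so all nodes pick the same winner


def resolve_lww(a: Write, b: Write) -> Write:
    """Last-write-wins: keep the write with the larger (timestamp, leader_id)."""
    return max(a, b, key=lambda w: (w.timestamp, w.leader_id))


w_eu = Write("title=Draft", 100.0, "eu")
w_us = Write("title=Final", 100.5, "us")
winner = resolve_lww(w_eu, w_us)
print(winner.value)  # title=Final; the EU write is discarded
```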

Question 12

Q

How does leaderless replication maintain consistency?

Answer

A

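Any replica can accept reads and writes. Consistency comes from quorums over n replicas:
Write: succeeds once w replicas acknowledge it
Read: query r replicas and take the value with the newest version
If w + r > n, every read set overlaps every write set in at least w + r - n >= 1 replica, so a read sees the latest successful write (barring edge cases such as concurrent writes or sloppy quorums)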
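A minimal quorum sketch, assuming a hypothetical `QuorumStore` with a single writer that assigns increasing version numbers (real leaderless systems use per-key versions or vector clocks). The example writes while one replica is down, brings it back with stale data, and shows the read quorum still returning the newest value because the quorums overlap.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    value: object = None
    version: int = 0
    up: bool = True


class QuorumStore:
    """Hypothetical leaderless store: n replicas, write quorum w, read quorum r."""

    def __init__(self, n, w, r):
        if w + r <= n:
            raise ValueError("need w + r > n so read and write quorums overlap")
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self.version = 0  # simplification: one writer assigns increasing versions

    def write(self, value) -> bool:
        self.version += 1
        acks = 0
        for rep in self.replicas:
            if rep.up:
                rep.value, rep.version = value, self.version
                acks += 1
        # Success only if at least w replicas stored it (partial writes are not undone).
        return acks >= self.w

    def read(self):
        responses = [rep for rep in self.replicas if rep.up][: self.r]
        if len(responses) < self.r:
            raise RuntimeError("read quorum not reached")
        # Return the value with the highest version among the r responses.
        return max(responses, key=lambda rep: rep.version).value


store = QuorumStore(n=3, w=2, r=2)
store.write("v1")              # all 3 replicas store v1
store.replicas[0].up = False
print(store.write("v2"))       # True: 2 acks meet w=2
store.replicas[0].up = True    # replica 0 is back, still holding stale v1
print(store.read())            # v2: replicas 0 and 1 answer; replica 1 has v2
```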

(12 cards)