Database Partitioning Flashcards

Question 1

Q

In traditional databases, what are range queries?

Answer

A

Range queries: Retrieve records where a value falls between set limits, useful for ordered data access.
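
A minimal sketch of the idea, assuming the data is already kept in sorted order (as an ordered index would keep it). The list `sorted_totals` and the helper `range_query` are hypothetical names used only for illustration.

```python
import bisect

# Hypothetical ordered column values, e.g. order totals kept sorted by an index.
sorted_totals = [5, 12, 18, 25, 31, 47, 60]

def range_query(sorted_values, low, high):
    """Return all values v with low <= v <= high using two binary searches."""
    start = bisect.bisect_left(sorted_values, low)    # first position with v >= low
    end = bisect.bisect_right(sorted_values, high)    # first position with v > high
    return sorted_values[start:end]

result = range_query(sorted_totals, 12, 31)
print(result)
```

Because the values are ordered, the matching records sit next to each other, so the query reads one contiguous slice instead of scanning everything.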

Question 2

Q

In traditional databases, what are secondary indices?

Answer

A

Secondary indices: Additional data structures that speed up queries on non-primary key columns.
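
As an illustration, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `users` table, its columns, and the index name `idx_users_city` are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users (id, email, city) VALUES (?, ?, ?)",
    [
        (1, "ana@example.com", "Lisbon"),
        (2, "bo@example.com", "Oslo"),
        (3, "cy@example.com", "Lisbon"),
    ],
)

# Secondary index: an extra structure on a column that is not the primary key.
conn.execute("CREATE INDEX idx_users_city ON users (city)")

rows = conn.execute(
    "SELECT id FROM users WHERE city = ? ORDER BY id", ("Lisbon",)
).fetchall()

# Ask SQLite how it would run the lookup; the last column holds the plan text.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE city = ?", ("Lisbon",)
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(rows, plan_text)
```

The plan text should mention the index, which indicates the lookup on `city` no longer needs a full table scan.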

Question 3

Q

In traditional databases, what are transactions with ACID properties?

Answer

A

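Transactions with ACID properties: Groups of operations that execute reliably as a single unit. Atomicity means all-or-nothing execution, Consistency means every transaction leaves the data satisfying its defined rules (constraints, invariants), Isolation means concurrent transactions do not interfere with each other, and Durability means committed changes survive crashes.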
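
A small sketch of atomicity and consistency using Python's built-in `sqlite3` module. The `accounts` table, the `CHECK` rule, and the `transfer` helper are assumptions invented for this example, not part of any particular system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

ok = transfer(conn, "alice", "bob", 30)       # both updates commit
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back
print(ok, failed, balances(conn))
```

In the second transfer the credit to `bob` has already run when the debit violates the balance rule, yet the rollback undoes it, so no partial transfer is left behind.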

Question 4

Q

Why do we partition (shard) data?

Answer

A

A single database node eventually can’t keep up with growing data size, read/write traffic, and latency requirements. Partitioning spreads data and load across multiple nodes to improve scalability, throughput, and availability.

Question 5

Q

What is data partitioning?

Answer

A

Splitting a large dataset into smaller chunks (shards), each managed by a different node, so no single node becomes a bottleneck.

Question 6

Q

What’s the difference between vertical and horizontal sharding?

Answer

A

- Vertical sharding: split by columns or tables
- Horizontal sharding: split by rows (records) using a partition key
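
The two splits can be sketched on a toy dataset. The `users` records, the `bio` column standing in for a wide column, and both helper functions are hypothetical; the modulo placement in `horizontal_split` is just one simple way to pick a shard.

```python
# Hypothetical user records; "bio" stands in for a wide column.
users = [
    {"id": 1, "name": "Ana", "bio": "long biography text"},
    {"id": 2, "name": "Bo", "bio": "long biography text"},
    {"id": 3, "name": "Cy", "bio": "long biography text"},
    {"id": 4, "name": "Di", "bio": "long biography text"},
]

def vertical_split(rows, hot_columns, key="id"):
    """Split by columns: every row appears in both parts, linked by the key."""
    hot = [{c: r[c] for c in (key, *hot_columns)} for r in rows]
    cold = [{c: v for c, v in r.items() if c == key or c not in hot_columns} for r in rows]
    return hot, cold

def horizontal_split(rows, num_shards, key="id"):
    """Split by rows: each row lands in exactly one shard, chosen by the partition key."""
    shards = [[] for _ in range(num_shards)]
    for r in rows:
        shards[r[key] % num_shards].append(r)
    return shards

hot, cold = vertical_split(users, hot_columns=["name"])
shards = horizontal_split(users, num_shards=2)
print(hot[0], cold[0], [[r["id"] for r in s] for s in shards])
```

After the vertical split both parts still hold all four ids but different columns; after the horizontal split each shard holds complete rows for a subset of ids.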

Question 7

Q

When is vertical sharding useful?

Answer

A

When tables have wide columns (e.g., blobs, large text) or when moving some tables to their own nodes improves performance.
Trade-off: Joins across the split become harder; the split is usually done manually and the application has to know about it.

Question 8

Q

What are the two main horizontal sharding strategies?

Answer

A

- Key-range sharding (ranges of keys)
- Hash-based sharding (hash(key) → partition)
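
Both strategies boil down to a function from key to shard number. The sketch below assumes string keys, made-up range boundaries, and three hash shards; MD5 is used only as a convenient deterministic hash, since Python's built-in `hash()` is randomized per process for strings.

```python
import bisect
import hashlib

# Assumed boundaries for key-range shards: keys < "h", "h" <= keys < "p", keys >= "p".
RANGE_BOUNDARIES = ["h", "p"]
NUM_HASH_SHARDS = 3

def range_shard(key):
    """Key-range sharding: neighbouring keys land on the same shard."""
    return bisect.bisect_right(RANGE_BOUNDARIES, key)

def stable_hash(key):
    """Deterministic 64-bit hash of a string key."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

def hash_shard(key, num_shards=NUM_HASH_SHARDS):
    """Hash-based sharding: hash(key) decides the shard, ignoring key order."""
    return stable_hash(key) % num_shards

for key in ["apple", "apricot", "mango", "zebra"]:
    print(key, "range shard:", range_shard(key), "hash shard:", hash_shard(key))
```

With range sharding "apple" and "apricot" stay together, which is what makes range scans cheap; with hash sharding their placement depends only on the hash value.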

Question 9

Q

What are the pros/cons of key-range vs hash-based sharding?

Answer

A

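Key-range:
✅ Efficient range queries
❌ Risk of hotspots when traffic concentrates on one key range (e.g., recent timestamps)
Hash-based:
✅ Uniform distribution
❌ Inefficient range queries (adjacent keys are scattered, so a range scan must query every partition)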

Question 10

Q

Why is consistent hashing used?

Answer

A

To add/remove nodes with minimal data movement, avoiding the massive reshuffling that hash(key) mod n causes whenever n changes.

Question 11

Q

How do secondary indexes work with partitioning?

Answer

A

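Local (by document): each shard indexes only its own records → simple writes, but reads on the indexed field must query every shard (scatter-gather)
Global (by term): one index covering all shards, itself partitioned by term → faster reads, but more complex writes (one record write may touch several index partitions, and index updates are often asynchronous)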
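
A toy sketch of both layouts, assuming two partitions, records placed by `id mod 2`, and index terms split alphabetically. The `cars` data and every function name are hypothetical.

```python
from collections import defaultdict

NUM_SHARDS = 2
# Hypothetical car listings: id -> color (color is the secondary-indexed field).
cars = {1: "red", 2: "blue", 3: "red", 4: "silver", 5: "blue"}

def doc_shard(doc_id):
    return doc_id % NUM_SHARDS

def term_shard(term):
    # Assumed term partitioning: terms starting a-m on partition 0, the rest on 1.
    return 0 if term[0] <= "m" else 1

# Local (document-partitioned) index: each shard indexes only its own records.
local_index = [defaultdict(set) for _ in range(NUM_SHARDS)]
# Global (term-partitioned) index: each partition owns all entries for its terms.
global_index = [defaultdict(set) for _ in range(NUM_SHARDS)]

for doc_id, color in cars.items():
    local_index[doc_shard(doc_id)][color].add(doc_id)   # same shard as the record
    global_index[term_shard(color)][color].add(doc_id)  # may be a different partition

def query_local(color):
    """Scatter-gather: ask every shard, then merge the partial results."""
    result = set()
    for shard in local_index:
        result |= shard.get(color, set())
    return result

def query_global(color):
    """Single lookup on the one partition that owns the term."""
    return set(global_index[term_shard(color)].get(color, set()))

print(query_local("blue"), query_global("blue"))
```

Both queries return the same ids, but the local version touches every shard while the global version touches one index partition. The cost shows up on writes: a record stored on one shard may need its index entry updated on another.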

Question 12

Q

What is consistent hashing?

Answer

A

A way to distribute keys across nodes so that adding or removing a node moves only a small fraction of keys, instead of reshuffling everything.
Mental image: put nodes and keys on a circle (ring) and assign each key to the next node clockwise.
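
The ring picture can be sketched in a few lines. This is a simplified illustration, not a production design: the `HashRing` class and `ring_position` helper are hypothetical, MD5 is just a convenient deterministic hash, each node gets a single point on the ring (real systems usually give each node many virtual points), and hash collisions between nodes are ignored.

```python
import bisect
import hashlib

def ring_position(value):
    """Map a string to a point on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes=()):
        self._positions = []  # sorted ring positions of all nodes
        self._owners = {}     # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        pos = ring_position(node)
        bisect.insort(self._positions, pos)
        self._owners[pos] = node

    def remove_node(self, node):
        pos = ring_position(node)
        self._positions.remove(pos)
        del self._owners[pos]

    def node_for(self, key):
        """Return the first node at or after the key's position, wrapping around."""
        if not self._positions:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._owners[self._positions[idx % len(self._positions)]]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user-42"))
```

Adding a node only inserts one new point on the ring, so the only keys that change owner are those that now meet the new point first when walking clockwise.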

Question 13

Q

Why not just use hash(key) mod n instead of consistent hashing?

Answer

A

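Because when n changes, almost all keys move: a key stays put only if hash(key) mod n gives the same result for the old and the new n.
Consistent hashing ensures that only the keys belonging to the affected node move (roughly 1/n of all keys), making scaling and rebalancing cheap.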
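
The difference can be measured with a rough simulation of growing from 4 to 5 nodes. Everything here is an assumption for illustration: the node and key names, MD5 as the hash, and 100 virtual points per node on the ring to even out the shares. With mod n, a key keeps its node only when its hash gives the same remainder mod 4 and mod 5, which holds for about 20% of keys, so about 80% are expected to move.

```python
import bisect
import hashlib

def h(value):
    """Deterministic 64-bit hash of a string."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

def mod_owner(key, nodes):
    """hash(key) mod n placement."""
    return nodes[h(key) % len(nodes)]

def build_ring(nodes, vnodes=100):
    """Sorted (position, node) pairs; each node gets `vnodes` points on the ring."""
    return sorted((h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))

def ring_owner(key, ring):
    """Owner of the first ring point at or after the key's position, wrapping around."""
    idx = bisect.bisect_left(ring, (h(key), ""))
    return ring[idx % len(ring)][1]

def count_moved(keys, owner_before, owner_after):
    return sum(1 for k in keys if owner_before(k) != owner_after(k))

keys = [f"user-{i}" for i in range(1000)]
old_nodes = ["n0", "n1", "n2", "n3"]
new_nodes = old_nodes + ["n4"]

moved_mod = count_moved(
    keys, lambda k: mod_owner(k, old_nodes), lambda k: mod_owner(k, new_nodes)
)
old_ring, new_ring = build_ring(old_nodes), build_ring(new_nodes)
moved_ring = count_moved(
    keys, lambda k: ring_owner(k, old_ring), lambda k: ring_owner(k, new_ring)
)

print(f"mod n moved {moved_mod} keys, ring moved {moved_ring} keys out of {len(keys)}")
```

On the ring, every key that moves goes to the new node `n4`; the other nodes never exchange keys with each other.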

(13 cards)