Database Partitioning Flashcards

(13 cards)

1
Q

In traditional databases, what are range queries?

A

Range queries: Retrieve records where a value falls between set limits, useful for ordered data access.
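
A minimal sketch of a range query over ordered data, using a sorted Python list in place of a real database index. The `orders` data and the `range_query` helper are made up for illustration; a real database would typically use a B-tree rather than a list.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sorted index of (order_date, order_id) pairs.
# Keeping the keys ordered is what makes a range scan cheap.
orders = sorted([
    ("2024-01-03", 101),
    ("2024-01-09", 102),
    ("2024-02-14", 103),
    ("2024-03-01", 104),
])

def range_query(index, low, high):
    """Return entries whose key satisfies low <= key <= high."""
    keys = [k for k, _ in index]
    start = bisect_left(keys, low)    # first key >= low
    end = bisect_right(keys, high)    # one past the last key <= high
    return index[start:end]

january = range_query(orders, "2024-01-01", "2024-01-31")
print(january)
```

Two binary searches find both ends of the range, and everything between them is read sequentially.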

2
Q

In traditional databases, what are secondary indices?

A

Secondary indices: Additional data structures that speed up queries on non-primary key columns.

3
Q

In traditional databases, what are transactions with ACID properties?

A

Transactions with ACID properties: Ensure reliable operations where Atomicity means all-or-nothing execution, Consistency keeps the data valid under its defined rules (constraints, invariants), Isolation prevents concurrent transactions from interfering with each other, and Durability guarantees committed changes persist.
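
Atomicity and consistency can be seen in a small sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the balances are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the "data rule" that consistency must preserve.
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # valid transfer
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer the credit to `bob` runs first, but it is rolled back when the debit violates the constraint, so no partial update survives.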

4
Q

Why do we partition (shard) data?

A

A single database node can’t handle growing data size, read/write traffic, and latency requirements. Partitioning spreads data and load across multiple nodes to improve scalability, throughput, and availability.

5
Q

What is data partitioning?

A

Splitting a large dataset into smaller chunks (shards), each managed by a different node, so no single node becomes a bottleneck.

6
Q

What’s the difference between vertical and horizontal sharding?

A

- Vertical sharding: split by columns or tables
- Horizontal sharding: split by rows (records) using a partition key
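
The two splits can be sketched on a tiny in-memory table. The `users` rows, column names, and both helper functions are hypothetical and only illustrate the idea.

```python
# Hypothetical user table; the column names are made up for illustration.
users = [
    {"id": 1, "name": "Ada", "bio": "very long profile text"},
    {"id": 2, "name": "Linus", "bio": "another long profile text"},
    {"id": 3, "name": "Grace", "bio": "yet more profile text"},
    {"id": 4, "name": "Alan", "bio": "and more profile text"},
]

def vertical_split(rows, wide_columns, key="id"):
    """Split by columns: narrow 'core' rows and 'wide' rows, both keyed by id."""
    core = [{c: v for c, v in r.items() if c not in wide_columns} for r in rows]
    wide = [
        {c: v for c, v in r.items() if c == key or c in wide_columns}
        for r in rows
    ]
    return core, wide

def horizontal_split(rows, num_shards, key="id"):
    """Split by rows: each record goes to exactly one shard, chosen by its key."""
    shards = [[] for _ in range(num_shards)]
    for r in rows:
        shards[r[key] % num_shards].append(r)
    return shards

core, wide = vertical_split(users, {"bio"})
shards = horizontal_split(users, 2)
print(core[0], wide[0])
print([[r["id"] for r in s] for s in shards])
```

After the vertical split, reading a full user means joining `core` and `wide` on `id`, which is the join cost that vertical sharding trades for smaller rows.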

7
Q

When is vertical sharding useful?

A

When tables have wide columns (e.g., blobs, large text) or when separating tables improves performance.
Trade-off: Harder joins; usually manual and application-aware.

8
Q

What are the two main horizontal sharding strategies?

A

Key-range sharding (ranges of keys)
Hash-based sharding (hash(key) → partition)
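
As a rough sketch, each strategy comes down to a routing function from key to partition. The boundaries, the shard count, and the use of MD5 as the hash are assumptions chosen for illustration.

```python
import hashlib
from bisect import bisect_right

# Hypothetical split points: shard 0 holds keys < "h",
# shard 1 holds "h" <= key < "p", shard 2 holds keys >= "p".
RANGE_BOUNDARIES = ["h", "p"]

def range_shard(key: str) -> int:
    """Key-range sharding: find which contiguous key range contains the key."""
    return bisect_right(RANGE_BOUNDARIES, key)

def hash_shard(key: str, num_shards: int = 3) -> int:
    """Hash-based sharding: hash(key) -> partition.

    A stable hash is used because Python's built-in hash() of a string
    changes between runs.
    """
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

for name in ["alice", "harry", "zoe"]:
    print(name, range_shard(name), hash_shard(name))
```

Range routing keeps neighbouring keys together, while hash routing deliberately scatters them.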

9
Q

What are the pros/cons of key-range vs hash-based sharding?

A

Key-range:
✅ Efficient range queries
❌ Risk of hotspots
Hash-based:
✅ Uniform distribution
❌ Inefficient range queries (adjacent keys land on different partitions, so a range scan must query every partition)
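
Both trade-offs can be made concrete by counting which shards a query or a burst of writes touches. This is a sketch with made-up boundaries and four hypothetical shards, not a model of any particular database.

```python
from bisect import bisect_right
from collections import Counter

BOUNDARIES = [100, 200, 300]       # hypothetical split points -> 4 shards
NUM_SHARDS = len(BOUNDARIES) + 1

def range_shard(key: int) -> int:
    return bisect_right(BOUNDARIES, key)

def shards_for_scan_range(low: int, high: int) -> list:
    """Key-range sharding: only shards overlapping [low, high] are contacted."""
    return list(range(range_shard(low), range_shard(high) + 1))

def shards_for_scan_hash(low: int, high: int) -> list:
    """Hash sharding: neighbouring keys are scattered, so ask every shard."""
    return list(range(NUM_SHARDS))

narrow_scan = shards_for_scan_range(120, 180)
hash_scan = shards_for_scan_hash(120, 180)

# Hotspot: monotonically increasing keys (IDs, timestamps) all land on
# the last range shard.
hot = Counter(range_shard(k) for k in range(301, 311))
print(narrow_scan, hash_scan, dict(hot))
```

The range scan touches a single shard, but ten sequential inserts also pile onto a single shard, which is the hotspot risk.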

10
Q

Why is consistent hashing used?

A

To add/remove nodes with minimal data movement, avoiding massive reshuffling caused by hash mod n.

11
Q

How do secondary indexes work with partitioning?

A

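Local (by document): each shard indexes only its own rows → simple, fast writes; reads on the indexed column must query every shard (scatter-gather)
Global (by term): one index covers all shards and is itself partitioned by term → reads hit only the partition holding that term; writes are slower and more complex because one row change may update several index partitions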
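
The difference in read cost can be sketched by counting how many partitions a lookup contacts. The car/color data, the two-shard layout, and the `term_partition` rule are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows (primary key -> color), sharded by primary key.
shards = {
    0: {1: "red", 2: "blue"},
    1: {3: "red", 4: "green"},
}

# Local (document-partitioned): every shard indexes only its own rows.
local_index = {}
for shard_id, rows in shards.items():
    idx = defaultdict(set)
    for pk, color in rows.items():
        idx[color].add(pk)
    local_index[shard_id] = idx

def query_local(color):
    """Scatter-gather: ask every shard, then merge the results."""
    result, contacted = set(), 0
    for idx in local_index.values():
        contacted += 1
        result |= idx.get(color, set())
    return result, contacted

# Global (term-partitioned): the index itself is split by term.
def term_partition(color):
    return 0 if color < "m" else 1   # made-up rule: a-l -> 0, m-z -> 1

global_index = {0: defaultdict(set), 1: defaultdict(set)}
for rows in shards.values():
    for pk, color in rows.items():
        # A write on any data shard may update a different index partition.
        global_index[term_partition(color)][color].add(pk)

def query_global(color):
    """Only the index partition that owns the term is contacted."""
    part = term_partition(color)
    return set(global_index[part].get(color, set())), 1

print(query_local("red"), query_global("red"))
```

Both lookups return the same rows, but the local index contacts every shard while the global index contacts one index partition.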

12
Q

What is consistent hashing?

A

A way to distribute keys across nodes so that adding or removing a node moves only a small fraction of keys, instead of reshuffling everything.
Mental image: put nodes and keys on a circle (ring) and assign each key to the next node clockwise.
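
The ring image can be sketched in a few lines. This is one possible implementation, not a reference one: the `HashRing` class, the use of MD5, and the 100 virtual points per node (added here to smooth the distribution) are assumptions.

```python
import hashlib
from bisect import bisect_right

def ring_hash(value: str) -> int:
    """Stable position on the ring (Python's built-in hash() varies per run)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []      # sorted (position, node) pairs
        self._positions = []   # positions only, for binary search
        for node in nodes:
            self.add_node(node)

    def _rebuild(self):
        self._points.sort()
        self._positions = [p for p, _ in self._points]

    def add_node(self, node):
        # Each node is placed at several points on the ring.
        for i in range(self.vnodes):
            self._points.append((ring_hash(f"{node}#{i}"), node))
        self._rebuild()

    def remove_node(self, node):
        self._points = [(p, n) for p, n in self._points if n != node]
        self._rebuild()

    def get_node(self, key):
        if not self._points:
            raise ValueError("ring is empty")
        # Next point clockwise; wrap around to the start past the last point.
        idx = bisect_right(self._positions, ring_hash(key)) % len(self._points)
        return self._points[idx][1]

ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"user:{i}" for i in range(5000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("node-e")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")
```

Every key that moves ends up on the newly added node; keys owned by the other nodes stay where they were.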

13
Q

Why not just use hash(key) mod n instead of consistent hashing?

A

Because when n changes, most keys map to a different node: going from n to n+1 nodes moves roughly n/(n+1) of them.
Consistent hashing ensures that only keys belonging to the affected node move (roughly 1/n of the keys), making scaling and rebalancing cheap.
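
The cost of `hash(key) mod n` can be measured directly. This sketch uses MD5 as a stable hash and made-up keys; with a uniform hash, going from 4 to 5 nodes should move about 4/5 of the keys.

```python
import hashlib

def stable_hash(key: str) -> int:
    """Stable across runs, unlike Python's built-in hash() for strings."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

def mod_shard(key: str, n: int) -> int:
    return stable_hash(key) % n

keys = [f"user:{i}" for i in range(10_000)]

# Count keys whose shard changes when the cluster grows from 4 to 5 nodes.
moved = sum(1 for k in keys if mod_shard(k, 4) != mod_shard(k, 5))
fraction_moved = moved / len(keys)
print(f"{fraction_moved:.0%} of keys changed shard going from 4 to 5 nodes")
```

A key stays put only when its hash gives the same remainder mod 4 and mod 5, which is why so few keys survive the resize.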
