In traditional databases, what are range queries?
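Range queries: Retrieve records whose key (or another attribute) falls between a lower and an upper bound, e.g., all orders placed in January. They are efficient when data is stored or indexed in key order.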
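A minimal sketch of why sorted storage makes range queries cheap, assuming an in-memory list of rows already sorted by key (the `rows` data and `range_query` name are hypothetical): two binary searches find the start and end of the range, and the matching rows are one contiguous slice.

```python
import bisect

# Hypothetical example data: rows kept sorted by their key (an ISO date string).
rows = [
    ("2024-01-03", "order-1"),
    ("2024-01-15", "order-2"),
    ("2024-02-01", "order-3"),
    ("2024-02-20", "order-4"),
]
keys = [k for k, _ in rows]  # the sorted key column

def range_query(lo, hi):
    """Return all rows with lo <= key <= hi using binary search on sorted keys."""
    start = bisect.bisect_left(keys, lo)   # first key >= lo
    end = bisect.bisect_right(keys, hi)    # first key > hi
    return rows[start:end]                 # matches are contiguous in sorted order
```

For example, `range_query("2024-01-01", "2024-01-31")` returns the two January orders without scanning the February rows.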
In traditional databases, what are secondary indices?
Secondary indices: Additional data structures that speed up queries on non-primary key columns.
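A minimal sketch of a secondary index, assuming a toy in-memory table keyed by `user_id` (all names here are hypothetical): the index maps each value of a non-primary-key column to the set of primary keys that have it, so a lookup avoids a full table scan, at the cost of updating the index on every write.

```python
from collections import defaultdict

# Hypothetical table keyed by its primary key (user_id).
users = {
    1: {"name": "Ada", "city": "London"},
    2: {"name": "Linus", "city": "Helsinki"},
    3: {"name": "Grace", "city": "London"},
}

# Secondary index on the non-primary-key column "city": city -> set of user_ids.
city_index = defaultdict(set)
for user_id, row in users.items():
    city_index[row["city"]].add(user_id)

def insert_user(user_id, name, city):
    users[user_id] = {"name": name, "city": city}
    city_index[city].add(user_id)  # every write must also maintain the index

def find_by_city(city):
    # Look up matching primary keys in the index instead of scanning every row.
    return sorted(city_index.get(city, set()))
```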
In traditional databases, what are transactions with ACID properties?
Transactions with ACID properties: Ensure reliable operations where Atomicity means all-or-nothing execution, Consistency maintains data rules, Isolation prevents interference between transactions, and Durability guarantees changes persist.
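A small illustration of atomicity using Python's built-in `sqlite3` module and a hypothetical `accounts` table: the transfer credits one account and debits another inside a single transaction, and when the debit violates a `CHECK` constraint, the already-applied credit is rolled back too. This is a sketch of the all-or-nothing idea, not a model of how any particular distributed database implements transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move money atomically: both updates commit together or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed on the debit; the credit to dst was undone as well

def balances():
    return dict(conn.execute("SELECT id, balance FROM accounts"))
```

After a successful `transfer("alice", "bob", 30)`, a failing `transfer("bob", "alice", 500)` leaves both balances exactly as they were.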
Why do we partition (shard) data?
A single database node can’t handle growing data size, read/write traffic, and latency requirements. Partitioning spreads data and load across multiple nodes to improve scalability, throughput, and availability.
What is data partitioning?
Splitting a large dataset into smaller chunks (shards), each managed by a different node, so no single node becomes a bottleneck.
What’s the difference between vertical and horizontal sharding?
- Vertical sharding: split by columns or tables
- Horizontal sharding: split by rows (records) using a partition key
When is vertical sharding useful?
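When some columns are wide (e.g., blobs, large text) and rarely needed together with the rest of the row, or when tables with different access patterns benefit from living on separate nodes.
Trade-off: joins across the split become harder, and the split is usually manual and application-aware.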
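A toy sketch of a vertical split, assuming two plain dicts stand in for two nodes and that `HOT_COLUMNS` is a hypothetical, hand-chosen set of frequently read columns: small columns go to one store and wide ones to another, so reassembling a full row means the application performs the "join" itself.

```python
# Hypothetical vertical split: frequently read "hot" columns on one node,
# wide columns (images, long text) on another. Dicts stand in for the nodes.
HOT_COLUMNS = {"id", "title", "price"}

hot_store = {}   # stands in for shard A: small, frequently read columns
cold_store = {}  # stands in for shard B: wide, rarely read columns

def write_product(row):
    # The application decides which columns go where (manual, app-aware split).
    hot_store[row["id"]] = {k: v for k, v in row.items() if k in HOT_COLUMNS}
    cold_store[row["id"]] = {k: v for k, v in row.items() if k not in HOT_COLUMNS}

def read_full_product(product_id):
    # Rebuilding the full row needs a lookup on both shards: an application-level join.
    return {**hot_store[product_id], **cold_store[product_id]}
```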
What are the two main horizontal sharding strategies?
Key-range sharding (ranges of keys)
Hash-based sharding (hash(key) → partition)
What are the pros/cons of key-range vs hash-based sharding?
Key-range:
✅ Efficient range queries
❌ Risk of hotspots (e.g., sequential keys such as timestamps all land on the same partition)
Hash-based:
✅ Uniform distribution
❌ No efficient range queries: neighbouring keys are scattered, so a range query must be sent to every partition
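A minimal sketch contrasting the two strategies, assuming string keys, four partitions, and hypothetical range boundaries `"d"`, `"h"`, `"m"`. MD5 is used only as an example of a stable hash; Python's built-in `hash()` is randomized per process for strings, so it is unsuitable for data placement. The last function shows which partitions a range query has to contact under each strategy.

```python
import bisect
import hashlib

NUM_PARTITIONS = 4
# Hypothetical boundaries: partition 0 holds keys < "d", partition 1 holds
# "d" <= key < "h", partition 2 holds "h" <= key < "m", partition 3 the rest.
BOUNDARIES = ["d", "h", "m"]

def range_partition(key: str) -> int:
    return bisect.bisect_right(BOUNDARIES, key)

def hash_partition(key: str) -> int:
    digest = hashlib.md5(key.encode()).digest()  # stable across processes
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

def partitions_for_range(lo: str, hi: str, strategy: str) -> list:
    if strategy == "range":
        # Contiguous keys live in contiguous partitions.
        return list(range(range_partition(lo), range_partition(hi) + 1))
    # Hashing scatters neighbouring keys, so any partition may hold matches.
    return list(range(NUM_PARTITIONS))
```

A scan from `"apple"` to `"cherry"` touches only partition 0 with key ranges but all four partitions with hashing. The flip side is that if most writes use keys starting with `"a"`, partition 0 becomes a hotspot, whereas hashing would spread them.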
Why is consistent hashing used?
To add/remove nodes with minimal data movement, avoiding the massive reshuffling that hash(key) mod n causes whenever n changes.
How do secondary indexes work with partitioning?
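Local (by document): each shard indexes only its own records → writes stay on one shard, but reads must query every shard (scatter-gather)
Global (by term): one index covering all records, itself partitioned by the indexed term → reads hit a single index partition, but a write may have to update index partitions on other shards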
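A toy sketch of both index layouts, assuming three shards, documents partitioned by `doc_id % NUM_SHARDS`, one indexed `color` per document, and a hypothetical term-partitioning rule based on the term's first character. The local read loops over every shard; the global read consults only the shard that owns the term, while the global write may land on a different shard from the document itself.

```python
from collections import defaultdict

NUM_SHARDS = 3

def shard_for_doc(doc_id: int) -> int:
    return doc_id % NUM_SHARDS  # hypothetical document partitioning

def shard_for_term(term: str) -> int:
    return ord(term[0]) % NUM_SHARDS  # hypothetical term partitioning

# Local index: each shard indexes only the documents it stores.
local_indexes = [defaultdict(set) for _ in range(NUM_SHARDS)]
# Global index: one logical index, split across shards by term.
global_index_parts = [defaultdict(set) for _ in range(NUM_SHARDS)]

def write(doc_id: int, color: str) -> None:
    # Local: the index update stays on the document's own shard.
    local_indexes[shard_for_doc(doc_id)][color].add(doc_id)
    # Global: the entry goes to the shard owning the term, possibly another shard.
    global_index_parts[shard_for_term(color)][color].add(doc_id)

def read_local(color: str) -> list:
    # Scatter-gather: query every shard and merge the partial results.
    result = set()
    for index in local_indexes:
        result |= index.get(color, set())
    return sorted(result)

def read_global(color: str) -> list:
    # Only the shard that owns this term is consulted.
    return sorted(global_index_parts[shard_for_term(color)].get(color, set()))
```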
What is consistent hashing?
A way to distribute keys across nodes so that adding or removing a node moves only a small fraction of keys, instead of reshuffling everything.
Mental image: hash both nodes and keys onto positions on a circle (the ring), and assign each key to the first node found moving clockwise from it.
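A minimal hash-ring sketch of the picture above, assuming MD5 as an example stable hash and one ring position per node. Production systems usually add virtual nodes (several positions per node) for better balance; this sketch omits them. `HashRing` and its methods are hypothetical names.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    # Stable 64-bit position on the ring (MD5 chosen only for illustration).
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes=()):
        self._positions = []  # sorted node positions on the ring
        self._owner = {}      # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        pos = ring_hash(node)
        bisect.insort(self._positions, pos)
        self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        pos = ring_hash(node)
        self._positions.remove(pos)
        del self._owner[pos]

    def node_for(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring has no nodes")
        # First node at or clockwise after the key; wrap to the start past the end.
        i = bisect.bisect_left(self._positions, ring_hash(key))
        return self._owner[self._positions[i % len(self._positions)]]
```

Adding a node only claims the arc between its predecessor and itself, so every key whose owner changes moves to the new node; removing that node again restores the previous assignment exactly.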
Why not just use hash(key) mod n instead of consistent hashing?
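Because when n changes, almost all keys move: with uniformly distributed hashes, going from n to n + 1 nodes reassigns about n/(n+1) of all keys (80% when growing from 4 to 5 nodes).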
Consistent hashing ensures that only keys belonging to the affected node move, making scaling and rebalancing cheap.
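A quick way to check the mod-n claim, assuming consecutive integers stand in for uniformly distributed hash values: count how many keys change owner when the node count changes. For 4 → 5 nodes the result is 0.8, matching n/(n+1); an ideal consistent-hashing ring would move only about 1/(n+1) of the keys instead.

```python
def fraction_moved_mod_n(n_before: int, n_after: int, num_keys: int = 100_000) -> float:
    """Fraction of keys whose partition changes under hash mod n.

    Integers 0..num_keys-1 stand in for uniformly distributed hash values.
    """
    moved = sum(1 for h in range(num_keys) if h % n_before != h % n_after)
    return moved / num_keys
```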