What are common use cases where event order is really important?
Payment transactions.
Video conferencing.
Vehicle control systems.
Online gaming.
Sensor metrics publishing and change data capture.
Why don’t we want to store the database on the application machines?
Bottleneck; independent scaling.
Typically application servers become a bottleneck to the system before access to the data does.
What is a write-ahead log and why do we need it?
Recovery from crash - data durability. RAM loses data after a crash.
This is an append-only file to which every DB modification must be written before it can be committed to the DB. When the database comes back up after a crash, this log is used to restore the DB back to a consistent state.
The write-ahead log lives entirely on disk, so it’s durable; we can replay the records in the WAL to repopulate the data.
Atomicity - revert partial writes on errors.
Durability and consistency - replay the logs and rebuild the state.
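The append-before-apply and replay-on-startup mechanics can be sketched in a few lines of Python. This is a toy, not how any real database implements its WAL; the file path and `key=value` record format are made up for the demo.

```python
import os

WAL_PATH = "/tmp/tiny.wal"            # hypothetical path for this demo
if os.path.exists(WAL_PATH):
    os.remove(WAL_PATH)               # start from an empty log

class TinyKV:
    """Toy key-value store: every write hits an append-only WAL first."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.data = {}                # in-memory state, lost on crash
        self._replay()                # rebuild state from the WAL on startup

    def _replay(self):
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path) as f:
            for line in f:
                key, value = line.rstrip("\n").split("=", 1)
                self.data[key] = value

    def put(self, key, value):
        # Write-ahead: append to the durable log before mutating memory.
        with open(self.wal_path, "a") as f:
            f.write(f"{key}={value}\n")
            f.flush()
            os.fsync(f.fileno())      # force the record onto disk
        self.data[key] = value

db = TinyKV(WAL_PATH)
db.put("balance", "100")
db.put("balance", "250")
# "Crash": drop the process's memory, then replay the WAL in a new instance.
recovered = TinyKV(WAL_PATH)
```

Because every `put` is fsynced before the in-memory map changes, the `recovered` instance rebuilds `balance = 250` purely from the log.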
———————
In database systems that use Write-Ahead Logging (WAL) and row-level locking, exactly when writes are recorded in the WAL relative to lock acquisition varies by implementation and configuration, but common patterns can be generalized.
In summary, the write to the WAL occurs after the lock on the row is successfully acquired but before the transaction is committed. This sequence ensures that all changes logged in the WAL can be applied atomically and durably, maintaining the integrity and consistency of the database even in the face of system failures or crashes.
What is the read and write time complexity of hash indexes? Why are hash indexes only stored in memory? Why are range queries expensive for hash indexes?
Redis for key-value retrieval.
Riak for key-document retrieval.
PostgreSQL, MySQL for exact match queries.
Cassandra uses it to map data to nodes via the hash value of partition keys.
Basically hash map.
O(1) write and read with a hash function, because RAM supports O(1) random access by direct addressing.
Optimized for point queries, where we need to find rows with specific keys.
Subject to the requirement that the whole index fits into memory. RAM is expensive, limited, and not durable.
Hash keys are randomly and evenly distributed and inherently lack order. You need to scan the entire hash table to find all keys that fall into a range.
Hash indexes are not good for hard disks: data blocks end up randomly distributed on disk, so they lack spatial locality - related data is not stored close together, and the disk arm needs to jump around.
Rehashing and redistribution lead to disk I/O overhead.
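The point-lookup vs. range-query asymmetry is easy to see with a plain dict standing in for the hash index (a sketch; keys and offsets here are invented):

```python
# Hash index sketch: a dict from key to the record's byte offset in the
# data file. Point lookups are O(1); a range query has no order to
# exploit, so it must scan every key in the table.
index = {}
for key, offset in [("user:17", 0), ("user:3", 64), ("user:42", 128)]:
    index[key] = offset                      # O(1) insert

point = index["user:3"]                      # O(1) point lookup

# Range query for keys between "user:1" and "user:2": full scan required.
in_range = sorted(k for k in index if "user:1" <= k <= "user:2")
```

With a sorted structure (B-tree, SSTable) the range query would touch only the matching keys; here it always costs O(n).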
What is the branching factor in a B-tree and how does it affect the time complexity of a B-tree?
MySQL, PostgreSQL, Oracle, MongoDB.
The number of children a node can have. Typically it is several hundred, depending on the space needed to store page pointers and range boundaries. Most databases fit into a B-tree of 3 or 4 levels. A 4-level tree of 4 KB pages with a branching factor of 500 can store up to 256 TB.
The higher the branching factor, the shorter the tree, which improves read and write efficiency.
However, a large node can increase memory overhead, because some keys that act as separators may be kept in memory to guide lookups to the correct leaf node.
DDIA 81
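The 256 TB figure can be sanity-checked with quick arithmetic (a back-of-the-envelope sketch; real B-trees also spend page space on headers and rarely fill pages completely):

```python
import math

branching_factor = 500
page_size = 4 * 1024                  # 4 KB pages
levels = 4

# Each level multiplies the number of reachable pages by the fan-out,
# so a tree of `levels` levels addresses branching_factor ** levels pages.
leaf_pages = branching_factor ** levels
capacity_bytes = leaf_pages * page_size       # 2.56e14 bytes = 256 TB

# Height grows as log_b(n): a larger branching factor means a shallower
# tree and fewer page reads per lookup.
height_for_billion = math.ceil(math.log(10**9, branching_factor))
```

So a billion keys still fit within 4 levels, which is why most B-tree lookups touch only 3 or 4 pages.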
Why do we still need write-ahead logs for B-tree indexes?
DDIA 82.
Some operations require several pages to be overwritten.
If the node crashes when page splits are only partially done, you end up with a corrupted index.
How do we delete a record from log-structured file segments on disk (SSTables or simple unsorted segments)?
Append a special deletion record, sometimes called a tombstone.
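The tombstone mechanic can be sketched with a list standing in for a segment file (toy code; the sentinel and key names are invented):

```python
# Tombstone sketch: an append-only segment can't be modified in place, so
# a delete is just another appended record. On read, the latest record
# for a key wins; compaction eventually drops the key for good.
TOMBSTONE = object()                       # sentinel meaning "deleted"

segment = []                               # append-only (key, value) log
segment.append(("cart:9", "apples"))
segment.append(("cart:9", "oranges"))      # update = another append
segment.append(("cart:9", TOMBSTONE))      # delete = append a tombstone

def get(segment, key):
    value = None
    for k, v in segment:                   # last write for the key wins
        if k == key:
            value = v
    return None if value is TOMBSTONE else value
```

Reading the first two records alone would return "oranges"; once the tombstone is appended, the key reads as absent even though older values are still physically in the segment.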
LSM tree + SST
Advantages over hash index + log-structured files.
The paper that proposed this idea came out in 1996.
Apache Cassandra - minimize disk I/O and maximize write throughput.
RocksDB, LevelDB, Google Bigtable
The write buffer of an LSM tree is in memory, same as hash indexes; it is called a memtable and can be either a red-black tree or an AVL tree.
In-order traversal and write to disk as a Sorted String Table (SSTable) when it gets full (a few MB). Every time we flush a memtable to an SSTable, we can discard the corresponding WAL.
In Lucene, the indexing engine, the mapping from term to postings list is kept in SSTable-like sorted files, which are merged in the background as needed.
DDIA 75
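The flush step can be sketched in Python (a toy: a real memtable is a balanced tree, and real SSTables use a binary format; sorting a dict at flush time stands in for the in-order traversal, and the file path and JSON-lines layout are invented):

```python
import json

SSTABLE_PATH = "/tmp/sstable-0.jsonl"      # hypothetical segment file

memtable = {}
for key, value in [("banana", 2), ("apple", 5), ("cherry", 1)]:
    memtable[key] = value                  # fast in-memory writes

def flush(memtable, path):
    # Emit all keys in sorted order, producing a Sorted String Table.
    with open(path, "w") as f:
        for key in sorted(memtable):
            f.write(json.dumps({"k": key, "v": memtable[key]}) + "\n")
    # Once the segment is safely on disk, the WAL covering this memtable
    # can be discarded and the memtable reset.
    memtable.clear()

flush(memtable, SSTABLE_PATH)
segment_keys = [json.loads(line)["k"] for line in open(SSTABLE_PATH)]
```

Because every segment is sorted by key, later merging/compaction is a simple merge of sorted sequences, and range queries become sequential reads.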
LSM tree and SSTable optimization
Sparse index - stores the offsets of some keys in the log segment.
Bloom filters - tell which keys do not appear in the database. They never give a false negative: if the filter says a key does not exist, it definitely does not exist.
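A minimal Bloom filter fits in a few lines (a sketch: the size, hash count, and use of SHA-256 as a stand-in hash family are choices made for this demo, not what any particular database uses):

```python
import hashlib

# Bloom filter sketch: K hash functions each set one bit per added key.
# If any of a key's bits is unset, the key was definitely never added
# (no false negatives); if all are set, the key is only *probably*
# present (false positives are possible).
M, K = 1024, 3                             # bits in the filter, hash count

def _bit_positions(key):
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % M

class BloomFilter:
    def __init__(self):
        self.bits = [False] * M

    def add(self, key):
        for pos in _bit_positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in _bit_positions(key))

bf = BloomFilter()
bf.add("user:42")
bf.might_contain("user:42")    # True - added keys are always found
bf.might_contain("user:999")   # almost certainly False with one key added
```

An LSM-tree read path consults the filter first and skips an SSTable entirely when the filter rules the key out, saving disk reads for missing keys.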
Why is LSM tree + SSTable a middle ground between hash indexes and B-trees?
Not bounded by memory like hash indexes; can do the range queries that hash indexes are bad at.
Better write performance than B-trees, while read performance is not as good as B-trees.
It requires extra CPU bandwidth for compaction.
What is an address index? Why do we need it sometimes?
The index maps keys to disk addresses, like the sparse index we discussed for SSTables.
It’s a type of non-clustered index. We don’t store the entire row or duplicate data when we have indexes on different columns, minimizing the data we store.
What is multi-dimensional index and why do we need it?
Like a combo index; also called a composite index.
Example: geospatial dataset, latitude and longitude.
For example, PostGIS implements geospatial indexes as R-trees using PostgreSQL’s Generalized Search Tree indexing facility.
An interesting idea is that multi-dimensional indexes are not just for geographic locations. For example, on an ecommerce website you could use a three-dimensional index on the dimensions (red, green, blue) to search for products in a certain range of colors, or in a database of weather observations you could have a two-dimensional index on (date, temperature) in order to efficiently search for all the observations during the year 2013 where the temperature was between 25 and 30℃. With a one-dimensional index, you would have to either scan over all the records from 2013 (regardless of temperature) and then filter them by temperature, or vice versa. A 2D index can narrow down by timestamp and temperature simultaneously. This technique is used by HyperDex.
DDIA 87
What is atomicity
All-or-nothing principle. All writes in a transaction succeed, or none of them do.
Example - exchange of money
DDIA 223
What is database consistency
If a transaction fails, it fails gracefully; no invariants should be broken - foreign key constraints, unique constraints, and check constraints.
Check constraint - specific rules beyond basic data type constraints; business rules within the database schema.
For example, employee hiring date always greater than date of birth.
In accounting, credits and debits across all accounts should be balanced.
Atomicity focuses on transactions executing as indivisible units;
consistency focuses mainly on maintaining the integrity of the database in line with its invariants, rules, and constraints.
The application relies on the database’s atomicity and isolation properties in order to achieve consistency, but it’s not up to the database alone.
DDIA 225
What is isolation
No race conditions.
Concurrently executing transactions are isolated from each other.
DDIA 225
What is Durability
No data lost.
Once a transaction has committed successfully, any data it has written will not be forgotten, even if there is a hardware fault or the database crashes.
DDIA 226
What are dirty writes? Give an example.
Overwriting uncommitted data.
Two transactions concurrently try to update the same object in a database.
If the earlier write is part of a transaction that has not yet committed and the later write overwrites that uncommitted value, it is a dirty write.
DDIA 235
What is dirty read?
A transaction has not yet committed or aborted, and another transaction sees the uncommitted data.
DDIA 234
—————————
Dirty reads are problematic because they can lead to inconsistencies, unreliability, and potential data corruption in a database. When a transaction reads data that has been modified but not yet committed by another transaction, it risks basing its operations on temporary, potentially invalid data. This can result in several issues:
Scenario: Bank Account Transfers
Setup: both accounts start at $1000; transaction T1 transfers $500 from checking to savings.
Problem:
- Dirty Read: Transaction T2 reads the balance after the deduction but before the addition.
- Checking account balance: $500 (after deduction).
- Savings account balance: $1000 (before addition).
- Total balance seen by the user: $500 + $1000 = $1500.
- T1 is rolled back due to an error.
- Actual balance should remain $1000 + $1000 = $2000.
- User sees an incorrect total balance of $1500, leading to confusion and possible incorrect financial decisions.
Scenario: E-commerce Inventory System
Setup: inventory starts at 10 units; transaction T1 records a sale of one unit.
Problem:
- Dirty Read: Transaction T2 reads the inventory level after the reduction but before the transaction is committed.
- Inventory before sale: 10 units.
- Inventory after uncommitted reduction: 9 units.
- T1 fails and rolls back, keeping the inventory at 10 units.
- T2 reports 9 units available.
- Customer places an order based on incorrect inventory information, potentially leading to overselling and customer dissatisfaction.
Scenario: Real-Time Data Analytics
Setup: transaction T1 inserts 10 new sensor readings while an aggregation job runs.
Problem:
- Dirty Read: Transaction T2 reads the aggregated data that includes uncommitted sensor data.
- Sensor data before aggregation: 50 readings.
- Aggregated result after uncommitted inserts: 60 readings.
- T1 fails, and the uncommitted sensor data is rolled back.
- Report generated shows incorrect aggregated data, leading to inaccurate analytics and potentially flawed business decisions.
To avoid dirty reads, databases can employ higher isolation levels that prevent transactions from reading uncommitted data. Isolation levels that prevent dirty reads include read committed, repeatable read (snapshot isolation), and serializable.
Dirty reads can lead to various issues, including data inconsistencies, unreliable applications, cascading rollbacks, and potential data corruption. By using appropriate isolation levels, databases can prevent dirty reads and ensure data integrity and consistency.
What is the downside of using the same row-level locks to prevent dirty reads? What is the common approach used in practice?
A long-running write transaction can force read-only transactions to wait until it has completed.
The commonly used approach is that databases remember both the old committed value and the new value set by the transaction that holds the lock. Read requests get the old value, so writers do not block readers.
DDIA 237
How do we deal with deadlocks on row-level locks?
The database automatically detects deadlocks and aborts one of the transactions so that the others can make progress. The aborted transaction needs to be retried by the application.
DDIA 258
What does it mean that a database implements read committed isolation?
A DB that prevents both dirty writes and dirty reads.
DDIA 236
What is read skew? Give an example
Also called non-repeatable read.
We are reading committed data, but a write executes in the middle of a large read, so some invariant no longer holds across what we read.
Example: a long-running analytical read.
DDIA 238
How do we resolve a read skew problem?
Snapshot isolation - DDIA 238
Assign a monotonically increasing transaction ID to each write. Keep the overwritten data, tagged with the transaction ID that wrote it.
A transaction sees all the data that was committed at the start of the transaction; writes by later transactions are not visible to it.
A key principle: writers never block readers, and vice versa.
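The versioning idea behind snapshot isolation (MVCC) can be sketched in Python. This is a toy model, not any real database's implementation: visibility here is simply "transactions that had committed when the reader took its snapshot".

```python
# MVCC sketch: keep every version of a value tagged with the writing
# transaction's id. Each reader captures the set of committed transactions
# at snapshot time; writers append new versions and never block readers.
versions = {}                 # key -> append-only list of (txid, value)
committed = set()

def write(txid, key, value):
    versions.setdefault(key, []).append((txid, value))

def snapshot():
    return set(committed)     # transactions visible to this reader

def read(key, snap):
    visible = [v for t, v in versions.get(key, []) if t in snap]
    return visible[-1] if visible else None   # newest visible version

write(1, "balance", 100); committed.add(1)
snap = snapshot()             # reader takes a consistent snapshot
write(2, "balance", 250)      # concurrent writer, not yet committed
v1 = read("balance", snap)    # 100 - the uncommitted write is invisible
committed.add(2)
v2 = read("balance", snap)    # still 100 - committed after our snapshot
```

A new snapshot taken after transaction 2 commits would see 250, while the long-running reader keeps its consistent view of 100 throughout.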
What are lost updates?
Example: incrementing a counter.
Caused by interleaved read-modify-write cycles.
Fix: atomic write operations.
DDIA 243
———————
A lost update occurs when two transactions read the same data and then both attempt to update it. The changes made by the first transaction are overwritten by the second transaction, resulting in a loss of the first update. This anomaly typically happens in a scenario where two transactions are trying to write to the same data item without proper synchronization.
Lost updates are not inherently prevented by read committed isolation; they are handled by repeatable read isolation with two-phase locking, or by atomic write operations.
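The anomaly and its fix can be shown with a counter (a sketch: the lock-protected increment stands in for a database's atomic `UPDATE ... SET n = n + 1`; the interleaving is staged by hand to make the lost write deterministic):

```python
import threading

# Lost update: two interleaved read-modify-write cycles on a counter lose
# one increment; doing the cycle atomically keeps both.
counter = {"value": 0}

# Interleaving that loses an update:
t1_read = counter["value"]        # T1 reads 0
t2_read = counter["value"]        # T2 reads 0
counter["value"] = t1_read + 1    # T1 writes 1
counter["value"] = t2_read + 1    # T2 writes 1 - T1's update is lost
after_race = counter["value"]     # 1, not the expected 2

# Atomic version: read-modify-write as one indivisible step.
lock = threading.Lock()
counter["value"] = 0

def atomic_increment():
    with lock:
        counter["value"] += 1

threads = [threading.Thread(target=atomic_increment) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
after_atomic = counter["value"]   # 100 - no updates lost
```

The unsynchronized version ends at 1 because T2's write clobbers T1's; under the lock, all 100 increments survive.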