What is used as the primary key of a table in DynamoDB?
The partition key is part of the primary key of a table. It is required.
Optionally, the sort key can also be part of the primary key. This is equivalent to a multiple column primary key in MySQL (composite primary key)
What is the max size of a row (item) in a DynamoDB table?
400 kilobytes, which is less than 1 megabyte.
When should you have a sort key?
How do you find the node (or cluster) that a primary key belongs to?
Apply a consistent hashing algorithm to find the appropriate node/cluster.
Describe the indexes that can be used with DynamoDB
Global Secondary Index. It has a partition key (and optional sort key). The partition key should be different than the partition key of the main table. This is analogous to a secondary index in MySQL.
Local Secondary Index - has same partition key as main table, but a different sort key.
LSIs can only query a single partition. Also, LSIs have strongly consistent reads. LSIs share capacity with the main table, so they might experience throttling.
GSIs are more flexible. Any LSI can be modeled as a GSI.
GSIs can have 20 indexes, LSIs only 5.
Give example usage of GSI for a chat app. Main table has (partition key, sort key) =
(chat_id, timestamp)
You could have the the following GSI if you want to view all chat messages for a specific user.
(user_id, timestamp)
This way you can easily query for all chat messages for user_id = 123. and the results can be sorted by timestamp.
What are the primary ways of accessing data in DynamoDB?
What does this Mysql query look like in DynamoDB?
SELECT * FROM users WHERE user_id = 101
Node.js (yuck)
const params = {
TableName: ‘users’,
KeyConditionExpression: ‘user_id = :id’,
ExpressionAttributeValues: {
‘:id’: 101
}
};
dynamodb.query(params, (err, data) => {
if (err) console.error(err);
else console.log(data);
});
What is the CAP theorem?
In a distributed datastore, you can only choose 2 of the following:
1. Consistency - every read receives data from the most recent write
What is strongly consistent versus eventually consistent for a distributed DB? By default, is DynamoDB strongly or eventually consistent?
Strongly Consistent - every read operation always returns the most recent write.
In a distributed DB, this means all nodes must sync immediately for each write. This takes time. It also involves a consensus algorithm, locking. If there is a network partition (huh?), the system might sacrifice availability
Eventual Consistency - all replicas of that data will eventually converge to the same value. Reads might be stale or return inconsistent data.
When a write occurs, data is propagated to other nodes asynchronously. This means lower latency for reads/writes. It also means higher availability.
DynamoDB is eventually consistent by default.
What is a network partition (in the context of CAP theorem)?
A network partition is a temporary interruption in a distributed system’s network that prevents nodes from communicating with each other. This can happen due to network failures or disruptions, and it can divide the network into separate subnetworks, or partitions, that are unaware of each other’s existence
What kinds of applications need strong consistency?
What kinds can tolerate eventual consistency?
Banking, finance, inventory should have strong consistency
Social media, content delivery, gaming,
How does DynamoDB maintain consistency under the hood?