What are two design choices made by SQL databases that impact their scalability?
Which database will offer faster writes, Cassandra or MongoDB?
Write in MongoDB needs to update applicable indexes whereas its not required for Cassandra.
What are the challenges of sharding data while fetching data?
Note: Mongo does local sorting on each shard followed by merge sort in the primary shard.
Cassandra doesn’t allow queries where the partition is not filtered (not always??): TBD
You need to serve both real-time queries and analytical queries. How will you approach it?
In MongoDB, analytical queries can be served from replicas with indexes tuned for analytical queries. No need to use the real-time indexes on the replica.
In Cassandra, we need to duplicate data with a different Primary key.
Is the process of creating a sharding key in Mongo equivalent to creating a Partition or Primary key in Cassandra?
Partition Key.
What are the rules to create a good partition or sharding key?
High uniqueness so that data is evenly distributed.
Tactic: Make it a compound key with a monotonically increasing key (e.g. counter or ts) at the end of the compound key. This additional key not part of the actual data is called a surrogate key.
What are the two database-neutral steps while doing data modeling?
Conceptual data model (know your data, entities, and relationships)
Application Workflow & Access Patterns (know your queries, frequency & freshness)
List four different bugs which can come due to the parallelization of transactions (w/o isolation i.e. w/o serializing them).
What are different isolation levels defined by the SQL standard?
READ UNCOMMITTED (All anomalies, No Lock)
READ COMMITTED
REPEATABLE READ* (Lock on Row)
SERIALIZABLE (No anomaly, Lock on table)
Which database will offer faster reads, Cassandra or MongoDB?
Data in Cassandra, at any given time, can be spread across multiple SSL tables in disk and memtables in memory. For reading, we need to read and consolidate tables from multiple tables whereas we don’t have such a requirement for Mongo. Hence, MongoDB is supposed to be faster for read operations.