What is the primary purpose of an OLTP database?
To support high-concurrency, low-latency transactional workloads such as inserts, updates, and small, selective queries.
What is the primary purpose of an OLAP database or data warehouse?
To support analytical workloads that scan and aggregate large volumes of data for reporting and decision-making.
How do OLTP and OLAP systems typically differ in schema design?
OLTP systems are usually highly normalized to avoid redundancy; OLAP systems are often denormalized into star or snowflake schemas for fast reads.
What is normalization in relational databases?
The process of structuring tables to reduce redundancy and dependency anomalies by dividing data into related tables.
Why is normalization beneficial for OLTP systems?
It reduces duplication, simplifies updates, and helps maintain data integrity across many small transactions.
Why is denormalization common in data warehouses?
It reduces the number of joins needed for analytical queries, trading some redundancy for simpler and faster reads.
What is a relational database management system (RDBMS)?
A database system that stores data in tables with rows and columns and uses SQL and relational principles for querying and integrity.
What are common examples of OLTP RDBMS engines?
MySQL, PostgreSQL, SQL Server, and Oracle Database.
What is a distributed MPP data warehouse?
A massively parallel processing system where data is distributed across nodes and queries are executed in parallel for scalability.
What are common examples of data warehouse engines?
Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse, and distributed columnar systems like ClickHouse.
What is a primary key in a relational table?
A column or set of columns that uniquely identifies each row and is used to enforce uniqueness and relationships.
What is a foreign key?
A column in one table that references a primary key in another table, representing a relationship between the tables.
What is an index in a database?
A data structure that accelerates lookups and joins on specific columns at the cost of extra storage and write overhead.
Why should indexes be used carefully in write-heavy systems?
Indexes speed up reads but slow down inserts, updates, and deletes because they must be maintained with each write.
What is a clustered index conceptually?
An index that defines the physical order of rows in the table, often on the primary key in many systems.
What is a nonclustered index conceptually?
A separate structure that references table rows without changing their physical order, used to speed up queries on other columns.
What is a transaction in database systems?
A unit of work that is executed with ACID properties: atomicity, consistency, isolation, and durability.
What does isolation mean in the context of transactions?
The degree to which concurrent transactions are prevented from interfering with each other’s intermediate changes.
What is eventual consistency?
A model where all replicas will converge to the same value over time, but reads may temporarily see stale data after writes.
Why is eventual consistency common in distributed NoSQL systems?
It allows higher availability and partition tolerance by relaxing strict transactional guarantees.
What are the main categories of NoSQL databases?
Key-value stores, document stores, column-family stores, and graph databases.
When are key-value or document stores typically preferred over relational databases?
When flexible schemas, high write throughput, or low-latency simple access patterns are more important than complex joins and transactions.
Why are relational warehouses usually columnar?
Columnar storage matches analytical access patterns, enabling efficient scans, compression, and aggregation on selected columns.
What is a star schema in data warehousing?
A schema with a central fact table joined to multiple dimension tables, forming a star-like structure.