What is the primary goal of data modeling in analytics and warehousing?
To structure data so that it is easy, efficient, and unambiguous to query for reporting, analysis, and downstream applications.
How does data modeling in analytics differ from OLTP modeling?
Analytics modeling prioritizes read performance and simplicity for aggregations and slicing, while OLTP modeling prioritizes write performance and transaction integrity.
What is a star schema?
A schema with a central fact table joined to multiple dimension tables, forming a star-like pattern.
What is a snowflake schema?
A star schema variant where dimensions are further normalized into additional related tables, creating more joins and a snowflake-like structure.
Why are star schemas popular in data warehouses?
They provide a clear separation between facts and dimensions, simplify queries, and support efficient aggregations and BI tools.
What is a fact table?
A table that stores business events or measurements, typically with foreign keys to dimensions and numeric metrics to aggregate.
What is a dimension table?
A table that stores descriptive attributes about business entities such as customers, products, or dates.
What does the grain of a fact table mean?
The level of detail represented by each row, such as one row per order line, per daily metric per customer, or per event.
Why must the grain of a fact table be defined explicitly?
To avoid ambiguity, ensure measures are consistent, and design dimensions and keys that align with that level of detail.
What is a transactional fact table?
A fact table that records individual business transactions or events, such as orders or clicks.
What is a snapshot fact table?
A fact table that captures the state of a process or entity at a regular point in time, such as daily account balances.
What is an accumulating snapshot fact table?
A fact table that tracks the lifecycle of a process with multiple milestones, updating the row as the process advances.
What are conformed dimensions?
Dimensions that are shared consistently across multiple fact tables or data marts, using the same keys and semantics.
Why are conformed dimensions important?
They enable consistent reporting and cross-fact analysis across different subject areas, such as sales and marketing.
What is a junk dimension?
A dimension that groups low-cardinality flags and attributes that do not naturally belong in other dimensions to avoid cluttering fact tables.
What is a role-playing dimension?
A single physical dimension table used in multiple roles in a fact table, such as a Date dimension serving order_date, ship_date, and invoice_date.
What is a degenerate dimension?
A dimension attribute, such as an order number, stored directly in the fact table without a separate dimension table.
What is a slowly changing dimension (SCD)?
A dimension table where attribute values change over time and must be tracked with specific patterns.
What is SCD Type 1?
A pattern that overwrites old attribute values with new ones, not preserving history.
What is SCD Type 2?
A pattern that preserves history by adding new rows with new surrogate keys and effective dates or versioning.
When is SCD Type 1 appropriate?
When only the current value matters and historical attribute values are not needed for analysis.
When is SCD Type 2 appropriate?
When historical analysis requires knowing what attribute values were at the time of past events.
What is a surrogate key in dimensional modeling?
An artificial key, typically an integer, that uniquely identifies dimension rows and is used as the primary key.
Why are surrogate keys preferred over natural business keys in dimensions?
They are stable, independent of source systems, and support SCD Type 2 and other modeling needs cleanly.