Data Modeling & Warehouse Architectures Flashcards

(45 cards)

1
Q

What is the primary goal of data modeling in analytics and warehousing?

A

To structure data so that it is easy, efficient, and unambiguous to query for reporting, analysis, and downstream applications.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

How does data modeling in analytics differ from OLTP modeling?

A

Analytics modeling prioritizes read performance and simplicity for aggregations and slicing, while OLTP modeling prioritizes write performance and transaction integrity.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is a star schema?

A

A schema with a central fact table joined to multiple dimension tables, forming a star-like pattern.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is a snowflake schema?

A

A star schema variant where dimensions are further normalized into additional related tables, creating more joins and a snowflake-like structure.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Why are star schemas popular in data warehouses?

A

They provide a clear separation between facts and dimensions, simplify queries, and support efficient aggregations and BI tools.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is a fact table?

A

A table that stores business events or measurements, typically with foreign keys to dimensions and numeric metrics to aggregate.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is a dimension table?

A

A table that stores descriptive attributes about business entities such as customers, products, or dates.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What does the grain of a fact table mean?

A

The level of detail represented by each row, such as one row per order line, per daily metric per customer, or per event.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Why must the grain of a fact table be defined explicitly?

A

To avoid ambiguity, ensure measures are consistent, and design dimensions and keys that align with that level of detail.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is a transactional fact table?

A

A fact table that records individual business transactions or events, such as orders or clicks.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is a snapshot fact table?

A

A fact table that captures the state of a process or entity at a regular point in time, such as daily account balances.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is an accumulating snapshot fact table?

A

A fact table that tracks the lifecycle of a process with multiple milestones, updating the row as the process advances.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are conformed dimensions?

A

Dimensions that are shared consistently across multiple fact tables or data marts, using the same keys and semantics.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Why are conformed dimensions important?

A

They enable consistent reporting and cross-fact analysis across different subject areas, such as sales and marketing.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is a junk dimension?

A

A dimension that groups low-cardinality flags and attributes that do not naturally belong in other dimensions to avoid cluttering fact tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is a role-playing dimension?

A

A single physical dimension table used in multiple roles in a fact table, such as a Date dimension serving order_date, ship_date, and invoice_date.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What is a degenerate dimension?

A

A dimension attribute, such as an order number, stored directly in the fact table without a separate dimension table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What is a slowly changing dimension (SCD)?

A

A dimension table where attribute values change over time and must be tracked with specific patterns.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What is SCD Type 1?

A

A pattern that overwrites old attribute values with new ones, not preserving history.

20
Q

What is SCD Type 2?

A

A pattern that preserves history by adding new rows with new surrogate keys and effective dates or versioning.

21
Q

When is SCD Type 1 appropriate?

A

When only the current value matters and historical attribute values are not needed for analysis.

22
Q

When is SCD Type 2 appropriate?

A

When historical analysis requires knowing what attribute values were at the time of past events.

23
Q

What is a surrogate key in dimensional modeling?

A

An artificial key, typically an integer, that uniquely identifies dimension rows and is used as the primary key.

24
Q

Why are surrogate keys preferred over natural business keys in dimensions?

A

They are stable, independent of source systems, and support SCD Type 2 and other modeling needs cleanly.

25
What is a bridge table?
A table used to resolve many-to-many relationships between facts and dimensions, often for multi-valued attributes.
26
What is a data mart?
A subject-oriented subset of the data warehouse tailored to specific analytical areas such as finance or marketing.
27
What is the Kimball approach to data warehousing?
A bottom-up methodology that emphasizes dimensional modeling, conformed dimensions, and subject-area data marts feeding a bus architecture.
28
What is the Inmon approach to data warehousing?
A top-down methodology focusing on a centralized, normalized enterprise data warehouse from which data marts are derived.
29
What is a data vault model?
A modeling technique that separates hubs (business keys), links (relationships), and satellites (descriptive attributes) for flexible, audit-friendly warehouses.
30
When might a data vault be preferred?
When auditability, traceability, and flexibility in integrating many source systems are more important than simple BI models.
31
What is a common pattern for integrating data vault with dimensional models?
Use data vault as the raw, auditable storage layer and build dimensional star schemas on top for reporting and BI.
32
What is the medallion (bronze/silver/gold) architecture?
A layered approach where bronze holds raw data, silver holds cleaned and standardized data, and gold holds curated, business-ready tables.
33
How does the medallion architecture relate to modeling?
Bronze layers are minimally modeled, silver layers capture core entities and relationships, and gold layers implement dimensional or application-specific models.
34
What is a semantic layer in analytics?
An abstraction that defines business metrics, dimensions, and relationships in business terms, independent of physical storage details.
35
Why is a semantic layer useful?
It provides consistent definitions for metrics and dimensions across tools and teams, reducing ambiguity and duplication.
36
What is a factless fact table?
A fact table without numeric measures, used to record the occurrence of events or coverage relationships for counting and existence checks.
37
What is a grain mismatch problem between fact tables?
A situation where fact tables have different levels of detail that do not align, complicating joins and aggregations across them.
38
How can grain mismatches be mitigated?
By designing fact tables with clear, compatible grains, or by creating derived aggregate tables to align grains for common analyses.
39
What is an aggregate table in a warehouse?
A pre-aggregated table at a coarser grain than the base fact, used to speed up common queries.
40
Why must aggregate tables be kept consistent with base facts?
Inconsistencies lead to conflicting results, so pipelines and refresh logic must keep aggregates in sync.
41
What is referential integrity in a dimensional model?
The property that every foreign key in a fact table matches a valid primary key in a dimension table, often enforced logically or via checks.
42
Why is referential integrity important for analytics?
Broken references cause orphaned facts, missing dimension attributes, and misleading aggregates.
43
What is a data model diagram used for?
To visualize tables, keys, relationships, and cardinalities, helping stakeholders understand how data is structured and connected.
44
Why should data models be versioned and documented?
Models evolve over time; documentation and versioning help teams understand changes, avoid breaking consumers, and comply with governance.
45
What is a good one-sentence mental model for warehouse data modeling?
Choose a clear grain for each fact, attach well-designed conformed dimensions, and structure schemas so business questions can be answered with simple, performant queries.