PowerBI Flashcards

(91 cards)

1
Q

🧠 LEVEL 1 β€” POWER BI FOUNDATIONS (DETAILED MASTER CARDS)
πŸŸ₯ CORE ARCHITECTURE
πŸƒ Card 1 β€” Purpose of Power BI

Q: What fundamental problem does Power BI solve in the data ecosystem, and how does it differ from traditional reporting tools?

A

A:

Power BI solves the problem of self-service business intelligence:

Traditional reporting:

Static reports

IT-controlled

Slow updates

Limited interactivity

Power BI enables:

Interactive exploration

Real-time slicing/filtering

Integration of multiple sources

Democratized analytics

It shifts organizations from report consumption β†’ data exploration.

Visual cue: Static PDF report vs interactive dashboard

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

πŸƒ Card 2 β€” End-to-End Workflow

Q: What is the full lifecycle of data inside Power BI from source to user interaction?

A

A:

Data acquisition (connect to sources)

Transformation (Power Query)

Loading into model (VertiPaq engine)

Modeling (relationships, measures)

Visualization (reports)

Publishing (Service)

Consumption (dashboards/apps/mobile)

Key insight:

Power BI is not just visualization β€” it is a complete BI pipeline.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

πŸƒ Card 3 β€” Why Power BI Desktop Is Central

Q: Why is Power BI Desktop considered the core authoring environment rather than the Service?

A

A:

Because Desktop contains:

Full Power Query engine

Data modeling tools

DAX authoring

Report creation

The Service is primarily for:

Distribution

Collaboration

Governance

Think:

Desktop = Development
Service = Deployment

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

πŸŸ₯ DATASET / REPORT / DASHBOARD β€” DEEP
πŸƒ Card 4 β€” Dataset (Semantic Model)

Q: What is a dataset in Power BI from a semantic modeling perspective?

A

A:

A dataset is a semantic layer that translates raw data into business meaning.

It contains:

Tables

Relationships

Measures

Hierarchies

Metadata

It defines how users interpret data.

Important insight:

Reports do not analyze raw data β€” they query the semantic model.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

πŸƒ Card 5 β€” Report Mechanics

Q: How does a report interact with the dataset at query time?

A

Q: How does a report interact with the dataset at query time?

A:

Each visual generates a query against the dataset using filter context.

User interactions:

Slicers

Cross-filtering

Drill-down

modify the query dynamically.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

πŸƒ Card 6 β€” Dashboard Limitations

Q: Why are dashboards only available in the Service and not Desktop?

A

A:

Because dashboards are:

Aggregations of visuals from multiple reports

Designed for monitoring

Not for detailed analysis

They rely on cloud-based sharing features.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

πŸŸ₯ STORAGE MODES β€” INTERNAL LOGIC
πŸƒ Card 7 β€” Import Mode Internals

Q: Why is Import mode typically faster than DirectQuery?

A

A:

Because data is stored in-memory using the VertiPaq columnar engine.

Benefits:

Compressed storage

Vectorized scans

No network latency

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

πŸƒ Card 8 β€” DirectQuery Trade-offs

Q: What are the architectural trade-offs of DirectQuery?

A

A:

Advantages:

Real-time data

No duplication

Disadvantages:

Slower performance

Limited DAX functions

Dependency on source database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

πŸƒ Card 9 β€” Composite Models

Q: What problem do composite models solve?

A

A:

They allow combining:

Import tables

DirectQuery tables

in one model.

Useful when:

Some data needs real-time access

Other data benefits from in-memory speed

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

πŸŸ₯ POWER QUERY β€” CONCEPTUAL FOUNDATION
πŸƒ Card 10 β€” Role Separation

Q: Why should data cleaning be done in Power Query instead of DAX?

A

A:

Power Query operates before data enters the model, making transformations:

Reusable

Efficient

Applied once

DAX transformations happen at query time β†’ expensive.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

πŸƒ Card 11 β€” Immutable Pipeline Concept

Q: Why are Power Query steps considered an immutable transformation pipeline?

A

A:

Each step produces a new table without altering previous steps.

Benefits:

Traceability

Reproducibility

Debugging

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

πŸŸ₯ DATA MODEL β€” WHY IT EXISTS
πŸƒ Card 12 β€” Need for Modeling Layer

Q: Why doesn’t Power BI simply visualize tables directly without modeling?

A

A:

Because raw tables:

Lack relationships

Lack business logic

Cause ambiguous aggregations

Modeling creates a coherent analytical structure.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

πŸƒ Card 13 β€” Semantic vs Physical Model

Q: What is the difference between the semantic model and the physical data source?

A

A:

Physical:

Raw tables in databases/files

Semantic:

Business interpretation layer:

Relationships

Measures

Hierarchies

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

πŸŸ₯ VISUAL INTERACTION MECHANICS
πŸƒ Card 14 β€” Cross-Filtering Logic

Q: How does selecting one visual affect others on a report page?

A

A:

It modifies filter context applied to the dataset.

Other visuals recompute results based on new context.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

πŸƒ Card 15 β€” Drill-Down Concept

Q: What analytical problem does drill-down solve?

A

A:

Navigating between aggregation levels:

Year β†’ Quarter β†’ Month β†’ Day

Supports hierarchical analysis.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

🧠 LEVEL 2 β€” DATA MODELING MASTERY (ELITE CARDS)
πŸŸ₯ WHY DATA MODELING MATTERS
πŸƒ Card 1 β€” Modeling vs Visualization

Q: Why is data modeling considered more important than visualization in Power BI?

A

A:

Because the model determines:

Correctness of aggregations

Query performance

Filter behavior

DAX complexity

Poor model β†’ impossible calculations
Good model β†’ simple measures

Expert insight:
Visualization errors often originate in model design.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

πŸƒ Card 2 β€” Analytical Engine Dependency

Q: Why does the VertiPaq engine rely heavily on model structure?

A

A:

VertiPaq scans columns independently.

Efficient models:

Narrow fact tables

Low-cardinality dimensions

Clear relationships

Bad models:

Wide tables

High cardinality

Ambiguous joins

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

πŸŸ₯ STAR SCHEMA β€” DEEP UNDERSTANDING
πŸƒ Card 3 β€” Star Schema Purpose

Q: What fundamental analytical problem does the star schema solve?

A

A:

Separating:

Measurements (facts)

Context (dimensions)

This enables:

Flexible slicing

Efficient aggregation

Clear semantics

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

πŸƒ Card 4 β€” Fact Table Properties

Q: What characteristics define a proper fact table?

A

A:

Contains:

Numeric measures

Foreign keys to dimensions

High row count

Granular events

Should NOT contain:

Descriptive attributes

Repeated text

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

πŸƒ Card 5 β€” Grain Concept (Critical)

Q: What is the grain of a fact table and why must it be defined explicitly?

A

A:

Grain = level of detail of each row.

Example:

One row per:

Transaction

Order line

Daily summary

Undefined grain causes:

Double counting

Inconsistent aggregations

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

πŸƒ Card 6 β€” Dimension Table Role

Q: Why are dimension tables essential for slicing data?

A

A:

They provide descriptive attributes used for filtering:

Customer demographics

Product categories

Geographic hierarchy

Dimensions define analysis perspectives.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

πŸŸ₯ RELATIONSHIPS β€” INTERNAL LOGIC
πŸƒ Card 7 β€” One-to-Many Relationship

Q: Why should relationships typically flow from dimension (one) to fact (many)?

A

A:

Because:

Dimensions filter facts

Not the reverse

This mirrors real-world logic:

Category β†’ Products β†’ Sales

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

πŸƒ Card 8 β€” Filter Propagation

Q: How does filter propagation work across relationships?

A

A:

Filters applied to a dimension:

β†’ propagate to fact table
β†’ affect aggregations

Direction matters.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

πŸƒ Card 9 β€” Bidirectional Relationships Risk

Q: Why can bidirectional filtering create ambiguity?

A

A:

It allows filters to travel multiple paths, causing:

Circular dependencies

Incorrect totals

Performance issues

Use sparingly.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
πŸŸ₯ CARDINALITY β€” PERFORMANCE FACTOR πŸƒ Card 10 β€” Cardinality Definition Q: What is cardinality in the context of relationships?
A: The uniqueness of values in columns: One-to-many Many-to-one Many-to-many High cardinality β†’ lower compression.
26
πŸƒ Card 11 β€” High Cardinality Problems Q: Why do high-cardinality columns degrade performance?
A: Because compression relies on repeated values. Unique values β†’ larger memory footprint β†’ slower scans.
27
πŸŸ₯ MANY-TO-MANY β€” ADVANCED CASE πŸƒ Card 12 β€” Why Many-to-Many Is Dangerous Q: What analytical problems arise from many-to-many relationships?
A: Ambiguous filter paths Double counting Complex DAX Often indicates poor model design.
28
πŸƒ Card 13 β€” Bridge Table Solution Q: How does a bridge table resolve many-to-many ambiguity?
A: It introduces an intermediate dimension that: Normalizes relationships Controls filter flow Restores star structure
29
πŸŸ₯ SNOWFLAKE VS STAR πŸƒ Card 14 β€” Snowflake Trade-offs Q: Why is snowflake schema generally discouraged in Power BI?
A: Although normalized, it: Increases relationship complexity Requires more joins Reduces performance Star schema preferred for analytics.
30
πŸŸ₯ SURROGATE KEYS πŸƒ Card 15 β€” Purpose of Surrogate Keys Q: Why use surrogate keys instead of natural keys in dimension tables?
A: They: Ensure uniqueness Support slowly changing dimensions Improve performance Avoid business logic changes
31
πŸŸ₯ DATE TABLE β€” SPECIAL DIMENSION πŸƒ Card 16 β€” Importance of Date Dimension Q: Why must date information reside in a dedicated date table?
A: Because time intelligence requires: Continuous date range Hierarchies Fiscal calendars Custom attributes
32
πŸŸ₯ MODEL FAILURE DIAGNOSIS πŸƒ Card 17 β€” Symptoms of Bad Model Q: What are warning signs that a Power BI model is poorly designed?
A: Incorrect totals Need for complex DAX workarounds Slow performance Circular relationships Many-to-many everywhere
33
🧠 LEVEL 3 β€” DAX ENGINE DEEP DIVE (ELITE CARDS) πŸŸ₯ FOUNDATIONS β€” HOW DAX THINKS πŸƒ Card 1 β€” DAX vs Excel Formulas Q: Why is DAX fundamentally different from Excel formulas even though the syntax looks similar?
A: Excel: Cell-by-cell calculation Static references DAX: Columnar evaluation Context-dependent Query-based DAX formulas answer: β€œWhat value should be returned under the current filter context?” Not: β€œWhat is this cell equal to?”
34
πŸƒ Card 2 β€” Declarative Nature Q: Why is DAX considered a declarative language?
A: You specify what result you want, not how to compute it. The engine decides execution strategy using: Storage engine (VertiPaq) Formula engine
35
πŸŸ₯ EVALUATION CONTEXT β€” CORE THEORY πŸƒ Card 3 β€” Filter Context (Precise Definition) Q: What exactly is filter context in DAX from the engine’s perspective?
A: A set of column filters that restrict visible rows before aggregation. It originates from: Report filters Slicers Visual axes DAX functions Filter context determines: Which rows exist for calculation.
36
πŸƒ Card 4 β€” Row Context (Precise Definition) Q: What is row context internally and when does it exist?
A: Row context = current row pointer during iteration. Exists in: Calculated columns Iterators (SUMX, FILTER, etc.) Does NOT automatically filter tables.
37
πŸŸ₯ CONTEXT TRANSITION β€” MOST CONFUSING CONCEPT πŸƒ Card 6 β€” CALCULATE’s Hidden Role Q: Why is CALCULATE the most important function in DAX?
A: Because it modifies filter context. It can: Add filters Remove filters Replace filters Trigger context transition Nearly all advanced measures rely on CALCULATE.
38
πŸƒ Card 7 β€” Context Transition Mechanism Q: What happens internally during context transition?
A: Row context β†’ converted into equivalent filter context. This allows row-based values to affect aggregations. Without it: Measures inside row context ignore the current row.
39
πŸƒ Card 8 β€” Why Measures Ignore Row Context Q: Why does a measure not see row context unless wrapped in CALCULATE?
A: Measures evaluate only in filter context. Row context exists but is invisible to them.
40
πŸŸ₯ FILTER MANIPULATION πŸƒ Card 9 β€” ALL Function Purpose Q: What does ALL actually do internally?
A: Removes filters from specified columns or tables. Creates a new filter context with those filters cleared. Used for: Percent of total Baseline calculations
41
πŸƒ Card 10 β€” REMOVEFILTERS vs ALL Q: Why is REMOVEFILTERS often preferred over ALL?
A: It expresses intent clearly: Remove filters without returning a table. Improves readability and avoids confusion.
42
πŸƒ Card 11 β€” ALLEXCEPT Logic Q: What analytical scenario requires ALLEXCEPT?
A: When you want totals across all categories except certain grouping columns. Example: Total sales across all products but per region.
43
πŸŸ₯ ITERATORS β€” ADVANCED COMPUTATION πŸƒ Card 12 β€” Why Iterators Exist Q: Why do iterator functions (SUMX, AVERAGEX, etc.) exist?
A: Because simple aggregations cannot evaluate row-level expressions before aggregation. Iterators: Create row context Evaluate expression per row Aggregate results
44
πŸƒ Card 13 β€” Performance Cost of Iterators Q: Why are iterators slower than simple aggregations?
A: They invoke the formula engine row by row rather than using storage engine aggregations.
45
πŸŸ₯ TIME INTELLIGENCE β€” ENGINE VIEW πŸƒ Card 14 β€” Time Functions as Filter Rewriters Q: How do time intelligence functions operate internally?
A: They modify filter context on the date table. Example: SAMEPERIODLASTYEAR shifts date filters backward one year.
46
πŸŸ₯ MEASURES VS CALCULATED COLUMNS β€” ENGINE DIFFERENCE πŸƒ Card 15 Q: Why are measures more memory-efficient than calculated columns?
A: Measures: Computed at query time Not stored Calculated columns: Stored physically Increase model size
47
πŸŸ₯ FORMULA ENGINE VS STORAGE ENGINE πŸƒ Card 16 β€” Dual Engine Architecture Q: What are the roles of the Formula Engine and Storage Engine?
A: Storage Engine: Retrieves compressed data Performs scans and aggregations Formula Engine: Executes DAX logic Handles complex operations Performance depends on minimizing Formula Engine work.
48
🧠 CARD SET β€” SERVICE CORE ARCHITECTURE πŸŸ₯ Card 1 β€” What is the Power BI Service? Front (Visual Cue) ☁️ Cloud icon + sharing arrows + browser window
Back (Advanced) Power BI Service is the cloud SaaS platform where BI content is: * Published * Shared * Collaborated on * Consumed It enables organizational analytics distribution. Key roles: Authoring workflow Power BI Desktop β†’ Publish β†’ Service Consumption workflow Service β†’ Dashboards β†’ Apps β†’ Users Key capabilities: * Sharing reports * Creating dashboards * Managing workspaces * Data refresh * Security & permissions The Service is the collaboration and distribution layer of Power BI. πŸ‘‰ Desktop = Development πŸ‘‰ Service = Deployment + Consumption Publishing reports to the cloud enables sharing and collaboration.
49
πŸŸ₯ Card 2 β€” Core Content Types in the Service Front πŸ“¦ 3 stacked boxes labeled: Dataset, Report, Dashboard
Back πŸ”Ή Dataset (Semantic Model) The structured data model used by reports. Contains: * Tables * Relationships * Measures * Calculations Acts as the single source of truth.
50
πŸ”Ή Report
Multi-page interactive document built from a dataset. Features: * Visualizations * Filters * Drillthrough * Slicers Reports answer specific analytical questions.
51
πŸ”Ή Dashboard
Single-page canvas composed of pinned visuals from reports. Purpose: * Executive overview * Monitoring KPIs * Cross-report summary Dashboards aggregate visuals from multiple reports
52
πŸŸ₯ Card 3 β€” Relationship Between Dataset, Report, Dashboard Front Flow diagram: Dataset β†’ Report β†’ Dashboard
Back Hierarchy: Dataset β†’ feeds β†’ Reports β†’ feed β†’ Dashboards Key principle: MANY reports can use ONE dataset ONE dashboard can use visuals from MANY reports Dashboards = curated insight layer
53
🧠 CARD SET β€” WORKSPACES & COLLABORATION πŸŸ₯ Card 4 β€” What is a Workspace? Front 🏒 Folder icon labeled β€œTeam”
Back A workspace is a collaboration container for BI assets. Contains: * Datasets * Reports * Dashboards * Dataflows Workspaces enable: * Team collaboration * Access control * Version management Personal vs Organizational: My Workspace Private sandbox for individual use App Workspace Team-shared environment Workspaces allow teams to collaborate on reports.
54
πŸŸ₯ Card 5 β€” Workspace Roles & Permissions Front πŸ‘₯ Pyramid of roles
Back Typical role hierarchy: Admin Full control (including permissions) Member Can edit content Contributor Can publish & update Viewer Read-only access Enterprise BI depends heavily on role governance.
55
🧠 CARD SET β€” APPS & DISTRIBUTION πŸŸ₯ Card 6 β€” What is a Power BI App? Front πŸ“± App icon with lock symbol
Back Apps are packaged BI content distributed to end users. They provide: * Controlled distribution * Read-only experience * Versioned releases Apps are ideal for: * Business users * Executives * Non-technical stakeholders Apps provide curated consumption experience.
56
πŸŸ₯ Card 7 β€” Workspace vs App Front Split diagram: Left: Workspace (tools) Right: App (consumer view)
Back Workspace = Development & collaboration App = Consumption & distribution Workspace users build content App users consume content Think: Workspace = Factory App = Finished product
57
🧠 CARD SET β€” DATA CONNECTIVITY & REFRESH πŸŸ₯ Card 8 β€” Data Refresh in the Service Front πŸ”„ Clock + database
Back Power BI Service maintains up-to-date data through refresh mechanisms: Scheduled Refresh Automatic updates at intervals Manual Refresh Triggered by user Real-time options * DirectQuery * Live Connection These allow near real-time analytics.
58
πŸŸ₯ Card 9 β€” Gateway (Critical Concept) Front 🏒 On-prem server β†’ ☁️ cloud arrow
Back Gateway enables secure connection between: On-premises data sources ↔ Power BI Service Use cases: * SQL Server in corporate network * Local databases * Internal APIs Gateway is essential for enterprise deployments.
59
🧠 CARD SET β€” SECURITY MODEL πŸŸ₯ Card 10 β€” Row-Level Security (RLS) Front πŸ” Table with filtered rows
Back RLS restricts data visibility based on user identity. Example: Manager sees all regions Sales rep sees only their region Security is enforced inside the dataset. RLS ensures users only see authorized data.
60
🧠 CARD SET β€” CONSUMPTION EXPERIENCE πŸŸ₯ Card 11 β€” Interactivity in Reports Front πŸ–±οΈ clicking chart β†’ other charts change
Back Power BI Service supports: * Cross-filtering * Cross-highlighting * Drill-down * Drill-through These create exploratory analytics.
61
πŸŸ₯ Card 12 β€” Mobile Consumption Front πŸ“± Dashboard on phone
Back Power BI Mobile allows access to dashboards anywhere. Mobile BI is crucial for decision-makers on the move.
62
🧠 Power BI Service β€” Architecture, Governance, Enterprise Deployment 🧠 SERVICE DEEP DIVE β€” ELITE FLASHCARDS πŸŸ₯ TENANT & GOVERNANCE ARCHITECTURE πŸƒ Card 1 β€” Tenant Concept Front (Visual Cue) 🌐 Organization boundary around many workspaces
Back A tenant is the organizational container for all Power BI activity. It defines: * Users * Security policies * Licensing * Data governance rules * Admin settings Everything in Power BI Service exists inside a tenant. Enterprise insight: Tenant configuration determines what users are allowed to do.
63
πŸƒ Card 2 β€” Why Governance Is Critical Front βš–οΈ Scale vs Control diagram
Back Without governance: * Dataset sprawl * Duplicate metrics * Security risks * Inconsistent reporting Governance ensures: Single source of truth Controlled sharing Compliance Enterprise BI success depends more on governance than dashboards.
64
πŸŸ₯ PREMIUM CAPACITY & PERFORMANCE πŸƒ Card 3 β€” Shared vs Premium Capacity Front Two servers: Left crowded (shared) Right dedicated (premium)
Back Shared Capacity Resources shared among tenants Lower cost Performance variability Premium Capacity Dedicated resources Predictable performance Large model support Advanced features Premium is required for enterprise-scale BI.
65
πŸƒ Card 4 β€” Why Capacity Matters Q: What determines report performance in the Service?
A: Not just model design β€” also capacity resources: * CPU * Memory * Concurrent queries * Refresh workload Performance = Model Quality Γ— Capacity
66
πŸŸ₯ DATAFLOWS β€” CENTRALIZED ETL πŸƒ Card 5 β€” Dataflows Purpose Front ETL pipeline feeding multiple datasets
Back Dataflows move data preparation to the cloud. Benefits: * Reusable transformations * Centralized ETL * Consistent data across reports * Reduced duplication Think: Power Query in the Service.
67
πŸƒ Card 6 β€” Dataflow vs Dataset Front Pipeline β†’ Model
Back Dataflow = Data preparation layer Dataset = Semantic modeling layer Separation improves scalability and governance.
68
πŸŸ₯ DEPLOYMENT PIPELINES β€” DEVOPS FOR BI πŸƒ Card 7 β€” Deployment Pipeline Stages Front DEV β†’ TEST β†’ PROD
Back Deployment pipelines enable controlled release of BI content. Stages: Development Authoring and experimentation Test Validation and QA Production Business consumption Prevents breaking reports used by executives.
69
πŸƒ Card 8 β€” Why Pipelines Matter
A: They bring software engineering practices to BI: * Version control mindset * Safe updates * Rollback capability Enterprise BI requires change management.
70
πŸŸ₯ ENTERPRISE DATA GOVERNANCE πŸƒ Card 9 β€” Certified vs Promoted Datasets Front Dataset badges
Back Certification levels communicate trust: Promoted Recommended by authors Certified Approved by governance team Encourages reuse of authoritative data.
71
πŸƒ Card 10 β€” Single Source of Truth Problem Q: Why do organizations struggle with β€œmultiple versions of the truth”?
A: Because of: * Duplicate datasets * Independent report creation * Lack of governance Centralized certified datasets solve this.
72
πŸŸ₯ SECURITY β€” ENTERPRISE SCALE πŸƒ Card 11 β€” Workspace Isolation Strategy Front Multiple workspace containers by department
Back Large organizations separate workspaces by: * Department * Project * Sensitivity level This limits data exposure and simplifies management.
73
πŸƒ Card 12 β€” Object-Level Security Front Dataset with hidden tables
Back Allows restricting access to specific tables or columns within a dataset. Used when: Different users require different data visibility.
74
πŸŸ₯ PERFORMANCE MANAGEMENT πŸƒ Card 13 β€” Dataset Size Limits Q: Why do large datasets require Premium capacity?
A: Because shared capacity imposes size and memory limits. Premium enables: * Large models * Incremental refresh * Better concurrency
75
πŸƒ Card 14 β€” Incremental Refresh Strategy Front Timeline showing only new data loaded
Back Refresh only recent partitions instead of entire dataset. Benefits: * Faster refresh * Reduced resource usage * Enables very large historical datasets
76
πŸŸ₯ ENTERPRISE ARCHITECTURE PATTERNS πŸƒ Card 15 β€” Hub-and-Spoke Model Front Central dataset β†’ many reports
Back Centralized datasets (hub) feed multiple reports (spokes). Advantages: * Consistency * Reusability * Reduced maintenance Common in mature BI environments.
77
πŸƒ Card 16 β€” Self-Service vs Managed BI Q: What tension exists between self-service BI and governance?
A: Self-service: * Flexibility * Speed Governance: * Control * Consistency Enterprise BI must balance both.
78
πŸ’€ Level 4 β€” Performance & VertiPaq Internals 🧠 CARD 4.1 β€” What VertiPaq Actually Is Q: What is VertiPaq and why is it fast?
A: VertiPaq is the in-memory columnar storage engine used by Power BI (Import mode) and Analysis Services Tabular. Why it’s fast: COLUMNAR STORAGE (not row-based): Stores each column separately Queries scan only needed columns Eliminates unnecessary I/O Row store (SQL): Row1: A B C D E Row2: A B C D E Column store (VertiPaq): A: A A A A A B: B B B B B C: C C C C C πŸ‘‰ Aggregations (SUM, COUNT, AVG) operate on one column vector
79
HEAVY COMPRESSION: VertiPaq compresses data using: 1) Dictionary Encoding Unique values stored once
Example: Product column: Laptop Phone Laptop Tablet Phone Dictionary: 1 = Laptop 2 = Phone 3 = Tablet Data stored: 1,2,1,3,2
80
2) Run-Length Encoding (RLE) Best for sorted columns 111111122222333 β†’ (1x7), (2x5), (3x3)
Best for sorted columns 111111122222333 β†’ (1x7), (2x5), (3x3)
81
3) Value Encoding If numeric range small: Actual values: 1000–1005 Stored as: 0–5
If numeric range small: Actual values: 1000–1005 Stored as: 0–5
82
RESULT: Memory footprint shrinks dramatically. Typical compression ratio: 10x β€” 100x smaller than raw data
Typical compression ratio: 10x β€” 100x smaller than raw data
83
🧠 CARD 4.2 β€” Why Model Design Affects Performance Q: Why does star schema massively improve performance?
A: VertiPaq optimizes for: Few large fact tables Many small dimension tables One-to-many relationships
84
❌ Snowflake schema problems:
More joins More relationship traversals Larger filter propagation cost
85
βœ… Star schema benefits: FactSales β†’ DimDate β†’ DimCustomer β†’ DimProduct
VertiPaq can: Push filters efficiently Reduce scan space Use compressed dictionaries effectively
86
🚨 High-cardinality columns hurt compression High cardinality = many unique values
Bad examples: GUIDs Transaction IDs Timestamps (to the second) Text fields Why bad? Dictionary becomes huge β†’ memory ↑ β†’ scan cost ↑
87
🧠 CARD 4.3 β€” Storage Engine vs Formula Engine Q: What are the two engines inside Power BI?
A: 1️⃣ Storage Engine (SE) VertiPaq engine Handles: Data scans Aggregations Filtering Compression usage Runs in parallel Highly optimized C++ code FAST πŸ”₯
88
2️⃣ Formula Engine (FE)
Handles: DAX evaluation Complex logic Iterators (SUMX, FILTER, etc.) Context transitions Single-threaded Not vectorized Slower
89
πŸ”₯ Performance Rule:
The more work pushed to Storage Engine, the faster the query.
90
🧠 CARD 4.4 β€” Why Iterators Are Dangerous Q: Why can SUMX / FILTER destroy performance?
A: Iterators force row-by-row evaluation in Formula Engine. Example: SUMX(Sales, Sales[Quantity] * Sales[Price]) Process: Iterate each row Compute expression Aggregate results Instead of vectorized column aggregation.
91
Better: Create calculated column: Sales[LineTotal] = Quantity * Price Then: SUM(Sales[LineTotal]) Now Storage Engine can aggregate directly.
Better: Create calculated column: Sales[LineTotal] = Quantity * Price Then: SUM(Sales[LineTotal]) Now Storage Engine can aggregate directly.