Dataflows & Data Pipelines Flashcards

(32 cards)

1
Q

Visual workflow tool that moves data from A to B (copy, schedule, orchestrate).

A

Data Factory Pipelines

2
Q

Shortcut (OneLake Shortcut) - What is it?

A

A zero-copy reference to external data stored outside Fabric (e.g. ADLS Gen2, Amazon S3, Google Cloud Storage) that appears in OneLake without the data being moved or duplicated.

3
Q

Shortcut (OneLake Shortcut) - Why use it?

A

Use when data already exists in a clean data lake and you want immediate access with no storage duplication.

4
Q

Shortcut (OneLake Shortcut) - Example external data

A

ADLS Gen2 curated lake zones, Amazon S3 marketing data, Google Cloud Storage research datasets.

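A shortcut is created through the Fabric portal or the Fabric REST API. As a rough sketch, the request body for an S3 shortcut might be built like this — the exact endpoint path and field names (e.g. `amazonS3`, `connectionId`) are assumptions here and should be checked against the current Microsoft Fabric REST API docs:

```python
import json

def build_s3_shortcut_payload(name: str, bucket_url: str, subpath: str,
                              connection_id: str) -> dict:
    """Return a JSON-serializable body describing a zero-copy S3 shortcut.

    Field names below are illustrative assumptions, not a confirmed schema.
    """
    return {
        "name": name,                     # shortcut name as it appears in OneLake
        "path": "Files",                  # where the shortcut lands in the Lakehouse
        "target": {
            "amazonS3": {
                "location": bucket_url,   # e.g. https://my-bucket.s3.amazonaws.com
                "subpath": subpath,       # folder inside the bucket
                "connectionId": connection_id,
            }
        },
    }

payload = build_s3_shortcut_payload(
    "marketing_data", "https://example-bucket.s3.amazonaws.com",
    "/curated/marketing", "00000000-0000-0000-0000-000000000000")
print(json.dumps(payload, indent=2))
```

No data is copied by such a request — the shortcut only records where the external data lives.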
5
Q

Database Mirroring - What is it?

A

A high availability feature for SQL Server that maintains a mirrored database for failover.

6
Q

Database Mirroring - Why use it?

A

Use when you need redundancy and disaster recovery for an operational SQL Server—not for data ingestion.

7
Q

Database Mirroring - Example use case

A

Primary ticketing database + secondary failover replica.

8
Q

Dataflow (ETL Dataflow) - What is it?

A

Power Query-based GUI tool for ingesting and transforming data into a Lakehouse.

9
Q

Dataflow (ETL Dataflow) - Why use it?

A

Use for business-analyst-friendly cleaning and shaping of data through the low-code Power Query interface.

10
Q

Dataflow (ETL Dataflow) - Example external data

A

Salesforce Contacts, SurveyMonkey CSV exports, Excel finance report uploads, On-prem SQL via Gateway.

11
Q

Data Pipeline (ETL Data Pipeline) - What is it?

A

An orchestrated and scheduled ETL workflow using Data Factory inside Fabric.

12
Q

Data Pipeline (ETL Data Pipeline) - Why use it?

A

Use for recurring, scheduled ingestion and for orchestrating multiple steps (copies, dataflows, notebooks) in a single workflow.

13
Q

Data Pipeline (ETL Data Pipeline) - Example external data

A

Azure SQL DB, Snowflake, AWS S3, Google BigQuery, Dynamics 365, Zendesk.

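Conceptually, a pipeline runs activities in order and applies a retry policy to each one. The sketch below models that orchestration pattern in plain Python — the activity names are illustrative, not a real Fabric API:

```python
import time

def run_pipeline(activities, retries=2, delay=0.01):
    """Run (name, callable) activities in order, retrying each on failure."""
    results = {}
    for name, activity in activities:
        for attempt in range(retries + 1):
            try:
                results[name] = activity()
                break                      # activity succeeded, move on
            except Exception:
                if attempt == retries:     # retries exhausted: fail the pipeline
                    raise
                time.sleep(delay)          # back off before retrying
    return results

flaky_state = {"calls": 0}
def copy_from_snowflake():
    flaky_state["calls"] += 1
    if flaky_state["calls"] == 1:          # fail once to demonstrate the retry
        raise ConnectionError("transient network error")
    return "120 rows copied"

results = run_pipeline([
    ("copy_snowflake", copy_from_snowflake),
    ("refresh_lakehouse", lambda: "tables refreshed"),
])
print(results)
```

In Fabric this sequencing, scheduling, and retry behavior is configured in the pipeline designer rather than coded by hand.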
14
Q

Notebook (ETL Notebook) - What is it?

A

A Spark-based code workspace for ingestion and transformation using Python (PySpark), Scala, R, or SQL.

15
Q

Notebook (ETL Notebook) - Why use it?

A

Use for complex transformation logic, large-scale processing, or cases that need full code-level control.

16
Q

Notebook (ETL Notebook) - Example external data

A

IoT telemetry, website clickstream logs, multi-GB CSV archives, sensor data feeds.
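A notebook-style transform over telemetry data might look like the sketch below. In a real Fabric notebook this would typically be PySpark (`spark.read` plus DataFrame operations); plain Python is used here, with an assumed sentinel value for bad readings, so the logic runs anywhere:

```python
from collections import defaultdict

raw_telemetry = [
    {"sensor": "s1", "temp_c": 21.5},
    {"sensor": "s1", "temp_c": -999.0},   # sentinel value: bad reading (assumed convention)
    {"sensor": "s2", "temp_c": 19.0},
    {"sensor": "s2", "temp_c": 21.0},
]

def transform(rows):
    """Filter out sentinel readings, then average temperature per sensor."""
    clean = [r for r in rows if r["temp_c"] > -100]
    sums = defaultdict(lambda: [0.0, 0])
    for r in clean:
        sums[r["sensor"]][0] += r["temp_c"]
        sums[r["sensor"]][1] += 1
    return {sensor: total / n for sensor, (total, n) in sums.items()}

print(transform(raw_telemetry))  # {'s1': 21.5, 's2': 20.0}
```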

17
Q

Eventstream - What is it?

A

A real-time streaming ingestion pipeline that processes events continuously.

18
Q

Eventstream - Why use it?

A

Use when dashboards or alerts must update instantly based on live event data.

19
Q

Eventstream - Example external data

A

People counter sensors, POS transaction event messages, Kafka or Event Hubs streams, real-time user activity logs.
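The defining trait of an eventstream is that state updates per event rather than per batch. The sketch below stands in an in-memory generator for an Event Hubs or Kafka source to show that pattern; the event shape and the alert threshold are illustrative:

```python
def pos_events():
    """Stand-in for a continuous POS event source (Event Hubs / Kafka)."""
    for amount in [12.50, 8.00, 41.00, 5.25]:
        yield {"type": "sale", "amount": amount}

running_total = 0.0
alerts = []
for event in pos_events():
    running_total += event["amount"]       # state updates as each event lands
    if event["amount"] > 40:               # example real-time alert rule
        alerts.append(event)

print(running_total, len(alerts))  # 66.75 1
```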

21
Q

On-premises Data Gateway - What is it?

A

A secure connector that allows Fabric to access on-premises data sources inside your internal network.

22
Q

On-premises Data Gateway - Why use it?

A

Use when the data source is on-prem and cannot expose a public endpoint; the gateway connects using secure, outbound-only traffic.

23
Q

On-premises Data Gateway - Example external data

A

On-prem SQL Server, Oracle DB in data center, Shared drive with Excel/CSV files.

24
Q

VNet Data Gateway - What is it?

A

A Microsoft-managed private network connection to Azure data sources that are secured with private endpoints/VNet integration.

25
Q

VNet Data Gateway - Why use it?

A

Use when the source is in Azure but locked behind private networking and cannot be accessed publicly.
26
Q

VNet Data Gateway - Example external data

A

Azure SQL DB with private endpoint, ADLS Gen2 in private VNet, Managed SQL Instance in isolated subnet.
27
Q

Fast Copy - What is it?

A

A high-throughput bulk data copy mode optimized for large data movement without applying transformations.
28
Q

Fast Copy - Why use it?

A

Use when you need to move large datasets quickly and don't need transformations during ingestion.
29
Q

Fast Copy - Example external data

A

Full history table copies, Parquet files in S3, archive extracts from a data warehouse.
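The idea behind bulk copy is streaming bytes from source to destination in large chunks with no row-level work. The sketch below models that locally with in-memory streams — Fabric's Fast Copy is a service-side setting, not something you implement yourself:

```python
import io
import shutil

def bulk_copy(src, dst, chunk_size=1024 * 1024):
    """Raw byte passthrough in large chunks; no parsing, no transformation."""
    shutil.copyfileobj(src, dst, length=chunk_size)

source = io.BytesIO(b"parquet-bytes " * 1000)   # stand-in for an S3 object
destination = io.BytesIO()
bulk_copy(source, destination)
print(len(destination.getvalue()))  # 14000
```

Skipping per-row transformation is exactly what makes this mode fast: the copy engine never has to interpret the data it moves.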
30
Q

Staging - What is it?

A

A temporary landing area for raw data before transformation into curated tables.
31
Q

Staging - Why use it?

A

Use when applying multi-step ETL, where raw data must land first before being cleansed and transformed into curated tables.
32
Q

Staging - Example external data

A

Bronze raw zone files, intermediate processed tables, source dumps prior to cleansing.
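The staging pattern can be sketched as two separate steps: land raw records untouched, then cleanse them into a curated table. The zone names and the cleansing rule below are illustrative; the point is that a failed transform can be re-run from staging without re-extracting from the source:

```python
staging_zone = []      # bronze/raw: data exactly as received
curated_table = []     # curated: cleansed rows only

def land(raw_rows):
    """Step 1: land raw data with no changes."""
    staging_zone.extend(raw_rows)

def cleanse():
    """Step 2: transform from staging into the curated table."""
    for row in staging_zone:
        name = row.get("name", "").strip().title()
        if name:                           # drop rows with no usable name
            curated_table.append({"name": name})

land([{"name": "  ada lovelace "}, {"name": ""}, {"name": "alan turing"}])
cleanse()
print(curated_table)  # [{'name': 'Ada Lovelace'}, {'name': 'Alan Turing'}]
```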