Interop: Databricks with Snowflake Flashcards

(25 cards)

1
Q

What is a typical role of Databricks in a stack that also includes Snowflake?

A

Databricks often serves as the heavy ETL, data science, and ML engine over object storage, while Snowflake serves as the primary SQL warehouse and BI serving layer.

2
Q

What is a typical role of Snowflake in such a stack?

A

Snowflake acts as a governed, performant data warehouse for analytic SQL and dashboards, leveraging its SQL engine and ecosystem integrations.

3
Q

Why might an organization choose to use both Databricks and Snowflake instead of only one?

A

To leverage Databricks’ strengths in Spark-based processing and ML and Snowflake’s strengths in SQL warehousing, governance, and BI tool integration.

4
Q

What are common patterns for moving data from Databricks to Snowflake?

A

Writing curated data from Databricks to cloud storage in formats like Parquet or Delta and loading it into Snowflake with COPY INTO (often via an external stage), or writing directly from Databricks using the Snowflake Spark connector.
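The connector-based path above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the account URL, credentials, and table name are placeholders, and the actual write (shown as comments) assumes a Databricks cluster with the spark-snowflake connector installed.

```python
# Sketch: writing a curated Spark DataFrame to Snowflake with the
# spark-snowflake connector. All connection values are placeholders.

def snowflake_options(url, user, password, database, schema, warehouse):
    """Build the option map the spark-snowflake connector expects."""
    return {
        "sfUrl": url,            # e.g. "<account>.snowflakecomputing.com"
        "sfUser": user,
        "sfPassword": password,  # in practice, use a secret scope, not a literal
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }

opts = snowflake_options(
    "myaccount.snowflakecomputing.com",  # placeholder account URL
    "etl_user", "***", "ANALYTICS", "GOLD", "LOAD_WH",
)

# On a cluster with the connector installed, the write itself would be:
# (df.write.format("snowflake")
#    .options(**opts)
#    .option("dbtable", "GOLD_ORDERS")   # hypothetical target table
#    .mode("overwrite")
#    .save())
```

Keeping the option map in one helper makes it easy to swap credentials per environment without touching the write logic.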

5
Q

What are common patterns for moving data from Snowflake to Databricks?

A

Exporting query results from Snowflake to object storage (via COPY INTO a stage or external location) for Databricks to read, or using the Spark connector/JDBC to pull data directly into Spark DataFrames.
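The direct-pull path can be sketched like this. A minimal sketch, assuming `connection_opts` is a dict of spark-snowflake connector options (as in the write direction); the query and column names are illustrative, and the Spark call itself is shown as a comment since it needs a cluster.

```python
# Sketch: pulling a Snowflake query result into a Spark DataFrame by
# pushing the query down to Snowflake instead of reading a whole table.

def with_query(connection_opts, sql):
    """Attach a push-down query to a base connector option map."""
    opts = dict(connection_opts)   # copy so the caller's dict is untouched
    opts["query"] = sql
    return opts

opts = with_query(
    {"sfUrl": "myaccount.snowflakecomputing.com", "sfUser": "ml_user"},
    "SELECT customer_id, ltv FROM ANALYTICS.MARTS.CUSTOMER_LTV",
)

# On Databricks this becomes:
# df = spark.read.format("snowflake").options(**opts).load()
```

Pushing the projection/filter down as a query keeps the transfer to only the columns and rows the Spark job actually needs.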

6
Q

Why is object storage often used as the interchange layer between Databricks and Snowflake?

A

It is accessible to both platforms, supports columnar formats, and avoids tight coupling between compute engines.

7
Q

What is a cost consideration when repeatedly moving data between Databricks and Snowflake?

A

Data egress and duplicate storage can become expensive; minimizing unnecessary copying and carefully planning interfaces is important.

8
Q

When might Databricks be the better place to do heavy transformations?

A

When transformations involve complex joins, UDFs, ML feature engineering, or very large-scale processing where Spark’s distributed engine excels.

9
Q

When might Snowflake be a better place for transformations?

A

For SQL-centric, warehouse-style transformations that fit well into Snowflake’s SQL engine and can benefit from its optimizations and governance.

10
Q

What is a good practice for defining ‘ownership’ of curated tables when both Databricks and Snowflake are present?

A

Clearly define which platform owns which modeled layer (e.g., Databricks owns lakehouse medallion layers; Snowflake owns specific marts or serving views) to avoid conflicting definitions.

11
Q

How can Databricks consume Snowflake data for ML use cases?

A

By pulling feature sets from Snowflake via connectors or exporting them to object storage, then combining with other data in Spark for feature engineering and training.

12
Q

How can Snowflake consume Databricks lakehouse outputs?

A

By loading Gold-level Delta/Parquet outputs into Snowflake tables for downstream analytics and reporting.

13
Q

What is a naive anti-pattern when integrating Databricks and Snowflake?

A

Treating them as competing pipelines that each independently ingest and model the same raw data, leading to duplication and inconsistent results.

14
Q

What is a better integration pattern to avoid inconsistent modeling?

A

Use one platform as the primary modeling/curation layer for certain domains, then publish clear, stable interfaces (tables/exports) for the other platform to consume.

15
Q

How does latency influence where to perform transformations between Databricks and Snowflake?

A

Low-latency BI dashboards might benefit from transformations directly in Snowflake, while longer-running transformation and ML pipelines may be better in Databricks.

16
Q

What is an example workflow: Databricks upstream, Snowflake downstream?

A

Databricks ingests and cleans raw data into Bronze/Silver/Gold Delta tables, then exports curated Gold tables to Snowflake for reporting and KPI dashboards.

17
Q

What is an example workflow: Snowflake upstream, Databricks downstream?

A

Snowflake maintains conformed, cleaned entity tables; Databricks pulls them into Spark, joins with additional data, and builds features and models.

18
Q

How can governance be maintained when data flows between Databricks and Snowflake?

A

By aligning catalogs/schemas with data contracts, using consistent naming and documentation, and ensuring access controls and lineage are tracked in both systems.

19
Q

What is a performance consideration when reading from Snowflake into Databricks via JDBC?

A

Row-by-row or small-batch reads are slow; using efficient unloads to object storage or bulk reads with partitioning options is generally better for large datasets.
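Spark's generic JDBC source parallelizes a read when given a numeric partition column and bounds; without them it reads through a single connection. A minimal sketch of packaging those options, where the column name and bounds are illustrative rather than Snowflake-specific:

```python
# Sketch: options that make spark.read split a JDBC read into
# num_partitions range-based queries instead of one serial stream.

def jdbc_partition_options(column, lower, upper, num_partitions):
    """Build Spark JDBC partitioning options for a numeric column."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    return {
        "partitionColumn": column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }

opts = jdbc_partition_options("order_id", 1, 10_000_000, 16)

# Usage on a cluster (url and table are placeholders):
# df = (spark.read.format("jdbc")
#         .option("url", jdbc_url)
#         .option("dbtable", "ORDERS")
#         .options(**opts)
#         .load())
```

Even so, for very large extracts a bulk unload to object storage usually beats parallel JDBC, as the card notes.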

20
Q

Why is it important to align time zones, data types, and schemas when exchanging data between Databricks and Snowflake?

A

Mismatches can cause subtle bugs, incorrect joins, and data loss or rounding issues, especially with timestamps and numeric types.
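One concrete example of such a mismatch: Snowflake timestamps can carry nanosecond precision, while Spark timestamps are microsecond-precision, so sub-microsecond digits are silently dropped in transit (and TIMESTAMP_NTZ vs. TIMESTAMP_TZ semantics add a separate time-zone pitfall). A small demonstration of the truncation:

```python
# Illustration: a nanosecond-precision epoch value loses its last three
# digits when it lands in a microsecond-based timestamp type.

def truncate_ns_to_us(epoch_ns):
    """Round an epoch-nanosecond value down to microsecond precision."""
    return (epoch_ns // 1_000) * 1_000

original = 1_700_000_000_123_456_789   # ns since epoch, sub-microsecond digits set
roundtrip = truncate_ns_to_us(original)

assert roundtrip != original           # precision was lost
assert original - roundtrip == 789     # exactly the nanosecond remainder
```

Two rows that differ only below the microsecond would compare equal after the round trip, which is exactly the kind of subtle join bug the card warns about.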

21
Q

How should you handle metrics or critical tables used by both Databricks and Snowflake?

A

Designate one system as the canonical source for those metrics and have the other read them, rather than recomputing separately in both places.

22
Q

What is a good approach to documenting interfaces between Databricks and Snowflake?

A

Define data contracts that specify schemas, refresh cadence, and semantics for tables or exports that cross the boundary.
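A data contract can be as lightweight as a checked-in record of what the interface promises. A minimal sketch, assuming a hypothetical `DataContract` shape (the field names and example table are illustrative, not a standard format):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataContract:
    table: str              # fully qualified name on the consumer side
    schema: dict            # column name -> declared type
    refresh_cadence: str    # e.g. "daily 06:00 UTC"
    owner: str              # team accountable for the interface

    def breaking_changes(self, observed_schema):
        """Columns the contract promises but the producer no longer ships."""
        return sorted(set(self.schema) - set(observed_schema))

contract = DataContract(
    table="ANALYTICS.GOLD.ORDERS",
    schema={"order_id": "NUMBER", "amount": "NUMBER(18,2)", "ts": "TIMESTAMP_NTZ"},
    refresh_cadence="daily 06:00 UTC",
    owner="data-platform",
)

# A consumer can check an incoming export before loading it:
missing = contract.breaking_changes({"order_id": "NUMBER", "amount": "NUMBER(18,2)"})
# missing == ["ts"] -> the export dropped a promised column
```

Running a check like this in the load job turns a silent schema drift into an explicit, attributable failure at the boundary.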

23
Q

How can you avoid tight coupling between Databricks and Snowflake implementations?

A

Use logical views and stable table schemas as APIs between systems, and hide internal implementation details behind those interfaces.

24
Q

In environments with both tools, what is the role of the lakehouse on Databricks?

A

To act as the central, ACID-managed store on object storage, from which Snowflake may receive curated feeds for warehouse and BI consumption.

25
Q

In one sentence, what is the core mental model for Databricks with Snowflake?

A

Use Databricks as your flexible compute and ML workbench over the lakehouse and Snowflake as a governed SQL warehouse, with clear, minimal, and well-documented data interfaces between them.