Open-source storage layer that brings ACID transactions to reading and writing files in cloud storage.
Delta Lake
The default format for tables created in Databricks
Either the entire transaction completes, or none of it does
Atomicity
Data must follow the defined rules, or the transaction is rolled back
Consistency
Each transaction appears to complete before the next one starts
Isolation
Once a transaction completes, its data is saved in a persistent state, even after a failure
Durability
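The four ACID properties above are what Delta Lake layers on top of cloud storage. Atomicity in particular can be illustrated with any transactional store; the following is a toy sketch using Python's built-in sqlite3 (not Delta Lake itself): a transfer that fails partway through is rolled back entirely.

```python
import sqlite3

# Toy illustration of atomicity (using stdlib sqlite3, not Delta Lake):
# a failed transaction leaves no partial changes behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute(
            "UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'"
        )
        raise RuntimeError("crash mid-transfer")  # simulate a failure here
except RuntimeError:
    pass

# The partial update was rolled back: alice still has her full balance.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 0}
```

Because the crash happens inside the transaction, the first UPDATE never becomes visible; that all-or-nothing behavior is atomicity.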
Automatically adjusts the schema of your Delta table as your data changes
Schema Evolution
Ensures that any data written to the Delta table matches the table’s defined schema
Schema Enforcement
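The contrast between the two cards above can be sketched in plain Python (a toy model, not the Delta Lake implementation): enforcement rejects rows whose columns don't match the table's schema, while evolution grows the schema to fit them, analogous to writing with Delta's mergeSchema option enabled.

```python
# Toy model of schema enforcement vs. schema evolution
# (illustrative only; not how Delta Lake is implemented).
def write_row(schema, row, merge_schema=False):
    extra = set(row) - set(schema)
    if extra and not merge_schema:
        # Enforcement: reject data that doesn't match the defined schema.
        raise ValueError(f"columns {sorted(extra)} not in schema")
    if extra and merge_schema:
        # Evolution: extend the schema to accommodate the new columns.
        schema.update({col: type(row[col]).__name__ for col in extra})
    return row

schema = {"id": "int", "name": "str"}
write_row(schema, {"id": 1, "name": "a"})  # matches schema: accepted
write_row(schema, {"id": 2, "name": "b", "age": 30}, merge_schema=True)
print(schema)  # {'id': 'int', 'name': 'str', 'age': 'int'}
```

With merge_schema=False the second write would raise instead, which is the enforcement behavior.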
Creates a table by selecting data from an existing table or data source
CREATE TABLE AS (CTAS)
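The CTAS pattern is standard SQL, so it can be demonstrated outside Databricks; here is a minimal sketch using stdlib sqlite3 (the syntax in Databricks SQL is the same CREATE TABLE ... AS SELECT shape):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("west", 20), ("east", 5)],
)

# CTAS: the new table's schema and contents both come from the query.
conn.execute(
    """
    CREATE TABLE east_sales AS
    SELECT region, amount FROM sales WHERE region = 'east'
    """
)
row_count = conn.execute("SELECT COUNT(*) FROM east_sales").fetchone()[0]
print(row_count)  # 2
```

Note that no column list is declared for east_sales; CTAS infers the schema from the SELECT.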
Provides a point-and-click interface to upload files and create tables
UPLOAD UI
Incrementally (streaming) processes new data files as they arrive in cloud storage
Auto Loader
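The core idea behind Auto Loader, checkpointing which files have already been ingested so each new arrival is processed exactly once, can be sketched in a few lines of plain Python (a toy model, not the Auto Loader implementation):

```python
import pathlib
import tempfile

# Toy sketch of incremental file ingestion: remember which files were
# already processed and pick up only the new arrivals on each pass.
def process_new_files(directory, seen):
    new = [
        p for p in sorted(pathlib.Path(directory).glob("*.csv"))
        if p.name not in seen
    ]
    for p in new:
        seen.add(p.name)  # "checkpoint" so the file is never reprocessed
    return [p.name for p in new]

with tempfile.TemporaryDirectory() as d:
    seen = set()
    (pathlib.Path(d) / "batch1.csv").write_text("a,b\n1,2\n")
    first = process_new_files(d, seen)   # picks up batch1.csv
    (pathlib.Path(d) / "batch2.csv").write_text("a,b\n3,4\n")
    second = process_new_files(d, seen)  # picks up only batch2.csv
print(first, second)  # ['batch1.csv'] ['batch2.csv']
```

Auto Loader does this at cloud-storage scale with durable checkpoints and streaming semantics, but the discover-new-files-only contract is the same.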
Raw data ingested as-is from source systems — what level of the Medallion Architecture is this?
Bronze
Cleaned, validated, and deduplicated data — what level of the Medallion Architecture is this?
Silver
Aggregated, business-ready data for analytics and reporting — what level of the Medallion Architecture is this?
Gold
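The three layers above form a refinement pipeline. A minimal sketch of the flow in plain Python (purely illustrative; real Medallion pipelines would use Delta tables at each layer):

```python
# Toy sketch of the Medallion Architecture layers.

# Bronze: raw, as-ingested records, duplicates and bad rows included.
bronze = [
    {"order": 1, "region": "east", "amount": 10},
    {"order": 1, "region": "east", "amount": 10},   # duplicate
    {"order": 2, "region": "west", "amount": None},  # invalid amount
    {"order": 3, "region": "east", "amount": 5},
]

# Silver: cleaned, validated, deduplicated.
silver = []
seen_orders = set()
for r in bronze:
    if r["amount"] is not None and r["order"] not in seen_orders:
        seen_orders.add(r["order"])
        silver.append(r)

# Gold: business-level aggregate ready for reporting.
gold = {}
for r in silver:
    gold[r["region"]] = gold.get(r["region"], 0) + r["amount"]

print(gold)  # {'east': 15}
```

Each layer consumes the previous one, so data quality and business value both increase from bronze to gold.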
Which statement describes the Databricks workspace?
A) It is a classroom setup for running Databricks lessons and exercises.
B) It is a mechanism for cleaning up lesson-specific assets created during a learning session.
C) It is a set of predefined tables and path variables within Databricks.
D) It is a solution for organizing assets within Databricks.
D) It is a solution for organizing assets within Databricks.
What assets can be accessed from and organized within the Databricks workspace?
A) Virtual machine configurations for clusters
B) Machine learning models and algorithms
C) Notebooks and files
D) Cloud storage accounts
C) Notebooks and files
Which statement describes Databricks Repos?
A) A feature for scheduling and orchestrating data pipelines within Databricks
B) A tool for managing virtual environments and dependencies in Databricks
C) A capability centered around continuous integration of assets between Databricks and external Git repositories.
D) An integrated development environment (IDE) specifically designed for Databricks notebooks
C) A capability centered around continuous integration of assets between Databricks and external Git repositories.
What is the basic cloud-based compute structure of Databricks?
A) Data Nodes
B) Data Warehouses
C) Databricks Clusters
D) Databricks Instances
C) Databricks Clusters
As a Data Engineer, which of the following would you use to orchestrate data tasks?
A) Databricks AI Library
B) Databricks Academy
C) Spark MLlib
D) LakeFlow Jobs
D) LakeFlow Jobs
How do clusters and warehouses differ in their roles?
A) Clusters are designed for data visualization, while SQL warehouses execute SQL queries
B) Clusters provide compute resources for running notebooks, while SQL warehouses are designed specifically for executing SQL queries
C) Clusters offer storage optimization, while SQL warehouses provide data replication
D) Clusters handle machine learning tasks, while SQL warehouses focus on data processing
B) Clusters provide compute resources for running notebooks, while SQL warehouses are designed specifically for executing SQL queries
What are the high-level configuration options available when setting up a cluster?
A) Data Transformation Pipelines, Machine Learning Models, and Data Visualization
B) Data Replication, Disk Encryption, and Data Partitioning
C) Autoscaling Options, Access Mode, and Cluster Name
D) Notebook Sharing, Version Control, and User Permissions
C) Autoscaling Options, Access Mode, and Cluster Name
What are the primary high-level configuration options available when setting up a warehouse?
A) Compute Cluster Size, Auto-stop Timer, and Scaling Parameters
B) Query Execution Speed, Access Mode, and Visualization Mode
C) Data Compression, Cluster Name, and Query Optimization
D) Data Replication, Notebook Sharing, and Data Partitioning
A) Compute Cluster Size, Auto-stop Timer, and Scaling Parameters
What are the benefits of using the available serverless compute features?
A) Cost efficiency, scalability, and simplified management
B) Enhanced query performance for all workloads
C) Fixed and predetermined billing structure
D) Manual adjustment of resource allocation
A) Cost efficiency, scalability, and simplified management
What is the primary interface used by data engineers when working with Databricks?
A) Visual Studio Code
B) Command Line Interface
C) Databricks Notebooks
D) Data Dashboards
C) Databricks Notebooks
What are the common use cases for data engineers when working with Notebooks?
A) Writing Research Papers
B) Creating Mobile Apps
C) Data Exploration, Reporting, and Dashboarding
D) Playing Online Games
C) Data Exploration, Reporting, and Dashboarding