What is Unity Catalog?
Unity Catalog is a centralized data catalog that provides access control, auditing, lineage, quality monitoring, and data discovery capabilities across Databricks workspaces.
What are the 3 architectural layers that surround UC?
1.Compute (Apache Spark)
2.UC
3.Data Storage
What are the four levels of the UC object model?
1.Metastore
2.Catalog and Non-Data Objects
3.Schema
4.Data Objects
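Within a single metastore, these levels show up as the three-level name `catalog.schema.object` used to reference a data object. The following is a minimal Python sketch of that naming scheme, not a Databricks API: `ObjectName`, `parse_name`, and the names `sales`, `emea`, and `orders` are hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class ObjectName(NamedTuple):
    """Three-level name of a data object inside one metastore (illustrative)."""
    catalog: str
    schema: str
    obj: str

    def qualified(self) -> str:
        # Backtick-quote each level so the name stays valid in SQL text.
        return ".".join(f"`{part}`" for part in self)


def parse_name(name: str) -> ObjectName:
    """Split 'catalog.schema.object' into its levels (no quoted-name support)."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.object, got {name!r}")
    return ObjectName(*parts)


orders = parse_name("sales.emea.orders")  # hypothetical table name
print(orders.catalog)      # sales
print(orders.qualified())  # `sales`.`emea`.`orders`
```

The metastore itself is not part of the name because a workspace is attached to one metastore at a time.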
What is a Metastore?
What is the best practice for using one?
A top-level container that registers metadata about data and the permissions that govern access to it.
Use one metastore per cloud region in which you operate.
What is a Catalog?
How is it typically used?
A catalog is an organizational unit for schemas.
Catalogs should mirror functional parts of the organization (e.g. Sales, HR).
What are 4 examples of securable objects for external data source access?
1.Storage credentials
2.Service credentials
3.External locations
4.Connections
What are 4 examples of securable objects for shared access?
1.Clean Rooms
2.Shares
3.Recipients
4.Providers
What is a Schema?
A logical layer under a catalog that contains data objects.
What are 5 examples of data objects in a schema?
1.Model
2.Volume
3.Table
4.View
5.Function (UDF)
What is a Volume?
How is it used?
A logical object under a schema that represents a location in cloud object storage.
Volumes are used to store, organize, and access files containing structured and unstructured data.
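What are the 2 key features of a managed volume?
1.Created in the managed storage location of the containing schema
2.UC manages the files, so no external location or storage credential is needed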
What are the 2 key features of an external volume?
What is its use case?
1.Placed in a pre-existing cloud storage location
2.Allows reads and writes without cloud-specific privileges
Onboarding existing data lakes and governing non-Delta data.
What are the 3 roles that have default admin access to Unity Catalog?
1.Account Admins
2.Workspace Admins
3.Metastore Admins
What is a Managed Table?
UC manages both the governance and the underlying data files.
When the table is dropped, the underlying data is deleted as well.
What is an External Table?
A table whose data lives in a storage location you specify; the data persists after the table is dropped.
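The difference between the two table types on drop can be sketched with a toy model. This is plain Python for illustration only, not a Databricks API: `ToyMetastore`, the table names, and the storage paths (including `s3://example-bucket/legacy`) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetastore:
    """Illustrative model of drop semantics; not a Databricks API."""
    tables: dict = field(default_factory=dict)   # table name -> (kind, storage path)
    storage: dict = field(default_factory=dict)  # storage path -> list of data files

    def create_table(self, name, kind, path, files):
        if kind not in ("managed", "external"):
            raise ValueError(f"unknown table kind: {kind}")
        self.tables[name] = (kind, path)
        self.storage[path] = list(files)

    def drop_table(self, name):
        kind, path = self.tables.pop(name)  # the metadata entry is always removed
        if kind == "managed":
            # Managed: the catalog owns the files, so they are removed too.
            del self.storage[path]
        # External: files at the user-specified location are left untouched.


ms = ToyMetastore()
ms.create_table("sales.emea.orders", "managed", "managed/orders", ["part-0.parquet"])
ms.create_table("sales.emea.legacy", "external", "s3://example-bucket/legacy", ["part-0.parquet"])
ms.drop_table("sales.emea.orders")
ms.drop_table("sales.emea.legacy")
print(ms.tables)           # {}
print(sorted(ms.storage))  # ['s3://example-bucket/legacy']
```

Both tables disappear from the catalog, but only the external table's files remain in storage.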
How is a Connection created?
How is it used?
What happens when a batch of inserted records violates one or more table constraints? What happens if this situation occurs in a stream?
The whole batch/job fails and no records are written.
Can a constraint be added to a table that contains records that violate it?
No. Existing rows are checked first, and the constraint is rejected if any row violates it.
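The two constraint answers above (all-or-nothing batch writes, and no constraint on a table that already violates it) can be sketched with a toy in-memory table. This is a simplified Python model, not Delta Lake or Databricks code: `ToyTable`, `ConstraintViolation`, and the constraint names are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Check = Callable[[Row], bool]


class ConstraintViolation(Exception):
    """Raised when a row fails a table constraint."""


class ToyTable:
    """Illustrative in-memory table with CHECK-style constraints."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.checks: Dict[str, Check] = {}

    def add_constraint(self, name: str, check: Check) -> None:
        # Existing rows are validated first; one violation rejects the constraint.
        if not all(check(row) for row in self.rows):
            raise ConstraintViolation(f"existing rows violate {name}")
        self.checks[name] = check

    def insert_batch(self, batch: List[Row]) -> None:
        # Validate the whole batch before writing anything (all or nothing).
        for row in batch:
            for name, check in self.checks.items():
                if not check(row):
                    raise ConstraintViolation(f"row {row} violates {name}")
        self.rows.extend(batch)


table = ToyTable()
table.add_constraint("positive_amount", lambda r: r["amount"] > 0)
table.insert_batch([{"amount": 5}])

try:
    table.insert_batch([{"amount": 7}, {"amount": -1}])  # one bad row
except ConstraintViolation:
    pass
print(len(table.rows))  # 1 (the valid row 7 was not written either)

try:
    table.add_constraint("small_amount", lambda r: r["amount"] < 3)
except ConstraintViolation:
    pass
print(sorted(table.checks))  # ['positive_amount']
```

In this model a failed batch leaves the table exactly as it was, which mirrors the transactional behavior described in the answers.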