What is Unity Catalog?
Unity Catalog is a centralized governance layer for data and AI assets:
* Manages access control across workspaces
* Provides data lineage
* Enables fine-grained security (row/column level)
It replaces the legacy Hive Metastore with enterprise-grade governance.
What is the object hierarchy in Unity Catalog?
Catalog → Schema → Table/View
Example:
catalog = finance
schema = transactions
table = payments
This structure supports multi-tenant environments, since each team or business unit can be given its own catalog or schema.
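The hierarchy above also determines how a table is named and which privileges are needed to read it. The following is a minimal sketch, not an official API: `TableRef` and `read_grants` are illustrative helper names, and `analysts` is a hypothetical group. The code only builds SQL strings; executing them would require a Unity Catalog-enabled workspace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """A table addressed by the three-level catalog.schema.table namespace."""
    catalog: str
    schema: str
    table: str

    @property
    def full_name(self) -> str:
        # Objects are referenced as catalog.schema.table
        return f"{self.catalog}.{self.schema}.{self.table}"


def read_grants(ref: TableRef, principal: str) -> list[str]:
    """Build the GRANT statements a principal needs to read one table.

    Reading requires a privilege at each level of the hierarchy:
    USE CATALOG, USE SCHEMA, and SELECT on the table itself.
    """
    return [
        f"GRANT USE CATALOG ON CATALOG {ref.catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA {ref.catalog}.{ref.schema} TO `{principal}`",
        f"GRANT SELECT ON TABLE {ref.full_name} TO `{principal}`",
    ]


# The example from the text; `analysts` is a hypothetical group name.
payments = TableRef("finance", "transactions", "payments")
statements = read_grants(payments, "analysts")
for statement in statements:
    print(statement)
```

Granting at each level separately is what makes least-privilege setups practical: a group can be allowed into the finance catalog without seeing every schema inside it.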
What are the key differences between Unity Catalog and Hive Metastore?
Unity Catalog:
* Centralized across workspaces
* Fine-grained access control
* Built-in lineage
Hive Metastore:
* Workspace-level
* Limited governance
* No native lineage
Unity Catalog is designed for enterprise governance at scale.
What types of access control does Unity Catalog support?
* Table-level
* Column-level
* Row-level filtering
Enables:
* Data masking
* Compliance (PII protection)
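To make row-level filtering and data masking concrete, here is a plain-Python simulation of the idea. It does not call any Databricks API: the group names, the `region` and `ssn` columns, and the functions `row_filter`, `mask_ssn`, and `query` are all hypothetical. In Unity Catalog, equivalent rules are typically written as SQL functions and attached to a table as a row filter or column mask.

```python
# Hypothetical sample data standing in for a governed table.
ROWS = [
    {"id": 1, "region": "EU", "ssn": "123-45-6789", "amount": 120.0},
    {"id": 2, "region": "US", "ssn": "987-65-4321", "amount": 75.5},
]


def row_filter(row: dict, groups: set[str]) -> bool:
    """Row-level rule: admins see every row, others only their region's rows."""
    if "admins" in groups:
        return True
    return f"region_{row['region'].lower()}" in groups


def mask_ssn(value: str, groups: set[str]) -> str:
    """Column-level rule: only the pii_readers group sees the full SSN."""
    if "pii_readers" in groups:
        return value
    return "***-**-" + value[-4:]


def query(groups: set[str]) -> list[dict]:
    """Apply the row filter first, then mask the sensitive column."""
    visible = [row for row in ROWS if row_filter(row, groups)]
    return [{**row, "ssn": mask_ssn(row["ssn"], groups)} for row in visible]


eu_analyst = query({"region_eu"})
admin = query({"admins", "pii_readers"})
print(eu_analyst)
print(admin)
```

The same table yields different results depending on who asks, which is the point of fine-grained control: the policy lives with the data rather than in each consumer's query.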
How does row-level security work in Unity Catalog?
How is column-level security implemented?
Sensitive columns are masked, for example salary or SSN for non-authorized users.
What is data lineage and why is it important?
Tracks:
* Data origin
* Transformations
* Downstream dependencies
Helps with:
* Debugging
* Impact analysis
* Compliance audits
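Impact analysis over lineage amounts to walking a dependency graph. The sketch below assumes a hand-written, hypothetical lineage map; the table names other than `finance.transactions.payments` and the function `downstream` are made up for illustration. Unity Catalog captures lineage itself, so this only shows the kind of question lineage lets you answer.

```python
from collections import deque

# Hypothetical lineage edges: each table maps to the assets derived from it.
LINEAGE = {
    "finance.transactions.payments": ["finance.transactions.payments_clean"],
    "finance.transactions.payments_clean": [
        "finance.reporting.daily_revenue",
        "finance.ml.fraud_features",
    ],
    "finance.reporting.daily_revenue": ["finance.reporting.dashboard"],
    "finance.ml.fraud_features": [],
    "finance.reporting.dashboard": [],
}


def downstream(table: str, edges: dict[str, list[str]]) -> list[str]:
    """Return every asset that depends on `table`, in breadth-first order."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(edges.get(table, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        order.append(current)
        queue.extend(edges.get(current, []))
    return order


# Which assets are affected if payments_clean changes?
impacted = downstream("finance.transactions.payments_clean", LINEAGE)
print(impacted)
```

Running the same traversal from a table holding PII gives the list of downstream assets an auditor would need to review.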
What is an external location in Unity Catalog?
What are storage credentials in Unity Catalog?
What is the difference between managed and external tables?
How does Unity Catalog enable data sharing?
What is the principle of least privilege and why is it important?
How is RBAC implemented in Unity Catalog?
What is data masking and when should it be used?
How does Unity Catalog handle multiple workspaces?
Why is auditing important in data governance?
Why is defining data ownership important?
What is the tradeoff between governance and flexibility?
How would you secure PII data in Databricks?
Why is it important to understand data access patterns?
How do you enforce governance in data pipelines?
What are common mistakes in data governance?
How does Unity Catalog integrate with Delta Lake?
How would you handle a request for sensitive data access?