What happens in the data staging layer?
The ETL processes run here: they extract, integrate and clean data from operational sources to feed the data warehouse layer.
When does ETL happen?
First when a data warehouse is populated for the first time. After that, it happens again at every regular update of the data warehouse.
How is the ETL stage often defined in the three-layer architecture and why?
Reconciliation.
Which four stages are present in ETL?
*Note: The cleansing and transformation stages are often blurred together.
What is the main difference between the cleansing and the transformation stage?
Cleansing focusses on rectifying data values, whereas transformation focusses on managing data formats.
Which three types of data extraction are possible?
What happens in the cleansing stage of ETL?
Inconsistencies and mistakes that make the data dirty are removed to improve data quality.
What are the most frequent mistakes with data that require cleansing?
What are the main data cleansing features in ETL and what do they do?
-> Using specific dictionaries to rectify typing mistakes and to recognize synonyms.
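Example: dictionary-based cleansing could be sketched roughly as below. This is only a minimal illustration, not how any particular ETL tool works. The dictionaries TYPO_FIXES and SYNONYMS and the function cleanse_city are hypothetical names invented for this example.

```python
# Hypothetical dictionaries; a real ETL tool would load much larger ones.
TYPO_FIXES = {"amsterdm": "amsterdam", "roterdam": "rotterdam"}
SYNONYMS = {"nyc": "new york", "big apple": "new york"}


def cleanse_city(raw: str) -> str:
    """Normalize a city name using a typo dictionary and a synonym dictionary."""
    value = raw.strip().lower()
    value = TYPO_FIXES.get(value, value)  # rectify known typing mistakes
    value = SYNONYMS.get(value, value)    # map synonyms to one canonical term
    return value.title()
```

Values that appear in neither dictionary pass through unchanged apart from trimming and capitalization, so the step never invents a correction it does not know.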
What is rule-based cleansing?
A technique to achieve rectification and homogenization. It enforces domain-specific rules and defines appropriate associations between values.
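Example: a rough sketch of rule-based cleansing, assuming one domain rule (a quantity cannot be negative) and one association between values (each country implies one currency). The table COUNTRY_CURRENCY, the field names and the function apply_rules are all assumptions made up for this illustration.

```python
# Hypothetical association between values: each country implies one currency.
COUNTRY_CURRENCY = {"NL": "EUR", "DE": "EUR", "US": "USD"}


def apply_rules(record: dict) -> dict:
    """Return a cleansed copy of a record using two illustrative rules."""
    cleaned = dict(record)  # leave the input record untouched

    # Rule 1 (domain rule): a quantity can never be negative.
    quantity = cleaned.get("quantity")
    if quantity is not None and quantity < 0:
        cleaned["quantity"] = None  # mark as unknown rather than guess a value

    # Rule 2 (association): the currency must agree with the country.
    expected = COUNTRY_CURRENCY.get(cleaned.get("country"))
    if expected is not None and cleaned.get("currency") != expected:
        cleaned["currency"] = expected

    return cleaned
```

Whether a violating value should be corrected, set to unknown or rejected is a design choice; the sketch simply picks one option per rule.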
What happens in the transformation phase?
Data is converted from its operational source format into a specific data warehouse format.
-> In a three-layer architecture, this results in the reconciled layer.
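Example: converting a row from a source format into a target format might look like the sketch below. The source layout (DD/MM/YYYY dates, decimal commas, free-case customer codes), the target field names and the function transform_order are assumptions chosen for illustration only.

```python
from datetime import datetime


def transform_order(source_row: dict) -> dict:
    """Convert one operational-source row into a hypothetical warehouse format."""
    return {
        # Source stores dates as DD/MM/YYYY text; the target uses ISO 8601.
        "order_date": datetime.strptime(source_row["date"], "%d/%m/%Y").date().isoformat(),
        # Source stores the amount as text with a decimal comma.
        "amount_eur": float(source_row["amount"].replace(",", ".")),
        # Source uses free-case codes with stray spaces; the target uses upper case.
        "customer_code": source_row["cust"].strip().upper(),
    }
```

Note that only the format changes here; the values themselves are assumed to have been rectified already in the cleansing stage.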
Which data aspects are rectified in the transformation phase?
What are the three main transformation processes in the transformation phase?
Which transformation technique is used when populating a data warehouse after the transformation phase?
Denormalization. Aggregation is also required, so that the data in the data warehouse is summed up properly for end users.
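Example: denormalization followed by aggregation can be sketched on two tiny in-memory tables. The tables products and sales and the functions denormalize and total_by_category are hypothetical; a real warehouse would do this in SQL or in an ETL tool.

```python
from collections import defaultdict

# Hypothetical normalized tables, as they might sit in the reconciled layer.
products = {
    1: {"name": "Pen", "category": "Office"},
    2: {"name": "Desk", "category": "Furniture"},
    3: {"name": "Ink", "category": "Office"},
}
sales = [
    {"product_id": 1, "amount": 2.0},
    {"product_id": 2, "amount": 150.0},
    {"product_id": 3, "amount": 8.0},
]


def denormalize(sales_rows, product_table):
    """Copy the product attributes into every sales row (a join)."""
    return [{**row, **product_table[row["product_id"]]} for row in sales_rows]


def total_by_category(flat_rows):
    """Aggregate the denormalized rows: sum the amounts per category."""
    totals = defaultdict(float)
    for row in flat_rows:
        totals[row["category"]] += row["amount"]
    return dict(totals)
```

Calling total_by_category(denormalize(sales, products)) gives one summed amount per category, which is the kind of summarized view an end user would query.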
Which two ways of loading (into the data warehouse) exist?