What is a data lake?
Data swamp
Highly disorganised data repository
Data lake (RUDEAS)
Data Warehouse (DDRUSS)
3 Techniques of big data integration (DRS)
Schema Mapping
Create a mediated global schema that is relevant to the business
Identify mappings between the mediated schema and the data sources
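A minimal sketch of the two schema-mapping steps above, assuming two made-up sources (`crm`, `billing`) and a made-up three-attribute global schema. Every name here is hypothetical and only illustrates the idea of translating source records into one mediated schema.

```python
# Hypothetical mediated (global) schema: the attributes the business cares about.
GLOBAL_SCHEMA = ("customer_id", "full_name", "email")

# Hypothetical mappings, one per source: source attribute -> global attribute.
MAPPINGS = {
    "crm": {"id": "customer_id", "name": "full_name", "mail": "email"},
    "billing": {"cust_no": "customer_id", "customer": "full_name", "email_addr": "email"},
}


def to_global(source, record):
    """Translate one source record into the mediated schema."""
    mapping = MAPPINGS[source]
    # Keep only the attributes that have a mapping; rename them to global names.
    translated = {mapping[key]: value for key, value in record.items() if key in mapping}
    # Attributes the source does not provide are filled with None.
    return {attr: translated.get(attr) for attr in GLOBAL_SCHEMA}


if __name__ == "__main__":
    print(to_global("crm", {"id": 1, "name": "Ada", "mail": "ada@example.org"}))
    print(to_global("billing", {"cust_no": 7, "customer": "Grace"}))
```

Unmapped source attributes are dropped and missing global attributes become `None`, so every translated record has the same shape regardless of where it came from.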
Record Linkage
Identify records that refer to the same logical entity across different data sources
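A rough sketch of record linkage by comparing every pair of records across two sources. The record layout (`id`, `name`), the name normalisation, and the 0.85 similarity threshold are all assumptions chosen for illustration; string similarity comes from the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case the name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def similarity(a, b):
    """Similarity of two names in [0, 1] after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def link_records(source_a, source_b, threshold=0.85):
    """Return (id_a, id_b) pairs whose names are similar enough.

    The threshold is an illustrative assumption, not a recommended value.
    Comparing every pair costs len(source_a) * len(source_b) comparisons.
    """
    links = []
    for rec_a in source_a:
        for rec_b in source_b:
            if similarity(rec_a["name"], rec_b["name"]) >= threshold:
                links.append((rec_a["id"], rec_b["id"]))
    return links


if __name__ == "__main__":
    a = [{"id": "a1", "name": "Ada Lovelace"}, {"id": "a2", "name": "Grace Hopper"}]
    b = [{"id": "b1", "name": "ada  LOVELACE"}, {"id": "b2", "name": "Alan Turing"}]
    print(link_records(a, b))
```

Because the nested loop compares all pairs, this only suits small inputs; it is meant to show what "same logical entity" looks like in code, not how to do it at scale.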
3 Record Linkage techniques (PCB)
Data Fusion
A combination of techniques that aims to resolve conflicts among a collection of sources to find the truth
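One simple way to picture conflict resolution is majority voting: for each attribute, keep the value that most sources agree on. The sketch below is only an illustration under that assumption; the source names, attributes, and the `fuse` helper are made up, and it ignores source reliability entirely.

```python
from collections import Counter


def fuse(claims):
    """Resolve conflicts by majority vote.

    `claims` maps a source name to the {attribute: value} pairs it asserts
    about one entity. Returns {attribute: most frequently claimed value}.
    None values are treated as "no claim" and skipped.
    """
    votes = {}
    for source_claims in claims.values():
        for attr, value in source_claims.items():
            if value is not None:
                votes.setdefault(attr, Counter())[value] += 1
    # most_common(1) returns the single (value, count) pair with the top count.
    return {attr: counter.most_common(1)[0][0] for attr, counter in votes.items()}


if __name__ == "__main__":
    claims = {
        "s1": {"city": "Paris", "zip": "75001"},
        "s2": {"city": "Paris", "zip": "75002"},
        "s3": {"city": "Lyon", "zip": "75001"},
    }
    print(fuse(claims))
```

Plain voting treats every source as equally trustworthy, so a value copied across many sources can outvote a single correct one.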
Data Fusion (techniques) (CSV)
Big Data Processing Pipeline 3 steps