Types of structured data
Characteristics of unstructured data
Types of variables
Types of ML techniques
Types of missing values in data cleaning
Missing values can be …
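MCAR (missing completely at random): missingness is unrelated to any variable, observed or unobserved
MAR (missing at random): missingness is predictable from other observed variables
MNAR (missing not at random): missingness depends on unobserved variables or on the missing value itself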
-> Imputation (substitution) by mean, median, stratified mean or median (records sorted into groups, then imputed per group), or regression
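A minimal sketch of mean, median, and stratified imputation using only the Python standard library. The function names `impute` and `impute_stratified` and the toy `income`/`region` data are hypothetical, chosen only for illustration; regression imputation is left out to keep the example short.

```python
from statistics import mean, median

def impute(values, strategy=mean):
    """Replace None entries with a statistic of the observed values."""
    observed = [v for v in values if v is not None]
    fill = strategy(observed)
    return [fill if v is None else v for v in values]

def impute_stratified(values, groups, strategy=mean):
    """Replace None entries with a statistic computed within each group.

    Assumes every group has at least one observed value.
    """
    observed_by_group = {}
    for v, g in zip(values, groups):
        if v is not None:
            observed_by_group.setdefault(g, []).append(v)
    fills = {g: strategy(obs) for g, obs in observed_by_group.items()}
    return [fills[g] if v is None else v for v, g in zip(values, groups)]

# Hypothetical toy data: incomes with gaps, stratified by region.
income = [30, None, 50, 100, None, 180]
region = ["a", "a", "a", "b", "b", "b"]

mean_filled = impute(income, mean)                      # gaps filled with the overall mean
median_filled = impute(income, median)                  # gaps filled with the overall median
strat_filled = impute_stratified(income, region, mean)  # gaps filled with the per-region mean
```

The stratified variant fills each gap from records in the same group, so it can preserve between-group differences that a single overall mean or median would flatten.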
Big Data Characteristics
ML vs. Statistics
Statistics vs. ML
Building a model
Partition Data set
(Training, Validation, Test)
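A minimal sketch of a three-way partition, assuming a simple random shuffle is acceptable for the data at hand. The function name `partition`, the 60/20/20 split ratios, and the fixed seed are assumptions for illustration, not values from these notes.

```python
import random

def partition(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle rows and split them into training, validation, and test sets.

    Whatever is left after the training and validation shares goes to test.
    """
    shuffled = list(rows)                      # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)      # seeded for reproducibility
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Hypothetical data set of ten record ids.
rows = list(range(10))
train_set, validation_set, test_set = partition(rows)
```

The training set is used to fit the model, the validation set to compare or tune candidate models, and the test set is held back for a final estimate of performance on unseen data.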
Cross-Industry Standard Process for Data Mining (CRISP-DM):
Business understanding
Data understanding
Data preparation
Modelling
Evaluation
Deployment
Application of ML
Prediction (financial markets)
Insurance
Credit Scoring
Fraud Detection
Consumer Credit and Marketing, CRM (Classification?)