Data science tasks can be split into two groups
Unsupervised Methods
Supervised Methods
- Causal (Explanatory) Modeling
Predictive modeling
is the process of applying a statistical model or data mining algorithm to data for the purpose of predicting new or future observations.
Predictive modeling is a method for estimating an unknown value of interest, which is called target
Causal (Explanatory) Modeling
is the use of statistical models for explaining how the world works (by testing causal explanations)
Why Empirical Explanation and Empirical Prediction Differ?
Linear Regression is…
an approach for modeling the relationship between a dependent variable and one or more explanatory variables
Logistic regression can also be used for classification
Decision Trees
The process of recursively segmenting the population stops when at least one of the following conditions is met:
How to evaluate if a model is a good model? How do we compare two models?
It is important to consider carefully what would we like to achieve when building a model
This seems simple but many times we see some statistics reported without a clear understanding why this is the right statistic
Now: Basic performance metrics
Later: More sensible performance metrics
Accuracy is …
the proportion of correct decisions made by the classifier.
Accuracy is a popular metric because …
it is very simple to calculate
Error rate is …
the proportion or wrong decisions made by the classifier.