Bias-Variance tradeoff
When searching for the optimal model we are in fact trying to find
the optimal tradeoff between bias and variance
Bias-Variance tradeoff
We can reduce variance
by putting many models together and aggregating their outcomes
Bagging (or bootstrap aggregation) creates
multiple data sets from the original training data by bootstrapping – re-sampling with replacement.
It then trains a model on each data set and aggregates their outputs with a voting system
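The bagging procedure can be sketched in a few lines of Python. The toy data set and the one-split "decision stump" base model below are hypothetical illustration choices, not part of the slides:

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical toy data: (feature value, label) pairs, with two noisy points.
data = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1), (3.5, 1), (4.5, 0)]

def bootstrap(sample):
    """Re-sample with replacement, same size as the original data set."""
    return [random.choice(sample) for _ in sample]

def train_stump(sample):
    """Base model: a one-split 'decision stump' (predict 1 if x > threshold)."""
    best_t, best_err = None, float("inf")
    for t in sorted({x for x, _ in sample}):
        err = sum((x > t) != y for x, y in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bagged_predict(stumps, x):
    """Aggregate the stumps' outputs with a majority vote."""
    votes = Counter(int(x > t) for t in stumps)
    return votes.most_common(1)[0][0]

# Bagging: one model per bootstrapped data set, then vote.
stumps = [train_stump(bootstrap(data)) for _ in range(25)]
print(bagged_predict(stumps, 1.0), bagged_predict(stumps, 7.0))
```

Each stump sees a slightly different data set, so the noisy points affect only some of them; the vote averages that variance away.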
Other ensemble methods
Random Forest
combines bagging with random selection of features (or predictors)
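A minimal sketch of that combination, again on hypothetical toy data: each "tree" sees a bootstrap sample of the rows and a random subset of the features (here one of two):

```python
import random
from collections import Counter

random.seed(1)

# Hypothetical toy data: two features, both correlated with the label.
X = [[1, 2], [2, 1], [3, 3], [6, 7], [7, 6], [8, 8]]
y = [0, 0, 0, 1, 1, 1]

def train_tree(X, y, n_feats=1):
    """Bootstrap the rows, then split on a random subset of the features."""
    rows = [random.randrange(len(X)) for _ in X]       # bagging step
    feats = random.sample(range(len(X[0])), n_feats)   # random feature selection
    best = None                                        # (feature, threshold, error)
    for f in feats:
        for t in {X[i][f] for i in rows}:
            err = sum((X[i][f] > t) != y[i] for i in rows)
            if best is None or err < best[2]:
                best = (f, t, err)
    return best[:2]

def forest_predict(trees, x):
    """Majority vote over the ensemble."""
    votes = Counter(int(x[f] > t) for f, t in trees)
    return votes.most_common(1)[0][0]

trees = [train_tree(X, y) for _ in range(31)]
print(forest_predict(trees, [0, 0]), forest_predict(trees, [9, 9]))
```

The random feature selection de-correlates the trees: two trees that see different features cannot make the same mistake in the same way, which makes the vote more effective.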
Other ensemble methods
Boosting
applies classifiers sequentially, assigning higher weights to observations that have been mis-classified by the previous models
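An AdaBoost-style sketch of this reweighting, on a hypothetical toy example (labels in {-1, +1}, as is conventional for AdaBoost):

```python
import math

# Hypothetical toy data, separable at x = 3.
xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, -1, 1, 1, 1]
w = [1 / len(xs)] * len(xs)            # start with uniform observation weights

def weighted_stump(xs, ys, w):
    """Threshold classifier minimizing the weighted error."""
    best = None                        # (threshold, sign, weighted error)
    for t in xs:
        for sign in (+1, -1):          # predict `sign` if x > t, else `-sign`
            err = sum(wi for x, yv, wi in zip(xs, ys, w)
                      if (sign if x > t else -sign) != yv)
            if best is None or err < best[2]:
                best = (t, sign, err)
    return best

ensemble = []
for _ in range(3):                     # three boosting rounds
    t, sign, err = weighted_stump(xs, ys, w)
    err = max(err, 1e-10)              # avoid log(0) on a perfect fit
    alpha = 0.5 * math.log((1 - err) / err)
    ensemble.append((alpha, t, sign))
    # Up-weight mis-classified observations so the next stump focuses on them.
    w = [wi * math.exp(-alpha * yv * (sign if x > t else -sign))
         for x, yv, wi in zip(xs, ys, w)]
    total = sum(w)
    w = [wi / total for wi in w]       # re-normalize the weights

def boosted_predict(x):
    """Weighted vote of the sequence of stumps."""
    score = sum(a * (s if x > t else -s) for a, t, s in ensemble)
    return 1 if score > 0 else -1

print(boosted_predict(1), boosted_predict(6))
```

Unlike bagging, the models are not independent: each round's weights depend on the previous round's mistakes.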
A table model
memorizes the training data and performs no generalization
Useless in practice! Previously unseen customers would all end up with
“0% likelihood of churning”
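A table model in code is just a lookup; the customer names and the default value below are hypothetical:

```python
# A "table model": memorize (customer, churned) pairs from the training data.
training = {"alice": 1, "bob": 0, "carol": 1}

def table_model(customer):
    # No generalization: previously unseen customers fall through to the
    # default, i.e. 0% likelihood of churning.
    return training.get(customer, 0)

print(table_model("alice"))  # memorized from the training data -> 1
print(table_model("dave"))   # previously unseen -> 0
```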
Generalization
is the property of a model or modeling process whereby
the model applies to data that were not used to build the model
If models do not generalize at all, they fit perfectly to the training data!
→ they overfit
Overfitting
is the tendency to tailor models to the training data, at the expense of generalization to previously unseen data points.
Holdout Validation
evaluates the model on data held out from training, to estimate how well it generalizes
As a model gets more complex, it is allowed to pick up harmful spurious correlations
This phenomenon is not particular to decision trees
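A sketch of holdout validation on hypothetical noisy data, comparing an overfit model (1-nearest-neighbour, which memorizes the training set) against a simple fixed threshold:

```python
import random

random.seed(0)

# Hypothetical data: the true rule is x > 0.5, but 20% of the labels are flipped.
def make_data(n):
    return [(x, int((x > 0.5) != (random.random() < 0.2)))
            for x in (random.random() for _ in range(n))]

train, holdout = make_data(150), make_data(50)   # the holdout set is never fit

def nn_predict(x):
    """Overfit model: 1-nearest-neighbour memorizes every training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def stump_predict(x):
    """Simple model: a single fixed threshold."""
    return int(x > 0.5)

def accuracy(model, sample):
    return sum(model(x) == y for x, y in sample) / len(sample)

for model in (nn_predict, stump_predict):
    print(accuracy(model, train), accuracy(model, holdout))
```

The memorizing model is perfect on the training data but much weaker on the holdout set; that gap between training and holdout accuracy is the signature of overfitting.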
Simplest method to limit tree size:
specify a minimum number of instances that must be present in a leaf
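A sketch of this rule on a hypothetical toy set with one noisy point: the tree may only split where both children keep at least `min_leaf` instances:

```python
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def grow(points, min_leaf):
    """points: list of (x, label). Greedily grown one-feature binary tree;
    a split is only accepted if BOTH children keep >= min_leaf instances."""
    labels = [y for _, y in points]
    if len(set(labels)) == 1:
        return ("leaf", labels[0])              # pure node
    best = None
    for t in sorted({x for x, _ in points}):
        left = [p for p in points if p[0] <= t]
        right = [p for p in points if p[0] > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue                            # split rejected: leaf too small
        err = sum(y != majority([b for _, b in side])
                  for side in (left, right) for _, y in side)
        if best is None or err < best[0]:
            best = (err, t, left, right)
    if best is None:                            # no admissible split: stop here
        return ("leaf", majority(labels))
    _, t, left, right = best
    return ("node", t, grow(left, min_leaf), grow(right, min_leaf))

def predict(tree, x):
    while tree[0] == "node":
        tree = tree[2] if x <= tree[1] else tree[3]
    return tree[1]

# Hypothetical toy data with a single noisy point at x = 2.5.
data = [(1, 0), (2, 0), (2.5, 1), (3, 0), (3.5, 0), (4, 1), (5, 1), (6, 1)]
tree_overfit = grow(data, min_leaf=1)   # isolates the noisy point
tree_pruned = grow(data, min_leaf=2)    # the isolating split is rejected
print(predict(tree_overfit, 2.5), predict(tree_pruned, 2.5))
```

With `min_leaf=1` the tree carves out the noisy point at x = 2.5 and memorizes its label; with `min_leaf=2` that split is rejected and the point is absorbed into a larger leaf.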
Just as with trees, as you increase the dimensionality,
you can perfectly fit larger and larger sets of arbitrary points
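The same effect can be demonstrated with a polynomial model: a degree-(n-1) polynomial, i.e. an n-parameter model, fits any n points exactly. The Lagrange interpolation below is a standard construction; the random targets are hypothetical:

```python
import random

def lagrange(points):
    """Return the degree-(n-1) polynomial passing through n points (xi distinct)."""
    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= (x - xj) / (xi - xj)   # basis polynomial factor
            total += term
        return total
    return f

random.seed(0)
pts = [(i, random.random()) for i in range(8)]   # 8 arbitrary target values
f = lagrange(pts)
# Maximum fit error over the 8 points: 0 (up to float rounding).
print(max(abs(f(x) - y) for x, y in pts))
```

Enough parameters always buy a perfect fit on the points you train on, which says nothing about the points you have not seen.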
Why is overfitting bad?
A small imbalance in the training data can be 'learned' by the tree and erroneously propagated
Why is the phenomenon of overfitting not particular to decision trees?
- There is no general analytic way to avoid overfitting