Bagging - what is it? when do we do it?
Bootstrap Aggregating. In the bootstrap, we replicate our dataset by sampling from it with replacement. In bagging, we average the predictions of a model fit to many bootstrap samples; averaging reduces variance.
when do we bag? when the base model has low bias but high variance (e.g. deep decision trees): averaging many bootstrap fits cuts the variance without changing the bias much.
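A minimal numpy-only sketch of the idea (toy code, not any library's API; `fit_stump` and `bag` are names I made up, and the base learner is a one-split regression stump):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(X, y):
    """Fit a single-split regression tree by minimizing RSS over a few
    candidate thresholds per feature; return a predict function."""
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            pred = np.where(left, y[left].mean(), y[~left].mean())
            rss = ((y - pred) ** 2).sum()
            if best is None or rss < best[0]:
                best = (rss, j, t, y[left].mean(), y[~left].mean())
    _, j, t, lo, hi = best
    return lambda Xn: np.where(Xn[:, j] <= t, lo, hi)

def bag(X, y, B=25):
    """Bagging: fit one stump per bootstrap sample, average the predictions."""
    n = len(y)
    stumps = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)   # sample n rows WITH replacement
        stumps.append(fit_stump(X[idx], y[idx]))
    return lambda Xn: np.mean([s(Xn) for s in stumps], axis=0)

# toy data where only the first feature matters
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(float) + 0.1 * rng.normal(size=200)
f_bag = bag(X, y)
```

Averaging the B stump predictions smooths out the instability of any single bootstrap fit.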
Bagging Decision Trees
disadvantage: bagging sacrifices the easy interpretability of a single tree, since each tree fit to a BS sample can be completely different. to still measure variable importance:
1. for each predictor, add up the total amount by which the RSS (for regression; the Gini index for classification) decreases every time we split on that predictor in the b-th tree T_b
2. average this total over all B bootstrap trees; a large average decrease marks an important predictor
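The two steps above, sketched in plain numpy with one-split trees (my own toy code; `best_split` is a made-up helper that returns the chosen feature and its RSS decrease):

```python
import numpy as np

rng = np.random.default_rng(1)

def best_split(X, y):
    """Return (feature index, RSS decrease) of the best single split."""
    base = ((y - y.mean()) ** 2).sum()        # RSS with no split
    best = (0.0, 0)
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            rss = ((y[left] - y[left].mean()) ** 2).sum() + \
                  ((y[~left] - y[~left].mean()) ** 2).sum()
            if base - rss > best[0]:
                best = (base - rss, j)
    return best[1], best[0]

X = rng.normal(size=(150, 3))
y = 2 * X[:, 1] + 0.3 * rng.normal(size=150)  # only feature 1 matters

B, p = 40, X.shape[1]
importance = np.zeros(p)
for _ in range(B):                            # step 1: accumulate per tree
    idx = rng.integers(0, len(y), size=len(y))
    j, drop = best_split(X[idx], y[idx])
    importance[j] += drop
importance /= B                               # step 2: average over B trees
```

On this toy data the importance score of feature 1 dominates the two noise features.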
Out of Bag Error
to estimate the test error of a bagged model we could use CV, but each bootstrap sample contains only ~63% of the distinct observations: the chance a given observation is never drawn in n draws with replacement is (1 - 1/n)^n ≈ e^(-1) ≈ 0.37.
Idea: for each observation, average the predictions of only the trees whose BS sample did not contain it (its "out-of-bag" trees). The resulting OOB error is an honest test-error estimate, with no extra CV needed.
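Both points can be checked numerically; a plain-numpy sketch (toy code, and the "model" here is just the in-bag sample mean, standing in for a tree):

```python
import numpy as np

rng = np.random.default_rng(2)
n, B = 500, 200

# 1) each bootstrap sample contains ~63% of the distinct observations
in_bag = rng.integers(0, n, size=(B, n))       # B samples with replacement
frac_unique = np.mean([len(np.unique(s)) / n for s in in_bag])
# frac_unique comes out close to 1 - 1/e ~ 0.632

# 2) OOB prediction: for each observation, average only the models
#    whose bootstrap sample did NOT contain it
y = rng.normal(loc=5.0, size=n)
preds = np.empty((B, n))
oob = np.zeros((B, n), dtype=bool)
for b in range(B):
    preds[b, :] = y[in_bag[b]].mean()          # base model: in-bag mean
    oob[b] = ~np.isin(np.arange(n), in_bag[b])
oob_pred = (preds * oob).sum(axis=0) / oob.sum(axis=0)
oob_mse = np.mean((y - oob_pred) ** 2)         # honest test-error estimate
```

Since y here is pure noise around a constant, the OOB MSE lands near the variance of y, as an honest error estimate should.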
The problem with bagging trees
the trees built from different BS samples can be highly correlated (e.g. one strong predictor grabs the top split in most trees), and averaging correlated trees reduces variance much less than averaging independent ones would
Random Forest
fix: at each split, consider only a random subset of m ≈ sqrt(p) of the p predictors. this decorrelates the trees, so their average has lower variance; taking m = p recovers plain bagging.
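A toy numpy sketch of the random-forest tweak, restricting each split to a random subset of m ≈ sqrt(p) predictors (my own minimal code, not a library API; the "trees" are one-split stumps):

```python
import numpy as np

rng = np.random.default_rng(3)

def fit_stump(X, y, features):
    """Best single split, restricted to the given candidate features."""
    best = None
    for j in features:
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            pred = np.where(left, y[left].mean(), y[~left].mean())
            rss = ((y - pred) ** 2).sum()
            if best is None or rss < best[0]:
                best = (rss, j, t, y[left].mean(), y[~left].mean())
    _, j, t, lo, hi = best
    return lambda Xn: np.where(Xn[:, j] <= t, lo, hi)

def random_forest(X, y, B=60):
    n, p = X.shape
    m = max(1, int(np.sqrt(p)))                # m ~ sqrt(p) candidates
    stumps = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)       # bootstrap sample, as in bagging
        feats = rng.choice(p, size=m, replace=False)  # the RF tweak
        stumps.append(fit_stump(X[idx], y[idx], feats))
    return lambda Xn: np.mean([s(Xn) for s in stumps], axis=0)

# toy data: two of four features carry signal
X = rng.normal(size=(150, 4))
y = X[:, 0] + X[:, 2] + 0.2 * rng.normal(size=150)
f_rf = random_forest(X, y)
```

Because each stump sees a different feature subset, the weaker signal feature gets used too instead of being shadowed by the strongest one in every tree.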
Boosting (don’t really understand this)
set fhat(x) = 0, and ri = yi for i = 1, ..., n
for b = 1 to B iterate:
- fit a decision tree fhat_b with d splits to the residuals r1, ..., rn
- update the model: fhat(x) <- fhat(x) + lambda*fhat_b(x)
- update the residuals: r_i <- r_i - lambda*fhat_b(x_i)
output the final model fhat(x) = sum over b of lambda*fhat_b(x)
boosting learns slowly: each new tree is fit to the current residuals rather than to y, so each step concentrates on the cases the model so far predicts poorly, and the shrinkage lambda slows the learning down further
tuning parameters: the shrinkage lambda (typically something small like 0.01 or 0.001), the number of splits d (d = 1 stumps often work well), and the number of trees B (unlike bagging, boosting can overfit if B is too large, so choose B by CV)
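The boosting loop can be sketched in plain numpy with d = 1 stumps (toy code, not a library API; `fit_stump` and `boost` are my names):

```python
import numpy as np

rng = np.random.default_rng(4)

def fit_stump(X, y):
    """Best single-split (d = 1) regression tree; returns a predict function."""
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            pred = np.where(left, y[left].mean(), y[~left].mean())
            rss = ((y - pred) ** 2).sum()
            if best is None or rss < best[0]:
                best = (rss, j, t, y[left].mean(), y[~left].mean())
    _, j, t, lo, hi = best
    return lambda Xn: np.where(Xn[:, j] <= t, lo, hi)

def boost(X, y, B=300, lam=0.1):
    r = y.copy()                       # fhat = 0, residuals r_i = y_i
    stumps = []
    for _ in range(B):                 # for b = 1..B:
        f_b = fit_stump(X, r)          #   fit a small tree to the residuals
        r = r - lam * f_b(X)           #   r_i <- r_i - lambda*f_b(x_i)
        stumps.append(f_b)
    # final model: fhat(x) = sum over b of lambda*f_b(x)
    return lambda Xn: lam * np.sum([s(Xn) for s in stumps], axis=0)

X = rng.normal(size=(200, 2))
y = X[:, 0] + 0.1 * rng.normal(size=200)
f_boost = boost(X, y)
```

Each stump only nibbles a lambda-sized bite out of the residuals, which is exactly the "learns slowly" behavior: the training error shrinks gradually over the B rounds.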