Recap:
What is (strong) PAC learnability?
What is weak PAC learning?
Which one must imply the other?
The question becomes: can we turn a weak PAC learning algorithm into a strong one?
What is the difference between Bagging and Boosting?
Bagging is done in parallel, boosting is done sequentially
What was Robert Schapire’s rough sketch of how to implement boosting?
What is the adaptive boosting (AdaBoost) algorithm?
For each iteration, you minimise the weighted empirical risk to fit a new classifier, then assign that classifier a contribution factor (its weight in the ensemble); the final classifier is the weighted combination of all the fitted classifiers.
What loss function do you use here for AdaBoost?
Exponential loss: L(y, f(x)) = exp(-y f(x))
Is there a closed-form solution?
What is a decision stump?
decision stump -> a decision tree with only one split
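The AdaBoost loop described above (fit a weak learner to the weighted empirical risk, give it a contribution factor, reweight the examples) can be illustrated with decision stumps as the weak learners. A minimal NumPy sketch on 1-D data — the dataset, threshold grid, and function names are my own illustrations, not from the course:

```python
import numpy as np

def fit_adaboost(X, y, n_rounds):
    """Minimal AdaBoost with decision stumps on 1-D data.
    X: (n,) floats; y: (n,) labels in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)                 # example weights D_t
    xs = np.sort(np.unique(X))
    # candidate thresholds: midpoints, plus sentinels so constant stumps exist
    thresholds = np.concatenate(([xs[0] - 1.0],
                                 (xs[:-1] + xs[1:]) / 2.0,
                                 [xs[-1] + 1.0]))
    stumps, alphas = [], []
    for _ in range(n_rounds):
        best = None
        for theta in thresholds:
            for p in (+1, -1):              # polarity: predict p when x <= theta
                pred = np.where(X <= theta, p, -p)
                err = w[pred != y].sum()    # weighted empirical risk
                if best is None or err < best[0]:
                    best = (err, theta, p, pred)
        err, theta, p, pred = best
        # contribution factor of this round's stump (closed form)
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * pred)      # upweight misclassified examples
        w /= w.sum()
        stumps.append((theta, p))
        alphas.append(alpha)
    return stumps, alphas

def predict(stumps, alphas, X):
    """Sign of the weighted combination of the stumps."""
    F = sum(a * np.where(X <= t, p, -p) for (t, p), a in zip(stumps, alphas))
    return np.sign(F)
```

On X = [0, 1, 2, 3] with labels [+1, -1, -1, +1], no single stump is perfect (the best one errs on one of the four points), but in this small example three boosted stumps already classify the training set exactly — the weak learners combine into a strong one.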
What are the properties of AdaBoost?
What is the VC dimension of boosting combinations of m weak classifiers?
If we use AdaBoost in the discrete classification setting, what do we use in the continuous regression setting?
What is the general idea behind this process?
What is the idea behind Gradient Boosting?
What is the Gradient Boost Algorithm?
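A minimal sketch of gradient boosting for regression, assuming the standard squared-loss, residual-fitting formulation: each round fits a one-split regression stump to the current residuals (the negative gradient of ½(y − F(x))²) and takes a small step along it. The dataset, learning rate, and names are illustrative, not from the course:

```python
import numpy as np

def fit_gradient_boost(X, y, n_rounds=100, lr=0.1):
    """Gradient boosting for regression with squared loss on 1-D data.
    Each round fits a one-split regression stump to the residuals,
    which are the negative gradient of 1/2 * (y - F(x))^2 in F."""
    F = np.full(len(X), y.mean())           # start from the constant model
    xs = np.sort(np.unique(X))
    thresholds = (xs[:-1] + xs[1:]) / 2.0   # candidate split points
    for _ in range(n_rounds):
        r = y - F                           # residuals = negative gradient
        best = None
        for theta in thresholds:            # best least-squares stump on r
            left = X <= theta
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, theta, lv, rv)
        _, theta, lv, rv = best
        F = F + lr * np.where(X <= theta, lv, rv)  # small step along the stump
    return F
```

Unlike AdaBoost, there is no example reweighting here: each new weak learner is fitted to the pointwise negative gradient of the loss, so the same recipe works for any differentiable loss, with squared loss giving plain residuals.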
What are the two ensemble methods we have considered in the course, and what are their focuses?
Bagging can be done in parallel, but boosting must be done sequentially