Chapter 6.3: Cross-validation, bootstrap and tree-based methods - Boosting Flashcards

(12 cards)

1
Q

Recap:

What is PAC(strong) learnability?

What is weak PAC learning?

Which one must imply the other?

A

A strong PAC learner: for every ε, δ > 0, given enough samples it outputs, with probability at least 1 − δ, a hypothesis with error at most ε.

A weak PAC learner only has to beat random guessing by some fixed margin γ > 0, i.e. achieve error at most 1/2 − γ (with probability at least 1 − δ).

Strong learnability must imply weak learnability. The question becomes: can we turn a weak PAC learning algorithm into a strong one? (Boosting shows that we can.)

2
Q

What is the difference between Bagging and Boosting?

A

Bagging is done in parallel (each classifier is trained independently on its own bootstrap sample), while boosting is done sequentially (each classifier depends on the errors of the previous ones).

3
Q

What was Robert Schapire’s rough sketch of how to implement boosting?

A
  • Train f-hat1 on the data, train f-hat2 with emphasis on the points f-hat1 misclassifies, and train f-hat3 on the points where f-hat1 and f-hat2 disagree.
  • f-hat3 acts as the referee when the other two don't agree: whatever it says becomes the classification, due to the majority vote.
  • In our case the three classifiers are NOT independent, as each is trained on the points misclassified by the previous one.
4
Q

What is the adaptive boosting (AdaBoost) algorithm?

A

At each iteration you minimise the weighted empirical risk to obtain a classifier, then assign it a contribution factor (based on its weighted error) in the final weighted classification function; before the next iteration, the weights of misclassified points are increased. The final classifier is the sign of the contribution-weighted sum of the individual classifiers.
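A minimal NumPy sketch of these steps. The one-split stump weak learner and the toy data are illustrative assumptions, not from the course:

```python
import numpy as np

def fit_stump(X, y, w):
    """Hypothetical weak learner: one-feature threshold stump minimising
    the weighted 0/1 error (labels assumed to be in {-1, +1})."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] <= t, -1, 1)
                err = np.sum(w * (pred != y))
                if err < best_err:
                    best_err, best = err, (j, t, sign)
    j, t, sign = best
    return lambda Xq: sign * np.where(Xq[:, j] <= t, -1, 1)

def adaboost(X, y, n_rounds=20):
    n = len(y)
    w = np.full(n, 1.0 / n)                    # start with uniform weights
    hs, alphas = [], []
    for _ in range(n_rounds):
        h = fit_stump(X, y, w)                 # minimise weighted empirical risk
        pred = h(X)
        eps = np.sum(w * (pred != y))          # weighted training error
        if eps == 0 or eps >= 0.5:             # no longer a useful weak learner
            break
        alpha = 0.5 * np.log((1 - eps) / eps)  # contribution factor
        w = w * np.exp(-alpha * y * pred)      # up-weight misclassified points
        w = w / w.sum()
        hs.append(h)
        alphas.append(alpha)
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in zip(alphas, hs)))
```

No single stump can classify the pattern −1, +1, +1, −1 on a line, but three boosted stumps can, which illustrates the point of combining weak learners.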

5
Q

What loss function do you use here for AdaBoost?

A
  • AdaBoost uses the exponential loss L(Y, f(X)) = exp(−Y f(X)).
  • Under this loss a point's weight goes down when the classifier has the same sign as Y (correct classification) and up when the signs differ (misclassification).
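A tiny numerical check of that sign behaviour (the value of alpha here is an arbitrary positive contribution factor, chosen just for illustration):

```python
import numpy as np

# Under the exponential loss exp(-y*f(x)), AdaBoost multiplies a point's
# weight by exp(-alpha * y * h(x)) after adding classifier h with
# contribution factor alpha > 0.
alpha = 0.5
same_sign_factor = np.exp(-alpha * 1.0)       # h(x) agrees with y: y*h(x) = +1
opposite_sign_factor = np.exp(-alpha * -1.0)  # h(x) disagrees:     y*h(x) = -1
```

The agreeing factor is below 1 (weight shrinks) and the disagreeing factor is above 1 (weight grows), exactly the up/down behaviour the card describes.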

6
Q

Is there a closed-form solution to the weighted risk minimisation in AdaBoost?

A

Yes: for the exponential loss, the optimal contribution factor of a weak classifier with weighted error ε_t is α_t = (1/2) log((1 − ε_t)/ε_t), and the weight update is multiplication by exp(−α_t Y f-hat_t(X)) followed by normalisation.
7
Q

What is a decision stump?

A

A decision stump is a decision tree with only one split (depth one).

8
Q

What are the properties of AdaBoost?

What is the VC dimension of boosting combinations of m weak classifiers?

A

  • If every weak classifier beats random guessing by a margin γ_t (error at most 1/2 − γ_t), AdaBoost's training error decreases exponentially fast: it is bounded by exp(−2 Σ_t γ_t²).
  • For a base class of VC dimension d, the class of weighted majority votes of m weak classifiers has VC dimension of order roughly m·d·log(m·d), so capacity grows with the number of boosting rounds and overfitting must be controlled (e.g. by early stopping).
9
Q

If we can use AdaBoosting in the discrete classification setting, what do we use in the continuous regression setting?

What is the general idea behind this process?

A
  • Gradient boosting: instead of reweighting points, each new base learner is fitted to the negative gradient of the loss at the current fit (gradient descent in function space).
10
Q

What is the idea behind Gradient Boosting?

A
  • Note that for the squared loss function the theoretical risk minimiser is f* = E[Y|X]; fitting each update to the negative gradient of the squared loss essentially guarantees we are minimising the risk under the L2 loss.
  • It is called a pseudo-residual because, with L = (Y − θ)², the partial derivative is ∂L/∂θ = −2(Y − θ) = 2(f-hat(X) − Y); so you are not looking at the residual itself but at (a constant multiple of) the discrepancy between Y and your previously fitted function.
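A quick numerical check that, for the squared loss, the negative gradient is just the residual up to a constant (the values of y and the hypothetical current fit f_hat are made up for illustration):

```python
import numpy as np

# For the squared loss L = (y - f)**2, dL/df = -2*(y - f), so the
# negative gradient is 2*(y - f): proportional to the ordinary residual.
def neg_grad_squared_loss(y, f):
    return 2.0 * (y - f)

y = np.array([1.0, 2.0, 3.0])
f_hat = np.array([0.5, 2.5, 2.0])  # hypothetical previously fitted values
pseudo_residuals = neg_grad_squared_loss(y, f_hat)
```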
11
Q

What is the Gradient Boost Algorithm?

A
  • Start from the best constant fit f-hat_0; at each round, compute the pseudo-residuals of the current fit, fit a base learner to them, and add a small (learning-rate-scaled) multiple of that learner to the current fit.
  • At some point you risk overfitting if you continue: you keep fitting to the residuals, which reduces the bias but increases the variance.
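A minimal NumPy sketch of gradient boosting for the squared loss. The one-split regression stump, the learning rate, and the toy data are illustrative assumptions:

```python
import numpy as np

def fit_reg_stump(X, r):
    """Hypothetical base learner: one-split regression stump predicting
    the mean pseudo-residual on each side of the best threshold."""
    best, best_sse = None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # keep both sides non-empty
            left = X[:, j] <= t
            lm, rm = r[left].mean(), r[~left].mean()
            sse = np.sum((r[left] - lm) ** 2) + np.sum((r[~left] - rm) ** 2)
            if sse < best_sse:
                best_sse, best = sse, (j, t, lm, rm)
    j, t, lm, rm = best
    return lambda Xq: np.where(Xq[:, j] <= t, lm, rm)

def gradient_boost(X, y, n_rounds=100, lr=0.1):
    f0 = y.mean()                  # f-hat_0: best constant fit under L2
    f = np.full(len(y), f0)
    hs = []
    for _ in range(n_rounds):
        r = y - f                  # pseudo-residuals (negative gradient, up to 2)
        h = fit_reg_stump(X, r)    # fit the base learner to the residuals
        f = f + lr * h(X)          # small step; lr controls how fast we fit
        hs.append(h)
    return lambda Xq: f0 + lr * sum(h(Xq) for h in hs)
```

With a small learning rate each round removes only a fraction of the remaining residual, which is why stopping early (fewer rounds) is the natural way to trade the bias reduction against the variance increase noted above.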
12
Q

What are the two ensemble methods we have considered in the course, and what are their focuses?

A

Bagging and boosting. Bagging trains classifiers in parallel on independent bootstrap samples and averages them, focusing on reducing variance; boosting trains classifiers sequentially, each correcting its predecessors, focusing on reducing bias.
