Chapter 6.3: Cross-validation, bootstrap and tree-based methods - Boosting Flashcards

(12 cards)

1
Q

Recap:

What is PAC(strong) learnability?

What is weak PAC learning?

Which one must imply the other?

A

A strong PAC learner: for every ε, δ > 0, given enough samples it outputs, with probability at least 1 − δ, a hypothesis with error at most ε.

A weak PAC learner only has to beat random guessing by some fixed margin γ > 0, i.e. achieve error at most 1/2 − γ (with probability at least 1 − δ).

Strong learnability must imply weak learnability. The question becomes: can we turn a weak PAC learning algorithm into a strong one? (Boosting shows that we can.)

2
Q

What is the difference between Bagging and Boosting?

A

Bagging is done in parallel (each classifier is trained independently on its own bootstrap sample), while boosting is done sequentially (each classifier depends on the errors of the previous ones).

3
Q

What was Robert Schapire’s rough sketch of how to implement boosting?

A
  • Train f-hat1 on the data, train f-hat2 with emphasis on the points f-hat1 misclassifies, and train f-hat3 on the points where f-hat1 and f-hat2 disagree.
  • f-hat3 acts as the referee when the other two don't agree: whatever it says becomes the classification, due to the majority vote.
  • In our case the three classifiers are NOT independent, as each is trained on the points misclassified by the previous one.
4
Q

What is the adaptive boosting (AdaBoost) algorithm?

A

At each iteration you minimise the weighted empirical risk to obtain a classifier, then assign it a contribution factor (based on its weighted error) in the final weighted classification function; before the next iteration, the weights of misclassified points are increased. The final classifier is the sign of the contribution-weighted sum of the individual classifiers.
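A minimal NumPy sketch of these steps. The one-split stump weak learner and the toy data are illustrative assumptions, not from the course:

```python
import numpy as np

def fit_stump(X, y, w):
    """Hypothetical weak learner: one-feature threshold stump minimising
    the weighted 0/1 error (labels assumed to be in {-1, +1})."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] <= t, -1, 1)
                err = np.sum(w * (pred != y))
                if err < best_err:
                    best_err, best = err, (j, t, sign)
    j, t, sign = best
    return lambda Xq: sign * np.where(Xq[:, j] <= t, -1, 1)

def adaboost(X, y, n_rounds=20):
    n = len(y)
    w = np.full(n, 1.0 / n)                    # start with uniform weights
    hs, alphas = [], []
    for _ in range(n_rounds):
        h = fit_stump(X, y, w)                 # minimise weighted empirical risk
        pred = h(X)
        eps = np.sum(w * (pred != y))          # weighted training error
        if eps == 0 or eps >= 0.5:             # no longer a useful weak learner
            break
        alpha = 0.5 * np.log((1 - eps) / eps)  # contribution factor
        w = w * np.exp(-alpha * y * pred)      # up-weight misclassified points
        w = w / w.sum()
        hs.append(h)
        alphas.append(alpha)
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in zip(alphas, hs)))
```

No single stump can classify the pattern −1, +1, +1, −1 on a line, but three boosted stumps can, which illustrates the point of combining weak learners.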

5
Q

What loss function do you use here for AdaBoost?

A
  • AdaBoost uses the exponential loss L(Y, f(X)) = exp(−Y f(X)).
  • Under this loss a point's weight goes down when the classifier has the same sign as Y (correct classification) and up when the signs differ (misclassification).
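A tiny numerical check of that sign behaviour (the value of alpha here is an arbitrary positive contribution factor, chosen just for illustration):

```python
import numpy as np

# Under the exponential loss exp(-y*f(x)), AdaBoost multiplies a point's
# weight by exp(-alpha * y * h(x)) after adding classifier h with
# contribution factor alpha > 0.
alpha = 0.5
same_sign_factor = np.exp(-alpha * 1.0)       # h(x) agrees with y: y*h(x) = +1
opposite_sign_factor = np.exp(-alpha * -1.0)  # h(x) disagrees:     y*h(x) = -1
```

The agreeing factor is below 1 (weight shrinks) and the disagreeing factor is above 1 (weight grows), exactly the up/down behaviour the card describes.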

6
Q

Is there a closed-form solution to the weighted risk minimisation in AdaBoost?

A

Yes: for the exponential loss, the optimal contribution factor of a weak classifier with weighted error ε_t is α_t = (1/2) log((1 − ε_t)/ε_t), and the weight update is multiplication by exp(−α_t Y f-hat_t(X)) followed by normalisation.
7
Q

What is a decision stump?

A

A decision stump is a decision tree with only one split (depth one).

8
Q

What are the properties of AdaBoost?

What is the VC dimension of boosting combinations of m weak classifiers?

A

  • If every weak classifier beats random guessing by a margin γ_t (error at most 1/2 − γ_t), AdaBoost's training error decreases exponentially fast: it is bounded by exp(−2 Σ_t γ_t²).
  • For a base class of VC dimension d, the class of weighted majority votes of m weak classifiers has VC dimension of order roughly m·d·log(m·d), so capacity grows with the number of boosting rounds and overfitting must be controlled (e.g. by early stopping).
9
Q

If we can use AdaBoosting in the discrete classification setting, what do we use in the continuous regression setting?

What is the general idea behind this process?

A
  • Gradient boosting: instead of reweighting points, each new base learner is fitted to the negative gradient of the loss at the current fit (gradient descent in function space).
10
Q

What is the idea behind Gradient Boosting?

A
  • Note that for the squared loss function the theoretical risk minimiser is f* = E[Y|X]; fitting each update to the negative gradient of the squared loss essentially guarantees we are minimising the risk under the L2 loss.
  • It is called a pseudo-residual because, with L = (Y − θ)², the partial derivative is ∂L/∂θ = −2(Y − θ) = 2(f-hat(X) − Y); so you are not looking at the residual itself but at (a constant multiple of) the discrepancy between Y and your previously fitted function.
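A quick numerical check that, for the squared loss, the negative gradient is just the residual up to a constant (the values of y and the hypothetical current fit f_hat are made up for illustration):

```python
import numpy as np

# For the squared loss L = (y - f)**2, dL/df = -2*(y - f), so the
# negative gradient is 2*(y - f): proportional to the ordinary residual.
def neg_grad_squared_loss(y, f):
    return 2.0 * (y - f)

y = np.array([1.0, 2.0, 3.0])
f_hat = np.array([0.5, 2.5, 2.0])  # hypothetical previously fitted values
pseudo_residuals = neg_grad_squared_loss(y, f_hat)
```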
11
Q

What is the Gradient Boost Algorithm?

A
  • Start from the best constant fit f-hat_0; at each round, compute the pseudo-residuals of the current fit, fit a base learner to them, and add a small (learning-rate-scaled) multiple of that learner to the current fit.
  • At some point you risk overfitting if you continue: you keep fitting to the residuals, which reduces the bias but increases the variance.
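A minimal NumPy sketch of gradient boosting for the squared loss. The one-split regression stump, the learning rate, and the toy data are illustrative assumptions:

```python
import numpy as np

def fit_reg_stump(X, r):
    """Hypothetical base learner: one-split regression stump predicting
    the mean pseudo-residual on each side of the best threshold."""
    best, best_sse = None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # keep both sides non-empty
            left = X[:, j] <= t
            lm, rm = r[left].mean(), r[~left].mean()
            sse = np.sum((r[left] - lm) ** 2) + np.sum((r[~left] - rm) ** 2)
            if sse < best_sse:
                best_sse, best = sse, (j, t, lm, rm)
    j, t, lm, rm = best
    return lambda Xq: np.where(Xq[:, j] <= t, lm, rm)

def gradient_boost(X, y, n_rounds=100, lr=0.1):
    f0 = y.mean()                  # f-hat_0: best constant fit under L2
    f = np.full(len(y), f0)
    hs = []
    for _ in range(n_rounds):
        r = y - f                  # pseudo-residuals (negative gradient, up to 2)
        h = fit_reg_stump(X, r)    # fit the base learner to the residuals
        f = f + lr * h(X)          # small step; lr controls how fast we fit
        hs.append(h)
    return lambda Xq: f0 + lr * sum(h(Xq) for h in hs)
```

With a small learning rate each round removes only a fraction of the remaining residual, which is why stopping early (fewer rounds) is the natural way to trade the bias reduction against the variance increase noted above.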
12
Q

What are the two ensemble methods we have considered in the course, and what are their focuses?

A

Bagging and boosting. Bagging trains classifiers in parallel on independent bootstrap samples and averages them, focusing on reducing variance; boosting trains classifiers sequentially, each correcting its predecessors, focusing on reducing bias.
