Chapter 6.2 - Cross validation, bootstrap and tree-based methods - Tree-based methods Flashcards

(16 cards)

1
Q

What are tree-based methods?

A
2
Q

How do we write the prediction problem of a regression tree in formal notation?

A
  • A deep tree with, say, 100 leaves may have only one or two observations in each leaf → it will not generalise well
  • A shallow tree with only two leaves may not give us enough detailed insight into the true classifications
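The formal notation itself can be written out as follows (a reconstruction in the standard ISLR-style form, not copied from the card):

```latex
% Partition the feature space into J disjoint regions R_1, ..., R_J
% and predict a constant c_j inside each region:
f(x) = \sum_{j=1}^{J} c_j \, \mathbf{1}\{x \in R_j\}
```

The depth trade-off above is then a trade-off in J: large J gives tiny regions that overfit, small J gives coarse regions that underfit.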
3
Q

How do we go about finding c_j?

A

Target for regression:

  • sum over all terminal regions; for each region, take the observations that fall into it and compute the L2 loss on those observations
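Minimising that L2 loss region by region gives a closed form (a sketch of the standard derivation):

```latex
% Objective: squared error summed over all terminal regions
\min_{c_1,\dots,c_J} \; \sum_{j=1}^{J} \sum_{i:\, x_i \in R_j} (y_i - c_j)^2
% Each region decouples from the others, so the minimiser is the region mean:
\hat{c}_j = \operatorname{ave}\left(y_i \mid x_i \in R_j\right)
```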
4
Q

How do we go about finding the partition order and deciding how the tree splits?

A
  • Why is this computationally intractable? Say we have a feature vector of length p. Finding the partition that globally minimises the RSS has a cost on the order of exp(p) → this becomes huge even when p is moderate
  • Greedy: at the current step, choose whichever feature to add or split to take gives the best outcome right now (not the best outcome in the future or overall)
    • The risk is that you can end up with a model that is very far from the optimal one
5
Q

How do we apply the top-down greedy approach to build a tree model?

A
  • Inside the bracket, we are calculating the minimal overall loss of all observations in the “left” region plus the “right” region → we set the estimators ĉ1 and ĉ2 to be the averages of the yi’s whose xi’s belong to each region
  • There are only p variables to look at (so p possibilities for j) and at most n cut points s per variable → so one greedy split costs O(n · p) candidate evaluations
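The bracketed expression referenced above is, in the standard formulation, the one-split objective over a splitting variable j and cut point s:

```latex
% Half-planes produced by splitting variable j at cut point s:
R_1(j,s) = \{X \mid X_j \le s\}, \qquad R_2(j,s) = \{X \mid X_j > s\}
% Greedy step: pick the (j, s) pair minimising the total left + right loss,
% with the inner minimisations solved by the region means c1-hat, c2-hat:
\min_{j,\, s} \Big[ \min_{c_1} \sum_{i:\, x_i \in R_1(j,s)} (y_i - c_1)^2
                  + \min_{c_2} \sum_{i:\, x_i \in R_2(j,s)} (y_i - c_2)^2 \Big]
```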
6
Q

What is the general process of splitting the predictor space?

A
  1. Do the analysis to decide which feature to split on and where
  2. Then decide whether to split the left region or the right region next
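The two steps above (find the best split, then recurse into each half) can be sketched in plain Python; the names (`best_split`, `grow`, `predict`) are illustrative, not from the card:

```python
# Minimal greedy regression tree: exhaustive O(n*p) split search + recursion.

def best_split(X, y):
    """Scan every feature j and candidate cut point s; return the
    (total_rss, j, s) triple minimising left-RSS + right-RSS."""
    def rss(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for j in range(len(X[0])):
        for s in sorted(set(row[j] for row in X)):
            left = [y[i] for i, row in enumerate(X) if row[j] <= s]
            right = [y[i] for i, row in enumerate(X) if row[j] > s]
            if not left or not right:
                continue  # degenerate split: one side empty
            total = rss(left) + rss(right)
            if best is None or total < best[0]:
                best = (total, j, s)
    return best  # None when no valid split exists

def grow(X, y, depth=0, max_depth=2, min_size=2):
    """Recursively partition; a leaf predicts the mean of its y values."""
    split = best_split(X, y) if len(y) >= min_size and depth < max_depth else None
    if split is None:
        return sum(y) / len(y)  # leaf: region mean (the c_j-hat)
    _, j, s = split
    L = [i for i, row in enumerate(X) if row[j] <= s]
    R = [i for i, row in enumerate(X) if row[j] > s]
    return (j, s,
            grow([X[i] for i in L], [y[i] for i in L], depth + 1, max_depth, min_size),
            grow([X[i] for i in R], [y[i] for i in R], depth + 1, max_depth, min_size))

def predict(tree, x):
    """Walk from the root down to a leaf and return its constant."""
    while isinstance(tree, tuple):
        j, s, left, right = tree
        tree = left if x[j] <= s else right
    return tree
```

The greedy character shows up in `best_split`: it ranks splits only by the immediate RSS reduction, never looking ahead.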
7
Q

How do we deal with the problem of overfitting when creating a tree through the top-down greedy approach?

A
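A standard answer here (not reconstructed from the card itself) is cost-complexity pruning: grow the full tree, then trade training error against tree size with a tuning parameter α (sometimes written λ):

```latex
% For a subtree T with terminal regions R_1, ..., R_{|T|}:
C_\alpha(T) = \sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} (y_i - \hat{c}_m)^2 + \alpha\,|T|
% alpha = 0 recovers the full tree; larger alpha favours smaller subtrees.
```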
8
Q

How do we go about choosing the optimal lambda (optimal subtree)?

A
9
Q

What is the entire algorithm for building a regression tree?

A
  • Note: a tree with more splits will always have a lower training error (overfitting issue)
10
Q

How do tree methods change for classification problems?

A

l̂(j) is the majority-vote class among the observations that lie in the j-th region

  • works well for a binary-classification problem
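Majority voting within a region can be sketched with the standard library (function names are illustrative):

```python
from collections import Counter

def majority_class(labels):
    """Predicted class for a region: its most common label (majority vote)."""
    return Counter(labels).most_common(1)[0][0]

def misclassification_rate(labels):
    """Fraction of the region's observations not in the majority class."""
    counts = Counter(labels)
    return 1 - counts.most_common(1)[0][1] / len(labels)
```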
11
Q

What other loss measures do we use for multi-class classification problems?

A
  • Gini measures purity (or inequality) → a pure leaf means the leaf contains observations of only a single class → it gets a Gini coefficient of 0
  • This implies that the purer the class, the lower the Gini; the more uniform the class spread, the higher the Gini value
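The Gini index for a region follows directly from the class proportions; a minimal sketch:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: sum over classes k of p_k * (1 - p_k).
    0 for a pure region; approaches 1 - 1/K for a uniform spread over K classes."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())
```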
12
Q

What are the advantages and disadvantages of trees?

A
13
Q

How do we deal with the overfitting issue of trees?

A
14
Q

How do we perform bootstrap aggregation (bagging)?

A
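In outline: draw B bootstrap resamples (n observations sampled with replacement), fit the base model to each, and average the B predictions. A minimal sketch; `mean_fit` is only a placeholder base learner standing in for a tree fit, and all names are illustrative:

```python
import random

def bootstrap_sample(X, y, rng):
    """Draw n observations with replacement from the training set."""
    idx = [rng.randrange(len(y)) for _ in range(len(y))]
    return [X[i] for i in idx], [y[i] for i in idx]

def bagged_predict(X, y, x_new, fit, B=100, seed=0):
    """Fit `fit` on B bootstrap resamples and average their predictions."""
    rng = random.Random(seed)
    preds = []
    for _ in range(B):
        Xb, yb = bootstrap_sample(X, y, rng)
        model = fit(Xb, yb)  # model is a callable: x -> prediction
        preds.append(model(x_new))
    return sum(preds) / B

def mean_fit(Xb, yb):
    """Placeholder base learner: always predicts the training mean."""
    m = sum(yb) / len(yb)
    return lambda x: m
```

In real bagging each `fit` would grow a deep, unpruned tree on its resample; only the resample-fit-average mechanics are shown here.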
15
Q

Does bagging actually reduce variance?

A
  • If the trees are correlated, we aren’t able to reduce the variance as much, if at all, compared with the ideal case of independent samples
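The claim can be made precise: for B identically distributed predictions, each with variance σ² and pairwise correlation ρ, the variance of their average is

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} \hat{f}_b(x)\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
% As B grows the second term vanishes, but the rho * sigma^2 floor remains,
% which is why correlated trees limit the variance reduction from bagging.
```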
16
Q

How can we improve on this for tree models?