Chapter 6.2 - Cross validation, bootstrap and tree-based methods - Tree-based methods Flashcards

(16 cards)

1
Q

What are tree-based methods?

A
2
Q

How do we write the prediction problem of a regression tree in formal notation?

A
  • A deep tree with, say, 100 leaves may have only one or two observations in each leaf → it will not generalise well
  • A shallow tree with only two leaves may not give us enough detailed insight into the true classifications
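The formal notation itself can be written out as follows (a reconstruction in the standard ISLR-style form, not copied from the card):

```latex
% Partition the feature space into J disjoint regions R_1, ..., R_J
% and predict a constant c_j inside each region:
f(x) = \sum_{j=1}^{J} c_j \, \mathbf{1}\{x \in R_j\}
```

The depth trade-off above is then a trade-off in J: large J gives tiny regions that overfit, small J gives coarse regions that underfit.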
3
Q

How do we go about finding c_j?

A

Target for regression:

  • sum over all terminal regions; for each region, take the observations that fall into it and compute the L2 loss on those observations
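Minimising that L2 loss region by region gives a closed form (a sketch of the standard derivation):

```latex
% Objective: squared error summed over all terminal regions
\min_{c_1,\dots,c_J} \; \sum_{j=1}^{J} \sum_{i:\, x_i \in R_j} (y_i - c_j)^2
% Each region decouples from the others, so the minimiser is the region mean:
\hat{c}_j = \operatorname{ave}\left(y_i \mid x_i \in R_j\right)
```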
4
Q

How do we go about finding the partition order and deciding how the tree splits?

A
  • Why is this computationally intractable? Say we have a feature vector of length p. Finding the partition that globally minimises the RSS has a cost on the order of exp(p) → this becomes huge even when p is moderate
  • Greedy: at the current step, choose whichever feature to add or split to take gives the best outcome right now (not the best outcome in the future or overall)
    • The risk is that you can end up with a model that is very far from the optimal one
5
Q

How do we apply the top-down greedy approach to build a tree model?

A
  • Inside the bracket, we are calculating the minimal overall loss of all observations in the “left” region plus the “right” region → we set the estimators ĉ1 and ĉ2 to be the averages of the yi’s whose xi’s belong to each region
  • There are only p variables to look at (so p possibilities for j) and at most n cut points s per variable → so one greedy split costs O(n · p) candidate evaluations
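The bracketed expression referenced above is, in the standard formulation, the one-split objective over a splitting variable j and cut point s:

```latex
% Half-planes produced by splitting variable j at cut point s:
R_1(j,s) = \{X \mid X_j \le s\}, \qquad R_2(j,s) = \{X \mid X_j > s\}
% Greedy step: pick the (j, s) pair minimising the total left + right loss,
% with the inner minimisations solved by the region means c1-hat, c2-hat:
\min_{j,\, s} \Big[ \min_{c_1} \sum_{i:\, x_i \in R_1(j,s)} (y_i - c_1)^2
                  + \min_{c_2} \sum_{i:\, x_i \in R_2(j,s)} (y_i - c_2)^2 \Big]
```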
6
Q

What is the general process of splitting the predictor space?

A
  1. Do the analysis to decide which feature to split on and where
  2. Then decide whether to split the left region or the right region next
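The two steps above (find the best split, then recurse into each half) can be sketched in plain Python; the names (`best_split`, `grow`, `predict`) are illustrative, not from the card:

```python
# Minimal greedy regression tree: exhaustive O(n*p) split search + recursion.

def best_split(X, y):
    """Scan every feature j and candidate cut point s; return the
    (total_rss, j, s) triple minimising left-RSS + right-RSS."""
    def rss(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for j in range(len(X[0])):
        for s in sorted(set(row[j] for row in X)):
            left = [y[i] for i, row in enumerate(X) if row[j] <= s]
            right = [y[i] for i, row in enumerate(X) if row[j] > s]
            if not left or not right:
                continue  # degenerate split: one side empty
            total = rss(left) + rss(right)
            if best is None or total < best[0]:
                best = (total, j, s)
    return best  # None when no valid split exists

def grow(X, y, depth=0, max_depth=2, min_size=2):
    """Recursively partition; a leaf predicts the mean of its y values."""
    split = best_split(X, y) if len(y) >= min_size and depth < max_depth else None
    if split is None:
        return sum(y) / len(y)  # leaf: region mean (the c_j-hat)
    _, j, s = split
    L = [i for i, row in enumerate(X) if row[j] <= s]
    R = [i for i, row in enumerate(X) if row[j] > s]
    return (j, s,
            grow([X[i] for i in L], [y[i] for i in L], depth + 1, max_depth, min_size),
            grow([X[i] for i in R], [y[i] for i in R], depth + 1, max_depth, min_size))

def predict(tree, x):
    """Walk from the root down to a leaf and return its constant."""
    while isinstance(tree, tuple):
        j, s, left, right = tree
        tree = left if x[j] <= s else right
    return tree
```

The greedy character shows up in `best_split`: it ranks splits only by the immediate RSS reduction, never looking ahead.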
7
Q

How do we deal with the problem of overfitting when creating a tree through the top-down greedy approach?

A
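A standard answer here (not reconstructed from the card itself) is cost-complexity pruning: grow the full tree, then trade training error against tree size with a tuning parameter α (sometimes written λ):

```latex
% For a subtree T with terminal regions R_1, ..., R_{|T|}:
C_\alpha(T) = \sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} (y_i - \hat{c}_m)^2 + \alpha\,|T|
% alpha = 0 recovers the full tree; larger alpha favours smaller subtrees.
```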
8
Q

How do we go about choosing the optimal lambda (optimal subtree)?

A
9
Q

What is the entire algorithm for building a regression tree?

A
  • Note: a tree with more splits will always have a lower training error (overfitting issue)
10
Q

How do tree methods change for classification problems?

A

l̂(j) is the majority-vote class among the observations that lie in the j-th region

  • works well for a binary-classification problem
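Majority voting within a region can be sketched with the standard library (function names are illustrative):

```python
from collections import Counter

def majority_class(labels):
    """Predicted class for a region: its most common label (majority vote)."""
    return Counter(labels).most_common(1)[0][0]

def misclassification_rate(labels):
    """Fraction of the region's observations not in the majority class."""
    counts = Counter(labels)
    return 1 - counts.most_common(1)[0][1] / len(labels)
```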
11
Q

What other loss measures do we use for multi-class classification problems?

A
  • Gini measures purity (or inequality) → a pure leaf means the leaf contains observations of only a single class → it gets a Gini coefficient of 0
  • This implies that the purer the class, the lower the Gini; the more uniform the class spread, the higher the Gini value
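The Gini index for a region follows directly from the class proportions; a minimal sketch:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: sum over classes k of p_k * (1 - p_k).
    0 for a pure region; approaches 1 - 1/K for a uniform spread over K classes."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())
```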
12
Q

What are the advantages and disadvantages of trees?

A
13
Q

How do we deal with the overfitting issue of trees?

A
14
Q

How do we perform bootstrap aggregation (bagging)?

A
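In outline: draw B bootstrap resamples (n observations sampled with replacement), fit the base model to each, and average the B predictions. A minimal sketch; `mean_fit` is only a placeholder base learner standing in for a tree fit, and all names are illustrative:

```python
import random

def bootstrap_sample(X, y, rng):
    """Draw n observations with replacement from the training set."""
    idx = [rng.randrange(len(y)) for _ in range(len(y))]
    return [X[i] for i in idx], [y[i] for i in idx]

def bagged_predict(X, y, x_new, fit, B=100, seed=0):
    """Fit `fit` on B bootstrap resamples and average their predictions."""
    rng = random.Random(seed)
    preds = []
    for _ in range(B):
        Xb, yb = bootstrap_sample(X, y, rng)
        model = fit(Xb, yb)  # model is a callable: x -> prediction
        preds.append(model(x_new))
    return sum(preds) / B

def mean_fit(Xb, yb):
    """Placeholder base learner: always predicts the training mean."""
    m = sum(yb) / len(yb)
    return lambda x: m
```

In real bagging each `fit` would grow a deep, unpruned tree on its resample; only the resample-fit-average mechanics are shown here.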
15
Q

Does bagging actually reduce variance?

A
  • If the trees are correlated, we aren’t able to reduce the variance as much, if at all, compared with the ideal case of independent samples
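The claim can be made precise: for B identically distributed predictions, each with variance σ² and pairwise correlation ρ, the variance of their average is

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} \hat{f}_b(x)\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
% As B grows the second term vanishes, but the rho * sigma^2 floor remains,
% which is why correlated trees limit the variance reduction from bagging.
```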
16
Q

How can we improve on this for tree models?