Decision tree
A decision tree predicts the class of an object by applying a sequence of tests (splits) on its attributes. Each attribute offers possible splits; after listing the candidate splits, the best ones have to be chosen so that we end up with pure nodes, i.e., nodes containing objects of a single class. To decide which splits are best, entropy and information gain are used.
Entropy (E)
Measures the uncertainty in a node: E = - sum over classes i of p_i * log2(p_i). For a pure node E = 0; at maximal uncertainty (e.g. two equally likely classes) E = 1.
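As a minimal sketch (standard-library Python, with made-up example labels), the entropy of a node can be computed directly from its class labels:

```python
import math

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return 0.0 - sum(p * math.log2(p) for p in probs)

print(entropy(["yes", "yes", "yes"]))       # pure node: E = 0
print(entropy(["yes", "no", "yes", "no"]))  # two equally likely classes: E = 1
```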
Information gain (I)
Measures the decrease in uncertainty obtained by a split: I = E(parent) - weighted average of the children's entropies.
Theory regarding entropy
The weighted entropy after a split is never greater than the original entropy, so the information gain is never negative.
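This can be sketched with a small example (standard-library Python, hypothetical labels): a perfect split earns the full parent entropy as gain, while a useless split earns nothing.

```python
import math

def entropy(labels):
    n = len(labels)
    return 0.0 - sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                     for c in set(labels))

def information_gain(parent, children):
    """I = E(parent) - weighted average of the children's entropies."""
    n = len(parent)
    after = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - after

parent = ["yes", "yes", "no", "no"]
# Perfect split: both children are pure, so the gain equals E(parent) = 1
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))
# Useless split: each child mirrors the parent distribution, gain = 0
print(information_gain(parent, [["yes", "no"], ["yes", "no"]]))
```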
Split on attributes
Decision boundary
Borderline between two neighbouring regions of different classes. The decision boundary is parallel to the axes because each test condition involves a single attribute at a time.
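A minimal sketch of why the boundaries are axis-parallel (hypothetical attributes x1, x2 and classes A/B/C): each test compares one attribute against a threshold, so each boundary is a vertical or horizontal line.

```python
def classify(x1, x2):
    """Each test uses a single attribute, so every decision boundary
    is a line parallel to an axis (here x1 = 0.5, then x2 = 0.5)."""
    if x1 <= 0.5:      # vertical boundary at x1 = 0.5
        return "A"
    if x2 <= 0.5:      # horizontal boundary at x2 = 0.5
        return "B"
    return "C"

print(classify(0.2, 0.9))  # left of x1 = 0.5 -> "A"
print(classify(0.8, 0.3))  # right of x1 = 0.5, below x2 = 0.5 -> "B"
```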
Types of decision trees
Type of predicted variable (end node): a classification tree predicts a categorical class label, a regression tree predicts a continuous value.
Tree induction
Measures for node impurity (a.k.a. when to stop splitting)
Quality of a split
Objective: obtaining pure nodes, i.e., nodes that contain objects from a single class (if a node is impure, take the majority class as its label).
- Measures for impurity
  + Misclassification error: the fraction of objects that are classified incorrectly if we assign every object in the node to the majority class.
Splitting based on classification error
Classification error at a node t:
- Error(t) = 1 - max_i P(i|t)
Measures misclassification error made by a node
- Maximum (1 - 1/nc) when records are equally distributed among all nc classes.
- Minimum (0.0) when all records belong to one class (pure node).
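The two extremes can be checked with a short sketch (standard-library Python, made-up class counts for a node t):

```python
def classification_error(counts):
    """Error(t) = 1 - max_i P(i|t), where counts are the class counts
    of the records in node t."""
    n = sum(counts)
    return 1 - max(counts) / n

print(classification_error([6, 0]))  # pure node: minimum error 0.0
print(classification_error([3, 3]))  # equal distribution, nc = 2: maximum 1 - 1/2 = 0.5
```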
For efficient computation
For each attribute:
Regression tree overfitting
Divide the x-axis into parts in such a way that in each part the value of the node equals the average of the red points. By continuing to split you can fit the red points arbitrarily well.
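A minimal sketch of that idea (standard-library Python, with hypothetical points standing in for the "red points"): each leaf predicts the average of the training points in its segment, and with one leaf per point the training error drops to zero.

```python
def fit_piecewise_mean(xs, ys, boundaries):
    """Split the x-axis at the given boundaries; each leaf predicts the
    mean y of the training points falling into its segment."""
    def segment(x):
        return sum(1 for b in boundaries if x >= b)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(segment(x), []).append(y)
    means = {s: sum(v) / len(v) for s, v in groups.items()}
    return lambda x: means[segment(x)]

xs = [0.0, 1.0, 2.0, 3.0]   # hypothetical "red points"
ys = [1.0, 3.0, 2.0, 5.0]

def training_sse(predict):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys))

coarse = fit_piecewise_mean(xs, ys, [2.0])           # 2 leaves
fine = fit_piecewise_mean(xs, ys, [1.0, 2.0, 3.0])   # 4 leaves, one point each
print(training_sse(coarse))  # > 0: the leaf means smooth over the points
print(training_sse(fine))    # 0.0: one leaf per point fits them exactly
```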
Growing the tree
Using the training data:
- Split nodes based on an impurity criterion.
- Continue until all leaf nodes are pure (impurity = 0).
Problem: if the tree is too large, the model also fits the noise in the training data.
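A small sketch of this problem (standard-library Python, fabricated illustrative data where the true signal is y = 0 and the training labels carry noise): the single-leaf tree generalizes, while a tree with one leaf per training point memorizes the noise.

```python
xs = [0.0, 1.0, 2.0, 3.0]
train_ys = [1.0, -1.0, 1.0, -1.0]   # true signal y = 0 plus noise
test_ys = [0.0, 0.0, 0.0, 0.0]      # noise-free points from the same signal

def sse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys))

# Simple tree: a single leaf predicting the training mean (= 0.0)
mean = sum(train_ys) / len(train_ys)
simple = [mean] * len(xs)

# Overgrown tree: one pure leaf per training point (memorizes the noise)
overgrown = list(train_ys)

print(sse(simple, train_ys), sse(simple, test_ys))        # 4.0 0.0
print(sse(overgrown, train_ys), sse(overgrown, test_ys))  # 0.0 4.0
```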
Underfitting
When the model is too simple, both training and test errors are large.
Overfitting
When the model is too complex, training errors are small but test errors are large:
- Overfitting results in decision trees that are more complex than necessary.
- Training error no longer provides a good estimate of how well the tree will perform on previously unseen records.
- Solutions:
Occam’s Razor principle:
+ Given two models of similar generalization errors, one should prefer the simpler model over the more complex model.
+ For a complex model, there is a greater chance that it accidentally fits errors (noise) in the data.
Possible solution:
- Include model complexity when evaluating a model.
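One way to sketch this (standard-library Python; the per-leaf penalty of 0.5 and the two example trees are assumed illustrative values, not from the source): estimate the generalization error as the training errors plus a fixed penalty per leaf, so a complex tree with a slightly lower raw training error can still lose to a simpler one.

```python
def penalized_error(train_errors, n_records, n_leaves, penalty=0.5):
    """Estimated generalization error: training errors plus a fixed
    penalty per leaf, divided by the number of training records.
    (penalty = 0.5 is an assumed illustrative value.)"""
    return (train_errors + penalty * n_leaves) / n_records

# Two hypothetical trees evaluated on 100 training records:
simple_tree = penalized_error(train_errors=10, n_records=100, n_leaves=4)
complex_tree = penalized_error(train_errors=6, n_records=100, n_leaves=20)
print(simple_tree)   # 0.12
print(complex_tree)  # 0.16 -> the simpler tree is preferred
```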
Stopping criteria
Two systematic methods: early stopping (stop growing the tree before it is complete) and pruning (grow the full tree, then trim it back).
Tree pruning