What is the ID3 algorithm?
Split(node, {examples}):
1. If all examples have the same label, stop (pure leaf).
2. Otherwise pick the attribute A with the highest information gain.
3. Split {examples} on the values of A and recurse on each subset.
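The recursion above can be sketched in Python (the dictionary-based tree and the helper names `info_gain`/`id3` are my own, not from any particular library):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(examples, labels, attr):
    """Entropy before the split minus the weighted average entropy after it."""
    subsets = {}
    for x, y in zip(examples, labels):
        subsets.setdefault(x[attr], []).append(y)
    after = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - after

def id3(examples, labels, attrs):
    """Grow the tree top-down; leaves are labels, internal nodes are dicts."""
    # Base cases: pure subset, or no attributes left -> majority class.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: split on the attribute with the highest information gain.
    best = max(attrs, key=lambda a: info_gain(examples, labels, a))
    node = {"attr": best, "children": {}}
    subsets = {}
    for x, y in zip(examples, labels):
        subsets.setdefault(x[best], []).append((x, y))
    for value, pairs in subsets.items():
        sub_x = [x for x, _ in pairs]
        sub_y = [y for _, y in pairs]
        node["children"][value] = id3(sub_x, sub_y,
                                      [a for a in attrs if a != best])
    return node
```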
What is the definition of Entropy?
H(S) = -p(+) log2 p(+) - p(-) log2 p(-)
Where p(+) and p(-) are the fractions of positive/negative examples in S
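A minimal sketch of this two-class entropy (the function name and the count-based signature are my own choices):

```python
import math

def entropy(pos, neg):
    """Two-class entropy H(S) in bits; 0 * log2(0) is taken as 0."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:
            h -= p * math.log2(p)
    return h

# A pure subset has entropy 0 bits; a 50/50 subset has entropy 1 bit.
```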
How can we interpret entropy?
How many bits needed to tell if X is positive or negative
How do we compute the expected drop in entropy (gain)?
Gain(S, A) = H(S) - Σv (|Sv| / |S|) H(Sv), where Sv is the subset of S for which attribute A has value v
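That computation can be sketched as follows (helper names and the label-list representation are my own):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, labels, attr):
    """Entropy before the split minus the weighted average entropy after it."""
    before = entropy(labels)
    subsets = {}
    for x, y in zip(examples, labels):
        subsets.setdefault(x[attr], []).append(y)
    after = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return before - after
```

An attribute that separates the classes perfectly recovers the full entropy as gain; an attribute that tells you nothing has gain 0.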
What does gain tell us?
How much entropy drops: the entropy before the split minus the average entropy after it, where the average weights larger subsets more heavily
What is information gain?
The difference between the entropy before the split and the entropy after the split
What does information gain tell you?
How much more certain you are after a split
What happens if you run ID3 to completion?
All subsets will be pure
How do decision trees overfit?
Running ID3 to completion, you end up with lots of singleton subsets, so there is not a lot of confidence in the estimates (only 1 training example each).
How can we avoid overfitting (decision trees)?
Stop splitting early (e.g., require a minimum number of examples per node or a maximum depth), or grow the full tree and then prune it back
How is split entropy defined?
SplitEntropy(S, A) = -Σv (|Sv| / |S|) log2 (|Sv| / |S|)
What do we use split entropy for?
To normalize information gain by how fine-grained the split is
Definition of gain ratio?
GainRatio(S, A) = Gain(S, A) / SplitEntropy(S, A)
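Split entropy and gain ratio can be sketched together (function names and the size-list signature are my own):

```python
import math

def split_entropy(sizes):
    """Entropy of the partition itself: how many bits the split 'costs'."""
    n = sum(sizes)
    return -sum((s / n) * math.log2(s / n) for s in sizes if s > 0)

def gain_ratio(gain, sizes):
    """Normalize information gain by how fine-grained the split is."""
    return gain / split_entropy(sizes)

# A 2-way even split costs 1 bit; a 4-way even split costs 2 bits,
# so the same raw gain yields half the gain ratio.
```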
What does GainRatio penalize?
Attributes with many values
What is the problem with information gain?
Biased towards attributes with many values (they create lots of small pure subsets)
What's unique about DT?
Not a black box; you can interpret the rules of the tree
How can we expand DT to multi-class?
Generalize entropy to all classes, H(S) = -Σc p(c) log2 p(c), and have each leaf predict the majority class
How can we expand DT to regression?
Replace entropy with variance (split to maximize variance reduction) and have each leaf predict the mean of its examples
What are the pros of DT?
Interpretable, fast to train and evaluate, handle both categorical and numeric features, need little data preparation
What are the cons of DT?
Prone to overfitting, unstable (small changes in the data can change the tree a lot), and greedy splitting is not guaranteed to find the best tree
How do you create a random decision forest?
Train many trees, each on a bootstrap sample of the training data, and consider only a random subset of the attributes at each split
How do you classify an example with random forests?
Run the example through every tree and take a majority vote of their predictions
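The bootstrap-and-vote idea can be sketched generically (function names are my own; each "tree" here is just any callable classifier):

```python
import random
from collections import Counter

def bootstrap_sample(examples, labels, rng):
    """Draw n examples with replacement: one bootstrap replicate."""
    n = len(examples)
    idx = [rng.randrange(n) for _ in range(n)]
    return [examples[i] for i in idx], [labels[i] for i in idx]

def forest_predict(trees, x):
    """Classify x by majority vote over the trees' predictions."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]
```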
What does entropy measure?
How pure/impure a subset is