Explain Boosting
Aim: reduce the error rate by fitting weak learners sequentially, putting more weight on the observations that previous learners classified incorrectly, and combining the learners into a weighted vote.
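One concrete boosting algorithm that follows this reweighting idea is AdaBoost. Below is a minimal sketch with one-dimensional threshold stumps as the weak learners; the function name, the toy data, and the choice of stumps are mine, not from the notes.

```python
import numpy as np

def adaboost(X, y, n_rounds=10):
    """Toy AdaBoost with 1-D threshold stumps. X: (n,) features, y: labels in {-1,+1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)           # start with uniform observation weights
    stumps = []                        # list of (threshold, sign, alpha)
    for _ in range(n_rounds):
        best = None
        # exhaustive search for the stump with the lowest *weighted* error
        for s in X:
            for sign in (+1, -1):
                pred = np.where(X <= s, sign, -sign)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, s, sign)
        err, s, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)     # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)     # vote weight of this learner
        pred = np.where(X <= s, sign, -sign)
        # key boosting step: upweight the points this stump got wrong
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append((s, sign, alpha))

    def predict(x):
        score = sum(a * (sg if x <= s else -sg) for s, sg, a in stumps)
        return 1 if score >= 0 else -1
    return predict
```

The reweighting line `w *= np.exp(-alpha * y * pred)` is exactly the "more weight on previously incorrectly classified" idea: misclassified points (where y·pred = −1) have their weight multiplied by e^alpha.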
Explain regression trees
Build the tree using training data:
- decide which variable Xj to split on and at which value s (CART uses binary splits) by minimising
min over (j,s) of { min_c1 sum_{xi in R1(j,s)} (yi - c1)^2 + min_c2 sum_{xi in R2(j,s)} (yi - c2)^2 }, where the minimising constants c1^ and c2^ are the means (centroids) of the responses in R1 and R2
- stop when no further split reduces the sum of squares (or when leaves get too small), giving the final M leaves
- or grow a large tree and prune it back using cost-complexity pruning
Predict value:
- the predicted value yi^ is the centroid c^j of the leaf Rj containing xi (i.e. xi ∈ Rj)
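The split search described above can be sketched directly; this is a toy implementation of the (j, s) criterion, with the function name and data my own.

```python
import numpy as np

def best_split(X, y):
    """Exhaustive search for the split (j, s) minimising the total
    within-region sum of squares (the CART split rule).
    X: (n, p) feature matrix, y: (n,) responses."""
    best = None  # (sse, j, s)
    n, p = X.shape
    for j in range(p):
        # candidate split points: observed values of Xj (excluding the max,
        # so the right region is never empty)
        for s in np.unique(X[:, j])[:-1]:
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            # the optimal constants c1^, c2^ are the region means (centroids)
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, s)
    return best
```

For example, with a single feature X = [1, 2, 3, 4] and y = [1, 1, 5, 5], the search finds the split at s = 2, which separates the two response levels perfectly (SSE = 0).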
Pros and cons of Classification and Regression Trees (CART)
Pros: fast, simple, and easy to interpret
Cons: lack of continuity (predictions are piecewise constant, and small changes in the data can produce very different trees, i.e. high variance) and inefficient at capturing some structures (e.g. additive or linear effects)
Explain cost-complexity pruning
Grow a large tree T0, then for a tuning parameter α ≥ 0 find the subtree T of T0 minimising
C_α(T) = sum over leaves Rm of sum_{xi in Rm} (yi - c^m)^2 + α|T|, where |T| is the number of leaves; α trades off fit against tree size and is chosen by cross-validation.
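A tiny numeric illustration of the criterion C_α(T) = RSS(T) + α|T|; the (RSS, leaf-count) pairs below are made-up numbers for a nested sequence of subtrees, not from the notes.

```python
# Hypothetical nested subtrees: name -> (RSS on training data, number of leaves)
subtrees = {"full": (10.0, 8), "pruned": (14.0, 4), "stump": (25.0, 2)}

def best_subtree(alpha):
    # pick the subtree minimising the cost-complexity criterion RSS + alpha * |T|
    return min(subtrees, key=lambda t: subtrees[t][0] + alpha * subtrees[t][1])

print(best_subtree(0.5))    # small alpha penalises size little -> large tree wins
print(best_subtree(10.0))   # large alpha penalises size heavily -> stump wins
```

In practice one computes the whole sequence of minimising subtrees as α grows and selects α by cross-validation.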
Explain Bootstrap and Bagging
Bootstrap aim: get sampling-distribution information about a statistic (mean, variance, …) by resampling the data with replacement, without making strong assumptions on the Xi or on F.
Bagging (bootstrap aggregating): fit the model on B bootstrap samples and average the fits:
f^bag(x) = (1/B) sum_{b=1}^{B} f^*b(x), where f^*b is the model fitted on the b-th bootstrap sample
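The averaging formula can be sketched as follows; here the base learner is a crude one-split "stump" regressor, and the function name and toy setup are mine.

```python
import numpy as np

def bagged_mean_predictor(X, y, B=50, seed=0):
    """Bagging sketch: fit a simple base learner (a one-split stump regressor)
    on B bootstrap samples and average their predictions."""
    rng = np.random.default_rng(seed)
    n = len(X)
    fits = []
    for _ in range(B):
        idx = rng.integers(0, n, n)        # bootstrap sample: n draws with replacement
        Xb, yb = X[idx], y[idx]
        s = np.median(Xb)                  # crude split point for the stump
        left, right = yb[Xb <= s], yb[Xb > s]
        fits.append((s,
                     left.mean() if len(left) else yb.mean(),
                     right.mean() if len(right) else yb.mean()))

    def predict(x):
        # f^bag(x) = (1/B) * sum_b f^*b(x)
        return np.mean([cl if x <= s else cr for s, cl, cr in fits])
    return predict
```

Averaging over bootstrap fits smooths out the volatility of any single tree, which is exactly why bagging helps high-variance learners like CART.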
Explain Random Forest
A random forest is bagging applied to trees, with one extra step: at each node, only a random sample of m of the p variables (commonly m ≈ √p) is considered for the split. This decorrelates the trees and so reduces the variance of their average.
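The extra per-node step can be sketched by restricting the CART split search to a random feature subset; the function name, the mtry default, and the toy data are my own choices.

```python
import numpy as np

def rf_split(X, y, rng, mtry=None):
    """One random-forest split: the same SSE criterion as CART, but the
    search is restricted to a random subset of mtry candidate variables."""
    n, p = X.shape
    if mtry is None:
        mtry = max(1, int(np.sqrt(p)))     # a common default choice
    features = rng.choice(p, size=mtry, replace=False)   # the RF twist
    best = None  # (sse, j, s)
    for j in features:
        for s in np.unique(X[:, j])[:-1]:
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, s)
    return best
```

Apart from the `rng.choice` line, this is the ordinary CART split rule; repeating it at every node of every bootstrapped tree gives the full random-forest procedure.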