Name some properties of good features in images
They should be repeatable (detectable under viewpoint and illumination changes), distinctive, local, and efficient to compute.
How can the pixel values of a patch be calculated using the integral image
Patch sum = II(bottom right) - II(bottom left) - II(top right) + II(top left), where II is the integral image evaluated at the patch corners.
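Assuming the integral image is zero-padded so that II[r, c] holds the sum of all pixels above and to the left of (r, c), the corner rule can be sketched as follows (the `patch_sum` helper name is illustrative):

```python
import numpy as np

def patch_sum(ii, r0, c0, r1, c1):
    """Sum of pixels in the patch with top-left (r0, c0) and
    bottom-right (r1, c1), inclusive, from a zero-padded integral image."""
    # bottom right - bottom left - top right + top left
    return ii[r1 + 1, c1 + 1] - ii[r1 + 1, c0] - ii[r0, c1 + 1] + ii[r0, c0]

img = np.arange(16, dtype=np.int64).reshape(4, 4)
# Build the integral image with a leading row/column of zeros,
# so ii[r, c] = sum of img[:r, :c].
ii = np.zeros((5, 5), dtype=np.int64)
ii[1:, 1:] = img.cumsum(0).cumsum(1)
```

With this padding every patch, including ones touching the image border, needs only the four corner lookups.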
What is the idea behind ensemble learning.
Aggregate the results of several predictors into one prediction
In what case will ensemble learning not lead to improved results over one predictor?
If all predictors are highly correlated, so they give the same output.
What kind of error do sequential and parallel learners reduce?
Sequential learners mostly reduce bias, while parallel learners mostly reduce variance.
Name 4 different ensemble methods, and state which of them are parallel/sequential.
Parallel:
Bagging (bootstrap aggregation)
Voting
Random forests
Sequential:
Boosting
How does Bagging work?
Create N versions of the training set using sampling with replacement, and train a weak learner on each one. Use averaging for regression and majority voting for classification.
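The procedure can be sketched as follows; `MajorityStub` is a toy weak learner assumed purely for illustration, and any learner with fit/predict methods would slot in:

```python
import numpy as np
from collections import Counter

class MajorityStub:
    """Toy weak learner (illustrative): always predicts the class
    it saw most often during fit."""
    def fit(self, X, y):
        self.label = Counter(y.tolist()).most_common(1)[0][0]
        return self
    def predict(self, X):
        return np.full(len(X), self.label)

def fit_bagging(X, y, make_learner, n_models=10, seed=0):
    """Train n_models learners, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # sampling with replacement
        models.append(make_learner().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Aggregate by majority vote (for regression one would average instead)."""
    preds = np.stack([m.predict(X) for m in models])  # (n_models, n_samples)
    return np.array([Counter(col.tolist()).most_common(1)[0][0]
                     for col in preds.T])

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 7 + [1] * 3)
models = fit_bagging(X, y, MajorityStub, n_models=5)
pred = bagging_predict(models, X)
```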
What is Out-Of-Bag error for bagging?
For each training sample xi, aggregate the predictions of only those learners whose bootstrap sample did not contain xi, and compare against the true label. Averaging these errors over all samples gives a validation-style error estimate without needing a held-out set.
What is boosting?
We compute classifiers sequentially, increasing the weight of misclassified examples after each run.
Describe the AdaBoost algorithm
Initialise uniform sample weights. In each round: train a weak learner on the weighted data, compute its weighted error e, give it the vote weight alpha = 1/2 ln((1 - e)/e), then increase the weights of misclassified samples and renormalise. The final classifier is the alpha-weighted vote of the weak learners.
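A minimal sketch of AdaBoost, assuming depth-1 threshold stumps as the weak learners and labels in {-1, +1} (function names are illustrative, not an optimised implementation):

```python
import numpy as np

def fit_adaboost(X, y, n_rounds=10):
    """AdaBoost with threshold stumps; labels y must be in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)                      # start with uniform weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        best = None
        for j in range(X.shape[1]):              # every feature
            for t in np.unique(X[:, j]):         # every threshold
                for s in (1, -1):                # both polarities
                    pred = s * np.where(X[:, j] >= t, 1, -1)
                    err = w[pred != y].sum()     # weighted error
                    if best is None or err < best[0]:
                        best = (err, j, t, s, pred)
        err, j, t, s, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)     # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)    # this learner's vote weight
        w *= np.exp(-alpha * y * pred)           # up-weight misclassified samples
        w /= w.sum()                             # renormalise
        stumps.append((j, t, s))
        alphas.append(alpha)
    return stumps, alphas

def predict_adaboost(stumps, alphas, X):
    """Sign of the alpha-weighted vote of the stumps."""
    score = sum(a * s * np.where(X[:, j] >= t, 1, -1)
                for (j, t, s), a in zip(stumps, alphas))
    return np.sign(score)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
stumps, alphas = fit_adaboost(X, y, n_rounds=3)
```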
How can we calculate the probability that a decision tree prediction is correct?
Each leaf (prediction) stores how many training samples of each class it contained; the predicted class probability is the fraction of the leaf's samples belonging to that class.
Describe the decision tree optimization for creating a feature space partition
At each node Sj:
for each feature:
for each value of this feature:
evaluate I(Sj, Aj)
choose the best feature and value for splitting
repeat
What are the two choices for categorizing tree optimization cost functions, I(Sj, Aj)?
1. Information Gain. 2. Gini Index.
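The greedy node-splitting search can be sketched as follows, using the Gini index as the cost I(Sj, Aj); `best_split` and `gini` are illustrative names:

```python
import numpy as np

def gini(y):
    """Gini impurity: 1 - sum_k p(y_k)^2."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / len(y)
    return 1.0 - (p ** 2).sum()

def best_split(X, y):
    """Try every feature and every observed value as a threshold; keep
    the split with the lowest weighted child impurity."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):                  # for each feature
        for t in np.unique(X[:, j]):             # for each value of this feature
            left, right = y[X[:, j] < t], y[X[:, j] >= t]
            if len(left) == 0 or len(right) == 0:
                continue                         # degenerate split
            cost = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if cost < best[2]:
                best = (j, t, cost)              # best feature and value so far
    return best

X = np.array([[1.0], [2.0], [8.0], [9.0]])
y = np.array([0, 0, 1, 1])
j, t, cost = best_split(X, y)
```

A real tree builder would apply this recursively to the two child nodes until a stopping criterion is met.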
Describe the information gain cost function, I(Sj, Aj)
I(Sj, Aj) = entropy of parent - weighted average entropy of children.
Entropy = - sum_j p(x_j) log(p(x_j))
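A minimal numeric check of the formula, with `entropy` and `information_gain` as illustrative helper names (log base 2, so gain is measured in bits):

```python
import numpy as np

def entropy(y):
    """Entropy = -sum_j p(x_j) log2 p(x_j) over the class frequencies."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / len(y)
    return float(-(p * np.log2(p)).sum())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

# A perfectly separating split removes all uncertainty: gain = 1 bit.
parent = np.array([0, 0, 1, 1])
gain = information_gain(parent, [np.array([0, 0]), np.array([1, 1])])
```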
Describe the Gini Index
Indicates how mixed the classes are: perfect separation gives score 0, a 50/50 split gives score 0.5.
Gini = 1 - sum_k (p(y_k))^2
Final score = weighted average of the children's Ginis.
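A quick check of the two reference values (pure node gives 0, 50/50 node gives 0.5); the `gini` helper name is illustrative:

```python
import numpy as np

def gini(y):
    """Gini = 1 - sum_k p(y_k)^2; a split's score is the
    size-weighted average over the child nodes."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / len(y)
    return 1.0 - (p ** 2).sum()

pure = gini(np.array([1, 1, 1, 1]))    # perfect separation
mixed = gini(np.array([0, 0, 1, 1]))   # 50/50 split
```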
What loss do we normally use for regression trees?
Weighted average of the children's MSE.
What can we do to prevent overfitting in decision trees?
Combine them into an ensemble (forests).
For random forests, how do the following affect bias and variance? 1. Number of features selected per node 2. Number of trees 3. Max depth
1. Fewer features per node decorrelates the trees, lowering variance at the cost of slightly higher bias. 2. More trees lower variance (bias is unchanged). 3. Greater depth lowers bias but increases variance.
What is the difference between a random forest and a boosting algorithm with decision trees as weak learners?
In a random forest the trees are trained independently (in parallel) on bootstrap samples and the splitting is randomized; in boosting the trees are trained sequentially, with each tree focusing on the samples the previous trees misclassified.
Name some advantages and disadvantages of decision trees
Advantages:
Interpretable, fast to train and evaluate, handle both categorical and continuous features, need little data preprocessing.
Disadvantages:
A single tree overfits easily (high variance), is unstable to small changes in the data, and only produces axis-aligned splits.