What is overfitting?
An overfit model looks great on the training data and then performs poorly on new data.
Training Error: Model’s prediction error for the training data.
Generalisation Error: Model’s prediction error for new data.
Usually, the training error will be smaller than the generalisation error (no big surprise). Ideally, though, the two errors should be close to each other.
If the generalisation error is large and your model’s test performance is poor while your training error is small, then your model has probably overfit the training data.
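A minimal sketch of this in Python (toy data, numpy assumed): a high-degree polynomial can drive the training error near zero by memorising the points, while a straight line, though worse on the training data, usually generalises better.

```python
import numpy as np

# Toy 1-D regression data: y = x + noise, fixed seed so the run is reproducible.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
y_train = x_train + rng.normal(0.0, 0.1, size=10)
x_test = np.linspace(0.05, 0.95, 10)
y_test = x_test + rng.normal(0.0, 0.1, size=10)

def mse(coeffs, x, y):
    """Mean squared error of a polynomial model on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

simple = np.polyfit(x_train, y_train, 1)     # straight-line model
complex_ = np.polyfit(x_train, y_train, 9)   # degree 9: can memorise all 10 points

# Training error: the complex model nearly interpolates the training points.
print(mse(simple, x_train, y_train), mse(complex_, x_train, y_train))
# Generalisation (test) error: the complex model's advantage typically vanishes.
print(mse(simple, x_test, y_test), mse(complex_, x_test, y_test))
```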
Why is overfitting bad, what is preferred?
An overfitting model has memorised the training data instead of discovering generalisable rules or patterns.
Simpler models are preferred as they tend to generalise better and avoid overfitting.
What is log likelihood?
Log likelihood is a measure (a non-positive number) of how well a model’s predictions “match” the true class labels.
Why is a larger log likelihood better?
The larger the magnitude of the log likelihood, the worse the match. Since the log likelihood is non-positive, we prefer a larger (higher) value, one that is close to 0.
The log likelihood of a model’s prediction on a specific instance is the logarithm of the probability that the model assigns to the instance’s actual class.
What is deviance?
The deviance measures how far your model is from a perfect model. It is defined as
−2 × (logLikelihood − S),
where S is a technical constant called “the log likelihood of the saturated model.”
In most cases, the saturated model is a perfect model that returns probability 1 for items in the class and probability 0 for items not in the class (so S = 0).
The lower the deviance, the better the model.
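A sketch in Python with made-up predicted probabilities, taking S = 0 for the saturated model:

```python
import math

# Predicted probability of the positive class and the actual labels (toy data).
p_pred = [0.9, 0.8, 0.3, 0.2]
y_true = [1, 1, 0, 0]
log_likelihood = sum(
    math.log(p if y == 1 else 1 - p) for p, y in zip(p_pred, y_true)
)

S = 0.0  # log likelihood of the saturated (perfect) model
deviance = -2 * (log_likelihood - S)
print(deviance)  # non-negative; lower is better
```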
What is the Akaike Information Criterion (AIC)?
AIC is defined as
deviance + 2 × numberOfParameters
The more parameters are in the model, the more complex the model is;
the more complex a model is, the more likely it is to overfit.
Thus, AIC is deviance penalised for model complexity.
When comparing models (on the same test set), you will generally prefer
the model with a smaller AIC.
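A quick illustration with made-up deviance and parameter counts: a slightly better fit does not win if it costs many extra parameters.

```python
def aic(deviance, n_params):
    # AIC = deviance + 2 * numberOfParameters
    return deviance + 2 * n_params

# Hypothetical fits: model B has slightly lower deviance but many more parameters.
aic_a = aic(deviance=100.0, n_params=3)   # 106.0
aic_b = aic(deviance=98.0, n_params=10)   # 118.0
best = "A" if aic_a < aic_b else "B"
print(best)  # the simpler model A is preferred despite its higher deviance
```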
What is AIC useful for?
The AIC is useful for comparing models with different measures of complexity
and modelling variables with differing numbers of levels.
How can a model be scored using AIC?
A model can be scored with
- a bonus proportional to its scaled log likelihood on the calibration data
- minus a penalty proportional to the complexity of the model.
What are evaluation methods?
When we want to find the best single-variable model, we can use evaluation methods such as log likelihood and deviance. The best model:
- has the largest log likelihood
- has the smallest deviance
What are the steps to the evaluation method when checking how good a model is?
- Compute the log likelihood.
- Run through the categorical variables: pick variables based on the reduction in deviance with respect to the null deviance.
- Run through the numerical variables.
REFER TO SLIDES FOR CODE EXAMPLES
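The slide code is not reproduced here, but the deviance-screening step can be sketched in Python (the toy data, the colour variable, and the helper names are all invented for illustration):

```python
import math

# Toy data: binary outcome plus one categorical feature.
y = [1, 1, 1, 0, 0, 0, 1, 0]
colour = ["red", "red", "red", "blue", "blue", "blue", "red", "blue"]

def log_likelihood(ys, probs):
    # Clamp probabilities away from 0 and 1 so log() stays finite.
    eps = 1e-12
    return sum(math.log(min(max(p if yi == 1 else 1 - p, eps), 1 - eps))
               for yi, p in zip(ys, probs))

# Null model: predict the overall positive rate for every instance.
p_null = sum(y) / len(y)
null_deviance = -2 * log_likelihood(y, [p_null] * len(y))

# Single-variable model: predict the positive rate within each level of colour.
rate = {lvl: sum(yi for yi, c in zip(y, colour) if c == lvl) /
             sum(1 for c in colour if c == lvl)
        for lvl in set(colour)}
model_deviance = -2 * log_likelihood(y, [rate[c] for c in colour])

# The bigger the reduction relative to the null deviance, the more useful the variable.
reduction = null_deviance - model_deviance
print(null_deviance, model_deviance, reduction)
```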
What are Decision tree models?
Decision trees are a simple model type – they make a prediction that is piecewise constant.
- The construction of a decision tree involves splitting the training data into pieces and using a simple constant on each piece.
Decision trees can be used to quickly predict categorical or numeric outcomes.
How do decision trees work?
Decision trees are binary trees. A decision tree is built by iteratively
finding the optimal feature out of all the features and the optimal threshold to split a node of the tree, e.g., at the root node, Age is the optimal feature and 27 is the optimal threshold found by the decision tree’s algorithm.
This splitting process results in training instances being divided and passed down the branches to the two child nodes. As we progress down the decision tree, there are fewer and fewer training instances in each node.
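The split-selection step can be sketched in Python: this toy best_split helper (a hypothetical function, using the Gini impurity that CART minimises) scans candidate thresholds for one numeric feature, echoing the Age/27 example above.

```python
def gini(labels):
    # Gini impurity of a set of binary class labels.
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def best_split(values, labels):
    """Try every midpoint between sorted feature values; return the threshold
    whose split minimises the weighted Gini impurity of the two children."""
    pairs = sorted(zip(values, labels))
    best_t, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

ages = [22, 25, 26, 28, 30, 35]
bought = [0, 0, 0, 1, 1, 1]          # toy labels, perfectly separated at Age = 27
print(best_split(ages, bought))      # (27.0, 0.0): a pure split of both children
```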
How are splitting of nodes terminated in decision trees?
The splitting of a node can be terminated by any of the following criteria:
- the node contains only instances of the same class;
- the node is at the pre-defined maximum depth value for the tree;
- the node has too few instances for further splitting.
What is the objective of decision trees?
We can also consider that the objective behind decision tree methods is to partition the feature space into homogeneous regions (i.e., having instances belonging to one class only) as much as possible. Also, the regions should not be narrow and long.
What are different types of decision trees?
Notable Decision Tree algorithms include:
- ID3 (Iterative Dichotomiser 3)
- C4.5 (successor of ID3)
- CART (Classification And Regression Tree)
- CHAID (CHi-squared Automatic Interaction Detector)
- MARS: extends decision trees to handle numerical data better.
- Conditional Inference Trees.
What are the pros of decision trees?
- They are easy to interpret and visualise.
- They handle both numerical and categorical features with little data preparation.
- Training and prediction are fast.
What are the cons of decision trees?
- They tend to overfit the training data, giving models that do not generalise well.
- Small changes in the training data can produce a very different tree (instability).
How to build a decision tree?
In R, we use a library such as the rpart package, whose rpart() function fits a decision tree to the training data.
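For illustration only, here is an equivalent fit in Python using scikit-learn's DecisionTreeClassifier (a substitution, not the course's rpart code; the toy data is invented):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training data: [age, income] -> bought (0/1).
X = [[22, 20], [25, 30], [26, 25], [28, 60], [30, 70], [35, 80]]
y = [0, 0, 0, 1, 1, 1]

# Fit a shallow tree and predict for two new instances.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(tree.predict([[24, 22], [33, 75]]))
```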
What can we do if the model looks too good on the training data and not as good on the calibration and test data?
What we can do to work around this is fit on our reprocessed variables, which hide the categorical levels (replacing them with numeric predictions), and remove NAs (treating them as just another level).
If it is still performing quite poorly on the calibration data, we turn our suspicion to overfitting.
What hyperparameters can help improve the AUC of the decision tree model?
Setting the minsplit, minbucket, and maxdepth hyperparameters appropriately helped to improve the AUC of the model. The next thing to try is using only the reprocessed numerical variables that achieved high AUC scores.
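If working in scikit-learn rather than rpart, the analogous knobs are min_samples_split, min_samples_leaf, and max_depth (an approximate mapping, not exact rpart semantics):

```python
from sklearn.tree import DecisionTreeClassifier

# rpart's minsplit / minbucket / maxdepth map roughly to these parameters.
tree = DecisionTreeClassifier(
    min_samples_split=20,  # rpart minsplit: fewest instances a node needs to be split
    min_samples_leaf=7,    # rpart minbucket: fewest instances allowed in a leaf
    max_depth=5,           # rpart maxdepth: cap on how deep the tree may grow
)
```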
How to interpret a decision tree?
Follow an instance down from the root: at each internal node, take the branch indicated by the node's feature test; the leaf it reaches gives the model's prediction (the constant for that region).
If there is poor performance of a decision tree, what other model can we use?
The best guess is that this dataset is unsuitable for decision trees and a method that deals better with overfitting issues is needed – such as random forests.
What are KNNs?
kNN (k-nearest neighbours): predicting a property of a datum based on the datum or data most similar to it. It can be used for regression and multi-class classification. For example, with k = 3, a new point is assigned the majority class among its three nearest labelled neighbours.
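A from-scratch sketch in Python, with invented toy data: classify a query point by majority vote among its k nearest training points (Euclidean distance).

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    # Count the class labels of the k closest points and return the majority.
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (2, 2)))  # "A": its 3 nearest points are class A
```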
How do KNNs work?