Explain the cross validation approach
What are some drawbacks of CV?
Explain Leave-One-Out Cross-Validation
This is where the test set consists of only one data point.
We train the model on the remaining n-1 data points, and repeat this n times until every data point has been used once as the validation set.
The LOOCV estimate for the test MSE is then the average of the n test errors.
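The LOOCV loop above can be sketched in a few lines. This is a minimal illustration, not a real learner: the hypothetical "model" simply predicts the mean of the n-1 training points, and the per-fold error is the squared error on the single held-out point.

```python
# Minimal LOOCV sketch (hypothetical 1-D data, mean-predictor "model").
def loocv_mse(data):
    n = len(data)
    fold_errors = []
    for i in range(n):
        train = data[:i] + data[i+1:]         # train on the n-1 remaining points
        held_out = data[i]                    # single-point validation set
        prediction = sum(train) / len(train)  # "fit": predict the training mean
        fold_errors.append((held_out - prediction) ** 2)
    return sum(fold_errors) / n               # average of the n test errors

print(loocv_mse([1.0, 2.0, 3.0, 4.0]))
```

Swapping the mean predictor for any real model (e.g. a regression fit on `train`) gives the full LOOCV procedure; the loop structure stays the same.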
Explain k-fold CV
This involves splitting the dataset into K subsets (folds) and using one fold as the test set.
This is repeated K times until each of the K folds has been used once as the test set.
The test error estimate is the average of the K MSE estimates.
K is typically 5 or 10
Where is the best model complexity, ABC, based on the training and testing error? Why?
A. We want the testing error to be as low as possible. This is where the model generalises well to new, unseen data.
What is the bootstrapping method?