Validation set method
Split the data into a training set and a validation set, fit the model on the training set, and calculate the MSE on the validation set.
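A minimal sketch of the validation set method, assuming a simple straight-line model and synthetic data. The helper names (`fit_line`, `mse`, `validation_set_mse`) and the 30% validation fraction are illustrative choices, not part of these notes.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def mse(model, xs, ys):
    """Mean squared error of the line (a, b) on the given points."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def validation_set_mse(xs, ys, val_fraction=0.3, seed=0):
    """Randomly hold out a validation set, fit on the rest, return validation MSE."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n_val = max(1, int(len(xs) * val_fraction))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    return mse(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx])

# Synthetic data, made up for illustration: y = 2x + 1 plus Gaussian noise.
noise = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 + noise.gauss(0, 0.5) for x in xs]
print(validation_set_mse(xs, ys))
```

Calling `validation_set_mse` with different `seed` values gives different errors, since each seed holds out a different random sample.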
Holdout Method
Perform the validation set method several times and choose the model with the lowest validation error
Validation error
The prediction error calculated on a held-out validation set
Validation set disadvantages
- Validation error depends heavily on the random choice of the validation sample
LOOCV
Leave one out Cross Validation
Validation set method cost
Cheap
LOOCV cost
Expensive
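The cost difference is easiest to see in code: the validation set method fits the model once, while LOOCV refits it n times, once per held-out point. A rough sketch, assuming a straight-line model (`fit_line` and `loocv_mse` are illustrative names):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def loocv_mse(xs, ys):
    """Leave-one-out CV: hold out each point once and average the squared errors."""
    n = len(xs)
    total = 0.0
    for i in range(n):  # n model fits in total
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total / n

# Small made-up data set for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
print(loocv_mse(xs, ys))
```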
K-Fold Cross validation
Bias of K-Fold validation error
Nested K-Fold Validation
Select model with K-Fold and report error of selected model on test set
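A sketch of this procedure under simple assumptions: two candidate models (a constant mean and a straight line), K = 5, and synthetic data. The test set is split off first and never used for selection. All names here are illustrative.

```python
import random

def fit_mean(xs, ys):
    """Candidate 1: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate 2: least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def mse(predict, xs, ys):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def k_fold_mse(fit, xs, ys, k=5):
    """Average validation MSE over k folds (data assumed already shuffled)."""
    n = len(xs)
    folds = [list(range(i, n, k)) for i in range(k)]
    errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mse(predict, [xs[i] for i in fold], [ys[i] for i in fold]))
    return sum(errors) / k

# Synthetic data, made up for illustration: y = 2x + 1 plus noise.
rng = random.Random(0)
data = [(i / 10, 2 * (i / 10) + 1 + rng.gauss(0, 0.5)) for i in range(60)]
rng.shuffle(data)
test, train = data[:15], data[15:]
train_x, train_y = [x for x, _ in train], [y for _, y in train]
test_x, test_y = [x for x, _ in test], [y for _, y in test]

# Select the model with K-fold on the training data only.
candidates = {"mean": fit_mean, "line": fit_line}
cv_errors = {name: k_fold_mse(fit, train_x, train_y) for name, fit in candidates.items()}
best = min(cv_errors, key=cv_errors.get)

# Refit the selected model on all training data and report its test error.
final_model = candidates[best](train_x, train_y)
test_error = mse(final_model, test_x, test_y)
print(best, test_error)
```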
Temporal Data
Be careful not to include data from any point later than what the model should predict
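One way to respect this rule is to split by time instead of at random. A minimal sketch, assuming records are `(timestamp, value)` pairs and a hypothetical `cutoff` timestamp:

```python
def temporal_split(records, cutoff):
    """Train on records strictly before the cutoff, validate on the rest."""
    ordered = sorted(records, key=lambda r: r[0])
    train = [r for r in ordered if r[0] < cutoff]
    valid = [r for r in ordered if r[0] >= cutoff]
    return train, valid

# Made-up records: (day index, observed value).
records = [(3, 1.4), (1, 1.0), (5, 2.1), (2, 1.2), (4, 1.9), (6, 2.5)]
train, valid = temporal_split(records, cutoff=5)
print(train)
print(valid)
```

Every training record is older than every validation record, so the model never sees the future it is asked to predict.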
Subset selection
Try different subsets of features and select the subset with the lowest validation error
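A sketch of exhaustive subset selection, assuming ordinary least squares as the model and a single train/validation split. The data is synthetic and the helper names are illustrative. Trying every subset is only feasible for a small number of features.

```python
import random
from itertools import combinations

def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_ols(X, y):
    """Least squares with intercept via the normal equations; returns [w0, w1, ...]."""
    Z = [[1.0] + row for row in X]
    p = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    b = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(p)]
    return solve(A, b)

def mse(w, X, y):
    preds = [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]
    return sum((t - p) ** 2 for t, p in zip(y, preds)) / len(y)

def select(X, subset):
    """Keep only the feature columns listed in subset."""
    return [[row[j] for j in subset] for row in X]

def best_subset(X_train, y_train, X_val, y_val):
    """Fit every non-empty feature subset; return (subset, validation MSE) of the best."""
    p = len(X_train[0])
    best = None
    for size in range(1, p + 1):
        for subset in combinations(range(p), size):
            w = fit_ols(select(X_train, subset), y_train)
            err = mse(w, select(X_val, subset), y_val)
            if best is None or err < best[1]:
                best = (subset, err)
    return best

# Synthetic data: only features 0 and 2 influence y; feature 1 is pure noise.
rng = random.Random(1)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(60)]
y = [3 * row[0] - 2 * row[2] + rng.gauss(0, 0.1) for row in X]
subset, err = best_subset(X[:40], y[:40], X[40:], y[40:])
print(subset, err)
```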
Feature
Input variables
Dimensionality Reduction
Transform the features into a lower-dimensional feature space
Regularization
Add a penalty term for large coefficients
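As one concrete example, ridge regression adds λ·b² to the squared-error loss. The notes do not name a specific penalty, so the choice of ridge is an assumption here. For a single feature with an unpenalized intercept, the slope has the closed form b = Sxy / (Sxx + λ), so a larger λ shrinks it toward zero.

```python
def fit_ridge_line(xs, ys, lam):
    """Minimize sum((y - a - b*x)^2) + lam * b^2; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / (sxx + lam)    # the penalty inflates the denominator
    a = mean_y - b * mean_x  # the intercept is not penalized
    return a, b

# Made-up points lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
for lam in (0.0, 5.0, 45.0):
    a, b = fit_ridge_line(xs, ys, lam)
    print(lam, round(a, 3), round(b, 3))
```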
Target variable
Y variable the model should predict
(x1, y1), (x2, y2), …, (xn, yn)
data points
x1, x2, … ,xn
feature vector
y1, y2, …, yn
target variable: output of the model
Hyperplane
In a p-dimensional space, a hyperplane is a flat affine subspace of dimension p-1
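Such a hyperplane can be written as the set of points x with w·x + b = 0, and the sign of w·x + b tells which side a point lies on. A small sketch for p = 2, where the hyperplane is a line (the function name is illustrative):

```python
def side_of_hyperplane(w, b, x):
    """Return +1 or -1 for the side of w.x + b = 0 that x lies on, 0 if on it."""
    value = sum(wi * xi for wi, xi in zip(w, x)) + b
    return (value > 0) - (value < 0)

# The line x1 + x2 - 1 = 0 in two dimensions.
w, b = (1.0, 1.0), -1.0
print(side_of_hyperplane(w, b, (1.0, 1.0)))
print(side_of_hyperplane(w, b, (0.0, 0.0)))
print(side_of_hyperplane(w, b, (0.5, 0.5)))
```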
Separating Hyperplanes
classifier margin
Width by which the separating hyperplane could be widened without hitting a data point
Maximum margin classifier
The separating hyperplane with the largest possible margin
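A sketch of the idea, assuming labels in {-1, +1} and a hyperplane written as w·x + b = 0. The margin width is taken as twice the distance to the nearest point. This only compares a few hand-picked candidate hyperplanes; actually finding the maximum margin classifier requires solving an optimization problem, which is not shown.

```python
import math

def margin_width(w, b, points, labels):
    """Margin width of w.x + b = 0, or None if it does not separate the classes."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    distances = []
    for x, y in zip(points, labels):
        signed = y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        if signed <= 0:  # point on the wrong side or on the hyperplane
            return None
        distances.append(signed)
    return 2 * min(distances)

# Made-up 2D data: class -1 at x1 = 0, class +1 at x1 = 2.
points = [(0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)]
labels = [-1, -1, 1, 1]

# Hand-picked candidate hyperplanes as (w, b) pairs.
candidates = {
    "x1 = 1.0": ((1.0, 0.0), -1.0),
    "x1 = 0.5": ((1.0, 0.0), -0.5),
}
widths = {name: margin_width(w, b, points, labels) for name, (w, b) in candidates.items()}
best = max(widths, key=widths.get)
print(widths, best)
```

Both candidates separate the classes, but the line halfway between them leaves more room on each side and so has the larger margin.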