Difference between prediction and inference?
prediction: minimize the error of the predictions ŷ
inference: understand how Y depends on X via an interpretable model f(x)
What is a parametric approach?
a functional form for f is assumed (e.g. linear); the problem reduces to estimating a fixed set of parameters from the training data
What is supervised learning?
for each observation of a set of features Xᵢ there is a measured response yᵢ
Difference between a regression and a classification problem?
regression: quantitative response
classification: qualitative response
What is the variance of a model?
how much f(x) changes with different sets of training data
What is the bias of a model?
the error introduced by approximating a real-world problem with a simplified model
What is the error rate?
incorrect predictions / total number of predictions
What is the accuracy?
1 - error rate
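The two cards above can be checked with a minimal Python sketch (the labels are made up for illustration):

```python
# Error rate and accuracy from true vs. predicted labels (toy data).
y_true = ["cat", "dog", "cat", "dog", "cat"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]

errors = sum(t != p for t, p in zip(y_true, y_pred))
error_rate = errors / len(y_true)   # incorrect / total
accuracy = 1 - error_rate

print(error_rate, accuracy)  # 0.2 0.8
```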
What is the Bayes classifier?
the classifier that assigns each observation to its most likely class given its predictor values; it achieves the lowest possible error rate (benchmark)
K-nearest neighbours classification:
identify the k nearest neighbours of x, estimate the conditional probability of each class j as the fraction of neighbours belonging to j, then assign x to the class with the highest probability
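A minimal pure-Python sketch of these steps (function name and toy data are my own, for illustration):

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k):
    """Classify x by majority vote among its k nearest neighbours."""
    # distance from x to every training point, paired with its label
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    # labels of the k closest points
    k_labels = [yi for _, yi in dists[:k]]
    # the estimated conditional probability of class j is its vote
    # fraction; return the class with the highest fraction
    return Counter(k_labels).most_common(1)[0][0]

train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (1, 1), k=3))  # 'a'
```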
Residual Sum of Squares (RSS)
RSS = ∑ᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
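The formula translates directly to Python (toy numbers for illustration):

```python
def rss(y, y_hat):
    """Residual sum of squares: sum of squared prediction errors."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

print(rss([1, 2, 3], [1.5, 2, 2]))  # 0.25 + 0 + 1 = 1.25
```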
Standard Error (SE(ß₁))
SE(ß₁)² = σ² / ∑ᵢ₌₁ⁿ (xᵢ − x̄)²
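A small sketch of the formula, taking σ² as known (the function name and numbers are my own):

```python
def se_beta1(x, sigma2):
    """Standard error of the slope in simple linear regression:
    SE(b1)^2 = sigma^2 / sum((x_i - x_bar)^2)."""
    x_bar = sum(x) / len(x)
    return (sigma2 / sum((xi - x_bar) ** 2 for xi in x)) ** 0.5

# sum((x - 3)^2) = 4+1+0+1+4 = 10; with sigma^2 = 2.5, SE = sqrt(0.25) = 0.5
print(se_beta1([1, 2, 3, 4, 5], sigma2=2.5))  # 0.5
```

Note that more spread-out xᵢ shrink the standard error, since the denominator grows.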
F-test formula
F = ((TSS − RSS) / p) / (RSS / (n − p − 1))
if F > F_crit, reject H₀
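As a quick sanity check, the statistic with made-up values (TSS, RSS, n, p chosen for illustration):

```python
def f_statistic(tss, rss, n, p):
    """F = ((TSS - RSS) / p) / (RSS / (n - p - 1))."""
    return ((tss - rss) / p) / (rss / (n - p - 1))

# TSS = 100, RSS = 40, n = 25 observations, p = 4 predictors
print(f_statistic(100, 40, 25, 4))  # (60/4) / (40/20) = 15 / 2 = 7.5
```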
Forward Selection:
starting from the null model, the variable whose addition yields the lowest RSS is added; this continues until a stopping rule is satisfied
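A pure-Python sketch of the greedy loop, using least squares via the normal equations (all names and toy data are my own; a real analysis would use a statistics library):

```python
def ols_rss(X_cols, y):
    """RSS of the least-squares fit of y on an intercept plus the
    given predictor columns, via Gaussian elimination (pure Python)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in X_cols] for i in range(n)]
    p = len(X[0])
    # normal equations A b = c with A = X'X, c = X'y
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
         for j in range(p)]
    c = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    for j in range(p):                       # forward elimination
        piv = max(range(j, p), key=lambda r: abs(A[r][j]))
        A[j], A[piv] = A[piv], A[j]
        c[j], c[piv] = c[piv], c[j]
        for r in range(j + 1, p):
            f = A[r][j] / A[j][j]
            for k in range(j, p):
                A[r][k] -= f * A[j][k]
            c[r] -= f * c[j]
    b = [0.0] * p                            # back substitution
    for j in range(p - 1, -1, -1):
        b[j] = (c[j] - sum(A[j][k] * b[k] for k in range(j + 1, p))) / A[j][j]
    return sum((y[i] - sum(b[j] * X[i][j] for j in range(p))) ** 2
               for i in range(n))

def forward_selection(columns, y, max_vars):
    """Repeatedly add the variable whose inclusion gives the lowest RSS."""
    chosen, remaining = [], set(columns)
    while remaining and len(chosen) < max_vars:
        best = min(remaining, key=lambda name:
                   ols_rss([columns[c] for c in chosen + [name]], y))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# y is nearly 2*x1, so x1 should be selected first
columns = {"x1": [1, 2, 3, 4, 5], "x2": [5, 3, 4, 1, 2]}
y = [2.1, 3.9, 6.2, 8.0, 9.8]
print(forward_selection(columns, y, max_vars=1))  # ['x1']
```

Here the stopping rule is simply a cap on the number of variables; in practice one stops via p-values or a validation criterion.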
Backward Selection
Starting with a full model, step by step the variable with the largest p-value is removed until a stopping condition is reached
Mixed Selection:
like forward selection, but if at any point a variable's p-value exceeds a threshold it is removed; this continues until all variables in the model have low p-values and any variable outside the model would have a large p-value if added
Polynomial regression
Y = ß₀ + ß₁x + ß₂x² + … + ßₙxⁿ
in R: y ~ poly(variable, n)
with n being the highest polynomial degree
Heteroscedasticity
the variance of the error terms εᵢ is not constant
e.g. Var(εᵢ) increases with yᵢ
standardization
rescale data to have mean 0 and standard deviation 1.
xnew = (xᵢ – x̄) / s
s = standard deviation
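A short Python sketch of the formula (using the sample standard deviation from the stdlib `statistics` module):

```python
from statistics import mean, stdev

def standardize(xs):
    """Rescale to mean 0 and (sample) standard deviation 1."""
    x_bar, s = mean(xs), stdev(xs)
    return [(x - x_bar) / s for x in xs]

z = standardize([2, 4, 6, 8])
print(mean(z), stdev(z))  # mean ~ 0, sd ~ 1
```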
normalization
rescaling the data so that every observation falls between 0 and 1
xnew = (xᵢ – xmin) / (xmax – xmin)
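The min-max formula as a one-liner (toy numbers for illustration):

```python
def normalize(xs):
    """Min-max rescale so every value lies in [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

print(normalize([10, 15, 20]))  # [0.0, 0.5, 1.0]
```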
Outlier
observation whose response yᵢ is unusual given its predictor values
High leverage point
observation with an unusual set of features that has “more weight” in determining the model due to its distance from other observations
Variance inflation factor
measures collinearity: VIF(ßⱼ) = 1 / (1 − R²ⱼ)
where R²ⱼ is the R² of a regression of Xⱼ onto all other predictors
1 = no multicollinearity, >5 = problematic
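A sketch for the special case of two predictors, where the R² of one regressed on the other is just their squared correlation (function name and data are my own):

```python
from statistics import mean

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with only two predictors, the R^2 of
    x1 regressed on x2 equals their squared correlation."""
    m1, m2 = mean(x1), mean(x2)
    sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    sxx = sum((a - m1) ** 2 for a in x1)
    syy = sum((b - m2) ** 2 for b in x2)
    r2 = sxy * sxy / (sxx * syy)
    return 1 / (1 - r2)

x1 = [1, 2, 3, 4]
x2 = [1.1, 1.9, 3.2, 3.8]  # nearly collinear with x1 -> VIF well above 5
print(vif_two_predictors(x1, x2))
```

With more than two predictors one would fit a full multiple regression of Xⱼ on the rest to get R²ⱼ.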
Logistic function
p(x) = e^f(x) / (1 + e^f(x))
p(x) is between 0 and 1 -> probability
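A quick numeric check of the function's range (my own toy inputs):

```python
import math

def logistic(fx):
    """p(x) = e^f(x) / (1 + e^f(x)), always strictly between 0 and 1."""
    return math.exp(fx) / (1 + math.exp(fx))

print(logistic(0))                            # 0.5
print(logistic(4) > 0.9, logistic(-4) < 0.1)  # True True
```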