What is regression?
Predicted (dependent) variable: ŷ (y-hat). Other variables: explanatory variables.
What is the difference between regression and classification?
Classification predicts nominal class attributes, whereas regression predicts continuous attributes
What are regression techniques?
1) Linear Regression
2) Polynomial Regression
3) Local Regression
4) Artificial Neural Networks (ANN)
5) Deep Neural Networks (DNN)
6) k-Nearest-Neighbors Regression
Explain k-nearest neighbor regression
Rule of thumb: choose k between 1 and 20
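A minimal sketch of k-NN regression, assuming the common formulation in which the prediction is the mean target value of the k nearest training points. The function name `knn_predict`, the single-feature distance, and the toy data are illustrative assumptions, not part of the notes.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Predict y for `query` as the mean y of the k nearest training points."""
    # Rank training indices by absolute distance to the query (1-D feature).
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


# Toy data (made up for illustration): y is roughly 2 * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]

# The three nearest neighbors of 3.4 are x = 3, 4 and 2.
prediction = knn_predict(xs, ys, query=3.4, k=3)
print(prediction)
```

A small k follows the training data closely, while a large k averages over more points and gives a smoother prediction.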
How can you evaluate regression models?
Methods for Model Evaluation:
What metrics for Model Evaluation can be applied?
How do you interpret R²?
R² = 1: perfect model, because the total variation of y is completely explained by X
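To make the interpretation concrete, here is a small sketch that assumes the standard definition R² = 1 − SS_res / SS_tot (residual sum of squares over total sum of squares). The function name `r_squared` and the numbers are illustrative only.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]

# Predictions identical to the true values: all variation is explained.
perfect = r_squared(y_true, y_true)

# Always predicting the mean of y: none of the variation is explained.
baseline = r_squared(y_true, [2.5] * len(y_true))

print(perfect, baseline)
```

Under this definition, a model that is no better than predicting the mean of y scores 0, and a model that is worse than the mean scores below 0.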
How can you apply regression trees?
What may happen if your tree has a higher depth?
- The model overfits: it learns individual outliers in the training data
What is the assumption of linear regression?
How do you fit a regression function?
Least-squares approach: find the weight vector that minimizes the sum of squared errors over all training examples
Error: difference between the estimated and the real value in the training data
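The least-squares idea can be sketched for the single-variable case, where the minimizing weights have a closed form. This assumes a model ŷ = w0 + w1·x. The helper names `fit_least_squares` and `sum_squared_error` and the data are illustrative assumptions.

```python
def fit_least_squares(xs, ys):
    """Return (w0, w1) minimizing sum((y - (w0 + w1 * x)) ** 2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    w1 = s_xy / s_xx            # slope
    w0 = mean_y - w1 * mean_x   # intercept
    return w0, w1


def sum_squared_error(xs, ys, w0, w1):
    """Sum of squared differences between estimated and real values."""
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys))


# Toy data lying exactly on y = 1 + 2x (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w0, w1 = fit_least_squares(xs, ys)
sse = sum_squared_error(xs, ys, w0, w1)
print(w0, w1, sse)
```

Any other choice of weights, for example a slightly larger slope, gives a larger sum of squared errors on the same training data.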
What is ridge regularization?
alpha = 0: ordinary least-squares regression (no penalty)
alpha = 100: strongly regularized, nearly flat curve (strong penalty)
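A minimal sketch of the effect of alpha, assuming the usual ridge objective for one feature: the sum of squared errors plus alpha · w1², with the intercept left unpenalized. Under that assumption the slope has the closed form S_xy / (S_xx + alpha). The function name `fit_ridge` and the data are illustrative.

```python
def fit_ridge(xs, ys, alpha):
    """Return (w0, w1) minimizing sum((y - w0 - w1 * x) ** 2) + alpha * w1 ** 2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    w1 = s_xy / (s_xx + alpha)  # the penalty shrinks the slope toward 0
    w0 = mean_y - w1 * mean_x   # intercept is not penalized
    return w0, w1


# Toy data lying exactly on y = 1 + 2x (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

_, slope_ols = fit_ridge(xs, ys, alpha=0)       # same as least squares
_, slope_strong = fit_ridge(xs, ys, alpha=100)  # heavily shrunk slope
print(slope_ols, slope_strong)
```

With alpha = 100 the slope is pulled close to zero, so the fitted line is nearly flat around the mean of y.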
What problems can occur by feature selection for regression?
Problem 1: Highly correlated variables (e.g. height in cm and height in inches)
- the individual weights are meaningless; one of the variables should be removed
Problem 2: Insignificant variables
- variables uncorrelated with the target are assigned w = 0 or relatively small weights
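One way to spot Problem 1 before fitting is to look at pairwise correlations between the explanatory variables. The sketch below assumes Pearson correlation and an arbitrary cut-off of 0.95. The feature values, the `age` column, and the threshold are made up for illustration.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


height_cm = [160.0, 170.0, 180.0, 190.0]
features = {
    "height_cm": height_cm,
    "height_inch": [h / 2.54 for h in height_cm],  # same information, other unit
    "age": [23.0, 51.0, 34.0, 42.0],               # hypothetical extra feature
}

THRESHOLD = 0.95  # assumed cut-off for "highly correlated"

# Collect every pair of features whose absolute correlation exceeds the cut-off.
flagged = [
    (name_a, name_b)
    for name_a, name_b in combinations(features, 2)
    if abs(pearson(features[name_a], features[name_b])) > THRESHOLD
]
print(flagged)
```

For each flagged pair, one of the two variables is a candidate for removal.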
How can you check if a variable with a small weight really is insignificant?
What does Interpolation mean?
Interpolating regression:
What does extrapolation mean?
Extrapolating regression:
An explanatory variable outside the range seen in the training data can result in a predicted dependent variable that is also outside the training range
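The difference can be illustrated with a fitted line: a query inside the training range of x stays within the observed range of y, while a query far outside it does not. This is a sketch on made-up data. The helper `fit_line` is a plain single-variable least-squares fit, and all names are illustrative.

```python
def fit_line(xs, ys):
    """Single-variable least-squares fit, returns (w0, w1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    w1 = s_xy / s_xx
    return mean_y - w1 * mean_x, w1


# Training data covers x in [0, 3] and y in [1, 7].
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w0, w1 = fit_line(xs, ys)

inside = w0 + w1 * 1.5    # interpolation: x lies within the training range
outside = w0 + w1 * 10.0  # extrapolation: x lies far outside the training range

print(min(ys) <= inside <= max(ys))    # prediction stays in the observed y range
print(min(ys) <= outside <= max(ys))   # prediction leaves the observed y range
```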
How can the results of a linear regression and a K-NN regression be described?
-> Linear regression is sensitive to outliers
Which technique can be applied to non-linear problems?
Explain polynomial regression
Where does polynomial regression often overfit, and which workarounds can you apply to mitigate overfitting?
Workarounds:
What is the idea behind local regression?
Assumption: non-linear problems are approximately linear in local areas
Idea: use linear regression locally for the data point at hand (lazy learning)
- Combination of k-NN and linear regression
How does local regression work?
Given a data point:
1) retrieve the k nearest neighbors
2) learn a regression model for those neighbors
3) use the learned model to predict the y value
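The three steps above can be sketched as follows, assuming a single explanatory variable, absolute distance for the neighbor search, and a plain least-squares line as the local model. The function names and the quadratic toy data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Single-variable least-squares fit, returns (w0, w1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    w1 = s_xy / s_xx if s_xx > 0 else 0.0  # identical x values: fall back to the mean
    return mean_y - w1 * mean_x, w1


def local_regression_predict(train_x, train_y, query, k=3):
    # 1) retrieve the k nearest neighbors of the query point
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    near_x = [train_x[i] for i in order]
    near_y = [train_y[i] for i in order]
    # 2) learn a regression model for those neighbors only
    w0, w1 = fit_line(near_x, near_y)
    # 3) use the learned model to predict the y value
    return w0 + w1 * query


# Non-linear toy problem: y = x ** 2 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 2 for x in xs]

query = 2.4
local_pred = local_regression_predict(xs, ys, query, k=3)

# For comparison: one global line fitted to all training points.
g0, g1 = fit_line(xs, ys)
global_pred = g0 + g1 * query

print(local_pred, global_pred, query ** 2)
```

Because a new model is fitted for every query, nothing is learned up front, which is what makes the approach lazy. On this toy data the local prediction lands closer to the true value than the single global line.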
What are the advantages of local regression?
Advantage: fits non-linear models well
What are the disadvantages of local regression?
Disadvantage: