What is overfitting?
What is underfitting?
What is an alternative term for overfitting?
high variance
because overfit models can end up with highly variable predictions: a small change in the training data can produce a very different model
What is an alternative term for underfitting?
high bias
What are approaches to deal with overfitting?
Describe how collecting more data can help with overfitting
Describe how feature selection can help with overfitting
Describe how regularization techniques can help with overfitting
How does regularization work?
What is the basic idea of regularization?
How is regularization usually implemented?
J(w,b) = (1/(2m)) * Sum over i=1..m of (f_wb(x^(i)) - y^(i))² + (lambda/(2m)) * Sum over j=1..n of w_j²
n - number of features
m - number of training examples
lambda - regularization parameter, >0
Because both terms are scaled by 1/(2m), it's easier to choose a good lambda: the same value tends to keep working when the number of training examples m changes
Lambda balances the goal of fitting data and of keeping w_j small
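
The regularized cost above can be sketched in a few lines of Python. This is a minimal illustration, assuming a linear model f_wb(x) = w·x + b and plain Python lists instead of a numerical library; the function name regularized_cost and the sample values are made up for this example.

```python
def regularized_cost(X, y, w, b, lam):
    """Squared-error cost plus an L2 penalty on w (b is not penalized)."""
    m = len(X)  # number of training examples
    fit_term = 0.0
    for x_i, y_i in zip(X, y):
        # f_wb(x^(i)) for a linear model (assumption of this sketch)
        prediction = sum(w_j * x_ij for w_j, x_ij in zip(w, x_i)) + b
        fit_term += (prediction - y_i) ** 2
    fit_term /= 2 * m
    # (lambda / 2m) * sum over the n features of w_j^2
    penalty = lam / (2 * m) * sum(w_j ** 2 for w_j in w)
    return fit_term + penalty


# Hypothetical toy data: m = 2 examples, n = 2 features
X = [[1.0, 2.0], [3.0, 4.0]]
y = [1.0, 2.0]
w = [0.5, -0.5]
b = 1.0

print(regularized_cost(X, y, w, b, lam=0.0))  # fit term only
print(regularized_cost(X, y, w, b, lam=1.0))  # fit term plus penalty
```

The difference between the two printed values is exactly the penalty term, which grows with lambda and with the size of the weights.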
What happens when lambda is 0 or extremely large?
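
A small experiment can make the two extremes concrete. The sketch below assumes a one-feature model f(x) = w*x without b, for which the regularized cost has the closed-form minimizer w = Sum(x*y) / (Sum(x²) + lambda); the function name best_w and the data are made up for this example. With lambda = 0 the penalty vanishes and the fit is unregularized (risk of overfitting); with a very large lambda the weight is pushed toward 0 (risk of underfitting).

```python
def best_w(xs, ys, lam):
    """Minimizer of the regularized squared-error cost for f(x) = w*x (no b)."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + lam)


# Hypothetical data generated by y = 2x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

for lam in (0.0, 1.0, 1e6):
    # The fitted weight shrinks toward 0 as lambda grows
    print(lam, best_w(xs, ys, lam))
```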
What changes with regularization for linear regression in terms of gradient descent?
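b is not regularized, so its update stays the same: b := b - alpha * (1/m) * Sum over i=1..m of (f_wb(x^(i)) - y^(i))
the update for w gets an extra term: w_j := w_j - alpha * [ (1/m) * Sum over i=1..m of (f_wb(x^(i)) - y^(i)) * x_j^(i) + (lambda/m) * w_j ]
alpha - learning rate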
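
One gradient descent step for regularized linear regression might look like the following. This is a sketch under assumptions: a linear model f_wb(x) = w·x + b, a learning rate called alpha, plain Python lists, and a made-up function name gradient_step.

```python
def gradient_step(X, y, w, b, lam, alpha):
    """One gradient descent update for regularized linear regression."""
    m = len(X)
    # errors[i] = f_wb(x^(i)) - y^(i)
    errors = [
        sum(w_j * x_ij for w_j, x_ij in zip(w, x_i)) + b - y_i
        for x_i, y_i in zip(X, y)
    ]
    # Derivative with respect to w_j: data term plus (lambda/m) * w_j
    dj_dw = [
        sum(err * x_i[j] for err, x_i in zip(errors, X)) / m + (lam / m) * w[j]
        for j in range(len(w))
    ]
    # Derivative with respect to b: no regularization term
    dj_db = sum(errors) / m
    new_w = [w_j - alpha * g for w_j, g in zip(w, dj_dw)]
    new_b = b - alpha * dj_db
    return new_w, new_b


# Hypothetical data that w = 1, b = 0 fits perfectly
X = [[1.0], [2.0]]
y = [1.0, 2.0]
print(gradient_step(X, y, [1.0], 0.0, lam=0.0, alpha=0.1))  # nothing changes
print(gradient_step(X, y, [1.0], 0.0, lam=2.0, alpha=0.1))  # w shrinks, b unchanged
```

Even when the data term of the gradient is zero, the (lambda/m) * w_j term still shrinks w a little on every step, while b is left alone.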
What changes with regularization for logistic regression in terms of gradient descent and cost function?
the cost function of logistic regression gets the following term added:
+ (lambda/(2m)) * Sum over j=1..n of w_j²
the derivative with respect to w_j used in gradient descent gets the following term added:
+ (lambda/m) * w_j
-> similar to regularized linear regression, but f(x) is a different function (the sigmoid of w·x + b)
b will not be regularized
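
The same idea can be sketched for logistic regression. The block below assumes the usual cross-entropy (log loss) cost as the unregularized part, which these notes do not spell out, and uses made-up names (sigmoid, logistic_cost_and_gradients) and toy data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_cost_and_gradients(X, y, w, b, lam):
    """Regularized logistic cost and its derivatives (b is not regularized)."""
    m = len(X)
    # f_wb(x^(i)) = sigmoid(w·x^(i) + b)
    probs = [
        sigmoid(sum(w_j * x_ij for w_j, x_ij in zip(w, x_i)) + b) for x_i in X
    ]
    # Cross-entropy loss (assumed here as the unregularized cost)
    loss = -sum(
        y_i * math.log(p) + (1 - y_i) * math.log(1 - p)
        for y_i, p in zip(y, probs)
    ) / m
    cost = loss + lam / (2 * m) * sum(w_j ** 2 for w_j in w)
    # Same form as linear regression, plus (lambda/m) * w_j
    dj_dw = [
        sum((p - y_i) * x_i[j] for p, y_i, x_i in zip(probs, y, X)) / m
        + (lam / m) * w[j]
        for j in range(len(w))
    ]
    dj_db = sum(p - y_i for p, y_i in zip(probs, y)) / m
    return cost, dj_dw, dj_db


# Hypothetical toy data: one feature, labels 1 and 0
X = [[1.0], [-1.0]]
y = [1, 0]
print(logistic_cost_and_gradients(X, y, [2.0], 0.0, lam=0.0))
print(logistic_cost_and_gradients(X, y, [2.0], 0.0, lam=1.0))
```

Comparing the two calls, only the cost and the derivative for w change with lambda; the derivative for b is identical.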