Overfitting and Regularization Flashcards

(14 cards)

1
Q

What is overfitting?

A
  • occurs when a model learns the training data too well
  • captures noise and random fluctuations, leading to poor generalization on new data
  • might result in the cost function being exactly zero on the training data
  • also called high variance
  • might happen with too many features
2
Q

What is underfitting?

A
  • a model that does not fit the training data well
  • also called high bias
  • unable to capture the underlying pattern in the training data
  • might happen with too few features
3
Q

What is an alternative term for overfitting?

A

high variance,
because overfit models can end up with highly variable predictions

4
Q

What is an alternative term for underfitting?

A

high bias

5
Q

What are approaches to deal with overfitting?

A
  • collecting more data
  • feature selection
  • regularization techniques
6
Q

Describe how collecting more data can help with overfitting

A
  • a larger training set makes the algorithm learn a function that is less wiggly, i.e. has lower variance
7
Q

Describe how feature selection can help with overfitting

A
  • select the features that are most relevant to the target / problem
  • a lot of features + not enough data can lead to overfitting
  • one way to select is to use intuition about what is relevant
  • but the information carried by the discarded features is thrown away
8
Q

Describe how regularization techniques can help with overfitting

A
  • eliminating a feature is equivalent to setting its parameter to 0
  • regularization instead encourages the model to keep all features but shrink their parameters to very small values, like 0.00001
  • lets you keep all your features
  • keeps any single feature from having an overly large effect
  • by convention only the parameters w_j are regularized, not b
  • regularizing b as well should make little difference
9
Q

How does regularization work?

A
  • the cost function is modified to include a penalty for large parameter values, e.g.:
  • min over w,b of (1/2m) * Sum over m of (f_wb(x^(i)) - y^(i))² + 1000·w_3² + 1000·w_4²
  • to minimize this cost function, the model has to choose very small values for w_3 and w_4
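The penalized cost above can be sketched in Python; the data, variable names, and penalty weight are illustrative, not from any particular course implementation:

```python
import numpy as np

# Toy sketch: a squared-error cost with extra penalties of
# 1000 * w3^2 and 1000 * w4^2 (indices 2 and 3) added on top.
def penalized_cost(w, b, X, y, penalty=1000.0):
    m = X.shape[0]
    preds = X @ w + b                         # f_wb(x^(i)) for every example
    fit = np.sum((preds - y) ** 2) / (2 * m)  # usual (1/2m) * sum of squares
    # the penalty makes any nonzero w_3, w_4 expensive to keep
    return fit + penalty * w[2] ** 2 + penalty * w[3] ** 2
```

Because the penalty dwarfs the fitting term, the minimizer is pushed toward w_3 ≈ w_4 ≈ 0.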
10
Q

What is the basic idea of regularization?

A
  • a simpler model is less likely to overfit, so smaller parameter values are valuable
  • typically all w_j parameters are penalized
11
Q

How is regularization usually implemented?

A

J(w,b) = (1/2m) * Sum over m of (f_wb(x^(i)) - y^(i))² + (lambda/2m) * Sum over n of w_j²

n - number of features
m - number of training examples
lambda - regularization parameter, > 0
Scaling both the fitting term and the penalty by 1/2m makes it easier to choose a lambda that keeps working as m changes
Lambda balances the goal of fitting the data against the goal of keeping the w_j small
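This cost function translates directly to code. A minimal sketch, assuming linear regression with a feature matrix X of shape (m, n), weights w of shape (n,), and scalar bias b (names are assumptions):

```python
import numpy as np

# Regularized cost J(w,b) for linear regression (sketch).
def regularized_cost(X, y, w, b, lam):
    m = X.shape[0]
    err = X @ w + b - y                       # f_wb(x^(i)) - y^(i)
    fit = np.sum(err ** 2) / (2 * m)          # data-fitting term
    reg = (lam / (2 * m)) * np.sum(w ** 2)    # penalty over all n weights; b excluded
    return fit + reg
```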

12
Q

What happens when lambda is 0 or extremely large?

A
  • lambda = 0 -> no regularization -> overfitting
  • lambda extremely large (e.g. 10¹⁰) -> all parameters w_j are driven close to 0 -> f(x) is roughly a horizontal line -> underfitting
13
Q

What changes with regularization for linear regression in terms of gradient descent?

A
  • the updates for w and b remain basically the same, except that the derivative dJ/dw_j changes, because J(w,b) now includes the regularization term (lambda/2m) * Sum over n of w_j²

b does not have to be regularized, so its update is unchanged

updated w_j = w_j - alpha * [ (1/m) * Sum over m of (f_wb(x^(i)) - y^(i)) · x_j^(i) + (lambda/m) · w_j ]
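One gradient-descent step with this extra term might be sketched as follows; alpha is the learning rate, lam the regularization parameter (variable names are assumptions):

```python
import numpy as np

# One regularized gradient-descent step for linear regression (sketch).
def gradient_step(X, y, w, b, alpha, lam):
    m = X.shape[0]
    err = X @ w + b - y
    dj_dw = X.T @ err / m + (lam / m) * w     # extra (lambda/m) * w_j term
    dj_db = np.sum(err) / m                   # b is not regularized
    return w - alpha * dj_dw, b - alpha * dj_db
```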

14
Q

What changes with regularization for logistic regression in terms of gradient descent and cost function?

A

the cost function of logistic regression gets the following term added:
+ (lambda/2m) * Sum over n of w_j²

the gradient-descent derivative for w_j gets added:
+ (lambda/m) * w_j
-> similar to regularized linear regression, but f(x) is a different function (the sigmoid)

b will not be regularized
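In code the gradients have the same structure as in regularized linear regression, with f(x) swapped for the sigmoid. A sketch (function and variable names are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Regularized gradients for logistic regression (sketch).
def logistic_gradients(X, y, w, b, lam):
    m = X.shape[0]
    err = sigmoid(X @ w + b) - y              # f(x) is now sigmoid(w·x + b)
    dj_dw = X.T @ err / m + (lam / m) * w     # extra (lambda/m) * w_j term
    dj_db = np.sum(err) / m                   # b is not regularized
    return dj_dw, dj_db
```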
