How are gradient descent and the cost function related?
A
gradient descent tries to minimize the cost function by taking steps along the partial derivatives of J with respect to w and b
the cost function calculates the cost / error of the given parameters, i.e. how large the discrepancy between target values and predictions is
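The relationship can be sketched in code. This is a minimal, hypothetical example for one-variable linear regression (the helper names are mine): compute_cost is the cost function J(w,b), and gradient_step uses its partial derivatives to move w and b downhill.

```python
def compute_cost(x, y, w, b):
    """Cost J(w, b): average squared discrepancy between predictions and targets."""
    m = len(x)
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

def gradient_step(x, y, w, b, alpha):
    """One simultaneous update of w and b using the partial derivatives of J."""
    m = len(x)
    dj_dw = sum((w * xi + b - yi) * xi for xi, yi in zip(x, y)) / m
    dj_db = sum((w * xi + b - yi) for xi, yi in zip(x, y)) / m
    return w - alpha * dj_dw, b - alpha * dj_db

# Toy data lying exactly on y = 2x, so gradient descent should find w ~ 2, b ~ 0.
x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w, b = 0.0, 0.0
for _ in range(1000):
    w, b = gradient_step(x, y, w, b, alpha=0.1)
```

Each step lowers J a little; the cost function only measures the error, while gradient descent is the procedure that reduces it.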
2
Q
How can you make sure that gradient descent is working well?
A
as the goal of gradient descent is to minimize the cost function, one way to double-check that gradient descent is working is to plot the cost function J(w,b) over the iterations of simultaneous updates
J(w,b) on the y-axis, # iterations on the x-axis
the curve should start high and flatten over the iterations -> learning curve
J should decrease after every iteration -> if not, the learning rate is probably too high
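The check above can be sketched as follows. This is a hypothetical one-variable linear-regression example (helper names are mine) that records J(w,b) after every simultaneous update; plotting the returned history with # iterations on the x-axis gives the learning curve.

```python
def compute_cost(x, y, w, b):
    """Cost J(w, b) for one-variable linear regression."""
    m = len(x)
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

def cost_history(x, y, alpha, iters):
    """Run gradient descent and record J(w,b) after each iteration."""
    m = len(x)
    w, b = 0.0, 0.0
    history = []
    for _ in range(iters):
        dj_dw = sum((w * xi + b - yi) * xi for xi, yi in zip(x, y)) / m
        dj_db = sum((w * xi + b - yi) for xi, yi in zip(x, y)) / m
        w, b = w - alpha * dj_dw, b - alpha * dj_db  # simultaneous update
        history.append(compute_cost(x, y, w, b))
    return history

costs = cost_history([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], alpha=0.1, iters=200)
# A healthy learning curve starts high, drops on every iteration, and flattens.
```

If any entry in the history is larger than the one before it, that is the warning sign from the card: the learning rate is probably too high (or there is a bug).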
3
Q
What are considerations when choosing a learning rate for your model?
A
if the learning curve (J(w,b) over iterations) sometimes shows rising cost -> could mean a bug in the code or a learning rate that is too large
debugging tip: with a small enough learning rate alpha, the cost function should decrease on every iteration
try a range of alpha values to bracket it: start with one that is clearly too small, increase until one is clearly too big, then pick a value in between that still converges quickly