Prove
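Crucially, we add a zero row to the bottom of the new W_i and a single zero to the end of the new b_i, BUT the new W_{i+1} gets a zero column on the right instead. This is because the dimension that grows (the width n_i of layer i, with h_i = φ(W_i h_{i-1} + b_i)) is the output (row) dimension of W_i but the input (column) dimension of W_{i+1}, i.e. it appears transposed.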
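A minimal NumPy sketch of this widening step, assuming the convention h_i = φ(W_i h_{i-1} + b_i) so that W_i has shape (n_i, n_{i-1}). The helper names `widen_layer` and `forward` and the choice of tanh as φ are illustrative assumptions, not from the source. It checks that the widened layer computes the same output as the original.

```python
import numpy as np

def widen_layer(W_i, b_i, W_next):
    # Shapes (assumed convention): W_i (n_i, n_{i-1}), b_i (n_i,), W_next (n_{i+1}, n_i)
    W_i_new = np.vstack([W_i, np.zeros((1, W_i.shape[1]))])          # zero row at the bottom
    b_i_new = np.append(b_i, 0.0)                                    # one extra zero bias
    W_next_new = np.hstack([W_next, np.zeros((W_next.shape[0], 1))]) # zero column on the right
    return W_i_new, b_i_new, W_next_new

def forward(x, W_i, b_i, W_next, b_next):
    # Two-layer forward pass; tanh is a hypothetical choice of activation
    h = np.tanh(W_i @ x + b_i)
    return W_next @ h + b_next

rng = np.random.default_rng(0)
W_i, b_i = rng.normal(size=(4, 3)), rng.normal(size=4)
W_next, b_next = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

y_old = forward(x, W_i, b_i, W_next, b_next)
W_i2, b_i2, W_next2 = widen_layer(W_i, b_i, W_next)
y_new = forward(x, W_i2, b_i2, W_next2, b_next)
print(W_i2.shape, W_next2.shape, np.allclose(y_old, y_new))
```

The new hidden unit's output is multiplied by the zero column of W_{i+1}, so the network's output is unchanged.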
What effect does regularisation have on the bias-variance trade-off?
Slightly increases bias
But significantly decreases variance, so the net effect (for a well-chosen regularisation strength) is improved generalisation
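A small simulation sketch of this effect, assuming a linear ground truth, a fixed training design, and ridge regression (L2-regularised least squares) as the regulariser. The function names, sizes, noise level and λ values are arbitrary illustrative choices. It refits the model on many noisy resamples of the targets and measures the squared bias and the variance of its test predictions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form L2-regularised least squares: w = (X^T X + lam * I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def bias_variance(lam, n_trials=500, n=30, d=10, noise=1.0, seed=0):
    # Same seed for every lam, so all settings see identical inputs and noise draws
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))            # fixed training inputs (assumption)
    w_true = rng.normal(size=d)            # assumed linear ground truth
    X_test = rng.normal(size=(200, d))
    f_test = X_test @ w_true               # noise-free test targets
    preds = np.empty((n_trials, X_test.shape[0]))
    for t in range(n_trials):
        y = X @ w_true + noise * rng.normal(size=n)   # fresh label noise each trial
        preds[t] = X_test @ ridge_fit(X, y, lam)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - f_test) ** 2)      # squared bias, averaged over test points
    variance = np.mean(preds.var(axis=0))             # prediction variance, averaged over test points
    return bias_sq, variance

bias0, var0 = bias_variance(lam=0.0)   # unregularised least squares
bias5, var5 = bias_variance(lam=5.0)   # L2-regularised (ridge)
print(f"lam=0: bias^2={bias0:.3f}, variance={var0:.3f}")
print(f"lam=5: bias^2={bias5:.3f}, variance={var5:.3f}")
```

How much the bias rises and the variance falls depends on λ; "slightly" and "significantly" describe a well-chosen λ, not every λ.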
L2-regularise this, then take its gradient
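The expression "this" refers to isn't given here, so the following sketch uses a hypothetical base loss L(w), the mean squared error of a linear model. Adding the L2 penalty (λ/2)||w||² gives J(w) = L(w) + (λ/2)||w||², whose gradient is ∇J = ∇L + λw (the 1/2 cancels the 2 from differentiating ||w||²). The code checks this analytic gradient against central finite differences. All function names are illustrative.

```python
import numpy as np

def mse_loss(w, X, y):
    # Hypothetical base loss L(w) = (1 / 2n) * sum((Xw - y)^2)
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)

def mse_grad(w, X, y):
    # grad L = (1 / n) * X^T (Xw - y)
    return X.T @ (X @ w - y) / len(y)

def l2_loss(w, X, y, lam):
    # J(w) = L(w) + (lam / 2) * ||w||^2
    return mse_loss(w, X, y) + 0.5 * lam * (w @ w)

def l2_grad(w, X, y, lam):
    # grad J = grad L + lam * w
    return mse_grad(w, X, y) + lam * w

def numerical_grad(f, w, eps=1e-6):
    # Central differences, one coordinate at a time
    g = np.zeros_like(w)
    for k in range(len(w)):
        e = np.zeros_like(w)
        e[k] = eps
        g[k] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

rng = np.random.default_rng(1)
X, y, w, lam = rng.normal(size=(20, 3)), rng.normal(size=20), rng.normal(size=3), 0.1
analytic = l2_grad(w, X, y, lam)
numeric = numerical_grad(lambda v: l2_loss(v, X, y, lam), w)
print(np.allclose(analytic, numeric, atol=1e-5))
```

Plugging this into a gradient-descent step with learning rate η gives w ← w − η(∇L + λw) = (1 − ηλ)w − η∇L, which is why L2 regularisation is often described as weight decay.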