Gradient descent with momentum
b is the momentum coefficient, between 0 and 1
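
The momentum update can be sketched as follows. This is a minimal illustration, assuming one common convention (velocity v <- b * v + gradient, then w <- w - lr * v); the text does not state its exact formula, and the names `momentum_gd`, `lr`, and `quad_grad` are hypothetical. It uses plain Python lists so that it runs without extra packages.

```python
def momentum_gd(grad, w0, lr=0.1, b=0.9, steps=500):
    """Gradient descent with momentum on a list of floats.

    Assumed update (one common convention, not confirmed by the text):
        v <- b * v + grad(w)
        w <- w - lr * v
    """
    w = list(w0)
    v = [0.0] * len(w)  # velocity starts at zero
    for _ in range(steps):
        g = grad(w)
        v = [b * vi + gi for vi, gi in zip(v, g)]
        w = [wi - lr * vi for wi, vi in zip(w, v)]
    return w


# Illustrative objective: f(w) = (w[0] - 3)^2 + 2 * (w[1] + 1)^2,
# which is minimised at (3, -1).
def quad_grad(w):
    return [2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)]


w_momentum = momentum_gd(quad_grad, [0.0, 0.0])
```

With b = 0 the loop reduces to plain gradient descent; values of b closer to 1 carry more of the previous step forward.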
Gradient descent with Adagrad
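
A minimal sketch of an Adagrad step, assuming the usual form in which each coordinate's step is divided by the square root of its accumulated squared gradients. The function name `adagrad`, the parameters `lr` and `eps`, and the test objective are assumptions made for illustration, not taken from the text.

```python
import math


def adagrad(grad, w0, lr=1.0, eps=1e-8, steps=500):
    """Adagrad on a list of floats.

    Assumed update (standard form, not confirmed by the text):
        G <- G + g^2                      (per coordinate)
        w <- w - lr * g / (sqrt(G) + eps)
    """
    w = list(w0)
    G = [0.0] * len(w)  # running sum of squared gradients
    for _ in range(steps):
        g = grad(w)
        G = [Gi + gi * gi for Gi, gi in zip(G, g)]
        w = [wi - lr * gi / (math.sqrt(Gi) + eps)
             for wi, gi, Gi in zip(w, g, G)]
    return w


# Illustrative objective: f(w) = (w[0] - 3)^2 + 2 * (w[1] + 1)^2.
def quad_grad(w):
    return [2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)]


w_adagrad = adagrad(quad_grad, [0.0, 0.0])
```

Because G only grows, the effective step size of each coordinate can only shrink over time.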
Gradient descent with RMSProp
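
A minimal sketch of RMSProp, assuming the usual form in which Adagrad's running sum is replaced by an exponential moving average of squared gradients. The names `rmsprop`, `rho`, `lr`, and `eps` are hypothetical, and the default values are chosen only for this example.

```python
import math


def rmsprop(grad, w0, lr=0.01, rho=0.9, eps=1e-8, steps=2000):
    """RMSProp on a list of floats.

    Assumed update (standard form, not confirmed by the text):
        s <- rho * s + (1 - rho) * g^2    (per coordinate)
        w <- w - lr * g / (sqrt(s) + eps)
    """
    w = list(w0)
    s = [0.0] * len(w)  # moving average of squared gradients
    for _ in range(steps):
        g = grad(w)
        s = [rho * si + (1.0 - rho) * gi * gi for si, gi in zip(s, g)]
        w = [wi - lr * gi / (math.sqrt(si) + eps)
             for wi, gi, si in zip(w, g, s)]
    return w


# Illustrative objective: f(w) = (w[0] - 3)^2 + 2 * (w[1] + 1)^2.
def quad_grad(w):
    return [2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)]


w_rmsprop = rmsprop(quad_grad, [0.0, 0.0])
```

With a constant lr the iterate tends to hover within a small neighbourhood of the minimum rather than settle exactly on it, because the normalised step does not shrink with the gradient.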
Gradient descent with Adam
t is the iteration index
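
A minimal sketch of Adam, assuming the standard bias-corrected form, in which the iteration index t (starting at 1) appears in the correction terms 1 - b1^t and 1 - b2^t. The names `adam`, `b1`, `b2`, `lr`, and `eps` are assumptions made for illustration; the text does not give its own formula.

```python
import math


def adam(grad, w0, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=2000):
    """Adam on a list of floats, with bias correction (assumed standard form)."""
    w = list(w0)
    m = [0.0] * len(w)  # moving average of gradients (first moment)
    v = [0.0] * len(w)  # moving average of squared gradients (second moment)
    for t in range(1, steps + 1):  # t is the iteration index, starting at 1
        g = grad(w)
        m = [b1 * mi + (1.0 - b1) * gi for mi, gi in zip(m, g)]
        v = [b2 * vi + (1.0 - b2) * gi * gi for vi, gi in zip(v, g)]
        # Bias correction compensates for m and v being initialised at zero.
        m_hat = [mi / (1.0 - b1 ** t) for mi in m]
        v_hat = [vi / (1.0 - b2 ** t) for vi in v]
        w = [wi - lr * mh / (math.sqrt(vh) + eps)
             for wi, mh, vh in zip(w, m_hat, v_hat)]
    return w


# Illustrative objective: f(w) = (w[0] - 3)^2 + 2 * (w[1] + 1)^2.
def quad_grad(w):
    return [2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)]


w_adam = adam(quad_grad, [0.0, 0.0])
```

Roughly speaking, the first moment plays the role of the momentum term and the second moment plays the role of the RMSProp scaling.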
Hessian of squared loss
Point-specific Hessian of squared loss
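
As a small illustration of these two Hessians, assume a linear model with per-point squared loss 0.5 * (w . x - y)^2; the text does not state the loss or its scaling, so this is an assumption. Under it, the Hessian contributed by a single point is the outer product x x^T, which depends on neither w nor y, and the Hessian of the summed loss is X^T X. The helper names `point_hessian` and `full_hessian` are hypothetical.

```python
def point_hessian(x):
    """Hessian of 0.5 * (w . x - y)^2 with respect to w: the outer product x x^T."""
    return [[xi * xj for xj in x] for xi in x]


def full_hessian(X):
    """Hessian of the summed squared loss: the sum of the per-point Hessians."""
    d = len(X[0])
    H = [[0.0] * d for _ in range(d)]
    for x in X:
        Hx = point_hessian(x)
        for i in range(d):
            for j in range(d):
                H[i][j] += Hx[i][j]
    return H


# Three illustrative data points with two features each.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
H_point = point_hessian(X[0])  # [[1.0, 2.0], [2.0, 4.0]]
H_full = full_hessian(X)       # [[35.0, 44.0], [44.0, 56.0]]
```

If the loss is defined without the factor 0.5, or averaged over the n points, the same matrices are scaled by 2 or by 1/n respectively.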
Linear, SVM and logistic models with and without line search
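
The text does not say which line search is used. As one possible sketch, the code below uses backtracking with the Armijo sufficient-decrease condition, a common choice; the names `backtracking_step` and `gd_line_search`, the constants `shrink` and `c`, and the quadratic test objective are all assumptions made for illustration.

```python
def backtracking_step(f, grad, w, t0=1.0, shrink=0.5, c=1e-4):
    """One gradient step whose size is chosen by backtracking (Armijo) line search."""
    g = grad(w)
    g_sq = sum(gi * gi for gi in g)
    fw = f(w)
    t = t0
    while True:
        w_new = [wi - t * gi for wi, gi in zip(w, g)]
        # Accept once the loss has decreased enough; the floor on t
        # guards against an endless loop.
        if f(w_new) <= fw - c * t * g_sq or t < 1e-12:
            return w_new
        t *= shrink  # otherwise shrink the step and try again


def gd_line_search(f, grad, w0, steps=100):
    w = list(w0)
    for _ in range(steps):
        w = backtracking_step(f, grad, w)
    return w


# Illustrative objective: f(w) = (w[0] - 3)^2 + 2 * (w[1] + 1)^2.
def quad(w):
    return (w[0] - 3.0) ** 2 + 2.0 * (w[1] + 1.0) ** 2


def quad_grad(w):
    return [2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)]


w_ls = gd_line_search(quad, quad_grad, [0.0, 0.0])
```

Without line search the step size is fixed in advance; with it, each iteration pays for a few extra loss evaluations in exchange for not having to tune that step size by hand. The same wrapper could in principle be applied to the loss of a linear, SVM, or logistic model, provided `f` and `grad` are supplied for it.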