Gradient Descent - Definition
= iterative optimization algorithm that minimizes a cost function to find optimal values of the parameters (weights) in a machine learning model, by repeatedly stepping in the direction of the negative gradient
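The definition above can be sketched as a minimal loop on a toy cost function; the cost J(w) = (w - 3)^2, the learning rate, and the step count here are illustrative assumptions, not from the notes.

```python
# Minimal gradient descent sketch on a toy cost J(w) = (w - 3)^2.

def grad(w):
    # dJ/dw for J(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

def gradient_descent(w0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)  # step against the gradient
    return w

w_opt = gradient_descent(w0=0.0)
print(round(w_opt, 4))  # converges near the minimum at w = 3
```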
Gradient Descent - Steps
1. initialize parameters (e.g., randomly), 2. compute gradient of cost function w.r.t. parameters, 3. update parameters in opposite direction of gradient (scaled by step size), 4. repeat until convergence
Gradient Descent - Step Size
= learning rate: too small -> slow convergence; too large -> overshooting / divergence
Gradient Descent - Con
computationally expensive: every update needs the gradient over all n data points; can get stuck in local minima
Gradient
= vector of partial derivatives of a function with more than one input variable; points in the direction of steepest ascent
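The "vector of partial derivatives" idea can be illustrated with a finite-difference sketch; the example function f(x, y) = x^2 + 3y and the step size h are illustrative assumptions.

```python
# Numerical gradient of a multivariate function via central differences —
# one partial derivative per input variable, collected into a vector.

def numerical_gradient(f, x, h=1e-6):
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h   # nudge coordinate i up
        xm = list(x); xm[i] -= h   # nudge coordinate i down
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

# f(x, y) = x^2 + 3y has gradient (2x, 3)
f = lambda v: v[0] ** 2 + 3 * v[1]
print(numerical_gradient(f, [2.0, 5.0]))  # approximately [4.0, 3.0]
```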
Stochastic GD
= solution to reduce computational cost: randomly picks one data point at each iteration, so each update uses n = 1 instead of all n data points
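The one-random-data-point update can be sketched on a toy linear model; the data (y = 2x with no noise), learning rate, and step count are illustrative assumptions.

```python
import random

# SGD sketch: fit y = w * x using ONE randomly picked data point
# per update instead of the full dataset.

def sgd(data, lr=0.01, steps=2000, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        x, y = rng.choice(data)          # n = 1: single random sample
        grad = 2 * (w * x - y) * x       # gradient of (w*x - y)^2
        w -= lr * grad
    return w

data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]  # true w = 2
print(sgd(data))
```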
Batch & Mini Batch GD
Batch GD = update with gradient over all n data points; Mini-Batch GD = update with gradient over a small random subset (compromise between Batch & Stochastic GD)
Stochastic GD - Pro
much cheaper per update; noisy updates can help escape local minima
Stochastic GD - Con
noisy updates -> loss fluctuates & may not converge exactly to the minimum
Regularization
Methods to prevent overfitting
Regularization - Methods that minimize loss function + penalty
Regularization - Methods that minimize loss function + penalty - Techniques
L0, L1 (Lasso) & L2 (Ridge) regularization
L0 regularization
penalty adds a fixed amount lambda for each weight w ≠ 0 (i.e., counts non-zero weights)
L1 Regularization (Lasso)
penalty = lambda * sum of |w| -> drives some weights exactly to zero (feature selection)
L2 Regularization (Ridge)
penalty = lambda * sum of w^2 -> shrinks weights toward zero, but rarely exactly zero
L2 & L1 Regularization - Similarity
both add a penalty on weight magnitude to the loss -> shrink weights & reduce overfitting
L2 & L1 Regularization - Difference
L1 -> sparse solutions (some weights exactly 0, feature selection); L2 -> shrinks all weights smoothly, none exactly 0
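The two penalty terms above can be written as a short sketch; the example weights and the strength lam (lambda) are illustrative assumptions.

```python
# Sketch of the L1 (Lasso) and L2 (Ridge) penalty terms added to a loss.

def l1_penalty(weights, lam=0.1):
    # lambda * sum of |w|
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam=0.1):
    # lambda * sum of w^2
    return lam * sum(w * w for w in weights)

w = [0.5, -1.5, 0.0, 2.0]
print(l1_penalty(w))  # 0.1 * (0.5 + 1.5 + 0 + 2)    = 0.4
print(l2_penalty(w))  # 0.1 * (0.25 + 2.25 + 0 + 4)  = 0.65
```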
Early Stopping
= technique to prevent overfitting by monitoring the model's performance on a validation set during training & stopping at the lowest validation error (training error could still go down)
Early Stopping - Method
evaluate on validation set after each epoch; keep checkpoint with lowest validation error; stop when validation error has not improved for a set number of epochs (patience)
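A minimal sketch of the stopping rule, assuming a patience counter over a per-epoch validation-error sequence (the error values are fake illustrative data):

```python
# Early-stopping sketch: stop once validation error has not improved
# for `patience` consecutive epochs; report the best epoch seen.

def early_stop_index(val_errors, patience=2):
    best, best_i, waited = float("inf"), 0, 0
    for i, err in enumerate(val_errors):
        if err < best:
            best, best_i, waited = err, i, 0  # new best checkpoint
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` epochs -> stop
    return best_i

errors = [0.9, 0.7, 0.6, 0.65, 0.7, 0.5]  # starts rising after epoch 2
print(early_stop_index(errors))  # stops early, returns 2
```

Note the trade-off this illustrates: the run stops before ever seeing the later 0.5, which is why patience is a tunable hyperparameter.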
Dropout
= technique to prevent overfitting by randomly dropping out (i.e., setting to zero) a proportion of the output features or activations of a layer during training; mainly used for deep neural networks
Dropout - Method
during training: set each unit's activation to 0 with probability p; at test time: use all units (scale activations by 1-p at test time, or by 1/(1-p) during training = inverted dropout)
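The zero-with-probability-p rule can be sketched as follows, using the inverted-dropout variant (survivors scaled by 1/(1-p) during training so test time needs no rescaling); p, the seed, and the activations are illustrative assumptions.

```python
import random

# Inverted-dropout sketch: each activation is zeroed with probability p
# during training; surviving activations are scaled by 1/(1-p).

def dropout(activations, p=0.5, training=True, seed=0):
    if not training:
        return list(activations)  # test time: use all units unchanged
    rng = random.Random(seed)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

acts = [1.0, 2.0, 3.0, 4.0]
print(dropout(acts, p=0.5))          # some entries zeroed, survivors doubled
print(dropout(acts, training=False)) # unchanged at test time
```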
Dropout - Time-Effort-trade-off
well-tuned dropout -> slower convergence (more training time) but a better-generalizing model
Batch Norm
= technique to mitigate the negative effects of internal covariate shift by stabilizing the distribution of inputs to each layer during training: normalize over a mini-batch (zero mean, unit variance), then scale & shift with learned parameters
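The per-mini-batch normalization can be sketched for a single feature; gamma and beta are the learnable scale/shift parameters, fixed here to 1 and 0 for illustration, and eps is the usual small constant for numerical stability.

```python
# Batch-norm sketch: normalize one feature over a mini-batch to zero
# mean and unit variance, then apply scale (gamma) and shift (beta).

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in batch]

out = batch_norm([1.0, 2.0, 3.0, 4.0])
print(out)  # mean ~ 0, variance ~ 1 across the mini-batch
```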