why is the way we initialize the weights important
because of the problem of exploding or vanishing gradients. if the per-layer gradient factors are larger than 1, the gradients get amplified as they propagate backward (exploding)
if they are smaller than 1, the gradients shrink toward zero (vanishing) and learning becomes difficult
we want to keep the gradients at a consistent magnitude across the network
when the gradients explode there is still a learning signal, but the magnitudes are so large that it is difficult to differentiate the contributions of the different paths
what is the goal of an initialization?
the general goal is to initialize the weights such that the variance of the activations is the same across every layer. this constant variance helps prevent the gradients from exploding or vanishing
how can we help derive our initializations values to avoid exploding or vanishing gradients?
we can simplify with the following assumptions so that each layer has a similar variance
1. weights and inputs centered at zero
2. weights and inputs are independent and identically distributed
3. biases are initialized as zeros
inputs also factor into these assumptions, so they have to be normalized
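a minimal monte-carlo sketch of why the variance-preserving rule works: under the assumptions above (zero-mean, i.i.d. weights and inputs, zero biases), one linear unit sums fan_in products, so its output variance is fan_in * Var(w) * Var(x). picking Var(w) = 1/fan_in keeps the output variance close to the input variance, while Var(w) = 1 blows it up by a factor of fan_in. the fan_in and sample counts here are arbitrary:

```python
import math
import random

random.seed(0)
fan_in = 256
n_samples = 2000

def layer_output_variance(w_std):
    # simulate one pre-activation: dot product of zero-mean unit-variance
    # inputs with zero-mean weights of standard deviation w_std
    outs = []
    for _ in range(n_samples):
        x = [random.gauss(0, 1) for _ in range(fan_in)]
        w = [random.gauss(0, w_std) for _ in range(fan_in)]
        outs.append(sum(wi * xi for wi, xi in zip(w, x)))
    mean = sum(outs) / len(outs)
    return sum((o - mean) ** 2 for o in outs) / len(outs)

var_good = layer_output_variance(1 / math.sqrt(fan_in))  # Var(w) = 1/fan_in
var_big = layer_output_variance(1.0)                     # Var(w) = 1
```

var_good stays near 1 (the input variance), while var_big is near fan_in = 256, which is what compounds into exploding activations over many layers.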
what are 2 well-known initialization schemes
he and xavier initialization
what is the xavier initialization
designed for the tanh activation function
utilizes the assumption that tanh is approximately linear for small inputs
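a sketch of the xavier (glorot) uniform rule in plain python: the target variance is 2 / (fan_in + fan_out), and since a uniform distribution on [-a, a] has variance a^2 / 3, the bound is a = sqrt(6 / (fan_in + fan_out)). the fan sizes and sample count below are arbitrary:

```python
import math
import random

random.seed(0)

def xavier_uniform(fan_in, fan_out, n):
    # Xavier/Glorot: Var(w) = 2 / (fan_in + fan_out), realized as
    # a uniform draw on [-a, a] with a = sqrt(6 / (fan_in + fan_out))
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return [random.uniform(-a, a) for _ in range(n)]

w = xavier_uniform(256, 256, 10000)
empirical_var = sum(wi * wi for wi in w) / len(w)  # weights are zero-mean
```

the empirical variance comes out close to 2 / (256 + 256), the value the derivation (under the near-linear tanh assumption) asks for.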
what is the he initialization
designed for the ReLU activation function
also known as kaiming initialization
used by default by PyTorch for linear layers
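a sketch of the he (kaiming) normal rule: because ReLU zeroes out roughly half of the activations, the weight variance is doubled to 2 / fan_in to compensate. the fan_in and sample count are arbitrary:

```python
import math
import random

random.seed(0)

def he_normal(fan_in, n):
    # He/Kaiming: Var(w) = 2 / fan_in; the extra factor of 2 makes up
    # for ReLU killing about half of the pre-activations
    std = math.sqrt(2.0 / fan_in)
    return [random.gauss(0, std) for _ in range(n)]

w = he_normal(fan_in=512, n=10000)
empirical_var = sum(wi * wi for wi in w) / len(w)  # weights are zero-mean
```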
what is overfitting
when we have high variance
also when train performance is significantly better than test performance
validation loss increasing after a certain point
what is underfitting
high bias
train and test loss nearly identical
high loss, no signs of decreasing
how to combat overfitting
increase the amount of data (more examples per epoch means more update steps per epoch and better coverage of the input distribution)
utilize data augmentation methods
decrease model size/complexity (fewer hidden layers)
implement early stopping (stop training at the point where validation loss was lowest)
utilize regularization techniques
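the early-stopping idea in the list above can be sketched in plain python; the validation-loss curve and patience value here are made up for illustration (the curve dips and then rises again, the typical overfitting signature):

```python
# synthetic validation losses: decreasing, then rising as overfitting sets in
val_losses = [1.0, 0.7, 0.5, 0.42, 0.40, 0.41, 0.45, 0.52, 0.60]

def early_stop(losses, patience=2):
    # track the best loss seen; stop once it hasn't improved for
    # `patience` consecutive epochs and report the best epoch
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best

stop_epoch, stop_loss = early_stop(val_losses)
```

on this curve training halts shortly after epoch 4, where the validation loss bottomed out at 0.40, instead of continuing into the region where it climbs back up.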
how to combat underfitting
increase model size and complexity
train for a longer period of time
reduce regularization techniques
what is regularization, in simple terms?
Regularization = “make overfitting harder.” It’s any trick that nudges a model to generalize better rather than memorize the training set.
what is L1 and L2 regularization
they add a penalty on the weights to the loss, which shrinks the weights during training
L1 generally pushes many values to 0 (sparser weights)
L2 generally more heavily impacts larger values and doesn’t force many values to 0
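a minimal sketch of the two penalties in plain python (the weights and the strength lam are illustrative). the comments note why they behave differently: the L1 gradient has constant magnitude, so small weights get pushed all the way to exactly zero, while the L2 gradient is proportional to the weight, so big weights shrink hardest but rarely hit zero:

```python
def l1_penalty(weights, lam):
    # L1: lam * sum(|w|); gradient lam * sign(w) -> sparsity
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    # L2: lam * sum(w^2); gradient 2 * lam * w -> shrinks large weights most
    return lam * sum(w * w for w in weights)

weights = [3.0, -0.5, 0.0, 1.5]
p1 = l1_penalty(weights, 0.01)  # 0.01 * (3 + 0.5 + 0 + 1.5) = 0.05
p2 = l2_penalty(weights, 0.01)  # 0.01 * (9 + 0.25 + 0 + 2.25) = 0.115
```

in practice the chosen penalty is simply added to the training loss before backpropagation.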
what is the other name of L2 regularization
weight decay
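a one-line sketch of why L2 is called weight decay: folding the gradient of the L2 term into an SGD step multiplicatively decays each weight toward zero. the lr and wd values below are illustrative:

```python
def sgd_step(w, grad, lr=0.1, wd=0.01):
    # gradient of (loss + (wd/2) * w^2) is grad + wd * w,
    # so each step shrinks ("decays") the weight toward 0
    return w - lr * (grad + wd * w)

# with a zero loss gradient, the weight just decays:
w_new = sgd_step(w=2.0, grad=0.0)  # 2.0 - 0.1 * (0.01 * 2.0) = 1.998
```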
how does dropout work
with some specified p, drop any given neuron of the neural network with probability p during a training iteration
do this for each neuron independently
avoids single neurons becoming overly important and helps equalize capacity/power of each neuron
in which stage of development of the neural network we apply dropout
only during training, not during testing
during testing all neurons are turned on and dropout is disabled
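the train/test behavior above can be sketched as inverted dropout in plain python: at train time each unit is dropped independently with probability p and the survivors are scaled by 1/(1-p) so the expected activation is unchanged, which is why nothing needs rescaling at test time. the layer size and p are illustrative:

```python
import random

random.seed(0)

def dropout(activations, p, training):
    # inverted dropout: drop each unit independently with probability p
    # during training and scale survivors by 1/(1-p); at test time the
    # function is the identity (all neurons stay on)
    if not training:
        return list(activations)
    return [0.0 if random.random() < p else a / (1 - p) for a in activations]

acts = [1.0] * 10000
train_out = dropout(acts, p=0.5, training=True)
test_out = dropout(acts, p=0.5, training=False)

kept = sum(1 for a in train_out if a != 0.0)
mean_train = sum(train_out) / len(train_out)
```

roughly half the units survive training, the surviving ones carry value 2.0 instead of 1.0, and the mean activation stays near 1.0 in both modes.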
what is the main benefit of dropout
it mainly helps with overfitting by making the model generalize better. it doesn't improve the model's capacity; rather, it focuses on the generalizability of the model