Even with Xavier or Kaiming initialization, it can happen by chance that the weights of a neural network are initialized in such a way that the network is unable to learn anything useful.
True
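A contrived PyTorch sketch of one such failure mode (the sizes and the all-negative initialization are made up for illustration, not the card's example): if every ReLU unit starts dead, all gradients are zero and nothing can be learned.

```python
import torch

torch.manual_seed(0)
x = torch.rand(32, 10)                     # inputs in [0, 1)
w = (-torch.rand(10, 5)).requires_grad_()  # every weight is negative
b = torch.zeros(5, requires_grad=True)

out = torch.relu(x @ w + b)                # all pre-activations < 0, so all outputs are 0
loss = ((out - torch.ones(32, 5)) ** 2).mean()
loss.backward()
print(out.abs().sum())     # tensor(0.)
print(w.grad.abs().sum())  # tensor(0.): zero gradient, the network cannot learn
```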
If a pre-trained model is used and no new weights are added, we do not need Xavier or Kaiming initialization at all.
True
It is sufficient for the mean and variance of the distribution of output values to average out to zero and one, respectively, across multiple initializations. In individual cases, these values may deviate.
True
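A minimal PyTorch sketch of this (layer sizes and the 1/√n_input scaling are illustrative): averaged over 100 random initializations, the output mean and std land near 0 and 1, even though any single run deviates.

```python
import torch

n_in, n_out = 100, 100
x = torch.randn(512, n_in)

means, stds = [], []
for _ in range(100):
    w = torch.randn(n_in, n_out) / n_in ** 0.5  # one fresh random initialization
    out = x @ w
    means.append(out.mean())
    stds.append(out.std())

print(torch.stack(means).mean())  # ≈ 0 on average across initializations
print(torch.stack(stds).mean())   # ≈ 1 on average across initializations
```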
Which tensors can be added to each other?
Tensors with the same shape; how the shape tuple is written (commas or spaces) is irrelevant.
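A minimal PyTorch sketch (the values are arbitrary); strictly, PyTorch would also broadcast compatible shapes, which the card glosses over.

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])      # shape (2, 2)
b = torch.tensor([[10., 20.], [30., 40.]])  # shape (2, 2): same shape
print(a + b)                                # elementwise sum, shape (2, 2)

c = torch.tensor([1., 2., 3.])              # shape (3,)
# a + c  # RuntimeError: the shapes are not compatible
```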
All standard weight operations can be expressed as matrix multiplications, which is what makes neural network operations so efficient when executed on GPUs.
True
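A minimal PyTorch sketch (the batch and layer sizes are made up): a fully connected layer applied to a whole batch is a single matrix multiplication plus a bias.

```python
import torch

batch, n_in, n_out = 64, 784, 10
x = torch.randn(batch, n_in)
w = torch.randn(n_in, n_out)
b = torch.zeros(n_out)

out = x @ w + b   # one matmul covers all 64 samples at once
print(out.shape)  # torch.Size([64, 10])
```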
Gradient of bias
2⋅(out−target)⋅1/n (the derivative of an MSE loss averaged over n elements; ∂out/∂bias = 1)
Gradient of weight
gradient of bias ⋅ input
Gradient of input
gradient of bias ⋅ weight
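A minimal sketch checking the three formulas above against PyTorch autograd, assuming an elementwise linear model out = input ⋅ weight + bias with an MSE loss (the sizes are arbitrary).

```python
import torch

torch.manual_seed(0)
n = 5
inp = torch.randn(n, requires_grad=True)
w = torch.randn(n, requires_grad=True)
b = torch.randn(n, requires_grad=True)
target = torch.randn(n)

out = inp * w + b                    # elementwise linear model
loss = ((out - target) ** 2).mean()  # MSE loss
loss.backward()

grad_out = 2 * (out.detach() - target) / n              # 2⋅(out−target)⋅1/n
print(torch.allclose(b.grad, grad_out))                 # gradient of bias
print(torch.allclose(w.grad, grad_out * inp.detach()))  # ... ⋅ input
print(torch.allclose(inp.grad, grad_out * w.detach()))  # ... ⋅ weight
```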
Elementwise arithmetic for tensors
Operations (+, −, ⋅, /) applied independently to each pair of corresponding elements; the tensors must have the same shape, and the result has that shape too.
Kaiming Initialization
When using a ReLU activation, scaling the weights by √(2/n_input) preserves the standard deviation of the activations.
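A minimal PyTorch sketch (depth and width are illustrative): the √(2/n_input) factor keeps activations from collapsing toward zero across many ReLU layers.

```python
import torch

torch.manual_seed(0)
n = 512
x = torch.randn(1000, n)
for _ in range(50):
    w = torch.randn(n, n) * (2 / n) ** 0.5  # Kaiming scaling
    x = torch.relu(x @ w)
print(x.mean(), x.std())  # still on the order of 1 after 50 ReLU layers
```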
Xavier Initialization
Scaling the weights by 1/√(n_input) preserves the standard deviation through a linear layer without a ReLU activation.
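The same experiment without ReLU (again a PyTorch sketch with made-up sizes): the 1/√(n_input) factor keeps the standard deviation stable through purely linear layers.

```python
import torch

torch.manual_seed(0)
n = 512
x = torch.randn(1000, n)
for _ in range(50):
    w = torch.randn(n, n) / n ** 0.5  # Xavier scaling
    x = x @ w
print(x.std())  # stays on the order of 1; any single run drifts a bit
```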
What is calculated during backpropagation?
Gradients
What do gradients show?
Gradients indicate how the network should adjust its parameters to reduce the loss; they do not directly measure the quality of the network.
What is the forward pass?
The process of passing input data through the layers of a neural network to produce an output (e.g., predictions or logits).
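A minimal PyTorch sketch (the architecture is made up) of a forward pass producing logits.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
x = torch.randn(32, 784)  # a batch of 32 inputs
logits = model(x)         # the forward pass
print(logits.shape)       # torch.Size([32, 10])
```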
What is backpropagation?
The process of computing gradients of the loss function with respect to the network's parameters (weights and biases) using the chain rule of calculus.
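A minimal PyTorch sketch (architecture and labels are made up): backward() fills every parameter's .grad with the gradient of the loss with respect to that parameter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
x = torch.randn(32, 784)
y = torch.randint(0, 10, (32,))

loss = F.cross_entropy(model(x), y)  # forward pass and loss
loss.backward()                      # backpropagation via the chain rule
print(model[0].weight.grad.shape)    # same shape as the weight tensor itself
```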