Which optimizations can you do prior to training?
Input normalization and weight initialization (e.g. Xavier/Glorot).
Which optimizations can you do during training?
Dropout, batch normalization, and a variable learning rate (annealing).
Which optimizations can you do when computing the loss?
Weighted examples, focal loss, triplet loss, or combining multiple loss functions.
How can we optimize the training procedure (while searching for the best solution)?
By using a variable learning rate (annealing).
What is input normalization?
A prior-to-training optimization: rescale the inputs (e.g. to zero mean and unit variance) so that all features are on a comparable scale.
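A minimal sketch of input normalization (the function name and NumPy usage are my own):

```python
import numpy as np

def normalize_inputs(X, eps=1e-8):
    """Standardize each feature to zero mean and unit variance,
    using statistics computed over the training set."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / (sigma + eps), mu, sigma
```

The same mu/sigma from the training set must be reused on validation and test data.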
To what problem is the Xavier/Glorot initialization a solution?
When initializing the weights of the network, the common practice was to initialize randomly from a normal distribution.
The problem: the variance of a unit's pre-activation, var(z), grows with the number of inputs n, which can saturate the activations.
What was the Xavier/Glorot solution?
Make the weights smaller so that var(w_i) = 1/n, which keeps var(z) ≈ 1.
Therefore:
weight_i = weight_i × sqrt(1/n)
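The scaling above can be sketched as (helper name is my own; assuming inputs with unit variance):

```python
import numpy as np

def xavier_init(n_in, n_out, rng=None):
    """Draw weights from N(0, 1/n_in), i.e. the standard normal draw
    scaled by sqrt(1/n_in), so that var(z) = var(sum_i w_i x_i)
    stays near 1 for unit-variance inputs."""
    if rng is None:
        rng = np.random.default_rng()
    return rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_in, n_out))
```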
What problem does using dropout tackle?
It decreases the dependence on any single feature: by randomly dropping units during training, the network cannot co-adapt to specific features.
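A sketch of (inverted) dropout, one common variant (function name and the rescaling convention are my own choices):

```python
import numpy as np

def dropout(a, p_drop=0.5, rng=None, train=True):
    """Inverted dropout: randomly zero activations during training and
    rescale the survivors so the expected activation is unchanged.
    At test time the activations pass through untouched."""
    if not train:
        return a
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(a.shape) >= p_drop
    return a * mask / (1.0 - p_drop)
```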
What is batch normalization?
It is a during-training optimization technique.
–> normalize internal activations using dataset statistics
–> with stochastic optimization, use batch-level statistics
What problem does batch normalization tackle?
During training, weight updates at a later layer should take into account changes at earlier layers (internal covariate shift)
-> updates introduce changes in the distribution of internal activations
-> this requires careful initialization and a small learning rate
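A sketch of the training-mode forward pass of batch normalization (names are my own; the running statistics used at inference time are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature with batch statistics, then apply the
    learned scale (gamma) and shift (beta)."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```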
What are the benefits of using batch normalization?
Internal activations keep a stable distribution, so training becomes less sensitive to initialization and tolerates a larger learning rate; the noise from batch statistics also has a slight regularizing effect.
What could be a potential problem/weakness with the gradient descent as how we have seen it so far? And how do we tackle it?
If 80% of the examples belong to one class, the model mainly learns the important features of that class, because the weight updates are dominated by the majority class.
Tackle this by weighting the examples, e.g. giving minority-class examples a larger weight in the loss.
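One way to weight examples, sketched with binary cross-entropy (the function name and the per-class weighting scheme are my own illustration):

```python
import numpy as np

def weighted_bce(p, y, class_weights):
    """Binary cross-entropy where each example is weighted by its class,
    so minority-class examples contribute more per example.
    class_weights maps label (0 or 1) to a weight."""
    w = np.where(y == 1, class_weights[1], class_weights[0])
    eps = 1e-12
    losses = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return np.mean(w * losses)
```

Choosing weights inversely proportional to class frequency is a common heuristic.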
What does Focal Loss do?
It down-weights the loss of well-classified (easy) examples, so that training focuses on the hard ones.
What problem does focal loss tackle?
When the dataset is balanced (e.g. 50-50) but one class has features that are harder to learn (more detail, finer distinctions).
Where could focal loss be useful?
How does focal loss work?
It adds an extra factor that increases the relative loss for examples that are harder to classify, thereby forcing the model to train on those examples.
It is based on the probability the model assigns to the correct label: the higher that probability, the more the loss is down-weighted. For very uncertain examples the loss stays high, so the model is pushed toward them.
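This behaviour follows the usual focal form FL = -(1 - p_t)^γ · log(p_t), sketched below (the function name is my own):

```python
import numpy as np

def focal_loss(p_t, gamma=2.0):
    """Focal loss for the probability p_t assigned to the true class.
    The (1 - p_t)^gamma factor shrinks toward 0 for confident (easy)
    examples, down-weighting them; with gamma = 0 this reduces to
    plain cross-entropy."""
    eps = 1e-12
    return -((1.0 - p_t) ** gamma) * np.log(p_t + eps)
```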
What is Triplet Loss?
How is triplet loss different?
With a normal loss we compare the prediction to the ground truth (the original label).
With triplet loss, we use three examples (anchor, positive, negative) and compare distances.
The anchor and positive share the same class and should be close; the negative belongs to a different class and should be farther away by at least a margin.
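A sketch of the standard margin-based triplet loss on embedding vectors (squared Euclidean distance; names are my own):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Pull the anchor toward the positive (same class) and push it away
    from the negative (different class) by at least `margin`.
    The loss is zero once the negative is sufficiently far."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(d_pos - d_neg + margin, 0.0)
```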
What is the idea behind using multiple loss functions?
Combine complementary objectives, e.g. for object localization:
–> add a localization loss to regularize a high-performing classifier so that it also learns to localize
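Combining losses usually amounts to a weighted sum; a toy sketch (names and the trade-off weight are my own illustration):

```python
def combined_loss(cls_loss, loc_loss, lam=0.5):
    """Total objective = classification loss + lam * localization loss.
    lam trades off how strongly localization regularizes the classifier."""
    return cls_loss + lam * loc_loss
```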
Why would we opt to use a variable learning rate? (Annealing)
As training progresses, the steps taken might be too large to reach the optimum (when using a fixed learning rate); annealing shrinks the learning rate over time.
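One common annealing scheme is step decay, sketched here (function name and decay constants are my own choices):

```python
def step_decay_lr(lr0, step, decay=0.5, every=10):
    """Step-decay schedule: multiply the initial learning rate lr0 by
    `decay` once every `every` steps, so updates shrink as training
    approaches the optimum."""
    return lr0 * (decay ** (step // every))
```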
In self-supervised learning, we have the problem that data annotation is expensive. What could be a solution to this?
Supervise using labels generated from the data itself (pretext tasks), without manual annotation.