6.3 Regularization and Architecture Flashcards

(12 cards)

1
Q

a) Why is regularization needed?
b) What does regularization try to improve? (name at least three techniques and know how they work)

A

a)
used to reduce overfitting

b)
improves generalization

e.g., L1, L2, dropout, batch normalization, data augmentation, early stopping

2
Q

a) What is the main difference between L1 and L2 regularization?

b) What does elastic net mean?

A

a)
L1 (Lasso)
- drives model weights toward, or exactly to, zero
- suitable when feature selection is desirable

(shrinks small weights all the way to zero)

L2 (Ridge)
- pushes weights to decrease, but not to exactly zero
- suitable for high-dimensional models with correlated features

(reduces the effect of all weights, reducing model complexity)

b)
- hybrid regularization
- balances feature selection and weight shrinkage
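A minimal NumPy sketch of the three penalties above (the weight values are illustrative):

```python
import numpy as np

def penalty(w, l1=0.0, l2=0.0):
    """Elastic-net penalty: l1 * sum|w| + l2 * sum(w^2).
    l2=0 gives a pure L1 (Lasso) penalty, l1=0 a pure L2 (Ridge) penalty."""
    return l1 * np.sum(np.abs(w)) + l2 * np.sum(w ** 2)

w = np.array([0.5, -2.0, 0.0])
print(penalty(w, l1=0.1))          # L1: 0.1 * (0.5 + 2.0) = 0.25
print(penalty(w, l2=0.1))          # L2: 0.1 * (0.25 + 4.0) = 0.425
print(penalty(w, l1=0.1, l2=0.1))  # elastic net: 0.25 + 0.425 = 0.675
```

In training, this term is added to the data loss, so the optimizer trades fit against weight size.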

3
Q

a) What does dropout mean?

b) How does it help with the regularization goals (reducing overfitting)?

c) Is dropout performed during inference (forward pass)?

A

a)
randomly deactivates each unit with probability p during training
- dropped units receive no weight updates on the backward pass

b)
making fewer neurons active –> prevents complex co-adaptations between units –> prevents overfitting

c)
No, dropout is only active during training
- during inference dropout is turned off and the network uses all neurons (outputs are scaled so their expected values match training)
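The train/inference behavior can be sketched in NumPy with a hypothetical inverted-dropout helper (the common implementation, which does the rescaling at training time):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p, training=True):
    """Inverted dropout: zero each unit with probability p during training,
    and scale survivors by 1/(1-p) so the expected activation is unchanged.
    At inference the layer is the identity (all neurons active)."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p   # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)

x = np.ones(10_000)
train_out = dropout(x, p=0.5)                  # roughly half zeros, rest scaled to 2.0
eval_out = dropout(x, p=0.5, training=False)   # identical to x
print(train_out.mean())                        # close to 1.0 in expectation
```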

4
Q

a) What does batch learning mean?

b) What does epoch mean?

A

a)
the dataset is randomly split into mini-batches and the model is trained on one batch at a time, updating the weights after each batch

b)
one complete sweep over the entire dataset

iterations per epoch = number of training samples / batch size
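The formula above, with illustrative numbers (rounding up so the last, partial batch is counted):

```python
import math

n_samples, batch_size = 60_000, 128
iters_per_epoch = math.ceil(n_samples / batch_size)  # 60000 / 128 = 468.75
print(iters_per_epoch)  # 469
```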

5
Q

What does batch normalization mean?

A

(goal: reach the same accuracy with fewer training epochs)

  • normalizes the activations within each mini-batch and backpropagates the gradients through the normalization parameters

ex.
1. normalize each unit in layer l
2. scaled and shifted values are fed into the following layer as input

Applications:
- greatly shortens training time and reduces the need for dropout
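The two steps above (normalize per mini-batch, then scale and shift with learned parameters) can be sketched in NumPy; gamma and beta would be learned by backpropagation in a real network:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each unit over the mini-batch, then scale and shift.
    gamma/beta are learned; gradients flow back through the normalization."""
    mu = x.mean(axis=0)                     # per-unit mean over the batch
    var = x.var(axis=0)                     # per-unit variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)   # step 1: normalize
    return gamma * x_hat + beta             # step 2: scale and shift

x = np.random.default_rng(1).normal(5.0, 3.0, size=(32, 4))  # batch of 32, 4 units
out = batchnorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))   # per unit: approx. 0 and approx. 1
```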

6
Q

a) Why is data augmentation considered a regularization technique?

b) Name a few augmentation methods

A

a)
Data augmentation reduces overfitting and improves generalization

  • how? by enriching the training set with transformed versions of examples –> the model must perform well across plausible variations rather than memorizing exact inputs

b)
- flipping and rotation
- translation
- scaling and cropping
- shearing
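Most of these transforms are one-liners on an image array; a toy NumPy sketch (a real pipeline would use an augmentation library and apply them randomly at load time):

```python
import numpy as np

img = np.arange(16).reshape(4, 4)        # toy 4x4 "image"

flipped = np.fliplr(img)                 # horizontal flip
rotated = np.rot90(img)                  # 90-degree rotation
shifted = np.roll(img, shift=1, axis=1)  # crude 1-pixel translation
cropped = img[1:3, 1:3]                  # 2x2 center crop

# every variant keeps the original label, so the training set grows "for free"
print(flipped[0])  # [3 2 1 0]
```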

7
Q

a) What is early stopping?

b) How does it improve training?

A

a)
stopping the model’s training before the training error reaches its minimum, typically once the validation error stops improving

b)
any further decrease in training error may be due to overfitting (memorizing data points)

  • return the set of parameters from the best point before stopping –> the model will have lower variance and better generalization
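A minimal sketch of the stopping rule, with a patience window and hypothetical validation losses (in practice these would come from evaluating a held-out set after each epoch):

```python
# Hypothetical per-epoch validation losses: improving, then overfitting.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]

patience, best, best_epoch, wait = 2, float("inf"), -1, 0
for epoch, loss in enumerate(val_losses):
    if loss < best:
        best, best_epoch, wait = loss, epoch, 0   # checkpoint parameters here
    else:
        wait += 1
        if wait >= patience:
            break  # no improvement for `patience` epochs: stop training

print(best_epoch, best)  # restore the epoch-3 checkpoint (loss 0.50)
```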
8
Q

a) What does hyperparameter mean?

b) How can they be chosen?

A

a)
configuration value set before training
- controls model structure or learning process

Model-specific: number of layers, neurons, kernel size, activation function

Training-specific: learning rate, batch size, epochs, optimizer

Regularization: L1/L2 coefficients

b)
Tuning methods:

  1. manual search (adjust based on observation)
  2. grid search (create a grid of all possible combinations –> train and evaluate the model for each combination)
  3. random search (samples random values within a predefined range)
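Grid and random search can both be sketched in a few lines of standard-library Python (the hyperparameter names and ranges are illustrative; each resulting config would be used to train and validate one model):

```python
import itertools
import random

grid = {"lr": [1e-3, 1e-2], "batch_size": [32, 64], "l2": [0.0, 1e-4]}

# Grid search: the cartesian product of all values (2 * 2 * 2 = 8 trials here).
grid_trials = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]
print(len(grid_trials))  # 8

# Random search: sample a fixed budget of configs from predefined ranges.
rng = random.Random(0)
random_trials = [
    {"lr": 10 ** rng.uniform(-4, -1),          # log-uniform learning rate
     "batch_size": rng.choice([32, 64, 128]),
     "l2": rng.choice([0.0, 1e-4, 1e-3])}
    for _ in range(4)
]
# for each trial: train the model with that config and record the validation score
```

Note the cost difference: grid search grows exponentially with the number of hyperparameters, while random search is capped at the sampling budget.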
9
Q

What is the benefit of batch normalization in training neural networks?

A
  1. stabilizes training process and allows for higher learning rates
  2. reduces internal covariate shift by normalizing layer inputs
  3. improves convergence speed and reduces sensitivity to weight initializations
10
Q

Describe common CNN architectures

A

AlexNet
- used to solve image classification on the ImageNet dataset
- features: dropout, ReLU, overlapping max-pooling layers

VGG
- goal: study the effect of increasing CNN depth on large-scale image recognition tasks
- VGG 16 –> 16 layers
- VGG 19 –> 19 layers

11
Q

a) Describe the overall structure of U-Net

b) What happens in the contracting path?

c) What does the symmetrical architecture of U-Net add?

d) What do skip connections do?

A

a)
symmetrical structure (U-shaped)

b)
sequence of convolution and max-pooling operations –> spatial contraction of the image

encoder: extracts features through convolution and pooling (downsampling)

c)
adds an expansive path after the contracting path

decoder: performs upsampling to reconstruct the image

d) skip connections

concatenate low-level feature maps from the encoding path with high-level feature maps in the decoding path
- restores lost spatial information

12
Q

What are U-Net's advantages and limitations?

A

Advantages:
i) works well with limited training data
- skip connections: preserve fine detail
- fully convolutional (efficient for segmentation)

ii) adapted for many fields beyond medical imaging

limitations:
high memory consumption
