6.3 Regularization and Architecture Flashcards

(12 cards)

1
Q

a) Why is regularization needed?
b) What does regularization try to improve? (name at least three techniques and know how they work)

A

a)
used to reduce overfitting

b)
improves generalization

e.g., L1, L2, dropout, batch normalization, data augmentation, early stopping

2
Q

a) What is the main difference between L1 and L2 regularization?

b) What does elastic net mean?

A

a)
L1 (Lasso)
- drives model weights toward, or exactly to, zero
- suitable when feature selection is desirable

(shrinks small weights all the way to zero)

L2 (Ridge)
- pushes weights to decrease, but not to exactly zero
- suitable for high-dimensional models with correlated features

(reduces the effect of all weights, reducing model complexity)

b)
- hybrid regularization
- balances feature selection and weight shrinkage
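A minimal NumPy sketch of the three penalties above (the weight values are illustrative):

```python
import numpy as np

def penalty(w, l1=0.0, l2=0.0):
    """Elastic-net penalty: l1 * sum|w| + l2 * sum(w^2).
    l2=0 gives a pure L1 (Lasso) penalty, l1=0 a pure L2 (Ridge) penalty."""
    return l1 * np.sum(np.abs(w)) + l2 * np.sum(w ** 2)

w = np.array([0.5, -2.0, 0.0])
print(penalty(w, l1=0.1))          # L1: 0.1 * (0.5 + 2.0) = 0.25
print(penalty(w, l2=0.1))          # L2: 0.1 * (0.25 + 4.0) = 0.425
print(penalty(w, l1=0.1, l2=0.1))  # elastic net: 0.25 + 0.425 = 0.675
```

In training, this term is added to the data loss, so the optimizer trades fit against weight size.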

3
Q

a) What does dropout mean?

b) How does it help with the regularization goals (reducing overfitting)?

c) Is dropout performed during inference (forward pass)?

A

a)
randomly deactivates each unit with probability p during training
- dropped units receive no weight updates on the backward pass

b)
making fewer neurons active –> prevents complex co-adaptations between units –> prevents overfitting

c)
No, dropout is only active during training
- during inference dropout is turned off and the network uses all neurons (outputs are scaled so their expected values match training)
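The train/inference behavior can be sketched in NumPy with a hypothetical inverted-dropout helper (the common implementation, which does the rescaling at training time):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p, training=True):
    """Inverted dropout: zero each unit with probability p during training,
    and scale survivors by 1/(1-p) so the expected activation is unchanged.
    At inference the layer is the identity (all neurons active)."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p   # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)

x = np.ones(10_000)
train_out = dropout(x, p=0.5)                  # roughly half zeros, rest scaled to 2.0
eval_out = dropout(x, p=0.5, training=False)   # identical to x
print(train_out.mean())                        # close to 1.0 in expectation
```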

4
Q

a) What does batch learning mean?

b) What does epoch mean?

A

a)
the dataset is randomly split into mini-batches and the model is trained on one batch at a time, updating the weights after each batch

b)
one complete sweep over the entire dataset

iterations per epoch = number of training samples / batch size
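The formula above, with illustrative numbers (rounding up so the last, partial batch is counted):

```python
import math

n_samples, batch_size = 60_000, 128
iters_per_epoch = math.ceil(n_samples / batch_size)  # 60000 / 128 = 468.75
print(iters_per_epoch)  # 469
```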

5
Q

What does batch normalization mean?

A

(goal: reach the same accuracy with fewer training epochs)

  • normalizes the activations within each mini-batch and backpropagates the gradients through the normalization parameters

ex.
1. normalize each unit in layer l
2. scaled and shifted values are fed into the following layer as input

Applications:
- greatly shortens training time and reduces the need for dropout
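The two steps above (normalize per mini-batch, then scale and shift with learned parameters) can be sketched in NumPy; gamma and beta would be learned by backpropagation in a real network:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each unit over the mini-batch, then scale and shift.
    gamma/beta are learned; gradients flow back through the normalization."""
    mu = x.mean(axis=0)                     # per-unit mean over the batch
    var = x.var(axis=0)                     # per-unit variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)   # step 1: normalize
    return gamma * x_hat + beta             # step 2: scale and shift

x = np.random.default_rng(1).normal(5.0, 3.0, size=(32, 4))  # batch of 32, 4 units
out = batchnorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))   # per unit: approx. 0 and approx. 1
```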

6
Q

a) Why is data augmentation considered a regularization technique?

b) Name a few augmentation methods

A

a)
Data augmentation reduces overfitting and improves generalization

  • how? by enriching the training set with transformed versions of examples –> the model must perform well across plausible variations rather than memorizing exact inputs

b)
- flipping and rotation
- translation
- scaling and cropping
- shearing
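Most of these transforms are one-liners on an image array; a toy NumPy sketch (a real pipeline would use an augmentation library and apply them randomly at load time):

```python
import numpy as np

img = np.arange(16).reshape(4, 4)        # toy 4x4 "image"

flipped = np.fliplr(img)                 # horizontal flip
rotated = np.rot90(img)                  # 90-degree rotation
shifted = np.roll(img, shift=1, axis=1)  # crude 1-pixel translation
cropped = img[1:3, 1:3]                  # 2x2 center crop

# every variant keeps the original label, so the training set grows "for free"
print(flipped[0])  # [3 2 1 0]
```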

7
Q

a) What is early stopping?

b) How does it improve training?

A

a)
stopping the model’s training before the training error reaches its minimum, typically once the validation error stops improving

b)
any further decrease in training error may be due to overfitting (memorizing data points)

  • return the set of parameters from the best point before stopping –> the model will have lower variance and better generalization
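A minimal sketch of the stopping rule, with a patience window and hypothetical validation losses (in practice these would come from evaluating a held-out set after each epoch):

```python
# Hypothetical per-epoch validation losses: improving, then overfitting.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]

patience, best, best_epoch, wait = 2, float("inf"), -1, 0
for epoch, loss in enumerate(val_losses):
    if loss < best:
        best, best_epoch, wait = loss, epoch, 0   # checkpoint parameters here
    else:
        wait += 1
        if wait >= patience:
            break  # no improvement for `patience` epochs: stop training

print(best_epoch, best)  # restore the epoch-3 checkpoint (loss 0.50)
```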
8
Q

a) What does hyperparameter mean?

b) How can they be chosen?

A

a)
configuration value set before training
- controls model structure or learning process

Model-specific: number of layers, neurons, kernel size, activation function

Training-specific: learning rate, batch size, epochs, optimizer

Regularization: L1/L2 coefficients

b)
Tuning methods:

  1. manual search (adjust based on observation)
  2. grid search (create a grid of all possible combinations –> train and evaluate the model for each combination)
  3. random search (samples random values within a predefined range)
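Grid and random search can both be sketched in a few lines of standard-library Python (the hyperparameter names and ranges are illustrative; each resulting config would be used to train and validate one model):

```python
import itertools
import random

grid = {"lr": [1e-3, 1e-2], "batch_size": [32, 64], "l2": [0.0, 1e-4]}

# Grid search: the cartesian product of all values (2 * 2 * 2 = 8 trials here).
grid_trials = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]
print(len(grid_trials))  # 8

# Random search: sample a fixed budget of configs from predefined ranges.
rng = random.Random(0)
random_trials = [
    {"lr": 10 ** rng.uniform(-4, -1),          # log-uniform learning rate
     "batch_size": rng.choice([32, 64, 128]),
     "l2": rng.choice([0.0, 1e-4, 1e-3])}
    for _ in range(4)
]
# for each trial: train the model with that config and record the validation score
```

Note the cost difference: grid search grows exponentially with the number of hyperparameters, while random search is capped at the sampling budget.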
9
Q

What is the benefit of batch normalization in training neural networks?

A
  1. stabilizes training process and allows for higher learning rates
  2. reduces internal covariate shift by normalizing layer inputs
  3. improves convergence speed and reduces sensitivity to weight initializations
10
Q

Describe common CNN architectures

A

AlexNet
- used to solve image classification on the ImageNet dataset
- features: dropout, ReLU, overlapping max-pooling layers

VGG
- goal: study the effect of increasing CNN depth on large-scale image recognition tasks
- VGG 16 –> 16 layers
- VGG 19 –> 19 layers

11
Q

a) Describe the overall structure of U-Net

b) What happens in the contracting path?

c) What does the symmetrical architecture of U-Net add?

d) What do skip connections do?

A

a)
symmetrical structure (U-shaped)

b)
sequence of convolution and max-pooling operations –> spatial contraction of the image

encoder: extracts features through convolution and pooling (downsampling)

c)
adds an expansive path after the contracting path

decoder: performs upsampling to reconstruct the image

d) skip connections

concatenate low-level feature maps from the encoding path with high-level feature maps in the decoding path
- restores lost spatial information

12
Q

What are U-Net's advantages and limitations?

A

Advantages:
i) works well with limited training data
- skip connections: preserve fine detail
- fully convolutional (efficient for segmentation)

ii) adapted for many fields beyond medical imaging

limitations:
high memory consumption
