a) Why is regularization needed?
b) What does regularization try to improve? (name at least 3 and know how they perform)
a)
used to reduce overfitting
b)
improves generalization
e.g., L1, L2, dropout, batch normalization, data augmentation, early stopping
a) What is the main difference between L1 and L2 regularization?
b) What does elastic net mean?
a)
L1 (Lasso)
- shrinks model weights toward or exactly to zero
- suitable for cases where feature selection is desirable
(gets rid of small weights by setting them to zero)
L2 (ridge regularization)
- pushes weights to decrease (but not to exactly zero)
- suitable for high-dimensional models with correlated features (severe multicollinearity)
(reduces the effect of large weights and the complexity of the model)
b)
- hybrid regularization
- balances feature selection and weight shrinkage
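The three penalties above can be sketched numerically (a minimal NumPy illustration; the weight vector and coefficients are made up):

```python
import numpy as np

def penalties(w, l1=0.01, l2=0.01):
    """Return the L1, L2, and elastic-net penalty terms for a weight vector."""
    l1_term = l1 * np.sum(np.abs(w))   # Lasso: encourages exact zeros
    l2_term = l2 * np.sum(w ** 2)      # Ridge: shrinks all weights smoothly
    elastic = l1_term + l2_term        # Elastic net: mix of both
    return l1_term, l2_term, elastic

w = np.array([0.0, 0.5, -2.0])
l1_t, l2_t, en = penalties(w)          # 0.025, 0.0425, 0.0675
```

Note the L1 term grows linearly with each weight's magnitude, while the L2 term grows quadratically, which is why L2 punishes large weights harder but never drives them exactly to zero.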
a) What does dropout mean?
b) How does it help with the regularization goal of reducing overfitting?
c) Is dropout performed during inference (forward pass)?
a)
randomly deactivates a fraction of the units with probability p
- dropped units receive no weight updates on the backward pass
b)
fewer active neurons –> prevents complex co-adaptations between units –> prevents overfitting
c)
No, dropout is only active during training
- during inference the dropout is turned off and the network uses all neurons
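A minimal inverted-dropout sketch in NumPy, illustrating that dropout is active only during training and that surviving activations are rescaled by 1/(1-p):

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero units with prob p, rescale survivors by 1/(1-p)."""
    if not training:                       # inference: no-op, all units active
        return x
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p        # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)            # rescale so expected activation is unchanged

x = np.ones(8)
train_out = dropout(x, p=0.5, training=True)   # each entry is 0.0 or 2.0
infer_out = dropout(x, p=0.5, training=False)  # identical to x
```

The rescaling is why nothing special needs to happen at inference time: the expected value of each unit is the same in both modes.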
a) What does batch learning mean?
b) What does epoch mean?
a)
the dataset is randomly split into batches, and the model is trained on each batch sequentially
b)
one complete sweep over the entire dataset
iterations per epoch = number of training samples / batch size
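The iterations-per-epoch formula worked through on a hypothetical dataset size:

```python
import math

n_samples = 10_000   # hypothetical dataset size
batch_size = 32
# Last batch may be smaller, hence the ceiling.
iters_per_epoch = math.ceil(n_samples / batch_size)   # 313
```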
What does batch normalization mean?
(decreases training time, i.e. fewer epochs, while keeping the same accuracy)
e.g.
1. normalize each unit in layer l
2. scaled and shifted values are fed into the following layer as input
Applications:
- greatly shorten training time and reduce need for dropout
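The two steps above can be sketched as a batch-norm function over a toy batch (NumPy, per-feature batch statistics; gamma and beta are the learnable scale and shift):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch, then scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # step 1: normalize each unit
    return gamma * x_hat + beta               # step 2: scaled/shifted output

x = np.array([[1.0, 10.0], [3.0, 30.0]])     # batch of 2 samples, 2 features
y = batch_norm(x)                            # each column now has mean ~0, var ~1
```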
a) Why is data augmentation considered a regularization technique?
b) Name a few augmentation methods
a)
Data augmentation reduces overfitting and improves generalization
b)
1) flip and rotation
2) translation
- scaling and cropping
- shearing
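A few of these augmentations sketched on a toy array (NumPy; a real pipeline would use an image library, and the wrap-around roll is only a crude stand-in for translation):

```python
import numpy as np

img = np.arange(9).reshape(3, 3)    # toy 3x3 "image"
flipped = np.fliplr(img)            # horizontal flip
rotated = np.rot90(img)             # 90-degree counterclockwise rotation
shifted = np.roll(img, 1, axis=1)   # crude translation (wraps around the edge)
```

Each transform yields a new, label-preserving training sample, which is what makes augmentation act like regularization.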
a) What is early stopping?
b) How does it improve training?
a)
stopping the model's training before it reaches the lowest training error, typically once validation error stops improving
b)
any further decrease in training error may be due to overfitting (memorizing training data points)
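A patience-based early-stopping loop sketched on made-up validation errors:

```python
# Stop when validation error has not improved for `patience` epochs
# (the error values are fabricated for illustration).
val_errors = [0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.60]
patience, best, wait, stop_epoch = 2, float("inf"), 0, None

for epoch, err in enumerate(val_errors):
    if err < best:
        best, wait = err, 0      # improvement: reset the counter
    else:
        wait += 1                # no improvement this epoch
        if wait >= patience:
            stop_epoch = epoch   # stop before overfitting sets in
            break
```

Here training halts at epoch 5, keeping the weights from epoch 3 (error 0.55), even though training error would keep falling.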
a) What does hyperparameter mean?
b) How can they be chosen?
a)
configuration value set before training
- controls model structure or learning process
Model specific: num of layers, neurons, kernel size, activation function
training specific: learning rate, batch size, epochs, optimizers
regularization: l1/l2 coef
b)
Tuning methods: grid search, random search, Bayesian optimization
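A minimal grid-search sketch, one common tuning method (the score function here is a hypothetical stand-in for real validation accuracy):

```python
import itertools

# Try every combination of a small hyperparameter grid, keep the best.
grid = {"lr": [0.1, 0.01], "batch_size": [16, 32]}

def score(lr, batch_size):
    """Hypothetical validation score; a real run would train and evaluate."""
    return 1.0 - abs(lr - 0.01) - abs(batch_size - 32) / 100

best = max(
    (dict(zip(grid, combo)) for combo in itertools.product(*grid.values())),
    key=lambda cfg: score(**cfg),
)   # -> {"lr": 0.01, "batch_size": 32}
```

Random search replaces the exhaustive product with random draws from the grid, which often finds good settings with far fewer evaluations.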
What is the benefit of batch normalization in training neural networks?
- stabilizes training and allows higher learning rates
- reduces sensitivity to weight initialization
- mild regularizing effect (reduces the need for dropout)
Describe common CNN architectures
Alexnet
- used to solve image classification on the imagenet database
- features: dropout, Relu, overlapping max-pooling layers
VGG
- goal: study the effect of increasing depth of CNNs on large-scale image recognition tasks
- VGG 16 –> 16 layers
- VGG 19 –> 19 layers
a) Describe the overall structure of Unet
b) What happens through contracting part?
c) What does the symmetrical architecture of Unet do?
d) What do skip connections do?
a)
symmetrical structure (u-shaped)
b)
sequence of convolution and max-pooling operations –> spatial contraction of the image
encoder: extracts features through convolution and pooling (downsampling)
c)
adds an expansive path after the contracting path
decoder: performs upsampling to reconstruct the image
d) skip connections
concatenates low-level feature maps from the encoding path with high-level feature maps in the decoding path
- restores lost spatial information
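The skip connection's channel-wise concatenation can be sketched with NumPy arrays (the shapes are illustrative, not from any specific U-Net):

```python
import numpy as np

# Encoder feature map saved during downsampling, and the matching
# upsampled decoder map at the same spatial resolution.
encoder_feat = np.zeros((64, 32, 32))   # (channels, H, W), contracting path
decoder_feat = np.ones((64, 32, 32))    # expansive path, after upsampling
merged = np.concatenate([encoder_feat, decoder_feat], axis=0)  # 128 channels
```

The decoder's next convolution sees both the fine spatial detail from the encoder and the semantic features from the decoder, which is how the lost spatial information is restored.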
What are unet applications and limitations?
Advantages:
i) works well with limited training data
- skip connections: preserve fine detail
- fully convolutional (efficient for segmentation)
ii) adapted for many fields beyond medical imaging
limitations:
high memory consumption