What is a pooling layer? What does it do?
It is good practice to double the number of feature maps after each pooling layer, since pooling halves the spatial dimensions.
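As a minimal sketch of what a pooling layer does (assuming 2×2 max pooling with stride 2, the most common setup), each non-overlapping 2×2 window of the feature map is reduced to its maximum, halving height and width:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W) feature map (H, W even)."""
    h, w = x.shape
    # group pixels into non-overlapping 2x2 windows, then take the max of each
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 2, 0, 1],
                 [3, 4, 1, 0],
                 [0, 1, 5, 6],
                 [2, 1, 7, 8]])
print(max_pool_2x2(fmap))  # [[4 1]
                           #  [2 8]]
```

The 4×4 map shrinks to 2×2, which is why the number of feature maps is typically doubled afterwards to keep the representational capacity roughly constant.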
For a neural network, what works best against overfitting: Ridge regularization, dropout, or early stopping?
* Answer: Dropout
* Explanation:
o CNN: weights are organized in filter kernels that slide across the input image to extract features -> filters are shared across different spatial locations of the input.
o Ridge regularization: adds a penalty term to each individual weight in the network.
o In a CNN, this disrupts the sharing of weights and the spatial relationships captured by the filters.
o Adding a penalty to each weight independently: potentially alters the balance and importance of the shared weights; the regularization penalty may affect the weights differently at different spatial locations -> undermines the shared knowledge encoded in the weights; encourages weights to be small (sparsity is an L1/Lasso effect, not a Ridge effect)
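To make the "penalty on each individual weight" concrete, here is a minimal sketch (the `ridge_penalty` helper and the toy weights are illustrative, not from any library) of how an L2 term is added to the data loss:

```python
import numpy as np

def ridge_penalty(weights, lam=0.01):
    # L2 penalty: lam * sum of squared weights, applied to every weight independently
    return lam * sum(np.sum(w ** 2) for w in weights)

# toy "network" weights: a conv-like shared filter and a dense layer
conv_filter = np.array([[0.5, -0.5], [1.0, 0.0]])
dense = np.array([0.1, 0.2, 0.3])
data_loss = 1.25  # pretend data loss for illustration
total_loss = data_loss + ridge_penalty([conv_filter, dense])
print(total_loss)
```

Because every entry of `conv_filter` is penalized independently, the gradient of this term pulls on the shared filter values regardless of where in the image they are applied, which is the interaction the bullet above describes.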
Why is dropout preferred over early stopping in a neural network?
o “Neurons trained with dropout cannot co-adapt with their neighboring neurons; they have to be as useful as possible on their own. They also cannot rely excessively on just a few input neurons; they must pay attention to each of their input neurons. They end up being less sensitive to slight changes in the inputs. In the end, you get a more robust network that generalizes better.”
o “A unique neural network is generated at each training step. Since each neuron can be either present or absent, there are a total of 2^N possible networks (where N is the total number of droppable neurons). This is such a huge number that it is virtually impossible for the same neural network to be sampled twice. Once you have run 10,000 training steps, you have essentially trained 10,000 different neural networks (each with just one training instance). These neural networks are obviously not independent because they share many of their weights, but they are nevertheless all different. The resulting neural network can be seen as an averaging ensemble of all these smaller neural networks.”
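The mechanism described above can be sketched in a few lines of numpy ("inverted" dropout, the variant commonly used so that inference needs no rescaling; the function name and setup here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate=0.5, training=True):
    """Inverted dropout sketch: randomly zero units and rescale the survivors."""
    if not training or rate == 0.0:
        return activations                        # at inference, dropout is a no-op
    keep = rng.random(activations.shape) >= rate  # each unit kept with prob 1 - rate
    # scale survivors by 1/(1 - rate) so the expected activation matches inference
    return activations * keep / (1.0 - rate)

h = np.ones(10)
print(dropout(h, rate=0.5))  # roughly half the units zeroed, survivors scaled to 2.0
```

Each call samples a different binary mask, which is exactly why every training step effectively trains a different sub-network.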
How would you handle underfitting when using dropout? (i.e., what should you do if you are using dropout and the model underfits?)
Why use both early stopping and dropout? Is it good to use both at the same time?
What would you choose for image dimensionality reduction: PCA or SVD? (a really deep question about the underlying reasons)
PCA – how are the PCs calculated? How does it work?
How to find the optimal K in KMeans
Why is RNN better than ARIMA for time series data?
What is a TLU (threshold logic unit)?
AdaBoost: Adaptive Boosting
o focuses on difficult-to-classify samples by assigning them higher weights during training
o Goal: By iteratively adjusting the weights and training weak learners -> improve the overall model performance and handle complex classification tasks
o Able to handle imbalanced data
o Steps
 Initialize weights (the same for all training samples)
 Train a weak learner, e.g. a decision tree
 Compute the error: the sum of the weights of the misclassified samples
 Update the weights
 Repeat until the desired performance is achieved or for a fixed number of iterations
 Aggregate predictions: combine the predictions of all weak learners, weighting each prediction based on that learner's performance during training
 Final prediction: the weighted vote or average of the weak learners' predictions
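The steps above can be sketched end to end in numpy. This is a minimal illustrative implementation (not scikit-learn's `AdaBoostClassifier`), using one-dimensional decision stumps as the weak learner:

```python
import numpy as np

def stump_predict(X, thr, pol):
    # a decision stump: predict +pol below the threshold, -pol at/above it
    return np.where(X < thr, pol, -pol)

def train_adaboost(X, y, n_rounds=5):
    n = len(X)
    w = np.full(n, 1.0 / n)                     # 1. initialize equal weights
    learners = []
    for _ in range(n_rounds):
        # 2. train a weak learner: pick the stump with the lowest weighted error
        best_err, best = np.inf, None
        for thr in np.unique(X):
            for pol in (1, -1):
                # 3. error = sum of the weights of the misclassified samples
                err = w[stump_predict(X, thr, pol) != y].sum()
                if err < best_err:
                    best_err, best = err, (thr, pol)
        thr, pol = best
        err = max(best_err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # learner's vote weight from its error
        # 4. update weights: up-weight misclassified samples, then renormalize
        w *= np.exp(-alpha * y * stump_predict(X, thr, pol))
        w /= w.sum()
        learners.append((thr, pol, alpha))      # 5. repeat for a fixed number of rounds
    return learners

def adaboost_predict(learners, X):
    # 6.-7. aggregate: sign of the alpha-weighted vote of all stumps
    return np.sign(sum(a * stump_predict(X, t, p) for t, p, a in learners))

# toy 1-D data that no single stump can separate (negative class in the middle)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1, 1, -1, -1, 1, 1])
model = train_adaboost(X, y, n_rounds=5)
print((adaboost_predict(model, X) == y).mean())  # → 1.0
```

No single stump classifies this data correctly, but the weighted ensemble of five stumps does, which is the point of boosting weak learners.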
Can you do multi-class classification with an SVM?
What are autoencoders? (How do they work, and what types are there?)
Why did we use Colab and not UCloud?
Cross-Validation: Effects & How to pick number of folds