How do we avoid overfitting and find the best combination of hyperparameters?
Split data methods:
1. Holdout: train-validation-test data sets;
2. Cross-validation: k-fold cross validation;
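The holdout method above can be sketched in plain Python. This is a minimal illustration, not a library implementation; the 70/15/15 split ratio is an assumed (commonly used) choice, not one specified in the notes.

```python
import random

def holdout_split(data, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the data, then split it into train/validation/test
    subsets (assumed 70/15/15 proportions for illustration)."""
    rng = random.Random(seed)
    items = list(data)
    rng.shuffle(items)
    n = len(items)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

train, val, test = holdout_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

The model is fit on `train`, hyperparameters are tuned on `val`, and `test` is touched only once, for the final performance estimate.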
What is k-fold cross-validation
K-fold cross-validation (KCV) consists of splitting a dataset into k subsets; then, iteratively, some of them are used to train the model, while the remaining ones are used to assess its performance.
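The k-fold splitting described above can be sketched as a small generator of index sets. This is a hand-rolled illustration (libraries such as scikit-learn provide a `KFold` utility for the same purpose); in each iteration one fold is held out for validation and the other k − 1 folds form the training set.

```python
def kfold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation
    over n samples: each fold serves as the validation set once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size

for train_idx, val_idx in kfold_indices(10, 5):
    print(val_idx)  # each sample appears in exactly one validation fold
```

The model's k validation scores are then averaged to get a more stable performance estimate than a single holdout split.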
What is hyperparameter optimisation
❑ Hyperparameters influence model performance and generalisation.
❑ Examples: Learning rate, batch size, number of layers, activation functions etc.
❑ Goal: Find the best combination to maximise validation accuracy and minimise overfitting.
What is grid search
❑ Exhaustive search over a manually specified set of hyperparameters.
❑ Evaluates all possible combinations systematically.
Give an example of grid search: If learning rate = [0.01, 0.001] and optimiser = [adam, sgd]
Grid Search will test 2 × 2 = 4 combinations.
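The 2 × 2 example above can be enumerated with `itertools.product`. The `evaluate` call in the comment is hypothetical, standing in for training the model and scoring it on the validation set.

```python
from itertools import product

learning_rates = [0.01, 0.001]
optimisers = ["adam", "sgd"]

# Grid search enumerates every combination of the specified values.
combinations = list(product(learning_rates, optimisers))
print(len(combinations))  # 4

# Hypothetical evaluation step: score each combination on validation data,
# then keep the best one.
# best = max(combinations, key=lambda c: evaluate(c))
```

Note how the number of combinations is the product of the grid sizes, which is why grid search becomes expensive as dimensions increase.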
What are the pros of grid search
❑ Systematic.
❑ Suitable for small hyperparameter spaces.
What are the cons of grid search
❑ Computationally expensive as dimensions increase.
❑ Not suitable for high-dimensional or continuous spaces.
What is random search
❑ Randomly samples hyperparameters from specified distributions.
❑ Number of trials is predefined.
❑ More efficient as it avoids exhaustive combinations.
Example of random search: With 10 trials, Random Search might sample:
(0.01, adam), (0.001, sgd), (0.01, sgd), …
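A sketch of the sampling step above, assuming (for illustration) that the learning rate is drawn log-uniformly from a continuous range and the optimiser is drawn from a discrete set; the seed and ranges are arbitrary choices.

```python
import random

rng = random.Random(42)
n_trials = 10  # the number of trials is fixed in advance

trials = []
for _ in range(n_trials):
    # Log-uniform sample: exponent uniform in [-4, -1] gives lr in [1e-4, 1e-1].
    lr = 10 ** rng.uniform(-4, -1)
    opt = rng.choice(["adam", "sgd"])
    trials.append((lr, opt))

print(trials[:3])
```

Each sampled pair would then be evaluated on the validation set, and the best-scoring one kept; unlike grid search, continuous ranges pose no problem here.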
What are the pros of random search
❑ More efficient than Grid Search in high dimensions.
❑ Works well with continuous hyperparameters and larger hyperparameter spaces.
What are the cons of random search
❑ No guarantee of finding the absolute best combination.
❑ Performance depends on the number of trials.
Grid search vs random search vs advanced methods
❑ Grid Search: Thorough but expensive. Use for small parameter sets.
❑ Random Search: Efficient for large or continuous spaces.
❑ Consider advanced methods (e.g., Bayesian) for complex problems.
What does a classification model aim to provide
A classification model aims to provide the correct label for each input.
* Binary classification.
* Multi-class classification.
* The output is commonly the probability of an input belonging to a class.
What is the formula for accuracy
Accuracy = (TP + TN)/ (TP + TN + FP + FN)
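The formula translates directly into code. The counts passed in below are illustrative assumptions chosen so that 91 of 100 predictions are correct, matching the 91% figure discussed next; they are not taken from the notes.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = correct predictions / all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative confusion-matrix counts (assumed): 91 of 100 correct.
print(accuracy(tp=1, tn=90, fp=1, fn=8))  # 0.91
```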
Why is accuracy not the best classification metric
While 91% accuracy may seem good at first glance, another tumor-classifier model that always predicts benign would
achieve the exact same accuracy (91/100 correct predictions) on our examples.
Accuracy alone doesn’t tell the full story when you’re working with a class-imbalanced data set, like this one, where
there is a significant disparity between the number of positive and negative labels.
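The always-benign baseline described above can be checked in a few lines: with 91 benign and 9 malignant examples (the 91/100 split from the text), a model that never predicts malignant still scores 91% accuracy while catching zero tumours.

```python
# 0 = benign, 1 = malignant; class-imbalanced labels (91 vs 9).
labels = [0] * 91 + [1] * 9

# Trivial "model" that always predicts benign, regardless of input.
always_benign = [0] * len(labels)

acc = sum(p == y for p, y in zip(always_benign, labels)) / len(labels)
print(acc)  # 0.91 — high accuracy, yet every malignant case is missed
```

This is why metrics such as precision and recall, which look at the positive class separately, are preferred on imbalanced datasets.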