How do we avoid overfitting and find the best combination of hyperparameters?
Split data methods:
1. Holdout: train-validation-test data sets;
2. Cross-validation: k-fold cross validation;
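The holdout method above can be sketched in plain Python. This is a minimal illustration, not a library implementation; the 70/15/15 split ratio is an assumed (commonly used) choice, not one specified in the notes.

```python
import random

def holdout_split(data, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the data, then split it into train/validation/test
    subsets (assumed 70/15/15 proportions for illustration)."""
    rng = random.Random(seed)
    items = list(data)
    rng.shuffle(items)
    n = len(items)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

train, val, test = holdout_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

The model is fit on `train`, hyperparameters are tuned on `val`, and `test` is touched only once, for the final performance estimate.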
What is k-fold cross-validation
K-fold cross-validation (KCV) consists of splitting a dataset into k subsets; then, iteratively, some of them are used to train the model, while the remaining ones are used to assess its performance.
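The k-fold splitting described above can be sketched as a small generator of index sets. This is a hand-rolled illustration (libraries such as scikit-learn provide a `KFold` utility for the same purpose); in each iteration one fold is held out for validation and the other k − 1 folds form the training set.

```python
def kfold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation
    over n samples: each fold serves as the validation set once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size

for train_idx, val_idx in kfold_indices(10, 5):
    print(val_idx)  # each sample appears in exactly one validation fold
```

The model's k validation scores are then averaged to get a more stable performance estimate than a single holdout split.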
What is hyperparameter optimisation
❑ Hyperparameters influence model performance and generalisation.
❑ Examples: Learning rate, batch size, number of layers, activation functions etc.
❑ Goal: Find the best combination to maximise validation accuracy and minimise overfitting.
What is grid search
❑ Exhaustive search over a manually specified set of hyperparameters.
❑ Evaluates all possible combinations systematically.
Give an example of grid search: If learning rate = [0.01, 0.001] and optimiser = [adam, sgd]
Grid Search will test 2 × 2 = 4 combinations.
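The 2 × 2 example above can be enumerated with `itertools.product`. The `evaluate` call in the comment is hypothetical, standing in for training the model and scoring it on the validation set.

```python
from itertools import product

learning_rates = [0.01, 0.001]
optimisers = ["adam", "sgd"]

# Grid search enumerates every combination of the specified values.
combinations = list(product(learning_rates, optimisers))
print(len(combinations))  # 4

# Hypothetical evaluation step: score each combination on validation data,
# then keep the best one.
# best = max(combinations, key=lambda c: evaluate(c))
```

Note how the number of combinations is the product of the grid sizes, which is why grid search becomes expensive as dimensions increase.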
What are the pros of grid search
❑ Systematic.
❑ Suitable for small hyperparameter spaces.
What are the cons of grid search
❑ Computationally expensive as dimensions increase.
❑ Not suitable for high-dimensional or continuous spaces.
What is random search
❑ Randomly samples hyperparameters from specified distributions.
❑ Number of trials is predefined.
❑ More efficient as it avoids exhaustive combinations.
Example of random search: With 10 trials, Random Search might sample:
(0.01, adam), (0.001, sgd), (0.01, sgd), …
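A sketch of the sampling step above, assuming (for illustration) that the learning rate is drawn log-uniformly from a continuous range and the optimiser is drawn from a discrete set; the seed and ranges are arbitrary choices.

```python
import random

rng = random.Random(42)
n_trials = 10  # the number of trials is fixed in advance

trials = []
for _ in range(n_trials):
    # Log-uniform sample: exponent uniform in [-4, -1] gives lr in [1e-4, 1e-1].
    lr = 10 ** rng.uniform(-4, -1)
    opt = rng.choice(["adam", "sgd"])
    trials.append((lr, opt))

print(trials[:3])
```

Each sampled pair would then be evaluated on the validation set, and the best-scoring one kept; unlike grid search, continuous ranges pose no problem here.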
What are the pros of random search
❑ More efficient than Grid Search in high dimensions.
❑ Works well with continuous hyperparameters and larger hyperparameter spaces.
What are the cons of random search
❑ No guarantee of finding the absolute best combination.
❑ Performance depends on the number of trials.
Grid search vs random search vs advanced methods
❑ Grid Search: Thorough but expensive. Use for small parameter sets.
❑ Random Search: Efficient for large or continuous spaces.
❑ Consider advanced methods (e.g., Bayesian) for complex problems.
What does a classification model aim to provide
A classification model aims to provide the correct label for each input.
* Binary classification.
* Multi-class classification.
* The output is commonly the probability of an input belonging to a class.
What is the formula for accuracy
Accuracy = (TP + TN)/ (TP + TN + FP + FN)
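The formula translates directly into code. The counts passed in below are illustrative assumptions chosen so that 91 of 100 predictions are correct, matching the 91% figure discussed next; they are not taken from the notes.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = correct predictions / all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative confusion-matrix counts (assumed): 91 of 100 correct.
print(accuracy(tp=1, tn=90, fp=1, fn=8))  # 0.91
```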
Why is accuracy not the best classification metric
While 91% accuracy may seem good at first glance, another tumor-classifier model that always predicts benign would
achieve the exact same accuracy (91/100 correct predictions) on our examples.
Accuracy alone doesn’t tell the full story when you’re working with a class-imbalanced data set, like this one, where
there is a significant disparity between the number of positive and negative labels.
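The always-benign baseline described above can be checked in a few lines: with 91 benign and 9 malignant examples (the 91/100 split from the text), a model that never predicts malignant still scores 91% accuracy while catching zero tumours.

```python
# 0 = benign, 1 = malignant; class-imbalanced labels (91 vs 9).
labels = [0] * 91 + [1] * 9

# Trivial "model" that always predicts benign, regardless of input.
always_benign = [0] * len(labels)

acc = sum(p == y for p, y in zip(always_benign, labels)) / len(labels)
print(acc)  # 0.91 — high accuracy, yet every malignant case is missed
```

This is why metrics such as precision and recall, which look at the positive class separately, are preferred on imbalanced datasets.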