Evaluation Metrics Flashcards

(35 cards)

1
Q

MAP

A
  • Mean Average Precision
  • Useful for evaluating ranking algorithms
2
Q

ROC-AUC

A
  • Area under the ROC curve
  • summarizes a classification model’s ability to distinguish between classes
  • an AUC of 0.5 indicates a random guess
  • a perfect model would have an AUC of 1.0
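The AUC also has a useful probabilistic interpretation: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal plain-Python sketch computing AUC from that pairwise definition (ties count as half a correct pair):

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 0/1 ground truth; scores: model scores (higher = more positive).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfect ranking gives 1.0; random scores hover around 0.5.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In practice you would use `sklearn.metrics.roc_auc_score`, which computes the same quantity from the ROC curve.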
3
Q

Correlation matrix

A

Shows the correlation coefficient between variables
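For example, numpy's `np.corrcoef` (or pandas' `DataFrame.corr()`) produces the matrix of pairwise Pearson coefficients, with values in [-1, 1] and 1s on the diagonal:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1   # perfectly correlated with x
z = -x          # perfectly anti-correlated with x

# 3x3 matrix: entry [i, j] is the Pearson correlation of variable i and j
corr = np.corrcoef([x, y, z])
print(np.round(corr, 2))  # corr[0, 1] is 1.0, corr[0, 2] is -1.0
```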

4
Q

Residual plots

A
  • A residual is the difference between the target and the predicted value
  • A residual plot is a histogram of residuals (or a scatter of residuals against predicted values)
5
Q

Positive residual

A

The model is under-estimating the target: the actual value is higher than the prediction (residual = target − predicted > 0)

6
Q

Negative residual

A

The model is over-estimating the target: the actual value is lower than the prediction (residual = target − predicted < 0)

7
Q

What does a residual plot in a bell shape centered on zero indicate?

A
  • The model makes mistakes in a random manner and does not systematically over-predict or under-predict
8
Q

RMSE

A
  • Root Mean Squared Error
  • used to measure the accuracy of regression models (not classification)
  • the square root of the mean of the squared residuals; penalizes large errors heavily
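A minimal numpy sketch of the computation:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error: sqrt of the mean of squared residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# residuals are 1, 0, -2 -> mean squared error 5/3 -> RMSE ~ 1.29
print(rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
```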
9
Q

Can a confusion matrix be used to evaluate a multi-class classification model?

A
  • Yes. The metric is calculated for each class by treating it as a binary classification problem: that class versus all the other classes grouped into a second class.
  • Then the binary metric is averaged over all the classes to get either a macro average (treat each class equally) or weighted average (weighted by class frequency) metric.
  • In Amazon ML, the macro average F1-measure is used to evaluate the predictive success of a multiclass classifier.
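The macro-averaging procedure described above can be sketched in plain Python; it should match scikit-learn's `f1_score(..., average="macro")`:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class one-vs-rest F1, averaged with equal
    weight per class (the average Amazon ML reports for multiclass)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1s.append(f1)
    return sum(f1s) / len(f1s)

# per-class F1: bird 1.0, cat 2/3, dog 2/3 -> macro average 7/9
print(macro_f1(["cat", "dog", "cat", "bird"], ["cat", "dog", "dog", "bird"]))
```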
10
Q

Box plot

A
  • visualize how data is distributed across an axis
  • min, max, median, first quartile, third quartile
11
Q

Box plot positive skew

A
  • median lies closer to the first quartile
  • whisker at the upper end is longer
  • has positive (right) skew
12
Q

Box plot normal distribution

A
  • median is in the center
  • whiskers are the same length
13
Q

Box plot negative skew

A
  • median lies closer to the third quartile
  • whisker at the lower end is longer
  • has negative (left) skew
14
Q

ROC

A
  • Receiver Operating Characteristic (ROC) curve
  • graph of TP rate against FP rate at various thresholds
  • shows the diagnostic ability of a binary classifier system as its discrimination threshold is varied
15
Q

MAPE

A
  • Mean Absolute Percentage Error
  • used to evaluate regression models
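A numpy sketch of the formula (note that MAPE is undefined when any true value is zero):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.

    Averages |error| / |actual| over all points; undefined if any
    true value is zero.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# errors of 10%, 10%, and 0% average to ~6.67%
print(mape([100, 200, 400], [110, 180, 400]))
```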
16
Q

What do the whiskers of a box plot mean?

A

They indicate the minimum and maximum values excluding outliers

17
Q

k-fold Cross-validation

A

is a technique for evaluating ML models by training several ML models on subsets of the available input data and evaluating them on the complementary subset of the data. Use cross-validation to detect overfitting, i.e. failing to generalize a pattern.
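In practice you would typically reach for scikit-learn's `KFold` or `cross_val_score`; this is a minimal sketch of the fold-splitting logic itself:

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds.

    Each fold serves once as the validation set while the remaining
    indices form the training set.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, val))
        start += size
    return splits

# 6 samples, 3 folds: each sample appears in exactly one validation fold
for train, val in kfold_indices(6, 3):
    print(train, val)
```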

18
Q

Transfer learning

A
  • Initialize the network with pre-trained weights in all layers except for the output layer. Initialize the output layer with random weights.
  • Then, the whole network is fine-tuned with new data. In this mode, training can be achieved even with a smaller dataset. This is because the network is already trained and therefore can be used in cases without sufficient training data.
19
Q

What transformation should you use to address positive skew?

A

Logarithmic transformation

20
Q

What transformation should you use to address negative skew?

A

Third-order polynomial transformation

21
Q

How do you select the optimal k parameter in k-means?

A

Choose the elbow in the distortion curve

22
Q

MICE

A
  • Multiple Imputations by Chained Equations
  • advanced statistical method often used in machine learning to handle missing data
  • involves creating multiple imputed datasets by using chained equations, where each feature with missing data is modeled based on the other features in the dataset
23
Q

When is recall most relevant?

A

when trying to minimize false negatives

24
Q

When is AUC most relevant?

A

when trying to visualize the tradeoff between false positives and false negatives

25
Q

When is precision most relevant?

A
  • Precision = TP / (TP + FP)
  • when trying to minimize false positives
26
Q

F1 Score

A
  • harmonic mean of precision and recall
  • F1 = 2 · (precision · recall) / (precision + recall)
  • balances false positives and false negatives in a single number
27
Q

SMOTE

A
  • Synthetic Minority Over-sampling Technique
  • oversampling method for imbalanced datasets (e.g. cancer, fraud)
  • creates synthetic samples for the minority class by interpolating between existing minority samples, rather than simply duplicating them
  • the added diversity helps reduce overfitting and can improve the model's accuracy on the minority class
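In practice SMOTE usually comes from `imblearn.over_sampling.SMOTE`; the following is only a minimal numpy sketch of the interpolation idea (the `k=2` neighbor count and uniform interpolation fraction are simplifications):

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_sample(minority, n_new, k=2):
    """Sketch of SMOTE: for each synthetic point, pick a minority sample,
    pick one of its k nearest minority neighbors, and interpolate a
    random fraction of the way between them."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        # distances from sample i to every other minority sample
        d = np.linalg.norm(minority - minority[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbors)
        lam = rng.random()                  # interpolation fraction in [0, 1)
        synthetic.append(minority[i] + lam * (minority[j] - minority[i]))
    return np.array(synthetic)

minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(smote_sample(minority, n_new=4).shape)  # (4, 2)
```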
28
Q

precision@k

A
  • for a recommendation system, the proportion of the top-k recommended items that are actually purchased
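A minimal sketch in plain Python, assuming `recommended` is an ordered list and `relevant` is the set of items the user actually purchased (the names and data are illustrative):

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items found in the relevant set."""
    top_k = recommended[:k]
    hits = sum(item in relevant for item in top_k)
    return hits / k

recs = ["a", "b", "c", "d", "e"]
purchased = {"a", "c", "f"}
print(precision_at_k(recs, purchased, k=4))  # 2 hits out of 4 -> 0.5
```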
29
Q

Normalized Discounted Cumulative Gain (NDCG)

A
  • metric for evaluating the quality of a ranking system, such as a search engine or recommendation algorithm
  • measures the relevance of results, giving higher importance to relevant items near the top of the list
  • NDCG = DCG / IDCG (Discounted Cumulative Gain divided by the Ideal Discounted Cumulative Gain)
  • a score of 1 indicates a perfect ranking; 0 indicates a completely irrelevant one
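The DCG/IDCG arithmetic can be sketched directly in numpy, using the common log2 position discount on graded relevance scores:

```python
import numpy as np

def dcg(relevances):
    """Discounted Cumulative Gain: relevance discounted by log2 of position."""
    relevances = np.asarray(relevances, float)
    positions = np.arange(1, len(relevances) + 1)
    return float(np.sum(relevances / np.log2(positions + 1)))

def ndcg(relevances):
    """NDCG = DCG of the given ranking / DCG of the ideal (sorted) ranking."""
    ideal = sorted(relevances, reverse=True)
    return dcg(relevances) / dcg(ideal)

print(ndcg([3, 2, 3, 0, 1]))  # good but imperfect ordering -> below 1.0
print(ndcg([3, 3, 2, 1, 0]))  # already ideal ordering -> 1.0
```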
30
Q

What's a problem with ROC-AUC?

A
  • can look great even if the classifier is completely useless on highly imbalanced datasets
  • use PR-AUC instead when the positive class is rare
31
Q

PR-AUC

A
  • area under the precision vs. recall curve
  • used to evaluate the performance of binary classifiers
  • useful for highly imbalanced datasets where the positive class is rare
  • summarizes the model's ability to correctly identify positive instances
32
Q

Recall

A
  • aka True Positive Rate (TPR)
  • TP / (TP + FN)
  • measures how well the model identifies actual positives
  • a high TPR means the model catches most of the positive cases
33
Q

Specificity

A
  • True Negative Rate: TNR = TN / (TN + FP)
  • measures the percentage of actual negatives identified correctly
  • useful for evaluating models where false positives carry serious implications
34
Q

False negative rate

A
  • FNR = FN / (FN + TP)
  • the percentage of actual positives that are missed
  • reducing FNR is critical in domains like healthcare and fraud detection
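The rate definitions from the last few cards all fall out of the four confusion-matrix counts; a small sketch computing them together:

```python
def rates(tp, fp, tn, fn):
    """Threshold-dependent rates from confusion-matrix counts."""
    return {
        "recall_tpr": tp / (tp + fn),       # positives caught
        "specificity_tnr": tn / (tn + fp),  # negatives identified
        "fnr": fn / (fn + tp),              # positives missed (1 - recall)
        "precision": tp / (tp + fp),        # predicted positives that are real
    }

m = rates(tp=80, fp=10, tn=90, fn=20)
print(m)  # recall 0.8, specificity 0.9, fnr 0.2, precision ~0.889
```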
35
Q

What does it mean to vary the threshold in ROC-AUC?

A
  • changes the probability score above which a prediction is considered positive
  • e.g. anything scoring above 0.5, 0.8, or 0.99 is classified as spam