Evaluation Metrics Flashcards

(35 cards)

1
Q

MAP

A
  • Mean Average Precision
  • Useful for evaluating ranking algorithms
2
Q

ROC-AUC

A
  • Area under the ROC curve
  • summarizes a classification model’s ability to distinguish between classes
  • an AUC of 0.5 indicates a random guess
  • a perfect model would have an AUC of 1.0
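The AUC also has a useful probabilistic interpretation: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal plain-Python sketch computing AUC from that pairwise definition (ties count as half a correct pair):

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 0/1 ground truth; scores: model scores (higher = more positive).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfect ranking gives 1.0; random scores hover around 0.5.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In practice you would use `sklearn.metrics.roc_auc_score`, which computes the same quantity from the ROC curve.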
3
Q

Correlation matrix

A

Shows the correlation coefficient between variables
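For example, numpy's `np.corrcoef` (or pandas' `DataFrame.corr()`) produces the matrix of pairwise Pearson coefficients, with values in [-1, 1] and 1s on the diagonal:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1   # perfectly correlated with x
z = -x          # perfectly anti-correlated with x

# 3x3 matrix: entry [i, j] is the Pearson correlation of variable i and j
corr = np.corrcoef([x, y, z])
print(np.round(corr, 2))  # corr[0, 1] is 1.0, corr[0, 2] is -1.0
```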

4
Q

Residual plots

A
  • A residual is the difference between the target and the predicted value
  • A residual plot is a histogram of residuals (or a scatter of residuals against predicted values)
5
Q

Positive residual

A

The model is under-estimating the target: the actual value is higher than the prediction (residual = target − predicted > 0)

6
Q

Negative residual

A

The model is over-estimating the target: the actual value is lower than the prediction (residual = target − predicted < 0)

7
Q

What does a residual plot in a bell shape centered on zero indicate?

A
  • The model makes mistakes in a random manner and does not systematically over-predict or under-predict
8
Q

RMSE

A
  • Root Mean Squared Error
  • used to measure the accuracy of regression models (not classification)
  • the square root of the mean of the squared residuals; penalizes large errors heavily
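A minimal numpy sketch of the computation:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error: sqrt of the mean of squared residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# residuals are 1, 0, -2 -> mean squared error 5/3 -> RMSE ~ 1.29
print(rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
```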
9
Q

Can a confusion matrix be used to evaluate a multi-class classification model?

A
  • Yes. The metric is calculated for each class by treating it as a binary classification problem: that class versus all the other classes grouped into a second class.
  • Then the binary metric is averaged over all the classes to get either a macro average (treat each class equally) or weighted average (weighted by class frequency) metric.
  • In Amazon ML, the macro average F1-measure is used to evaluate the predictive success of a multiclass classifier.
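The macro-averaging procedure described above can be sketched in plain Python; it should match scikit-learn's `f1_score(..., average="macro")`:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class one-vs-rest F1, averaged with equal
    weight per class (the average Amazon ML reports for multiclass)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1s.append(f1)
    return sum(f1s) / len(f1s)

# per-class F1: bird 1.0, cat 2/3, dog 2/3 -> macro average 7/9
print(macro_f1(["cat", "dog", "cat", "bird"], ["cat", "dog", "dog", "bird"]))
```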
10
Q

Box plot

A
  • visualize how data is distributed across an axis
  • min, max, median, first quartile, third quartile
11
Q

Box plot positive skew

A
  • median lies closer to the first quartile
  • whisker at the upper end is longer
  • has positive (right) skew
12
Q

Box plot normal distribution

A
  • median is in the center
  • whiskers are the same length
13
Q

Box plot negative skew

A
  • median lies closer to the third quartile
  • whisker at the lower end is longer
  • has negative (left) skew
14
Q

ROC

A
  • Receiver Operating Characteristic (ROC) curve
  • graph of TP rate against FP rate at various thresholds
  • shows the diagnostic ability of a binary classifier system as its discrimination threshold is varied
15
Q

MAPE

A
  • Mean Absolute Percentage Error
  • used to evaluate regression models
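A numpy sketch of the formula (note that MAPE is undefined when any true value is zero):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.

    Averages |error| / |actual| over all points; undefined if any
    true value is zero.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# errors of 10%, 10%, and 0% average to ~6.67%
print(mape([100, 200, 400], [110, 180, 400]))
```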
16
Q

What do the whiskers of a box plot mean?

A

They indicate the minimum and maximum values excluding outliers

17
Q

k-fold Cross-validation

A

is a technique for evaluating ML models by training several ML models on subsets of the available input data and evaluating them on the complementary subset of the data. Use cross-validation to detect overfitting, i.e. failing to generalize a pattern.
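In practice you would typically reach for scikit-learn's `KFold` or `cross_val_score`; this is a minimal sketch of the fold-splitting logic itself:

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds.

    Each fold serves once as the validation set while the remaining
    indices form the training set.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, val))
        start += size
    return splits

# 6 samples, 3 folds: each sample appears in exactly one validation fold
for train, val in kfold_indices(6, 3):
    print(train, val)
```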

18
Q

Transfer learning

A
  • Initialize the network with pre-trained weights in all layers except for the output layer. Initialize the output layer with random weights.
  • Then, the whole network is fine-tuned with new data. In this mode, training can be achieved even with a smaller dataset. This is because the network is already trained and therefore can be used in cases without sufficient training data.
19
Q

What transformation should you use to address positive skew?

A

Logarithmic transformation

20
Q

What transformation should you use to address negative skew?

A

Third-order polynomial transformation

21
Q

How do you select the optimal k parameter in k-means?

A

Choose the elbow in the distortion curve

22
Q

MICE

A
  • Multiple Imputations by Chained Equations
  • advanced statistical method often used in machine learning to handle missing data
  • involves creating multiple imputed datasets by using chained equations, where each feature with missing data is modeled based on the other features in the dataset
23
Q

When is recall most relevant?

A

when trying to minimize false negatives

24
Q

When is AUC most relevant?

A

when trying to visualize the tradeoff between false positives and false negatives

25
Q

When is precision most relevant?

A
  • Precision = TP / (TP + FP)
  • when trying to minimize false positives
26
Q

F1 Score

A
  • harmonic mean of precision and recall
  • F1 = 2 · (precision · recall) / (precision + recall)
  • balances false positives and false negatives in a single number
27
Q

SMOTE

A
  • Synthetic Minority Over-sampling Technique
  • oversampling method for imbalanced datasets (e.g. cancer, fraud)
  • creates synthetic samples for the minority class by interpolating between existing minority samples, rather than simply duplicating them
  • the added diversity helps reduce overfitting and can improve the model's accuracy on the minority class
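In practice SMOTE usually comes from `imblearn.over_sampling.SMOTE`; the following is only a minimal numpy sketch of the interpolation idea (the `k=2` neighbor count and uniform interpolation fraction are simplifications):

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_sample(minority, n_new, k=2):
    """Sketch of SMOTE: for each synthetic point, pick a minority sample,
    pick one of its k nearest minority neighbors, and interpolate a
    random fraction of the way between them."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        # distances from sample i to every other minority sample
        d = np.linalg.norm(minority - minority[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbors)
        lam = rng.random()                  # interpolation fraction in [0, 1)
        synthetic.append(minority[i] + lam * (minority[j] - minority[i]))
    return np.array(synthetic)

minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(smote_sample(minority, n_new=4).shape)  # (4, 2)
```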
28
Q

precision@k

A
  • for a recommendation system, the proportion of the top-k recommended items that are actually purchased
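A minimal sketch in plain Python, assuming `recommended` is an ordered list and `relevant` is the set of items the user actually purchased (the names and data are illustrative):

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items found in the relevant set."""
    top_k = recommended[:k]
    hits = sum(item in relevant for item in top_k)
    return hits / k

recs = ["a", "b", "c", "d", "e"]
purchased = {"a", "c", "f"}
print(precision_at_k(recs, purchased, k=4))  # 2 hits out of 4 -> 0.5
```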
29
Q

Normalized Discounted Cumulative Gain (NDCG)

A
  • metric for evaluating the quality of a ranking system, such as a search engine or recommendation algorithm
  • measures the relevance of results, giving higher importance to relevant items near the top of the list
  • NDCG = DCG / IDCG (Discounted Cumulative Gain divided by the Ideal Discounted Cumulative Gain)
  • a score of 1 indicates a perfect ranking; 0 indicates a completely irrelevant one
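The DCG/IDCG arithmetic can be sketched directly in numpy, using the common log2 position discount on graded relevance scores:

```python
import numpy as np

def dcg(relevances):
    """Discounted Cumulative Gain: relevance discounted by log2 of position."""
    relevances = np.asarray(relevances, float)
    positions = np.arange(1, len(relevances) + 1)
    return float(np.sum(relevances / np.log2(positions + 1)))

def ndcg(relevances):
    """NDCG = DCG of the given ranking / DCG of the ideal (sorted) ranking."""
    ideal = sorted(relevances, reverse=True)
    return dcg(relevances) / dcg(ideal)

print(ndcg([3, 2, 3, 0, 1]))  # good but imperfect ordering -> below 1.0
print(ndcg([3, 3, 2, 1, 0]))  # already ideal ordering -> 1.0
```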
30
Q

What's a problem with ROC-AUC?

A
  • can look great even if the classifier is completely useless on highly imbalanced datasets
  • use PR-AUC instead when the positive class is rare
31
Q

PR-AUC

A
  • area under the precision vs. recall curve
  • used to evaluate the performance of binary classifiers
  • useful for highly imbalanced datasets where the positive class is rare
  • summarizes the model's ability to correctly identify positive instances
32
Q

Recall

A
  • aka True Positive Rate (TPR)
  • TP / (TP + FN)
  • measures how well the model identifies actual positives
  • a high TPR means the model catches most of the positive cases
33
Q

Specificity

A
  • True Negative Rate: TNR = TN / (TN + FP)
  • measures the percentage of actual negatives identified correctly
  • useful for evaluating models where false positives carry serious implications
34
Q

False negative rate

A
  • FNR = FN / (FN + TP)
  • the percentage of actual positives that are missed
  • reducing FNR is critical in domains like healthcare and fraud detection
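The rate definitions from the last few cards all fall out of the four confusion-matrix counts; a small sketch computing them together:

```python
def rates(tp, fp, tn, fn):
    """Threshold-dependent rates from confusion-matrix counts."""
    return {
        "recall_tpr": tp / (tp + fn),       # positives caught
        "specificity_tnr": tn / (tn + fp),  # negatives identified
        "fnr": fn / (fn + tp),              # positives missed (1 - recall)
        "precision": tp / (tp + fp),        # predicted positives that are real
    }

m = rates(tp=80, fp=10, tn=90, fn=20)
print(m)  # recall 0.8, specificity 0.9, fnr 0.2, precision ~0.889
```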
35
Q

What does it mean to vary the threshold in ROC-AUC?

A
  • changes the probability score above which a prediction is considered positive
  • e.g. anything scoring above 0.5, 0.8, or 0.99 is classified as spam