What aspects are important for model evaluation?
What is the focus of metrics for model evaluation?
What is the confusion matrix?
In the case of credit card fraud, both FNs (missed fraudulent transactions) and FPs (legitimate transactions flagged as fraud) would be unsatisfactory.
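The four cells of the confusion matrix can be sketched in pure Python (binary labels assumed; the function name is illustrative):

```python
# Pure-Python sketch of the four confusion-matrix cells for a binary
# classifier (1 = positive, 0 = negative); names are illustrative.
def confusion_matrix(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

print(confusion_matrix([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))  # (2, 2, 1, 1)
```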
What is the formula for accuracy?
(TP + TN) / (TP + TN + FP + FN)
correct predictions / all predictions
What is the formula for error rate?
1 - accuracy
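A minimal sketch of both formulas, using made-up confusion-matrix counts:

```python
# Accuracy and error rate from illustrative confusion-matrix counts.
tp, tn, fp, fn = 50, 40, 5, 5
accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct predictions / all predictions
error_rate = 1 - accuracy
print(accuracy)  # 0.9
```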
Describe the class imbalance problem
# negative examples = 9990, # positive examples = 10
-> If the model predicts all records as negative, the accuracy is 99.9%
--> Accuracy is misleading because the model does not detect a single positive example
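The card's numbers can be verified directly:

```python
# Reproducing the card's numbers: 9990 negatives, 10 positives, and a
# model that simply predicts every record as negative.
tp, fn = 0, 10       # every positive example is missed
tn, fp = 9990, 0     # every negative example is "correct"
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.999 -- looks excellent, yet not one positive is detected
```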
How can you mitigate the class imbalance problem?
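The card lists no answer, so the following is an assumption: one common mitigation is random oversampling of the minority class until both classes are equally frequent (alternatives include undersampling the majority class or class weights). A minimal sketch for binary 0/1 labels:

```python
import random

# Sketch of random oversampling (an assumed mitigation; names are
# illustrative): duplicate minority-class records until both classes
# are equally frequent. Assumes binary 0/1 labels.
def oversample(records, labels, minority=1):
    rng = random.Random(0)  # fixed seed for reproducibility
    minority_recs = [x for x, y in zip(records, labels) if y == minority]
    majority_recs = [x for x, y in zip(records, labels) if y != minority]
    extra = rng.choices(minority_recs, k=len(majority_recs) - len(minority_recs))
    new_records = majority_recs + minority_recs + extra
    new_labels = ([1 - minority] * len(majority_recs)
                  + [minority] * (len(minority_recs) + len(extra)))
    return new_records, new_labels

recs, labs = oversample(list(range(100)), [1] * 10 + [0] * 90)
print(labs.count(0), labs.count(1))  # 90 90
```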
What is the precision performance metric?
p = TP / ( TP + FP)
Question: Which fraction of the examples classified as positive are actually positive?
-> Related to the false-alarm rate (low precision means many false alarms)
What is the recall performance metric?
r = TP / (TP + FN)
Question: Which fraction of all positive examples is classified correctly?
-> Detection rate
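Both formulas in a minimal sketch, with illustrative counts:

```python
# Precision and recall from illustrative counts: of 10 predicted
# positives, 8 are real (precision); 8 of 16 actual positives are
# detected (recall).
def precision_recall(tp, fp, fn):
    p = tp / (tp + fp)  # fraction of predicted positives that are correct
    r = tp / (tp + fn)  # fraction of actual positives that are detected
    return p, r

p, r = precision_recall(tp=8, fp=2, fn=8)
print(p, r)  # 0.8 0.5
```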
In which case are precision and recall problematic?
Each can be made large on its own: predicting every record as positive yields recall = 1 but low precision, so neither metric alone is sufficient.
Consequence:
We need a measure that
1. combines precision and recall and
2. is large if both values are large
Explain the F1-Measure
Formula:
F1 = (2pr) / (p + r)
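The formula is the harmonic mean of precision and recall, so a single small value drags the score down, which is exactly property 2 above:

```python
def f1(p, r):
    # Harmonic mean: large only if BOTH precision and recall are large.
    return 2 * p * r / (p + r)

print(round(f1(0.8, 0.5), 3))    # 0.615
print(round(f1(0.99, 0.01), 4))  # 0.0198 -> one small value drags F1 down
```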
What is the low threshold for the F1-Measure Graph
What is the restrictive threshold for the F1-Measure graph?
What alternative performance metric can be used if you have domain knowledge?
What is a ROC curve?
How is a ROC curve drawn?
How do you interpret a ROC curve?
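The drawing procedure can be sketched as follows: sort the examples by predicted score, lower the threshold one example at a time, and record an (FPR, TPR) point after each step. A pure-Python sketch (assumes distinct scores and binary labels; names are illustrative):

```python
# Sketch of drawing a ROC curve: sweep the classification threshold
# from high to low and collect (false positive rate, true positive rate)
# points. Assumes distinct scores and binary 0/1 labels.
def roc_points(scores, labels):
    pos = sum(labels)
    neg = len(labels) - pos
    pts = [(0.0, 0.0)]
    tp = fp = 0
    for s, y in sorted(zip(scores, labels), reverse=True):
        if y == 1:
            tp += 1
        else:
            fp += 1
        pts.append((fp / neg, tp / pos))
    return pts

pts = roc_points([0.9, 0.8, 0.6, 0.3], [1, 0, 1, 0])
print(pts)  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

A curve hugging the top-left corner (high TPR at low FPR) indicates a better ranking; the diagonal corresponds to random guessing.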
What must be considered to obtain a reliable estimate of the generalization performance (methods for model evaluation)?
What data set splitting approaches do you know?
What does the learning curve describe?
-> If model performance is low, get more training data (prefer spending labeled data on training rather than on testing)
Problem: Labeling additional data is often expensive due to manual effort
Describe the Holdout method.
-> Use stratified sampling instead of simple random sampling, so the split preserves the class proportions
What is stratified sampling?
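A stratified holdout split can be sketched in pure Python by sampling the same fraction from each class (the function name is illustrative; real libraries offer this as a stratify option):

```python
import random

# Sketch of a stratified holdout split: draw the same fraction from
# each class so train and test preserve the class proportions.
def stratified_split(labels, test_frac=0.3, seed=0):
    rng = random.Random(seed)
    test_idx = []
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        test_idx += idx[: int(len(idx) * test_frac)]
    test_set = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in test_set]
    return train_idx, test_idx

labels = [0] * 90 + [1] * 10
train, test = stratified_split(labels)
print(len(test), sum(labels[i] for i in test))  # 30 3
```

The test set contains 30% of each class, so the 90:10 class ratio survives the split.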
Describe the random subsampling method
Problem: The test sets of different iterations may overlap, and there is no control over how often each record is used for training and testing
Explain cross validation
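The k-fold procedure can be sketched as follows: partition the data into k folds, let each fold serve as the test set exactly once, train on the remaining k-1 folds, and average the k scores. `evaluate` below is a stand-in for training and testing an actual model:

```python
# Sketch of k-fold cross validation over n record indices. Every record
# is used for testing exactly once and for training k-1 times.
def cross_validate(n, k, evaluate):
    folds = [list(range(i, n, k)) for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(evaluate(train, test))
    return sum(scores) / k

# Dummy evaluate: just reports the test-fold fraction, for illustration.
avg = cross_validate(n=10, k=5, evaluate=lambda tr, te: len(te) / 10)
print(round(avg, 3))  # 0.2
```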