In Hypothesis Testing, what is the null hypothesis?
The null hypothesis H_0 states the assumption to be tested, typically that there is no effect or no difference.
What are the two types of error in hypothesis testing?
Type I Error “false positive”: Rejecting H_0 when it is in fact true
Type II Error “false negative”: Failing to reject H_0 when it is in fact false
Give the formulas for the misclassification rate (MR) and accuracy (ACC) of a classifier
MR = (# incorrect predictions) / (total predictions)
ACC = (# correct predictions) / (total predictions)
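A minimal Python sketch of both formulas, assuming the true and predicted labels are given as two equal-length lists. The function names are illustrative, not taken from any library.

```python
def misclassification_rate(y_true, y_pred):
    """MR = number of incorrect predictions / total number of predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("expected two non-empty lists of equal length")
    incorrect = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return incorrect / len(y_true)


def accuracy(y_true, y_pred):
    """ACC = number of correct predictions / total number of predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("expected two non-empty lists of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(misclassification_rate(y_true, y_pred))  # 0.4
print(accuracy(y_true, y_pred))  # 0.6
```

Since every prediction is either correct or incorrect, ACC = 1 - MR.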
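What do confusion matrices summarise?
The performance of a classification algorithm, by tabulating its predicted classes against the real (actual) classes. For a binary problem the four cells are the counts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN).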
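A small sketch that counts the four cells of a binary confusion matrix, assuming labels are plain Python values with `positive=1` marking the positive class (a hypothetical parameter). The printed layout, rows for the actual class and columns for the predicted class, is one common convention; some texts transpose it.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return the four cells of a binary confusion matrix as a dict."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted positive: either correct (TP) or a false alarm (FP).
            counts["TP" if actual == positive else "FP"] += 1
        else:
            # Predicted negative: either a miss (FN) or correct (TN).
            counts["FN" if actual == positive else "TN"] += 1
    return counts


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
m = confusion_counts(y_true, y_pred)
# Rows: actual class; columns: predicted class.
print("          pred +  pred -")
print(f"actual +  {m['TP']:6d}  {m['FN']:6d}")
print(f"actual -  {m['FP']:6d}  {m['TN']:6d}")
```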
Give the formulas for accuracy (ACC), true positive rate (TPR), false positive rate (FPR) and true negative rate (TNR)
ACC = (TP + TN) / (TP + FP + TN + FN)
TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
TNR = TN / (FP + TN)
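The four formulas translate directly into code once the confusion-matrix counts are known. This sketch takes the counts as plain integers (the function name is hypothetical) and would raise `ZeroDivisionError` if a class has no examples.

```python
def rates_from_counts(tp, fp, tn, fn):
    """Compute ACC, TPR, FPR and TNR from confusion-matrix counts."""
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "TPR": tp / (tp + fn),  # share of actual positives predicted positive
        "FPR": fp / (fp + tn),  # share of actual negatives predicted positive
        "TNR": tn / (fp + tn),  # share of actual negatives predicted negative
    }


r = rates_from_counts(tp=40, fp=10, tn=45, fn=5)
print(r["ACC"])  # 0.85
print(r["FPR"] + r["TNR"])  # FPR and TNR share a denominator, so they sum to 1
```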
Give the formulas for precision and recall
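A = retrieved and relevant (true positives)
B = retrieved but not relevant (false positives)
C = relevant but not retrieved (false negatives)
Precision = A / (A + B), the fraction of retrieved results that are relevant
Recall = A / (A + C), the fraction of relevant results that are retrieved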
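A sketch using Python sets, reading A, B and C as the three regions of a Venn diagram of retrieved versus relevant items: A is retrieved and relevant, B is retrieved but not relevant, C is relevant but not retrieved. Returning 0.0 when a denominator is zero is an assumed convention, not part of the definition.

```python
def precision_recall(retrieved, relevant):
    """Precision = A / (A + B), Recall = A / (A + C)."""
    retrieved, relevant = set(retrieved), set(relevant)
    a = len(retrieved & relevant)  # retrieved and relevant
    b = len(retrieved - relevant)  # retrieved but not relevant
    c = len(relevant - retrieved)  # relevant but not retrieved
    precision = a / (a + b) if a + b else 0.0
    recall = a / (a + c) if a + c else 0.0
    return precision, recall


retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}
print(precision_recall(retrieved, relevant))  # (0.5, 0.6666666666666666)
```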
What is a Precision-Recall (PR) curve used for?
Give the formulas for the Balanced Accuracy Rate (BAR) and Balanced Error Rate (BER)
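BAR = (TPR + TNR) / 2, the mean of TPR and TNR
BER = (FPR + FNR) / 2, the mean of FPR and FNR, where FNR = FN / (FN + TP); equivalently, BER = 1 - BAR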
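A sketch computing both rates from confusion-matrix counts (function name hypothetical). The example is deliberately imbalanced: with 10 positives and 90 negatives, missing half the positives still gives a plain accuracy of 95/100 = 0.95, while BAR drops to 0.75.

```python
def balanced_rates(tp, fp, tn, fn):
    """Return (BAR, BER) from confusion-matrix counts."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    bar = (tpr + tnr) / 2  # mean of TPR and TNR
    ber = (fpr + fnr) / 2  # mean of FPR and FNR
    return bar, ber


# Imbalanced example: 10 positives, 90 negatives, half the positives missed.
bar, ber = balanced_rates(tp=5, fp=0, tn=90, fn=5)
print(bar, ber)  # 0.75 0.25
```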
Give the formula for the F1 measure
F1 = (2 * Precision * Recall) / (Precision + Recall), the harmonic mean of precision and recall
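A one-function sketch of the F1 measure. Returning 0.0 when precision and recall are both zero is an assumed convention to avoid dividing by zero. The second call shows why the harmonic mean is used: it stays low unless both values are reasonably high.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f1_score(0.5, 1.0))  # 0.6666666666666666
print(f1_score(0.9, 0.1))  # roughly 0.18, far below the arithmetic mean of 0.5
```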
What is a decision threshold and what is its most common value?
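The value theta against which a classifier's output score is compared to choose between a positive and a negative outcome (e.g. predict positive if the score is at least theta, otherwise negative).
Most common value: 0.5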
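A sketch of applying a decision threshold to a list of scores. Treating a score exactly equal to theta as positive (`>=`) is an assumption; some implementations use a strict `>` instead.

```python
def apply_threshold(scores, theta=0.5):
    """Label each score 1 (positive) if score >= theta, else 0 (negative)."""
    return [1 if s >= theta else 0 for s in scores]


scores = [0.2, 0.5, 0.7, 0.49]
print(apply_threshold(scores))             # [0, 1, 1, 0]
print(apply_threshold(scores, theta=0.6))  # [0, 0, 1, 0]
```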
What does a ROC plot visualise?
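How the TPR and FPR change as the decision threshold is varied over many different values: each threshold gives one (FPR, TPR) point, and the plot shows TPR against FPR.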
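A sketch of how the points of a ROC plot can be computed, assuming labels are 0/1 and each example has a numeric score. It starts with a threshold above every score (nothing predicted positive, point (0, 0)) and lowers it to each distinct score in turn, ending at (1, 1). The function name is hypothetical.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per candidate threshold."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    # Start above every score, then lower the threshold to each distinct score.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for theta in thresholds:
        tp = sum(1 for t, s in zip(y_true, scores) if s >= theta and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= theta and t != 1)
        points.append((fp / neg, tp / pos))
    return points


y_true = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
for fpr, tpr in roc_points(y_true, scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```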
What is overfitting?
The model fits the training data too closely, including its noise. It cannot generalise to situations not presented during training, so it performs poorly when applied to unseen data.
What are possible causes of overfitting?
What is peeking, and what can you use to avoid it?
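Peeking is evaluating a model's performance on the same data that was used to train it, which gives an overly optimistic estimate.
Avoid peeking by using a hold-out set: data kept aside from training and used only for evaluation.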
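A minimal sketch of a random hold-out split using only the standard library. The 30% test fraction and the fixed seed are illustrative assumptions, not recommended values.

```python
import random


def hold_out_split(examples, test_fraction=0.3, seed=0):
    """Shuffle, then hold out test_fraction of the examples for evaluation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    return train, test


train, test = hold_out_split(range(10))
print(len(train), len(test))  # 7 3
# The model is fitted on `train` only; `test` is used once, for evaluation.
assert not set(train) & set(test)
```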
What are some drawbacks of using a random split as the hold-out strategy?
Briefly explain the steps of k-Fold Cross Validation
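A sketch of the standard k-fold procedure: shuffle and partition the data into k folds, use each fold once as the test set while training on the other k-1 folds, then average the k scores. The `fit` and `score` callables and the majority-class toy model are hypothetical stand-ins for a real learner and metric.

```python
import random
from collections import Counter
from statistics import mean


def k_fold_splits(n, k, seed=0):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Step 1: partition the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    # Step 2: each fold serves once as the test set; the other k-1 folds train.
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validate(rows, k, fit, score):
    """Step 3: average the k test scores into a single performance estimate."""
    results = []
    for train_idx, test_idx in k_fold_splits(len(rows), k):
        model = fit([rows[j] for j in train_idx])
        results.append(score(model, [rows[j] for j in test_idx]))
    return mean(results)


# Hypothetical toy model: always predict the majority label of the training rows.
def fit_majority(train_rows):
    return Counter(label for _, label in train_rows).most_common(1)[0][0]


def accuracy_of(model, test_rows):
    return sum(1 for _, label in test_rows if label == model) / len(test_rows)


rows = [(x, 0 if x % 3 == 0 else 1) for x in range(30)]  # 10 zeros, 20 ones
print(cross_validate(rows, k=5, fit=fit_majority, score=accuracy_of))
```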
What is the validation set in the Three-Way-Hold-Out Strategy?
The subset of examples used to tune the classifier (e.g. to select hyperparameters), kept separate from both the training set and the test set
What is the main advantage of the Three-Way-Hold-Out Strategy?
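It avoids biasing the evaluation of the model: because tuning uses only the validation set, the test set stays unseen until the final evaluation.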
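A sketch of the three-way hold-out idea, assuming a 60/20/20 train/validation/test split (illustrative fractions). The `(score, label)` pairs are synthetic stand-ins for the outputs of a classifier assumed to have been fitted on the training split, so the training split itself is not used below. The decision threshold is tuned on the validation set, and the test set is touched only once at the end.

```python
import random


def three_way_split(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle, then cut the examples into training, validation and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def accuracy_at(rows, theta):
    """Accuracy when a row is labelled positive if its score >= theta."""
    return sum(1 for s, y in rows if (s >= theta) == (y == 1)) / len(rows)


# Hypothetical (score, label) pairs; positives tend to get higher scores.
rng = random.Random(1)
rows = [(0.4 * y + 0.6 * rng.random(), y) for y in [0, 1] * 50]

train, val, test = three_way_split(rows)
# Tune the decision threshold on the validation set only ...
candidates = [t / 10 for t in range(1, 10)]
best_theta = max(candidates, key=lambda t: accuracy_at(val, t))
# ... then report performance once on the untouched test set.
print(best_theta, accuracy_at(test, best_theta))
```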