a) What does supervised learning mean?
b) What is the difference between a generative algorithm and a discriminative algorithm?
c) What is the focus of supervised learning?
d) What are their main tasks?
a)
model is trained on labelled data (each training example is paired with a target label)
b)
generative algo –> unsupervised (clustering)
discriminative algo –> supervised (classification)
c)
focus on predictions
d)
learn from a dataset a mapping from features X to targets Y (predictions)
What is the difference between binary and multi-class classification?
Binary classification
- only 2 possible classes to predict
Multi-class
- more than 2 possible classes to predict
a) What are regression metrics?
b) What do MSE or MAPE indicate? (what does a low / high MSE mean?)
a)
numerical summaries that evaluate the quality of a model that predicts continuous outcomes
b)
Mean squared error (MSE)
- average of the squared prediction errors over the sample
- lower MSE = better fit
Mean absolute percentage error (MAPE)
- measures relative errors (errors as a percentage of the true values)
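As a sketch, both metrics can be computed directly; the helper names and the sample numbers below are illustrative, not from any specific library:

```python
# Sketch of MSE and MAPE on a small regression sample (illustrative data).
def mse(y_true, y_pred):
    # Mean of squared prediction errors; lower = better fit.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    # Mean absolute percentage error: relative errors, as a percentage.
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 330.0]
print(mse(y_true, y_pred))   # (100 + 100 + 900) / 3 ≈ 366.67
print(mape(y_true, y_pred))  # (10% + 5% + 10%) / 3 ≈ 8.33
```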
a) What is the main objective in a classification setting?
b) What does conditional probability indicate?
c) define the classification rule
Classification goal: build a classifier that assigns a class label j to an unlabelled observation X
a)
minimise the mean classification error (MCE)
b)
conditional probability Prob(Y = j | X = x) indicates how likely it is that an observation (e.g. a pixel) belongs to class j, given its features
c)
Classification rule –> function that maps an input feature vector to a class label, implementing the decision the classifier makes given the model’s outputs.
Define TP, TN, FN, and FP
True Positive:
Model thinks P1, is actually P1
True Negative:
Model thinks not P1, is actually not P1
False Negative:
Model thinks not P1, is actually P1
False Positive:
Model thinks P1, is actually not P1
a) What does a confusion matrix indicate?
b) What are common metrics that are derived from the confusion matrix?
a)
main diagonal (top left to bottom right): frequencies of correct predictions
anti-diagonal (bottom left to top right): frequencies of incorrect predictions
b)
Accuracy:
(TP + TN) / (TP + FP + TN + FN)
Errors:
- false positive error
- false negative error
How do we calculate accuracy for a multi-class classification purpose?
subset accuracy = (1/N) ∑ I(ŷi = yi)
yi = true label for ith instance
ŷi = predicted label for ith instance
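The subset-accuracy formula above can be sketched directly (the labels are illustrative):

```python
# Sketch of subset accuracy for multi-class predictions: the fraction of
# instances whose predicted label exactly matches the true label.
def subset_accuracy(y_true, y_pred):
    return sum(yt == yp for yt, yp in zip(y_true, y_pred)) / len(y_true)

y_true = ["cat", "dog", "bird", "dog"]
y_pred = ["cat", "dog", "dog", "dog"]
print(subset_accuracy(y_true, y_pred))  # 3 of 4 correct -> 0.75
```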
If the model correctly identifies 80% of tumours but falsely labels 10% of healthy cases as tumours, what does the 10% indicate?
False Positives –> the healthy cases the model incorrectly labels as tumours
a) Name metrics that define the performance of a classifier
b) When can the F1-score be a preferred metric to show a classifier’s performance?
a)
Precision = TP / (TP + FP)
- ability of the classifier not to label as positive a unit that is negative
Recall = TP / (TP + FN)
- ability of the classifier to find all the positive units and to avoid false negatives
Specificity = TN / (TN + FP)
b)
Fβ = (1 + β^2) (precision × recall) / (β^2 × precision + recall); F1 is the special case β = 1, the harmonic mean of precision and recall
preferred for class-imbalanced problems
a high F1 score –> sufficiently high values for both precision and recall
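A minimal sketch of these formulas, with illustrative counts (calling `f_beta` with the default β = 1 gives the F1 score):

```python
# Sketch: precision, recall, and F-beta from confusion-matrix counts
# (tp, fp, fn are illustrative numbers).
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_beta(tp, fp, fn, beta=1.0):
    p, r = precision(tp, fp), recall(tp, fn)
    return (1 + beta**2) * p * r / (beta**2 * p + r)

tp, fp, fn = 80, 20, 10
print(precision(tp, fp))   # 80 / 100 = 0.8
print(recall(tp, fn))      # 80 / 90 ≈ 0.889
print(f_beta(tp, fp, fn))  # harmonic mean of 0.8 and 0.889 ≈ 0.842
```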
a) What are 2 strategies to deal with class imbalance in the dataset?
b) What does the SMOTE technique do?
a)
1. oversampling the minority class (e.g. SMOTE)
2. undersampling the majority class
b)
Synthetic minority oversampling technique SMOTE
- synthetically generates data for the minority class
- looks at the Euclidean distance between a minority-class sample and its nearest minority-class neighbours, and adds new synthetic points on the line segments between them
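A simplified SMOTE-style sketch of that interpolation step (`smote_sketch` and its parameters are illustrative, not the imbalanced-learn API; full SMOTE also handles the sampling ratio per minority point):

```python
import numpy as np

# Minimal SMOTE-style sketch: for each synthetic point, pick a minority
# sample, choose one of its k nearest minority-class neighbours (by
# Euclidean distance), and interpolate a new point on the segment between them.
def smote_sketch(X_min, n_new, k=3, rng=None):
    rng = rng or np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Euclidean distances from sample i to all other minority samples
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]      # skip the sample itself
        j = rng.choice(neighbours)
        lam = rng.random()                       # position on the segment
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_sketch(X_min, n_new=5)
print(X_new.shape)  # (5, 2): new points inside the minority region
```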
a) Explain the sensitivity-specificity trade-off
b) What does the ROC curve indicate?
c) How can we use and interpret them? (interpret AUC values)
a)
sensitivity and specificity are functions of the threshold probability (p*)
classify as positive when p(x) = Prob(y = 1 | x) > p*
for different p*, different sensitivity / specificity pairs
b)
ROC curve measures the global accuracy of a classifier
- diagonal line: the model is just guessing
- the further the curve lies above the diagonal, towards the top-left corner of the graph –> the better the model’s predictive ability
c)
AUC –> overall measure of accuracy based on the sensitivity / specificity trade-off
large AUC = better classifier
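One way to sketch AUC is via its rank interpretation: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative (the labels and scores below are illustrative):

```python
# Sketch: AUC as the probability that a random positive outranks a random
# negative; ties count as 1/2.
def auc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(auc(labels, scores))  # 8 of 9 pairs ranked correctly ≈ 0.889
```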
What are the common approaches for calculating AUC in a multi-class classification setting?
a) Define how OVO and OVA are performed and what are the pros and cons
a)
1. one vs one (OVO) approach:
- avg of pairwise AUC scores
- if the prediction contains C classes: avg over C(C-1) / 2 AUC scores
2. one vs all (OVA) approach:
- each class in turn is treated as positive against all the others
- if the prediction contains C classes: avg over C AUC scores
Pro (OVA): less computation (C scores instead of C(C-1) / 2)
Con (OVA): not well suited to images with unusual shapes –> they can contain null regions where a pixel does not belong to any class
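The OVA averaging step can be sketched as follows (`ova_auc`, `auc_binary`, and the data are illustrative helpers, not a library API):

```python
# Sketch of one-vs-all (OVA) AUC averaging: each class in turn is treated
# as positive and the C binary AUC scores are averaged.
def auc_binary(labels, scores):
    # Probability a random positive outranks a random negative (ties = 1/2).
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ova_auc(y_true, prob, classes):
    # prob[i][c]: predicted probability that instance i belongs to class c
    per_class = []
    for c in classes:
        labels = [1 if y == c else 0 for y in y_true]
        scores = [p[c] for p in prob]
        per_class.append(auc_binary(labels, scores))
    return sum(per_class) / len(classes)

y_true = [0, 1, 2, 1]
prob = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.5, 0.2, 0.3]]
print(ova_auc(y_true, prob, classes=[0, 1, 2]))  # (1.0 + 0.75 + 1.0) / 3 ≈ 0.917
```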
a) What does cross entropy mean?
b) How do you interpret it?
a)
measure to evaluate the probability outputs of a classifier
b)
low –> model assigns a high probability to the correct class
high –> model assigns a low probability to the correct class
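A minimal sketch of this interpretation for a single instance (the probability vectors are illustrative):

```python
import math

# Sketch of cross-entropy for one instance: negative log of the probability
# the model assigns to the correct class.
def cross_entropy(probs, true_class):
    return -math.log(probs[true_class])

confident = [0.05, 0.9, 0.05]   # high probability on the correct class (1)
unsure = [0.4, 0.3, 0.3]        # low probability on the correct class (1)
print(cross_entropy(confident, 1))  # -log(0.9) ≈ 0.105 (low = good)
print(cross_entropy(unsure, 1))     # -log(0.3) ≈ 1.204 (high = bad)
```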
a) What are common pitfalls in model evaluation in terms of non-representative test sets?
b) What does distribution shift mean?
c) What does hidden subclasses mean?
d) What does tuning to the test set mean?
a)
1. non-representative test set
- distribution shifts
- hidden subclasses
b)
data the model sees during testing differs from the data it was trained on
- joint dist of inputs and labels changes between training and testing
- model assumes training and test data are drawn from the same dist
c)
finer-grained groups inside a labelled class that the model treats as one class but that behave differently
- a labelled class may contain multiple unlabelled subclasses
- since they’re not annotated separately, overall accuracy can mask poor performance on rare but important subgroups
d)
over-optimism about how the model will generalize to unseen data
- holdout test should only be used once (validation sets should be used instead)
a) When do we do cross-validation?
b) How does k-fold cross-validation work?
c) When do we use LOOCV?
a)
used for test-error estimation and hyperparameter tuning
b)
1. divide the dataset into K folds
2. leave out one fold (used as the test set)
3. fit the model to the remaining folds
4. repeat steps 2 and 3 so that each fold is used once as the test set
c)
Leave-one-out cross-validation (LOOCV)
- one sample is left out for validation
- (repeat for each sample such that each sample acts as validation)
- used for small datasets or in special cases (# folds = # of instances in dataset)
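The fold-splitting step can be sketched without any ML library (`k_fold_splits` is an illustrative helper; for simplicity it assumes n is divisible by k):

```python
# Sketch of k-fold cross-validation indices: split n samples into k folds;
# each fold serves exactly once as the test set.
def k_fold_splits(n, k):
    indices = list(range(n))
    fold_size = n // k
    for i in range(k):
        test = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, test

for train, test in k_fold_splits(n=6, k=3):
    print(test, train)
# each of the 6 samples appears in exactly one test fold
```

LOOCV is then just the special case k = n, where every test fold holds a single sample.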