a) What does supervised learning mean?
b) What is the difference between a generative algorithm and a discriminative algorithm?
c) What is the focus of supervised learning?
d) What are their main tasks?
a)
model is trained on labelled data (each training example is paired with a target label)
b)
generative algo –> unsupervised (clustering)
discriminative algo –> supervised (classification)
c)
focus on predictions
d)
learn from a dataset a mapping from features X to targets Y (predictions)
What is the difference between binary and multi-class classification?
Binary classification
- only 2 possible classes to predict
Multi-class
- more than 2 possible classes to predict
a) What are regression metrics?
b) What do MSE or MAPE indicate? (what does a low / high MSE mean?)
a)
numerical summaries that evaluate the quality of a model that predicts continuous outcomes
b)
Mean squared error (MSE)
- average of the squared prediction errors over the sample
- lower MSE = better fit
Mean absolute percentage error (MAPE)
- measures relative errors (errors as a percentage of the true values)
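As a sketch, both metrics can be computed directly; the helper names and the sample numbers below are illustrative, not from any specific library:

```python
# Sketch of MSE and MAPE on a small regression sample (illustrative data).
def mse(y_true, y_pred):
    # Mean of squared prediction errors; lower = better fit.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    # Mean absolute percentage error: relative errors, as a percentage.
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 330.0]
print(mse(y_true, y_pred))   # (100 + 100 + 900) / 3 ≈ 366.67
print(mape(y_true, y_pred))  # (10% + 5% + 10%) / 3 ≈ 8.33
```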
a) What is the main objective in a classification setting?
b) What does conditional probability indicate?
c) define the classification rule
Classification goal: build a classifier that assigns a class label j to an unlabelled observation X
a)
minimise the mean classification error (MCE)
b)
conditional probability Prob(Y = j | X = x) indicates how likely it is that an observation (e.g. a pixel) belongs to class j, given its features
c)
Classification rule –> function that maps an input feature vector to a class label, implementing the decision the classifier makes given the model’s outputs.
Define TP, TN, FN, and FP
True Positive:
Model thinks P1, is actually P1
True Negative:
Model thinks not P1, is actually not P1
False Negative:
Model thinks not P1, is actually P1
False Positive:
Model thinks P1, is actually not P1
a) What does a confusion matrix indicate?
b) What are common metrics that are derived from the confusion matrix?
a)
main diagonal (top left to bottom right): frequencies of correct predictions
anti-diagonal (bottom left to top right): frequencies of incorrect predictions
b)
Accuracy:
(TP + TN) / (TP + FP + TN + FN)
Errors:
- false positive error
- false negative error
How do we calculate accuracy for a multi-class classification purpose?
subset accuracy = (1/N) ∑ I(ŷi = yi)
yi = true label for ith instance
ŷi = predicted label for ith instance
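The subset-accuracy formula above can be sketched directly (the labels are illustrative):

```python
# Sketch of subset accuracy for multi-class predictions: the fraction of
# instances whose predicted label exactly matches the true label.
def subset_accuracy(y_true, y_pred):
    return sum(yt == yp for yt, yp in zip(y_true, y_pred)) / len(y_true)

y_true = ["cat", "dog", "bird", "dog"]
y_pred = ["cat", "dog", "dog", "dog"]
print(subset_accuracy(y_true, y_pred))  # 3 of 4 correct -> 0.75
```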
If the model correctly identifies 80% of tumours but falsely labels 10% of healthy cases as tumours, what does the 10% indicate?
False Positives –> the healthy cases the model incorrectly labels as tumours
a) Name metrics that define the performance of a classifier
b) When can the F1-score be a preferred metric to show a classifier’s performance?
a)
Precision = TP / (TP + FP)
- ability of the classifier not to label as positive a unit that is negative
Recall = TP / (TP + FN)
- ability of the classifier to find all the positive units and to avoid false negatives
Specificity = TN / (TN + FP)
b)
Fβ = (1 + β^2) (precision × recall) / (β^2 × precision + recall); F1 is the special case β = 1, the harmonic mean of precision and recall
preferred for class-imbalanced problems
a high F1 score –> sufficiently high values for both precision and recall
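A minimal sketch of these formulas, with illustrative counts (calling `f_beta` with the default β = 1 gives the F1 score):

```python
# Sketch: precision, recall, and F-beta from confusion-matrix counts
# (tp, fp, fn are illustrative numbers).
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_beta(tp, fp, fn, beta=1.0):
    p, r = precision(tp, fp), recall(tp, fn)
    return (1 + beta**2) * p * r / (beta**2 * p + r)

tp, fp, fn = 80, 20, 10
print(precision(tp, fp))   # 80 / 100 = 0.8
print(recall(tp, fn))      # 80 / 90 ≈ 0.889
print(f_beta(tp, fp, fn))  # harmonic mean of 0.8 and 0.889 ≈ 0.842
```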
a) What are 2 strategies to deal with class imbalance in the dataset?
b) What does the SMOTE technique do?
a)
1. oversampling the minority class (e.g. SMOTE)
2. undersampling the majority class
b)
Synthetic minority oversampling technique SMOTE
- synthetically generates data for the minority class
- looks at the Euclidean distance between a minority-class sample and its nearest minority-class neighbours, and adds new synthetic points on the line segments between them
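A simplified SMOTE-style sketch of that interpolation step (`smote_sketch` and its parameters are illustrative, not the imbalanced-learn API; full SMOTE also handles the sampling ratio per minority point):

```python
import numpy as np

# Minimal SMOTE-style sketch: for each synthetic point, pick a minority
# sample, choose one of its k nearest minority-class neighbours (by
# Euclidean distance), and interpolate a new point on the segment between them.
def smote_sketch(X_min, n_new, k=3, rng=None):
    rng = rng or np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Euclidean distances from sample i to all other minority samples
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]      # skip the sample itself
        j = rng.choice(neighbours)
        lam = rng.random()                       # position on the segment
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_sketch(X_min, n_new=5)
print(X_new.shape)  # (5, 2): new points inside the minority region
```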
a) Explain the sensitivity-specificity trade-off
b) What does the ROC curve indicate?
c) How can we use and interpret them? (interpret AUC values)
a)
sensitivity and specificity are functions of the threshold probability (p*)
classify as positive when p(x) = Prob(y = 1 | x) > p*
for different p*, different sensitivity / specificity pairs
b)
ROC curve measures the global accuracy of a classifier
- diagonal line: the model is just guessing
- the further the curve lies above the diagonal, towards the top-left corner of the graph –> the better the model’s predictive ability
c)
AUC –> overall measure of accuracy based on the sensitivity / specificity trade-off
large AUC = better classifier
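One way to sketch AUC is via its rank interpretation: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative (the labels and scores below are illustrative):

```python
# Sketch: AUC as the probability that a random positive outranks a random
# negative; ties count as 1/2.
def auc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(auc(labels, scores))  # 8 of 9 pairs ranked correctly ≈ 0.889
```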
What are the common approaches for calculating AUC in a multi-class classification setting?
a) Define how OVO and OVA are performed and what are the pros and cons
a)
1. one vs one (OVO) approach:
- avg of pairwise AUC scores
- if the prediction contains C classes: avg over C(C-1) / 2 AUC scores
2. one vs all (OVA) approach:
- each class in turn is treated as positive against all the others
- if the prediction contains C classes: avg over C AUC scores
Pro (OVA): less computation (C scores instead of C(C-1) / 2)
Con (OVA): not well suited to images with unusual shapes –> they can contain null regions where a pixel does not belong to any class
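The OVA averaging step can be sketched as follows (`ova_auc`, `auc_binary`, and the data are illustrative helpers, not a library API):

```python
# Sketch of one-vs-all (OVA) AUC averaging: each class in turn is treated
# as positive and the C binary AUC scores are averaged.
def auc_binary(labels, scores):
    # Probability a random positive outranks a random negative (ties = 1/2).
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ova_auc(y_true, prob, classes):
    # prob[i][c]: predicted probability that instance i belongs to class c
    per_class = []
    for c in classes:
        labels = [1 if y == c else 0 for y in y_true]
        scores = [p[c] for p in prob]
        per_class.append(auc_binary(labels, scores))
    return sum(per_class) / len(classes)

y_true = [0, 1, 2, 1]
prob = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.5, 0.2, 0.3]]
print(ova_auc(y_true, prob, classes=[0, 1, 2]))  # (1.0 + 0.75 + 1.0) / 3 ≈ 0.917
```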
a) What does cross entropy mean?
b) How do you interpret it?
a)
measure to evaluate the probability outputs of a classifier
b)
low –> model assigns a high probability to the correct class
high –> model assigns a low probability to the correct class
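A minimal sketch of this interpretation for a single instance (the probability vectors are illustrative):

```python
import math

# Sketch of cross-entropy for one instance: negative log of the probability
# the model assigns to the correct class.
def cross_entropy(probs, true_class):
    return -math.log(probs[true_class])

confident = [0.05, 0.9, 0.05]   # high probability on the correct class (1)
unsure = [0.4, 0.3, 0.3]        # low probability on the correct class (1)
print(cross_entropy(confident, 1))  # -log(0.9) ≈ 0.105 (low = good)
print(cross_entropy(unsure, 1))     # -log(0.3) ≈ 1.204 (high = bad)
```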
a) What are common pitfalls in model evaluation in terms of non-representative test sets?
b) What does distribution shift mean?
c) What does hidden subclasses mean?
d) What does tuning to the test set mean?
a)
1. non-representative test set
- distribution shifts
- hidden subclasses
b)
data the model sees during testing differs from the data it was trained on
- joint dist of inputs and labels changes between training and testing
- model assumes training and test data are drawn from the same dist
c)
finer-grained groups inside a labelled class that the model treats as one class but that behave differently
- a labelled class may contain multiple unlabelled subclasses
- since they’re not annotated separately, overall accuracy can mask poor performance on rare but important subgroups
d)
over-optimism about how the model will generalize to unseen data
- holdout test should only be used once (validation sets should be used instead)
a) When do we do cross-validation?
b) How does k-fold cross-validation work?
c) When do we use LOOCV?
a)
used for test-error estimation and hyperparameter tuning
b)
1. divide the dataset into K folds
2. leave out one fold (used as the test set)
3. fit the model to the remaining folds
4. repeat steps 2 and 3 so that each fold is used once as the test set
c)
Leave-one-out cross-validation (LOOCV)
- one sample is left out for validation
- (repeat for each sample such that each sample acts as validation)
- used for small datasets or in special cases (# folds = # of instances in dataset)
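The fold-splitting step can be sketched without any ML library (`k_fold_splits` is an illustrative helper; for simplicity it assumes n is divisible by k):

```python
# Sketch of k-fold cross-validation indices: split n samples into k folds;
# each fold serves exactly once as the test set.
def k_fold_splits(n, k):
    indices = list(range(n))
    fold_size = n // k
    for i in range(k):
        test = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, test

for train, test in k_fold_splits(n=6, k=3):
    print(test, train)
# each of the 6 samples appears in exactly one test fold
```

LOOCV is then just the special case k = n, where every test fold holds a single sample.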