a) Why is SVM a discriminative algorithm?
b) What does a hyperplane mean?
a)
SVM is a discriminative algorithm because it directly learns the decision boundary (a separating hyperplane) between classes, rather than modelling how each class's data is generated
b)
hyperplane is a linear decision boundary that separates two classes in feature space
What is the SVM goal?
SVM chooses the best plane
Goal:
- among all separating hyperplanes between the classes, pick the one that maximizes its distance to the closest data points of each class
(the line must have maximal distance from the points in either class)
a) Explain the general equation of a hyperplane and margin
b) What does SVM try to maximize?
c) What do support vectors indicate?
d) What does a max-margin hyperplane mean?
a)
w·x + β = 0
- w is a weight vector (points perpendicular to the hyperplane)
- x is a point in feature space
- β is the bias (intercept) (shifts the hyperplane away from the origin along the normal direction)
special case (2 features)
{x1, x2}: β0 + β1x1 + β2x2 = 0
- straight line
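A minimal NumPy sketch of the hyperplane equation (the w and b values are made-up toy numbers): the sign of w·x + b tells you which side of the boundary a point falls on.

```python
import numpy as np

# Hypothetical 2D hyperplane: w = (1, -1), b = 0, i.e. the line x1 - x2 = 0.
w = np.array([1.0, -1.0])
b = 0.0

def side(x):
    """Signed value of w.x + b: positive on one side, negative on the other, 0 on the plane."""
    return np.dot(w, x) + b

assert side(np.array([2.0, 1.0])) > 0   # one side of the line
assert side(np.array([1.0, 2.0])) < 0   # the other side
assert side(np.array([3.0, 3.0])) == 0  # exactly on the hyperplane
```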
b)
SVM maximizes the margin to improve generalization on unseen data
- margin: distance between the hyperplane and the nearest data point from each class
c)
training instances that lie closest to the hyperplane
- ex. pixels at the boundary between ‘tumour’ and ‘normal’ tissue
d)
hyperplane w/ max margin or dist to the closest pts
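A small sketch of support vectors, assuming scikit-learn is available (the toy data is made up): after fitting, `support_vectors_` holds exactly the training points closest to the hyperplane.

```python
import numpy as np
from sklearn.svm import SVC  # assumes scikit-learn is installed

# Toy separable data along the line x2 = x1.
X = np.array([[0, 0], [1, 1], [3, 3], [4, 4]], dtype=float)
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # very large C ~ hard margin

# Only the closest points, (1, 1) and (3, 3), define the boundary.
print(clf.support_vectors_)
```

The outer points (0, 0) and (4, 4) do not affect the boundary; removing them would leave the same max-margin hyperplane.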
Explain the following:
“Maximizing the margin of a hyperplane is equivalent to minimizing the ||W||”
||w|| = length (magnitude) of the weight vector
- with the usual scaling where the closest points satisfy yi(w·xi + b) = 1, the margin width is 2/||w||
- so making the perpendicular distance from the closest training points to the plane as large as possible is the same as making ||w|| as small as possible
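A quick numeric check of this equivalence (toy numbers): for a canonical hyperplane, where the closest points satisfy yi(w·xi + b) = 1, the margin width is 2/||w||, so a smaller ||w|| means a larger margin.

```python
import numpy as np

w = np.array([3.0, 4.0])              # hypothetical weight vector, ||w|| = 5
margin = 2.0 / np.linalg.norm(w)      # margin width = 2 / ||w|| = 0.4

# Halving ||w|| doubles the margin: maximizing margin <=> minimizing ||w||.
assert abs(margin - 0.4) < 1e-12
assert abs(2.0 / np.linalg.norm(0.5 * w) - 2 * margin) < 1e-12
```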
a) Why is SVM a dual optimization problem?
b) What are two optimization goals?
a)
SVM seeks to address 2 things:
1. ensure classification is correct
2. maximize the margin
b)
1. Minimize classification error
- minimizing the number of misclassified points –> finding the optimal location of the decision boundary
2. Maximize the margin
- maximizing the distance to the closest points of each class –> improving generalization
a) How do the SVM algorithm and its equation change to deal with non-separable cases?
b) What does the slack variable indicate?
a)
Separable cases:
yi(w·xi + b) ≥ 1
–> yi = label
–> xi = feature vector
–> w = weight (normal vector)
–> b = bias
correct classification:
yi(w·xi + b) ≥ 1
- (correct and outside the margin)
0 < yi(w·xi + b) < 1
- (correct but within the margin)
yi(w·xi + b) ≤ 0
- (incorrect and / or on the decision boundary)
non-separable :
yi(w·xi + b) ≥ 1 - εi, with εi ≥ 0 (one slack variable per point)
b)
slack variable (εi)
- allows some points to violate the margin (fall inside it) or even be misclassified; εi measures the size of the violation
- the algorithm then selects the hyperplane that minimizes the empirical error
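A sketch of the three classification cases and the slack value for a fixed hypothetical hyperplane (the standard soft-margin formulation sets εi = max(0, 1 − yi(w·xi + b)), the hinge loss):

```python
import numpy as np

w, b = np.array([1.0, 0.0]), 0.0             # hypothetical boundary: x1 = 0

points = np.array([[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]])
labels = np.array([1, 1, 1])                 # all labelled +1

scores = labels * (points @ w + b)           # yi (w.xi + b)
slack = np.maximum(0.0, 1.0 - scores)        # eps_i = max(0, 1 - yi(w.xi + b))

assert slack[0] == 0.0   # correct and outside the margin (eps = 0)
assert slack[1] == 0.5   # correct but inside the margin (0 < eps < 1)
assert slack[2] == 2.0   # misclassified (eps > 1)
```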
a) What is the role of C in a generalized form of the SVM algo (to deal w/ non-separable cases)?
b) What are the trade-offs in choosing large/ small values of C?
c) How do we choose the best C?
a)
cost parameter that weights the penalty for:
- misclassification
- margin-violating training points
b)
large C
- strong penalty on errors
- makes ε small
Pro: low training error
Con: risk of overfitting and poor generalization
small C
- weak penalty on error
- tolerates larger ε
Pro: robustness to noise and low variance
Con: underfit if C is too small and model ignores useful structure
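One way to see this trade-off, assuming scikit-learn is available (the dataset and C values are arbitrary): a small C widens the margin and tolerates violations, so it typically keeps more support vectors than a large C on noisy data.

```python
from sklearn.datasets import make_classification  # assumes scikit-learn
from sklearn.svm import SVC

# Noisy toy data (flip_y adds label noise).
X, y = make_classification(n_samples=200, n_features=5, flip_y=0.1, random_state=0)

soft = SVC(kernel="linear", C=0.01).fit(X, y)    # weak penalty: wide margin
hard = SVC(kernel="linear", C=100.0).fit(X, y)   # strong penalty: narrow margin

# The softer model should keep at least as many support vectors.
assert len(soft.support_) >= len(hard.support_)
```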
c)
N-fold cross-validation to find C (train with each candidate value of C, pick the one with the best average validation performance)
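A sketch of choosing C by N-fold cross-validation, assuming scikit-learn is available (the candidate grid and dataset are arbitrary):

```python
from sklearn.datasets import make_classification  # assumes scikit-learn
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

grid = GridSearchCV(
    SVC(kernel="linear"),
    param_grid={"C": [0.01, 0.1, 1, 10, 100]},  # candidate values of C
    cv=5,                                       # 5-fold cross-validation
)
grid.fit(X, y)
print(grid.best_params_["C"], grid.best_score_)
```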
In a soft margin SVM, what does the hyperparameter C control?
trade-off between maximizing the margin and minimizing the classification error