5.3 SVM Flashcards

(8 cards)

1
Q

a) Why is SVM a discriminative algorithm?

b) What does a hyperplane mean?

A

a)
SVM is a discriminative algorithm because it directly seeks a decision boundary (separating plane) between classes, rather than modelling how each class generates its data

b)
a hyperplane is a linear decision boundary that separates two classes in feature space

2
Q

What is the SVM goal?

A

SVM chooses the best separating plane.

Goal:
- find the separating hyperplane that maximizes its distance from the closest data points of each class
(the line must have maximal dist from pts in either class)

3
Q

a) Explain the general equation of a hyperplane and margin

b) What does SVM try to maximize?

c) What do support vectors indicate?

d) What does a max-margin hyperplane mean?

A

a)
w*x + b = 0
- w is the weight vector (points perpendicular to the hyperplane)
- x is a point in feature space
- b is the bias (intercept): shifts the hyperplane away from the origin along the normal direction

special case (2 features)
{x1,x2}: β0 + β1x1 + β2x2 = 0
- a straight line

b)
SVM maximizes the margin to improve classification accuracy
- margin: dist between hyperplane and nearest data pt from each class

c)
training instances that lie closest to the hyperplane
- ex. pixels at the boundary between ‘tumour’ and ‘normal’ tissue

d)
the hyperplane w/ the maximum margin, i.e. maximum distance to the closest pts
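The hyperplane equation above can be sketched numerically; the weights, bias, and test points below are hypothetical values chosen for illustration:

```python
import numpy as np

# Hypothetical hyperplane w·x + b = 0 in a 2-D feature space
w = np.array([3.0, 4.0])   # weight vector, perpendicular to the hyperplane
b = -5.0                   # bias: shifts the plane away from the origin

def side(x):
    """Sign of w·x + b tells which side of the hyperplane x falls on."""
    return np.sign(w @ x + b)

print(side(np.array([2.0, 2.0])))   # 3*2 + 4*2 - 5 = 9 > 0  → 1.0
print(side(np.array([0.0, 0.0])))   # 0 + 0 - 5 = -5 < 0     → -1.0
```

Points with positive sign are predicted as one class, negative as the other; points with value exactly 0 lie on the hyperplane itself.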

4
Q

Explain the following:

“Maximizing the margin of a hyperplane is equivalent to minimizing ||w||”

A

||w|| = length (magnitude, i.e. norm) of the weight vector

make the perpendicular distance from the closest training points to the plane as large as possible

  • that distance is inversely proportional to the norm of the weight vector w (for support vectors it equals 1/||w||)
  • so maximizing the margin is equivalent to minimizing ||w||
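The 1/||w|| relationship can be checked numerically; the plane below is hypothetical, chosen so that ||w|| = 5:

```python
import numpy as np

w = np.array([3.0, 4.0])   # hypothetical weights, ||w|| = 5
b = -5.0

def dist_to_plane(x):
    """Perpendicular distance from point x to the hyperplane w·x + b = 0."""
    return abs(w @ x + b) / np.linalg.norm(w)

# A support vector sits on the margin boundary w·x + b = 1,
# so its distance to the plane is exactly 1/||w||:
x_sv = np.array([2.0, 0.0])        # 3*2 + 4*0 - 5 = 1
print(dist_to_plane(x_sv))         # 1/5 = 0.2
```

Doubling w (and b) leaves the hyperplane unchanged but halves 1/||w||, which is why the margin is controlled through the norm of w.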
5
Q

a) Why is SVM a dual optimization problem?

b) What are two optimization goals?

A

a)
SVM seeks to address 2 things at once:
1. ensure classification is correct
2. maximize the margin

b)
1. Min
- minimize the number of misclassified points –> finds the optimal location of the decision boundary
2. Max
- maximize the margin –> finds the optimal orientation of the decision boundary
6
Q

a) How do the SVM algorithm and its equation change to deal with non-separable cases?

b) What does the slack variable indicate?

A

a)
Separable case:
yi(w*xi + b) ≥ 1

–> yi = class label (±1)
–> xi = feature vector
–> w = weight (normal) vector
–> b = bias

Three cases for a training point:

yi(w*xi + b) ≥ 1
- (correct and outside the margin)
1 > yi(w*xi + b) ≥ 0
- (correct but within the margin)
yi(w*xi + b) < 0
- (incorrect: on the wrong side of the hyperplane)

Non-separable case:
yi(w*xi + b) ≥ 1 - εi

b)
slack variable (εi)
- allows some points to violate the margin (or even be misclassified)
- SVM then selects the hyperplane that minimizes the empirical error

7
Q

a) What is the role of C in the generalized form of the SVM algorithm (used to deal w/ non-separable cases)?

b) What are the trade-offs in choosing large/ small values of C?

c) How do we choose the best C?

A

a)
cost parameter that weights the penalty for:
- misclassification
- margin-violating training points

b)
large C
- strong penalty on errors
- makes ε small
Pro: low training error
Con: risk of overfitting and poor generalization

small C
- weak penalty on errors
- tolerates larger ε
Pro: robustness to noise and low variance
Con: may underfit if C is too small and the model ignores useful structure

c)
N-fold cross-validation to find the best C
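The trade-off can be seen by evaluating the soft-margin objective 0.5||w||² + C·Σεi on two candidate hyperplanes; the 1-D data and both planes below are hypothetical, constructed so one noisy negative point sits near the positives:

```python
import numpy as np

def objective(w, b, X, y, C):
    """Soft-margin SVM objective: 0.5*||w||^2 + C * total slack (hinge loss)."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * (w @ w) + C * slack.sum()

# Toy 1-D data: one noisy negative at x = 0.9, next to the positives
X = np.array([[-2.0], [-1.0], [0.9], [1.0], [2.0]])
y = np.array([-1, -1, -1, 1, 1])

wide  = (np.array([1.0]),  0.0)    # boundary at 0: wide margin, noisy point violates it
tight = (np.array([20.0]), -19.0)  # boundary at 0.95: tiny margin, zero slack

for C in (0.1, 200.0):
    pick = min((wide, tight), key=lambda p: objective(p[0], p[1], X, y, C))
    label = "wide" if pick is wide else "tight"
    print(f"C={C}: prefers the {label}-margin plane")
# Small C tolerates the slack and keeps the wide margin (robust to the noise);
# large C penalizes errors heavily and squeezes in the tight margin (overfits).
```

In practice the same comparison is done implicitly by refitting the model for each candidate C and scoring it on held-out folds.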

8
Q

In a soft margin SVM, what does the hyperparameter C control?

A

the trade-off between maximizing the margin and minimizing the classification error
