Chapter 2.1: Classification (Part I) - Classification concepts, LDA and QDA Flashcards

(15 cards)

1
Q

What is the general setup of classification problems?

A
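This card's answer is blank in the export; a standard statement of the setup, consistent with the notation used later in the deck (ηl, πl, fl), would be:

```latex
% Standard setup (my reconstruction, not the original card's content).
% Observe i.i.d. pairs (X, Y), with Y taking one of K class labels:
\[
(x_1, y_1), \dots, (x_n, y_n) \sim (X, Y), \qquad Y \in \{1, \dots, K\}
\]
% Goal: a classifier psi minimising the misclassification probability.
\[
\psi : \mathbb{R}^p \to \{1, \dots, K\}, \qquad \min_{\psi} \; \mathbb{P}\bigl(\psi(X) \neq Y\bigr)
\]
% The optimum is the Bayes classifier, built from the class posteriors:
\[
\eta_\ell(x) = \mathbb{P}(Y = \ell \mid X = x), \qquad
\psi^{\ast}(x) = \operatorname*{arg\,max}_{\ell} \; \eta_\ell(x)
\]
```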
2
Q

What are the decision boundaries of a classification function?

What are some examples of Linear Classifiers?

How do these look on a graph vs non-linear classifiers?

A

The decision boundaries of linear classifiers are affine: sets of the form aᵀx + b = 0. In 2-D they are straight lines, in 3-D planes, and in higher dimensions hyperplanes. Examples of linear classifiers include LDA and logistic regression; non-linear classifiers (e.g. QDA) produce curved boundaries on a graph.

3
Q

What is the Discriminative approach to classification problems?

A

Discriminative approach:
* We don't know the class posteriors ηl(x) and we don't know fX(x), so instead we estimate the regression function of each class directly.
* There is no need to estimate the marginal density fX: for a fixed x it is the same for every class, so it makes no contribution when we take the argmax over classes.

4
Q

What is an example of the Discriminative approach? What are its advantages and disadvantages?

A
  • Logistic regression is the canonical example.
  • Advantage: we never have to model the feature distribution fX, only the conditional of Y given X.
  • Disadvantage: difficult to interpret, i.e. hard to tell which features drive the result, or why a given x has a higher probability of being in one class over another.
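As a sketch of the discriminative idea (my own illustration, not the course's code): logistic regression estimates the class posterior P(Y=1 | X=x) directly, without ever modelling the feature distribution. Using scikit-learn on toy data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two 1-D classes; we model P(Y = 1 | X = x) directly (discriminative),
# never estimating the marginal density of X.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 50), rng.normal(4, 1, 50)]).reshape(-1, 1)
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)
posterior_at_2 = clf.predict_proba([[2.0]])[0, 1]  # estimated P(Y=1 | X=2)
```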
5
Q

What is the Generative approach to classification problems?

A
  • Start with the joint distribution, as in the first case, but condition on Y first.
  • Again we don't know πl and fl, so we have to model both.
  • Applying Bayes' theorem gives the second formula.
  • P(X=x) drops out of the last equality because, for a fixed x, the denominator is the same for every class l, so it does not affect which class score is largest.
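The formulas the bullets above allude to (my reconstruction of the standard derivation):

```latex
% Posterior via Bayes' theorem; pi_l = P(Y = l), f_l = density of X given Y = l.
\[
\mathbb{P}(Y = \ell \mid X = x)
  = \frac{\pi_\ell f_\ell(x)}{\sum_{k=1}^{K} \pi_k f_k(x)}
\]
% The denominator P(X = x) is the same for every class, so it drops out:
\[
\psi(x) = \operatorname*{arg\,max}_{\ell} \; \pi_\ell f_\ell(x)
\]
```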
6
Q

What is an example of the Generative approach to classification problems?

What are its advantages and disadvantages?

A

  • LDA and QDA (covered in the following cards) are generative examples: each class-conditional density fl is modelled as a Gaussian.
7
Q

Is it possible to combine the Discriminative and Generative approaches?

A
8
Q

What is Linear Discriminant Analysis (LDA)?

What do we assume fl is?

What do we use our training data to estimate?

How do we derive the linear discriminant function?

Finally, what is the function that defines LDA, and what makes it linear?

A
  • Note: the derivation starts from a form that is quadratic in x, but the quadratic term xᵀΣ⁻¹x is the same for every class, so it cancels and the resulting discriminant function is linear in x.
  • We model each class-conditional density as a Gaussian with a shared covariance matrix.
  • In LDA we ignore constants, as they do not affect the final classification rule: when we take the argmax of δl(x) over all l, shifting every class score δl(x) by the same amount does not change their ordering.
  • π̂l: the estimated prior probability of class l, which is just the proportion of training observations belonging to class l.
  • μ̂l: the estimated mean vector for class l (the class centroid); it gives the typical location of the class-l observations in feature space.
  • Σ̂: the pooled covariance matrix estimate; it measures how the variables vary together, and under the LDA assumption all classes share the same covariance matrix.
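A minimal numpy sketch of the linear discriminant scores (my own illustration, assuming the standard form δl(x) = xᵀΣ̂⁻¹μ̂l − ½ μ̂lᵀΣ̂⁻¹μ̂l + log π̂l):

```python
import numpy as np

def lda_scores(X, means, pooled_cov, priors):
    """Score delta_l(x) = x^T S^-1 mu_l - 0.5 mu_l^T S^-1 mu_l + log pi_l per class."""
    S_inv = np.linalg.inv(pooled_cov)
    cols = []
    for mu, pi in zip(means, priors):
        w = S_inv @ mu                            # linear weight vector for class l
        b = -0.5 * mu @ S_inv @ mu + np.log(pi)   # class-specific constant
        cols.append(X @ w + b)                    # linear in x: no quadratic term
    return np.stack(cols, axis=1)

# Classify by taking the argmax over class scores:
means = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
scores = lda_scores(np.array([[0.0, 0.0], [2.0, 2.0]]), means, np.eye(2), [0.5, 0.5])
labels = scores.argmax(axis=1)
```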
9
Q

How do we derive the second equation for Σ-hat?

A
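The card's answer is blank in the export; the standard pooled estimate it refers to (my reconstruction, with nl the class-l sample size) is:

```latex
% Pooled covariance: sum the within-class scatter over all classes,
% dividing by n - K to make the estimate unbiased.
\[
\hat{\Sigma}
  = \frac{1}{n - K} \sum_{\ell=1}^{K} \sum_{i : y_i = \ell}
    (x_i - \hat{\mu}_\ell)(x_i - \hat{\mu}_\ell)^{\mathsf{T}}
\]
```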
10
Q

What equation gives us the decision boundary between class 1 and class 2?

A
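The card's answer is blank in the export; setting the two discriminant scores equal (standard derivation, my reconstruction) gives the boundary:

```latex
% The boundary is the set where the two class scores tie:
\[
\{\, x : \delta_1(x) = \delta_2(x) \,\}
\]
% Substituting the linear discriminants and cancelling common terms
% yields a hyperplane:
\[
x^{\mathsf{T}} \hat{\Sigma}^{-1}(\hat{\mu}_1 - \hat{\mu}_2)
  = \tfrac{1}{2}(\hat{\mu}_1 + \hat{\mu}_2)^{\mathsf{T}} \hat{\Sigma}^{-1}(\hat{\mu}_1 - \hat{\mu}_2)
    - \log\frac{\hat{\pi}_1}{\hat{\pi}_2}
\]
```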
11
Q

What is the decision boundary for the simple LDA problem?

A

μ̂l = (1/nl) · Σ over {i : yi = l} of xi, i.e. the average of the training observations with label l.

12
Q

What is a useful way to understand LDA?

A
  • Sphering: think of it as a scaling/rotation of the feature space.
  • For the linear discriminant function: Σ̂ is the variance-covariance matrix; it and its inverse are both positive definite, so we can take the square root Σ̂^(-1/2). Absorbing one factor of Σ̂^(-1/2) into (x−μ)ᵀ and the other into (x−μ) lets you rewrite the discriminant in the attached L2-norm form.
  • The problem now reduces to asking which of my transformed centroids is closest to my transformed x: maximising the discriminant means minimising this squared L2 norm, so essentially we minimise the distance between the two.
  • Dimension reduction: with, say, 3 classes, after the transformation the centroids span a 2-D plane, so the decision boundaries can be plotted in 2-D.
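The sphering view can be sketched in numpy (my own illustration, assuming equal priors): after transforming by Σ̂^(-1/2), LDA is just nearest-centroid classification.

```python
import numpy as np

def inv_sqrt(cov):
    # Symmetric inverse square root of a positive-definite matrix
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

cov = np.array([[2.0, 0.5], [0.5, 1.0]])
W = inv_sqrt(cov)  # the "sphering" (scaling/rotation) transform
means = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]

def classify(x):
    # With equal priors, maximising the discriminant is the same as
    # minimising the squared L2 distance to each transformed centroid.
    dists = [np.sum((W @ x - W @ m) ** 2) for m in means]
    return int(np.argmin(dists))
```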
13
Q

In R, what are the functions I will use to classify the data set by Species type using LDA?

A
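In R the usual workflow is MASS::lda() to fit the model and predict() to classify new observations. As a rough cross-check in Python (my assumption of an equivalent, not the course's R code), scikit-learn's LinearDiscriminantAnalysis on the iris data:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# iris: 150 flowers, 4 measurements, Species as the 3-class label
X, y = load_iris(return_X_y=True)
clf = LinearDiscriminantAnalysis().fit(X, y)
train_acc = clf.score(X, y)  # fraction classified correctly on the training set
```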
14
Q

What is Quadratic discriminant analysis (QDA)? How does it differ from LDA?

How do we use the training data to estimate π-hatl, μ-hatl and Σ-hatl?

Thus how do we find the quadratic discriminant function δQDAl(x), and how is the QDA classifier ψQDAl(x) defined?

How does it compare to LDA in parameter use?

A
  • The difference appears when we assume the conditional density for each class: we no longer assume that the classes share the exact same Σ; each class l gets its own covariance matrix Σl.
  • In parameter use, QDA is much heavier than LDA: it estimates a separate p×p covariance matrix per class instead of a single pooled one, so the number of parameters grows quickly with the dimension p.
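The standard quadratic discriminant (my reconstruction; note the per-class Σ̂l and the log-determinant term that no longer cancels across classes):

```latex
\[
\delta^{\mathrm{QDA}}_\ell(x)
  = -\tfrac{1}{2}\log\bigl|\hat{\Sigma}_\ell\bigr|
    - \tfrac{1}{2}(x - \hat{\mu}_\ell)^{\mathsf{T}} \hat{\Sigma}_\ell^{-1} (x - \hat{\mu}_\ell)
    + \log \hat{\pi}_\ell
\]
% The classifier picks the class with the largest quadratic score:
\[
\psi^{\mathrm{QDA}}(x) = \operatorname*{arg\,max}_{\ell} \; \delta^{\mathrm{QDA}}_\ell(x)
\]
```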
15
Q

Which approach is better? QDA or LDA?

A
  • Left-hand side: the Bayes classifier looks quite linear, so LDA may be the more appropriate choice there.
  • Neither is uniformly better: LDA estimates fewer parameters, so it has lower variance and tends to win when the true boundary is close to linear or the training set is small; QDA is more flexible and tends to win when the class covariances differ substantially, at the cost of higher variance.
