Chapter 2.1: Classification (Part I) - Classification concepts, LDA and QDA Flashcards

(15 cards)

1
Q

What is the general setup of classification problems?

A
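This card's answer is blank in the export; a standard statement of the setup, consistent with the notation used later in the deck (ηl, πl, fl), would be:

```latex
% Standard setup (my reconstruction, not the original card's content).
% Observe i.i.d. pairs (X, Y), with Y taking one of K class labels:
\[
(x_1, y_1), \dots, (x_n, y_n) \sim (X, Y), \qquad Y \in \{1, \dots, K\}
\]
% Goal: a classifier psi minimising the misclassification probability.
\[
\psi : \mathbb{R}^p \to \{1, \dots, K\}, \qquad \min_{\psi} \; \mathbb{P}\bigl(\psi(X) \neq Y\bigr)
\]
% The optimum is the Bayes classifier, built from the class posteriors:
\[
\eta_\ell(x) = \mathbb{P}(Y = \ell \mid X = x), \qquad
\psi^{\ast}(x) = \operatorname*{arg\,max}_{\ell} \; \eta_\ell(x)
\]
```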
2
Q

What are the decision boundaries of a classification function?

What are some examples of Linear Classifiers?

How do these look on a graph vs non-linear classifiers?

A

The decision boundaries of linear classifiers are affine: sets of the form aᵀx + b = 0. In 2-D they are straight lines, in 3-D planes, and in higher dimensions hyperplanes. Examples of linear classifiers include LDA and logistic regression; non-linear classifiers (e.g. QDA) produce curved boundaries on a graph.

3
Q

What is the Discriminative approach to classification problems?

A

Discriminative approach:
* We don't know the class posteriors ηl(x) and we don't know fX(x), so instead we estimate the regression function of each class directly.
* There is no need to estimate the marginal density fX: for a fixed x it is the same for every class, so it makes no contribution when we take the argmax over classes.

4
Q

What is an example of the Discriminative approach? What are its advantages and disadvantages?

A
  • Logistic regression is the canonical example.
  • Advantage: we never have to model the feature distribution fX, only the conditional of Y given X.
  • Disadvantage: difficult to interpret, i.e. hard to tell which features drive the result, or why a given x has a higher probability of being in one class over another.
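As a sketch of the discriminative idea (my own illustration, not the course's code): logistic regression estimates the class posterior P(Y=1 | X=x) directly, without ever modelling the feature distribution. Using scikit-learn on toy data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two 1-D classes; we model P(Y = 1 | X = x) directly (discriminative),
# never estimating the marginal density of X.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 50), rng.normal(4, 1, 50)]).reshape(-1, 1)
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)
posterior_at_2 = clf.predict_proba([[2.0]])[0, 1]  # estimated P(Y=1 | X=2)
```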
5
Q

What is the Generative approach to classification problems?

A
  • Start with the joint distribution, as in the first case, but condition on Y first.
  • Again we don't know πl and fl, so we have to model both.
  • Applying Bayes' theorem gives the second formula.
  • P(X=x) drops out of the last equality because, for a fixed x, the denominator is the same for every class l, so it does not affect which class score is largest.
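The formulas the bullets above allude to (my reconstruction of the standard derivation):

```latex
% Posterior via Bayes' theorem; pi_l = P(Y = l), f_l = density of X given Y = l.
\[
\mathbb{P}(Y = \ell \mid X = x)
  = \frac{\pi_\ell f_\ell(x)}{\sum_{k=1}^{K} \pi_k f_k(x)}
\]
% The denominator P(X = x) is the same for every class, so it drops out:
\[
\psi(x) = \operatorname*{arg\,max}_{\ell} \; \pi_\ell f_\ell(x)
\]
```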
6
Q

What is an example of the Generative approach to classification problems?

What are its advantages and disadvantages?

A

  • LDA and QDA (covered in the following cards) are generative examples: each class-conditional density fl is modelled as a Gaussian.
7
Q

Is it possible to combine the Discriminative and Generative approaches?

A
8
Q

What is Linear Discriminant Analysis (LDA)?

What do we assume fl is?

What do we use our training data to estimate?

How do we derive the linear discriminant function?

Finally, what is the function that defines LDA, and what makes it linear?

A
  • Note: the derivation starts from a form that is quadratic in x, but the quadratic term xᵀΣ⁻¹x is the same for every class, so it cancels and the resulting discriminant function is linear in x.
  • We model each class-conditional density as a Gaussian with a shared covariance matrix.
  • In LDA we ignore constants, as they do not affect the final classification rule: when we take the argmax of δl(x) over all l, shifting every class score δl(x) by the same amount does not change their ordering.
  • π̂l: the estimated prior probability of class l, which is just the proportion of training observations belonging to class l.
  • μ̂l: the estimated mean vector for class l (the class centroid); it gives the typical location of the class-l observations in feature space.
  • Σ̂: the pooled covariance matrix estimate; it measures how the variables vary together, and under the LDA assumption all classes share the same covariance matrix.
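A minimal numpy sketch of the linear discriminant scores (my own illustration, assuming the standard form δl(x) = xᵀΣ̂⁻¹μ̂l − ½ μ̂lᵀΣ̂⁻¹μ̂l + log π̂l):

```python
import numpy as np

def lda_scores(X, means, pooled_cov, priors):
    """Score delta_l(x) = x^T S^-1 mu_l - 0.5 mu_l^T S^-1 mu_l + log pi_l per class."""
    S_inv = np.linalg.inv(pooled_cov)
    cols = []
    for mu, pi in zip(means, priors):
        w = S_inv @ mu                            # linear weight vector for class l
        b = -0.5 * mu @ S_inv @ mu + np.log(pi)   # class-specific constant
        cols.append(X @ w + b)                    # linear in x: no quadratic term
    return np.stack(cols, axis=1)

# Classify by taking the argmax over class scores:
means = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
scores = lda_scores(np.array([[0.0, 0.0], [2.0, 2.0]]), means, np.eye(2), [0.5, 0.5])
labels = scores.argmax(axis=1)
```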
9
Q

How do we derive the second equation for Σ-hat?

A
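The card's answer is blank in the export; the standard pooled estimate it refers to (my reconstruction, with nl the class-l sample size) is:

```latex
% Pooled covariance: sum the within-class scatter over all classes,
% dividing by n - K to make the estimate unbiased.
\[
\hat{\Sigma}
  = \frac{1}{n - K} \sum_{\ell=1}^{K} \sum_{i : y_i = \ell}
    (x_i - \hat{\mu}_\ell)(x_i - \hat{\mu}_\ell)^{\mathsf{T}}
\]
```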
10
Q

What equation gives us the decision boundary between class 1 and class 2?

A
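The card's answer is blank in the export; setting the two discriminant scores equal (standard derivation, my reconstruction) gives the boundary:

```latex
% The boundary is the set where the two class scores tie:
\[
\{\, x : \delta_1(x) = \delta_2(x) \,\}
\]
% Substituting the linear discriminants and cancelling common terms
% yields a hyperplane:
\[
x^{\mathsf{T}} \hat{\Sigma}^{-1}(\hat{\mu}_1 - \hat{\mu}_2)
  = \tfrac{1}{2}(\hat{\mu}_1 + \hat{\mu}_2)^{\mathsf{T}} \hat{\Sigma}^{-1}(\hat{\mu}_1 - \hat{\mu}_2)
    - \log\frac{\hat{\pi}_1}{\hat{\pi}_2}
\]
```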
11
Q

What is the decision boundary for the simple LDA problem?

A

μ̂l = (1/nl) · Σ over {i : yi = l} of xi, i.e. the average of the training observations with label l.

12
Q

What is a useful way to understand LDA?

A
  • Sphering: think of it as a scaling/rotation of the feature space.
  • For the linear discriminant function: Σ̂ is the variance-covariance matrix; it and its inverse are both positive definite, so we can take the square root Σ̂^(-1/2). Absorbing one factor of Σ̂^(-1/2) into (x−μ)ᵀ and the other into (x−μ) lets you rewrite the discriminant in the attached L2-norm form.
  • The problem now reduces to asking which of my transformed centroids is closest to my transformed x: maximising the discriminant means minimising this squared L2 norm, so essentially we minimise the distance between the two.
  • Dimension reduction: with, say, 3 classes, after the transformation the centroids span a 2-D plane, so the decision boundaries can be plotted in 2-D.
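The sphering view can be sketched in numpy (my own illustration, assuming equal priors): after transforming by Σ̂^(-1/2), LDA is just nearest-centroid classification.

```python
import numpy as np

def inv_sqrt(cov):
    # Symmetric inverse square root of a positive-definite matrix
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

cov = np.array([[2.0, 0.5], [0.5, 1.0]])
W = inv_sqrt(cov)  # the "sphering" (scaling/rotation) transform
means = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]

def classify(x):
    # With equal priors, maximising the discriminant is the same as
    # minimising the squared L2 distance to each transformed centroid.
    dists = [np.sum((W @ x - W @ m) ** 2) for m in means]
    return int(np.argmin(dists))
```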
13
Q

In R, what are the functions I will use to classify the data set by Species type using LDA?

A
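In R the usual workflow is MASS::lda() to fit the model and predict() to classify new observations. As a rough cross-check in Python (my assumption of an equivalent, not the course's R code), scikit-learn's LinearDiscriminantAnalysis on the iris data:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# iris: 150 flowers, 4 measurements, Species as the 3-class label
X, y = load_iris(return_X_y=True)
clf = LinearDiscriminantAnalysis().fit(X, y)
train_acc = clf.score(X, y)  # fraction classified correctly on the training set
```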
14
Q

What is Quadratic discriminant analysis (QDA)? How does it differ from LDA?

How do we use the training data to estimate π-hatl, μ-hatl and Σ-hatl?

Thus how do we find the quadratic discriminant function δQDAl(x), and how is the QDA classifier ψQDAl(x) defined?

How does it compare to LDA in parameter use?

A
  • The difference appears when we assume the conditional density for each class: we no longer assume that the classes share the exact same Σ; each class l gets its own covariance matrix Σl.
  • In parameter use, QDA is much heavier than LDA: it estimates a separate p×p covariance matrix per class instead of a single pooled one, so the number of parameters grows quickly with the dimension p.
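The standard quadratic discriminant (my reconstruction; note the per-class Σ̂l and the log-determinant term that no longer cancels across classes):

```latex
\[
\delta^{\mathrm{QDA}}_\ell(x)
  = -\tfrac{1}{2}\log\bigl|\hat{\Sigma}_\ell\bigr|
    - \tfrac{1}{2}(x - \hat{\mu}_\ell)^{\mathsf{T}} \hat{\Sigma}_\ell^{-1} (x - \hat{\mu}_\ell)
    + \log \hat{\pi}_\ell
\]
% The classifier picks the class with the largest quadratic score:
\[
\psi^{\mathrm{QDA}}(x) = \operatorname*{arg\,max}_{\ell} \; \delta^{\mathrm{QDA}}_\ell(x)
\]
```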
15
Q

Which approach is better? QDA or LDA?

A
  • Left-hand side: the Bayes classifier looks quite linear, so LDA may be the more appropriate choice there.
  • Neither is uniformly better: LDA estimates fewer parameters, so it has lower variance and tends to win when the true boundary is close to linear or the training set is small; QDA is more flexible and tends to win when the class covariances differ substantially, at the cost of higher variance.
