AIML Week 3 - Classification Flashcards

(18 cards)

1
Q

How does classification work?

A

It models the choice between distinct alternatives (classes). Ex: Is this transaction fraudulent or not? Will someone buy this product?

2
Q

Can we use linear regression for classification tasks?

A

Yes, if we encode the outcome as a binary (0/1) variable. Note that the fitted line can predict values outside [0, 1], and the approach doesn't extend naturally to more than two classes.

3
Q

What is the logistic function (logistic regression)?

A

p(X) = e^(beta0 + beta1*X) / (1 + e^(beta0 + beta1*X))
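A minimal Python sketch of this function; the beta values here are made-up examples, not fitted coefficients:

```python
import math

def logistic(x, beta0=-1.0, beta1=2.0):
    """p(X) for the logistic model. beta0/beta1 are arbitrary example values."""
    z = beta0 + beta1 * x
    return math.exp(z) / (1 + math.exp(z))

# The output is always a valid probability in (0, 1),
# unlike a raw linear regression prediction.
print(logistic(0.0))   # below 0.5, since beta0 = -1
print(logistic(1.0))   # above 0.5, since beta0 + beta1 = 1
```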

4
Q

What are “the odds”?

A

p(X) / (1 - p(X))
(This equals e^(beta0 + beta1*X).)
You can also take the log-odds (logit): log(p(X)/(1-p(X))) = beta0 + beta1*X, which is linear in X.
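A quick numeric check, with arbitrary made-up beta values, that the odds equal e^(beta0 + beta1*X) and that the log-odds are linear in X:

```python
import math

beta0, beta1, x = 0.5, -1.5, 2.0          # arbitrary example values
z = beta0 + beta1 * x
p = math.exp(z) / (1 + math.exp(z))       # logistic function

odds = p / (1 - p)
print(abs(odds - math.exp(z)) < 1e-9)     # True: odds = e^(beta0 + beta1*x)
print(abs(math.log(odds) - z) < 1e-9)     # True: log-odds = beta0 + beta1*x
```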

5
Q

To fit the logistic model, we need to do numerical optimization like…

A

MLE!

6
Q

Steps to MLE:

A

1.) Write the likelihood function
2.) Take the log to get the log-likelihood
3.) Run a numerical optimization on it (like gradient ascent, or gradient descent on the negative log-likelihood): start with an initial guess, move the parameters in the direction that improves the objective, stop when close enough
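The steps above can be sketched as plain-Python gradient ascent on the log-likelihood for a one-feature logistic model. The dataset and learning rate are made up for illustration:

```python
import math

# Tiny made-up dataset: one feature and 0/1 labels (illustration only).
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0,   0,   0,   1,   1,   1]

def p_hat(b0, b1, x):
    """Logistic model prediction for one point."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

b0, b1 = 0.0, 0.0          # initial guess
lr = 0.1                   # learning rate (arbitrary)
for _ in range(5000):
    # Gradient of the log-likelihood w.r.t. b0 and b1
    g0 = sum(y - p_hat(b0, b1, x) for x, y in zip(xs, ys))
    g1 = sum((y - p_hat(b0, b1, x)) * x for x, y in zip(xs, ys))
    b0 += lr * g0          # move uphill on the log-likelihood
    b1 += lr * g1

print(p_hat(b0, b1, 0.5) < 0.5)  # True: low x predicted as class 0
print(p_hat(b0, b1, 4.0) > 0.5)  # True: high x predicted as class 1
```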

7
Q

What is the log-likelihood function?

A

sum(i=1 to n) of [y_i * log(y_i_hat) + (1 - y_i) * log(1 - y_i_hat)]
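A small sketch of this sum in Python, using made-up labels and predicted probabilities:

```python
import math

def log_likelihood(y_true, y_prob):
    """Sum of y*log(p_hat) + (1-y)*log(1-p_hat) over all observations."""
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for y, p in zip(y_true, y_prob))

# Confident, correct predictions score near 0 (the maximum);
# confident, wrong predictions are heavily penalized.
good = log_likelihood([1, 0], [0.9, 0.1])
bad  = log_likelihood([1, 0], [0.1, 0.9])
print(good > bad)   # True
```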

8
Q

What are a couple standard optimization methods?

A

Gradient ascent
Newton-Raphson (IRLS)

9
Q

The K-NN Classifier - process:

A

1.) Fix K
2.) Find the K points nearest to your point
3.) Assign the point to the majority class among those K neighbors
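A minimal sketch of the process, assuming 2-D points and Euclidean distance (the coordinates and labels are made up):

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of ((x, y), label) pairs. Returns the majority class
    among the K training points nearest to query."""
    # Step 2: find the K nearest points
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    # Step 3: majority vote among those neighbors
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"),
         ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]
print(knn_predict(train, (0.5, 0.5), k=3))  # A
print(knn_predict(train, (5.5, 5.5), k=3))  # B
```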

10
Q

Naïve Bayes - how is it different?

A

We make the simplifying assumption that the features are conditionally independent given the class

11
Q

Bayes can handle special kinds of data…

A

Text data! (many features, often more features than observations; e.g., classifying spam vs. ham)

12
Q

What is the P>N problem?

A

If we have too many features (P) relative to the number of observations (N), we cannot reliably use models like logistic regression. Common in NLP (text) and biomedical data.

13
Q

To avoid zero probabilities (and hence log(0)) in spam vs. ham, we add a fudge factor called

A

Laplace smoothing
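A toy spam-vs-ham sketch showing where the +1 (Laplace) term enters; the corpus and the uniform 0.5 priors are made-up assumptions:

```python
from collections import Counter
import math

# Tiny made-up training corpus (illustration only)
spam = ["win money now", "win win prize"]
ham  = ["meeting at noon", "lunch at noon"]

def word_counts(docs):
    return Counter(w for d in docs for w in d.split())

spam_counts, ham_counts = word_counts(spam), word_counts(ham)
vocab = set(spam_counts) | set(ham_counts)

def log_score(msg, counts, prior):
    total = sum(counts.values())
    score = math.log(prior)
    for w in msg.split():
        # Laplace smoothing: +1 so an unseen word never yields log(0)
        score += math.log((counts[w] + 1) / (total + len(vocab)))
    return score

def classify(msg):
    s = log_score(msg, spam_counts, 0.5)   # assumed uniform priors
    h = log_score(msg, ham_counts, 0.5)
    return "spam" if s > h else "ham"

print(classify("win money"))      # spam
print(classify("noon meeting"))   # ham
```

Without the +1, "money" never appearing in the ham corpus would make the ham score log(0), which is undefined.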

14
Q

Pros and cons of Naive Bayes

A

Fast, works well with high dimensions, handles irrelevant features well

Independence assumption means we can’t learn about interactions between variables

15
Q

In evaluation, what’s the difference between regression and classification?

A

Regression: Care about distance (MSE)

Classification: Care about the correct label (2 error types: false positives, false negatives)

16
Q

We use an ROC curve to…

A

Visualize the types of errors at all possible thresholds. The quality of the classifier is the area between the ROC curve and the X=Y line.

17
Q

How do you calculate the ROC curve?

A

1.) Sort the instances by score (e.g., the predicted probability that the data point is a +), in descending order
2.) Apply a threshold at each unique score and record the classifier's TP, FP, TN, FN counts at that threshold, plus the TP rate and FP rate
3.) Plot the ROC curve by connecting the dots in the TP/FP space
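These steps can be sketched in Python. The scores and labels are made up; since the scores contain no ties, lowering the threshold one instance at a time is equivalent to thresholding at each unique score. The AUC over the recorded points is computed by the trapezoid rule:

```python
# Made-up scores: probability of "+" for each instance, with true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1,   1,   0,   1,   0,   0]

# Step 1: sort by score, descending
pairs = sorted(zip(scores, labels), reverse=True)

P = sum(labels)              # total positives
N = len(labels) - P          # total negatives
roc = [(0.0, 0.0)]           # threshold above every score: no predictions = +
tp = fp = 0
for score, label in pairs:   # step 2: lower the threshold past each score
    if label == 1:
        tp += 1
    else:
        fp += 1
    roc.append((fp / N, tp / P))   # record (FP rate, TP rate)

# Step 3: the roc list is the curve's points in FP/TP space.
print(roc)

# Area under the curve by the trapezoid rule:
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(roc, roc[1:]))
print(auc)
```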

18
Q

How can we measure classifier quality from the ROC?

A

AUC! The area under the ROC curve - an ideal ROC curve will hug the top left corner of the space. An AUC of 0.5 (on the diagonal) is random. AUC less than this indicates worse than random performance!