How does classification work?
It models the choice between two distinct alternatives. Ex: Is this fraud or not? Will someone buy this product?
Can we use linear regression for classification tasks?
Yes, if we encode the outcome as a binary 0/1 variable, but the fitted line can produce predictions outside [0, 1], which is one motivation for logistic regression.
What is the logistic function (logistic regression)?
p(X) = e^(beta0 + beta1*X) / (1 + e^(beta0 + beta1*X))
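A minimal sketch of the logistic function in Python (the function name `logistic` and the example coefficients are made up for illustration):

```python
import math

def logistic(x, beta0, beta1):
    """Logistic function: maps any real-valued score to a probability in (0, 1)."""
    z = beta0 + beta1 * x
    return math.exp(z) / (1 + math.exp(z))

# When beta0 + beta1*x = 0, the probability is exactly 0.5
print(logistic(0.0, 0.0, 1.0))  # 0.5
```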
What are “the odds”?
p(X)/(1-p(X))
(This equals e^(beta0 + beta1 * X))
Taking the log gives the log-odds (logit): log(p(X)/(1 - p(X))) = beta0 + beta1*X
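A quick numeric check of the odds identity (the coefficient values here are arbitrary, chosen only to make the arithmetic visible):

```python
import math

beta0, beta1, x = -1.0, 2.0, 0.75
z = beta0 + beta1 * x               # the linear score, 0.5 here
p = math.exp(z) / (1 + math.exp(z))

odds = p / (1 - p)                  # equals e^(beta0 + beta1*x)
log_odds = math.log(odds)           # equals beta0 + beta1*x itself
print(round(odds, 4), round(log_odds, 4))  # 1.6487 0.5
```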
To fit the logistic model, we need to do numerical optimization like…
MLE!
Steps to MLE:
1.) Write the likelihood function
2.) Take the log and run a numerical optimization on it, like gradient ascent: start with an initial guess, move the parameters in the direction that increases the log-likelihood, stop when the improvement is small enough
What is the log-likelihood function?
sum(i=1 to n) of (y_i * log(y_i_hat) + (1 - y_i) * log(1 - y_i_hat))
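The formula above can be sketched directly in Python (the name `log_likelihood` and the toy labels/probabilities are made up for illustration):

```python
import math

def log_likelihood(y, y_hat):
    """Log-likelihood of binary labels y given predicted probabilities y_hat."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, y_hat))

y = [1, 0, 1]
y_hat = [0.9, 0.2, 0.8]
print(log_likelihood(y, y_hat))
```

Better predictions give a higher (less negative) log-likelihood, which is what MLE maximizes.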
What are a couple standard optimization methods?
Gradient ascent
Newton-Raphson (IRLS)
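A minimal gradient-ascent sketch for fitting the logistic model (the function name `fit_logistic`, the learning rate, and the toy data are all invented for illustration; a real fit would use a convergence check rather than a fixed step count):

```python
import math

def fit_logistic(xs, ys, lr=0.05, steps=5000):
    """Fit beta0, beta1 by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0                      # initial guess
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            g0 += y - p                # d(log-lik)/d(beta0)
            g1 += (y - p) * x          # d(log-lik)/d(beta1)
        b0 += lr * g0                  # ascent: move *up* the gradient
        b1 += lr * g1
    return b0, b1

# Overlapping classes so the MLE stays finite
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
```

Newton-Raphson (IRLS) would converge in far fewer iterations by also using second-derivative information.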
The K-NN Classifier - process:
1.) Fix K
2.) Find the K points nearest to your point
3.) Assign the point to the class in the majority among the neighbors
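The three steps above can be sketched for one-dimensional data (the name `knn_predict` and the toy training set are made up for illustration):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (x, label) pairs with numeric x.
    Classify query by majority vote among the k nearest training points."""
    neighbors = sorted(train, key=lambda pt: abs(pt[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [(1.0, "a"), (1.5, "a"), (2.0, "a"),
         (8.0, "b"), (8.5, "b"), (9.0, "b")]
print(knn_predict(train, 1.2))  # a
print(knn_predict(train, 8.7))  # b
```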
Naïve Bayes - how is it different?
We make the simplifying assumption that the features are conditionally independent given the class
Bayes can handle special kinds of data…
Text data! (many features, often more features than observations)(spam or ham)
What is the P>N problem?
If we have more features (P) than observations (N), models like logistic regression cannot be fit reliably (common with text/NLP and biomedical data)
In order to avoid log(0) in spam vs ham, we add a fudge factor called
Laplace smoothing
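A toy spam/ham sketch with add-one (Laplace) smoothing (the function name `train_nb` and the tiny corpus are invented for illustration):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns a predict function using
    class log-priors and add-one smoothed word log-likelihoods."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)

    def log_prob(word, label):
        # +1 in the numerator, +|vocab| in the denominator: never log(0)
        return math.log((word_counts[label][word] + 1) /
                        (sum(word_counts[label].values()) + len(vocab)))

    def predict(tokens):
        scores = {}
        for label in label_counts:
            scores[label] = (math.log(label_counts[label] / len(docs)) +
                             sum(log_prob(w, label) for w in tokens if w in vocab))
        return max(scores, key=scores.get)

    return predict

docs = [(["win", "cash", "now"], "spam"),
        (["cheap", "cash", "win"], "spam"),
        (["meeting", "at", "noon"], "ham"),
        (["lunch", "meeting", "today"], "ham")]
predict = train_nb(docs)
print(predict(["win", "cash"]))       # spam
print(predict(["meeting", "today"]))  # ham
```

Note how the independence assumption shows up as a plain sum of per-word log-probabilities.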
Pros and cons of Naive Bayes
Fast, works well with high dimensions, handles irrelevant features well
Independence assumption means we can’t learn about interactions between variables
In evaluation, what’s the difference between regression and classification?
Regression: Care about distance (MSE)
Classification : Care about correct label (2 errors: false positives, false negatives)
We use an ROC curve to…
Visualize the types of errors for all possible thresholds. The quality of the classifier is the area between the ROC curve and the X = Y line.
How do you calculate the ROC curve?
1.) Sort the instances by score (e.g., the predicted probability that the data point is a +), in descending order
2.) Apply a threshold at each unique score and record the TP, FP, TN, FN counts of the classifier at that threshold, plus the TP rate and FP rate
3.) Plot the ROC curve by connecting the dots in the TP/FP space
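The three steps above can be sketched as follows (the name `roc_points` and the toy scores/labels are made up for illustration; this simple version assumes the scores are unique, so it lowers the threshold one instance at a time rather than grouping tied scores):

```python
def roc_points(scores, labels):
    """Sweep the threshold down the sorted scores and collect (FPR, TPR) points.
    scores: predicted probability of the positive class; labels: 1 = positive."""
    pairs = sorted(zip(scores, labels), reverse=True)
    P = sum(labels)            # total positives
    N = len(labels) - P        # total negatives
    points = [(0.0, 0.0)]      # strictest threshold: nothing predicted positive
    tp = fp = 0
    for score, label in pairs:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / N, tp / P))
    return points

scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 1, 0, 1, 0]
print(roc_points(scores, labels))
```

Connecting these points in FP-rate/TP-rate space draws the ROC curve.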
How can we measure classifier quality from the ROC?
AUC! The area under the ROC curve - an ideal ROC curve will hug the top left corner of the space. An AUC of 0.5 (on the diagonal) is random. AUC less than this indicates worse than random performance!
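A minimal AUC sketch using the trapezoidal rule over the same threshold sweep (the function name `auc` and the toy data are invented for illustration):

```python
def auc(scores, labels):
    """Area under the ROC curve, accumulated trapezoid by trapezoid
    while sweeping the threshold down the sorted scores."""
    pairs = sorted(zip(scores, labels), reverse=True)
    P = sum(labels)
    N = len(labels) - P
    tp = fp = 0
    area = 0.0
    prev_fpr = prev_tpr = 0.0
    for score, label in pairs:
        if label == 1:
            tp += 1
        else:
            fp += 1
        fpr, tpr = fp / N, tp / P
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2  # trapezoid slice
        prev_fpr, prev_tpr = fpr, tpr
    return area

print(auc([0.9, 0.8, 0.7, 0.4, 0.3], [1, 1, 0, 1, 0]))  # 5/6, better than random
```

A perfect ranking (every positive scored above every negative) gives 1.0; random scoring hovers around 0.5.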