KNN Flashcards

Question 1

Q

k-NN process overview

Question 2

Q

Goal

Answer

A

Given a set of labeled items, automatically label a new item.

Question 3

Q

Idea

Answer

A

Consider the most similar other items (similarity is defined in terms of their attributes), look at their labels, and give the unassigned item the label that wins the majority vote. Ties are broken randomly.
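A minimal sketch of this idea, assuming numeric attribute tuples and Euclidean distance as the similarity measure (the card fixes neither). The name knn_label and the sample data are hypothetical, chosen only for illustration.

```python
import math
import random
from collections import Counter

def knn_label(labeled_items, new_item, k, rng=random):
    """Label new_item by majority vote among its k nearest labeled items.

    labeled_items is a list of (attributes, label) pairs, where attributes
    are equal-length numeric tuples. Euclidean distance is an assumption;
    any other distance could be swapped in.
    """
    # Keep the k labeled items closest to the new item.
    nearest = sorted(labeled_items, key=lambda item: math.dist(item[0], new_item))[:k]
    # Count the labels of those neighbours.
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    # Ties between labels are broken randomly.
    winners = [label for label, count in votes.items() if count == top]
    return rng.choice(winners)

# Hypothetical data: two small clusters.
training = [
    ((1.0, 1.0), "red"),
    ((1.2, 0.8), "red"),
    ((5.0, 5.0), "blue"),
    ((5.5, 4.5), "blue"),
]
print(knn_label(training, (1.1, 0.9), k=3))
```

With k=3 the new point has two "red" neighbours and one "blue", so the vote is not tied.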

Question 4

Q

To automate k-NN, what two decisions need to be made?

Answer

A

How to define similarity?
How many should vote? (what is k?)

Question 5

Q

Euclidean distance
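A minimal sketch of the usual definition, assuming two numeric vectors of equal length: the square root of the sum of squared attribute differences. The function name is hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```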

Question 6

Q

Cosine similarity
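A minimal sketch of the usual definition, assuming two non-zero numeric vectors of equal length: the dot product divided by the product of the vector lengths. It measures the angle between the vectors and ignores their magnitudes. The function name is hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Undefined for a zero vector; this sketch lets the division fail in that case.
    return dot / (norm_a * norm_b)

print(cosine_similarity((1, 0), (0, 1)))  # 0.0, the vectors are orthogonal
```

Because larger values mean more similar, a k-NN implementation would pick the neighbours with the highest similarity, or use 1 - similarity as a distance.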

Question 7

Q

Jaccard distance
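A minimal sketch of the usual definition, assuming each item is described by a set of attributes: one minus the size of the intersection divided by the size of the union. The function name and the handling of two empty sets are assumptions.

```python
def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 0.0
    return 1 - len(a & b) / len(union)

print(jaccard_distance({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```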

Question 8

Q

Hamming distance
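A minimal sketch of the usual definition, assuming two sequences of equal length (strings, bit vectors, or categorical attribute lists): the number of positions at which they differ. The function name is hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("karolin", "kathrin"))  # 3
```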

Question 9

Q

Manhattan distance
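A minimal sketch of the usual definition, assuming two numeric vectors of equal length: the sum of absolute attribute differences, as if travelling along a street grid. The function name is hypothetical.

```python
def manhattan_distance(a, b):
    """Sum of absolute coordinate differences between two numeric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

print(manhattan_distance((0, 0), (3, 4)))  # 7
```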

Question 10

Q

Regarding distance metrics: what if the attributes are a mixture of kinds of data?

Answer

A

Define your own custom-designed metric.
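One possible custom metric, sketched here in the spirit of Gower's distance and not taken from the card: numeric attributes contribute a range-normalized absolute difference, categorical attributes contribute 0 when equal and 1 otherwise, and the contributions are averaged. The function name, the dict-based item format, and the numeric_ranges parameter are all assumptions.

```python
def mixed_distance(a, b, numeric_ranges):
    """Average per-attribute dissimilarity for items with mixed attribute types.

    a, b: dicts mapping attribute name to value (same keys in both).
    numeric_ranges: dict mapping each numeric attribute to its (min, max);
    every other attribute is treated as categorical.
    """
    total = 0.0
    for attr in a:
        if attr in numeric_ranges:
            low, high = numeric_ranges[attr]
            span = high - low
            # Scale the difference to [0, 1] so no numeric attribute dominates.
            total += abs(a[attr] - b[attr]) / span if span else 0.0
        else:
            # Categorical attributes: simple match or mismatch.
            total += 0.0 if a[attr] == b[attr] else 1.0
    return total / len(a)

x = {"age": 30, "color": "red"}
y = {"age": 40, "color": "blue"}
print(mixed_distance(x, y, {"age": (20, 60)}))  # 0.625
```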

Question 11

Q

synonymous terms

Question 12

Q

Evaluation metrics

Answer

A

Accuracy
Precision
Recall
F-score

Question 13

Q

Evaluation Metric : Accuracy

Answer

A

number of correct labels / (total number of labels)
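A minimal sketch of this ratio, assuming the true and predicted labels come as two lists of equal length. The function name is hypothetical.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

print(accuracy(["a", "b", "a", "a"], ["a", "b", "b", "a"]))  # 0.75
```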

Question 14

Q

Evaluation Metric : Precision

Answer

A

number of true positives / (number of true positives + number of false positives)

Question 15

Q

Evaluation Metric : Recall

Answer

A

number of true positives / (number of true positives + number of false negatives)
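A minimal sketch of the precision and recall ratios, assuming one label is designated the positive class. The function name, the positive parameter, and the choice to return 0.0 when a denominator is zero are assumptions.

```python
def precision_recall(true_labels, predicted_labels, positive):
    """Return (precision, recall) for the class given as `positive`."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    # Assumed convention: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

truth = [1, 1, 1, 1, 0, 0]
predicted = [1, 1, 0, 0, 1, 0]
# Here tp = 2, fp = 1, fn = 2, so precision is 2/3 and recall is 1/2.
print(precision_recall(truth, predicted, positive=1))
```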

Question 16

Q

Evaluation Metric : F-score

Answer

A

Harmonic mean of precision and recall

(2 × precision × recall) / (precision + recall)
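A minimal sketch of the formula, assuming precision and recall have already been computed as numbers between 0 and 1. The function name and the 0.0 result when both inputs are zero are assumptions.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Assumed convention: avoid dividing by zero when both are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f_score(0.5, 0.5))  # 0.5
```

The harmonic mean is pulled toward the smaller of the two values, so a classifier cannot score well by maximizing precision alone or recall alone.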

Question 17

Q

Evaluation Metric : Misclassification rate

Answer

A

1 - accuracy

Question 18

Q

Choosing k

Answer

A

You need to understand the data well to make a good initial guess.
Then try a few different values of k and see how the evaluation changes. Pick the k that optimizes the chosen evaluation metric.
In binary classification, pick k to be an odd number so that votes cannot tie.
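The procedure above can be sketched as follows. The card does not say how each k should be evaluated; leave-one-out accuracy is used here purely as an assumption, and the function names and data are hypothetical.

```python
import math
from collections import Counter

def predict(train, point, k):
    """Majority label among the k training items nearest to point (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Predict each item from all the others and return the fraction correct."""
    correct = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        correct += predict(rest, point, k) == label
    return correct / len(data)

# Hypothetical binary data: two well-separated clusters of four points each.
data = [
    ((1.0, 1.0), "a"), ((1.2, 0.9), "a"), ((0.8, 1.1), "a"), ((1.1, 1.3), "a"),
    ((4.0, 4.0), "b"), ((4.2, 3.9), "b"), ((3.8, 4.1), "b"), ((4.1, 4.3), "b"),
]

# Try a few odd values of k and keep the one with the best accuracy.
scores = {k: leave_one_out_accuracy(data, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On this data k = 7 fails for every item, because once an item is held out its own class has only three members left and is always outvoted by the other cluster's four, which shows how a k that is too large for the data can hurt.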

Question 19

Q

Modeling assumptions in K-NN

Answer

A

Question 20

Q

Scaling

Answer

A

Standardize the data so that all variables are given a mean of zero and a standard deviation of one.

In R, this can be achieved using the scale() function
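For readers working outside R, a rough Python equivalent for a single numeric column is sketched below. It uses the sample standard deviation (dividing by n - 1), which matches the default behaviour of R's scale(); the function name standardize is hypothetical.

```python
import statistics

def standardize(column):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample standard deviation (n - 1)
    return [(x - mean) / sd for x in column]

print(standardize([2.0, 4.0, 6.0]))  # [-1.0, 0.0, 1.0]
```

Without this step, an attribute measured on a large scale would dominate the distance calculation and effectively decide which neighbours get to vote.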

(20 cards)