What is classification
Classification is the task of predicting labels for unseen data based on patterns from training data
General approach to classification
Start from a training set with known class labels, build a classification model on it, and then apply that model to the test set, whose labels are unknown
Decision tree induction
The decision tree generation consists of two phases: tree construction and tree pruning.
Tree construction
* At the start, all the training examples are at the root
* Split examples based on selected attributes
* examples that satisfy the condition on the selected attribute move to that branch
Tree pruning
* Identify and remove branches that reflect noise or outliers
A decision tree can be used to classify an unseen sample: the sample's attribute values are tested against the tree you built from the training data to predict its class label.
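The classification step above can be sketched with a tiny hand-built tree; the attribute names, values, and labels here are invented for illustration:

```python
# A minimal sketch: a decision tree as nested dicts, where internal nodes
# test one attribute and leaves are class labels (all names are invented).
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rainy": {"attribute": "windy",
                  "branches": {True: "no", False: "yes"}},
    },
}

def classify(node, sample):
    # A leaf is a plain label; an internal node routes the sample down
    # the branch matching its value for the tested attribute.
    if not isinstance(node, dict):
        return node
    value = sample[node["attribute"]]
    return classify(node["branches"][value], sample)

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # yes
```

The unseen sample never needs a label: the tree's tests on its attributes produce the predicted label.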
What is splitting
splitting is how we divide the data at a node based on an attribute so that the resulting child nodes are “purer”
Decision tree
A decision tree is a tree-like structure with branches and leaf nodes. Branches represent tests or decisions on attributes, and the leaf nodes determine the class label (or prediction) for the data.
What is discretization.
discretization converts a continuous attribute into discrete bins, e.g. ages grouped as 20–29, 30–39, 40–49
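A minimal sketch of those age bins (the bin boundaries follow the note; the function name is invented):

```python
# Discretization sketch: map a continuous age to one of the bins above.
def discretize_age(age):
    if 20 <= age <= 29:
        return "20-29"
    if 30 <= age <= 39:
        return "30-39"
    if 40 <= age <= 49:
        return "40-49"
    return "other"  # ages outside the listed bins

print([discretize_age(a) for a in [23, 35, 48]])  # ['20-29', '30-39', '40-49']
```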
What is binarization
binarization splits an attribute into two groups with a single threshold, e.g. age < 10 vs. age ≥ 10
different purity measures
Information gain and gini index.
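Of the two, the Gini index is the quicker to sketch: Gini = 1 − Σ pᵢ², where pᵢ is the fraction of examples in class i (the label list below is invented):

```python
# Gini index sketch: 0 for a pure node, larger for more mixed nodes.
def gini(labels):
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

print(gini(["yes", "yes", "no", "no"]))  # 0.5 (maximally mixed, two classes)
print(gini(["yes", "yes", "yes"]))       # 0.0 (pure node)
```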
Information gain
Information Gain (IG) measures how much uncertainty (entropy) is reduced if we split the dataset based on a particular attribute.
Entropy
Entropy measures the randomness (impurity) of the class labels at a node.
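Entropy and information gain can be sketched together, with entropy = −Σ pᵢ log₂ pᵢ and IG = entropy before the split minus the weighted entropy of the child nodes (the tiny dataset and attribute layout below are invented):

```python
import math

# Entropy of a list of class labels.
def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

# Information gain of splitting rows on one attribute (by column index):
# parent entropy minus the weighted entropy of each child subset.
def information_gain(rows, labels, attribute_index):
    gain = entropy(labels)
    n = len(labels)
    for value in set(row[attribute_index] for row in rows):
        subset = [lab for row, lab in zip(rows, labels)
                  if row[attribute_index] == value]
        gain -= len(subset) / n * entropy(subset)
    return gain

rows = [("sunny",), ("sunny",), ("rainy",), ("rainy",)]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))  # 1.0: the split yields pure children
```

Here the split removes all uncertainty (both children are pure), so IG equals the parent entropy of 1 bit.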
Precision
Precision is the ratio of True Positives to all predicted Positives: TP / (TP + FP).
Recall
Recall tells us how many of the ACTUAL positives the model finds: TP / (TP + FN). Precision looks only at what the model predicted as positive; recall looks at everything that really is positive.
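Both metrics can be sketched from a list of actual vs. predicted labels (the example data is invented):

```python
# Precision = TP / (TP + FP); Recall = TP / (TP + FN).
def precision_recall(actual, predicted, positive="yes"):
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    return tp / (tp + fp), tp / (tp + fn)

actual    = ["yes", "yes", "no", "no"]
predicted = ["yes", "no",  "yes", "no"]
print(precision_recall(actual, predicted))  # (0.5, 0.5)
```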
When is precision useful
Precision is more useful when false positives are costly, i.e. when we need the model's positive predictions to be reliably correct.
KNN classification
works by finding the "k" closest data points to the query and taking a majority vote among their labels. Not great with high dimensionality, because distances lose meaning as the number of dimensions grows (the curse of dimensionality)
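A minimal sketch of KNN with Euclidean distance and majority voting (the training points and labels are invented):

```python
import math
from collections import Counter

# KNN sketch: find the k nearest training points to the query and
# return the majority label among them.
def knn_predict(train, query, k=3):
    # train is a list of (point, label) pairs.
    by_dist = sorted(train, key=lambda pl: math.dist(pl[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (1, 0)))  # a
```

In high dimensions every pairwise distance tends toward the same value, which is why the "closest" points stop being informative.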