Classification Flashcards

(14 cards)

1
Q

What is classification?

A

Classification is the task of predicting class labels for unseen data based on patterns learned from training data.

2
Q

General approach to classification

A

Start with a training set whose class labels are known, build a classification model from it, and then apply that model to the test set, whose labels are unknown.
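The train-then-apply workflow can be sketched with a deliberately tiny model. Everything here is an assumption made for illustration: the one-feature data, the "low"/"high" labels, and the helper names fit_threshold and predict.

```python
def fit_threshold(train):
    """Toy 'model': the midpoint between the mean feature value of each class."""
    low = [x for x, label in train if label == "low"]
    high = [x for x, label in train if label == "high"]
    return (sum(low) / len(low) + sum(high) / len(high)) / 2

def predict(threshold, x):
    """Apply the learned threshold to a sample whose label is unknown."""
    return "high" if x >= threshold else "low"

# Training set: (feature, known class label) pairs (made-up data).
train = [(1, "low"), (2, "low"), (8, "high"), (9, "high")]
# Test set: features only, labels unknown.
test = [3, 7]

threshold = fit_threshold(train)
predictions = [predict(threshold, x) for x in test]
print(predictions)
```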

3
Q

Decision tree induction

A

Decision tree generation consists of two phases: tree construction and tree pruning.

Tree construction
* At the start, all the training examples are at the root
* Split examples based on selected attributes
* Examples that satisfy the condition on the selected attribute move to that branch

Tree pruning
* Identify and remove branches that reflect noise or outliers

To classify an unseen sample, test its attribute values against the tree built from the training data; the leaf it reaches gives the predicted class label.

4
Q

What is splitting?

A

Splitting is how we divide the data at a node based on an attribute so that the resulting child nodes are “purer”, meaning each child is dominated by a single class.
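A minimal sketch of one split on a numeric attribute. The rows, the "age" attribute, and the threshold of 30 are made-up assumptions chosen so the children come out pure.

```python
def split(rows, attribute, threshold):
    """Divide rows into two child nodes based on a threshold test on one attribute."""
    left = [r for r in rows if r[attribute] < threshold]
    right = [r for r in rows if r[attribute] >= threshold]
    return left, right

# Hypothetical training examples at a node.
rows = [
    {"age": 22, "label": "no"},
    {"age": 25, "label": "no"},
    {"age": 41, "label": "yes"},
    {"age": 47, "label": "yes"},
]

left, right = split(rows, "age", 30)
left_labels = [r["label"] for r in left]
right_labels = [r["label"] for r in right]
print(left_labels, right_labels)
```

Each child here contains a single class, which is the "purer" outcome a good split aims for.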

5
Q

Decision tree

A

A decision tree is a tree-like structure made of internal nodes, branches, and leaf nodes. Internal nodes represent tests on attributes, branches represent the outcomes of those tests, and leaf nodes give the class label (or prediction) for the data.
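One way to picture this structure is as nested dictionaries. The tree below is hand-built and hypothetical (the "outlook" and "windy" attributes and the labels are assumptions), not learned from data.

```python
# Internal nodes are dicts that test an attribute; leaves are plain class labels.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "windy", "branches": {True: "stay in", False: "play"}},
        "rainy": "stay in",
        "overcast": "play",
    },
}

def classify(node, sample):
    """Follow the branch matching the sample's attribute value until a leaf is reached."""
    while isinstance(node, dict):
        value = sample[node["attribute"]]
        node = node["branches"][value]
    return node

print(classify(tree, {"outlook": "sunny", "windy": False}))
```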

5
Q

What is discretization?

A

Discretization converts a continuous attribute into discrete bins, for example ages grouped as 20–29, 30–39, 40–49.
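A small sketch of equal-width binning that reproduces those age ranges. The function name discretize and the bin width of 10 are assumptions, and it expects whole-number ages.

```python
def discretize(age, width=10):
    """Map an integer age to the label of its equal-width bin."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

bins = [discretize(a) for a in [23, 39, 40]]
print(bins)
```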

5
Q

What is binarization?

A

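Binarization converts an attribute into a binary one using a threshold, for example values below 10 versus values of 10 or more.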
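A minimal sketch of threshold-based binarization. The threshold of 10 and the choice to map values at or above it to 1 are assumptions for illustration.

```python
def binarize(value, threshold=10):
    """Return 1 if the value is at or above the threshold, otherwise 0."""
    return 1 if value >= threshold else 0

flags = [binarize(v) for v in [1, 9, 10, 25]]
print(flags)
```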

6
Q

Different purity measures

A

Information gain and Gini index.
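The Gini index of a node is commonly computed as 1 minus the sum of squared class proportions, so a pure node scores 0. A minimal sketch, assuming labels arrive as a plain list (the function name gini is hypothetical):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

mixed = gini(["yes", "yes", "no", "no"])   # evenly mixed two-class node
pure = gini(["yes", "yes", "yes", "yes"])  # single-class node
print(mixed, pure)
```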

7
Q

Information gain

A

Information Gain (IG) measures how much uncertainty (entropy) is reduced if we split the dataset based on a particular attribute.
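In other words, IG is the parent's entropy minus the size-weighted average entropy of the children. A minimal sketch under that definition; the function names and the yes/no labels are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]
good_split = information_gain(parent, [["yes", "yes"], ["no", "no"]])
useless_split = information_gain(parent, [["yes", "no"], ["yes", "no"]])
print(good_split, useless_split)
```

A split that separates the classes perfectly removes all the uncertainty, while a split that leaves each child as mixed as the parent gains nothing.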

8
Q

Entropy

A

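Entropy measures the randomness (impurity) of the class labels in a set. It is 0 when all examples belong to one class and highest when the classes are evenly mixed.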
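For reference, the usual (Shannon) definition, where S is a set of examples, c is the number of classes, and p_i is the fraction of examples in S that belong to class i (the symbol names are assumptions, since the card does not fix any notation):

```latex
H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i
```

With two classes split 50/50, H(S) = 1 bit; when every example has the same class, H(S) = 0.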

9
Q

Precision

A

Precision is the ratio of true positives to all predicted positives: TP / (TP + FP).
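A small sketch of that ratio on made-up binary labels, where 1 marks the positive class. The function name and the data are assumptions.

```python
def precision(y_true, y_pred):
    """TP / (TP + FP); assumes at least one positive prediction."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp)

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
# 3 positive predictions, 2 of them correct.
print(precision(y_true, y_pred))
```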

10
Q

Recall

A

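Recall tells us how many of the actual positives the model finds: TP / (TP + FN). Unlike precision, which looks only at the examples the model predicted as positive, recall looks at all examples that are truly positive.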
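A small sketch of recall as TP / (TP + FN) on made-up binary labels, where 1 marks the positive class. The function name and the data are assumptions.

```python
def recall(y_true, y_pred):
    """TP / (TP + FN); assumes at least one actual positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
# 4 actual positives, 2 of them found by the model.
print(recall(y_true, y_pred))
```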

11
Q

When is precision useful?

A

Precision is more useful when we want confidence that the model's positive predictions are correct, that is, when false positives are costly.

12
Q

KNN (k-nearest neighbors) classification

A

Works by finding the “k” closest training data points to a sample and predicting the majority class among them. It is not great with high dimensionality because distances can lose meaning.
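A minimal sketch of the idea, assuming Euclidean distance and a simple majority vote. The 2-D points, the labels "A" and "B", and the function name knn_predict are made up for illustration; math.dist requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to query."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled training points: (coordinates, class label).
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B"),
]

print(knn_predict(train, (1.2, 1.4)))
print(knn_predict(train, (6.4, 6.1)))
```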
