Non-Parametric Classifiers Flashcards

(19 cards)

1
Q

How is a non-parametric classifier different to a parametric classifier

A

Non-parametric classifiers do not make assumptions about the underlying data distribution and can adapt more flexibly as more data becomes available

2
Q

5 characteristics of non-parametric classifiers

A

Non-parametric classifiers use the data directly at classification time

No explicit model/distribution of the data, i.e. they are not governed by a fixed set of parameters; the complexity of the model grows with the size of the dataset

No explicit learning stage

Infinite parameters (in theory): the number of parameters grows as more data is added, which allows for greater flexibility in fitting the data

Rely heavily on the training data to define the decision boundary between classes

3
Q

Advantages of parametric classifiers

A

The number of parameters is fixed, and typically less training data is needed

Once parameters are known, the training data can be discarded

4
Q

Advantages and disadvantages of non-parametric classifiers

A

More flexible

Often expensive and often require a lot of data to learn things that were assumed by parametric approaches

Poor choice when data is known to come from simple distributions

5
Q

What is K-NN

A

K Nearest Neighbour assigns a class to a new input based on the majority class of its k nearest neighbours in the feature space
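A minimal sketch of the idea in plain Python (the toy dataset and the `knn_classify` helper are illustrative, not from the card):

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of (features, label) pairs; distance is Euclidean."""
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy 2-D dataset: two well-separated clusters labelled 'A' and 'B'.
data = [((1, 1), 'A'), ((1, 2), 'A'), ((2, 1), 'A'),
        ((8, 8), 'B'), ((8, 9), 'B'), ((9, 8), 'B')]

print(knn_classify(data, (2, 2), k=3))  # 'A' — query sits in the 'A' cluster
```

Note that all training points must be kept around and scanned at prediction time, which is exactly the "no explicit learning stage, data used directly at classification time" property described earlier.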

6
Q

Strengths and weaknesses of KNN

A

S: simple to understand, no need for training, flexible with more data

W: computationally expensive for large datasets, performance can degrade with high-dimensional data

7
Q

How does changing k affect the decision boundaries

A

The greater the value of k, the smoother and less jagged the decision boundaries. The smaller the value of k, the more aggressive the decision boundaries and the more they are affected by outliers, though they can follow the training data more closely.

The smaller the value, the greater the chance of overfitting; the larger the value, the greater the chance of underfitting.
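The overfitting effect of a small k can be seen directly: a single stray point flips the prediction when k = 1 but is outvoted when k is larger. A minimal sketch (the toy data and helper are illustrative):

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# An 'A' cluster around (1, 1) with a single stray 'B' point inside it.
data = [((1, 1), 'A'), ((1, 2), 'A'), ((2, 1), 'A'), ((2, 2), 'A'),
        ((1.5, 1.5), 'B'),                       # the outlier
        ((9, 9), 'B'), ((9, 8), 'B'), ((8, 9), 'B')]

query = (1.6, 1.6)                               # sits right next to the outlier
print(knn_classify(data, query, k=1))  # 'B' — k=1 follows the outlier (overfits)
print(knn_classify(data, query, k=5))  # 'A' — a larger k smooths it out
```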

8
Q

What is a decision tree

A

Builds a tree structure where each internal node represents a decision based on an attribute, and each leaf node represents a class label
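A tiny hand-built tree makes the structure concrete (the attributes and labels here are made-up examples, not from the card): internal nodes are dicts that test one attribute, leaves are plain class labels.

```python
# Internal nodes test an attribute; leaves hold a class label.
tree = {
    'attr': 'outlook',
    'branches': {
        'sunny': {'attr': 'windy',
                  'branches': {True: 'stay in', False: 'go out'}},
        'rainy': 'stay in',
        'overcast': 'go out',
    },
}

def classify(node, sample):
    while isinstance(node, dict):             # descend until we hit a leaf
        node = node['branches'][sample[node['attr']]]
    return node                               # leaf = class label

print(classify(tree, {'outlook': 'sunny', 'windy': False}))  # 'go out'
```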

9
Q

Strengths of a decision tree

A

Intuitive, can handle both numerical and categorical data, no need for data scaling

10
Q

Weaknesses of a decision tree

A

Can easily overfit, especially with small datasets, unless pruning or regularisation techniques are used

11
Q

What is a random forest

A

Combines multiple decision trees (each trained on different samples of the data) to improve classification performance
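The "different samples" part is bootstrap sampling (bagging). A simplified sketch, using depth-1 trees (stumps) on 1-D data so it stays short; a real random forest also samples features at each split, which is omitted here:

```python
import random

def train_stump(sample):
    """Fit a depth-1 tree: the single threshold on x that best splits the labels."""
    labels = [y for _, y in sample]
    best = None
    for t in sorted({x for x, _ in sample}):
        left = [y for x, y in sample if x <= t]
        right = [y for x, y in sample if x > t]
        if not left or not right:
            continue
        l_lab = max(set(left), key=left.count)       # majority label each side
        r_lab = max(set(right), key=right.count)
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    if best is None:                                 # degenerate bootstrap sample
        lab = max(set(labels), key=labels.count)
        return lambda x: lab
    _, t, l_lab, r_lab = best
    return lambda x: l_lab if x <= t else r_lab

def random_forest(data, n_trees=25, seed=0):
    rng = random.Random(seed)
    trees = [train_stump([rng.choice(data) for _ in data])   # bootstrap sample
             for _ in range(n_trees)]
    def predict(x):
        votes = [tree(x) for tree in trees]
        return max(set(votes), key=votes.count)              # majority vote
    return predict

data = [(1, 'A'), (2, 'A'), (3, 'A'), (7, 'B'), (8, 'B'), (9, 'B')]
forest = random_forest(data)
print(forest(2), forest(8))   # the vote should recover 'A' and 'B'
```

Because each tree sees a different resampling of the data, individual trees' mistakes tend to cancel out in the vote, which is where the reduced overfitting risk comes from.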

12
Q

Strengths of a random forest

A

Reduces the risk of overfitting, highly accurate, handles large datasets and features well

13
Q

Weaknesses of a random forest

A

Less interpretable than a single decision tree, requires careful tuning

14
Q

What is kernel density estimation

A

Estimates the probability density function (pdf) of the data by summing the influence of a kernel placed at each data point

15
Q

What is a kernel

A

Think of placing a sandbag at each data point: where points lie very close together, the sandbags pile up into a hill. This is where the kernels have the most combined influence

16
Q

How is the pdf calculated in Kernel density estimation

A

The pdf is computed using a kernel function K, which is a function of the distance from each training point scaled by a width (bandwidth) parameter h. The formula:

p(x) = (1 / (n · h)) · Σ_{i=1..n} K((x − x_i) / h)

where n is the number of training points and K is commonly a Gaussian.
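A minimal sketch of this estimate with a Gaussian kernel (toy data points and the `gaussian_kde` name are illustrative):

```python
import math

def gaussian_kde(x, points, h=0.5):
    """p(x) = (1/(n*h)) * sum_i K((x - x_i)/h), K = standard-normal density."""
    kernel = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return sum(kernel((x - xi) / h) for xi in points) / (len(points) * h)

points = [1.0, 1.2, 1.4, 4.0]
print(gaussian_kde(1.2, points))  # high density: three points pile up here
print(gaussian_kde(8.0, points))  # near zero: far from all the data
```

This is the sandbag picture in code: the three points near 1.2 pile up into a hill, while the density far from any data falls to almost nothing.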

17
Q

Strengths of KDE

A

Can model complex distributions, no need for a fixed number of parameters

18
Q

Weaknesses of KDE

A

High computational cost with large datasets, kernel choice and bandwidth (h) selection are very important

19
Q

How is KDE used for classification

A

We use KDE to estimate, for each class, how likely a given data point is under that class; then, using these likelihoods with Bayes’ theorem, we classify the point as the class with the highest posterior probability
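A sketch of the whole pipeline on 1-D toy data (class names, data, and the helper names are illustrative). Since the evidence term of Bayes' theorem is the same for every class, it cancels in the argmax, so comparing prior × likelihood is enough:

```python
import math

def gaussian_kde(x, points, h=0.5):
    """KDE with a standard-normal kernel and bandwidth h."""
    kernel = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return sum(kernel((x - xi) / h) for xi in points) / (len(points) * h)

def kde_classify(x, classes):
    """Pick the class maximising prior * KDE likelihood (Bayes' rule);
    the shared evidence term cancels, so it is ignored in the argmax."""
    total = sum(len(pts) for pts in classes.values())
    scores = {c: (len(pts) / total) * gaussian_kde(x, pts)
              for c, pts in classes.items()}
    return max(scores, key=scores.get)

classes = {'A': [1.0, 1.5, 2.0], 'B': [6.0, 6.5, 7.0]}
print(kde_classify(1.2, classes))  # 'A'
print(kde_classify(6.8, classes))  # 'B'
```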