How is a non-parametric classifier different from a parametric classifier
Non-parametric classifiers do not make assumptions about the underlying data distribution and can adapt more flexibly as more data becomes available
5 characteristics of non-parametric classifiers
Non-parametric classifiers use the data directly at classification time
No explicit model/distribution of the data, i.e. they are not governed by a fixed set of parameters. The complexity of the model grows with the size of the dataset
No explicit learning stage
Infinite parameters (in theory): the effective number of parameters grows as more data is added, which allows greater flexibility in fitting the data
Rely heavily on the training data to define the decision boundary between classes
Advantages of parametric classifiers
The number of parameters is fixed, and they typically need less training data
Once parameters are known, the training data can be discarded
Advantages and disadvantages of non-parametric classifiers
More flexible
Often computationally expensive, and typically require a lot of data to learn things that parametric approaches simply assume
Poor choice when data is known to come from simple distributions
What is K-NN
K Nearest Neighbour assigns a class to a new input based on the majority class of its k nearest neighbours in the feature space
Strengths and weakness of KNN
S: simple to understand, no need for training, flexible with more data
W: computationally expensive for large datasets, performance can degrade with high-dimensional data
How does changing k affect the decision boundaries
The greater the value of k, the smoother (less jagged) the decision boundaries. The smaller the value of k, the more closely the boundaries follow individual training points, so they are more affected by outliers but can fit the data more tightly
A small k increases the chance of overfitting; a large k increases the chance of underfitting
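The majority-vote idea above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the toy two-cluster dataset is made up for the example.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    # Distance from the query point to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    # Labels of the k nearest training points
    nearest = [y_train[i] for i in np.argsort(dists)[:k]]
    # Majority vote among the k neighbours
    return Counter(nearest).most_common(1)[0][0]

# Two well-separated 2-D clusters (illustrative data)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(X, y, np.array([0.15, 0.1]), k=3))  # "a"
```

Note there is no training step: the whole dataset is scanned at prediction time, which is why K-NN gets expensive on large datasets.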
What is a decision tree
Builds a tree structure where each internal node represents a decision based on an attribute, and each leaf node represents a class label
Strengths of a decision tree
Intuitive, can handle both numerical and categorical data, no need for data scaling
Weaknesses of a decision tree
Can easily overfit, especially with small datasets, unless pruning or regularisation techniques are used
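The internal-node/leaf structure can be shown with a tiny hand-built tree. The attribute names and thresholds below are illustrative (loosely iris-like), not learned from data.

```python
# A hand-built tree: dicts are internal nodes (attribute test),
# bare strings are leaf nodes (class labels)
tree = {
    "attribute": "petal_length", "threshold": 2.5,
    "left": "setosa",                       # leaf
    "right": {                              # internal node
        "attribute": "petal_width", "threshold": 1.7,
        "left": "versicolor", "right": "virginica",
    },
}

def predict(node, sample):
    # Walk from the root until a leaf (a bare class label) is reached
    if not isinstance(node, dict):
        return node
    branch = "left" if sample[node["attribute"]] < node["threshold"] else "right"
    return predict(node[branch], sample)

print(predict(tree, {"petal_length": 5.0, "petal_width": 2.0}))  # virginica
```

Because each split is a simple threshold test on one attribute, the tree handles numerical and categorical data and needs no feature scaling.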
What is a random forest
Combines multiple decision trees (each trained on different samples of the data) to improve classification performance
Strengths of a random forest
Reduces the risk of overfitting, highly accurate, handles large datasets and features well
Weaknesses of a random forest
Less interpretable than a single decision tree, requires careful tuning
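The bagging-plus-voting idea can be sketched with one-split "trees" (decision stumps) on a toy 1-D dataset. This is a simplification of a real random forest (no random feature subsets, trivial trees); the data and the stump-fitting rule are made up for the example.

```python
import random
from statistics import mean
from collections import Counter

# Toy 1-D dataset: class "a" near 0, class "b" near 1
data = [(0.1, "a"), (0.2, "a"), (0.3, "a"), (0.7, "b"), (0.8, "b"), (0.9, "b")]

def train_stump(sample):
    # One-split "tree": threshold halfway between the class means of this sample
    a = [x for x, y in sample if y == "a"]
    b = [x for x, y in sample if y == "b"]
    t = (mean(a) + mean(b)) / 2 if a and b else 0.5
    return lambda x: "a" if x < t else "b"

# Each tree is trained on a bootstrap sample (drawn with replacement)
rng = random.Random(0)
forest = [train_stump([rng.choice(data) for _ in data]) for _ in range(25)]

def forest_predict(x):
    # Majority vote over all the trees' predictions
    return Counter(t(x) for t in forest).most_common(1)[0][0]

print(forest_predict(0.15))  # "a"
```

Averaging many slightly different trees is what reduces the variance (and hence the overfitting risk) of any single tree.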
What is kernel density estimation
Estimates the probability density function (pdf) of the data by summing the influence of kernels placed at each data point
What is a kernel
If a sandbag is placed at each data point, points that are very close together pile up into a hill; this is where the kernels have the most combined influence
How is the pdf calculated in Kernel density estimation
The pdf is computed using a kernel function K applied to the distance between the query point x and each training point x_i, scaled by a width (bandwidth) parameter h. The formula:
p(x) ≈ (1 / (n·h)) · Σ_{i=1..n} K((x − x_i) / h)
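The standard KDE estimate, with a Gaussian kernel, can be computed directly. A minimal 1-D sketch (the sample data is illustrative):

```python
import math

def gaussian_kernel(u):
    # Standard normal density: the "sandbag" shape placed at each point
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde_pdf(x, data, h):
    # Average of kernels centred on each training point, scaled by bandwidth h
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)

data = [1.0, 1.2, 1.1, 3.0, 3.1]
# Density is high near the cluster around 1.1 and low between the clusters
print(kde_pdf(1.1, data, h=0.3))
print(kde_pdf(2.0, data, h=0.3))
```

The choice of h matters: too small gives a spiky estimate (one hill per point), too large smears the clusters together.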
Strengths of KDE
Can model complex distributions, no need for a fixed number of parameters
Weaknesses of KDE
High computational cost with large datasets, kernel choice and bandwidth (h) selection are very important
How is KDE used for classification
For each class, use KDE on that class's training data to estimate the likelihood of a given data point, then combine these likelihoods with the class priors via Bayes' theorem and assign the class with the highest posterior
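A sketch of this classifier in Python, assuming 1-D features and a Gaussian kernel (the class data and bandwidth are illustrative):

```python
import math

def kde_pdf(x, data, h):
    # Gaussian-kernel density estimate from the class's training points
    k = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return sum(k((x - xi) / h) for xi in data) / (len(data) * h)

def classify(x, class_data, h=0.3):
    # Posterior ∝ prior (class fraction) × KDE likelihood; pick the argmax class
    n = sum(len(d) for d in class_data.values())
    scores = {c: (len(d) / n) * kde_pdf(x, d, h) for c, d in class_data.items()}
    return max(scores, key=scores.get)

classes = {"a": [1.0, 1.2, 1.1], "b": [3.0, 3.1, 2.9]}
print(classify(1.05, classes))  # "a"
print(classify(3.0, classes))   # "b"
```

This is just Bayes' rule with the class-conditional densities estimated non-parametrically instead of assumed (e.g. Gaussian), which is the whole appeal of the KDE approach.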