What is the input/feature
The variables you use to make a prediction
What is the output/target
What you are trying to predict
What is a data matrix
A table of many observations: each row is one observation and each column is one feature
What is a target vector
The column of correct answers/labels for each row
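A small sketch of how a data matrix and its target vector line up. The values and the labels "cat"/"dog" are made-up toy data, not taken from any real dataset.

```python
# Hypothetical toy data matrix: each row is one observation,
# each column is one feature.
X = [
    [1.0, 2.0],  # observation 0: (feature 0, feature 1)
    [2.0, 1.0],  # observation 1
    [6.0, 5.0],  # observation 2
    [7.0, 6.0],  # observation 3
]

# Target vector: one correct label per row of X.
y = ["cat", "cat", "dog", "dog"]

n_observations = len(X)   # number of rows
n_features = len(X[0])    # number of columns

# Row i of X and entry i of y describe the same observation,
# so the two must have the same length.
assert len(y) == n_observations
```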
What is a model
A mathematical function that maps inputs to outputs
What are parameters
The numbers inside a model that get adjusted during training
What is training
Tuning the parameters so the model fits the data
What is machine learning about fundamentally
Finding patterns in data to make predictions on new, unseen data
What is the idea behind the 1-Nearest Neighbour algorithm
To classify a new point, find the closest training example and assign the same label
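A minimal sketch of the 1-NN idea, assuming Euclidean distance (via Python's standard `math.dist`) as the measure of closeness. The function name `nearest_neighbour_predict` and the toy data are illustrative only.

```python
import math

def nearest_neighbour_predict(X_train, y_train, query):
    """Return the label of the training point closest to `query`."""
    # Distance from the query to every training example.
    distances = [math.dist(row, query) for row in X_train]
    # Index of the smallest distance.
    closest = min(range(len(X_train)), key=lambda i: distances[i])
    return y_train[closest]

# Hypothetical toy training set.
X_train = [[1.0, 2.0], [2.0, 1.0], [6.0, 5.0], [7.0, 6.0]]
y_train = ["cat", "cat", "dog", "dog"]

# The query sits next to the two "cat" points, so it is labelled "cat".
label = nearest_neighbour_predict(X_train, y_train, [1.5, 1.5])
```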
What is the decision boundary in the 1-Nearest Neighbour
If you apply 1-NN to every point in the feature space, you can draw a boundary separating the regions where the algorithm predicts X from the regions where it predicts Y. This is the decision boundary
What is the main problem with 1-NN
It is sensitive to outliers: a single outlier creates a small ‘island’ of the wrong class in the middle of the correct region.
What is the fix to the outlier problem in 1-NN
k-Nearest Neighbours
Instead of 1 neighbour, use the k nearest neighbours and take a majority vote
What is the tradeoff when choosing the size of k
k too small = sensitive to noise
k too big = ignores local structure
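The majority vote and the tradeoff in k can be sketched as follows. This assumes Euclidean distance and uses made-up toy data: a single "dog" outlier placed inside a "cat" cluster, with "dog" as the more common class overall. The name `knn_predict` is illustrative.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, query, k):
    """Majority vote among the k training points closest to `query`."""
    # Training indices sorted from nearest to farthest.
    order = sorted(range(len(X_train)),
                   key=lambda i: math.dist(X_train[i], query))
    votes = Counter(y_train[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: 4 cats, 1 "dog" outlier among them, 5 distant dogs.
X_train = [
    [1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0],              # cats
    [1.5, 1.5],                                                  # outlier
    [6.0, 6.0], [6.0, 7.0], [7.0, 6.0], [7.0, 7.0], [6.5, 6.5],  # dogs
]
y_train = ["cat"] * 4 + ["dog"] * 6

query = [1.6, 1.6]  # lies inside the cat cluster, right next to the outlier

small_k = knn_predict(X_train, y_train, query, k=1)    # only the outlier votes
medium_k = knn_predict(X_train, y_train, query, k=3)   # outlier is outvoted
large_k = knn_predict(X_train, y_train, query, k=10)   # every point votes
```

With this data, k = 1 follows the outlier and predicts "dog", k = 3 predicts "cat", and k = 10 predicts "dog" again because the vote collapses to the overall majority class regardless of where the query lies.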
Euclidean distance formula
Distance = sqrt ((x1 - x2)^2 + (y1 - y2)^2)
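The same formula as a short function, written so that it also works for points with more than two coordinates (a natural generalisation, assumed here rather than stated in the notes). The name `euclidean_distance` is illustrative.

```python
import math

def euclidean_distance(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# The classic 3-4-5 right triangle: sqrt(3^2 + 4^2) = 5.
d = euclidean_distance((0.0, 0.0), (3.0, 4.0))
```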
What are the 2 causes of uncertainty in ML outcomes
Difference between classification and regression
In classification, the target is discrete (labels like Cat/Dog)
In regression, the target is a real number
What is the goal of minimising the sum of squared distances
To train the model: tune its parameters so that the predicted function fits the observed data as closely as possible
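One way to picture this is fitting a straight line y = slope * x + intercept. The sketch below assumes a one-feature linear model and uses the standard closed-form least-squares solution; the function names and the data are made up for illustration.

```python
def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the data and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

def fit_line(xs, ys):
    """Return the slope and intercept that minimise the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)  # the trained parameters

# The fitted parameters give a smaller error than an arbitrary guess.
fitted_error = sum_squared_errors(xs, ys, slope, intercept)
guess_error = sum_squared_errors(xs, ys, 1.0, 1.0)
```

Here the two parameters are the slope and the intercept, and training means choosing the pair with the smallest sum of squared errors.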