Describe the parametric approach to building a classifier
Choose a distribution for P(x|ωi) with parameters θ
For each class ωi, find the parameters that best fit the training data
Determine the prior probabilities P(ωi): the share of the training data that each class makes up
Compute P(ωi|x) for each class and assign x to the class with the highest posterior
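A minimal sketch of these steps, assuming one-dimensional data and a Gaussian as the chosen distribution (the notes do not fix a distribution). The function names and the toy heights are made up for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian(values):
    """Best-fit (maximum likelihood) mean and variance for one class."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var

def gaussian_pdf(x, mean, var):
    """Assumed class-conditional density P(x|ωi)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def train(samples):
    """samples: list of (x, label) pairs. Returns {label: (mean, var, prior)}."""
    by_class = defaultdict(list)
    for x, label in samples:
        by_class[label].append(x)
    total = len(samples)
    model = {}
    for label, values in by_class.items():
        mean, var = fit_gaussian(values)
        prior = len(values) / total  # share of the training data in this class
        model[label] = (mean, var, prior)
    return model

def posteriors(model, x):
    """P(ωi|x) from Bayes' rule, normalised over all classes."""
    joint = {label: gaussian_pdf(x, m, v) * p for label, (m, v, p) in model.items()}
    evidence = sum(joint.values())
    return {label: j / evidence for label, j in joint.items()}

def classify(model, x):
    """Pick the class with the highest posterior."""
    post = posteriors(model, x)
    return max(post, key=post.get)

# Hypothetical toy data: heights in cm with made-up class labels.
training = [(150.0, "child"), (155.0, "child"), (148.0, "child"),
            (175.0, "adult"), (180.0, "adult"), (170.0, "adult"), (178.0, "adult")]
model = train(training)
print(classify(model, 152.0))  # child
print(classify(model, 176.0))  # adult
```

A different distribution could be swapped in by replacing fit_gaussian and gaussian_pdf; the prior and posterior steps stay the same.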
What does p(x) mean
A function of x, just like f(x); here it gives the probability density at x
What is the likelihood function of the parameters θ with respect to X
p(X; θ) = p(x1, x2, …, xN; θ)
We can think of this as how probable it is to observe all these datapoints under the distribution with parameters θ
What is the maximum likelihood estimate
The parameter values θ that make the observed data X as likely as possible under the model
It is found by choosing the θ that maximises the likelihood: θ_ML = argmax over θ of p(X; θ)
Likelihood: the probability of the training data given the class and its parameters θ
Why maximise the likelihood
By maximising the likelihood function, we find the parameter values that best explain the observed data
In many cases the datapoints are independent of one another (e.g. people's heights). What do we do so that each factor of the likelihood has only one input
Factorise the probabilities
P(A,B) = P(A)P(B)
How is the likelihood written as a product over all the datapoints
p(X; θ) = p(x1; θ) p(x2; θ) … p(xN; θ) = ∏ p(xk; θ), for k = 1, …, N
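As a rough illustration of the factorised likelihood, the sketch below multiplies the per-point densities p(xk; θ) for independent datapoints. It assumes a one-dimensional Gaussian with θ = (mean, variance); the heights are invented values.

```python
import math

def gaussian_pdf(x, mean, var):
    """Assumed density p(x; θ) with θ = (mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def likelihood(data, mean, var):
    """p(X; θ) as the product of the per-point densities p(xk; θ)."""
    result = 1.0
    for x in data:
        result *= gaussian_pdf(x, mean, var)
    return result

# Hypothetical heights in cm, treated as independent of one another.
heights = [168.0, 172.0, 175.0, 169.0, 171.0]

near = likelihood(heights, mean=171.0, var=9.0)  # mean close to the data
far = likelihood(heights, mean=160.0, var=9.0)   # mean far from the data
print(near > far)  # True
```

Parameters that sit close to the data give a larger product, which is what maximising the likelihood exploits.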
How do you find the minimum and maximum of f(x)
Find the turning points
df(x)/dx = 0
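A small worked example of finding a turning point, using a made-up quadratic. The second-derivative check is an addition to the notes; it tells you whether the turning point is a maximum or a minimum.

```latex
f(x) = -(x - 3)^2 + 5
\qquad
\frac{df(x)}{dx} = -2(x - 3) = 0 \;\Rightarrow\; x = 3
\qquad
\frac{d^2 f(x)}{dx^2} = -2 < 0 \;\Rightarrow\; \text{a maximum, with } f(3) = 5
```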
Using a cunning trick, instead of maximising the likelihood, what should we maximise
the log likelihood
ln p(X; θ)
How do we find the best parameters using the log likelihood
Start with the likelihood
Take the log of the likelihood; this turns the product into a sum
Take the derivative of the log likelihood and set it to zero
Solve for θ
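The four steps can be worked through for one concrete case. The derivation below assumes independent datapoints from a Gaussian whose variance σ² is known, so the only parameter to estimate is the mean μ.

```latex
% 1. Start with the likelihood
p(X; \mu) = \prod_{k=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}
            \exp\!\left( -\frac{(x_k - \mu)^2}{2\sigma^2} \right)

% 2. Take the log: the product becomes a sum
\ln p(X; \mu) = -\frac{N}{2} \ln(2\pi\sigma^2)
                - \frac{1}{2\sigma^2} \sum_{k=1}^{N} (x_k - \mu)^2

% 3. Differentiate with respect to \mu and set to zero
\frac{\partial \ln p(X; \mu)}{\partial \mu}
  = \frac{1}{\sigma^2} \sum_{k=1}^{N} (x_k - \mu) = 0

% 4. Solve for \mu
\hat{\mu} = \frac{1}{N} \sum_{k=1}^{N} x_k
```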
What is the advantage of taking the log likelihood
It turns the product of the per-datapoint likelihoods into a sum of log likelihoods, which is easier to differentiate and to compute
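A further practical point, not stated in the notes but easy to check: with many datapoints the raw product can underflow to zero in floating point, while the sum of logs stays finite. A minimal sketch, assuming a standard Gaussian and an invented dataset of 2000 evenly spaced points:

```python
import math

def gaussian_pdf(x, mean=0.0, var=1.0):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical dataset: 2000 points spread evenly over [-1, 1].
data = [-1.0 + 2.0 * i / 1999 for i in range(2000)]

# Product form: every factor is below 0.5, so the running product shrinks to 0.0.
likelihood = 1.0
for x in data:
    likelihood *= gaussian_pdf(x)

# Sum-of-logs form: each term is a moderate negative number.
log_likelihood = sum(math.log(gaussian_pdf(x)) for x in data)

print(likelihood)      # 0.0 (underflow)
print(log_likelihood)  # a finite negative number
```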
What is the negative log likelihood
The negative of the log likelihood, -ln p(X; θ); you minimise it rather than maximise it
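One reason to prefer the negative form is that most numerical optimisers are written as minimisers. The sketch below minimises the negative log likelihood by plain gradient descent. It assumes a Gaussian with known variance 1 and an unknown mean; the data, learning rate and step count are arbitrary illustrative choices.

```python
import math

def nll(data, mean, var=1.0):
    """Negative log likelihood, -ln p(X; mean), for a Gaussian with known variance."""
    n = len(data)
    return 0.5 * n * math.log(2 * math.pi * var) + sum((x - mean) ** 2 for x in data) / (2 * var)

def nll_gradient(data, mean, var=1.0):
    """Derivative of the negative log likelihood with respect to the mean."""
    return -sum(x - mean for x in data) / var

# Hypothetical measurements.
data = [4.8, 5.1, 5.3, 4.9, 5.4]

estimate = 0.0        # deliberately poor starting guess
learning_rate = 0.05  # arbitrary small step size
for _ in range(200):
    estimate -= learning_rate * nll_gradient(data, estimate)

print(estimate)               # approaches the sample mean
print(sum(data) / len(data))  # closed-form answer for comparison
```

For this model the minimiser coincides with the sample mean, so the loop is only a check; gradient descent becomes useful when no closed form exists.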
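What are the maximum likelihood estimates of the mean and variance of a Gaussian, and therefore its parameters
Mean: μ = (1/N) Σ xk, the sample mean
Variance: σ² = (1/N) Σ (xk - μ)², the average squared deviation from that mean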
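For a one-dimensional Gaussian with independent datapoints, the standard closed-form result is the sample mean and the average squared deviation (dividing by N, not N - 1). The sketch below computes both and checks that nearby parameter values give a lower log likelihood; the data are invented.

```python
import math

def gaussian_mle(data):
    """Closed-form maximum likelihood estimates of a Gaussian's mean and variance."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n  # divides by N, not N - 1
    return mean, var

def log_likelihood(data, mean, var):
    """ln p(X; mean, var) for independent Gaussian datapoints."""
    n = len(data)
    return -0.5 * n * math.log(2 * math.pi * var) - sum((x - mean) ** 2 for x in data) / (2 * var)

# Hypothetical heights in cm.
data = [168.0, 172.0, 175.0, 169.0, 171.0]
mean_ml, var_ml = gaussian_mle(data)
print(mean_ml, var_ml)  # 171.0 6.0

# Nudging either parameter away from the estimate lowers the log likelihood.
best = log_likelihood(data, mean_ml, var_ml)
print(best > log_likelihood(data, mean_ml + 0.5, var_ml))  # True
print(best > log_likelihood(data, mean_ml, var_ml + 1.0))  # True
```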