What is the goal of linear regression?
To predict the value of a target variable t for a new input x, given a training dataset comprising N observations {x_n, t_n}.
How does the simplest approach to linear regression work?
In the simplest approach, linear regression directly constructs an appropriate function y(x) whose value for a new input x is the prediction for the corresponding value of t.
How can linear regression be approached from a probabilistic perspective?
From a probabilistic perspective, we aim to model the predictive distribution p(t∣x), which expresses the uncertainty about the value of t for each value of x.
Why is the probabilistic approach useful in linear regression?
The probabilistic approach allows us to make predictions of t for any new value of x, minimizing the expected value of a suitably chosen loss function.
What is the goal of (polynomial) curve fitting?
To exploit the training set to discover the underlying function and make predictions for new inputs, even though individual observations are corrupted by noise.
Why is polynomial curve fitting considered linear in the parameters?
Because the model y(x, w) = w_0 + w_1 x + ... + w_M x^M is a linear function of the coefficients w, even though it is a nonlinear function of the input x.
How are the coefficients in polynomial curve fitting determined?
By fitting the polynomial to the training data: the coefficients w are chosen to minimize an error function that measures the misfit between the predictions y(x_n, w) and the corresponding targets t_n.
What is the error function used in polynomial curve fitting?
The sum-of-squares error E(w) = (1/2) Σ_n {y(x_n, w) − t_n}^2, which is nonnegative and equals zero if and only if the polynomial passes exactly through every training point.
How is the optimal polynomial determined in polynomial curve fitting?
By minimizing E(w) with respect to w. Because E(w) is a quadratic function of the coefficients, its derivatives are linear in w, so the minimization has a unique closed-form solution w*.
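A minimal pure-Python sketch of this fitting procedure (the data, function names, and solver are illustrative, not taken from the source): build the design matrix of powers of x, form the normal equations for the sum-of-squares error, and solve them.

```python
def design_matrix(xs, M):
    # Row n holds the powers x_n^0, x_n^1, ..., x_n^M.
    return [[x ** j for j in range(M + 1)] for x in xs]

def solve(A, b):
    # Gaussian elimination with partial pivoting on a small square system.
    n = len(A)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (aug[r][n] - s) / aug[r][r]
    return w

def fit_polynomial(xs, ts, M):
    # Normal equations for the sum-of-squares error: (Phi^T Phi) w = Phi^T t.
    Phi = design_matrix(xs, M)
    n, m = len(xs), M + 1
    A = [[sum(Phi[k][i] * Phi[k][j] for k in range(n)) for j in range(m)]
         for i in range(m)]
    b = [sum(Phi[k][i] * ts[k] for k in range(n)) for i in range(m)]
    return solve(A, b)

def predict(w, x):
    return sum(wj * x ** j for j, wj in enumerate(w))

# Noise-free quadratic data: a quadratic fit should recover t = 1 + 2x + 3x^2.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ts = [1 + 2 * x + 3 * x ** 2 for x in xs]
w = fit_polynomial(xs, ts, M=2)
print([round(c, 6) for c in w])
```

With exact (noise-free) data the minimizer reproduces the generating coefficients; with noisy data it returns the least-squares compromise.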
What is model selection in polynomial curve fitting?
Choosing the order M of the polynomial, which controls the number of free parameters and hence the complexity of the model.
What is overfitting in polynomial curve fitting?
The situation in which a high-order polynomial fits the training data very well (even exactly) but oscillates wildly between the data points, giving a poor representation of the underlying function and poor predictions for new inputs.
How can overfitting be detected?
By evaluating the fitted model on a separate test set: overfitting shows up as a small training error combined with a large test error.
What metric is used to assess generalization performance?
The root-mean-square (RMS) error E_RMS = sqrt(2E(w*)/N), which allows comparison across datasets of different sizes and is measured on the same scale (and in the same units) as the target t.
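A quick numeric check of the RMS error formula E_RMS = sqrt(2E(w*)/N), where E(w*) = (1/2) Σ_n {y(x_n, w*) − t_n}^2. The predictions and targets below are made-up illustrative numbers.

```python
import math

preds = [0.9, 0.2, -0.7]   # y(x_n, w*), illustrative
ts    = [1.0, 0.0, -0.5]   # targets t_n, illustrative
N = len(ts)

# Sum-of-squares error, then its RMS version.
E = 0.5 * sum((y - t) ** 2 for y, t in zip(preds, ts))
E_rms = math.sqrt(2 * E / N)
print(E_rms)
```

Dividing by N before the square root is what makes E_RMS comparable across datasets of different sizes.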
Why does performance get worse as the polynomial order M increases?
As M increases, the model gains flexibility and the magnitudes of the fitted coefficients typically become large: the polynomial becomes increasingly tuned to the random noise on the targets rather than to the underlying function, so the test error grows even as the training error falls.
What happens when M=9 and the training set contains 10 data points?
The polynomial has 10 coefficients, so it can be tuned to pass exactly through every training point: the training error is zero, but the curve oscillates wildly between the points and the test error is very large.
What happens when M=0 and the training set contains 10 data points?
The model is a constant (a single coefficient w_0), which cannot capture any variation in the data: it underfits, and both training and test errors are large.
What happens when M=1 and the training set contains 10 data points?
The model is a straight line (two coefficients), which is still too inflexible to capture the curvature of the underlying function: it also underfits, with large training and test errors.
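A scaled-down, self-contained analogue of these regimes (synthetic data, illustrative values, not from the source): with N = 4 points, a degree M = N − 1 = 3 polynomial interpolates the training set exactly (zero training error), while M = 0 and M = 1 underfit.

```python
import math

xs = [0.0, 1/3, 2/3, 1.0]
ts = [0.1, 0.9, -0.8, -0.2]   # noisy samples of a sinusoid-like curve (made up)

def lagrange(xs, ts, x):
    # Degree-(N-1) interpolating polynomial: passes through every point.
    total = 0.0
    for i, (xi, ti) in enumerate(zip(xs, ts)):
        term = ti
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def rms(preds, ts):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, ts)) / len(ts))

# M = 0: the best constant under sum-of-squares error is the target mean.
mean_t = sum(ts) / len(ts)
err_m0 = rms([mean_t] * len(ts), ts)

# M = 1: closed-form ordinary least-squares line.
mx = sum(xs) / len(xs)
slope = (sum((x - mx) * (t - mean_t) for x, t in zip(xs, ts))
         / sum((x - mx) ** 2 for x in xs))
err_m1 = rms([mean_t + slope * (x - mx) for x in xs], ts)

# M = 3 (= N - 1): exact interpolation, zero training error.
err_m3 = rms([lagrange(xs, ts, x) for x in xs], ts)

print(err_m0, err_m1, err_m3)
```

Training error alone rewards the M = 3 fit, which is exactly why a separate test set is needed to expose its wild behaviour between the data points.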
What general principle can be learned from overfitting?
Overfitting is a general property of maximum likelihood estimation. It can also occur in deep learning when training on a small dataset, leading to poor generalization.
What is the effect of adding observations to the training set on training error?
Training error typically increases (or stays similar), because the model must now fit more points and can no longer accommodate each one so closely.
What is the effect of adding observations to the training set on test error?
Test error typically decreases because a larger training set helps the model generalize better to unseen data, reducing overfitting.
What is the effect of removing observations from the training set on training error?
Training error typically decreases, because a flexible model can fit a smaller number of points more closely.
What is the effect of removing observations from the training set on test error?
Test error typically increases because the model is more prone to overfitting the smaller training set.
What is the effect of adding observations to the test set?
The fitted model is unchanged, since the test set plays no role in training, but the estimate of the generalization error becomes more reliable because it is averaged over more observations.
What is the effect of removing observations from the test set?
The fitted model is again unchanged, but the test-error estimate becomes noisier and less reliable.