What is a common pitfall when using frequentist statistics?
Frequentist methods typically produce a single point estimate that minimizes average error, without quantifying the uncertainty around that prediction.
How is data modeled in the frequentist framework?
The target t is treated as a noisy observation drawn from a conditional distribution p(t∣x), so for each input x there is a distribution over targets rather than a single deterministic value.
What does p(t∣x) represent in the frequentist framework?
The conditional probability distribution of the target t given the input x; it captures the uncertainty in t that remains even when x is known.
What is the definition of the expected loss function E[L] in machine learning?
The average loss over all possible values of x and t, weighted by their joint probability, expressed as a double integral over x and t.
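Written out for the standard squared loss, in notation consistent with the derivation in the following cards:

```latex
\mathbb{E}[L] = \iint \{y(x) - t\}^2 \, p(x, t) \, dx \, dt
```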
What does minimizing the expected loss function involve?
Taking the derivative with respect to the model’s prediction y(x) and setting it to zero.
What result do you get when taking the derivative of the expected loss function with respect to y(x)?
The functional derivative δE[L]/δy(x) = 2 ∫ {y(x) − t} p(x, t) dt.
How do you find the optimal prediction y(x) after taking the derivative of the expected loss function?
By setting the derivative to zero and solving, you find that the optimal prediction y(x) is the expected value of t given x, denoted as E[t∣x].
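A minimal numpy sketch of this result: for a fixed x, the sample mean of t (an estimate of E[t∣x]) achieves the lowest average squared loss among candidate predictions. The choice of sin(x) as the conditional mean and the 0.3 noise level are illustrative assumptions, not from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fix x and draw many targets t ~ p(t|x); here (an assumption for
# illustration) p(t|x) is Gaussian with mean sin(x) and std 0.3.
x = 1.0
t = np.sin(x) + 0.3 * rng.standard_normal(100_000)

# Average squared loss for a few candidate predictions y(x).
candidates = {
    "E[t|x] (sample mean)": t.mean(),
    "sin(x) (true mean)": np.sin(x),
    "offset guess": np.sin(x) + 0.5,
}
losses = {name: np.mean((y - t) ** 2) for name, y in candidates.items()}

# The conditional mean attains the smallest average loss.
best = min(losses, key=losses.get)
print(best, losses)
```

The sample mean is the exact minimizer of the empirical squared loss, so it always wins this comparison.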
If you do not use a model, how can you estimate the best value of t for a given x?
By taking the expected value of t at that fixed value of x.
What is the purpose of adding zero in the form of −E[t∣x]+E[t∣x]=0 when decomposing the expected loss function?
The purpose is to separate the prediction y(x) from the true expected value E[t∣x], which serves as the best predictor of t given x.
What happens to the cross term 2(y(x)−E[t∣x])(E[t∣x]−t) when taking the expectation of the expanded loss function?
The cross term disappears because E[t∣x]−t has zero mean.
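The vanishing of the cross term can be checked directly. Because y(x) and E[t∣x] are fixed once x is given, they factor out of the expectation over t:

```latex
\mathbb{E}\big[\, 2\{y(x)-\mathbb{E}[t\mid x]\}\{\mathbb{E}[t\mid x]-t\} \,\big]
= 2\{y(x)-\mathbb{E}[t\mid x]\}\,\mathbb{E}\big[\mathbb{E}[t\mid x]-t \,\big|\, x\big] = 0,
```

since the inner expectation is E[t∣x] − E[t∣x] = 0.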
What are the two main components left after taking the expected value of the expanded loss function?
A term {y(x) − E[t∣x]}^2 measuring how far the prediction is from the best predictor, and a term E[{E[t∣x] − t}^2] measuring the intrinsic noise in t.
Why is the variance of t always present in the model?
Because the noise term E[{E[t∣x] − t}^2] does not depend on the choice of y(x); it reflects intrinsic randomness in the data, so no model can reduce it.
What are the components of the bias-variance decomposition of the expected loss?
Squared bias, variance, and noise.
How can bias and variance be controlled in a model?
By adjusting model flexibility, for example through regularization: stronger regularization raises bias and lowers variance, while weaker regularization does the opposite.
What is the final form of the expected loss after decomposition?
Expected loss = (Bias)^2 + Variance + Noise
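The decomposition can be estimated empirically by fitting a model on many independent datasets and measuring its predictions at one point. This is a sketch under illustrative assumptions (a sin(2πx) ground truth, Gaussian noise, and a deliberately biased straight-line model), none of which come from the source.

```python
import numpy as np

rng = np.random.default_rng(42)

def true_f(x):
    return np.sin(2 * np.pi * x)

noise_std = 0.2
x_test = 0.35  # a single evaluation point

# Train many degree-1 polynomial fits (a deliberately biased model)
# on independent datasets and record each prediction at x_test.
preds = []
for _ in range(2000):
    x = rng.uniform(0, 1, 30)
    t = true_f(x) + noise_std * rng.standard_normal(30)
    coeffs = np.polyfit(x, t, deg=1)
    preds.append(np.polyval(coeffs, x_test))
preds = np.array(preds)

bias_sq = (preds.mean() - true_f(x_test)) ** 2   # (Bias)^2
variance = preds.var()                           # Variance
noise = noise_std ** 2                           # Noise (irreducible)

# Expected loss at x_test is the sum of the three components.
expected_loss = bias_sq + variance + noise
print(bias_sq, variance, noise, expected_loss)
```

A straight line cannot track the sine curve, so the squared-bias term dominates here; a more flexible model would shift error from bias into variance.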
How does regularization affect the bias-variance tradeoff in a model?
Regularization controls the tradeoff between bias and variance, influencing the model’s performance by adjusting flexibility.
What happens when the regularization parameter λ is high?
The model becomes rigid (high bias, low variance) and tends to underfit the data.
What are the effects of low regularization (λ is low)?
The model becomes flexible (low bias, high variance) and can overfit the data.
What happens when the model is under-regularized (extremely low λ)?
The model exhibits even more variability (high variance), allowing it to be highly flexible, but with very little bias.
What happens to bias as regularization decreases?
Bias decreases as regularization decreases because the model becomes more flexible and can better fit the data.
What happens to variance as regularization decreases?
Variance increases as regularization decreases because the model becomes more complex and can overfit the data.
What is the relationship between test error, bias, and variance?
The test error follows the same shape as the combined bias-squared-plus-variance curve, but shifted upward by the irreducible noise term.
What is the goal when tuning the regularization parameter?
The goal is to find a balance where the sum of bias and variance is minimized, leading to the lowest test error.
How does model complexity affect bias and variance?
Model complexity increases variance but decreases bias, as more complex models can fit the data more closely.