What is statistical learning?
A set of methods for estimating an unknown relationship between predictors and a response, used for prediction or inference.
What is the difference between prediction and inference?
Prediction focuses on accurately predicting future outcomes, while inference focuses on understanding relationships between variables.
What is the bias–variance trade-off?
More flexible models reduce bias but increase variance; optimal performance balances both to minimize test error.
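The trade-off can be seen directly by fitting polynomials of increasing flexibility to synthetic data (a minimal sketch; the sin-plus-noise data and the chosen degrees are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a smooth nonlinear truth (sin) plus noise.
x_train = np.sort(rng.uniform(-3, 3, 40))
y_train = np.sin(x_train) + rng.normal(0, 0.3, 40)
x_test = np.sort(rng.uniform(-3, 3, 200))
y_test = np.sin(x_test) + rng.normal(0, 0.3, 200)

def errors(degree):
    # Higher polynomial degree = more flexible model.
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_mse, test_mse

# Training MSE falls monotonically with flexibility; test MSE is
# typically U-shaped (underfit at degree 1, overfit at high degrees).
for degree in (1, 3, 10):
    tr, te = errors(degree)
    print(f"degree {degree:2d}: train MSE {tr:.3f}, test MSE {te:.3f}")
```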
Why is test error more important than training error?
Training error is optimistically biased, while test error reflects performance on unseen data.
What is the difference between supervised and unsupervised learning?
Supervised learning uses labeled outcomes, while unsupervised learning identifies structure without known responses.
Why is linear regression unsuitable for classification?
It can produce predictions outside the [0,1] interval and does not model class probabilities correctly.
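A quick illustration with made-up numbers: least squares fit to a 0/1 outcome happily extrapolates outside [0, 1]:

```python
import numpy as np

# Illustrative data: binary outcome coded 0/1 against one predictor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0, 0, 0, 1, 1])

slope, intercept = np.polyfit(x, y, 1)

# Evaluate the fitted line just outside the training range.
pred = intercept + slope * np.array([0.0, 6.0])
print(pred)  # the linear fit drops below 0 and rises above 1
```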
What does logistic regression model?
The log-odds of the probability that an observation belongs to a class as a linear function of predictors.
How are coefficients interpreted in logistic regression?
A one-unit increase in a predictor multiplies the odds by e^β, holding the other predictors constant.
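A sketch of the fit-and-interpret step using scikit-learn on synthetic data (the true coefficient 1.2 and the sample size are illustrative choices):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic binary outcome with true log-odds 0.5 + 1.2x.
X = rng.normal(size=(500, 1))
p = 1 / (1 + np.exp(-(0.5 + 1.2 * X[:, 0])))
y = rng.binomial(1, p)

model = LogisticRegression().fit(X, y)
beta = model.coef_[0, 0]

# A one-unit increase in x multiplies the odds of y=1 by e^beta.
print(f"estimated beta: {beta:.2f}, odds ratio e^beta: {np.exp(beta):.2f}")
```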
What is a decision threshold in logistic regression?
A cutoff probability used to convert predicted probabilities into class labels.
Why is the choice of threshold important?
Different thresholds change the balance between false positives and false negatives and affect economic outcomes.
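The trade-off can be made concrete with a handful of hypothetical predicted probabilities and true labels:

```python
import numpy as np

# Illustrative predicted probabilities and true class labels.
probs = np.array([0.1, 0.3, 0.45, 0.55, 0.7, 0.9])
truth = np.array([0,   0,   1,    0,    1,   1])

def confusion(threshold):
    # Convert probabilities to class labels at the given cutoff.
    preds = (probs >= threshold).astype(int)
    fp = int(np.sum((preds == 1) & (truth == 0)))
    fn = int(np.sum((preds == 0) & (truth == 1)))
    return fp, fn

# Lowering the threshold catches more positives (fewer false negatives)
# at the cost of more false positives, and vice versa.
for t in (0.3, 0.5, 0.7):
    fp, fn = confusion(t)
    print(f"threshold {t}: false positives {fp}, false negatives {fn}")
```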
Why do we use cross-validation?
To estimate test error and compare models when a separate test set is unavailable.
What is the difference between a validation set and k-fold cross-validation?
A validation set is simpler but noisier, while k-fold cross-validation is more stable and data-efficient.
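A minimal k-fold sketch with scikit-learn (synthetic regression data; five folds chosen for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data: linear signal plus noise with standard deviation 10.
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# 5-fold CV: every observation is validated exactly once, and the five
# fold errors are averaged into a single, more stable test-error estimate.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=cv,
                         scoring="neg_mean_squared_error")
est_mse = -scores.mean()
print(f"estimated test MSE: {est_mse:.1f}")
```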
What is leave-one-out cross-validation?
A form of cross-validation where each observation is used once as the validation set.
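LOOCV is just n-fold cross-validation; a small sketch on synthetic data (30 observations, illustrative coefficients):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(0, 0.5, 30)

# LOOCV: n model fits, each validated on the single held-out observation.
scores = cross_val_score(LinearRegression(), X, y, cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")
print(f"{len(scores)} folds, LOOCV MSE estimate: {-scores.mean():.3f}")
```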
What is the purpose of the bootstrap?
To estimate the variability and uncertainty of model estimates.
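A sketch for the sample mean, where the bootstrap standard error can be checked against the analytic formula (the data and B = 1000 replicates are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=200)  # illustrative sample

# Bootstrap: resample with replacement, recompute the statistic each
# time, and use the spread of the replicates as a standard-error estimate.
B = 1000
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean() for _ in range(B)
])
se_boot = boot_means.std(ddof=1)

# For the mean there is a closed-form SE to compare against.
se_formula = data.std(ddof=1) / np.sqrt(len(data))
print(f"bootstrap SE: {se_boot:.3f}, formula SE: {se_formula:.3f}")
```

The payoff of the bootstrap is that the same resampling recipe works for statistics with no closed-form standard error.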
How do decision trees work?
They recursively split the predictor space into regions that minimize prediction error within each region.
What is a main advantage of decision trees?
They are easy to interpret and can model nonlinearities and interactions.
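The interpretability is easy to see by printing a shallow tree's splits; a sketch on the iris data (depth 2 chosen for readability):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# Each split partitions the predictor space on a single variable;
# the fitted tree reads like a set of if/else rules.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["sepal_len", "sepal_wid",
                                       "petal_len", "petal_wid"]))
```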
What is a main weakness of single decision trees?
High variance, meaning small changes in data can lead to very different trees.
What is bagging?
An ensemble method that averages predictions from trees trained on bootstrap samples to reduce variance.
How do random forests differ from bagging?
Random forests add random feature selection at each split to reduce correlation between trees.
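In scikit-learn the two differ only in `max_features`; a sketch on synthetic classification data (sample sizes and tree counts are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)

# Bagging is a random forest that considers every feature at each split;
# a random forest samples a subset (here sqrt(p)) to decorrelate the trees.
bagging = RandomForestClassifier(n_estimators=100, max_features=None,
                                 random_state=0)
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0)

results = {}
for name, model in [("bagging", bagging), ("random forest", forest)]:
    results[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: CV accuracy {results[name]:.3f}")
```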
What is boosting?
A sequential ensemble method that focuses on observations that were previously mispredicted.
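A gradient-boosting sketch with scikit-learn (synthetic data; the depth, learning rate, and tree count are illustrative defaults-adjacent choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each shallow tree is fit to the residual errors of the ensemble so far,
# so later trees concentrate on observations that are still mispredicted.
model = GradientBoostingClassifier(n_estimators=200, max_depth=2,
                                   learning_rate=0.1, random_state=0)
model.fit(X_tr, y_tr)

acc = model.score(X_te, y_te)
print(f"test accuracy: {acc:.3f}")
```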
What is predictive analytics?
Techniques used to predict future outcomes based on historical data.
What is descriptive analytics?
Techniques used to summarize and describe patterns in data.
Why is data preprocessing important?
Poor data quality directly reduces model performance and decision quality.
Why is model evaluation critical in business analytics?
High predictive accuracy does not necessarily imply high economic value.