What is the main difference between linear and logistic regression?
Linear regression predicts a continuous value, while logistic regression predicts a probability between 0 and 1 for binary classification.
What are the two parameters in a linear regression model?
Slope (β₁) and intercept (β₀).
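Both parameters can be recovered directly from the data with the OLS closed-form formulas; a minimal Python sketch with made-up points:

```python
# Fit a simple linear regression y = b0 + b1*x by hand (illustrative data).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Slope: covariance of x and y divided by the variance of x.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
# Intercept: forces the fitted line through the point (x_bar, y_bar).
b0 = y_bar - b1 * x_bar

print(round(b1, 3), round(b0, 3))  # → 1.99 0.09
```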
What is the dependent variable in a regression model?
It is the output that the model is trying to predict.
What are the assumptions of linear regression?
Linearity, independence of errors (observations), normality of errors, and homoscedasticity (constant error variance).
What is multicollinearity?
It’s when input features are highly correlated with each other, which can distort model interpretation.
Why do we square the errors in Ordinary Least Squares (OLS)?
Squaring makes all errors positive so they do not cancel out, penalizes large deviations more heavily, and yields a smooth, differentiable loss that is easy to optimize.
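The effect of squaring is easy to see on a toy set of errors: one large error dominates the squared loss far more than the absolute loss.

```python
# Compare how squared vs. absolute loss weight the same errors (toy numbers).
errors = [1.0, -1.0, 3.0]

sse = sum(e ** 2 for e in errors)   # squared errors: 1 + 1 + 9 = 11
sae = sum(abs(e) for e in errors)   # absolute errors: 1 + 1 + 3 = 5

# The single large error (3.0) contributes 9/11 of the squared loss
# but only 3/5 of the absolute loss.
print(sse, sae)  # → 11.0 5.0
```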
What is robust regression?
An alternative to OLS that is less sensitive to outliers, for example by minimizing absolute errors (least absolute deviations) instead of squared errors.
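Besides least absolute deviations, the Huber loss is another common robust choice that blends the two behaviors. A minimal sketch (the delta threshold of 1.0 is an illustrative choice, not a universal default):

```python
def huber(e, delta=1.0):
    # Quadratic for small errors (like OLS), linear for large ones
    # (like absolute loss), so outliers are penalized less harshly.
    if abs(e) <= delta:
        return 0.5 * e ** 2
    return delta * (abs(e) - 0.5 * delta)

# A large error contributes far less than under squared loss (0.5 * 4**2 = 8.0).
print(huber(0.5), huber(4.0))  # → 0.125 3.5
```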
What is a scatterplot useful for?
Visualizing the relationship between two variables, checking whether it looks linear, and spotting outliers.
What is R-squared?
The proportion of variance in the dependent variable that the model explains; values closer to 1 indicate a better fit.
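R-squared follows directly from the residual and total sums of squares; a sketch on hypothetical actual vs. predicted values:

```python
# Compute R-squared by hand (illustrative numbers).
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.1, 3.9, 6.2, 7.8]

mean_y = sum(actual) / len(actual)
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
ss_tot = sum((a - mean_y) ** 2 for a in actual)                # total sum of squares
r2 = 1 - ss_res / ss_tot

print(round(r2, 3))  # → 0.995
```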
Why do we scale features before fitting a model?
To put features on comparable scales so that no variable dominates simply because of its units; this matters especially for regularized models and gradient-based optimization.
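One common scaling method is standardization (z-scores); a hand-rolled sketch on made-up values:

```python
# Standardize a feature to mean 0 and standard deviation 1 (z-scores).
values = [10.0, 20.0, 30.0]

mean = sum(values) / len(values)
std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5  # population std
scaled = [(v - mean) / std for v in values]

print([round(v, 3) for v in scaled])  # → [-1.225, 0.0, 1.225]
```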
How does logistic regression output predictions?
It uses a sigmoid function to output probabilities between 0 and 1.
What is the role of feature engineering?
Transforming or creating features to improve model performance or fit.
What is the difference between homoscedasticity and heteroscedasticity?
Homoscedasticity means the variance of errors is constant across all levels of input; heteroscedasticity means the variance changes.
What is a residual in regression?
The difference between the actual value and the predicted value by the model.
What is the sigmoid function used in logistic regression?
A mathematical function, σ(z) = 1 / (1 + e⁻ᶻ), that maps any real value into the range between 0 and 1.
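The sigmoid is a one-liner in Python; note the symmetry sigmoid(z) + sigmoid(−z) = 1 and that sigmoid(0) is exactly 0.5:

```python
import math

def sigmoid(z):
    # Maps any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0), round(sigmoid(4), 3), round(sigmoid(-4), 3))  # → 0.5 0.982 0.018
```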
When should you not use linear regression?
When the relationship between variables is non-linear or when the assumptions of linear regression are violated.
Why is normality of errors important in linear regression?
It ensures reliable inference like confidence intervals and p-values.
How can linear regression be used in FP&A?
To forecast costs or revenue based on inputs like volume, location, and seasonality.
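A cost-forecasting example of this idea, fitting cost against volume with the OLS closed form; the monthly figures are made up for illustration:

```python
# Hypothetical: fit cost = b0 + b1 * volume on made-up monthly data, then forecast.
volumes = [100.0, 200.0, 300.0, 400.0]
costs = [1050.0, 1980.0, 3020.0, 3950.0]  # illustrative historical costs

v_bar = sum(volumes) / len(volumes)
c_bar = sum(costs) / len(costs)
b1 = sum((v - v_bar) * (c - c_bar) for v, c in zip(volumes, costs)) / sum(
    (v - v_bar) ** 2 for v in volumes
)
b0 = c_bar - b1 * v_bar

forecast = b0 + b1 * 500.0  # projected cost at a planned volume of 500 units
print(round(b1, 2), round(b0, 2), round(forecast, 2))  # → 9.74 65.0 4935.0
```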
How can logistic regression be used in FP&A?
To predict binary outcomes such as whether an order will be late or whether a customer will reorder.
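A sketch of the late-order case: a linear score passed through the sigmoid gives the probability of lateness. The coefficients and the `upstream_delay_days` feature here are hypothetical, not fitted values:

```python
import math

# Hypothetical coefficients for "will this order be late?" (illustration only).
b0, b1 = -3.0, 0.8  # intercept; weight on upstream delay in days

def p_late(upstream_delay_days):
    # Linear score passed through the sigmoid yields a probability in (0, 1).
    z = b0 + b1 * upstream_delay_days
    return 1.0 / (1.0 + math.exp(-z))

print(round(p_late(0), 3), round(p_late(5), 3))  # → 0.047 0.731
```

A threshold (commonly 0.5) then turns the probability into a late/on-time classification.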