What does a t stat test? When would and wouldn’t a t stat be significant?
The t-statistic tests the null hypothesis that a particular regression coefficient (β) is zero, against the alternative that it is different from zero (i.e., that the variable has an effect).
* A large absolute t-value (e.g., > 2 or < -2) suggests the coefficient is significantly different from zero.
* A small t-value means the variable might not be contributing much to the model.
Signal-to-noise ratio: how strong the effect of the variable is relative to uncertainty
Note: t = estimate/standard error
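The t = estimate/standard error formula can be sketched in a couple of lines; the coefficient and standard error values here are made up for illustration:

```python
# Hypothetical regression output: a slope estimate and its standard error
estimate = 0.84    # estimated coefficient (beta-hat), made-up value
std_error = 0.21   # standard error of that estimate, made-up value

# t-statistic: signal (the estimate) divided by noise (its standard error)
t_stat = estimate / std_error
print(round(t_stat, 2))  # 4.0 -> well beyond the rough |t| > 2 cutoff
```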
What does a p value tell you? What does a high or low p value show and when would you reject the null?
The p-value tells you the probability of observing a t-statistic as extreme as the one calculated, assuming the null hypothesis is true (i.e., the coefficient is zero). Is the effect of the variable real or random noise?
Interpretation:
* Low p-value (< 0.05) → Reject the null hypothesis → The variable is statistically significant.
* High p-value (> 0.05) → Fail to reject the null → The variable is not statistically significant.
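The decision rule above can be sketched with `scipy.stats`; the t-statistic and degrees of freedom are made-up values:

```python
from scipy import stats

t_stat = 4.0   # hypothetical t-statistic
df = 26        # residual degrees of freedom (n - k - 1), made-up

# Two-sided p-value: probability of a |t| at least this extreme if H0 is true
p_value = 2 * stats.t.sf(abs(t_stat), df)

alpha = 0.05
reject_null = p_value < alpha   # low p-value -> reject the null
print(reject_null)  # True
```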
What are the assumptions of multiple regression?
* Linear relationship between x and y
* No exact linear relationship among x’s (violation = multicollinearity)
* Expected value of error term = 0
* Variance of error term is constant (violation = heteroskedasticity)
* Errors not serially correlated
* Errors normally distributed
What is a normal Q–Q plot?
Compares a variable’s distribution to a normal distribution. Helpful for exploring whether residuals are normally distributed (a key regression assumption).
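A minimal sketch of the Q–Q idea using `scipy.stats.probplot`, which computes the plot coordinates plus a straight-line fit of sample vs. theoretical quantiles (the residuals here are simulated stand-ins, not real model output):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
residuals = rng.normal(size=200)  # stand-in residuals; in practice use model residuals

# probplot returns the Q-Q coordinates and a least-squares fit of ordered
# sample values against theoretical normal quantiles; r near 1 suggests
# the residuals are approximately normal
(theoretical_q, ordered_vals), (slope, intercept, r) = stats.probplot(residuals, dist="norm")
print(round(r, 3))
```

If the points (and hence the fit) hug a straight line, the normality assumption looks reasonable; systematic curvature suggests a violation.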
What are the columns included in an ANOVA table?
What do they measure/show?
What is an ANOVA table used to calculate?
Columns: source of variation (Regression, Error, Total), degrees of freedom (df), sum of squares (SS), mean square (MS = SS/df), and the F-statistic.
The ANOVA table is used to calculate the F-test and R^2.
What are the degrees of freedom for Regression, error and total in the ANOVA table?
Regression: k, Error: n − k − 1, Total: n − 1
Why do I lose k + 1 degrees of freedom in multiple regression?
In a multiple regression model with k predictors and an intercept, you estimate k + 1 parameters.
Each estimated parameter uses up one degree of freedom.
So, from your total sample size n, you lose k + 1 degrees of freedom.
Total degrees of freedom: n (number of observations)
Used for estimating parameters: k predictors + 1 intercept = k+1
Remaining degrees of freedom (for residuals):
n−k−1
→ These are used to assess the fit of the model, such as calculating the standard error of the regression and testing hypotheses.
So, if you have k predictors, you lose k + 1 data points’ worth of freedom.
It’s like having k + 1 fewer data points to test your idea; those were spent fitting the model.
The remaining n − k − 1 degrees of freedom are used to measure how well the model fits the data.
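The degrees-of-freedom bookkeeping above can be checked with a quick sketch (n and k are made-up values):

```python
n = 30   # hypothetical sample size
k = 3    # hypothetical number of predictors

params_estimated = k + 1       # k slopes + 1 intercept
regression_df = k              # df used by the predictors
residual_df = n - k - 1        # df left over for the residuals
total_df = n - 1               # total df (one lost estimating the mean of y)

# Regression df and residual df partition the total df
print(regression_df + residual_df == total_df)  # True: 3 + 26 == 29
```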
How do you calculate Regression Sum of Squares and what is its significance and usage?
What kind of variation does it reflect? What does high v low SSR mean?
= explained variation
Significance: SSR measures the variation explained by the regression model. It indicates how much of the total variation is accounted for by the model’s predictions.
**Usage:** A higher SSR suggests that the model is effective in explaining the variability in the data. It is used to assess the model’s explanatory power.
Y estimated v.s. Y mean
How do you calculate Error Sum of Squares and what is its significance and usage?
= unexplained variation
Significance: SSE measures the variation that is not explained by the regression model. It represents the residual or unexplained variability in the data.
Usage: A lower SSE indicates that the model’s predictions are closer to the actual data points, suggesting a better fit. It is used to evaluate the model’s accuracy.
Y actual v.s. Y estimated
How do you calculate Total Sum of Squares and what is its significance and usage?
= explained variation + unexplained variation
= SST = SSR + SSE
**Significance:** SST represents the total variation in the observed data. It serves as a baseline measure of how much the actual data points deviate from the overall mean (sum of squared differences).
**Usage:** SST is used to quantify the total variability in the dataset before any model is applied.
Y actual v.s. Y mean
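The three sums of squares and the decomposition SST = SSR + SSE can be verified on a tiny made-up dataset, fitting the line with plain numpy least squares:

```python
import numpy as np

# Tiny made-up dataset: y roughly linear in x with some noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])

# Fit y = b0 + b1*x by ordinary least squares
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

sst = np.sum((y - y.mean()) ** 2)       # total:       actual vs. mean
ssr = np.sum((y_hat - y.mean()) ** 2)   # explained:   estimated vs. mean
sse = np.sum((y - y_hat) ** 2)          # unexplained: actual vs. estimated

print(np.isclose(sst, ssr + sse))  # True: total = explained + unexplained
```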
How is the Mean Square calculated?
Sum of squares divided by degrees of freedom
What is the formula to calculate R2 directly from the ANOVA table and what does it show?
R2 measures the % of total variation in the Y variable (dependent) explained by the X variable (independent)
= explained variation / total variation = SSR/SST
or
= (total variation − unexplained variation) / total variation = 1 − SSE/SST
What does an R2 of 0.25 mean?
X explains 25% of the variation in Y
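A one-line sketch of R2 = SSR/SST with made-up sums of squares chosen to give the 0.25 in the flashcard:

```python
ssr = 25.0    # explained variation, made-up value
sst = 100.0   # total variation, made-up value

r_squared = ssr / sst   # equivalently 1 - sse/sst, since sse = sst - ssr
print(r_squared)  # 0.25 -> x explains 25% of the variation in y
```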
What does adjusted R2 do and how is it calculated? Why, and what’s the formula?
Adjusted R2 applies a penalty factor to reflect the quality of added variables.
Too many explanatory x variables run the risk of trying to overexplain the data (explaining randomness, not true patterns) = poor forecasting.
formula: adjusted R2 = 1 − (total df / unexplained df) × (1 − R2) = 1 − [(n − 1)/(n − k − 1)] × (1 − R2)
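The adjusted R2 formula as a quick sketch, with made-up n, k, and R2 values:

```python
n, k = 30, 3   # hypothetical sample size and predictor count
r2 = 0.25      # made-up R^2

# Adjusted R^2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R^2)
adj_r2 = 1 - ((n - 1) / (n - k - 1)) * (1 - r2)
print(round(adj_r2, 4))  # 0.1635 -> lower than R^2 of 0.25, as expected
```

Note that adjusted R2 is always at most R2, and the gap widens as k grows relative to n.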
Explain the adjusted r2 in words?
“Let’s take the unexplained variance and scale it based on how many predictors (k) you used and how much data you had. If you added predictors that don’t help, we’ll penalize you.”
What makes adjusted R2 more reliable/refined? i.e. explain how the penalty works
Why this makes it refined
If you add a predictor that doesn’t help, R2 barely increases, but k increases.
That makes the denominator n−k−1 smaller → the whole fraction gets bigger → Adjusted R2 drops.
So the formula is saying:
“You added complexity, but didn’t improve the model enough to justify it.”
What does a small or large right hand side of the adjusted r2 formula show about a model?
Small right-hand side → model explains a lot → Adjusted R2 is high.
Large right-hand side → model explains little or is overfitted → Adjusted R2 is low.
Is higher or lower better for the following:
* R2: higher is better (more variation explained)
* AIC: lower is better
* BIC: lower is better
What does AIC help to evaluate/when is it best used?
What is the formula? And what effect does k have?
AIC: goodness of fit if the purpose is prediction, i.e. the goal is a better-quality, more accurate forecast.
Formula: AIC = n × ln(SSE/n) + 2(k + 1)
If k increases, AIC increases (a penalty of 2 per extra parameter).
What does BIC help to evaluate/when is it best used?
What is the formula? And what effect does k have compared to AIC?
BIC is preferred if the simplest adequate fit is the goal. Formula: BIC = n × ln(SSE/n) + ln(n) × (k + 1). It imposes a higher penalty for overfitting: if k increases, BIC increases more than AIC (since ln(n) > 2 once n > 7).
BIC selects the simplest model that best explains the data, with a stronger penalty for complexity as your dataset grows.
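The difference in penalty between the two criteria can be sketched using the sum-of-squares forms above (n and SSE are made-up values):

```python
import math

n, sse = 100, 40.0   # hypothetical sample size and error sum of squares

def aic(n, sse, k):
    # AIC = n*ln(SSE/n) + 2*(k + 1): flat penalty of 2 per parameter
    return n * math.log(sse / n) + 2 * (k + 1)

def bic(n, sse, k):
    # BIC = n*ln(SSE/n) + ln(n)*(k + 1): penalty grows with sample size
    return n * math.log(sse / n) + math.log(n) * (k + 1)

# Add one predictor (k: 3 -> 4) with SSE unchanged: both criteria rise,
# but BIC rises by ln(100) ~= 4.61 versus AIC's flat 2
print(round(aic(n, sse, 4) - aic(n, sse, 3), 2))  # 2.0
print(bic(n, sse, 4) - bic(n, sse, 3) > 2.0)      # True
```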
When do you use AIC v BIC?
Use AIC when:
You care more about predictive accuracy.
You’re okay with a slightly more complex model if it improves fit.
Use BIC when:
You want a parsimonious model (simpler is better).
You have a large dataset and want to avoid overfitting.
Imagine you’re choosing a team for a project:
AIC says: “Add people if they help—even a little.”
BIC says: “Only add someone if they really help, especially if the team is already big.”
What is the purpose of the F-statistic in nested/joint models?
To determine if the simpler (nested) model is significantly different from the more complex (full) model.
How do you calculate the F-statistic for nested models?
It’s basically (new − old)/old, scaled by degrees of freedom:
F = [(SSE_restricted − SSE_full) / q] / [SSE_full / (n − k − 1)]
where q is the number of restrictions (predictors dropped from the full model) and k is the number of predictors in the full model.
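A sketch of the nested-model F-statistic with made-up sums of squares and counts:

```python
# Hypothetical nested comparison: the restricted model drops q predictors
sse_restricted = 60.0    # SSE of the simpler (nested) model, made-up
sse_full = 40.0          # SSE of the full model, made-up
n, k_full, q = 50, 5, 2  # sample size, full-model predictors, restrictions

# "new - old over old": improvement per dropped predictor, relative to
# the full model's error per residual degree of freedom
f_stat = ((sse_restricted - sse_full) / q) / (sse_full / (n - k_full - 1))
print(round(f_stat, 2))  # 11.0
```

A large F means the extra predictors reduce SSE by more than chance would explain, so the full model is preferred over the nested one.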