Reading 1: Multiple Regression Flashcards

(70 cards)

1
Q

What does a t stat test? When would and wouldn’t a t stat be significant?

A

The t-statistic tests the hypothesis that a particular regression coefficient (β) is different from zero (i.e., that the variable has an effect).
* A large absolute t-value (e.g., > 2 or < -2) suggests the coefficient is significantly different from zero.
* A small t-value means the variable might not be contributing much to the model.

Signal-to-noise ratio: how strong the effect of the variable is relative to uncertainty

Note: t = estimate/standard error
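As a quick illustration of the signal-to-noise idea, here is a minimal sketch in Python; the estimate and standard error are hypothetical numbers, not from any real regression:

```python
# Hypothetical regression output: coefficient estimate and its standard error
estimate = 0.8
standard_error = 0.25

t_stat = estimate / standard_error  # signal-to-noise ratio
print(t_stat)  # 3.2: |t| > 2, so the coefficient looks significantly different from zero
```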

2
Q

What does a p value tell you? What does a high or low p value show and when would you reject the null?

A

The p-value tells you the probability of observing a t-statistic as extreme as the one calculated, assuming the null hypothesis is true (i.e., the coefficient is zero). Is the effect of the variable real or random noise?
Interpretation:
* Low p-value (< 0.05) → Reject the null hypothesis → The variable is statistically significant.
* High p-value (> 0.05) → Fail to reject the null → The variable is not statistically significant.
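A sketch of the decision rule in Python, using a normal approximation to the t distribution so it needs only the standard library (adequate for large degrees of freedom); the t value of 3.2 is a hypothetical input:

```python
import math

def two_sided_p(t):
    # Two-sided p-value under a standard normal approximation to the t distribution
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

alpha = 0.05
p = two_sided_p(3.2)
print(p < alpha)  # True -> low p-value -> reject the null: the variable is significant
```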

3
Q

What are the assumptions of multiple regression?

A

* Linear relationship between X and Y
* No exact linear relationship among the X's (violation = multicollinearity)
* Expected value of the error term = 0
* Variance of the error term is constant (violation = heteroskedasticity)
* Errors are not serially correlated
* Errors are normally distributed

4
Q

What is a normal Q-Q plot?

A

Compares a variable's distribution to a normal distribution. Helpful for checking whether the residuals are normally distributed (a key assumption).

5
Q

What are the columns included in an ANOVA table?

What do they measure/show?

A
  • source of variation
  • degrees of freedom: The number of independent values that can vary in the calculation.
  • sum of squares: Measures the total variation in the data.
  • mean square: The average variation (sum of squares divided by degrees of freedom).
6
Q

What is an ANOVA table used to calculate?

A

F-test and R^2

7
Q

What are the degrees of freedom for Regression, error and total in the ANOVA table?

A
  • regression = k
  • error = n-k-1
  • total = n-1
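The three df entries can be checked with a couple of lines; n = 50 and k = 3 are hypothetical values:

```python
n, k = 50, 3  # hypothetical: 50 observations, 3 predictors

df_regression = k          # one df per slope coefficient
df_error = n - k - 1       # n minus the k + 1 estimated parameters
df_total = n - 1

print(df_regression, df_error, df_total)  # 3 46 49
assert df_regression + df_error == df_total  # the pieces always add up to the total
```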
8
Q

Why do I lose k + 1 degrees of freedom in multiple regression?

A

In a multiple regression model with k predictors and an intercept, you estimate k + 1 parameters.
Each estimated parameter uses up one degree of freedom.
So, from your total sample size n, you lose k + 1 degrees of freedom.

Total degrees of freedom: n (number of observations)
Used for estimating parameters: k predictors + 1 intercept = k + 1
Remaining degrees of freedom (for residuals): n - k - 1
→ These are used to assess the fit of the model, such as calculating the standard error of the regression and testing hypotheses.

So, if you have k predictors, you lose k + 1 data points' worth of freedom; those were spent fitting the model.
The remaining n - k - 1 degrees of freedom are used to measure how well the model fits the data.

9
Q

How do you calculate Regression Sum of Squares and what is its significance and usage?

What kind of variation does it reflect? What does high v low SSR mean?

A

= explained variation
= sum of squared differences between Y estimated and Y mean

Significance: SSR measures the variation explained by the regression model. It indicates how much of the total variation is accounted for by the model's predictions.

Usage: A higher SSR suggests that the model is effective in explaining the variability in the data. It is used to assess the model's explanatory power.

Compares: Y estimated vs. Y mean

10
Q

How do you calculate Error Sum of Squares and what is its significance and usage?

A

= unexplained variation
= sum of squared differences between Y actual and Y estimated (the residuals)

Significance: SSE measures the variation that is not explained by the regression model. It represents the residual or unexplained variability in the data.
Usage: A lower SSE indicates that the model's predictions are closer to the actual data points, suggesting a better fit. It is used to evaluate the model's accuracy.

Compares: Y actual vs. Y estimated

11
Q

How do you calculate Total Sum of Squares and what is its significance and usage?

A

= explained variation + unexplained variation
= SST = SSR + SSE

Significance: SST represents the total variation in the observed data. It serves as a baseline measure of how much the actual data points deviate from the overall mean (sum of squared differences).

Usage: SST is used to quantify the total variability in the dataset before any model is applied.

Compares: Y actual vs. Y mean

12
Q

How is the Mean Square calculated?

A

Sum of squares divided by degrees of freedom (e.g. MSR = SSR/k and MSE = SSE/(n - k - 1))

13
Q

What is the formula to calculate R2 directly from the ANOVA table and what does it show?

A

R2 measures the % of total variation in the Y variable (dependent) explained by the X variables (independent)

= explained variation / total variation = SSR/SST
or
= (total variation - unexplained variation) / total variation = (SST - SSE)/SST
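Both forms give the same number. A sketch with hypothetical sums of squares, chosen so that R² comes out to 0.25 (matching the next card):

```python
# Hypothetical ANOVA sums of squares
ssr = 300.0        # explained variation
sse = 900.0        # unexplained variation
sst = ssr + sse    # total variation

r2 = ssr / sst                 # explained / total
r2_alt = (sst - sse) / sst     # (total - unexplained) / total: same number
print(r2, r2_alt)  # 0.25 0.25 -> X explains 25% of the variation in Y
```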

14
Q

What does an R2 of 0.25 mean?

A

X explains 25% of the variation in Y

15
Q

What does adjusted R2 do and why? How is it calculated, and what's the formula?

A

Adjusted R2 applies a penalty factor to reflect the quality of added variables.
Too many explanatory X variables run the risk of overexplaining the data (explaining randomness rather than true patterns) = poor forecasting.

Formula: Adjusted R2 = 1 - [(n - 1)/(n - k - 1) x (1 - R2)], i.e. 1 - (total df / unexplained df x (1 - R2))
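The penalty, 1 - [(n - 1)/(n - k - 1)](1 - R²), can be sketched as follows; the R², n, and k values are hypothetical:

```python
def adjusted_r2(r2, n, k):
    # 1 - [(n - 1) / (n - k - 1)] * (1 - r2): total df over unexplained df
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

print(adjusted_r2(0.25, 50, 3))  # about 0.201: always at or below plain R^2
# Adding a useless 4th predictor (R^2 unchanged) lowers adjusted R^2 further:
print(adjusted_r2(0.25, 50, 4) < adjusted_r2(0.25, 50, 3))  # True
```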

16
Q

Explain the adjusted r2 in words?

A

“Let’s take the unexplained variance and scale it based on how many predictors (k) you used and how much data you had. If you added predictors that don’t help, we’ll penalize you.”

17
Q

What makes adjusted R2 more reliable/refined? i.e. explain how the penalty works

A

Why this makes it refined

If you add a predictor that doesn't help, R2 barely increases, but k increases.
That makes the denominator n - k - 1 smaller → the whole fraction gets bigger → Adjusted R2 drops.
So the formula is saying:
"You added complexity, but didn't improve the model enough to justify it."

18
Q

What does a small or large right hand side of the adjusted r2 formula show about a model?

A

Small right-hand side → model explains a lot → Adjusted R2 is high.
Large right-hand side → model explains little or is overfitted → Adjusted R2 is low.

19
Q

Is higher or lower better for the following:
* R2
* AIC
* BIC

A
  • r2 = higher
  • AIC and BIC = lower
20
Q

What does AIC help to evaluate/when is it best used?

What is the formula? And what effect does k have?

A

AIC assesses goodness of fit when the purpose is prediction, i.e. the goal is a higher-quality, more accurate forecast.

Formula: AIC = n x ln(SSE/n) + 2(k + 1)

If k increases, AIC increases (the 2(k + 1) term penalizes added parameters).

21
Q

What does BIC help to evaluate/when is it best used?

What is the formula? And what effect does k have compared to AIC?

A

BIC is preferred if the simplest adequate fit is the goal. It imposes a higher penalty for overfitting: if k increases, BIC increases by more than AIC does.

Formula: BIC = n x ln(SSE/n) + ln(n)(k + 1)

BIC selects the simplest model that best explains the data, with a stronger penalty for complexity as your dataset grows (the ln(n) penalty grows with n).
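A sketch of the standard formulas AIC = n ln(SSE/n) + 2(k + 1) and BIC = n ln(SSE/n) + ln(n)(k + 1), with hypothetical n, k, and SSE (holding SSE fixed stands in for adding a useless predictor):

```python
import math

def aic(n, k, sse):
    # AIC = n * ln(SSE/n) + 2(k + 1): penalty of 2 per extra parameter
    return n * math.log(sse / n) + 2 * (k + 1)

def bic(n, k, sse):
    # BIC = n * ln(SSE/n) + ln(n)(k + 1): penalty of ln(n) per extra parameter
    return n * math.log(sse / n) + math.log(n) * (k + 1)

n, sse = 50, 900.0  # hypothetical sample size and error sum of squares
aic_penalty = aic(n, 4, sse) - aic(n, 3, sse)  # cost of one more predictor: 2.0
bic_penalty = bic(n, 4, sse) - bic(n, 3, sse)  # cost under BIC: ln(50), about 3.9
print(bic_penalty > aic_penalty)  # True: BIC punishes the extra predictor harder
```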

22
Q

When do you use AIC v BIC?

A

Use AIC when:

You care more about predictive accuracy.
You’re okay with a slightly more complex model if it improves fit.

Use BIC when:

You want a parsimonious model (simpler is better).
You have a large dataset and want to avoid overfitting.

Imagine you’re choosing a team for a project:

AIC says: “Add people if they help—even a little.”
BIC says: “Only add someone if they really help, especially if the team is already big.”

23
Q

What is the purpose of the F-statistic in nested/joint models?

A

To determine if the simpler (nested) model is significantly different from the more complex (full) model.

24
Q

How do you calculate the F-statistic for nested models?

A

F = [(SSE restricted - SSE unrestricted) / q] ÷ [SSE unrestricted / (n - k - 1)]

Intuitively it's "new vs. old": the drop in SSE from adding the variables back, per excluded variable, relative to the full model's MSE.

where MSE = SSE unrestricted / (n - k - 1), and q = number of excluded variables in the restricted model
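A sketch of the nested-model F-statistic, F = [(SSE restricted - SSE unrestricted)/q] ÷ [SSE unrestricted/(n - k - 1)]; all the inputs below are hypothetical:

```python
def nested_f(sse_restricted, sse_unrestricted, q, n, k):
    # q = number of excluded variables; k = predictors in the full model
    drop_per_variable = (sse_restricted - sse_unrestricted) / q  # "new vs old" gain
    mse_full = sse_unrestricted / (n - k - 1)                    # noise benchmark
    return drop_per_variable / mse_full

print(nested_f(1000.0, 900.0, 2, 50, 5))  # about 2.44
```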
25
Q

What are the degrees of freedom for the F-statistic in the nested model?

A

numerator df = q = number of excluded variables in the restricted model
denominator df = n - k - 1, where k = number of independent variables in the full model

26
Q

What are the hypotheses for the F-statistic in nested/joint models?

A

Null hypothesis (H₀): the coefficients of the removed predictors are all zero, i.e. the excluded variables are useless.
Alternative hypothesis (H₁): at least one of the removed predictors has a non-zero coefficient, i.e. at least one excluded variable is pulling its weight in explaining the variation in Y.

27
Q

What is the conclusion if the F-statistic > critical value?

A

Reject the null; the test is statistically significant. The full model provides a significantly better fit than the nested model: the relative decrease in SSE from including the q additional variables is statistically justified, i.e. they improve the model.

28
Q

What is the purpose of the F-statistic in assessing overall model fit?

A

To compare the fit of the regression model to a model with no predictors, i.e. no slope coefficients.
29
Q

How do you calculate the F-statistic for overall model fit?

A

F = MSR / MSE (mean of explained variation ÷ mean of unexplained variation)

MSR = SSR / k
MSE = SSE / (n - k - 1)

30
Q

What are the hypotheses for the F-statistic in overall model fit?

A

Null hypothesis (H₀): the model with no predictors fits the data as well as the regression model, i.e. the slope coefficients on all X variables in the unrestricted model = 0.
Alternative hypothesis (H₁): the regression model provides a better fit than the model with no predictors.

31
Q

What does a significant F-statistic indicate in overall model fit?

A

It indicates that the regression model explains a substantial portion of the variance in the response variable.

32
Q

What are 4 model misspecifications?

A

1. Omitting a variable that should be included
2. Failing to transform a variable, e.g. for linearity
3. Inappropriate scaling of a variable
4. Incorrectly pooling data
33
Q

Why might a variable need to be transformed for linearity? What assumptions may be violated?

A

A variable might need to be transformed to ensure the relationship between the predictor and response variable is linear, e.g. converting market cap to the log of market cap; logs make the relationship linear.

Violations: heteroskedasticity in the residuals

Explanation: transforming variables (e.g. using logarithms or square roots) can help linearize relationships, making the model more accurate and easier to interpret. Non-linear relationships can lead to poor model fit and misleading results.

34
Q

What is the consequence of omitting a variable that should be included in a model? What assumptions may be violated?

A

Omitting a variable can lead to model misspecification, resulting in biased and inconsistent estimates.

Potential violations: serial correlation or heteroskedasticity in the residuals

Explanation: when a relevant variable is omitted, the model fails to account for its effect, which can distort the relationships between the included variables and the response variable. This can lead to incorrect conclusions and predictions.

35
Q

What is the impact of inappropriate scaling of a variable? What assumptions may be violated?

A

Inappropriate scaling can affect the model's accuracy and interpretability, e.g. using the number of free-float shares rather than the proportion.

Potential violations: heteroskedasticity/multicollinearity

Explanation: variables should be scaled appropriately to ensure they contribute correctly to the model. Incorrect scaling can give certain variables disproportionate influence, skewing the results and making the model less reliable.

36
Q

What does incorrectly pooling data mean? What assumptions may be violated?

A

Incorrectly pooling data means combining data from different regimes or contexts without accounting for their differences, e.g. the difference between pre- and post-COVID/GFC periods.

Potential violations: serial correlation or heteroskedasticity in the residuals

Explanation: pooling data from different regimes can lead to misleading results, as the underlying relationships may differ across contexts. It's important to account for these differences to ensure the model accurately reflects the data.
37
Q

What is heteroskedasticity? How many types are there?

A

Heteroskedasticity occurs when the variance of the errors in a regression model is not constant.

There are two types: conditional and unconditional. Conditional heteroskedasticity is the problematic one because the error variance is related to the independent variables.

38
Q

What is the effect of heteroskedasticity on regression output?
* t and F stats
* slope coefficients
* the effect on financial data

A

* T and F stats (hypothesis tests and confidence intervals) become unreliable.
* Slope coefficient estimates are not affected; however, the standard errors become unreliable.
* For financial data, the standard errors are most likely understated and the t stats inflated (too high, causing Type I errors).

Explanation: when heteroskedasticity is present, the ordinary least squares (OLS) estimates remain unbiased, but they are no longer efficient.

39
Q

What does heteroskedasticity look like on a graph?

A

In a plot of residuals against an X variable (or the fitted values), the residuals fan out in a cone shape: their spread grows (or shrinks) systematically instead of forming a constant-width band around zero.

40
Q

How do we detect heteroskedasticity simply?

A

* Scatter diagram: plot the residuals against each independent variable and against time. E.g. if the error term gets larger as the variable gets larger, that is a warning sign; residuals should be randomly distributed around each X variable.
* Breusch-Pagan test: regress the squared residuals on the X variables.

41
Q

Detecting heteroskedasticity: What is the Breusch-Pagan test statistic?

A

BP = n x R², where R² comes from the auxiliary regression of the squared residuals on the X variables. It is compared against a chi-squared distribution with k degrees of freedom (one-tailed test).

42
Q

Detecting heteroskedasticity: What are H₀ and H₁ in a BP test, and when do you reject the null?

A

Null hypothesis (H₀): the variance of the residuals is constant (homoskedasticity).
Alternative (H₁): the variance depends on X (heteroskedasticity).

If BP > critical value, reject the null and conclude you have a problem. If the test statistic exceeds the critical value from the chi-squared distribution (or if the p-value is low), you reject H₀. This means your model likely has heteroskedasticity, which can affect inference.
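The BP decision rule in code; n, the auxiliary R², and the single X variable are hypothetical, and 3.841 is the well-known 5% chi-squared critical value for 1 degree of freedom:

```python
n = 100          # hypothetical sample size
r2_aux = 0.06    # hypothetical R^2 from regressing squared residuals on X

bp = n * r2_aux        # Breusch-Pagan statistic = n * R^2, here about 6.0
chi2_crit_1df = 3.841  # 5% chi-squared critical value, 1 df (one X variable)
print(bp > chi2_crit_1df)  # True -> reject H0: heteroskedasticity is likely present
```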
43
Q

What is serial correlation?

A

Serial correlation, also known as autocorrelation, occurs when the residuals (errors) in a regression model are correlated across observations. E.g. the residual in the current period is positive, and the probability of the residual in the next period being positive is greater than 50%.

Explanation: in a regression model, serial correlation means that the errors from one observation are related to the errors from another observation, violating the assumption of independence.

44
Q

What is the effect of serial correlation on regression output?
- t stat and F stat
- slope coefficients
- standard errors

A

T and F stats (hypothesis tests and confidence intervals) become unreliable: inefficient estimates and biased standard errors.
Slope coefficient estimates are not affected; however, the standard errors become unreliable.

Explanation: when serial correlation is present, the ordinary least squares (OLS) estimates remain unbiased, but they are no longer efficient. This means that the standard errors of the coefficients are incorrect, leading to unreliable hypothesis tests and confidence intervals.

45
Q

Explain the difference and effects of positive/negative serial correlation

A

Positive serial correlation: standard error too low (t stat too high).
Negative serial correlation: standard error too high (t stat too low).

Positive serial correlation means errors tend to move in the same direction across observations (e.g. time periods). This makes the model seem more stable than it is → standard errors shrink → t-stats grow → you might wrongly think predictors are significant.
Negative serial correlation means errors tend to alternate direction. This exaggerates variability → standard errors grow → t-stats shrink → you might miss real effects.

46
Q

How do we detect serial correlation?

A

Serial correlation can be detected using graphical methods (e.g. residual plots/scatter) and statistical tests (e.g. Durbin-Watson test, Breusch-Godfrey test).

Durbin-Watson: tests one lag.
Breusch-Godfrey: tests several lags, using the residuals as the Y variable; the residuals are regressed against the initial regressors plus lagged residuals.
* F distribution
* p (numerator) and n - p - k - 1 (denominator) degrees of freedom
47
Q

What are the steps of the Breusch-Godfrey test for serial correlation?

A

* Run your original regression and save the residuals.
* Create an auxiliary/second regression: regress the residuals on the original X variables plus lagged residuals (e.g. ε(t−1), ε(t−2), …).
* Compute the test statistic: BG = n x R² from the auxiliary regression.
* Compare to a chi-squared distribution with degrees of freedom equal to the number of lags.
* Interpret: if BG > critical value → reject H₀ → serial correlation is present. If not → the residuals are likely independent.

48
Q

What are the hypotheses in the Breusch-Godfrey test? How is the test statistic calculated, and how do you interpret the results?

A

H₀: no serial correlation (errors are independent)
H₁: serial correlation exists (errors are autocorrelated)

BG = n x R² from the auxiliary regression. Compare to a chi-squared distribution with degrees of freedom equal to the number of lags.
Interpret: if BG > critical value → reject H₀ → serial correlation is present. If not → the residuals are likely independent.

49
Q

How do you correct for serial correlation/heteroskedasticity?

A

Use robust standard errors:
* Newey-West corrected standard errors for serial correlation
* White-corrected standard errors for conditional heteroskedasticity
50
Q

What is multicollinearity?

A

Multicollinearity occurs when two or more predictor (X) variables in a regression model are highly correlated, making it difficult to isolate their individual effects on the response variable. E.g. if 4 friends are pushing a car when it breaks down, who is doing most of the work?

Explanation: when predictors are highly correlated, it becomes challenging to determine the unique contribution of each predictor to the response variable, leading to issues in the regression analysis.

51
Q

What is the effect of multicollinearity on regression output?
* standard errors
* coefficient estimates
* t stats
* Type I/II errors

A

* Inflated standard errors
* Unreliable coefficient estimates
* Reduced t stats
* Increased chance of Type II errors: X variables seem less valuable because they share credit with other variables, so t stats are artificially small and variables look falsely unimportant.

Explanation: high correlation among predictors can cause instability in the coefficient estimates, making them sensitive to small changes in the model. This results in large standard errors and unreliable hypothesis tests.

52
Q

How do we detect multicollinearity? What are the tell-tale signs?

A

* A significant F stat (p-value < 0.05) and high R², but all t stats/p-values insignificant (p-value > 0.05). The model does a good job explaining the variation in Y BUT cannot tell which variable is doing the work, hence the high F and insignificant t stats.
* High correlation between X variables (k = 2 case only).
* High Variance Inflation Factor (VIF), where VIF = 1/(1 - R²):
VIF = 1 → no correlation, therefore no evidence of multicollinearity
VIF > 5 → further investigation
VIF > 10 → SERIOUS multicollinearity that needs correction

53
Q

How do you interpret VIF?

A

VIF = 1/(1 - R²), where R² comes from regressing predictor j on the other predictors.
VIF = 1 → no correlation, therefore no evidence of multicollinearity
VIF > 5 → further investigation
VIF > 10 → SERIOUS multicollinearity that needs correction
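The VIF thresholds in code; the R² inputs (from regressing one predictor on the others) are hypothetical:

```python
def vif(r2_j):
    # r2_j: R^2 from regressing predictor j on the other predictors
    return 1 / (1 - r2_j)

print(round(vif(0.0), 2))   # 1.0 -> no correlation with the other predictors
print(round(vif(0.8), 2))   # 5.0 -> worth further investigation
print(round(vif(0.95), 2))  # 20.0 -> serious multicollinearity, needs correction
```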
54
Q

How do we correct for multicollinearity?

A

* Remove one or more regression variables
* Use a different proxy for one of the variables, e.g. liquidity can be bid-ask spread instead of free float
* Increase the sample size: more statistically robust

55
Q

What two types of observations can influence regression results?

A

High-leverage point: an observation with an extreme independent (X) value
Outlier: an observation with an extreme dependent (Y) value

56
Q

What is leverage? When is an observation considered influential?

A

A standardised measure of the distance of observation j from the mean; it takes a value between 0 and 1.

If leverage is greater than 3 x (k + 1)/n, the observation is potentially influential, where k is the number of independent variables.
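The leverage cutoff in code; k, n, and the observation's leverage are hypothetical numbers:

```python
k, n = 3, 50  # hypothetical: 3 independent variables, 50 observations

cutoff = 3 * (k + 1) / n   # leverage above this flags a potentially influential point
print(cutoff)  # 0.24

leverage_j = 0.31  # hypothetical leverage for observation j
print(leverage_j > cutoff)  # True -> observation j is potentially influential
```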
57
Q

What are studentised residuals? How do they work?

A

A measure for identifying an outlier.

Delete observation j, then estimate the regression model using the remaining n - 1 observations. Estimate ŷ and the residual ej, then calculate the studentised residual for each observation in the dataset.

The critical value acts as a ceiling: if the absolute value of the studentised residual is greater than the t value, REJECT (it doesn't matter whether it is positive or negative; it's a two-tailed t test). Rejected = outlier.

Degrees of freedom for the critical value: n - k - 2 (because we deleted an observation at the start).
58
Q

What are dummy variables?

A

Purpose: they allow categorical variables (like gender, region, or type) to be included in regression models.
Representation: each category is represented by a binary variable (0 or 1).

59
Q

How many dummy variables are used and why?

A

One fewer than the number of categories: a categorical variable with n categories needs n - 1 dummies, to avoid perfect multicollinearity (the dummy variable trap).

60
Q

What is an intercept dummy? How does it work?

A

Purpose: adjusts the intercept of the regression model for different categories of a categorical variable. D equals either 0 or 1: if 0, the whole term = 0; if 1, the whole term = b1.

How it works: each dummy variable shifts the intercept of the regression line for its respective category. The coefficient on an intercept dummy represents the difference in the intercept for that category compared to the reference group.

61
Q

What is a slope dummy? How does it work?

A

Purpose: adjusts the slope of the regression model for different categories of a categorical variable. The DX term captures the change in the slope on account of the dummy variable.

How it works: each dummy variable interacts with a continuous predictor to change the slope of the regression line for its respective category. The coefficient on a slope dummy represents the difference in the slope for that category compared to the reference group.
62
Q

What is a logistic regression model?

A

A logistic regression model is a statistical method used to model the relationship between a binary dependent variable (e.g. failure/success, increase/decrease) and one or more independent variables. It estimates the probability (via the log odds) of an event based on the logistic distribution.

63
Q

What is the formula to convert the probability of an event to odds?

A

Odds = p / (1 - p), where p is the probability of the event.

64
Q

How do you calculate the probability once you have the estimated y variable?

A

Exponentiate the fitted log-odds to get the odds, then convert: p = e^ŷ / (1 + e^ŷ), i.e. odds / (1 + odds).

65
Q

How should you interpret the slope coefficients for logit models?

A

The coefficients (betas) represent the change in the log-odds of the outcome for a one-unit increase in the X variable.

66
Q

Interpret this model predicting the probability of passing an exam based on study hours and attendance:

A

Intercept (b0 = -2): the log-odds of passing the exam when study hours and attendance are zero.

Study hours (b1 = 0.05): for each additional hour of study, the log-odds of passing the exam increase by 0.05.
Odds ratio: e^0.05 ≈ 1.051. Each additional hour of study multiplies the odds of passing by approximately 1.051.

Attendance (b2 = 0.3): for each additional unit of attendance, the log-odds of passing the exam increase by 0.3.
Odds ratio: e^0.3 ≈ 1.35. Each additional unit of attendance multiplies the odds of passing by approximately 1.35.

Positive coefficient: indicates an increase in the log-odds (and thus the odds) of the outcome.
Negative coefficient: indicates a decrease in the log-odds (and thus the odds) of the outcome.
67
Q

What is pseudo R² used for?

A

To evaluate competing models with the same dependent variable. Higher value = better fit.

68
Q

How do you work out the probability given the coefficient of the intercept?

A

e raised to the coefficient converts the log-odds to odds; odds / (1 + odds) then converts the odds to a probability.
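The two conversions (probability → odds, and fitted log-odds → probability) in code; the inputs are hypothetical:

```python
import math

def prob_to_odds(p):
    # odds = p / (1 - p)
    return p / (1 - p)

def log_odds_to_prob(y_hat):
    # exponentiate the fitted log-odds to get the odds, then odds / (1 + odds)
    odds = math.exp(y_hat)
    return odds / (1 + odds)

print(prob_to_odds(0.75))     # 3.0 -> a 75% probability is 3-to-1 odds
print(log_odds_to_prob(0.0))  # 0.5 -> log-odds of 0 means a 50% probability
```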
69
Q

For logit regression, when do you reject the null?

A

When the p-value is smaller than the significance level (alpha). This means the coefficient is statistically significant.
70
Q

What is a likelihood ratio? How does it work?

A

A likelihood ratio is a statistical measure used to compare the goodness of fit between two models. In the context of regression, it helps determine whether a more complex model significantly improves the fit of the data compared to a simpler model.

LR = -2 x (log likelihood of restricted model - log likelihood of unrestricted model)

Chi-squared distribution with q degrees of freedom, where q = the number of omitted variables in the restricted model; one-tailed test.

Reject the null if the chi-squared statistic > critical value. Rejecting means the omitted variables are not useless: their coefficients are far from 0 and they do add explanatory power, so the unrestricted model fits significantly better.

The log likelihood metric is negative; higher (less negative) values = better-fitting model.