Multiple Regression Flashcards

(84 cards)

1
Q

What is the definition of multiple regression

A

Used to determine the effect of two or more independent variables on a single dependent variable.

2
Q

What is the definition of a partial slope coefficient

A

The partial slope coefficient measures the change in the dependent variable for a one unit change in a specific independent variable, holding the other independent variables constant.

3
Q

What is the coefficient of determination

A

R squared

4
Q

What is the definition of r squared

A

The percentage of the total variation in the dependent variable which is explained by the independent variables.

5
Q

What is the difference conceptually between r squared and adjusted r squared

A

R squared does not take into account the number of variables added to the model, so it rises as variables are added. Adjusted R squared is scaled up or down based on the number of variables, so mindlessly adding variables can actually make the adjusted R squared fall.

6
Q

What are the underlying assumptions of multiple regression

A

Assumptions are:
Linearity - the relationship between the dependent and independent variables is linear
Non-random independent variables - the independent variables are not random
Expected error is zero - the expected value of the error, conditioned on the independent variables, is zero
Homoskedasticity - the variance of the error term is constant for all observations
No serial correlation - the error terms are not correlated with one another
Normality - the error terms are normally distributed

7
Q

What are some of the things that we can use multiple regressions to do

A

Identify relationships between variables
Forecast variables
Test existing theories.

8
Q

What is the definition of the error term

A

It is the difference between the observed value and the value predicted by the regression model

9
Q

What is the p value used for

A

To evaluate the null hypothesis that the slope coefficient is equal to zero

10
Q

How do you interpret the p value when it is greater or less than your significance level, and what does this imply

A

When the p value is less than your significance level, you can reject the null hypothesis, meaning that the slope coefficient is NOT equal to zero.
When the p value is more than your significance level, you fail to reject the null hypothesis, meaning that the slope coefficient could be equal to zero.

11
Q

Can you interpret what would happen to the y variable of the following equation: if we were to increase x1 by 1 unit holding x2 constant

Y = 1 + 2.5X1 + 6X2

A

When you do this, you would expect Y to increase by 2.5 units, assuming that you held X2 constant
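A quick numeric sketch of this card, plugging made-up values into Y = 1 + 2.5X1 + 6X2:

```python
# Evaluate Y = 1 + 2.5*X1 + 6*X2 and show the effect of a one unit
# increase in X1 while X2 is held constant (illustrative values only).
def predict_y(x1, x2):
    return 1 + 2.5 * x1 + 6 * x2

change = predict_y(5, 10) - predict_y(4, 10)
print(change)  # 2.5, the partial slope coefficient on X1
```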

12
Q

What is the purpose of a residual plot

A

A residual plot allows you to get a preliminary indication of assumption violations before performing statistical tests.

13
Q

What are the key things that you are looking for regarding a residual plot

A

Linearity
Homoskedasticity
Normal distribution

14
Q

What is the Q-Q plot of residuals and how do you interpret it

A

The Q-Q plot of residuals is a tool where you compare the residuals to a normal distribution.
If the residuals are normally distributed then the points should fall along the diagonal line of the Q-Q plot
If there is a smile or a frown shape on the Q-Q plot, then you can interpret the residuals as not being normally distributed.

15
Q

What percentage of the observations should fall beyond 1.65 standard deviations on a Q-Q plot

A

In a normal distribution only 5% of the observations should fall beyond 1.65 standard deviations.
If there are lots of observations beyond 2 standard deviations, then you have grounds to suggest that the distribution has fat tails

16
Q

What are the columns of an ANOVA table

A

Degrees of freedom
Sum of squares
Mean sum of squares

17
Q

What are the rows of an anova table

A

Regression
Residual
Total

18
Q

What is the regression sum of squares, and how many degrees of freedom does it have

A

The regression sum of squares is the variation which can be explained by the regression model. There are k degrees of freedom, where k is the number of independent variables

19
Q

What is the sum squared error

A

The sum of squared errors is the variation not explained by the model.

20
Q

What is the SST

A

SST is the total sum of squares which is the total variation in the dependent variable relative to the mean.
Equals the RSS + SSE

21
Q

What is the definition of R squared and how do you calculate it

A

R squared is defined as the variation that you can explain through the model divided by the total variation: R squared = RSS / SST. It basically answers the question: given the total variation, how much of that movement were we able to predict with the model we came up with?

22
Q

What is the alternative equation for the R squared if you have the sum squared errors and the total sum squares

A

The alternative equation is 1 - SSE/SST
This is basically one minus the sum of squared errors divided by the sum of squares total.
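A minimal check that the two R squared formulas agree, using made-up sums of squares:

```python
# R squared two ways: RSS/SST and 1 - SSE/SST. They must match because
# SST = RSS + SSE. The numbers are invented for illustration.
RSS, SSE = 80.0, 20.0      # explained and unexplained variation
SST = RSS + SSE            # total variation
r2_from_rss = RSS / SST
r2_from_sse = 1 - SSE / SST
print(r2_from_rss, r2_from_sse)  # both 0.8
```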

23
Q

What is one of the main limitations of the R squared

A

It will always increase as you add more variables to the model, regardless of how relevant those variables are, which means that you can end up overfitting the model if you are not careful.

24
Q

What is the purpose of the adjusted R squared value

A

Adjusted r squared takes into account the impact of overfitting: it penalises the addition of unnecessary variables.

25
Q

What is the adjusted R squared formula

A

Adjusted R squared = 1 - [(n - 1)/(n - k - 1)] * (1 - R squared)

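The adjusted R squared formula, sketched with invented inputs:

```python
# Adjusted R squared = 1 - [(n - 1)/(n - k - 1)] * (1 - R^2).
# n, k and r2 below are hypothetical.
n, k, r2 = 50, 4, 0.60
adj_r2 = 1 - (n - 1) / (n - k - 1) * (1 - r2)
print(round(adj_r2, 4))  # 0.5644, slightly below the raw R squared of 0.60
```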
26
Q

What is the rule of the adjusted r squared

A

Adjusted r squared is always less than or equal to r squared

27
Q

What is the definition of the standard error of estimate

A

The standard error of estimate measures the standard deviation of the residuals, and it indicates how well the model captures the relationship. It is calculated as the square root of the mean squared error: SEE = MSE^(1/2)

28
Q

What is the definition of the mean squared error and what is the calculation

A

The MSE is equal to the sum of squared errors divided by the number of degrees of freedom associated with the sum of squared errors. Therefore MSE = SSE / (n - k - 1)

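MSE and SEE in one short sketch (SSE, n and k below are hypothetical):

```python
import math

# MSE = SSE / (n - k - 1); SEE = sqrt(MSE).
SSE, n, k = 90.0, 50, 4
MSE = SSE / (n - k - 1)   # 90 / 45
SEE = math.sqrt(MSE)
print(MSE, SEE)  # 2.0 and roughly 1.414
```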
29
Q

What is the Akaike information criterion (AIC) and when should it be used

A

The AIC is used when the goal is to forecast. The AIC formula is n * ln(SSE/n) + 2(k + 1). Lower values are better for the AIC because they indicate that the model fits the outcomes better.

30
Q

What is Schwarz's Bayesian information criterion, when should you use it, and what is the formula

A

You should use it if your goal is a better goodness of fit. The BIC formula is n * ln(SSE/n) + ln(n) * (k + 1). Both criteria include a penalty for adding more variables, however the BIC imposes a higher penalty for overfitting than the AIC does.

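A side-by-side sketch of the AIC and BIC formulas with invented n, k and SSE; the point is that BIC's per-parameter penalty ln(n) exceeds AIC's 2 for any reasonably sized sample:

```python
import math

# AIC = n*ln(SSE/n) + 2*(k + 1)
# BIC = n*ln(SSE/n) + ln(n)*(k + 1)
n, k, SSE = 100, 3, 250.0
aic = n * math.log(SSE / n) + 2 * (k + 1)
bic = n * math.log(SSE / n) + math.log(n) * (k + 1)
print(round(aic, 2), round(bic, 2))  # lower is better; BIC > AIC since ln(100) > 2
```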
31
Q

Which test should you use to evaluate the significance of individual coefficients

A

A t test

32
Q

What test should you use to evaluate more than one hypothesis

A

An F test

33
Q

Why would you not just use t tests to evaluate if a nested model is more effective than a full model?

A

Because you would have to use individual t tests on each new variable, and when the independent variables are correlated with one another those individual tests are unreliable. The F test allows you to see if the variables are valuable collectively.

34
Q

What does formulating a hypothesis on the significance of two or more coefficients actually mean as it relates to joint hypothesis tests

A

A joint hypothesis test is used to determine if a group of variables collectively contributes to the explanatory power of the model. Basically you are testing whether a model with more variables added is better or worse than a model with fewer variables

35
Q

How do you formulate the hypothesis when you are testing if adding coefficients is a good idea

A

In a joint test, the null hypothesis is that all the tested slope coefficients are equal to zero. The alternative hypothesis is that at least one of those slope coefficients is not equal to zero.

36
Q

How do you perform the test to check if having more or fewer variables is good for a regression model

A

To perform the test you compare two versions of the model:
The restricted model, a simpler version where you have removed the independent variables that you are interested in testing
The unrestricted model, which includes all the independent variables.

37
Q

What is the equation for the F statistic for joint tests and what is the logic behind it

A

The F statistic for joint tests equals (SSE restricted - SSE unrestricted)/q, all divided by SSE unrestricted / (n - k - 1). Basically: how much more of the error are you able to explain by adding in the additional variables?

38
Q

What is the definition of q in the F test to see if the model is better or worse with more variables

A

q is the number of extra variables that you are testing, so the difference between the number of variables in the unrestricted and the restricted models.

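The joint F statistic from card 37, with made-up restricted and unrestricted SSEs:

```python
# F = [(SSE_restricted - SSE_unrestricted)/q] / [SSE_unrestricted/(n - k - 1)]
# q = number of excluded variables; k = regressors in the unrestricted model.
SSE_r, SSE_u = 120.0, 100.0
n, k, q = 60, 5, 2
F = ((SSE_r - SSE_u) / q) / (SSE_u / (n - k - 1))
print(round(F, 2))  # 5.4: compare against the critical F with q and n-k-1 df
```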
39
Q

How do you test the overall significance of the entire model, and what is the F statistic in this case

A

F = MSR / MSE
MSR is the mean squared regression
MSE is the mean squared error

40
Q

What is the decision rule for the test to see if the extra variables are relevant or not

A

It is a one-tailed test because you are testing whether all of the coefficients are equal to zero at the same time. You reject the null hypothesis if the calculated F stat is greater than the critical F value. If you reject the null hypothesis then you know that the excluded variables provide a lot of explanatory power; if you don't, then you know that those variables were basically useless

41
Q

Would you include variables when trying to estimate a value using a model when some of the variables are not statistically significant, and where do you have to be careful in the exam

A

Yes, you have to include all of the variables, because if you were to leave out some of the variables in a model you had been given, you would end up miscalculating the value. The variables interact with one another, so if you took one out you would likely have to re-estimate the whole model.
Units of measurement: be careful that you put the correct units of measurement into the calculation.

42
Q

What is the definition of model specification

A

Model specification is when you select the explanatory variables to include in the regression model

43
Q

What are the principles of proper model specification

A

Economic rationale - the variables should have a sound economic justification
Parsimony - the model should be simple and efficient
Appropriate functional form - the relationship between the variables should be correctly identified
No assumption violations - the model must not violate the core regression assumptions
Out-of-sample performance - the model should demonstrate strong predictive power when applied to data not used in the initial estimation

44
Q

What are some of the main failures with model specification

A

Omitting variables
Inappropriate variable form
Inappropriate scaling
Incorrect pooling of data
Time series misspecification

45
Q

What are the effects of omitting important variables

A

If the omitted variable is correlated with the other independent variables, the estimated coefficients will be BIASED
If the omitted variable is uncorrelated with the other regressors, the slope might be correct but the intercept incorrect

46
Q

What are some examples of time series misspecification

A

Including lagged dependent variables - basically when you include a variable from the wrong time period
Including variables that are measured with significant error.

47
Q

What does incorrectly pooling data mean

A

When you regress a relationship over a full time period but the relationship between the variables actually changes over sub-periods. This might mean that you have to split the data out into different periods of time, like 3-year windows.

48
Q

What is the definition of heteroskedasticity

A

When the variance of the error term is not constant across all observations

49
Q

What are the two types of heteroskedasticity

A

Unconditional and conditional heteroskedasticity

50
Q

What is the definition of unconditional heteroskedasticity

A

Occurs when the error variance is not related to the values of the independent variables. Does NOT create major issues for statistical inference

51
Q

What is the definition of conditional heteroskedasticity

A

Conditional heteroskedasticity is the type where the variance of the error term is correlated with the values of the independent variables. This is a big problem for statistical inference

52
Q

What are the effects of conditional heteroskedasticity

A

Biased standard errors, which are usually underestimated
Inflated t statistics
Unreliable F tests
You are also likely to run into Type I errors, where you incorrectly reject the null hypothesis.

53
Q

What are some of the methods that you could use to detect heteroskedasticity

A

Visual inspection, by plotting the error terms versus the predicted values
The Breusch-Pagan (BP) test, which is the formal statistical test

54
Q

What is the Breusch-Pagan test and what is it used for

A

The Breusch-Pagan test involves regressing the squared residuals from the original model on the independent variables. The test statistic is calculated as n * R squared. The statistic follows a chi-squared distribution with k degrees of freedom

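The Breusch-Pagan statistic is just n times the auxiliary R squared; the numbers below are hypothetical:

```python
# BP statistic = n * R^2 from regressing the squared residuals on the
# independent variables; compare with the chi-squared(k) critical value.
n, k = 100, 3
r2_aux = 0.08              # hypothetical auxiliary-regression R squared
bp_stat = n * r2_aux
print(round(bp_stat, 2))   # 8.0; chi-squared with k = 3 degrees of freedom
```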
55
Q

What are the ways that you can correct for heteroskedasticity

A

Robust standard errors. Also known as the White-corrected or Hansen method, where you adjust the standard errors upward to account for the heteroskedasticity, which leads to more accurate t stats
Generalised least squares. This is when you modify the original regression equation to eliminate the heteroskedasticity

56
Q

What is the definition of autocorrelation, what is it also called, and when is it most commonly encountered

A

It's also called serial correlation
Serial correlation is when the error terms are correlated with one another, so the error in one period is related to the error in another period
Most common in time series data

57
Q

What are the two types of serial correlation

A

Positive serial correlation - when a positive error for one period increases the likelihood of another positive error in another period. If stock prices are up today they are likely to be up tomorrow
Negative serial correlation - when a positive error for one observation increases the likelihood of a negative error for another observation.

58
Q

What are the main problems with serial correlation

A

Coefficient consistency - serial correlation actually has no impact whatsoever on the regression coefficients themselves
Biased standard errors - standard errors for the regression coefficients are usually underestimated
High t stats - because the standard errors you estimate are too small, the calculated t statistic can become artificially large; the F stat is also inflated

59
Q

What are the detection methods to check for serial correlation of the error terms

A

Visual inspection
The Durbin-Watson test. The test statistic of the Durbin-Watson test is equal to 2(1 - r), where r is the sample correlation between the residuals
The Breusch-Godfrey test. A more robust method that can detect serial correlation beyond the first lag

60
Q

What is the interpretation of the Durbin-Watson test, and what is its constraint

A

A value of 2 means that there is no serial correlation
Values between 0 and 2 indicate a positive correlation
Values between 2 and 4 indicate a negative correlation
The Durbin-Watson test does not work on autoregressive models

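The Durbin-Watson approximation from these cards as a one-liner; the r values are illustrative:

```python
# DW ~ 2*(1 - r), where r is the first-order correlation of the residuals.
# Near 2: no serial correlation; below 2: positive; above 2: negative.
def dw_approx(r):
    return 2 * (1 - r)

print(dw_approx(0.0))   # 2.0 -> no serial correlation
print(dw_approx(0.5))   # 1.0 -> positive serial correlation
print(dw_approx(-0.5))  # 3.0 -> negative serial correlation
```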
61
Q

How do you correct for autocorrelation

A

Hansen method (Newey-West estimator) - this calculates robust standard errors that adjust for serial correlation and heteroskedasticity simultaneously
Modify the regression - one common fix is to add a lagged variable to account for the correlation
Other methods, such as using instrumental variables or panel data methods

62
Q

What is multicollinearity

A

It is a violation of the assumption that there is no exact linear relationship between two or more of the independent variables. Basically, two or more of the independent variables are highly correlated with one another

63
Q

What are the impacts of multicollinearity on statistical inference

A

Inflated standard errors - the standard errors get larger
Depressed t stats - because the standard errors are inflated, the t stat for each coefficient becomes small
Unreliable coefficients - the estimates of the regression coefficients become imprecise and unreliable
Unstable estimates - small changes in the data can cause the estimated coefficients to change significantly.

64
Q

What are the detection methods used for multicollinearity

A

The classic symptom - the model has a high r squared and a significant F stat, but the individual t tests are not significant. This basically means that the model on the whole is good at estimating the dependent variable but you cannot see which of the variables is actually doing most of the estimation.
Pairwise correlations - high pairwise correlations between the independent variables are a warning sign. However, you can still have multicollinearity even if the pairwise correlations are low
Variance inflation factor (VIF) - the most formal quantitative measure.
VIF = 1 means that there is no correlation between the variable and the other regressors
VIF > 5 needs further investigation
VIF > 10 means you have serious multicollinearity

65
Q

What are the main ways that you correct for multicollinearity

A

Most effective is to omit variables - drop one or more of the highly correlated variables
Use a different proxy - replace one of the correlated variables with another variable that explains the same economic reality but is not correlated with the other variables
Increase the sample size - increasing the number of observations can help you estimate the model more accurately.

66
Q

What effect does multicollinearity have on the overall r squared of the model or the overall F test

A

It actually doesn't have any effect at all on the overall r squared and also doesn't have any impact on the overall F test.

67
Q

What is the VIF equation

A

VIF = 1/(1 - R^2), where R^2 comes from regressing that independent variable on the other regressors

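The VIF formula with the rule-of-thumb thresholds from card 64; the R squared inputs are hypothetical:

```python
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j
# on the other independent variables.
def vif(r2_j):
    return 1 / (1 - r2_j)

print(vif(0.0))     # 1.0 -> no correlation with the other regressors
print(vif(0.75))    # 4.0 -> below the "investigate" threshold of 5
print(vif(0.9375))  # 16.0 -> past the "serious" threshold of 10
```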
68
Q

What is influence analysis

A

Influence analysis is when you identify the specific observations that have the biggest impact on the regression model

69
Q

What are the three extreme data point types

A

Outliers - extreme values of Y
High leverage points - extreme values of X (the independent variables)
Influential data points - extreme observations that, when excluded, cause significant changes to the model coefficients.

70
Q

What is the leverage method of detection as it relates to influence analysis

A

Leverage measures the distance between an observation of an independent variable and the sample mean. Leverage can range between 0 and 1.
An observation is influential if its leverage is greater than three times the average leverage. The threshold formula is 3(k + 1)/n, where k is the number of independent variables and n is the number of observations.

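The leverage cutoff from the card, as a tiny helper (k and n are made up):

```python
# Flag an observation when leverage > 3*(k + 1)/n, i.e. three times the
# average leverage (average leverage is (k + 1)/n).
def leverage_threshold(k, n):
    return 3 * (k + 1) / n

print(leverage_threshold(4, 100))  # 0.15 with 4 regressors and 100 observations
```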
71
Q

What is a studentized residual

A

A studentized residual specifically identifies outliers in the dependent variable. You re-estimate the model with one observation deleted at a time, then you compare the observed y value to the y value predicted by the model fitted without that observation. The difference between these two values is divided by its standard deviation.
You then compare the absolute value of the studentized residual to a critical value of the t distribution with n - k - 2 degrees of freedom. If the residual is bigger than the critical value, then the point is an outlier

72
Q

What is Cook's D and what is it used for

A

Cook's D is a composite metric that identifies influential observations by considering both the x and y values. Thresholds can vary, but sometimes an observation is considered influential if Cook's D exceeds (k/n)^(1/2)

73
Q

What are some methods you can use once you have identified the influential data points

A

Winsorisation
Harmonisation
Checking for input errors
Checking for omitted variables

74
Q

What are dummy variables

A

They are variables that take a value of 0 or 1 depending on whether a condition is true or false.

75
Q

What is the n-1 rule in dummy variables

A

In order to distinguish between n categories you need to use n - 1 dummy variables. If you use n instead of n - 1 then you violate the assumption that there is no exact linear relationship between the variables.

76
Q

How do you interpret the coefficients of dummy variables

A

The intercept is the average value of the dependent variable for the omitted category (the control)
The dummy coefficient indicates the estimated difference in the dependent variable for that category relative to the average value of the reference category

77
Q

What are the types of dummy variables

A

Intercept dummies - these shift the intercept of the regression line up or down; the slope remains the same
Slope dummies - these change the slope of the regression line for a specific category. The interaction term for the slope dummy is found by multiplying the dummy variable by a continuous independent variable. It captures how the relationship between x and y changes based on a given factor

78
Q

How are dummy variables used to predict values

A

To predict a value based on the dummy variable model, you just put 1 or 0 into each dummy variable. If the category is the reference group you SET ALL DUMMY VARIABLES TO 0 and the predicted value is based only on the intercept. If the category is a dummy group, you set that variable to 1 and the rest to 0, which gives you an estimate for that category.

79
Q

Why would you use a logistic regression

A

Logistic regression is used when the dependent variable is qualitative, for example a binary outcome, which means that the predicted value falls between zero and one. The logistic regression transforms a probability value into log odds

80
Q

What is the logit transformation

A

It's the natural log of the odds: ln(p/(1 - p))

81
Q

What are the assumptions and estimation method of the logit regression model

A

Assumes that the residuals follow a logistic distribution. This is similar to a normal distribution but has fatter tails.
Unlike OLS, which minimises the sum of squared errors, logit coefficients are estimated using MLE (maximum likelihood estimation). MLE seeks the values that maximise the likelihood of observing the actual data.

82
Q

How do you interpret the coefficients of the logit regression model

A

Because the model is non-linear, the intercept represents the log odds when all independent variables are equal to zero.
For the slope coefficients, because the function is curved, the change in probability for a one unit change in an independent variable is NON-CONSTANT.
To estimate the impact of a variable, you use the average value for each of the independent variables and find the predicted value. Then you increase one of the variables by one and look at the new predicted value. The difference between these two values is the impact of that variable.

83
Q

How do you calculate the predicted probability using a logit model

A

1. Calculate y by plugging the assumed x values into the equation
2. Calculate the odds = e^y
3. Calculate the probability p = odds / (1 + odds)

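The three steps above in code, with purely hypothetical coefficients:

```python
import math

# Step 1: y = b0 + b1*x gives the log odds (coefficients are invented).
# Step 2: odds = e^y.  Step 3: p = odds / (1 + odds).
b0, b1 = -1.0, 0.8
x = 2.0
y = b0 + b1 * x
odds = math.exp(y)
p = odds / (1 + odds)
print(round(p, 4))  # roughly 0.65 for these made-up inputs
```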
84
Q

How do you calculate the model fit for a logit model

A

You cannot use the r squared.
You use the likelihood ratio (LR) test. This test is also used on nested models, and it follows a chi-squared distribution with q degrees of freedom.
You can also use the log-likelihood, which is always negative; higher values mean a better fit.
Or you can use the pseudo r squared, which has values reported by software that can be used to compare competing models for the same variable.