What is the definition of multiple regression
A regression technique used to determine the effect of two or more independent variables on a single dependent variable.
What is the definition of a partial slope coefficient
A partial slope coefficient measures the expected change in the dependent variable for a one-unit change in a specific independent variable, holding all other independent variables constant.
What is the coefficient of determination
R squared
What is the definition of r squared
The percentage of the total variation in the dependent variable that is explained by the independent variables.
What is the difference conceptually between r squared and adjusted r squared
R squared does not take into account the number of independent variables in the model, so it never decreases as variables are added. Adjusted R squared is adjusted for the number of variables, so mindlessly adding variables with little explanatory power can cause the adjusted R squared to fall.
What are the underlying assumptions of multiple regression
Assumptions are:
Linearity: the relationship between the dependent variable and the independent variables is linear
Non-random independent variables: the independent variables are not random
Expected error is zero: the expected value of the error term, conditional on the independent variables, is zero
Homoskedasticity: the variance of the error term is constant for all observations
No serial correlation: the error terms are not correlated with one another across observations
Normality: the error terms are normally distributed
What are some of the things that we can use multiple regressions to do
Identify relationships between variables
Forecast variables
Test existing theories
What is the definition of the error term
It is the difference between the observed value of the dependent variable and the value predicted by the regression model.
What is the p value used for
To evaluate the null hypothesis that the slope coefficient is equal to zero
How do you interpret the p value when it is greater or less than your significance level, and what does this imply
When the p value is less than your significance level, you can reject the null hypothesis, meaning that the slope coefficient is statistically different from zero.
When the p value is greater than your significance level, you fail to reject the null hypothesis, meaning that the slope coefficient could be equal to zero.
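The decision rule above can be sketched as a small function. This is only an illustration: the function name and the default 5% significance level are assumptions, not something the notes specify.

```python
def interpret_p_value(p_value, alpha=0.05):
    """Apply the decision rule for H0: slope coefficient = 0.

    alpha is the chosen significance level (0.05 is an assumed default).
    """
    if p_value < alpha:
        return "reject H0: the slope coefficient is statistically different from zero"
    return "fail to reject H0: the slope coefficient could be equal to zero"


print(interpret_p_value(0.01))  # p value below the significance level
print(interpret_p_value(0.20))  # p value above the significance level
```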
What would happen to Y in the following equation if we were to increase X1 by 1 unit, holding X2 constant
Y = 1 + 2.5X1 + 6X2
You would expect Y to increase by 2.5 units, assuming that you held X2 constant.
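A quick numerical check of this interpretation, using the equation above. The starting values X1 = 3 and X2 = 2 are arbitrary and chosen only for illustration.

```python
def predict(x1, x2):
    # Y = 1 + 2.5*X1 + 6*X2, the equation from the question
    return 1 + 2.5 * x1 + 6 * x2


base = predict(x1=3, x2=2)    # arbitrary starting point
bumped = predict(x1=4, x2=2)  # X1 raised by one unit, X2 held constant
change = bumped - base

print(base, bumped, change)
```

Whatever starting values are used, the change in Y equals the partial slope coefficient on X1.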
What is the purpose of a residual plot
A residual plot allows you to get a preliminary indication of assumption violations before performing statistical tests.
What are the key things that you are looking for regarding a residual plot
Linearity
Homoskedasticity
Normal distribution
What is the Q-Q plot of residuals and how do you interpret it
The Q-Q plot of residuals is a tool where you compare the residuals to a normal distribution.
If the residuals are normally distributed then the points should fall along the diagonal line of the Q-Q plot
If there is a smile or a frown shape on the Q-Q plot, you can interpret the residuals as not being normally distributed.
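As a rough sketch of what sits behind a Q-Q plot, the points can be built by pairing sorted standardized residuals with the matching quantiles of a standard normal distribution. The residual values below are made up for illustration, and the plotting position (i - 0.5) / n is one common convention that is assumed here.

```python
from statistics import NormalDist, mean, pstdev


def qq_points(residuals):
    """Return (theoretical quantile, standardized residual) pairs."""
    n = len(residuals)
    mu = mean(residuals)
    sigma = pstdev(residuals)
    # Standardize the residuals and sort them from smallest to largest
    standardized = sorted((r - mu) / sigma for r in residuals)
    # Quantiles a standard normal would produce at the same ranks
    # (plotting position (i - 0.5) / n is an assumed convention)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, standardized))


residuals = [-1.2, 0.4, 0.1, -0.3, 0.9, -0.5, 0.6]  # hypothetical residuals
points = qq_points(residuals)
for theo, obs in points:
    print(f"{theo:6.2f} {obs:6.2f}")
```

If the residuals are close to normal, the two columns track each other, which is the same as the points lying near the diagonal line.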
What percentage of the observations should fall beyond 1.65 standard deviations on a Q-Q plot?
In a normal distribution only about 5% of the observations should fall beyond 1.65 standard deviations in a given tail (about 10% across both tails).
If lots of observations fall beyond 2 standard deviations, you have grounds to suggest that the distribution has fat tails.
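The tail percentages for a normal distribution can be checked with Python's standard library. This is a small sketch; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

one_tail_165 = 1 - z.cdf(1.65)     # share above +1.65 standard deviations
two_tail_165 = 2 * one_tail_165    # share beyond 1.65 in either direction
two_tail_2 = 2 * (1 - z.cdf(2.0))  # share beyond 2 in either direction

print(f"beyond 1.65 in one tail:   {one_tail_165:.3f}")
print(f"beyond 1.65 in both tails: {two_tail_165:.3f}")
print(f"beyond 2 in both tails:    {two_tail_2:.3f}")
```

A share of residuals beyond 2 standard deviations that is well above the normal benchmark is what points toward fat tails.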
What are the columns of an ANOVA table
Degrees of freedom
Sum of squares
Mean sum of squares
What are the rows of an anova table
Regression
Residual
Total
What is the regression sum of squares and how many degrees of freedom does it have
The regression sum of squares (RSS) is the variation that is explained by the regression model. There are k degrees of freedom, where k is the number of independent variables.
What is the sum of squared errors
The sum of squared errors (SSE) is the variation in the dependent variable that is not explained by the model.
What is the SST
SST is the total sum of squares, which is the total variation in the dependent variable relative to its mean.
It equals RSS + SSE, where RSS is the regression sum of squares and SSE is the sum of squared errors.
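The ANOVA decomposition can be illustrated on a tiny made-up data set. For brevity this sketch fits a regression with a single independent variable (k = 1) using the textbook least squares formulas; the same SST = RSS + SSE identity holds with more variables. The data, and the residual and total degrees of freedom (n - k - 1 and n - 1), are not from the notes above.

```python
# Hypothetical data, one independent variable for brevity
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n, k = len(y), 1

x_bar = sum(x) / n
y_bar = sum(y) / n

# Ordinary least squares slope and intercept
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

rss = sum((f - y_bar) ** 2 for f in fitted)          # explained variation
sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))  # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)              # total variation

anova = {
    "Regression": {"df": k, "SS": rss, "MS": rss / k},
    "Residual": {"df": n - k - 1, "SS": sse, "MS": sse / (n - k - 1)},
    "Total": {"df": n - 1, "SS": sst},
}

for row, values in anova.items():
    print(row, values)
```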
What is the definition of R squared and how do you calculate it
R squared is defined as the variation explained by the model divided by the total variation (RSS / SST). It answers the question: given the total variation, how much of that movement were we able to predict with the model that we came up with.
What is the alternative equation for the R squared if you have the sum squared errors and the total sum squares
The alternative equation is R squared = 1 - SSE/SST.
This is one minus the sum of squared errors divided by the total sum of squares.
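The two forms of R squared are equivalent because SST = RSS + SSE. A short derivation, written in LaTeX with RSS as the regression sum of squares:

```latex
R^2 = \frac{RSS}{SST}
    = \frac{SST - SSE}{SST}
    = 1 - \frac{SSE}{SST}
```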
What is one of the main limitations of the R squared
It will never decrease as you add more variables to the model, and in practice it almost always increases. This is regardless of how relevant these variables are, which means that you can end up overfitting the model if you are not careful.
What is the purpose of the adjusted R squared value
Adjusted R squared accounts for the number of variables in the model and penalises variables that have been added unnecessarily, which guards against overfitting.
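A minimal sketch of the penalty, using the standard adjusted R squared formula 1 - (1 - R squared)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The formula is not stated in the notes above, and the R squared values and sample size below are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R squared for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical case: a fourth variable lifts R squared only slightly
before = adjusted_r_squared(0.600, n=30, k=3)
after = adjusted_r_squared(0.605, n=30, k=4)

print(f"adjusted R squared with 3 variables: {before:.4f}")
print(f"adjusted R squared with 4 variables: {after:.4f}")
```

In this example R squared rises from 0.600 to 0.605, yet the adjusted value falls, which is the signal that the extra variable was not worth adding.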