variance and standard deviation equations and relationship to each other
LOS 10.a
Variance: σX^2 = Σ(i=1 to n) (Xi - Xmean)^2 / (n-1)
standard deviation is the square root of variance:
σX = sqrt(σX^2)
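As a quick numeric check of the two formulas, here is a minimal Python sketch. The sample values are made up for illustration. The standard-library functions `statistics.variance` and `statistics.stdev` also use the n-1 divisor, so they should agree with the hand computation.

```python
import math
import statistics

# Hypothetical sample, for illustration only
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(x)
x_mean = sum(x) / n

# Sample variance: sum of squared deviations from the mean, divided by (n - 1)
variance = sum((xi - x_mean) ** 2 for xi in x) / (n - 1)

# Standard deviation is the square root of variance
std_dev = math.sqrt(variance)

# Cross-check against the standard library (both use the n - 1 divisor)
lib_variance = statistics.variance(x)
lib_std_dev = statistics.stdev(x)
```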
Fill in the terms to this ANOVA table:
Source df SS MS
Regression ? ? ?
Error ? ? ?
Total ? ?
LOS 10.i
Source df SS MS
Regression k RSS MSR
Error n-k-1 SSE MSE
Total n-1 SST
NOTE: MSR = RSS / k; MSE = SSE / (n-k-1); R^2 = RSS / SST; SEE = sqrt(MSE) ≈ sforecast for large n
Construct equations for MSE, MSR, R^2, F, and SEE to show their relationship with terms in the ANOVA table:
Source df SS MS
Regression k RSS MSR
Error n-k-1 SSE MSE
Total n-1 SST
LOS 10.i
mean squared error: MSE = SSE / (n-k-1)
mean regression sum of squares: MSR = RSS / k
coefficient of determination: R^2 = RSS / SST
F-statistic: F = MSR / MSE
standard error of estimate: SEE = sqrt(MSE)
standard error of forecast (large n): sforecast ≈ SEE
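The relationships above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a full regression routine: the function name `anova_summary` and the input numbers are hypothetical, and SST is taken as RSS + SSE (the standard decomposition, which the table implies but does not state).

```python
import math

def anova_summary(n, k, rss, sse):
    """Derive regression statistics from ANOVA table entries.

    n:   number of observations
    k:   number of independent variables
    rss: regression sum of squares
    sse: sum of squared errors
    """
    sst = rss + sse              # total sum of squares (assumed decomposition)
    msr = rss / k                # mean regression sum of squares
    mse = sse / (n - k - 1)      # mean squared error
    return {
        "SST": sst,
        "MSR": msr,
        "MSE": mse,
        "R2": rss / sst,         # coefficient of determination
        "F": msr / mse,          # F-statistic
        "SEE": math.sqrt(mse),   # standard error of estimate
    }

# Hypothetical inputs, for illustration only
result = anova_summary(n=63, k=2, rss=800.0, sse=240.0)
```

With these inputs the error degrees of freedom are 63 - 2 - 1 = 60, so MSE = 240 / 60 = 4, MSR = 800 / 2 = 400, F = 100, and SEE = 2.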
Compute the residual ^e “e-hat” for observation “i” from the observation data and the multi-variate regression
LOS 10.a
^ei = Yi - ^Yi = Yi - (^b0 + ^b1X1i + ^b2X2i + … + ^bkXki)
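A minimal sketch of the residual calculation for a single observation. The function name `residual`, the argument layout (intercept first in the coefficient list), and the numbers are assumptions made for illustration.

```python
def residual(y_i, x_i, b):
    """Residual for one observation: actual Y minus predicted Y.

    y_i: observed value of the dependent variable
    x_i: list of the k independent-variable values for observation i
    b:   list of k + 1 estimated coefficients, intercept first
    """
    if len(b) != len(x_i) + 1:
        raise ValueError("expected one intercept plus one slope per X")
    # Predicted value: intercept plus each slope times its X value
    y_hat = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x_i))
    return y_i - y_hat

# Hypothetical observation with k = 2 independent variables
e_hat = residual(y_i=10.0, x_i=[2.0, 3.0], b=[1.0, 0.5, 2.0])
```

Here the predicted value is 1.0 + 0.5(2.0) + 2.0(3.0) = 8.0, so the residual is 10.0 - 8.0 = 2.0.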
T-statistic used for testing regression coefficients for statistical significance
LOS 10.c, LOS 10.d
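t = (^bj - bj) / s^bj
df = n - k - 1
where:
^bj = estimated regression coefficient
bj = hypothesized value of the coefficient
s^bj = standard error of ^bj
n = number of observations
k = number of independent variables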
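The test statistic and its degrees of freedom can be computed as in the sketch below. The function names and all input values are hypothetical. The critical value is typed in from a t-table rather than computed, since the Python standard library has no t-distribution.

```python
def coefficient_t_stat(b_hat, b_hypothesized, std_error):
    """t = (estimated coefficient - hypothesized value) / standard error."""
    return (b_hat - b_hypothesized) / std_error

def degrees_of_freedom(n, k):
    """df for a regression with n observations and k independent variables."""
    return n - k - 1

# Hypothetical inputs, for illustration only
t = coefficient_t_stat(b_hat=0.40, b_hypothesized=0.0, std_error=0.16)
df = degrees_of_freedom(n=63, k=2)

# Two-tailed 5% critical value for df = 60, looked up in a t-table
critical_t = 2.000

# Reject the null hypothesis when |t| exceeds the critical value
significant = abs(t) > critical_t
```

With these numbers t is about 2.5 on 60 degrees of freedom, which exceeds 2.000, so the coefficient would be judged statistically different from the hypothesized value at the 5% level.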
Interpret estimated regression coefficients
LOS 10.b
intercept term - value of dependent variable when all independent variables are zero.
(partial) slope coefficients - estimated change in the dependent variable for a one-unit change in that independent variable, holding all other independent variables constant.
Interpret the p-value of an estimated regression coefficient
LOS 10.b
The p-value is the smallest level of significance for which the null hypothesis can be rejected.
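Comparing p-value to the significance level: if the p-value is less than the significance level, reject the null hypothesis; otherwise, fail to reject it.
Example: if ^b1 = 0.40 and its p-value = 0.032, at 1% significance level: 0.032 > 0.01, so the null hypothesis cannot be rejected (^b1 is not statistically significant at the 1% level).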
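The comparison rule is simple enough to express directly. The sketch below applies it to the card's p-value of 0.032 at several significance levels; the function name `reject_null` and the 5% and 10% levels are additions for illustration.

```python
def reject_null(p_value, significance_level):
    """Reject the null hypothesis when the p-value is below the significance level."""
    return p_value < significance_level

p_value = 0.032  # p-value from the example card

# Decision at the 1%, 5%, and 10% significance levels
decisions = {alpha: reject_null(p_value, alpha) for alpha in (0.01, 0.05, 0.10)}
```

The same coefficient fails to reject at 1% but rejects at 5% and 10%, which is why the p-value is described as the smallest significance level at which rejection is possible.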
heteroskedasticity
LOS 10.k
detecting heteroskedasticity
LOS 10.k
correcting heteroskedasticity
LOS 10.k
serial correlation
LOS 10.k
detecting serial correlation
LOS 10.k
interpreting Durbin-Watson values
LOS 10.k
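The card itself carries no formula, so the following is only a hedged sketch of the standard Durbin-Watson statistic: the sum of squared changes in consecutive residuals divided by the sum of squared residuals. The residual series are made up for illustration. As a rough guide, values near 2 suggest no serial correlation, values well below 2 suggest positive serial correlation, and values well above 2 suggest negative serial correlation; a formal test compares the statistic with tabulated lower and upper critical values.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a time-ordered list of regression residuals."""
    # Numerator: squared changes between consecutive residuals
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    # Denominator: sum of squared residuals
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residual series, for illustration only
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]    # signs tend to repeat
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # signs flip every period

dw_persistent = durbin_watson(persistent)
dw_alternating = durbin_watson(alternating)
```

The persistent series gives a statistic below 2 and the alternating series gives one above 2, matching the rough guide.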
correcting serial correlation
LOS 10.k
preferred method: “Hansen Method”
multicollinearity
LOS 10.l
multicollinearity - two or more independent variables (“X’s”) are highly correlated with each other
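One simple check, sketched below, is the sample correlation between a pair of independent variables. The function name `correlation` and the data are hypothetical. Note the limitation: with more than two independent variables, low pairwise correlations do not rule out multicollinearity, because one variable can still be close to a linear combination of several others.

```python
import math

def correlation(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    # Sums of cross-products and squared deviations (the n - 1 divisors cancel)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical independent variables; x2 is roughly 2 * x1
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(x1, x2)
```

A correlation this close to 1 between two X variables would be a warning sign that their separate slope coefficients are hard to estimate reliably.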
detecting/correcting multicollinearity
LOS 10.l
Tell-tale signs from regression data:
correction: omit one or more X variables
summary of regression analysis problems
LOS 10.k, LOS 10.l
Conditional Heteroskedasticity
Serial Correlation
Multicollinearity
regression model misspecification
LOS 10.m
model specification: process of variable selection and transformation; determines/affects quality of regression
Types of Model Misspecification:
multiple regression model flow chart
LOS 10
