covariance (def) of two random variables
LOS 9.a
a statistical measure of the degree to which two variables move together.
covXY =
LOS 9.a
covXY = Σ(i=1 to n) (Xi - Xmean)(Yi - Ymean) / (n-1)
where:
Xi, Yi = the ith observations of X and Y
Xmean, Ymean = the sample means of X and Y
n = sample size
Why is covariance not very meaningful but correlation coefficient is?
LOS 9.a
Covariance: ranges from negative infinity to positive infinity, and its magnitude depends on the units of X and Y, so it is difficult to interpret on its own.
Correlation coefficient converts covariance into a standardized measure that is easier to interpret.
sample correlation coefficient, rXY = ?
LOS 9.a
rXY = covXY / (sX sY), where sX, sY = sample standard deviations of X and Y
Interpret for rXY (sample correlation coefficient):
r = +1
0 < r < 1
r = 0
-1 < r < 0
r = -1
LOS 9.a
perfect positive linear correlation
positive linear relationship
no linear relationship
negative linear relationship
perfect negative linear correlation
Note that for r = 1 and r = -1 the data points lie exactly on a line, but the slope is not necessarily +1 or -1.
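The covariance and correlation formulas above can be checked numerically; a minimal Python sketch with made-up sample data (any x/y values would do):

```python
import numpy as np

# Made-up sample observations, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Sample covariance: sum of cross-deviations divided by (n - 1)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# Sample correlation: covariance standardized by the two sample std devs
r_xy = cov_xy / (x.std(ddof=1) * y.std(ddof=1))
```

The results agree with np.cov(x, y, ddof=1)[0, 1] and np.corrcoef(x, y)[0, 1], and r_xy always lands in [-1, +1] regardless of the units of X and Y.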
What are the limitations to correlation analysis?
LOS 9.b
- outliers can substantially distort the computed correlation
- correlation captures only linear relationships; two variables can have a strong nonlinear relationship yet a low correlation
- correlation does not imply causation; spurious correlation may arise by chance or through a third variable
How does one test the significance of the population correlation, p (rho), between two variables, given the sample correlation results?
LOS 9.c
Test whether the correlation between the population of the two variables is equal to zero using the following null and alternative hypotheses for two-tailed test with n-2 degrees of freedom (df):
H0: p = 0 versus Ha: p != 0
test statistic t = r * sqrt(n-2) / sqrt(1 - r2)
Then compare computed t with the critical t-value for the appropriate degrees of freedom and level of significance. For a two-tailed test, the decision rule is stated as:
Reject H0 if t > +tcritical or t < -tcritical
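A minimal sketch of this test in Python; r, n, and the significance level are made-up inputs, and 2.048 is the approximate two-tailed 5% critical t for df = 28:

```python
import numpy as np

r, n = 0.75, 30   # hypothetical sample correlation and sample size
df = n - 2        # degrees of freedom for the test

# test statistic: t = r * sqrt(n - 2) / sqrt(1 - r^2)
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)

t_crit = 2.048    # approx. two-tailed 5% critical t for df = 28
reject_h0 = t_stat > t_crit or t_stat < -t_crit   # two-tailed decision rule
```

With these inputs the statistic works out to exactly t = 6.0, well past the critical value, so H0 (rho = 0) is rejected.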
Distinguish between the dependent and independent variables of a linear regression
LOS 9.d
dependent variable (Y): the variable whose variation is being explained (also called the explained or predicted variable)
independent variable (X): the variable used to explain the variation in Y (also called the explanatory variable)
Describe the six assumptions underlying linear regression
LOS 9.e
except for #1, it’s all about the residuals!
For X (independent) and Y (dependent) variables:
1. a linear relationship exists between X and Y
2. X is not random, and X is uncorrelated with the residual term
3. the expected value of the residual term is zero
4. the variance of the residual term is constant for all observations (homoskedasticity)
5. the residual terms are uncorrelated (independent) across observations
6. the residual term is normally distributed
Interpret the linear regression coefficients
LOS 9.e
For linear relationship:
Yi = b0 + b1Xi + ei, i=1…n
the regression line equation is:
^Yi = ^b0 + ^b1Xi , i=1…n ( ^ equals “hat” or “estimated”; note that Xi carries no hat — the actual X values are used)
^b0 = intercept (estimated value of Y when X = 0)
^b1 = slope coefficient (estimated change in Y for a one-unit change in X)
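The hatted coefficients can be estimated by ordinary least squares: ^b1 = cov(X, Y) / var(X) and ^b0 = Ymean - ^b1 × Xmean. A Python sketch with made-up data:

```python
import numpy as np

# Made-up observations of the independent (x) and dependent (y) variables
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Least-squares slope: sum of cross-deviations over sum of squared x-deviations
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: the regression line passes through the point of means
b0_hat = y.mean() - b1_hat * x.mean()

y_hat = b0_hat + b1_hat * x   # estimated (fitted) values ^Yi
residuals = y - y_hat         # ei = Yi - ^Yi
```

The estimates match np.polyfit(x, y, 1), and the residuals of an OLS fit with an intercept always sum to zero.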
Standard error of estimate (def.)
LOS 9.f
Standard error of estimate (SEE) is the standard deviation of the error terms in the regression.
also called:
standard error of the residual
standard error of the regression
SEE measures the degree of variability of the actual Y-values relative to estimated Y-values from a regression equation.
The SEE gauges the “fit” of the regression line.
The smaller the standard error, the better the fit.
Coefficient of Determination (def.) for simple linear regression
LOS 9.f
Coefficient of determination (R2) is the percentage of the total variation in the dependent variable (Y) explained by the independent variable (X).
For simple linear regression (not for multi-variate regression),
R2 = r2, where
r = sample correlation coefficient
Regression coefficient (^b1) confidence interval (equation)
LOS 9.f
^b1 +/- (tc x s^b1), where
tc = critical two-tailed t-value for the selected confidence level for df = n-2
Test for significance about a population value of a regression coefficient (e.g. b1)
LOS 9.g
Use two-tailed t-test with df = n-2:
tb1 = (^b1 - b1) / s^b1, where
b1 = the hypothesized value.
H0: b1 = 0; Ha: b1 != 0
Reject H0 if t > +tc or t < -tc, which means that ^b1 is significantly different from the hypothesized value
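The confidence-interval and significance-test cards above can be combined in one sketch; the data are made up, 2.776 is the approximate two-tailed 5% critical t for df = 4, and the slope's standard error uses the standard formula s_b1 = SEE / sqrt(sum of squared x-deviations):

```python
import numpy as np

# Made-up data; n - 2 = 4 degrees of freedom
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.8, 4.3, 5.6, 8.4, 9.7, 12.1])
n = len(x)

# OLS fit
sxx = np.sum((x - x.mean()) ** 2)
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0_hat = y.mean() - b1_hat * x.mean()

sse = np.sum((y - (b0_hat + b1_hat * x)) ** 2)
see = np.sqrt(sse / (n - 2))   # standard error of estimate
s_b1 = see / np.sqrt(sxx)      # standard error of the slope ^b1

# H0: b1 = 0 (the hypothesized value), two-tailed with df = n - 2
t_b1 = (b1_hat - 0) / s_b1
t_c = 2.776                    # approx. two-tailed 5% critical t, df = 4
reject_h0 = t_b1 > t_c or t_b1 < -t_c

# Confidence interval for the slope: ^b1 +/- tc * s_b1
ci = (b1_hat - t_c * s_b1, b1_hat + t_c * s_b1)
```

Because zero lies outside the interval exactly when |t_b1| exceeds t_c, the confidence interval and the t-test always agree.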
For a simple linear regression, how does one predict the value of the dependent variable (Y)?
LOS 9.h
^Y = ^b0 + ^b1Xp, where
^Y = predicted value of the dependent variable
Xp = forecasted value of the independent variable
What is the confidence interval for a predicted value of the dependent variable (^Y)?
LOS 9.i
^Y +/- (tc x sf), where
tc = two-tailed critical t-value at desired significance level with df = n-2
sf = standard error of the forecast (will likely be provided)
sf2 = SEE2 × [1 + 1/n + (X - Xbar)2 / ((n-1) × sx2)], where
X = value of independent variable for which forecast was made
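Putting the last two cards together in Python (made-up data; 2.776 is the approximate two-tailed 5% critical t for df = 4):

```python
import numpy as np

# Made-up data and fitted regression
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.8, 4.3, 5.6, 8.4, 9.7, 12.1])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
see2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)   # SEE squared

x_p = 4.5                  # hypothetical forecasted value of X
y_hat = b0 + b1 * x_p      # point prediction ^Y

# Standard error of the forecast:
# sf2 = SEE2 * [1 + 1/n + (X - Xbar)^2 / ((n-1) * sx2)]
s_x2 = np.var(x, ddof=1)   # sample variance of X
sf = np.sqrt(see2 * (1 + 1 / n + (x_p - x.mean()) ** 2 / ((n - 1) * s_x2)))

t_c = 2.776                # approx. two-tailed 5% critical t, df = n - 2 = 4
ci = (y_hat - t_c * sf, y_hat + t_c * sf)
```

Note that sf always exceeds SEE: the 1/n and (X - Xbar)^2 terms layer the uncertainty in the estimated coefficients on top of the residual variance, so the interval widens as Xp moves away from Xbar.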
What does the acronym ANOVA stand for, and what is its definition?
ANOVA = “analysis of variance”
ANOVA is a statistical procedure for analyzing the total variability of the dependent variable.
Write out the ANOVA table for simple linear regression
LOS 9.j
source of variation       df                  Sum of Squares    Mean Sum of Squares
Regression (explained)    k = 1               RSS               MSR = RSS/k = RSS
Error (unexplained)       n - k - 1 = n - 2   SSE               MSE = SSE/(n-2)
TOTAL                     k + n - 2 = n - 1   SST = RSS + SSE
Calculate R2 for simple linear regression from an ANOVA table
LOS 9.j
R2 = (SST - SSE) / SST = RSS / SST
Calculate SEE for simple linear regression from an ANOVA table
LOS 9.j
SEE = sqrt(MSE) = sqrt(SSE / (n-2))
Recall that SEE is the standard deviation of the regression error terms
Calculate the F-statistic of a simple linear regression
LOS 9.j
F = MSR / MSE = (RSS/k) / (SSE/(n-k-1)), where
MSR = mean regression sum of squares
MSE = mean squared error
NOTE: This is always a 1-tailed test!
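The ANOVA cards (table, R2, SEE, F) can all be reproduced from one made-up data set:

```python
import numpy as np

# Made-up data; k = 1 independent variable
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.8, 4.3, 5.6, 8.4, 9.7, 12.1])
n, k = len(x), 1

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)       # total variation
sse = np.sum((y - y_hat) ** 2)          # unexplained (error) variation
rss = np.sum((y_hat - y.mean()) ** 2)   # explained (regression) variation

r2 = rss / sst                          # = (SST - SSE) / SST
see = np.sqrt(sse / (n - k - 1))        # = sqrt(MSE)
f = (rss / k) / (sse / (n - k - 1))     # = MSR / MSE

# For simple linear regression, F equals the squared t-statistic of the slope
t_b1 = b1 / (see / np.sqrt(sxx))
```

SST = RSS + SSE holds by construction, and f equals t_b1 squared up to floating-point error.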
What is the purpose of the F-test?
LOS 9.j
The F-test is used to tell whether at least one independent variable explains a significant portion of the variation in the dependent variable (i.e. does at least one slope coefficient bi differ significantly from zero?):
F = explained variance / unexplained variance
= MSR / MSE = RSS/k / SSE/(n-k-1)
NOTE: the F-statistic tests all independent variables as a group.
NOTE: for a simple linear regression, the F-test tells us the same thing as the t-test on the slope. In fact, for a simple linear regression F = (tb1)2
What are the limitations of regression analysis?
LOS 9.k
- linear relationships can change over time (parameter instability)
- if the assumptions underlying linear regression are violated (e.g. heteroskedastic or autocorrelated residuals), hypothesis tests and predictions are not valid
- even a valid relationship may lose its usefulness once other market participants know of it and act on it