error term (residual)
The portion of the dependent variable that can’t be explained by the independent variable
dependent variable
Y - the variable we’re seeking to explain
independent variable
X - the explanatory variable
Cross-sectional
many observations on X & Y for the same time period
Time series
many observations on Y (and sometimes X) from different time periods
Assumptions underlying linear regression
Assumptions 2/3 -> ensure that unbiased estimates of b0 and b1 are produced
Assumptions 4/5/6 -> determine the distribution of b0 & b1 so we can test the values of the coefficients
Standard error of estimate
Measures the standard deviation of the error term
SEE = (SSE/(n-2))^0.5 or (MSE)^0.5
Coefficient of determination
Measures the fraction of the total variation in the dependent variable that’s explained by the independent variable
R^2 = EXPLAINED VARIATION/TOTAL VARIATION
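A minimal sketch of the two cards above (SEE and R^2), computed by hand with numpy on made-up data; the numbers are illustrative, not from the notes:

```python
import numpy as np

# made-up data with a roughly linear relationship
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
n = len(y)

# OLS slope and intercept
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x
sse = np.sum((y - y_hat) ** 2)          # unexplained variation
rss = np.sum((y_hat - y.mean()) ** 2)   # explained variation
tss = np.sum((y - y.mean()) ** 2)       # total variation

see = (sse / (n - 2)) ** 0.5            # SEE = sqrt(SSE / (n - 2))
r2 = rss / tss                          # explained / total
```

With data this close to a straight line, R^2 comes out near 1 and SEE is small.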
Confidence interval for a regression coefficient
An interval of values that is believed to include the true parameter value of b1 w/a given degree of confidence
b1 +/- (critical t value) * (standard error of b1)
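A sketch of the interval formula above, assuming scipy is available for the critical t value; the data is made up:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
n = len(y)

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

see = np.sqrt(np.sum(resid ** 2) / (n - 2))
se_b1 = see / np.sqrt(np.sum((x - x.mean()) ** 2))  # standard error of b1

t_crit = stats.t.ppf(0.975, df=n - 2)               # two-sided, 95%
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)     # b1 +/- t_crit * se_b1
```

If the interval excludes 0, the slope is statistically significant at the 5% level.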
Hypothesis testing
t = (b̂1 - b1)/(standard error of b̂1)
if |t test statistic| > |critical t value| -> reject H0 -> conclude statistical significance
Usually H0: regression coefficient = 0
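The usual test H0: b1 = 0 can be sketched as follows (made-up data, scipy assumed available for the critical value):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
n = len(y)

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
se_b1 = (np.sqrt(np.sum(resid ** 2) / (n - 2))
         / np.sqrt(np.sum((x - x.mean()) ** 2)))

t_stat = (b1 - 0) / se_b1             # hypothesized value under H0 is 0
t_crit = stats.t.ppf(0.975, df=n - 2)  # 5% significance, two-sided
reject = abs(t_stat) > t_crit          # True -> statistically significant
```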
P-value
Smallest level of significance at which H0 can be rejected (2-sided test)
If p-value < significance level -> reject H0 -> conclude statistical significance
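A sketch of the p-value decision rule, assuming scipy; the t-statistic and degrees of freedom below are illustrative values, not from the notes:

```python
from scipy import stats

t_stat, df = 3.2, 28                       # illustrative, not from the notes
p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-sided p-value

alpha = 0.05
reject = p_value < alpha                   # True -> statistically significant
```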
ANOVA
ANalysis Of VAriance - determines the usefulness of the independent variable(s) in explaining the variance in the dependent variable
SSE = sum of squared errors -> total variation in Y that's unexplained by the regression
RSS = regression sum of squares -> total variation in Y that's explained by the regression equation
TSS (total sum of squares) = SSE + RSS
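The decomposition TSS = SSE + RSS can be verified numerically on made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sse = np.sum((y - y_hat) ** 2)          # unexplained
rss = np.sum((y_hat - y.mean()) ** 2)   # explained
tss = np.sum((y - y.mean()) ** 2)       # total

# tss equals sse + rss up to floating-point error
```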
Limitations of regression analysis
Multiple regression
Difference from linear regression is that now you have more than 1 independent variable
e.g. Y = -23 + 0.3X1 - 0.225X2
0.3 represents the expected effect on Y of a 1-unit increase in X1 after removing the part of X1 that is correlated w/X2
If X1 and X2 are uncorrelated, then a regression w/just X1 would have the same coefficient
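A sketch of the last point: the data below is constructed so X1 and X2 have zero sample correlation, in which case the multiple-regression coefficient on X1 matches the simple-regression coefficient. The coefficients mimic the example equation; the noise and data are made up.

```python
import numpy as np

# x1 and x2 are built to be exactly uncorrelated in-sample
x1 = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, -2.0, -1.0, 0.0, 1.0, 2.0])
x2 = np.array([ 1.0,  1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0, -1.0])

rng = np.random.default_rng(0)
y = -23 + 0.3 * x1 - 0.225 * x2 + rng.normal(0, 0.1, size=len(x1))

# multiple regression via least squares: columns [1, x1, x2]
X = np.column_stack([np.ones_like(x1), x1, x2])
b_multi = np.linalg.lstsq(X, y, rcond=None)[0]

# simple regression of y on x1 alone
b1_simple = np.cov(x1, y, ddof=1)[0, 1] / np.var(x1, ddof=1)

# with uncorrelated predictors the two slope estimates coincide
```

When X1 and X2 are correlated, dropping X2 changes the X1 coefficient because it absorbs part of X2's effect.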