Dependent and independent variables // Graph function
Yi = b0 + b1Xi + ei
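A minimal sketch (not from the notes) of estimating b0 and b1 for Yi = b0 + b1Xi + ei by ordinary least squares; the x/y data are made-up illustration values.

```python
import numpy as np

# Made-up sample data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # slope = Cov(x, y) / Var(x)
b0 = y.mean() - b1 * x.mean()                        # intercept
print(b0, b1)
```

The slope comes from the covariance/variance ratio, the intercept from forcing the line through the point of means.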
Scatter Plots types
Correlation coefficient (ρ or r) (Formula)
Correlation standardizes covariance by dividing it by the product of the standard deviations
Perfect positive correlation: +1
Perfect negative correlation: -1
No correlation: 0
Covariance (Formula)
A statistical measure of the degree to which two variables move together
(Sample) Standard Deviation Formula
Sx = [Σ(xi - x̄)² / (n - 1)]^(1/2)
Easier with calculator!!
Using calculator for Data Series to get Sx, Sy, r
Does not calculate Covariance!
BUT
Cov(x,y) = rxy * Sx * Sy
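A sketch of the shortcut above: the calculator gives Sx, Sy and r but not the covariance, so recover it as Cov = r * Sx * Sy. The data points are made up for illustration.

```python
import numpy as np

# Made-up sample data for illustration
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

sx = x.std(ddof=1)                  # sample standard deviation of x
sy = y.std(ddof=1)                  # sample standard deviation of y
r = np.corrcoef(x, y)[0, 1]         # correlation coefficient

cov_from_r = r * sx * sy            # calculator shortcut
cov_direct = np.cov(x, y, ddof=1)[0, 1]
print(cov_from_r, cov_direct)       # both give the sample covariance
```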
Limitations of correlation analysis
Assumptions underlying simple linear regression
Standard error of the estimate (SEE)
Standard deviation of the distribution of the errors (residuals) about the regression line
The smaller the SEE, the better the fit of the estimated regression line: the tighter the points to the line
k = # of independent variables (single regression: 1)
Sum of squared errors (SSE)
UNEXPLAINED: Actual (yi) - Predicted (ŷ)
The estimated regression equation will not predict the values of y exactly; it only estimates them
A measure of this error is SSE (ŷ is the predicted value)
The coefficient of determination (R2)
Describes the percentage variation in the dependent variable explained by movements in the independent variable
Just r² (the sign is lost); add the sign back when recovering r
R² = 80% = 0.8 → r = 0.8^(1/2) = 0.89 → -0.89 (see below)
ŷ (predicted) = 0.4 - 0.3x → b1 = -0.3, so r takes the negative sign
Alternatively: R² = RSS / TSS (if RSS = TSS, R² = 1 → perfect fit)
R² = 1 - SSE / TSS (if SSE = 0, R² = 1 → perfect fit)
Total sum of the squares (TSS)
ACTUAL (yi) - MEAN
Alternatively, TSS = RSS + SSE
Regression sum of the squares (RSS)
EXPLAINED: Prediction (ŷ) - Mean
Difference between the estimated values for y and the mean value of y
Graphic: Relationship between TSS, RSS and SSE
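A sketch of the identities above: TSS = RSS + SSE, and R² computed both ways (RSS/TSS and 1 - SSE/TSS) agrees. The data are made-up illustration values.

```python
import numpy as np

# Made-up sample data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Simple OLS fit
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sse = ((y - y_hat) ** 2).sum()          # UNEXPLAINED: actual - predicted
rss = ((y_hat - y.mean()) ** 2).sum()   # EXPLAINED: predicted - mean
tss = ((y - y.mean()) ** 2).sum()       # TOTAL: actual - mean

print(tss, rss + sse)                   # the decomposition holds
print(rss / tss, 1 - sse / tss)         # two routes to the same R²
```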
Hypothesis testing on regression parameters
ANOVA tables
Prediction intervals on the dependent variable
e.g. 20 – 40
Limitations of regression analysis
Multiple Regression
Assumptions
ANOVA
Work out:
Using the regression equation to estimate the value
Becomes: Ŷ = 0.163 - (0.28 x 11) + (1.15 x 18) + (0.09 x 215) = 37.13
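The worked estimate above, as a sketch: plug the independent-variable values into the fitted multiple-regression equation (coefficients taken from the notes).

```python
# Coefficients from the notes: intercept, b1, b2, b3
b = [0.163, -0.28, 1.15, 0.09]
# Values of the independent variables from the notes
x = [11, 18, 215]

y_hat = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
print(round(y_hat, 2))  # 37.13, matching the notes
```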
But this is only an estimate; we will want to apply confidence intervals to it
Individual test: T-test
Testing the significance of each of the individual regression coefficients and the intercept
Tcalc: bi / S.E.
Tcrit: 2 (given in CFA)
TCalc > TCrit (in absolute value) → REJECT NULL (H0: b1 = 0)
then b1 ≠ 0 → SIGNIFICANT
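A sketch of the individual t-test above: t_calc = bi / S.E.(bi), rejecting H0: bi = 0 when |t_calc| exceeds t_crit (≈ 2 per the notes). The coefficient and standard error are made-up illustration values.

```python
b1 = -0.28      # estimated slope coefficient (made up)
se_b1 = 0.10    # its standard error (made up)
t_crit = 2.0    # critical value, given in CFA exams

t_calc = b1 / se_b1                  # tcalc = bi / S.E.
significant = abs(t_calc) > t_crit   # reject H0: b1 = 0?
print(t_calc, significant)
```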
Global F-Test: Testing the validity of the whole regression
Testing whether all of the slope coefficients as a group are insignificant (H0: all slopes = 0)
FCalc > FCrit → REJECT NULL: at least one coefficient does not equal zero
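A sketch of the global F-test above, using the standard statistic F = (RSS/k) / (SSE/(n - k - 1)); all numbers below are made-up illustration values, including the critical value.

```python
rss, sse = 120.0, 30.0   # explained and unexplained sums of squares (made up)
n, k = 25, 3             # observations and number of independent variables
f_crit = 3.07            # assumed 5% critical value for this illustration

msr = rss / k                    # mean square regression
mse = sse / (n - k - 1)          # mean square error
f_calc = msr / mse
print(f_calc, f_calc > f_crit)   # reject H0 → at least one slope ≠ 0
```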