SSTO
SSTO=SSE+SSR
SSTO=Observed-Mean
Degrees of freedom=n-1
Distance from observed point to mean
SSTO=Σ(Yi−Ȳ)^2
SSTO=Y’(I-(1/n)J)Y (quadratic form)
SSTO=Y’Y- (1/n)Y’JY
SSE
Error Sum of Squares
SSE=Observed - Predicted
Degrees of freedom=n-p
In simple linear regression, degrees of freedom=n-2
Distance from observed point to predicted value
SSE=Σ(Yi−Ŷi)^2
SSE=Y’Y-Y’HY
=Y’(I-H)Y
SSR
Regression Sum of Squares
SSR=Predicted-mean
Degrees of freedom=p-1
In simple linear regression, degrees of freedom=1
Distance from Predicted line to mean
SSR=Σ(Ŷi−Ȳ)^2
SSR=Y’HY-(1/n)Y’JY
=Y’(H-(1/n)J)Y
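A quick numeric check of SSTO=SSE+SSR, as a minimal Python sketch for simple linear regression. The data and variable names are made up for illustration. It relies on two identities, Y’JY=(ΣYi)^2 and Y’HY=Ŷ’Y (because HY=Ŷ), so the matrix forms can be evaluated without building J or H.

```python
# Toy data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(y)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit of the simple linear regression Y = b0 + b1*X.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# Definitions as sums of squared distances.
ssto = sum((yi - y_bar) ** 2 for yi in y)              # observed - mean
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # observed - predicted
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # predicted - mean

# Matrix forms, using Y'Y = sum(Yi^2), Y'JY = (sum Yi)^2, Y'HY = Yhat'Y.
yty = sum(yi * yi for yi in y)
ytjy = sum(y) ** 2
ythy = sum(fi * yi for fi, yi in zip(y_hat, y))

ssto_q = yty - ytjy / n   # Y'Y - (1/n)Y'JY
sse_q = yty - ythy        # Y'Y - Y'HY
ssr_q = ythy - ytjy / n   # Y'HY - (1/n)Y'JY

print(ssto, sse, ssr)
print(ssto_q, sse_q, ssr_q)
```

Both routes should agree up to floating-point rounding, and SSE plus SSR should add up to SSTO.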
What are the general assumptions of the linear model? How are they assessed?
MSR
MSR=SSR/(p-1)
Variation that is explained by the fitted regression line
MSE
MSE=SSE/(n-p)
MSE is an estimate of σ^2
√(MSE) is an estimate of σ
Variation NOT explained by the fitted regression line
F statistic
F=MSR/MSE
If β=0, F≈1; if β≠0, F tends to be >1
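A small Python sketch of the mean squares and the F statistic. The function name `anova_summary` and the input values are hypothetical, chosen only to illustrate the arithmetic; p counts the regression parameters including the intercept.

```python
import math


def anova_summary(ssr, sse, n, p):
    """Return MSR, MSE, root MSE and F = MSR/MSE.

    n is the number of observations; p is the number of regression
    parameters including the intercept (p = 2 in simple linear regression).
    """
    msr = ssr / (p - 1)   # regression mean square
    mse = sse / (n - p)   # error mean square, estimates sigma^2
    return {
        "MSR": msr,
        "MSE": mse,
        "root_MSE": math.sqrt(mse),  # estimates sigma
        "F": msr / mse,
    }


# Hypothetical sums of squares from a simple linear regression with n = 5.
result = anova_summary(ssr=3.6, sse=2.4, n=5, p=2)
print(result)
```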
R2
What does R2=0.1885 mean?
What does adjusted R2=0.1817 mean?
Is Adjusted R2 better than R2?
R2=SSR/SSTO
Coefficient of determination
Proportion of variance of Y explained (linearly) by the variation in X
√R2 with the appropriate sign is equal to the correlation coefficient (r) between X and Y (only in simple linear regression)
R2=0.1885 means about 19% of the variation in inverse BP is explained by the predictors in the model
Adjusted R2=0.1817 means about 18% of the variation in inverse BP is explained by the predictors in the model after adjusting for the number of variables in the model
Adjusted R2 is better for comparing models because it accounts for the cost of adding variables
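To see why the adjustment matters, here is a minimal Python sketch. It uses R2=1-SSE/SSTO (the same as SSR/SSTO, since SSTO=SSE+SSR) and adjusted R2=1-((n-1)/(n-p))(SSE/SSTO). The function names and the numbers are hypothetical.

```python
def r_squared(sse, ssto):
    """R2 = 1 - SSE/SSTO, equivalently SSR/SSTO."""
    return 1.0 - sse / ssto


def adjusted_r_squared(sse, ssto, n, p):
    """Adjusted R2 = 1 - ((n - 1)/(n - p)) * SSE/SSTO."""
    return 1.0 - (n - 1) / (n - p) * sse / ssto


# Hypothetical fit with n = 5 observations and p = 2 parameters.
r2_small = r_squared(sse=2.4, ssto=6.0)
adj_small = adjusted_r_squared(sse=2.4, ssto=6.0, n=5, p=2)

# Hypothetical fit after adding a weak predictor: SSE drops only slightly.
r2_big = r_squared(sse=2.3, ssto=6.0)
adj_big = adjusted_r_squared(sse=2.3, ssto=6.0, n=5, p=3)

print(r2_small, adj_small)
print(r2_big, adj_big)
```

In this made-up case R2 rises when the weak predictor is added, while adjusted R2 falls.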
Diagnostics using Raw Residuals
Limitation of R2
Box Cox Transformation
Method to find a useful transformation
Choose the best λ
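One common way to choose λ is a grid search: transform Y for each candidate λ, refit the regression, and keep the λ with the smallest SSE. The sketch below assumes a simple linear regression, positive Y, and the standardized form of the transformation (scaled by the geometric mean of Y so that SSE values are comparable across λ). All function names and data are hypothetical.

```python
import math


def sse_simple_linear(x, w):
    """SSE from the least-squares line of w on x."""
    n = len(x)
    x_bar = sum(x) / n
    w_bar = sum(w) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxw = sum((xi - x_bar) * (wi - w_bar) for xi, wi in zip(x, w))
    b1 = sxw / sxx
    b0 = w_bar - b1 * x_bar
    return sum((wi - (b0 + b1 * xi)) ** 2 for xi, wi in zip(x, w))


def box_cox_standardized(y, lam):
    """Standardized Box-Cox transform of positive y for a given lambda."""
    n = len(y)
    k2 = math.exp(sum(math.log(yi) for yi in y) / n)  # geometric mean of Y
    if lam == 0:
        return [k2 * math.log(yi) for yi in y]
    k1 = 1.0 / (lam * k2 ** (lam - 1))
    return [k1 * (yi ** lam - 1.0) for yi in y]


def best_lambda(x, y, grid):
    """Return the lambda on the grid that gives the smallest SSE."""
    return min(
        grid,
        key=lambda lam: sse_simple_linear(x, box_cox_standardized(y, lam)),
    )


# Made-up data in which log(Y) is exactly linear in X.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [math.exp(0.5 + 0.3 * xi) for xi in x]
grid = [k / 2 for k in range(-4, 5)]  # -2.0, -1.5, ..., 2.0

chosen = best_lambda(x, y, grid)
print(chosen)
```

Because log(Y) is exactly linear in X here, the search should pick λ=0, the log transformation.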
Modeling Strategy
Boxplot
A boxplot describes the symmetry of the data; normality cannot be concluded directly from a boxplot
Collinearity is tested by
Correlation Matrix
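A minimal sketch of a correlation matrix among predictors, in plain Python. The predictor names and values are made up; `age_months` is deliberately 12 times `age` so one pair is perfectly collinear.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def correlation_matrix(columns):
    """Return a nested dict of pairwise correlations between named columns."""
    names = list(columns)
    return {
        r: {c: pearson_r(columns[r], columns[c]) for c in names}
        for r in names
    }


# Hypothetical predictors; age_months is an exact multiple of age.
predictors = {
    "age": [25.0, 32.0, 41.0, 50.0, 63.0],
    "weight": [60.0, 72.0, 68.0, 80.0, 75.0],
    "age_months": [300.0, 384.0, 492.0, 600.0, 756.0],
}

corr = correlation_matrix(predictors)
for name, row in corr.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

Pairs of predictors with a correlation close to 1 or -1 are the ones to look at.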
Outliers can be detected by
Plotting semistudentized residuals against X or predicted Y
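Semistudentized residuals are the raw residuals divided by √(MSE). A minimal Python sketch for simple linear regression follows. The data are made up, with one point shifted upward on purpose, and the plotting step itself is left out.

```python
import math


def semistudentized_residuals(x, y):
    """Return e_i / sqrt(MSE) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - 2)  # SSE/(n-p) with p = 2
    return [e / math.sqrt(mse) for e in residuals]


# Made-up data on the line Y = 2X + 1, with the point at X = 5 shifted up by 10.
x = [float(v) for v in range(1, 10)]
y = [2.0 * xi + 1.0 for xi in x]
y[4] += 10.0

e_star = semistudentized_residuals(x, y)
for xi, ei in zip(x, e_star):
    print(xi, round(ei, 3))
```

The shifted point stands out with the largest absolute value; these are the values that would go on the vertical axis of the plot.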
Omission of important predictors can be detected by
Plotting residuals against omitted variables
Nonlinearity can be fixed by
Transformation
Nonconstant error variance can be fixed by
Variance stabilizing transformation
Weighted least squares
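A minimal sketch of weighted least squares for a simple linear regression. Each observation gets a weight, typically the reciprocal of its error variance, and the fit minimizes Σwi(Yi−b0−b1Xi)^2. The function name, the data, and the choice wi=1/Xi^2 are assumptions for illustration only.

```python
def weighted_least_squares(x, y, w):
    """Fit Y = b0 + b1*X by minimizing sum(w_i * (y_i - b0 - b1*x_i)^2)."""
    sw = sum(w)
    x_w = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of X
    y_w = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of Y
    sxy = sum(wi * (xi - x_w) * (yi - y_w) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_w) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = y_w - b1 * x_w
    return b0, b1


# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.8, 7.4, 8.5, 12.0]

# Assumption: the error variance grows like x^2, so use weights 1/x^2.
w = [1.0 / xi ** 2 for xi in x]

b0, b1 = weighted_least_squares(x, y, w)
print(b0, b1)
```

With equal weights the function reduces to ordinary least squares.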
Nonindependence of the error terms can be fixed by
Adding a time covariate to the model
Omission of important predictors can be fixed by
Adding them
Transformation
Interpret: 1/SBP=0.00985-0.0000416*AGE
For every 1-year increase in age, we expect the mean inverse of SBP to decrease by 0.0000416 mmHg^-1 (holding the other variables in the model constant)
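The interpretation can be checked numerically with the fitted equation as written on the card. The function name `inverse_sbp` and the ages 40 and 41 are arbitrary choices for illustration; taking the reciprocal gives a back-transformed value on the SBP scale, not the mean SBP.

```python
def inverse_sbp(age):
    """Fitted mean of 1/SBP (mmHg^-1) from 1/SBP = 0.00985 - 0.0000416*AGE."""
    return 0.00985 - 0.0000416 * age


# Change in fitted 1/SBP for a 1-year increase in age (ages are arbitrary).
change = inverse_sbp(41) - inverse_sbp(40)
print(change)

# Back-transformed values on the SBP scale (mmHg).
sbp_40 = 1.0 / inverse_sbp(40)
sbp_41 = 1.0 / inverse_sbp(41)
print(round(sbp_40, 2), round(sbp_41, 2))
```

The change equals the slope regardless of the starting age, and a decrease in 1/SBP corresponds to an increase in the back-transformed SBP.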
What are the null and alternative hypotheses for the F test with multiple predictors?
Decision Rule
F critical value
F test equation
H0: β1=β2=0
H1: Not all βs are equal to 0
Decision rule:
If F*≤F(1-α;p-1,n-p), conclude H0
If F*>F(1-α;p-1,n-p), conclude H1
F critical value=F(1-α;p-1,n-p)
F test equation: F*=MSR/MSE
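The decision rule can be sketched in Python as below. The critical value is passed in rather than computed; here F(0.95;1,3)≈10.13 is taken from an F table. The function name and the sums of squares are hypothetical.

```python
def overall_f_test(ssr, sse, n, p, f_critical):
    """Apply the decision rule for the overall F test.

    Returns the observed statistic MSR/MSE and the hypothesis concluded.
    f_critical is F(1 - alpha; p - 1, n - p), looked up by the caller.
    """
    f_observed = (ssr / (p - 1)) / (sse / (n - p))
    conclusion = "H1" if f_observed > f_critical else "H0"
    return f_observed, conclusion


# Hypothetical example: n = 5, p = 2, alpha = 0.05, F(0.95; 1, 3) ~ 10.13.
f_observed, conclusion = overall_f_test(
    ssr=3.6, sse=2.4, n=5, p=2, f_critical=10.13
)
print(f_observed, conclusion)
```

In this made-up case the observed statistic is below the critical value, so H0 is concluded.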
What is null hypothesis for the Spearman correlation?
H0: There is no monotonic association between the two variables (Spearman correlation=0)
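For reference, the Spearman correlation is the Pearson correlation computed on ranks, with tied values given the average of their ranks. A minimal Python sketch follows; the function names and data are made up.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def spearman_r(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(a), average_ranks(b))


# Made-up data: Y is a monotone but nonlinear function of X.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [xi ** 3 for xi in x]

print(spearman_r(x, y))
print(pearson_r(x, y))
```

For a perfectly monotone relationship the Spearman correlation is 1 even though the relationship is not linear.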