How to decide on the appropriate statistical test for regression?
Depends on the type of dependent variable
Simple vs Multiple regression
Applies to continuous, ordinal and nominal variables
Simple regression
- only 1 independent variable
Multiple regression
- more than 1 independent variable
Correlation vs Simple linear regression (2)
Correlation
Simple linear regression
Applications of simple linear regression (2)
2. Predict or estimate the value of Y associated with a fixed value of X
Can values of Y be extrapolated beyond the observed range of X?
No. The relationship between X and Y may not be linear outside the observed range
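A minimal sketch of why extrapolation is risky. The data are made up, and `true_y` is a hypothetical relationship that is perfectly linear inside the observed range (X = 1 to 5) and levels off afterwards:

```python
def fit_line(xs, ys):
    """Return (alpha, beta) of the least-squares line y = alpha + beta * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta


def true_y(x):
    # Hypothetical relationship: linear up to x = 5, then flat at 10.
    return min(2 * x, 10)


xs = [1, 2, 3, 4, 5]               # observed range of X
ys = [true_y(x) for x in xs]       # [2, 4, 6, 8, 10]
alpha, beta = fit_line(xs, ys)

x_new = 10                         # well outside the observed range
predicted = alpha + beta * x_new   # what the fitted line says
actual = true_y(x_new)             # what the assumed relationship gives
print(predicted, actual)
```

Inside the observed range the line fits perfectly, yet at X = 10 it predicts 20 while the assumed true value is 10.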
Simple linear regression model
Y = alpha + beta(X)
alpha = y-intercept, beta = slope
Alpha meaning
Mean value of Y when X=0
Beta meaning
The change in the mean value of Y that corresponds to a one-unit change in X
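Both meanings follow directly from the model. In the sketch below, \mu_{Y \mid x} is notation introduced here for the mean of Y at X = x:

```latex
\mu_{Y \mid x} = \alpha + \beta x
\quad\Rightarrow\quad
\mu_{Y \mid 0} = \alpha,
\qquad
\mu_{Y \mid x+1} - \mu_{Y \mid x} = \beta
```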
Does linear regression test for a linear relationship between the 2 variables?
No.
- it assumes a linear relationship
- it finds the best-fitting straight line (its y-intercept and slope)
Hence, always draw a scatter plot first to check whether the relationship looks linear
Linear relationship : linear regression
Non-linear relationship : non-linear regression
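A small illustration of why the scatter plot matters, using made-up data where Y = X^2 exactly. Linear regression still returns a line; it does not warn that the relationship is curved:

```python
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]   # perfectly determined by X, but not linear

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form least-squares estimates for y = alpha + beta * x
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
beta = sxy / sxx
alpha = y_bar - beta * x_bar

print(alpha, beta)
```

The fitted slope is 0 and the intercept is the mean of Y (2), so the "best" line is flat even though Y depends entirely on X. Only a plot of the data would reveal the curve.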
Scatter plot to determine use of linear regression
Scatter plot must suggest :
Assumptions of simple linear regression model (4)
How to determine the best-fitting straight line?
Method of Least Squares
- best-fitting line = line with the smallest residual sum of squares
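A minimal sketch of the method of least squares for one predictor, using the standard closed-form estimates (slope = Sxy / Sxx, intercept = mean of Y minus slope times mean of X). The data and the helper name `rss` are made up for illustration:

```python
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up observations

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)

beta = sxy / sxx                  # slope
alpha = y_bar - beta * x_bar      # y-intercept


def rss(a, b):
    """Residual sum of squares of the line y = a + b * x on the data."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


best = rss(alpha, beta)
# Any other line should have a larger residual sum of squares.
shifted = rss(alpha + 0.1, beta)
tilted = rss(alpha, beta + 0.1)
print(alpha, beta, best, shifted, tilted)
```

Moving the intercept or the slope away from the least-squares values increases the residual sum of squares, which is what "best-fitting" means here.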
Residual plot
Residuals (ei) plotted against Y values
The residuals should be randomly scattered above and below the line ei = 0
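A short sketch of how the residuals behind such a plot can be computed. The data are made up, and pairing each residual with its fitted value is an assumption about which Y values go on the horizontal axis:

```python
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up observations

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
beta = sxy / sxx
alpha = y_bar - beta * x_bar

fitted = [alpha + beta * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]   # e_i = observed - fitted

for f, e in zip(fitted, residuals):
    side = "above" if e > 0 else "below"
    print(f"fitted={f:.2f}  residual={e:+.2f}  ({side} e_i = 0)")

# For a least-squares line with an intercept, the residuals sum to zero.
residual_sum = sum(residuals)
```

Here the residuals alternate in sign with no visible trend, which is the pattern the card describes. A curve or a funnel shape in the plot would instead suggest that an assumption is violated.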
Test statistics for beta (slope)
Ho & H1
Two-tailed test
Ho :
- beta = 0
H1 :
- beta ≠ 0
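A hedged sketch of the slope test, assuming the usual hypotheses (Ho: beta = 0 against H1: beta ≠ 0) and the standard statistic t = b / SE(b) with n - 2 degrees of freedom. The data are made up, and the critical value is taken from a standard t table rather than computed:

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up observations

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b = sxy / sxx                      # estimated slope
a = y_bar - b * x_bar              # estimated intercept

rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
mse = rss / (n - 2)                # residual mean square
se_b = math.sqrt(mse / sxx)        # standard error of the slope
t_stat = b / se_b

# Two-tailed 5% critical value of t with n - 2 = 3 degrees of freedom,
# from a standard t table.
T_CRIT = 3.182
reject_h0 = abs(t_stat) > T_CRIT
print(round(t_stat, 2), reject_h0)
```

For these data the statistic is far beyond the critical value, so Ho would be rejected at the 5% level. Statistical packages report the corresponding p-value instead of a table lookup.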
Test statistics for alpha (constant)
Seldom done, as the intercept is rarely of interest
Two-tailed test
Ho :
- alpha = 0
H1 :
- alpha ≠ 0
Evaluating the goodness of fit of the regression model
Coefficient of determination (R^2)
In simple linear regression, R^2 = r^2, where
r = Pearson product-moment correlation coefficient
R^2
R^2 = 1
All data points lie exactly on the best-fitting line
R^2 = 0
There is no linear relationship between X and Y
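The cases above can be checked numerically. This is a minimal sketch with made-up data, computing R^2 as 1 - SS(residual) / SS(total) and r from its usual definition; the helper name `r_squared_and_r` is hypothetical:

```python
import math


def r_squared_and_r(xs, ys):
    """Return (R^2 of the least-squares line, Pearson r) for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)          # total sum of squares
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    ss_res = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / syy, sxy / math.sqrt(sxx * syy)


# All points exactly on a line (y = 3 + 2x)
r2_line, r_line = r_squared_and_r([1, 2, 3, 4, 5], [5, 7, 9, 11, 13])

# No linear relationship, although y = x^2 exactly
r2_curve, r_curve = r_squared_and_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

# Noisy, roughly linear data
r2_noisy, r_noisy = r_squared_and_r([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])

print(r2_line, r2_curve, round(r2_noisy, 4), round(r_noisy ** 2, 4))
```

The second data set shows that R^2 = 0 rules out a linear relationship only: Y is still completely determined by X there.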
R^2
Coefficient of determination
Significance level for constant value in statistical report
Not important
Significance level for ANOVA report (2)
- same significance value (p-value) as for the slope in the Coefficients report
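The two reports agree because, with a single predictor, the ANOVA F statistic equals the square of the slope's t statistic. A minimal sketch with made-up data, using the standard definitions F = MS(regression) / MS(residual) and t = b / SE(b):

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up observations

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b = sxy / sxx
a = y_bar - b * x_bar
fitted = [a + b * x for x in xs]

ss_total = sum((y - y_bar) ** 2 for y in ys)
ss_regression = sum((f - y_bar) ** 2 for f in fitted)       # explained by the line
ss_residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # left unexplained

ms_regression = ss_regression / 1      # 1 df: one independent variable
ms_residual = ss_residual / (n - 2)    # n - 2 df
f_stat = ms_regression / ms_residual

t_stat = b / math.sqrt(ms_residual / sxx)
print(round(f_stat, 1), round(t_stat ** 2, 1))
```

The sketch also shows that the regression and residual sums of squares add up to the total sum of squares.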
Sum of squares (regression) in ANOVA report
Sum of squares (residual) in ANOVA report