Assumptions of multiple linear regression (5)
alpha
- mean value of y when all independent variables = 0
beta
Multiple linear regression model dimension
How to find the best-fitting model?
Method of least squares
- the model with the smallest residual sum of squares
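A minimal sketch of the least squares idea, assuming made-up data and hypothetical helper names (`fit_least_squares`, `residual_sum_of_squares`). It solves the normal equations with plain Gauss-Jordan elimination to find the alpha and betas that give the smallest residual sum of squares; in practice a statistics package would normally do this step.

```python
def fit_least_squares(X, y):
    """Return [alpha, beta1, ..., betak] minimising the residual sum of squares."""
    # Prepend a column of 1s so the first coefficient is the intercept alpha.
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Build the normal equations (A^T A) b = A^T y as an augmented matrix.
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def residual_sum_of_squares(coefs, X, y):
    """Sum of squared differences between observed y and fitted y."""
    alpha, betas = coefs[0], coefs[1:]
    return sum((yi - (alpha + sum(b * x for b, x in zip(betas, row)))) ** 2
               for row, yi in zip(X, y))


# Made-up data generated from y = 1 + 2*x1 + 3*x2 (no noise).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
coefs = fit_least_squares(X, y)
print(coefs, residual_sum_of_squares(coefs, X, y))
```

Because the made-up data lie exactly on a plane, the fitted coefficients come out close to alpha = 1, beta1 = 2, beta2 = 3 and the residual sum of squares is close to 0.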
Can nominal variables be incorporated into a regression model?
Yes, using dummy variables
Dummy variables
For a nominal variable with k categories, how many dummy variables are needed?
k-1
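A short sketch of k-1 dummy coding, assuming a hypothetical helper `dummy_code` and made-up blood group data. One category is chosen as the reference and gets all zeros; each of the other k-1 categories gets its own 0/1 column.

```python
def dummy_code(values, reference=None):
    """Encode a nominal variable with k categories as k-1 dummy (0/1) columns."""
    categories = sorted(set(values))
    if reference is None:
        reference = categories[0]  # assumption: default to the first category
    # Every category except the reference gets its own dummy variable.
    levels = [c for c in categories if c != reference]
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Made-up example: 4 categories, so 3 dummy variables are needed.
blood_group = ["A", "B", "O", "AB", "O"]
levels, rows = dummy_code(blood_group, reference="O")
print(levels)  # dummy variable names
print(rows)    # one row of 0/1 values per observation
```

The reference category ("O" here) is the row of all zeros, so each dummy's coefficient is read as a difference from that reference group.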
How to evaluate the goodness-of-fit of a regression model?
- use adjusted R^2 when the models being compared contain different numbers of independent variables
Coefficient of determination (R^2) (3)
Adjusted R^2 (3)
Multiple linear regression
Y = alpha + beta1(x1) + … + betak(xk)
Assumptions of Simple linear regression vs Multiple linear regression
Simple linear regression
1. There is a linear relationship between the variables
Y = alpha + beta(x)
2. The observations are independent of one another
3. For any specified values of X, the distribution of the Y values is normal
4. For any set of values of X, the variance is constant (equal variance)
Multiple linear regression
1. The relationship among the variables is represented by the equation
Y = alpha + beta1(x1) + … + betak(xk)
2. The observations are independent of one another
3. For any specified values of x, the distribution of the y values is normal
4. For any set of values of x, the variance is equal
5. There is little or no multicollinearity among the independent variables (not highly correlated)
e.g. weight and BMI are highly correlated
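A rough sketch of how the weight/BMI example could be checked, assuming made-up measurements and a hypothetical helper `pearson_r`. A correlation close to 1 (or -1) between two independent variables is a warning sign of multicollinearity.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up data; BMI is computed as weight / height^2.
weight_kg = [55, 62, 70, 78, 85, 93]
height_m = [1.60, 1.72, 1.68, 1.75, 1.70, 1.80]
bmi = [w / h ** 2 for w, h in zip(weight_kg, height_m)]

r = pearson_r(weight_kg, bmi)
print(round(r, 2))
```

With these made-up numbers the correlation is well above 0.8, so putting both weight and BMI in the same model would break assumption 5.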
Why use adjusted R^2 rather than ordinary R^2 when choosing the best-fitting multiple linear regression model? (2)
- adding an independent variable never decreases R^2 (it almost always increases it), even when the variable adds little, so it is more meaningful to look at adjusted R^2, which penalises extra variables
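A small worked sketch of the point above, using the standard formula adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The R^2 values and sample size are made up for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Made-up scenario: adding a 4th variable nudges R^2 from 0.700 to 0.702 (n = 30).
before = adjusted_r2(0.700, n=30, k=3)
after = adjusted_r2(0.702, n=30, k=4)
print(round(before, 4), round(after, 4))
```

R^2 went up slightly, but adjusted R^2 goes down, which suggests the extra variable is not worth adding.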
Model selection types (2)
Independent variables included in Multiple linear regression