regression
a method to understand how an outcome Y changes as predictors X1, X2, …, Xp vary
Regression finds the best-fitting line through the cloud of points
single regression formula
Y = B0 + B1X1 + E
B is beta
Y = outcome variable (DV)
X = predictor variables (IV)
B0 = intercept (model's predicted value of Y when X = 0)
Bi = slope coefficients (change in Y per unit change in Xi): how much the outcome changes for each 1-unit increase in Xi, holding all else constant
E = error term (unexplained variance)
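A minimal sketch of fitting this formula by ordinary least squares; the data (hours studied vs. exam score) is made up for illustration:

```python
import numpy as np

# Made-up data: hours studied (X) vs. exam score (Y)
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([52, 55, 61, 64, 68], dtype=float)

# Fit Y = B0 + B1*X by ordinary least squares
A = np.column_stack([np.ones_like(X), X])      # design matrix: [1, X]
(b0, b1), *_ = np.linalg.lstsq(A, Y, rcond=None)

print(b0, b1)                                  # intercept ≈ 47.7, slope ≈ 4.1
residuals = Y - (b0 + b1 * X)                  # E: the unexplained part
```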
R^2
coefficient of determination
represents the proportion of the variance in a dependent variable that is predictable from the independent variables in a regression model.
It indicates how well the regression line fits the data: a value of 1 means all data points fall perfectly on the line, and 0 means the line explains none of the variability.
ex: R^2 = 0.215; About 21.5% of burnout (Y) variation is explained by age (X)
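A sketch of how R^2 is computed from the residuals (the small dataset is invented):

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)       # made-up predictor
Y = np.array([52, 55, 61, 64, 68], dtype=float)  # made-up outcome

# OLS slope and intercept from covariance/variance
b1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

# R^2 = 1 - (unexplained variation / total variation)
ss_res = np.sum((Y - Y_hat) ** 2)
ss_tot = np.sum((Y - Y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(r_squared)
```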
residuals
the vertical distance between a data point and the regression line (the error for that point)
squared and summed, the residuals tell us how much error the model makes → we want them centered around 0 and the sum of squares as small as possible (least squares)
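Least squares in one picture: of all candidate lines, the OLS line has the smallest sum of squared residuals. A toy comparison (data made up):

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([52, 55, 61, 64, 68], dtype=float)

def sse(b0, b1):
    """Sum of squared residuals for the candidate line Y = b0 + b1*X."""
    return np.sum((Y - (b0 + b1 * X)) ** 2)

# For this data the OLS solution is b0 = 47.7, b1 = 4.1
print(sse(47.7, 4.1))   # the minimum
print(sse(47.7, 5.0))   # steeper line: more error
print(sse(45.0, 4.1))   # shifted line: more error
```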
The steeper the slope…
the stronger the relationship between the variables
(a larger beta)
assumptions of regression
Linearity: relationship between predictors and outcome is linear
Check: scatterplots, residual plots
Violation: curved relationships, U-shaped effects
Independence: observations are independent of each other
Violation: clustered data, repeated measures
Solution: multilevel modeling
Homoscedasticity: constant variance of residuals across predictor values
Check: residual vs fitted plots
Violation: funnel-shaped patterns
Normality: residuals are normally distributed
Check: Q-Q plots, histograms
Violation: skewed distributions
No multicollinearity: predictors are not highly correlated
Check: correlation matrix, VIF values
Rule: VIF < 5 (or < 10)
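A sketch of the VIF check using only numpy: regress each predictor on the others and compute 1/(1 − R²). The data is simulated, with x2 deliberately built from x1 so it trips the rule of thumb:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def vif(X, i):
    """VIF for column i: regress it on the other columns; VIF = 1 / (1 - R^2)."""
    y = X[:, i]
    A = np.column_stack([np.ones(len(y)), np.delete(X, i, axis=1)])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

print([round(vif(X, i), 2) for i in range(3)])  # x1 and x2 high; x3 near 1
```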
key takeaways
Regression quantifies relationships between variables
Coefficients tell us the size and direction of effects
Confidence intervals show uncertainty in estimates
P-values indicate statistical significance
multiple regression
Multiple regression is an extension of simple regression — we’re simply adding more predictors to the model.
multiple regression formula
Y = B0 + B1X1 + B2X2 + … + BpXp + E
B is beta
How to interpret the coefficients
Each coefficient represents the expected change in the outcome for a one-unit change in that predictor, holding all other variables constant.
In multiple regression, each regression coefficient (β) represents the unique contribution of its predictor to the outcome, after controlling for the effects of the other predictors in the model
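A simulated sketch of "holding all else constant": generate an outcome from two predictors with known coefficients, then recover them with OLS. All names and numbers here are invented:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
age = rng.uniform(20, 60, n)            # hypothetical predictor 1
hours = rng.uniform(20, 60, n)          # hypothetical predictor 2

# True model: burnout = 2.0 + 0.05*age + 0.10*hours + noise
burnout = 2.0 + 0.05 * age + 0.10 * hours + rng.normal(scale=0.5, size=n)

A = np.column_stack([np.ones(n), age, hours])
(b0, b_age, b_hours), *_ = np.linalg.lstsq(A, burnout, rcond=None)

# b_age ≈ 0.05: expected change in burnout for one extra year of age,
# holding hours constant (and vice versa for b_hours)
print(b_age, b_hours)
```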
which predictor has the strongest effect
We can use standardized coefficients to compare the strength of the effects of the predictors
we need to know whether the coefficients were standardized (e.g., whether a scale/z-score step was applied)
without standardization, the coefficients are on different scales and we can't compare them to draw conclusions about relative effect size
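A sketch of why standardization matters: the raw coefficient on x2 below is larger, yet after z-scoring the predictors and the outcome, x1 turns out to have the stronger effect. Data and coefficients are invented:

```python
import numpy as np

def zscore(v):
    return (v - v.mean()) / v.std(ddof=1)

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(50, 10, n)     # wide scale (e.g. a 0-100 score)
x2 = rng.normal(0, 1, n)       # narrow scale
y = 3 * x1 + 5 * x2 + rng.normal(scale=5, size=n)

# Raw coefficients: 5 > 3, so x2 looks "stronger" -- but the scales differ.
# Standardize everything; the resulting betas are directly comparable.
Z = np.column_stack([zscore(x1), zscore(x2)])
beta_std, *_ = np.linalg.lstsq(Z, zscore(y), rcond=None)

print(beta_std)   # x1's standardized effect dominates
```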