linear regression
used when the relationship between two variables can be described with a straight line
correlation vs regression
terminology in regression
variable being predicted: y
variable used to predict: x
when might we use regression
what does regression assume + what does it not tell us
3 stages of linear regression
regression line
(step 2)
- line of best fit
- intercept: value of y (on line of best fit) when x is 0
- slope: how much y changes for a 1-unit increase in x
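The intercept and slope can be sketched with a least-squares fit in NumPy (the data here are made up purely for illustration):

```python
# Fit a line of best fit to small made-up data with NumPy.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.9])

# np.polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

print(intercept)  # predicted y when x = 0
print(slope)      # change in predicted y per 1-unit increase in x
```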
evaluating the model; simplest model vs best model
simplest model:
- using the average/mean value of y (the outcome) to make estimates
- assumes no relationship between x and y
best model:
- based on relationship between x and y
- regression line
sum of squares total
the sum of the squared differences between the observed values of y and the mean of y
sum of squares residual
the sum of the squared differences between the observed values of y and the values predicted by the regression line
difference between SST and SSR (the model sum of squares, SSM)
reflects the improvement in prediction when using the regression model compared with the simplest model
the larger the SSm…
… the bigger the improvement in prediction using the regression model over the simplest model
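The three sums of squares can be computed directly from their definitions (same made-up data as above; NumPy assumed available):

```python
# Compare the simplest model (mean of y) with the regression line.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.9])

slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

sst = np.sum((y - y.mean()) ** 2)  # total: observed y vs mean of y
ssr = np.sum((y - y_hat) ** 2)     # residual: observed y vs regression line
ssm = sst - ssr                    # model: improvement over the mean

print(sst, ssr, ssm)
```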
final thing in goodness-of-fit test
F-ratio
measure of how much the model has improved the prediction of y, relative to the level of inaccuracy of the model
interpreting F-ratio
e.g. the larger the F value (the further it is from 0), the greater the model's improvement in prediction relative to its inaccuracy
H0 when assessing goodness of fit
regression model and simplest model are equal (in terms of predicting y)
MSm = 0 (the model explains no variance beyond the mean)
if p < .05, reject H0: the regression model fits the data better than the simplest model
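A sketch of the F-ratio calculation, using SciPy's F distribution for the p-value (data made up; df for one predictor assumed to be 1 and n − 2):

```python
# F-ratio for the goodness-of-fit test: F = MSm / MSr.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.9])
n = len(y)

slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

ssm = np.sum((y_hat - y.mean()) ** 2)  # model sum of squares
ssr = np.sum((y - y_hat) ** 2)         # residual sum of squares

msm = ssm / 1        # model df: 1 predictor
msr = ssr / (n - 2)  # residual df: n - 2
f_ratio = msm / msr
p_value = stats.f.sf(f_ratio, 1, n - 2)  # right-tail p from the F distribution

print(f_ratio, p_value)
```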
note of SS
you never need to calculate it by hand
regression equation
y = bx + a
a = intercept
b = slope
y = predicted value of y
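Making predictions from the equation is just plugging in x (the coefficients below are made up for illustration):

```python
# Plug values into y = b*x + a (hypothetical coefficients).
b = 1.95  # slope
a = 0.23  # intercept

def predict(x):
    """Predicted value of y for a given x."""
    return b * x + a

print(predict(0))   # equals the intercept
print(predict(10))
```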
linear regression assumptions
homoscedasticity of residuals
variance of residuals about the outcome should be the same for all predicted scores
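A rough numeric sketch of that assumption, on simulated data with constant-variance noise (NumPy assumed available):

```python
# Rough homoscedasticity check: residual spread should be similar
# across the range of predicted scores (simulated data).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=100)  # constant-variance noise

slope, intercept = np.polyfit(x, y, 1)
predicted = slope * x + intercept
residuals = y - predicted

# Split residuals by lower vs upper half of predicted scores, compare spread
order = np.argsort(predicted)
lower = residuals[order[:50]]
upper = residuals[order[50:]]
print(lower.std(), upper.std())  # similar values suggest homoscedasticity
```

In practice this is usually judged from a residuals-vs-predicted scatterplot rather than a number.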
SPSS output for regression
in model summary
- don’t need this in write-up
ANOVA SPSS output for regression
F = MSm / MSr
if p < .05, the regression model is a significant improvement over the simplest model
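For a single predictor, the same p as in the ANOVA table can be reproduced with SciPy (made-up data; with one predictor, F equals the squared t statistic for the slope):

```python
# Reproduce the ANOVA-table F and p for simple regression with SciPy.
from scipy import stats

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.3, 5.9, 8.2, 9.9]

res = stats.linregress(x, y)

# F = t^2 for the slope; res.pvalue matches the p in the ANOVA table
f_ratio = (res.slope / res.stderr) ** 2
print(f_ratio, res.pvalue)
```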