linear regression
used when the relationship between two variables can be described with a straight line
correlation vs regression
terminology in regression
variable being predicted: y
variable used to predict: x
when might we use regression
what does regression assume + what does it not tell us
3 stages of linear regression
regression line
(step 2)
- line of best fit
- intercept: value of y (on line of best fit) when x is 0
- slope: how much y changes for a 1-unit increase in x
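The intercept and slope can be sketched with a least-squares fit in NumPy (the data here are made up purely for illustration):

```python
# Fit a line of best fit to small made-up data with NumPy.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.9])

# np.polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

print(intercept)  # predicted y when x = 0
print(slope)      # change in predicted y per 1-unit increase in x
```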
evaluating the model; simplest model vs best model
simplest model:
- using the average/mean value of y (the outcome) to make estimates
- assumes no relationship between x and y
best model:
- based on relationship between x and y
- regression line
sum of squares total
the sum of the squared differences between the observed values of y and the mean of y
sum of squares residual
the sum of the squared differences between the observed values of y and the values predicted by the regression line
difference between SST and SSR (the model sum of squares, SSM)
reflects the improvement in prediction when using the regression model compared with the simplest model
the larger the SSm…
… the bigger the improvement in prediction using the regression model over the simplest model
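The three sums of squares can be computed directly from their definitions (same made-up data as above; NumPy assumed available):

```python
# Compare the simplest model (mean of y) with the regression line.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.9])

slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

sst = np.sum((y - y.mean()) ** 2)  # total: observed y vs mean of y
ssr = np.sum((y - y_hat) ** 2)     # residual: observed y vs regression line
ssm = sst - ssr                    # model: improvement over the mean

print(sst, ssr, ssm)
```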
final thing in goodness-of-fit test
F-ratio
measure of how much the model has improved the prediction of y, relative to the level of inaccuracy of the model
interpreting F-ratio
e.g. the larger the F value (the further it is from 0), the greater the model's improvement in prediction relative to its inaccuracy
H0 when assessing goodness of fit
regression model and simplest model are equal (in terms of predicting y)
MSm = 0 (the model explains no variance beyond the mean)
if p < .05, reject H0: the regression model fits the data better than the simplest model
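A sketch of the F-ratio calculation, using SciPy's F distribution for the p-value (data made up; df for one predictor assumed to be 1 and n − 2):

```python
# F-ratio for the goodness-of-fit test: F = MSm / MSr.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.9])
n = len(y)

slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

ssm = np.sum((y_hat - y.mean()) ** 2)  # model sum of squares
ssr = np.sum((y - y_hat) ** 2)         # residual sum of squares

msm = ssm / 1        # model df: 1 predictor
msr = ssr / (n - 2)  # residual df: n - 2
f_ratio = msm / msr
p_value = stats.f.sf(f_ratio, 1, n - 2)  # right-tail p from the F distribution

print(f_ratio, p_value)
```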
note of SS
you never need to calculate it by hand
regression equation
y = bx + a
a = intercept
b = slope
y = predicted value of y
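Making predictions from the equation is just plugging in x (the coefficients below are made up for illustration):

```python
# Plug values into y = b*x + a (hypothetical coefficients).
b = 1.95  # slope
a = 0.23  # intercept

def predict(x):
    """Predicted value of y for a given x."""
    return b * x + a

print(predict(0))   # equals the intercept
print(predict(10))
```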
linear regression assumptions
homoscedasticity of residuals
variance of residuals about the outcome should be the same for all predicted scores
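A rough numeric sketch of that assumption, on simulated data with constant-variance noise (NumPy assumed available):

```python
# Rough homoscedasticity check: residual spread should be similar
# across the range of predicted scores (simulated data).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=100)  # constant-variance noise

slope, intercept = np.polyfit(x, y, 1)
predicted = slope * x + intercept
residuals = y - predicted

# Split residuals by lower vs upper half of predicted scores, compare spread
order = np.argsort(predicted)
lower = residuals[order[:50]]
upper = residuals[order[50:]]
print(lower.std(), upper.std())  # similar values suggest homoscedasticity
```

In practice this is usually judged from a residuals-vs-predicted scatterplot rather than a number.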
SPSS output for regression
in model summary
- don’t need this in write-up
ANOVA SPSS output for regression
F = MSm / MSr
if p < .05, the regression model is a significant improvement over the simplest model
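For a single predictor, the same p as in the ANOVA table can be reproduced with SciPy (made-up data; with one predictor, F equals the squared t statistic for the slope):

```python
# Reproduce the ANOVA-table F and p for simple regression with SciPy.
from scipy import stats

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.3, 5.9, 8.2, 9.9]

res = stats.linregress(x, y)

# F = t^2 for the slope; res.pvalue matches the p in the ANOVA table
f_ratio = (res.slope / res.stderr) ** 2
print(f_ratio, res.pvalue)
```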