linear regression Flashcards

(21 cards)

1
Q

regression

A

a method for understanding how an outcome Y changes as predictors X1, X2, …, Xp vary

Regression finds the best-fitting line through the cloud of points

2
Q

simple regression formula

A

Y = B0 + B1X1 + E

B is beta

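The formula above can be sketched numerically. A minimal least-squares fit with NumPy; the X and Y values are made up purely for illustration:

```python
import numpy as np

# Hypothetical data: predictor X and outcome Y (illustrative numbers only)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares estimates: slope B1 = cov(X, Y) / var(X), intercept from the means
B1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
B0 = Y.mean() - B1 * X.mean()

print(f"Y = {B0:.2f} + {B1:.2f} * X")
```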
3
Q

Y =

A

outcome variable (DV)

4
Q

X =

A

predictor variables (IV)

5
Q

B0 =

A

intercept (model’s predicted value of Y when X = 0)

6
Q

Bi =

A

Slope coefficients (change in Y per unit change in X)

How much the outcome changes for each 1-unit increase in Xi, holding all else constant

7
Q

E =

A

Error term (unexplained variance)

8
Q

R^2

A

coefficient of determination

represents the proportion of the variance in a dependent variable that is predictable from the independent variables in a regression model.

It indicates how well the regression line fits the data, with a value of (1) meaning all data points fall perfectly on the line and (0) meaning the line explains none of the variability.

ex: R^2 = 0.215; About 21.5% of burnout (Y) variation is explained by age (X)

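The R² computation can be sketched directly from its definition (1 minus residual variance over total variance), again using hypothetical data:

```python
import numpy as np

# Hypothetical data (illustrative numbers only)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit the line, then compute R^2 = 1 - SS_residual / SS_total
B1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
B0 = Y.mean() - B1 * X.mean()
Y_hat = B0 + B1 * X

ss_res = np.sum((Y - Y_hat) ** 2)     # unexplained variation
ss_tot = np.sum((Y - Y.mean()) ** 2)  # total variation in Y
r_squared = 1 - ss_res / ss_tot

print(f"R^2 = {r_squared:.3f}")  # proportion of Y's variance explained by X
```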
9
Q

residuals

A

the vertical distance between an observed point and the regression line – the error for that observation

Residuals are squared and summed to measure how much error the model has; we want them centered around 0 and the total squared error as small as possible

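A small sketch of the residual idea, reusing the same hypothetical data: for a least-squares line the residuals sum to (essentially) zero, and the fit minimizes their squared sum:

```python
import numpy as np

# Same hypothetical data as above
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

B1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
B0 = Y.mean() - B1 * X.mean()

# Residual = observed Y minus the value the line predicts
residuals = Y - (B0 + B1 * X)

print(residuals.sum())         # ~0 for a least-squares fit
print(np.sum(residuals ** 2))  # the minimized sum of squared errors
```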
10
Q

The steeper the slope…

A

the stronger the relationship between the variables

Large beta

11
Q

assumptions of regression

A
  1. linearity
  2. independence
  3. Homoscedasticity
  4. Normality of residuals
  5. No multicollinearity - only for multiple regression
12
Q
  1. linearity
A

Relationship between predictors and outcome is linear

Check: scatterplots, residual plots

Violation: curved relationships, U-shaped effects

13
Q
  2. independence
A

Observations are independent of each other

Violation: clustered data, repeated measures

Solution: multilevel modeling

14
Q
  3. Homoscedasticity
A

Constant variance of residuals across predictor values

Check: residual vs fitted plots

Violation: funnel-shaped patterns

15
Q
  4. Normality of residuals
A

Residuals are normally distributed

Check: Q-Q plots, histograms

Violation: skewed distributions

16
Q
  5. No multicollinearity - only for multiple regression
A

Predictors are not highly correlated

Check: correlation matrix, VIF values

Rule: VIF < 5 (or < 10)
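A sketch of computing VIF by hand from its definition (VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the other predictors), using simulated data; the variable names and numbers are hypothetical:

```python
import numpy as np

def vif(X):
    """VIF for each column of predictor matrix X."""
    n, p = X.shape
    out = []
    for j in range(p):
        # Regress column j on an intercept plus the remaining columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        pred = others @ coef
        ss_res = np.sum((X[:, j] - pred) ** 2)
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1 - ss_res / ss_tot
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)                   # independent of x1 -> VIF near 1
x3 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1 -> large VIF
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```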

17
Q

key takeaways

A

Regression quantifies relationships between variables

Coefficients tell us the size and direction of effects

Confidence intervals show uncertainty in estimates

P-values indicate statistical significance

18
Q

multiple regression

A

Multiple regression is an extension of simple regression — we’re simply adding more predictors to the model.

19
Q

multiple regression formula

A

Y = B0 + B1X1 + B2X2 + … + BpXp + E

B is beta
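A minimal sketch of fitting this formula by least squares with a design matrix; the data are simulated with known true coefficients so the recovered betas can be checked:

```python
import numpy as np

# Simulated data: true model Y = 1 + 2*X1 - 3*X2 + E (values are hypothetical)
rng = np.random.default_rng(1)
n = 100
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 - 3.0 * X2 + rng.normal(scale=0.5, size=n)

# Least squares on the design matrix [1, X1, X2] recovers B0, B1, B2
design = np.column_stack([np.ones(n), X1, X2])
betas, *_ = np.linalg.lstsq(design, Y, rcond=None)
print(betas)  # close to [1, 2, -3]
```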

20
Q

How to interpret the coefficients

A

Each coefficient represents the expected change in the outcome for a one-unit change in that predictor, holding all other variables constant.

In multiple regression, each regression coefficient (β) represents the unique contribution of its predictor to the outcome, after controlling for the effects of the other predictors in the model

21
Q

which predictor has the strongest effect

A

We can use standardized coefficients to compare the strength of the predictors’ effects

We need to know whether the predictors were standardized (e.g., converted to z-scores with a scale function)

Without standardizing, the coefficients are on different scales, so we can’t draw conclusions about relative effect size
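A sketch of the comparison: simulated data where the raw coefficients and the standardized coefficients rank the predictors differently. The variable names (age, hours) and all numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
age = rng.normal(40, 10, size=n)    # wide scale (SD = 10)
hours = rng.normal(8, 1, size=n)    # narrow scale (SD = 1)
# Raw coefficients make hours (0.5) look bigger than age (0.08) ...
burnout = 0.08 * age + 0.5 * hours + rng.normal(size=n)

def standardize(v):
    return (v - v.mean()) / v.std(ddof=1)

# ... but after z-scoring everything, age's standardized beta is larger
design = np.column_stack([np.ones(n), standardize(age), standardize(hours)])
std_betas, *_ = np.linalg.lstsq(design, standardize(burnout), rcond=None)
print(std_betas[1], std_betas[2])  # standardized betas for age, hours
```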