Linear Regression Flashcards

(62 cards)

1
Q

What are the purposes of regression analysis?

A
  • To estimate means for different levels of predictor variable X
  • To generate predictions for new cases, based on the predictor variable X
  • To test hypotheses about the association between X and Y
2
Q

What research question are we addressing with a two-sample t-test?

A
  • Whether two means (e.g. mean BMI) are statistically significantly different in the two groups (e.g. individuals with and without diabetes)
3
Q

What are the assumptions of independent samples t-tests?

A
  • Normality: If each sample is sufficiently large (over 30), then its sample mean is approximately normal by the CLT, or examine a histogram for each sample
  • Homogeneity of variance: Apply Levene’s test for equality of variances
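The homogeneity-of-variance check above can be sketched in code. This is a hedged illustration, not the course's own example: the group data are made up, and Levene's test is run via `scipy.stats.levene`.

```python
# Hedged sketch: checking the equal-variance assumption of a two-sample
# t-test with Levene's test (scipy.stats.levene).
# H0: the two groups have equal variances. The data below are made up.
import random

from scipy import stats

random.seed(2)
group_a = [random.gauss(25, 4) for _ in range(40)]  # e.g. BMI, no diabetes
group_b = [random.gauss(29, 4) for _ in range(40)]  # e.g. BMI, diabetes

stat, p = stats.levene(group_a, group_b)
print(p)  # a large p-value gives no evidence that the variances differ
```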
4
Q

What is the population regression model and what are its components?

A
  • Y = β0 + β1x + ϵ
  • β0 + β1x is the systematic (explained) variation
  • ϵ is the random (unexplained) variation
  • Y is a continuous variable for a simple or multiple linear regression model
  • X is either a continuous or a categorical variable
5
Q

What is ϵ?

A
  • A random variable
  • Often called the ‘error term’ or residual
6
Q

What is the mean and SD of ϵ?

A

Mean 0 and SD σ

7
Q

How can we write the estimated regression line and what are its components?

A
  • ŷ = b0 + b1x
  • b0 and b1 are estimated values of β0 and β1, respectively
  • (Image: the same equation written using hat notation)
8
Q

What is b0 and b1 relative to β0 and β1?

A

b0 and b1 are estimated values of β0 and β1

9
Q

A model is looking at BMI based on diabetes status. Write the regression model

A
  • Y = β0 + β1XDiab + ϵ
  • BMI = β0 + β1XDiab + ϵ
10
Q

In a model looking at BMI based on diabetes status, the estimates for β0 and β1 are 26.6 and 3.2, respectively. What is the interpretation?

A
  • b0 = 26.6, the average BMI for the group without diabetes
  • b1 = 3.2, the difference in average BMI between the groups (with diabetes minus without)
11
Q

In a model looking at BMI based on diabetes status, the estimates for β0 and β1 are 26.6 and 3.2, respectively. What is the estimated regression line?

A
  • ŷ = b0 + b1XDiab
  • ŷ = 26.6 + 3.2XDiab
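To make the arithmetic concrete, here is a minimal Python sketch applying the estimated line ŷ = 26.6 + 3.2·XDiab from the example above (the helper function name is made up):

```python
# Hypothetical helper applying the estimated regression line from the
# BMI/diabetes example: y-hat = 26.6 + 3.2 * x_diab, with x_diab 0 or 1.

def predict_bmi(x_diab):
    """Fitted mean BMI for diabetes status x_diab (0 = no, 1 = yes)."""
    b0, b1 = 26.6, 3.2  # estimated intercept and slope from the example
    return b0 + b1 * x_diab

print(predict_bmi(0))           # 26.6, the average BMI without diabetes
print(round(predict_bmi(1), 1)) # 29.8, the average BMI with diabetes
```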
12
Q

How can we measure the ‘distance’ between the observed data and the fitted model?

A

In terms of residuals

13
Q

What is the residual in terms of the equation Ŷi = b0 + b1Xi?

A
  • For each data point, the residual is the difference between the observed value Yi and the fitted value Ŷi
  • The residual is the vertical distance between a point on the scatterplot and the fitted value on the regression line
14
Q

What is the role of the regression line?

A
  • It estimates the average values for the dependent variable corresponding to each value of the independent variable
  • For a scatter diagram, the regression line serves the role that an average does for a single variable
15
Q

What is the method of least squares?

A
  • A method that measures the ‘distance’ between the model and the observed data in terms of the sum of squared residuals
16
Q

What is the ‘best’ or ‘least squares’ regression line?

A
  • The line that minimizes the sum of squared residuals for all the points in the plot
  • The regression parameter estimates b0 and b1 that yield the smallest possible sum of squared residuals
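The least squares solution for simple linear regression has a closed form: b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1·x̄. A minimal Python sketch, with made-up data chosen so the answer is exact:

```python
# Closed-form least squares estimators for simple linear regression.
# The data below lie exactly on the line y = 1 + 2x, so the fit is perfect.

def least_squares(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # slope: Sxy / Sxx
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
         / sum((xi - x_bar) ** 2 for xi in x)
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1

b0, b1 = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```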
18
Q

What is the total sum of squares (TSS)?

A
  • The total variability of the outcome about its mean
  • It has two components
19
Q

What are the two components of TSS?

A
  • MSS
  • RSS
20
Q

What is MSS?

A
  • Model sum of squares
  • The variability of the outcome about its mean that is accounted for by the model
21
Q

What is RSS?

A
  • Residual sum of squares
  • The variability of the outcome about its mean that cannot be accounted for by the model
  • Also known as Sum of Squares due to Error (SSE)
22
Q

What is the relationship between TSS, MSS, and RSS?

A

TSS = MSS + RSS

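The decomposition TSS = MSS + RSS can be verified numerically. A self-contained Python sketch on a small made-up dataset:

```python
# Numerically checking TSS = MSS + RSS on a small made-up dataset.

x = [0, 1, 2, 3]
y = [1, 2, 5, 8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# least squares fit
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)               # total variability
mss = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by model
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained

print(round(tss, 4), round(mss, 4), round(rss, 4))  # 30.0 28.8 1.2
```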
23
Q

How can RSS be minimized?

A
  • It is, by definition, minimized by the regression line
24
Q

What is the least squares (ls) solution?

A
  • The choice of b0 and b1 that minimizes the RSS
  • It gives the smallest possible RSS given the sample data
  • So, b0 and b1 are called the least squares estimators
25
Q

What level of SSE is an indicator of good and poor fit?

A
  • If SSE = 0, then the line fits the data perfectly
  • A "large" SSE is an indicator of poor fit
26
Q

What is the df (degrees of freedom) of the t-statistic, and what do the letters mean?

A
  • df = n - p - 1, where n is the number of cases and p is the number of predictors in the model
27
Q

What summary statistics can be used for regression models?

A
  • Model coefficients (slope, intercept)
  • Residual standard deviation
  • Multiple correlation coefficient R2 and adjusted R2
  • F-statistic
  • Model degrees of freedom
28
Q

What is the F-statistic used for?

A
  • It asks: overall, how well does the model explain or predict Y?
  • It’s used as an overall test of the model, assessing whether the predictors, considered as a group, are associated with the response
29
Q

What are the null and alternative hypotheses for the F-statistic?

A
  • H0: β1 = β2 = … = βp = 0
  • HA: at least one of the slope coefficients is not 0
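The F-statistic itself is F = (MSS/p) / (RSS/(n − p − 1)). A hedged Python sketch on a small made-up simple-regression dataset (p = 1):

```python
# Sketch of the overall F-test statistic on made-up data with p = 1 predictor:
# F = (MSS / p) / (RSS / (n - p - 1)).

x = [0, 1, 2, 3]
y = [1, 2, 5, 8]
n, p = len(x), 1
x_bar, y_bar = sum(x) / n, sum(y) / n

b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

mss = sum((yh - y_bar) ** 2 for yh in y_hat)
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
f_stat = (mss / p) / (rss / (n - p - 1))
print(round(f_stat, 2))  # 48.0: a large F is evidence against H0
```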
30
Q

What is the residual variance?

A
  • An estimate of σ2: a measure of the unexplained variation in the y variable, calculated as the sum of squared residuals divided by the residual degrees of freedom (n - p - 1)
  • In R output, σ hat is reported as the residual standard error (RSE)
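The deck's output is from R, but the computation RSE = √(RSS / (n − p − 1)) is easy to sketch in Python on made-up data:

```python
# Sketch of the residual standard error (sigma-hat) on made-up data:
# RSE = sqrt(RSS / (n - p - 1)).
import math

x = [0, 1, 2, 3]
y = [1, 2, 5, 8]
n, p = len(x), 1
x_bar, y_bar = sum(x) / n, sum(y) / n

b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
rse = math.sqrt(rss / (n - p - 1))
print(round(rse, 4))  # 0.7746
```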
31
Q

How is σ hat defined in R output?

A

As the residual standard error (RSE)
32
Q

What is R^2?

A
  • The squared multiple correlation coefficient (coefficient of determination). It has many interpretations:
  • The square of the correlation between observed and predicted values
  • The proportion of variation explained by the fitted model
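Both interpretations can be checked numerically; on any fitted line they agree. A Python sketch on made-up data:

```python
# Two equivalent views of R2 on made-up data: 1 - RSS/TSS, and the
# squared correlation between observed and fitted values.
import math

x = [0, 1, 2, 3]
y = [1, 2, 5, 8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
r2_prop = 1 - rss / tss  # proportion of variation explained

num = sum((yi - y_bar) * (yh - y_bar) for yi, yh in zip(y, y_hat))
den = math.sqrt(tss * sum((yh - y_bar) ** 2 for yh in y_hat))
r2_corr = (num / den) ** 2  # squared correlation of observed vs. fitted

print(round(r2_prop, 4), round(r2_corr, 4))  # both 0.96
```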
33
Q

What type of regression is R^2 used for?

A

Simple linear regression; for multiple linear regression, adjusted R2 is preferred
34
Q

What level of R^2 indicates good and poor fit?

A
  • If a linear model perfectly captured the variability in the observed data, then R2 would be 1
  • An R2 near 0 indicates that the model explains little of the variation (poor fit)
35
Q

What is adjusted R^2?

A
  • A measure that incorporates a penalty for including predictors that do not contribute much towards explaining the observed variation in the response variable
  • It is often used to balance predictive ability with model complexity
  • Unlike R2, R2adj does not have an inherent interpretation
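The standard penalized formula is R²adj = 1 − (1 − R²)(n − 1)/(n − p − 1). A small sketch (the numbers are made up) showing how the penalty grows with the number of predictors p:

```python
# Adjusted R2: a degrees-of-freedom penalty on R2.
# R2_adj = 1 - (1 - R2) * (n - 1) / (n - p - 1)

def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same raw R2 = 0.9; the penalty grows as p (predictors) increases.
print(round(adjusted_r2(0.9, n=20, p=1), 3))   # 0.894
print(round(adjusted_r2(0.9, n=20, p=10), 3))  # 0.789
```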
36
Q

In general, what level of R2 and σ hat do you want to say that the model is ‘good’ or that ‘X explains the variation in Y well’?

A
  • A high R2 and a low σ hat
37
Q

What is the L assumption?

A
  • Linearity: the relationship between the response variable and the predictor is linear
  • This can also be checked by calculating the mean of the errors, which ideally should be 0
38
Q

What is the I assumption?

A
  • Independence: observations are independent of each other, and thus the errors are independent
39
Q

What is the N assumption?

A
  • Normality: the errors appear normally distributed
  • This is the least important assumption, due to the CLT
40
Q

What is the E assumption?

A
  • Equal variances (homoscedasticity): constant variance of the errors/residuals across all levels of the predictor variables
41
Q

How do you assess the linearity assumption?

A
  • Using scatter plots, boxplots, and q-q plots
  • (Image: how to interpret linearity from scatter plots)
42
Q

How do you assess the independence assumption?

A
  • Using statistical tests
  • Inferring from the information provided
43
Q

How do you assess the normality assumption?

A
  • Using a q-q plot
  • Using histograms
  • Using statistical tests
44
Q

How can you use a q-q plot to evaluate the normality assumption?

A
  • Look at whether the plot is linear
45
Q

When using statistical tests to assess the normality assumption, what are the null and alternative hypotheses?

A
  • H0: the errors/residuals are normally distributed (no difference between the data and the normal curve)
  • HA: there is a difference, i.e. the errors are not normally distributed
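One such test is Shapiro-Wilk. A hedged Python sketch using `scipy.stats.shapiro` on simulated (truly normal, made-up) residuals:

```python
# Hedged sketch: Shapiro-Wilk normality test on residuals via
# scipy.stats.shapiro. H0: residuals are normally distributed;
# a small p-value (e.g. < 0.05) is evidence against normality.
import random

from scipy import stats

random.seed(1)
residuals = [random.gauss(0, 1) for _ in range(100)]  # simulated normal errors
stat, p = stats.shapiro(residuals)
print(p)  # for truly normal residuals, p is usually well above 0.05
```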
46
Q

How do you assess the equal variances/homoscedasticity assumption?

A
  • Using plots of residuals vs. predictor
  • Or plots of residuals vs. fitted values
47
Q

What are O and M?

A
  • nO influential outliers
  • No strong Multicollinearity
48
Q

What is an outlier?

A
  • A point with a very large residual
49
Q

Graphs showing outlier vs. leverage vs. influence

A
  • Outlier: large residual (the point is far from the line in the Y direction)
  • Leverage: outside the typical range of X values
  • Influence: influential on the model estimates
50
Q

What is the effect of high leverage points?

A
  • We want good coverage for a range of x values to avoid extrapolating too much
  • High leverage points have the potential to exert influence on the estimated coefficients
51
Q

What is influence?

A
  • The actual influence a point has on the estimated coefficients
  • A point may have high leverage but low influence (or vice versa)
  • You can determine this by removing a point from the data and looking at how the model changes (DFBETA)
52
Q

What is the Box-Cox transformation and what is its purpose?

A
  • A popular way to automatically find a good transformation of the outcome variable y
  • Objective: to find the best exponent λ for transforming y into y^λ
  • It is designed for a strictly positive outcome variable y and chooses the transformation that gives the best fit to the data
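A hedged sketch of Box-Cox in Python, using `scipy.stats.boxcox`, which estimates λ by maximum likelihood for a strictly positive y. The skewed outcome data are made up for illustration:

```python
# Hedged sketch of a Box-Cox transformation via scipy.stats.boxcox.
import math
import random

from scipy import stats

random.seed(0)
# Right-skewed, strictly positive outcome (log-normal), made up for illustration.
y = [math.exp(random.gauss(0, 1)) for _ in range(200)]
y_transformed, lam = stats.boxcox(y)
print(lam)  # an estimated lambda near 0 suggests a log transformation of y
```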
53
Q

Example of Box-Cox transformation
54
Q

What are reasons to do a log transformation?

A
  • Transformation of the predictors to achieve linearity (L)
  • Transformation of the outcome to normalize residuals (N)
  • Transformation of the outcome to stabilize variance (E)
55
Q

Log transformation of predictor
56
Q

Log transformation of the outcome
57
Q

Log transformation of both the predictor and outcome
58
Q

When transforming the predictor with polynomials, how do you choose the polynomial degree (d)?

A
  • Forward: add one higher-order term at a time until the added term is not statistically significant
  • Backward: start with a large d and eliminate the non-significant terms, beginning with the highest-order term
59
Q

What are the two major pitfalls in MLR?

A
  • Collinearity
  • Overfitting
60
Q

What is collinearity, and how does it arise during MLR?

A
  • Collinearity arises when two or more predictors that measure similar things (e.g. BMI and weight) are included in an MLR
  • Essentially, one predictor can be (nearly) a linear combination of the other predictors
  • The estimated regression coefficients become unstable and change dramatically
  • The standard errors of the regression coefficients 'blow up'
61
Q

How can you detect collinearity?

A
  • Assess the correlation matrix of the predictors
  • Regress each predictor (e.g. X1) on all the other predictors and check the R2; a value close to 1 is a problem, because it means that predictor is nearly a linear combination of the others
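The check above can be sketched for the two-predictor case, where the R² of one predictor regressed on the other is just their squared correlation. The variables and helper function below are made up; `x2` is deliberately constructed to be almost a linear function of `x1`:

```python
# Sketch: detecting collinearity by checking the R2 of one predictor
# regressed on another. x2 below is nearly a linear function of x1.
import random

random.seed(42)
x1 = [random.uniform(0, 10) for _ in range(50)]
x2 = [2 * v + random.gauss(0, 0.01) for v in x1]  # nearly collinear with x1

def r_squared(x, y):
    """Squared correlation = R2 of a simple regression of y on x."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxy = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    sxx = sum((a - xb) ** 2 for a in x)
    syy = sum((b - yb) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

print(r_squared(x1, x2) > 0.99)  # True: close to 1, so collinearity is a problem
```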
62
Q

What is overfitting, and what are its effects?

A
  • In multivariable modeling, you can get highly significant but meaningless results if you keep adding predictors
  • The model fits the quirks of your particular sample perfectly, but has no predictive ability in a new sample
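This effect can be demonstrated with a simulation. The sketch below (made-up data, NumPy least squares) fits pure-noise predictors to a random outcome: in-sample R² can only rise as predictors are added, and with n − 1 junk predictors plus an intercept the fit is "perfect" despite having no predictive value:

```python
# Sketch of overfitting: in-sample R2 climbs as pure-noise predictors
# are added, reaching ~1 when the number of parameters equals n.
import numpy as np

rng = np.random.default_rng(0)
n = 12
y = rng.normal(size=n)  # outcome is pure noise

def r2_with_p_noise_predictors(p):
    """R2 of an OLS fit of y on an intercept plus p random-noise predictors."""
    X = np.column_stack([np.ones(n)] + [rng.normal(size=n) for _ in range(p)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

print([round(r2_with_p_noise_predictors(p), 2) for p in (1, 5, 11)])
# R2 grows toward 1; with p = n - 1 noise predictors the fit is 'perfect'
```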