Logistic Regression Flashcards

(45 cards)

1
Q

This is an example of a logistic regression model with one predictor (personality type). How do you interpret β1?

A
  • The difference in log-odds between those with type-B personalities and those with type-A personalities
  • This means that the odds ratio is exp(β1)
2
Q

In this example, the predictor is personality type and the outcome is CHD. Based on the estimated coefficients, how do you interpret the intercept and slope?

A
  • The odds of CHD among type-A personalities (baseline) are 0.126
  • Type-B personalities have 0.42 times the odds of having CHD compared to those with type-A. We can also convert this to a percentage: (0.42 − 1) × 100 = −58%, so the odds of CHD are 58% lower in type-B compared to type-A individuals
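The arithmetic above can be sketched in Python. The numbers below are the card's reported estimates (baseline odds 0.126, odds ratio 0.42) plugged in by hand, not a fitted model:

```python
import math

# Hypothetical values matching this card: baseline odds 0.126, odds ratio 0.42
beta0 = math.log(0.126)  # intercept: log-odds of CHD for type A (baseline)
beta1 = math.log(0.42)   # slope: difference in log-odds, type B vs type A

baseline_odds = math.exp(beta0)      # odds of CHD for type A
odds_ratio = math.exp(beta1)         # odds for type B relative to type A
pct_change = (odds_ratio - 1) * 100  # percent change in odds

print(round(baseline_odds, 3))  # 0.126
print(round(odds_ratio, 2))     # 0.42
print(round(pct_change))        # -58, i.e. 58% lower odds
```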
3
Q

How does the sign of β affect the change in odds?

A

In general
- When β>0, then exp(β) > 1, so increase in odds
- When β<0, then exp(β) <1, so decrease in odds
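A one-line check of this rule, using arbitrary β values:

```python
import math

# exp(β) > 1 exactly when β > 0 (odds increase); exp(β) < 1 when β < 0 (odds decrease)
for beta in (0.7, -0.7, 0.0):
    print(beta, math.exp(beta))
```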

4
Q

This is an example of a model with multiple predictors. How do you interpret the estimated coefficients?

A
  • dibpat Type B: Type B individuals have 0.435 times the odds of CHD compared to Type A, so Type B individuals have 56.5% lower odds of CHD than Type A
  • smokeYes: those reported smoking have 1.80 times the odds of CHD compared to non-smokers, so those reported smoking have 80% higher odds of CHD
  • Intercept: the odds of CHD for type A smokers are 0.0906, also referred to as baseline odds
5
Q

What are the different types of categorical variables?

A
  • Nominal, not ordered (e.g. career choice)
  • Ordered (e.g. patterns of cigarette use: never used, used only occasionally, used regularly, etc.)
6
Q

What is deviance?

A
  • Plays a role analogous to the RSS (residual sum of squares) in linear regression: it measures lack of fit
  • Allows you to test nested models (via the difference in deviance)
7
Q

For a nested model, what are the null and alternative hypotheses?

A
  • Null: β(k+1) = β(k+2) = … = βp = 0
  • The additional estimated coefficients are 0 (predictors are not informative)
  • Alternative: at least one of the betas is not equal to zero
8
Q

What are the assumptions of logistic regression?

A
  • Linearity: there is a linear relationship between the link function and the systematic component
  • Independence: errors are independently distributed
  • Distribution: Y follows a distribution in the exponential family (normal, Poisson, exponential, binomial, Bernoulli)
9
Q

What are the four main diagnostic plots used in logistic regression?

A
10
Q

How can you assess the linearity assumption and what is the problem with this in logistic regression?

A
  • We can use a residuals plot, but since the outcome variables are 0 and 1, we will see a pattern in the residuals
  • Instead, we can plot predicted logit values against a predictor. We are looking to see evidence of nonlinearity. We are not checking for equal variance or centering around 0.
11
Q

What is the major difference between linear and logistic regression?

A
  • In logistic, the predictions are probabilities where all Y values are binary (0/1)
  • We need a threshold probability to predict 0/1. Typically we use 0.5: a predicted probability > 0.5 is classified as 1; otherwise as 0.
12
Q

When evaluating accuracy (true and false positives and negatives), what are Y and Y hat?

A

Y: observed event, Y hat: predicted event

13
Q

What are Y and Y hat for a true positive?

A

Y = 1 and Y hat = 1

14
Q

What are Y and Y hat for a true negative?

A

Y = 0 and Y hat = 0

15
Q

What are Y and Y hat for a false positive?

A

Y = 0 and Y hat = 1

16
Q

What are Y and Y hat for a false negative?

A

Y = 1 and Y hat = 0

17
Q

Confusion matrix for accuracy

A
18
Q

What is accuracy and how do you calculate it?

A
  • Measure of how well the model detects positive and negative tests
  • (TP+TN) / (TP+TN+FP+FN)
19
Q

What is precision and how do you calculate it?

A
  • How well does the model detect true positives and recommend further testing?
  • TP / (TP+FP)
20
Q

What is specificity, and how do you calculate it?

A
  • How well does the model detect true negatives?
  • TN / (TN + FP)
21
Q

What is sensitivity/recall, and how do you calculate it?

A
  • How well does the model detect true positive cases? (e.g. people who have TB are flagged and referred for testing)
  • TP / (TP+FN)
22
Q

What is the difference between precision and sensitivity?

A
  • Sensitivity (recall) measures the proportion of actual positives correctly identified, focusing on minimizing false negatives
  • Precision measures the accuracy of positive predictions, focusing on minimizing false positives
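The four metrics from cards 18–21 can be computed from confusion-matrix counts; the counts below are hypothetical, not from the cards:

```python
# Hypothetical confusion-matrix counts (toy values, not from the cards)
TP, TN, FP, FN = 40, 30, 10, 20

accuracy = (TP + TN) / (TP + TN + FP + FN)  # overall correctness
precision = TP / (TP + FP)                  # correctness of positive predictions
specificity = TN / (TN + FP)                # proportion of true negatives detected
sensitivity = TP / (TP + FN)                # proportion of true positives detected (recall)

print(accuracy, precision, specificity)  # 0.7 0.8 0.75
print(round(sensitivity, 3))             # 0.667
```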
23
Q

What is calibration and how can you measure it?

A
  • The predicted and observed distributions look similar
  • Idea: If we look at all observations with a predicted probability between 0.2 and 0.3, we would want the proportion with y = 1 to be around this range too.
  • Can measure using the Brier score or a Goodness-of-Fit test
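A minimal sketch of the Brier score on toy predictions (hypothetical data; lower is better, 0 is perfect):

```python
# Brier score: mean squared difference between predicted probability and outcome
# (toy predictions and outcomes, not from the cards)
p = [0.9, 0.8, 0.3, 0.2, 0.6]  # predicted probabilities
y = [1, 1, 0, 0, 1]            # observed outcomes

brier = sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)
print(round(brier, 3))  # 0.068
```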
24
Q

What is discrimination?

A
  • Those with observed y = 1 have high predicted probabilities, those with observed y = 0 have low predicted probabilities
  • Idea: we want our model to be able to distinguish the two classes using the variables in the model
25
Q

Goodness-of-fit test plot (measure of calibration)

A

Want to see all points fall on the line (overlap of estimated and observed)
26
Q

What is the null hypothesis for the Hosmer-Lemeshow goodness-of-fit test? (measure of calibration)

A
  • H0: the model fits the data, i.e. the observed and predicted proportions agree; under H0 the test statistic follows a chi-squared distribution
  • Based on the p-value here, we reject H0: the observed and predicted proportions differ, so the model is not a good fit
27
Q

How do you measure discrimination?

A
  • ROC curve
  • AUC
28
Q

What is an ROC curve?

A
  • For each possible threshold probability, you can find the sensitivity and specificity; the ROC curve plots sensitivity against 1 − specificity across all thresholds
29
Q

What is the AUC, and what is considered a good/bad AUC?

A
  • Area under the ROC curve
  • Gives a numerical value to discrimination
  • We want this value to be closer to 1
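AUC can be sketched without any libraries via its rank interpretation: the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case (ties count half). The data below are toy values:

```python
# AUC as the probability that a random positive outranks a random negative
def auc(probs, labels):
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

probs = [0.9, 0.7, 0.6, 0.4, 0.2]  # toy predicted probabilities
labels = [1, 1, 0, 1, 0]           # toy observed outcomes
print(auc(probs, labels))  # ≈ 0.833
```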
30
Q

What are the three methods for evaluating models?

A
  • Nested models
  • AIC
  • BIC
31
Q

What are the null and alternative hypotheses for a nested model, and how is the test assessed?

A
  • Null: the additional coefficients are 0; alternative: at least one is nonzero
  • Assessed by: for a linear model, an F-statistic (ANOVA); for a logistic model, a chi-squared test
32
Q

Example of a nested model, with the null and alternative hypotheses tested

A
33
Q

What is AIC?

A
  • Like adjusted R2, it penalizes for more parameters (to guard against overfitting)
  • AIC can be interpreted as estimating prediction error by considering how much information is lost by using our fitted model rather than knowing the true process
  • We want the model with the lowest AIC
34
Q

We want the model with the ___ AIC

A

Lowest
35
Q

What is BIC?

A
  • Similar to AIC but comes from a Bayesian perspective
  • Has a harsher penalty on the number of predictors
  • We want the model with the lowest BIC
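Both criteria are simple functions of the maximized log-likelihood: AIC = 2k − 2·logL and BIC = k·log(n) − 2·logL. A sketch with made-up numbers:

```python
import math

# AIC and BIC from a model's maximized log-likelihood
def aic(log_lik, k):
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    return k * math.log(n) - 2 * log_lik

# Hypothetical log-likelihood, parameter count, and sample size
ll, k, n = -120.0, 3, 200
print(aic(ll, k))             # 246.0
print(round(bic(ll, k, n), 1))  # 255.9 -- note BIC's harsher per-parameter penalty
```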
36
Q

How does backward stepwise selection work?

A
  • Uses AIC (or another specified measure) to repeatedly remove predictors from the model until removal no longer improves the AIC value
  • The starting model includes all identified predictors
  • Then, in each iteration:
    - For each predictor currently in the model, remove that variable and find the AIC
    - If the best AIC found is an improvement over the current model, remove the corresponding variable and update the current model
    - Otherwise, stop the algorithm
37
Q

How does forward stepwise selection work?

A
  • Uses AIC (or another specified measure) to repeatedly add predictors to the model until addition no longer improves the AIC value
  • Starting model: the null model, or a model with a subset of necessary predictors
  • Then, in each iteration:
    - For each predictor not currently in the model, add that variable and find the AIC
    - If the best AIC found is an improvement over the current model, add the corresponding variable and update the current model
    - Otherwise, stop the algorithm
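The loop described in cards 36–37 can be sketched as forward selection with a pluggable score function. Here `toy_aic` is a made-up stand-in for fitting a model and reading off its AIC, not a real fit:

```python
# Forward stepwise selection sketch with a pluggable score function
def forward_stepwise(candidates, score):
    selected = []
    best = score(selected)
    while True:
        trials = [(score(selected + [v]), v) for v in candidates if v not in selected]
        if not trials:
            break
        new_best, var = min(trials)
        if new_best < best:   # lower AIC is better
            best = new_best
            selected.append(var)
        else:
            break             # no addition improves the AIC: stop
    return selected

# Made-up score: pretend "x1" and "x3" genuinely reduce AIC, others only add a penalty
def toy_aic(vars_):
    base = 100 - 20 * ("x1" in vars_) - 10 * ("x3" in vars_)
    return base + 2 * len(vars_)  # per-parameter penalty, as in AIC

print(forward_stepwise(["x1", "x2", "x3"], toy_aic))  # ['x1', 'x3']
```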
38
Q

Will forward and backward selection always choose the same model?

A

No
39
Q

What is the advantage of forward selection, and when might we prefer to use it?

A
  • Can isolate the effect of each variable, since it adds one at a time
  • Preferred in a hypothesis-driven model with specific predictors
40
Q

What is the advantage of backward selection, and when might we prefer to use it?

A

More helpful for many variables → helps with interpretation
41
Q

How does stepwise selection work for interactions?

A
  • Can use stepwise selection to determine which interactions to include
  • Starting model: model with no interactions
  • Possible terms to add: all possible interaction terms
  • This is an example of forward stepwise selection
42
Q

When might we split the training and test data?

A
  • When we do not have access to an external validation set
  • Common split sizes are 70% or 75% training
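A minimal 75% training split sketch, with indices standing in for observations:

```python
import random

# A 75/25 train-test split: shuffle, then slice
random.seed(0)              # for reproducibility
data = list(range(100))     # stand-in for 100 observations
random.shuffle(data)

cut = int(0.75 * len(data))
train, test = data[:cut], data[cut:]
print(len(train), len(test))  # 75 25
```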
43
Q

Example of a train-test split with a polynomial

A
  • Idea: we want to compare using a polynomial transformation on bascre to a log-log transformation
  • Use a train-test split: we will use the training data to fit the polynomial, then the test data to compare the two models
44
Q

We want the model with the ___ R2

A

Highest
45
Q

We want the ___ RMSE (root mean square error) and MAE (mean absolute error)

A

Lowest
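Both error measures are a few lines of Python on toy observed/predicted values:

```python
import math

# RMSE and MAE on toy observed (y) and predicted (y_hat) values
y = [3.0, 5.0, 2.0, 7.0]
y_hat = [2.5, 5.0, 4.0, 8.0]

errors = [yi - yh for yi, yh in zip(y, y_hat)]
rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))  # penalizes large errors more
mae = sum(abs(e) for e in errors) / len(errors)              # average absolute error

print(round(rmse, 3))  # 1.146
print(round(mae, 3))   # 0.875
```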