Logistic Regression Flashcards

(45 cards)

1
Q

This is an example of a logistic regression model with one predictor (personality type). How do you interpret β1?

A
  • The difference in log-odds between those with type-B personalities and those with type-A personalities
  • This means that the odds ratio is exp(β1)
2
Q

In this example, the predictor is personality type and the outcome is CHD. Based on the estimated coefficients, how do you interpret the intercept and slope?

A
  • The odds of CHD among type-A personalities (baseline) are 0.126
  • Type-B personalities have 0.42 times the odds of having CHD compared to those with type-A. We can also convert this to a percentage: (0.42 − 1) × 100 = −58%, so the odds of CHD are 58% lower in type-B compared to type-A individuals
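The arithmetic above can be sketched in Python. The numbers below are the card's reported estimates (baseline odds 0.126, odds ratio 0.42) plugged in by hand, not a fitted model:

```python
import math

# Hypothetical values matching this card: baseline odds 0.126, odds ratio 0.42
beta0 = math.log(0.126)  # intercept: log-odds of CHD for type A (baseline)
beta1 = math.log(0.42)   # slope: difference in log-odds, type B vs type A

baseline_odds = math.exp(beta0)      # odds of CHD for type A
odds_ratio = math.exp(beta1)         # odds for type B relative to type A
pct_change = (odds_ratio - 1) * 100  # percent change in odds

print(round(baseline_odds, 3))  # 0.126
print(round(odds_ratio, 2))     # 0.42
print(round(pct_change))        # -58, i.e. 58% lower odds
```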
3
Q

How does the sign of β affect the change in odds?

A

In general
- When β>0, then exp(β) > 1, so increase in odds
- When β<0, then exp(β) <1, so decrease in odds
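A one-line check of this rule, using arbitrary β values:

```python
import math

# exp(β) > 1 exactly when β > 0 (odds increase); exp(β) < 1 when β < 0 (odds decrease)
for beta in (0.7, -0.7, 0.0):
    print(beta, math.exp(beta))
```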

4
Q

This is an example of a model with multiple predictors. How do you interpret the estimated coefficients?

A
  • dibpat Type B: Type B individuals have 0.435 times the odds of CHD compared to Type A, so Type B individuals have 56.5% lower odds of CHD than Type A
  • smokeYes: those reported smoking have 1.80 times the odds of CHD compared to non-smokers, so those reported smoking have 80% higher odds of CHD
  • Intercept: the odds of CHD for type A smokers are 0.0906, also referred to as baseline odds
5
Q

What are the different types of categorical variables?

A
  • Nominal, not ordered (e.g. career choice)
  • Ordered (e.g. patterns of cigarette use: never used, used only occasionally, used regularly, etc.)
6
Q

What is deviance?

A
  • Plays a role analogous to the RSS (residual sum of squares) in linear regression: it measures lack of fit
  • Allows you to test nested models (via the difference in deviance)
7
Q

For a nested model, what are the null and alternative hypotheses?

A
  • Null: β(k+1) = β(k+2) = … = βp = 0
  • The additional estimated coefficients are 0 (predictors are not informative)
  • Alternative: at least one of the betas is not equal to zero
8
Q

What are the assumptions of logistic regression?

A
  • Linearity: there is a linear relationship between the link function and the systematic component
  • Independence: errors are independently distributed
  • Distribution: Y follows a distribution in the exponential family (normal, Poisson, exponential, binomial, Bernoulli)
9
Q

What are the four main diagnostic plots used in logistic regression?

A
10
Q

How can you assess the linearity assumption and what is the problem with this in logistic regression?

A
  • We can use a residuals plot, but since the outcome variables are 0 and 1, we will see a pattern in the residuals
  • Instead, we can plot predicted logit values against a predictor. We are looking to see evidence of nonlinearity. We are not checking for equal variance or centering around 0.
11
Q

What is the major difference between linear and logistic regression?

A
  • In logistic, the predictions are probabilities where all Y values are binary (0/1)
  • We need a threshold probability to predict 0/1. Typically we use 0.5: a predicted probability > 0.5 is classified as 1; otherwise as 0.
12
Q

When evaluating accuracy (true and false positives and negatives), what are Y and Y hat?

A

Y: observed event, Y hat: predicted event

13
Q

What are Y and Y hat for a true positive?

A

Y = 1 and Y hat = 1

14
Q

What are Y and Y hat for a true negative?

A

Y = 0 and Y hat = 0

15
Q

What are Y and Y hat for a false positive?

A

Y = 0 and Y hat = 1

16
Q

What are Y and Y hat for a false negative?

A

Y = 1 and Y hat = 0

17
Q

Confusion matrix for accuracy

A
18
Q

What is accuracy and how do you calculate it?

A
  • Measure of how well the model detects positive and negative tests
  • (TP+TN) / (TP+TN+FP+FN)
19
Q

What is precision and how do you calculate it?

A
  • How well does the model detect true positives and recommend further testing?
  • TP / (TP+FP)
20
Q

What is specificity, and how do you calculate it?

A
  • How well does the model detect true negatives?
  • TN / (TN + FP)
21
Q

What is sensitivity/recall, and how do you calculate it?

A
  • How well does the model detect true positive cases? (e.g. people who have TB are flagged and referred for testing)
  • TP / (TP+FN)
22
Q

What is the difference between precision and sensitivity?

A
  • Sensitivity (recall) measures the proportion of actual positives correctly identified, focusing on minimizing false negatives
  • Precision measures the accuracy of positive predictions, focusing on minimizing false positives
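The four metrics from cards 18–21 can be computed from confusion-matrix counts; the counts below are hypothetical, not from the cards:

```python
# Hypothetical confusion-matrix counts (toy values, not from the cards)
TP, TN, FP, FN = 40, 30, 10, 20

accuracy = (TP + TN) / (TP + TN + FP + FN)  # overall correctness
precision = TP / (TP + FP)                  # correctness of positive predictions
specificity = TN / (TN + FP)                # proportion of true negatives detected
sensitivity = TP / (TP + FN)                # proportion of true positives detected (recall)

print(accuracy, precision, specificity)  # 0.7 0.8 0.75
print(round(sensitivity, 3))             # 0.667
```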
23
Q

What is calibration and how can you measure it?

A
  • The predicted and observed distributions look similar
  • Idea: If we look at all observations with a predicted probability between 0.2 and 0.3, we would want the proportion with y = 1 to be around this range too.
  • Can measure using the Brier score or a Goodness-of-Fit test
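A minimal sketch of the Brier score on toy predictions (hypothetical data; lower is better, 0 is perfect):

```python
# Brier score: mean squared difference between predicted probability and outcome
# (toy predictions and outcomes, not from the cards)
p = [0.9, 0.8, 0.3, 0.2, 0.6]  # predicted probabilities
y = [1, 1, 0, 0, 1]            # observed outcomes

brier = sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)
print(round(brier, 3))  # 0.068
```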
24
Q

What is discrimination?

A
  • Those with observed y = 1 have high predicted probabilities, those with observed y = 0 have low predicted probabilities
  • Idea: we want our model to be able to distinguish the two classes using the variables in the model
25
Q

Goodness-of-fit test plot (measure of calibration)

A

Want to see all points fall on the line (overlap of estimated and observed)
26
Q

What is the null hypothesis for the Hosmer-Lemeshow goodness-of-fit test? (measure of calibration)

A
  • H0: the model fits the data, i.e. the observed and predicted proportions agree; under H0 the test statistic follows a chi-squared distribution
  • Based on the p-value here, we reject H0: the observed and predicted proportions differ, so the model is not a good fit
27
Q

How do you measure discrimination?

A
  • ROC curve
  • AUC
28
Q

What is an ROC curve?

A
  • For each possible threshold probability, you can find the sensitivity and specificity; the ROC curve plots sensitivity against 1 − specificity across all thresholds
29
Q

What is the AUC, and what is considered a good/bad AUC?

A
  • Area under the ROC curve
  • Gives a numerical value to discrimination
  • We want this value to be closer to 1
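AUC can be sketched without any libraries via its rank interpretation: the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case (ties count half). The data below are toy values:

```python
# AUC as the probability that a random positive outranks a random negative
def auc(probs, labels):
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

probs = [0.9, 0.7, 0.6, 0.4, 0.2]  # toy predicted probabilities
labels = [1, 1, 0, 1, 0]           # toy observed outcomes
print(auc(probs, labels))  # ≈ 0.833
```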
30
Q

What are the three methods for evaluating models?

A
  • Nested models
  • AIC
  • BIC
31
Q

What are the null and alternative hypotheses for a nested model, and how is the test assessed?

A
  • Null: the additional coefficients are 0; alternative: at least one is nonzero
  • Assessed by: for a linear model, an F-statistic (ANOVA); for a logistic model, a chi-squared test
32
Q

Example of a nested model, with the null and alternative hypotheses tested

A
33
Q

What is AIC?

A
  • Like adjusted R2, it penalizes for more parameters (to guard against overfitting)
  • AIC can be interpreted as estimating prediction error by considering how much information is lost by using our fitted model rather than knowing the true process
  • We want the model with the lowest AIC
34
Q

We want the model with the ___ AIC

A

Lowest
35
Q

What is BIC?

A
  • Similar to AIC but comes from a Bayesian perspective
  • Has a harsher penalty on the number of predictors
  • We want the model with the lowest BIC
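Both criteria are simple functions of the maximized log-likelihood: AIC = 2k − 2·logL and BIC = k·log(n) − 2·logL. A sketch with made-up numbers:

```python
import math

# AIC and BIC from a model's maximized log-likelihood
def aic(log_lik, k):
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    return k * math.log(n) - 2 * log_lik

# Hypothetical log-likelihood, parameter count, and sample size
ll, k, n = -120.0, 3, 200
print(aic(ll, k))             # 246.0
print(round(bic(ll, k, n), 1))  # 255.9 -- note BIC's harsher per-parameter penalty
```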
36
Q

How does backward stepwise selection work?

A
  • Uses AIC (or another specified measure) to repeatedly remove predictors from the model until removal no longer improves the AIC value
  • The starting model includes all identified predictors
  • Then, in each iteration:
    - For each predictor currently in the model, remove that variable and find the AIC
    - If the best AIC found is an improvement over the current model, remove the corresponding variable and update the current model
    - Otherwise, stop the algorithm
37
Q

How does forward stepwise selection work?

A
  • Uses AIC (or another specified measure) to repeatedly add predictors to the model until addition no longer improves the AIC value
  • Starting model: the null model, or a model with a subset of necessary predictors
  • Then, in each iteration:
    - For each predictor not currently in the model, add that variable and find the AIC
    - If the best AIC found is an improvement over the current model, add the corresponding variable and update the current model
    - Otherwise, stop the algorithm
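The loop described in cards 36–37 can be sketched as forward selection with a pluggable score function. Here `toy_aic` is a made-up stand-in for fitting a model and reading off its AIC, not a real fit:

```python
# Forward stepwise selection sketch with a pluggable score function
def forward_stepwise(candidates, score):
    selected = []
    best = score(selected)
    while True:
        trials = [(score(selected + [v]), v) for v in candidates if v not in selected]
        if not trials:
            break
        new_best, var = min(trials)
        if new_best < best:   # lower AIC is better
            best = new_best
            selected.append(var)
        else:
            break             # no addition improves the AIC: stop
    return selected

# Made-up score: pretend "x1" and "x3" genuinely reduce AIC, others only add a penalty
def toy_aic(vars_):
    base = 100 - 20 * ("x1" in vars_) - 10 * ("x3" in vars_)
    return base + 2 * len(vars_)  # per-parameter penalty, as in AIC

print(forward_stepwise(["x1", "x2", "x3"], toy_aic))  # ['x1', 'x3']
```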
38
Q

Will forward and backward selection always choose the same model?

A

No
39
Q

What is the advantage of forward selection, and when might we prefer to use it?

A
  • Can isolate the effect of each variable, since it adds one at a time
  • Preferred in a hypothesis-driven model with specific predictors
40
Q

What is the advantage of backward selection, and when might we prefer to use it?

A

More helpful for many variables → helps with interpretation
41
Q

How does stepwise selection work for interactions?

A
  • Can use stepwise selection to determine which interactions to include
  • Starting model: model with no interactions
  • Possible terms to add: all possible interaction terms
  • This is an example of forward stepwise selection
42
Q

When might we split the training and test data?

A
  • When we do not have access to an external validation set
  • Common split sizes are 70% or 75% training
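A minimal 75% training split sketch, with indices standing in for observations:

```python
import random

# A 75/25 train-test split: shuffle, then slice
random.seed(0)              # for reproducibility
data = list(range(100))     # stand-in for 100 observations
random.shuffle(data)

cut = int(0.75 * len(data))
train, test = data[:cut], data[cut:]
print(len(train), len(test))  # 75 25
```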
43
Q

Example of a train-test split with a polynomial

A
  • Idea: we want to compare using a polynomial transformation on bascre to a log-log transformation
  • Use a train-test split: we will use the training data to fit the polynomial, then the test data to compare the two models
44
Q

We want the model with the ___ R2

A

Highest
45
Q

We want the ___ RMSE (root mean square error) and MAE (mean absolute error)

A

Lowest
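Both error measures are a few lines of Python on toy observed/predicted values:

```python
import math

# RMSE and MAE on toy observed (y) and predicted (y_hat) values
y = [3.0, 5.0, 2.0, 7.0]
y_hat = [2.5, 5.0, 4.0, 8.0]

errors = [yi - yh for yi, yh in zip(y, y_hat)]
rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))  # penalizes large errors more
mae = sum(abs(e) for e in errors) / len(errors)              # average absolute error

print(round(rmse, 3))  # 1.146
print(round(mae, 3))   # 0.875
```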