5b. Logistic Regression Flashcards

(24 cards)

1
Q

What is a logistic regression used for?

A

To predict the probability that an observation belongs to a certain class or that an event occurs (yes/no), based on one or several independent variables.

2
Q

What measurement level can the independent variables be (in a logistic regression)?

A

The SAME as the IVs in a LINEAR regression:

  • continuous (e.g. age)
  • categorical (e.g. gender)

3
Q

What measurement level can the dependent variable be (in a logistic regression)?

A

Categorical (nominal)

Binary (0, 1) or dummy-coded (e.g. Yes/No)

4
Q

What shape does the logistic curve have?

A

S curve

5
Q

What does the dependent variable represent? (S-curve)

A

Represents probabilities, with values ranging from 0 to 1

6
Q

At what level of probability is the turning point?

A

p = 0.5
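
The S shape and its turning point can be checked numerically. The following is a minimal sketch that assumes the standard logistic function p = 1 / (1 + e^(-z)); the name `logistic` and the sample z values are chosen here for illustration only.

```python
import math

def logistic(z):
    """Map a linear predictor z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Evaluate the curve at a few points: flat near 0, steep in the middle,
# flat again near 1, which gives the S shape.
for z in (-4, -2, 0, 2, 4):
    print(f"z = {z:+d}  ->  p = {logistic(z):.3f}")
```

At z = 0 the function returns exactly 0.5, which is the turning point of the curve.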

7
Q

How are regression coefficients estimated?

A

Using maximum likelihood estimation: the coefficients are chosen so that the predicted probability is as high as possible for cases with y = 1 and as low as possible for cases with y = 0.
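
As a rough illustration of the idea, the sketch below fits a one-predictor model by plain gradient ascent on the log-likelihood. This is not how SPSS estimates the model internally; the toy data, the function names, the step size and the number of steps are all assumptions made for this example.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Sum of log predicted probabilities of the outcomes that actually occurred."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit(xs, ys, steps=10000, lr=0.01):
    """Climb the log-likelihood surface with small gradient steps."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        residuals = [y - logistic(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += lr * sum(residuals)
        b1 += lr * sum(r * x for r, x in zip(residuals, xs))
    return b0, b1

# Hypothetical toy data: x is some predictor, y is the binary outcome.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 1, 0, 1]

b0, b1 = fit(xs, ys)
print("log-likelihood at start :", round(log_likelihood(0.0, 0.0, xs, ys), 3))
print("log-likelihood after fit:", round(log_likelihood(b0, b1, xs, ys), 3))
```

The fitted coefficients give a higher log-likelihood than the starting values, meaning the predicted probabilities sit closer to the observed 0/1 outcomes.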

8
Q

What is the sample size requirement for logistic regression?

A

To allow meaningful interpretation, each category should contain at least 25 observations. Assuming a normal distribution, the minimum sample is n = 50.

9
Q

How do you interpret the logistic regression coefficients?

A

Logistic regression is NOT linear. You can only interpret a coefficient based on whether it is positive or negative.

e.g. a positive regression coefficient means an increasing probability that the event (DV) occurs.

10
Q

What is the “logit value”?

A

Calculated by taking the logarithm of the odds.

An odds value less than 1 will have a negative logit value, and an odds value greater than 1 will have a positive logit value. Odds of exactly 1 give a logit value of 0.

11
Q

What are the “odds”?

A

p/(1-p)

[The ratio of the probability of an event happening to the probability of it not happening.]
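
The formula can be tried out directly. This is a small sketch; the function names `odds` and `probability_from_odds` are made up for this example.

```python
def odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1 - p)

def probability_from_odds(odds_value):
    """Convert odds back into a probability."""
    return odds_value / (1 + odds_value)

for p in (0.25, 0.5, 0.75):
    o = odds(p)
    print(f"p = {p:.2f}  ->  odds = {o:.3f}  ->  back to p = {probability_from_odds(o):.2f}")
```

A probability of 0.5 corresponds to odds of 1 (the event is as likely to happen as not), and probabilities above 0.5 give odds above 1.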

12
Q

What are the extreme values of probability (p)?

A

1 = 100%
0 = 0%

13
Q

This shows the odds and the logit value of combinations of p:

A
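
The table for this card is not reproduced here. A comparable table can be generated with the sketch below, assuming odds = p / (1 - p) and logit = ln(odds); the probability values are picked for illustration and are not necessarily those of the original table.

```python
import math

def odds(p):
    return p / (1 - p)

def logit(p):
    """Natural logarithm of the odds."""
    return math.log(odds(p))

# p = 0 and p = 1 are left out: the logit is undefined at both extremes.
print(f"{'p':>5} {'odds':>8} {'logit':>8}")
for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(f"{p:>5.2f} {odds(p):>8.3f} {logit(p):>8.3f}")
```

Probabilities below 0.5 produce odds below 1 and a negative logit; probabilities above 0.5 produce odds above 1 and a positive logit.
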
14
Q

What was the movie streaming case of logistic regression?

A

Does access to movie streaming services (e.g., Netflix, Amazon Prime) lead to lower student success in the marketing analytics subject (pass vs. fail)?
Does the “self-control” of students increase student success?

15
Q

What does p in this case refer to? (movie streaming case)

A

The probability of passing the exam

16
Q

What does the successExam variable say?

A

What actually happened: whether the student passed (1) or failed (0).

17
Q

Why should you ignore Block 0 in SPSS? (logistic regression)

A

Because Block 0 is the “null” (constant-only) model:

The classification table and fit statistics in Block 0 are just a baseline and aren’t evidence that your predictors work.

18
Q

What is important in this model summary?

A

R Square can be interpreted in the same way as in a linear regression. So according to the Nagelkerke R Square, we can infer that the model explains about 11% of the variance in the dependent variable.

19
Q

What does the -2 Log likelihood tell us?

A

Not much. It is often reported but is not an informative number by itself. It can, however, be used to compare different models: the smaller, the better.
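
To make the comparison concrete, the sketch below computes -2 log likelihood for a constant-only model and for a model with predictors. The outcomes and predicted probabilities are hypothetical values invented for this example, not output from the course data.

```python
import math

def neg2_log_likelihood(ys, ps):
    """-2 times the log-likelihood of observed 0/1 outcomes ys given predicted probabilities ps."""
    return -2 * sum(
        y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(ys, ps)
    )

ys = [1, 1, 1, 0, 0, 1]                          # hypothetical outcomes
null_ps = [sum(ys) / len(ys)] * len(ys)          # constant-only model: same p for everyone
model_ps = [0.9, 0.8, 0.7, 0.3, 0.4, 0.6]        # hypothetical model predictions

print("null model :", round(neg2_log_likelihood(ys, null_ps), 3))
print("with IVs   :", round(neg2_log_likelihood(ys, model_ps), 3))
```

The model whose predictions sit closer to the observed outcomes gets the smaller value, which is why the number is only useful in comparison.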

20
Q

What do the model results show?

A

The model correctly predicts 169 out of 192 students who passed the exam.

21
Q

What is problematic about these results?

A

The model is actually really poor. The overall accuracy is reported as 63.7%, but the model is clearly biased towards predicting that students pass. The 88% hit rate for students who passed (169 out of 192) looks great, but it really is not: because the model predicts “pass” for almost everyone, it will obviously be good at identifying students who PASS and terrible at identifying students who FAIL.
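
The point becomes clearer when the hit rate is computed per group. In the sketch below, only the 169 and 23 (192 minus 169) come from the cards; the counts for students who failed are NOT given in the source and are hypothetical, chosen only so that the overall accuracy lands near the reported 63.7%.

```python
def classification_summary(true_pass, false_fail, true_fail, false_pass):
    """Per-group and overall hit rates from a 2x2 classification table."""
    passed = true_pass + false_fail    # students who actually passed
    failed = true_fail + false_pass    # students who actually failed
    total = passed + failed
    return {
        "pass_correct": true_pass / passed,
        "fail_correct": true_fail / failed,
        "overall": (true_pass + true_fail) / total,
        # Accuracy of a naive rule that predicts the larger group for everyone.
        "naive_baseline": max(passed, failed) / total,
    }

summary = classification_summary(
    true_pass=169, false_fail=23,   # from the cards: 169 of 192 passers predicted correctly
    true_fail=26, false_pass=88,    # HYPOTHETICAL counts, not from the source
)
for name, value in summary.items():
    print(f"{name:>15}: {value:.1%}")
```

Under these assumed counts, the hit rate for students who failed is far below 50%, and the overall accuracy is barely above what a rule that always predicts “pass” would achieve.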

22
Q

What does the logistic regression coefficient tell you?

A

REMEMBER, it can only be interpreted using its positive/negative sign. It is NOT linear.

23
Q

What does the significance of the coefficients tell you?

A

Self-control is significant at the 1% level. Access to movie streaming services is only MARGINALLY significant (between 5% and 10%).

24
Q

What do the odds ratios tell you about self-control?

A

Increasing self-control by one unit will lead to a 42% increase in the odds of passing the exam.
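
The link between a coefficient and an odds ratio can be sketched as follows, assuming the usual relationship odds ratio = e^B. The odds ratio of 1.42 is inferred from the 42% figure; the coefficient itself is not given in these cards, so it is back-calculated here, and the starting probability of 0.5 is an arbitrary illustration.

```python
import math

def percent_change_in_odds(b):
    """Percentage change in the odds for a one-unit increase in the predictor."""
    return (math.exp(b) - 1) * 100

odds_ratio = 1.42            # implied by "42% increase in the odds"
b = math.log(odds_ratio)     # back-calculated coefficient (assumption, not from the cards)
print("coefficient B:", round(b, 3))
print("change in odds (%):", round(percent_change_in_odds(b), 1))

# A 42% increase in the odds is not a 42% increase in probability.
start_p = 0.5                                  # arbitrary starting probability
new_odds = (start_p / (1 - start_p)) * odds_ratio
new_p = new_odds / (1 + new_odds)
print("p before:", start_p, " p after:", round(new_p, 3))
```

The odds ratio describes a multiplicative change in the odds; how much the probability moves depends on where on the S curve the student starts.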