What is a logistic regression used for?
To predict the probability of a certain class existing or an event occurring (yes/no) depending on one or several independent variables
What measurement level can the independent variables be (in a logistic regression)?
SAME as the IVs in a LINEAR regression
What maesurement level can the dependent variables be? (in a logistic regression)
Categorical (nominal)
binary (0, 1) or dummy-coded (eg. Yes/ No)
What shape does the logistic curve have?
S curve
What does the dependent variable represent? (S-curve)
Represents probabilities, with values ranging from 0 to 1
At what level of probability is the turning point?
p = 0.5
How are regression coefficients estimated?
Using maximum likelihood estimation (highest possible for y = 1, lowest possible for y = 0)
What is the same size requirement for logistic regression?
To allow meaningful interpretation each category should contain at least 25 observations. Assuming normal distribution, minimum sample of n = 50.
How do you interpret the logistic regression coefficients?
Logistic is NOT linear. You can only interpret based on whether the coefficient is positive or negative..
eg. a positive regression coefficient will lead to increasing profitability that an event (DV) will be occurring.
What is the “logit value”?
Calculated by taking the logarithm of the odds.
An odds value less than 1 will have a negative logit value, and an odds ratio greater than 1 will have a positive logit value.
What is the “odds”?
p/(1-p)
[The ratio of an event happening to it not happening.]
What are the extreme values of probability (p)?
1 = 100%
0 = 0%
This shows the odds and the logit value of combinations of p:
What was the movie streaming case of logistic regression?
Does access to movie streaming services (e.g., Netflix, Amazon Prime) lead to lower student success of the marketing analytics subject (pass vs. fail)?
Does the “self-control” of students increase student success?
What does p in this case refer to? (movie streaming case)
The probability of passing the exam
What does successExam variable say?
What actually happened, whether they, 1 = passed, or 0 = failed
Why should you ignore Block 0 in SPSS? (logistic regression)
Because Block 0 is the “null” (constant-only) model:
The classification/table and fit stats in Block 0 are just a baseline and aren’t evidence your predictors work.
What is important in this model summary?
R-Square can be interpreted in the same way. So according to Nagelkerke R Square we can infer 11% of the variance
What does the -2 Log likelihood tell us?
Not much, it is often reported but is not an informative number by itself, it can however be used to compare different models. The smaller, the better.
What do the model results show?
The model correctly predicts 169 out of 192 students who passed the exam.
What is problematic about these results?
The model is actually really poor. The overall accuracy is noted as 63.7%. However, the model has a clear overaccuracy towards passing the exam. Even though that 88% looks great, it really is not, because in the context, the model ALREADY overcompensates on believing people will pass, == because its terrible in predicting people will FAIL, it’ll obviously be great at predicting people will PASS.
What does the logistic regression coefficient tell you?
REMEMBER, it can only be interpreted using positive/ negative sign. It is NOT linear.
What does the significance of the coefficients tell you?
Self control is significant at the 1% level. The access to movie streaming service is MARGINALLY significant (between 5-10%).
What does the odds-ratios tell you about self-control?
Increasing self-control by one unit, will lead to a 42% increase in the odds of passing the exam.