Introduction to Statistical Modeling Flashcards

(330 cards)

1
Q

What is a statistical model?

A

A mathematical description of how data are generated: systematic signal + random noise. Intuition: data vary even at the same X.

2
Q

What is the response variable Y?

A

The outcome variable you want to explain or predict.

3
Q

What is a predictor X?

A

A variable used to explain or predict Y.

4
Q

What is the conditional mean E[Y|X]?

A

The average value of Y among units with the same X; the regression function targets this.

5
Q

What is an error term ε?

A

Random deviation of observed Y from the model mean at X; captures unmeasured factors + randomness.

6
Q

What is a residual e_i?

A

e_i = y_i − ŷ_i (observed minus fitted); estimate of ε_i.

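The observed-minus-fitted arithmetic for residuals can be sketched in a few lines. This is a pure-Python illustration (the deck's own snippets use R), and the coefficients and data here are made up:

```python
# Residuals e_i = y_i - yhat_i for a hypothetical fitted line yhat = 1.0 + 2.0*x
b0, b1 = 1.0, 2.0                            # assumed coefficients, for illustration
x = [0.0, 1.0, 2.0]
y = [1.5, 2.5, 5.5]
yhat = [b0 + b1 * xi for xi in x]            # fitted values (predicted mean at each x)
e = [yi - yh for yi, yh in zip(y, yhat)]     # residuals: observed minus fitted
print(e)  # [0.5, -0.5, 0.5]
```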
7
Q

What is a fitted value ŷ_i?

A

The model’s predicted mean response at X_i.

8
Q

What does ‘linear in parameters’ mean?

A

The β's enter as a linear combination: β0 + β1·g1(x) + …; no β is multiplied by another β or placed inside a nonlinear function.

9
Q

Give an example of a model that is linear in β but nonlinear in x.

A

y = β0 + β1 x + β2 x^2 + ε (polynomial regression).

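"Linear in parameters" is easiest to see via the design matrix, whose columns are fixed functions of x; the β's just weight those columns. A minimal pure-Python sketch with arbitrary values:

```python
# Design matrix for y = β0 + β1*x + β2*x^2: columns are [1, x, x^2],
# so the model is linear in (β0, β1, β2) even though it is nonlinear in x.
x = [1.0, 2.0, 3.0]
X = [[1.0, xi, xi**2] for xi in x]
print(X)  # [[1.0, 1.0, 1.0], [1.0, 2.0, 4.0], [1.0, 3.0, 9.0]]
```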
10
Q

Interpret β0 in regression.

A

Expected/average Y when all predictors equal 0 (may not be meaningful if 0 lies outside the observed range).

11
Q

Interpret β1 in simple regression.

A

Average change in E[Y|x] for a 1-unit increase in x; association, not necessarily causation.

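For simple regression the least-squares slope has a closed form, b1 = Sxy/Sxx, with b0 = ȳ − b1·x̄. A pure-Python check on a made-up, exactly linear dataset (y = 2x):

```python
from statistics import mean

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]                  # exactly linear, so the fit recovers slope 2
xbar, ybar = mean(x), mean(y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sxy / sxx                    # slope: average change in E[Y|x] per 1-unit x
b0 = ybar - b1 * xbar             # intercept
print(b1, b0)  # 2.0 0.0
```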
12
Q

Interpret βj in multiple regression.

A

Average change in E[Y|X] for a 1-unit increase in x_j, holding the other predictors constant (adjusted effect).

13
Q

What is confounding (informal)?

A

A variable related to both X and Y that can distort the observed X–Y association.

14
Q

What is an observational study?

A

Data collected without random assignment; causal claims require stronger assumptions.

15
Q

What is an experiment?

A

Study with random assignment of treatments; supports causal inference under proper design.

16
Q

What is the sample?

A

Observed subset of the population.

17
Q

What is the population?

A

Full set of units you want to generalize to.

18
Q

What is a statistical unit?

A

Single entity measured (person, firm, day, site, etc.).

19
Q

What is representativeness?

A

Sample resembles population; supports generalization.

20
Q

What is operationalization?

A

Turning an abstract construct into a measurable variable.

21
Q

What is concept validity?

A

Whether the measured variable truly captures the intended construct (measurement matches concept).

22
Q

What is internal validity?

A

Whether observed association reflects causal effect within the studied sample (no major bias/confounding).

23
Q

What is external validity?

A

Whether results generalize to other populations/settings.

24
Q

What is model specification?

A

Choice of predictors/functional form (transformations/interactions) included in the model.

25
What is omitted variable bias?
Bias in coefficient estimates when a relevant predictor is left out and correlated with included predictors.
26
What is multicollinearity?
Strong correlation among predictors causing unstable coefficient estimates and large SEs.
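With just two predictors, the variance inflation factor reduces to VIF = 1/(1 − r²), where r is their correlation. A pure-Python illustration with made-up, nearly collinear data:

```python
from statistics import mean

# Two nearly collinear predictors -> r^2 close to 1 -> very large VIF
x1 = [1, 2, 3, 4, 5]
x2 = [1.1, 2.0, 2.9, 4.2, 5.0]
m1, m2 = mean(x1), mean(x2)
sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
sxx = sum((a - m1) ** 2 for a in x1)
syy = sum((b - m2) ** 2 for b in x2)
r2 = sxy ** 2 / (sxx * syy)       # squared correlation of x1 with x2
vif = 1 / (1 - r2)                # inflation of Var(beta_hat) from collinearity
print(r2 > 0.99, vif > 10)        # a VIF this large signals unstable estimates
```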
27
What is a dummy variable?
0/1 indicator variable used to encode categories.
28
What is a reference category?
Baseline group omitted from dummies; other category coefficients are relative to it.
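Dummy coding with an omitted reference category can be sketched directly (pure Python, toy labels; R's `factor()` does this automatically):

```python
# One 0/1 column per non-reference level; the reference category 'A'
# is encoded as all zeros, so other coefficients are relative to it.
groups = ['A', 'B', 'C', 'B']
levels = sorted(set(groups))      # ['A', 'B', 'C']
ref = levels[0]                   # baseline (reference) category
dummies = [[1 if g == lvl else 0 for lvl in levels if lvl != ref]
           for g in groups]       # columns: is_B, is_C
print(dummies)  # [[0, 0], [1, 0], [0, 1], [1, 0]]
```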
29
What is an interaction term?
Product of predictors allowing effect of one predictor to depend on another.
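With an interaction term, the slope of one predictor becomes a function of the other: under y = β0 + β1·x1 + β2·x2 + β3·(x1·x2), the x1-effect is β1 + β3·x2. A sketch with hypothetical coefficients:

```python
# Marginal effect of x1 depends on x2 when the model includes x1:x2
b1, b3 = 2.0, 0.5                 # hypothetical fitted coefficients

def x1_slope(x2):
    return b1 + b3 * x2           # effect of a 1-unit increase in x1 at this x2

print(x1_slope(0.0), x1_slope(4.0))  # 2.0 4.0
```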
30
What is heteroskedasticity?
Non-constant error variance across X or fitted values.
31
What is homoskedasticity?
Constant error variance: Var(ε_i)=σ^2.
32
What is autocorrelation?
Correlation between errors over time/order (violates independence).
33
What is leverage?
How extreme a point’s X values are relative to others (potential influence on fit).
34
What is influence?
How much removing a point changes estimates/predictions.
35
What is an outlier?
Point with unusual Y given X (large residual).
36
Outlier vs influential point?
Outlier: large residual; influential: changes model fit/coefficients strongly (often high leverage).
37
What is a model assumption?
A condition about data-generating process that must hold for standard inference to be valid.
38
List the 4 common regression assumptions.
Linearity, Independence, Homoskedasticity, Normality (the first two are usually the most critical).
39
What does the linearity assumption mean?
Mean function E[Y|X] is correctly modeled (no systematic structure left in residuals).
40
What does the independence assumption mean?
Errors/observations are independent: Cov(ε_i, ε_j)=0 for i≠j (no clustering/time dependence).
41
What does the normality assumption mean?
Errors are normally distributed (mainly affects exact small-sample inference).
42
What is double-dipping?
Using the same data to generate hypotheses and then to test them, inflating false positives.
43
How do you avoid double-dipping?
Pre-specify analyses, or split the data into exploratory and confirmatory sets; use cross-validation for prediction.
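The split idea in one pure-Python sketch (row indices stand in for units of a dataset; the sizes are arbitrary):

```python
import random

random.seed(0)                        # reproducible split
idx = list(range(10))                 # row indices of a hypothetical dataset
random.shuffle(idx)
explore, confirm = idx[:5], idx[5:]   # disjoint halves: generate hypotheses on
                                      # one, test them only on the other
assert not set(explore) & set(confirm)
print(sorted(explore + confirm))      # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```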
44
What is overfitting?
Model fits noise in training data, reducing out-of-sample performance.
45
What is underfitting?
Model too simple to capture true structure; systematic errors remain.
46
What is the bias-variance tradeoff?
More complex models reduce bias but increase variance; optimal complexity balances both.
121
Fit simple linear regression in R.
fit <- lm(y ~ x, data=df)
Warning: always check assumptions/diagnostics before interpreting p-values (this applies to all the code cards below).
122
Fit multiple regression in R.
fit <- lm(y ~ x1 + x2 + x3, data=df)
123
Extract coefficient estimates in R.
coef(fit)
124
Extract full coefficient table.
summary(fit)$coefficients  # Estimate, SE, t, p
125
Get fitted values and residuals.
yhat <- fitted(fit); e <- resid(fit)
126
Compute 95% CI for coefficients.
confint(fit, level=0.95)
127
Make basic diagnostic plots.
plot(fit)  # residuals vs fitted, QQ, scale-location, leverage
128
Fit quadratic regression.
lm(y ~ x + I(x^2), data=df)
129
Fit cubic regression.
lm(y ~ x + I(x^2) + I(x^3), data=df)
130
Center a predictor x in R.
df$xc <- df$x - mean(df$x)
131
Standardize a predictor (z-score) in R.
df$xz <- scale(df$x)
132
Run the Breusch–Pagan test.
library(lmtest); bptest(fit)
133
Compute robust SEs (HC1).
library(sandwich); library(lmtest); coeftest(fit, vcov=vcovHC(fit, type='HC1'))
134
Add an interaction term in R.
lm(y ~ x1 * x2, data=df)  # includes x1, x2, x1:x2
135
Add dummy variables automatically.
lm(y ~ factor(group), data=df)
136
Change the reference category.
df$group <- relevel(factor(df$group), ref='A')
137
Compute predictions at new data.
predict(fit, newdata=new_df)
138
Get a prediction interval.
predict(fit, newdata=new_df, interval='prediction', level=0.95)
139
Get a confidence interval for the mean response.
predict(fit, newdata=new_df, interval='confidence', level=0.95)
140
Check residual normality visually.
qqnorm(resid(fit)); qqline(resid(fit))
141
Plot residuals vs fitted.
plot(fitted(fit), resid(fit)); abline(h=0)
142
Detect autocorrelation visually.
plot(resid(fit), type='l')
143
Compute VIF.
library(car); vif(fit)
144
Fit a log-transformed response.
lm(log(y) ~ x1 + x2, data=df)
145
Fit a log-transformed predictor.
lm(y ~ log(x1) + x2, data=df)
146
Compute the ANOVA table.
anova(fit)
147
Compare nested models (F test).
anova(fit_small, fit_large)
148
Given ___, Fit simple linear regression in R.
Steps/Answer: fit <- lm(y ~ x, data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
149
Given ___, Fit multiple regression in R.
Steps/Answer: fit <- lm(y ~ x1 + x2 + x3, data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
150
Given ___, Extract coefficient estimates in R.
Steps/Answer: coef(fit) Warning: Always check assumptions/diagnostics before interpreting p-values.
151
Given ___, Extract full coefficient table.
Steps/Answer: summary(fit)$coefficients # Estimate, SE, t, p Warning: Always check assumptions/diagnostics before interpreting p-values.
152
Given ___, Get fitted values and residuals.
Steps/Answer: yhat <- fitted(fit); e <- resid(fit) Warning: Always check assumptions/diagnostics before interpreting p-values.
153
Given ___, Compute 95% CI for coefficients.
Steps/Answer: confint(fit, level=0.95) Warning: Always check assumptions/diagnostics before interpreting p-values.
154
Given ___, Make basic diagnostic plots.
Steps/Answer: plot(fit) # residuals-fitted, QQ, scale-location, leverage Warning: Always check assumptions/diagnostics before interpreting p-values.
155
Given ___, Fit quadratic regression.
Steps/Answer: lm(y ~ x + I(x^2), data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
156
Given ___, Fit cubic regression.
Steps/Answer: lm(y ~ x + I(x^2) + I(x^3), data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
157
Given ___, Center a predictor x in R.
Steps/Answer: df$xc <- df$x - mean(df$x) Warning: Always check assumptions/diagnostics before interpreting p-values.
158
Given ___, Standardize a predictor (z-score) in R.
Steps/Answer: df$xz <- scale(df$x) Warning: Always check assumptions/diagnostics before interpreting p-values.
159
Given ___, Run Breusch–Pagan test.
Steps/Answer: library(lmtest); bptest(fit) Warning: Always check assumptions/diagnostics before interpreting p-values.
160
Given ___, Compute robust SE (HC1).
Steps/Answer: library(sandwich); library(lmtest); coeftest(fit, vcov=vcovHC(fit, type='HC1')) Warning: Always check assumptions/diagnostics before interpreting p-values.
161
Given ___, Add interaction term in R.
Steps/Answer: lm(y ~ x1 * x2, data=df) # includes x1,x2,x1:x2 Warning: Always check assumptions/diagnostics before interpreting p-values.
162
Given ___, Add dummy variables automatically.
Steps/Answer: lm(y ~ factor(group), data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
163
Given ___, Change reference category.
Steps/Answer: df$group <- relevel(factor(df$group), ref='A') Warning: Always check assumptions/diagnostics before interpreting p-values.
164
Given ___, Compute predictions at new data.
Steps/Answer: predict(fit, newdata=new_df) Warning: Always check assumptions/diagnostics before interpreting p-values.
165
Given ___, Get prediction interval.
Steps/Answer: predict(fit, newdata=new_df, interval='prediction', level=0.95) Warning: Always check assumptions/diagnostics before interpreting p-values.
166
Given ___, Get confidence interval for mean response.
Steps/Answer: predict(fit, newdata=new_df, interval='confidence', level=0.95) Warning: Always check assumptions/diagnostics before interpreting p-values.
167
Given ___, Check residual normality visually.
Steps/Answer: qqnorm(resid(fit)); qqline(resid(fit)) Warning: Always check assumptions/diagnostics before interpreting p-values.
168
Given ___, Plot residuals vs fitted.
Steps/Answer: plot(fitted(fit), resid(fit)); abline(h=0) Warning: Always check assumptions/diagnostics before interpreting p-values.
169
Given ___, Detect autocorrelation visually.
Steps/Answer: plot(resid(fit), type='l') Warning: Always check assumptions/diagnostics before interpreting p-values.
170
Given ___, Compute VIF.
Steps/Answer: library(car); vif(fit) Warning: Always check assumptions/diagnostics before interpreting p-values.
171
Given ___, Fit log-transform response.
Steps/Answer: lm(log(y) ~ x1 + x2, data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
172
Given ___, Fit log-transform predictor.
Steps/Answer: lm(y ~ log(x1) + x2, data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
173
Given ___, Compute ANOVA table.
Steps/Answer: anova(fit) Warning: Always check assumptions/diagnostics before interpreting p-values.
174
Given ___, Compare nested models (F test).
Steps/Answer: anova(fit_small, fit_large) Warning: Always check assumptions/diagnostics before interpreting p-values.
175
Given ___, Fit simple linear regression in R.
Steps/Answer: fit <- lm(y ~ x, data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
176
Given ___, Fit multiple regression in R.
Steps/Answer: fit <- lm(y ~ x1 + x2 + x3, data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
177
Given ___, Extract coefficient estimates in R.
Steps/Answer: coef(fit) Warning: Always check assumptions/diagnostics before interpreting p-values.
178
Given ___, Extract full coefficient table.
Steps/Answer: summary(fit)$coefficients # Estimate, SE, t, p Warning: Always check assumptions/diagnostics before interpreting p-values.
179
Given ___, Get fitted values and residuals.
Steps/Answer: yhat <- fitted(fit); e <- resid(fit) Warning: Always check assumptions/diagnostics before interpreting p-values.
180
Given ___, Compute 95% CI for coefficients.
Steps/Answer: confint(fit, level=0.95) Warning: Always check assumptions/diagnostics before interpreting p-values.
181
Given ___, Make basic diagnostic plots.
Steps/Answer: plot(fit) # residuals-fitted, QQ, scale-location, leverage Warning: Always check assumptions/diagnostics before interpreting p-values.
182
Given ___, Fit quadratic regression.
Steps/Answer: lm(y ~ x + I(x^2), data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
183
Given ___, Fit cubic regression.
Steps/Answer: lm(y ~ x + I(x^2) + I(x^3), data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
184
Given ___, Center a predictor x in R.
Steps/Answer: df$xc <- df$x - mean(df$x) Warning: Always check assumptions/diagnostics before interpreting p-values.
185
Given ___, Standardize a predictor (z-score) in R.
Steps/Answer: df$xz <- scale(df$x) Warning: Always check assumptions/diagnostics before interpreting p-values.
186
Given ___, Run Breusch–Pagan test.
Steps/Answer: library(lmtest); bptest(fit) Warning: Always check assumptions/diagnostics before interpreting p-values.
187
Given ___, Compute robust SE (HC1).
Steps/Answer: library(sandwich); library(lmtest); coeftest(fit, vcov=vcovHC(fit, type='HC1')) Warning: Always check assumptions/diagnostics before interpreting p-values.
188
Given ___, Add interaction term in R.
Steps/Answer: lm(y ~ x1 * x2, data=df) # includes x1,x2,x1:x2 Warning: Always check assumptions/diagnostics before interpreting p-values.
189
Given ___, Add dummy variables automatically.
Steps/Answer: lm(y ~ factor(group), data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
190
Given ___, Change reference category.
Steps/Answer: df$group <- relevel(factor(df$group), ref='A') Warning: Always check assumptions/diagnostics before interpreting p-values.
191
Given ___, Compute predictions at new data.
Steps/Answer: predict(fit, newdata=new_df) Warning: Always check assumptions/diagnostics before interpreting p-values.
192
Given ___, Get prediction interval.
Steps/Answer: predict(fit, newdata=new_df, interval='prediction', level=0.95) Warning: Always check assumptions/diagnostics before interpreting p-values.
193
Given ___, Get confidence interval for mean response.
Steps/Answer: predict(fit, newdata=new_df, interval='confidence', level=0.95) Warning: Always check assumptions/diagnostics before interpreting p-values.
194
Given ___, Check residual normality visually.
Steps/Answer: qqnorm(resid(fit)); qqline(resid(fit)) Warning: Always check assumptions/diagnostics before interpreting p-values.
195
Given ___, Plot residuals vs fitted.
Steps/Answer: plot(fitted(fit), resid(fit)); abline(h=0) Warning: Always check assumptions/diagnostics before interpreting p-values.
196
Given ___, Detect autocorrelation visually.
Steps/Answer: plot(resid(fit), type='l') Warning: Always check assumptions/diagnostics before interpreting p-values.
197
Given ___, Compute VIF.
Steps/Answer: library(car); vif(fit) Warning: Always check assumptions/diagnostics before interpreting p-values.
198
Given ___, Fit log-transform response.
Steps/Answer: lm(log(y) ~ x1 + x2, data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
199
Given ___, Fit log-transform predictor.
Steps/Answer: lm(y ~ log(x1) + x2, data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
200
Given ___, Compute ANOVA table.
Steps/Answer: anova(fit) Warning: Always check assumptions/diagnostics before interpreting p-values.
201
Given ___, Compare nested models (F test).
Steps/Answer: anova(fit_small, fit_large) Warning: Always check assumptions/diagnostics before interpreting p-values.
202
Given ___, Fit simple linear regression in R.
Steps/Answer: fit <- lm(y ~ x, data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
203
Given ___, Fit multiple regression in R.
Steps/Answer: fit <- lm(y ~ x1 + x2 + x3, data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
204
Given ___, Extract coefficient estimates in R.
Steps/Answer: coef(fit) Warning: Always check assumptions/diagnostics before interpreting p-values.
205
Given ___, Extract full coefficient table.
Steps/Answer: summary(fit)$coefficients # Estimate, SE, t, p Warning: Always check assumptions/diagnostics before interpreting p-values.
206
Given ___, Get fitted values and residuals.
Steps/Answer: yhat <- fitted(fit); e <- resid(fit) Warning: Always check assumptions/diagnostics before interpreting p-values.
207
Given ___, Compute 95% CI for coefficients.
Steps/Answer: confint(fit, level=0.95) Warning: Always check assumptions/diagnostics before interpreting p-values.
208
Given ___, Make basic diagnostic plots.
Steps/Answer: plot(fit) # residuals-fitted, QQ, scale-location, leverage Warning: Always check assumptions/diagnostics before interpreting p-values.
209
Given ___, Fit quadratic regression.
Steps/Answer: lm(y ~ x + I(x^2), data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
210
Given ___, Fit cubic regression.
Steps/Answer: lm(y ~ x + I(x^2) + I(x^3), data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
211
Given ___, Center a predictor x in R.
Steps/Answer: df$xc <- df$x - mean(df$x) Warning: Always check assumptions/diagnostics before interpreting p-values.
212
Given ___, Standardize a predictor (z-score) in R.
Steps/Answer: df$xz <- scale(df$x) Warning: Always check assumptions/diagnostics before interpreting p-values.
213
Given ___, Run Breusch–Pagan test.
Steps/Answer: library(lmtest); bptest(fit) Warning: Always check assumptions/diagnostics before interpreting p-values.
214
Given ___, Compute robust SE (HC1).
Steps/Answer: library(sandwich); library(lmtest); coeftest(fit, vcov=vcovHC(fit, type='HC1')) Warning: Always check assumptions/diagnostics before interpreting p-values.
215
Given ___, Add interaction term in R.
Steps/Answer: lm(y ~ x1 * x2, data=df) # includes x1,x2,x1:x2 Warning: Always check assumptions/diagnostics before interpreting p-values.
216
Given ___, Add dummy variables automatically.
Steps/Answer: lm(y ~ factor(group), data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
217
Given ___, Change reference category.
Steps/Answer: df$group <- relevel(factor(df$group), ref='A') Warning: Always check assumptions/diagnostics before interpreting p-values.
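A tiny worked example of the two cards above, dummy coding with a chosen reference level (toy data, illustrative values):

```r
df <- data.frame(y = c(5, 6, 9, 10, 7, 8),
                 group = c("A", "A", "B", "B", "C", "C"))
df$group <- relevel(factor(df$group), ref = "A")
fit <- lm(y ~ group, data = df)
coef(fit)
# (Intercept) = mean of group A (5.5); groupB, groupC = differences from A (+4, +2)
```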
218
Given ___, Compute predictions at new data.
Steps/Answer: predict(fit, newdata=new_df) Warning: Always check assumptions/diagnostics before interpreting p-values.
219
Given ___, Get prediction interval.
Steps/Answer: predict(fit, newdata=new_df, interval='prediction', level=0.95) Warning: Always check assumptions/diagnostics before interpreting p-values.
220
Given ___, Get confidence interval for mean response.
Steps/Answer: predict(fit, newdata=new_df, interval='confidence', level=0.95) Warning: Always check assumptions/diagnostics before interpreting p-values.
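To see why the two intervals above differ, a small simulation (data and names are illustrative):

```r
set.seed(42)
df <- data.frame(x = 1:50)
df$y <- 3 + 0.5 * df$x + rnorm(50)
fit <- lm(y ~ x, data = df)
new_df <- data.frame(x = 25)

ci <- predict(fit, newdata = new_df, interval = "confidence", level = 0.95)
pi <- predict(fit, newdata = new_df, interval = "prediction", level = 0.95)

# The prediction interval is wider: it adds single-observation noise
# on top of uncertainty about the mean line.
(pi[, "upr"] - pi[, "lwr"]) > (ci[, "upr"] - ci[, "lwr"])  # TRUE
```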
221
Given ___, Check residual normality visually.
Steps/Answer: qqnorm(resid(fit)); qqline(resid(fit)) Warning: Always check assumptions/diagnostics before interpreting p-values.
222
Given ___, Plot residuals vs fitted.
Steps/Answer: plot(fitted(fit), resid(fit)); abline(h=0) Warning: Always check assumptions/diagnostics before interpreting p-values.
223
Given ___, Detect autocorrelation visually.
Steps/Answer: plot(resid(fit), type='l') # or acf(resid(fit)) Warning: Always check assumptions/diagnostics before interpreting p-values.
224
Given ___, Compute VIF.
Steps/Answer: library(car); vif(fit) Warning: Always check assumptions/diagnostics before interpreting p-values.
225
Given ___, Fit log-transform response.
Steps/Answer: lm(log(y) ~ x1 + x2, data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
226
Given ___, Fit log-transform predictor.
Steps/Answer: lm(y ~ log(x1) + x2, data=df) Warning: Always check assumptions/diagnostics before interpreting p-values.
227
Given ___, Compute ANOVA table.
Steps/Answer: anova(fit) Warning: Always check assumptions/diagnostics before interpreting p-values.
228
Given ___, Compare nested models (F test).
Steps/Answer: anova(fit_small, fit_large) Warning: Always check assumptions/diagnostics before interpreting p-values.
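A self-contained sketch of the nested comparison above (simulated data; x2 is constructed to have no true effect):

```r
set.seed(7)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$y <- 1 + 2 * df$x1 + rnorm(100)     # x2 does not enter the true model
fit_small <- lm(y ~ x1, data = df)
fit_large <- lm(y ~ x1 + x2, data = df)
anova(fit_small, fit_large)  # F test of H0: the extra coefficient is zero
```

A large p-value here favors keeping the smaller model; the test only applies when one model nests the other.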
241
TRICK: 'Linear regression requires straight-line relationship in x.'
False. Linear = linear in β. Curves can be modeled with transformations/polynomials. How they trick you: they swap precise statistical meaning for everyday meaning.
242
TRICK: 'β1 applies to each individual exactly.'
False. β1 is average effect on conditional mean; individuals vary by ε. How they trick you: they swap precise statistical meaning for everyday meaning.
243
TRICK: 'Non-normal residuals mean coefficients are biased.'
Not necessarily. Normality mainly affects exact inference; linearity/independence often more critical. How they trick you: they swap precise statistical meaning for everyday meaning.
244
TRICK: 'Heteroskedasticity always biases β.'
Often leaves β unbiased but invalidates SE/t-tests/CIs unless corrected. How they trick you: they swap precise statistical meaning for everyday meaning.
245
TRICK: 'High R² implies correct causal model.'
False. R² ≠ causality; can be high with confounding or misspecification. How they trick you: they swap precise statistical meaning for everyday meaning.
246
TRICK: 'If residuals sum to 0, model is correct.'
Residuals sum to exactly 0 by construction whenever the model includes an intercept; assumptions can still be violated. How they trick you: they swap precise statistical meaning for everyday meaning.
247
TRICK: 'If p<0.05, effect is large.'
False. p-value depends on SE and sample size; examine effect size & CI. How they trick you: they swap precise statistical meaning for everyday meaning.
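One way to see this card's point: with enough data, even a negligible slope turns 'significant' (simulated; the numbers are illustrative):

```r
set.seed(3)
n <- 1e6
x <- rnorm(n)
y <- 0.01 * x + rnorm(n)           # true slope is tiny relative to the noise
fit <- lm(y ~ x)
summary(fit)$coefficients["x", ]   # estimate near 0.01, yet the p-value is tiny
```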
248
TRICK: 'Adding predictors to increase R² automatically improves the model.'
In-sample R² never decreases when predictors are added, but extra predictors can overfit; prefer adjusted R² or cross-validation. How they trick you: they swap precise statistical meaning for everyday meaning.
249
TRICK: 'Intercept must be meaningful.'
The intercept is E[Y] at x=0; if x=0 lies outside the observed data range, interpreting it is extrapolation. How they trick you: they swap precise statistical meaning for everyday meaning.
250
TRICK: 'Omitted variable doesn't matter if model fits well.'
OVB can bias coefficients even if fit looks good. How they trick you: they swap precise statistical meaning for everyday meaning.
251
TRICK: 'Correlation implies causation.'
Regression estimates association; causality needs stronger design assumptions. How they trick you: they swap precise statistical meaning for everyday meaning.
252
TRICK: 'Independence means different x values.'
Independence is probabilistic; clustering/time dependence breaks it. How they trick you: they swap precise statistical meaning for everyday meaning.