What is a statistical model?
A mathematical description of how data are generated: systematic signal + random noise. Intuition: data vary even at same X.
What is the response variable Y?
The outcome variable you want to explain or predict.
What is a predictor X?
A variable used to explain or predict Y.
What is the conditional mean E[Y|X]?
The average value of Y among units with the same X; the regression function targets this.
What is an error term ε?
Random deviation of observed Y from the model mean at X; captures unmeasured factors + randomness.
What is a residual e_i?
e_i = y_i − ŷ_i (observed minus fitted); estimate of ε_i.
What is a fitted value ŷ_i?
The model’s predicted mean response at X_i.
What does ‘linear in parameters’ mean?
β’s enter as a linear combination: β0 + β1 g1(x)+…; β not multiplied together or inside nonlinear functions.
Give an example of linear-in-β but nonlinear-in-x model.
y = β0 + β1 x + β2 x^2 + ε (polynomial regression).
Interpret β0 in regression.
Expected/average Y when all predictors equal 0 (may be non-meaningful if 0 is outside range).
Interpret β1 in simple regression.
Average change in E[Y|x] for a 1-unit increase in x; association, not necessarily causation.
Interpret βj in multiple regression.
Average change in E[Y|X] for 1-unit increase in x_j holding other predictors constant (adjusted effect).
What is confounding (informal)?
A variable related to both X and Y that can distort the observed X–Y association.
What is an observational study?
Data collected without random assignment; causal claims require stronger assumptions.
What is an experiment?
Study with random assignment of treatments; supports causal inference under proper design.
What is the sample?
Observed subset of the population.
What is the population?
Full set of units you want to generalize to.
What is a statistical unit?
Single entity measured (person, firm, day, site, etc.).
What is representativeness?
Sample resembles population; supports generalization.
What is operationalization?
Turning an abstract construct into a measurable variable.
What is concept validity?
Whether the measured variable truly captures the intended construct (measurement matches concept).
What is internal validity?
Whether observed association reflects causal effect within the studied sample (no major bias/confounding).
What is external validity?
Whether results generalize to other populations/settings.
What is model specification?
Choice of predictors/functional form (transformations/interactions) included in the model.