Correlation, Dependence, and Causality Flashcards

(43 cards)

1
Q

What is correlation between two variables X and Y at a high level?

A

A standardized measure of the strength and direction of a linear relationship between X and Y.

2
Q

How is the Pearson correlation coefficient defined in terms of covariance?

A

ρ(X,Y) = Cov(X,Y) / (σ_X σ_Y), where σ_X and σ_Y are the standard deviations of X and Y.
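
As a quick illustration, the definition can be coded directly; this is a minimal sketch using population (1/n) covariance and standard deviations, with made-up data:

```python
import math

def pearson(xs, ys):
    # rho = Cov(X, Y) / (sigma_X * sigma_Y), using population (1/n) moments.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# A perfectly linear relationship gives rho = 1.
print(round(pearson([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]), 6))  # -> 1.0
```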

3
Q

What are the possible values of Pearson correlation?

A

Any value between −1 and 1 inclusive: −1 indicates a perfect negative linear relationship, 0 no linear correlation, and 1 a perfect positive linear relationship.

4
Q

What does ρ(X,Y) = 0 imply about X and Y?

A

There is no linear correlation; they may still be dependent in a nonlinear way.
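
A tiny made-up example: Y = X² over a symmetric range is completely determined by X, yet its Pearson correlation with X is exactly zero.

```python
import math

def pearson(xs, ys):
    # Standard Pearson correlation from population (1/n) moments.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]          # Y is completely determined by X
print(round(pearson(xs, ys), 6))  # -> 0.0 despite total dependence
```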

5
Q

Does high correlation imply that X causes Y?

A

No; correlation alone does not establish causality, only association.

6
Q

What is spurious correlation?

A

An observed correlation between variables that arises from coincidence or common causes rather than a direct relationship.

7
Q

What is a confounder in causal reasoning?

A

A variable that influences both the potential cause and the outcome, potentially creating a misleading association.

8
Q

How can confounding lead to incorrect causal conclusions?

A

If not controlled, changes in the confounder may be mistaken for causal effects of the variable of interest.

9
Q

Why is ‘correlation ≠ causation’ particularly important in ML contexts?

A

Models can capture patterns that are predictive but unstable or non-causal; deploying them can fail when underlying associations change.

10
Q

What is rank-based (Spearman) correlation at a high level?

A

A correlation measure based on ranks of the data, capturing monotonic relationships (not just linear) and being more robust to outliers.

11
Q

When might Spearman correlation be preferred over Pearson?

A

When relationships are monotonic but not linear, or when outliers and non-normality make Pearson unreliable.
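
A minimal sketch (assuming no ties in the data): Spearman correlation is just Pearson correlation applied to ranks, so a monotonic-but-nonlinear relationship scores a perfect 1 under Spearman while Pearson falls short.

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

def ranks(values):
    # Rank positions 1..n; assumes no ties for simplicity.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r

def spearman(xs, ys):
    return pearson(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [math.exp(x) for x in xs]     # monotonic but far from linear
print(round(spearman(xs, ys), 6))  # -> 1.0 (perfect monotonic relationship)
print(round(pearson(xs, ys), 3))   # noticeably below 1
```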

12
Q

What is partial correlation?

A

The correlation between two variables after removing the linear effects of one or more other variables.

13
Q

Why can partial correlation be more informative than raw correlation?

A

It helps isolate associations that are not explained by obvious third variables, though it still does not prove causality.
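
One standard construction, sketched with a hypothetical common cause Z: regress X and Y each on Z by least squares, then correlate the residuals. In this toy data, Z drives both variables, producing a positive raw correlation that disappears (and reverses) once Z is removed.

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

def residuals(ys, zs):
    # Residuals of a simple least-squares regression of ys on zs.
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    b = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) \
        / sum((z - mz) ** 2 for z in zs)
    a = my - b * mz
    return [y - (a + b * z) for y, z in zip(ys, zs)]

def partial_corr(xs, ys, zs):
    return pearson(residuals(xs, zs), residuals(ys, zs))

z = [0, 1, 2, 3]                         # hypothetical common cause
x = [zi + e for zi, e in zip(z, [1, -1, -1, 1])]
y = [zi + e for zi, e in zip(z, [-1, 1, 1, -1])]
print(round(pearson(x, y), 3))           # raw correlation is positive
print(round(partial_corr(x, y, z), 3))   # after removing Z it is negative
```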

14
Q

What is dependence in probability theory?

A

Any situation where the joint distribution of variables does not factor into the product of their marginals; knowing one gives information about the other.

15
Q

Why is independence stronger than zero correlation?

A

Independence implies zero correlation for variables with finite variance, but zero correlation does not rule out nonlinear dependence.

16
Q

What is mutual information (MI) at a high level?

A

A nonnegative measure of how much knowing one variable reduces uncertainty about another, capturing general dependence, not just linear.

17
Q

Why is mutual information useful in feature selection?

A

It can detect nonlinear and non-monotonic relationships between features and targets.
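
For discrete variables, MI can be estimated from empirical joint and marginal counts; a minimal sketch using base-2 logarithms (so the result is in bits), with made-up data:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    # Empirical MI in bits: sum over joint cells of p * log2(p / (px * py)).
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Y fully determined by X shares 1 bit; Y independent of X shares 0 bits.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # -> 1.0
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # -> 0.0
```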

18
Q

What is the danger of selecting features solely based on correlation with the label?

A

You may pick many redundant or spurious features and inflate false discoveries due to multiple testing.

19
Q

What is the difference between predictive and causal relationships?

A

Predictive relationships help forecast outcomes, while causal relationships describe how interventions on one variable would change another.

20
Q

Why can a non-causal feature still be useful in ML?

A

Even non-causal features can be strongly predictive if they are stable proxies or capture useful information about the outcome.

21
Q

When can using non-causal features be dangerous?

A

When underlying associations can change under new policies, environments, or behaviors, causing models to fail or behave unfairly.

22
Q

What is a causal effect at a high level?

A

The difference in an outcome that would occur under one intervention versus another, holding everything else constant in a conceptual experiment.

23
Q

What is the role of randomized experiments in discovering causal effects?

A

Random assignment breaks confounding, so differences in outcomes between groups can be interpreted as causal under reasonable assumptions.

24
Q

Why are randomized experiments not always feasible?

A

They can be expensive, unethical, or logistically impossible in some settings.

25
Q

What is an observational study?

A

An analysis of data where treatments or exposures are not randomized, so confounding must be addressed statistically.

26
Q

Why is causal inference from observational data more fragile than from experiments?

A

It relies on untestable assumptions about no unmeasured confounding and correct model specification.

27
Q

What is a directed acyclic graph (DAG) in causal modeling?

A

A graph where nodes are variables and directed edges represent assumed causal relationships, with no cycles.

28
Q

How can DAGs help reason about confounding and conditioning?

A

They make assumptions explicit, showing which variables to adjust for to block backdoor paths and avoid opening colliders.

29
Q

What is a backdoor path in a causal DAG?

A

A path from treatment to outcome that starts with an arrow into the treatment, typically representing confounding influences.

30
Q

What is the backdoor criterion (informally)?

A

A condition that identifies sets of variables to adjust for to block all backdoor paths and estimate causal effects.

31
Q

What is a collider in a DAG?

A

A node where two arrows meet head-to-head; conditioning on it can introduce spurious associations between its parents.

32
Q

Why is conditioning on colliders dangerous?

A

It can create dependence between variables that were previously independent, biasing causal estimates.

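
A tiny enumeration makes the danger concrete: two independent fair coins X and Y, and a collider C that is 1 when either coin is 1. Selecting only the C = 1 cases induces a negative association between X and Y out of nothing.

```python
import math
from itertools import product

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# All four (X, Y) outcomes of two fair coins, equally likely.
population = list(product([0, 1], repeat=2))
full_x, full_y = zip(*population)
print(pearson(full_x, full_y))            # -> 0.0: X and Y independent

# Condition on the collider C = 1 (C is 1 when X or Y is 1).
cond = [(x, y) for x, y in population if x or y]
cond_x, cond_y = zip(*cond)
print(round(pearson(cond_x, cond_y), 3))  # -> -0.5: spurious dependence
```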
33
Q

What is Simpson’s paradox at a high level?

A

A phenomenon where an association reverses direction when data are aggregated versus stratified by a third variable.

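
The often-cited kidney-stone treatment numbers (used here purely as an illustration) show the reversal: treatment A has the higher success rate within each stone-size stratum, yet B looks better in aggregate because stone size, a confounder, is unevenly distributed across treatments.

```python
# (successes, total) by stone-size stratum; illustrative numbers from
# the widely cited kidney-stone example.
small = {"A": (81, 87),   "B": (234, 270)}
large = {"A": (192, 263), "B": (55, 80)}

def rate(successes, total):
    return successes / total

# Treatment A wins within every stratum...
print(rate(*small["A"]) > rate(*small["B"]))  # True
print(rate(*large["A"]) > rate(*large["B"]))  # True

# ...yet B wins on the aggregated data.
agg_a = rate(81 + 192, 87 + 263)
agg_b = rate(234 + 55, 270 + 80)
print(agg_b > agg_a)                          # True
```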
34
Q

Why is Simpson’s paradox relevant to ML and analytics?

A

It illustrates how ignoring relevant stratification or confounders can lead to completely misleading conclusions from correlations.

35
Q

What is selection bias?

A

Bias introduced when the data we observe are not a random sample from the target population, often due to the selection mechanism.

36
Q

How can selection bias affect ML models?

A

Models can learn relationships that only hold in the selected sample and fail when deployed to a broader or different population.

37
Q

What is covariate shift in ML?

A

A change in the distribution of input features between training and deployment while the conditional distribution of output given input remains the same.

38
Q

How does covariate shift relate to correlation and dependence?

A

Correlations learned in the training distribution may change under covariate shift, affecting model performance.

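
A small sketch with made-up numbers: the conditional relationship y = x² never changes, but a misspecified linear model fit on inputs in [0, 1] falls apart when the inputs shift to [2, 3].

```python
# Covariate shift with a misspecified model: P(y|x) is fixed (y = x^2),
# only the input distribution moves between training and deployment.
def fit_line(xs, ys):
    # Simple least-squares line: returns (intercept, slope).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

train_x = [i / 10 for i in range(11)]        # inputs in [0, 1]
a, b = fit_line(train_x, [x * x for x in train_x])

def mse(xs):
    # Error of the linear fit against the true relation y = x^2.
    return sum((a + b * x - x * x) ** 2 for x in xs) / len(xs)

deploy_x = [2 + i / 10 for i in range(11)]   # shifted inputs in [2, 3]
# Deployment error is orders of magnitude larger than training error.
print(mse(train_x), mse(deploy_x))
```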
39
Q

Why should ML engineers care about causal structure even if they only build predictive models?

A

Understanding causal structure helps anticipate which patterns are stable under interventions or policy changes and which may break.

40
Q

What is an example of a stable causal feature vs an unstable proxy?

A

Underlying health status may be a stable cause of hospital visits, while insurance plan might be a proxy that changes with policy or market shifts.

41
Q

What is the high-level idea of invariant risk minimization (IRM) and related causal ML methods?

A

To learn predictors whose relationships to the outcome remain stable across different environments, approximating causal predictors.

42
Q

Why is overfitting to spurious correlations a central risk in ML?

A

Models may latch onto patterns that are strong in historical data but vanish under new conditions, causing sharp drops in performance.

43
Q

In one sentence, what should you remember about correlation, dependence, and causality for ML?

A

Correlation and dependence tell you what predicts well in current data, but only causal reasoning and experiments can tell you which relationships will remain stable under change.