ML: Dimensionality Reduction Flashcards

(27 cards)

1
Q

What is the goal of dimensionality reduction?

A

To reduce the number of features while retaining as much information (variance, structure, or signal) as possible. Benefits include:

Visualization

Noise reduction

Faster computation

Avoiding curse of dimensionality

2
Q

Difference between feature selection and feature extraction?

A

Selection: choose a subset of original features

Extraction: create new features (linear/nonlinear combinations) from original features

3
Q

What is the “curse of dimensionality”?

A

As dimensions increase, data becomes sparse, distance metrics lose meaning, and models overfit → DR mitigates this.

4
Q

Why is dimensionality reduction important for LLM embeddings?

A

Reduce 1536/4096-dim embeddings → faster retrieval and similarity search

Noise reduction → improves clustering & vector search ranking

Visualizing embedding space to debug retrieval or tool routing

5
Q

What is PCA?

A

PCA is a linear DR method that projects data onto orthogonal directions (principal components) that maximize variance.

6
Q

How is PCA computed?

A

Center data

Compute covariance matrix

Eigen decomposition of covariance matrix

Select top-k eigenvectors → project data

OR via SVD:

X = U Σ V^T
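
The steps above can be sketched in NumPy on synthetic data (illustrative only), showing that the eigendecomposition and SVD routes agree:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# 1) center the data
Xc = X - X.mean(axis=0)

# 2) covariance matrix (features x features)
C = np.cov(Xc, rowvar=False)

# 3) eigendecomposition (eigh since C is symmetric), sorted descending
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4) project onto the top-k eigenvectors
k = 2
Z = Xc @ eigvecs[:, :k]

# Same result via SVD of the centered data: X = U S V^T,
# columns of V are the principal components, S^2/(n-1) the eigenvalues
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z_svd = Xc @ Vt[:k].T
```

The two projections match up to the usual per-component sign ambiguity.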

7
Q

What do eigenvalues in PCA represent?

A

Variance explained by each principal component. Larger eigenvalue → more variance captured.

8
Q

How do you choose the number of components in PCA?

A

Cumulative explained variance (e.g., 90–95%)

Scree plot (elbow method)

Downstream task performance
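
A minimal sketch of the cumulative-explained-variance rule on synthetic data (thresholds and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic data whose variance is concentrated in a few directions
X = rng.normal(size=(300, 10)) * np.array([5, 4, 3, 1, 1, .5, .5, .2, .2, .1])
Xc = X - X.mean(axis=0)

# explained-variance ratio per component from the singular values
S = np.linalg.svd(Xc, compute_uv=False)
ratio = S**2 / (S**2).sum()
cum = np.cumsum(ratio)

# smallest k whose cumulative explained variance reaches 95%
k = int(np.searchsorted(cum, 0.95) + 1)
```

scikit-learn offers the same rule directly: `PCA(n_components=0.95)` keeps as many components as needed to reach 95% explained variance.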

9
Q

What preprocessing is required before PCA?

A

Centering (subtract mean)

Standardization (divide by std) if features have different scales

Optional whitening for decorrelated components
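
Why scaling matters can be seen on synthetic data (illustrative): without standardization, one large-scale feature hijacks the first component.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
X[:, 0] *= 100                    # one feature on a much larger scale

def first_pc(M):
    Mc = M - M.mean(axis=0)       # centering is always required
    return np.linalg.svd(Mc, full_matrices=False)[2][0]

# Unscaled: the large-scale feature alone dominates the first component
pc1_raw = first_pc(X)

# Standardized: subtract mean, divide by std, so features compete fairly
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
pc1_std = first_pc(Xs)
```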

10
Q

What are limitations of PCA?

A

Linear assumption → cannot capture nonlinear manifolds

Sensitive to outliers

Components may be hard to interpret

Variance ≠ predictive power

11
Q

When would you use Kernel PCA?

A

Data lies on a nonlinear manifold

Classical PCA fails to capture structure

Kernel trick implicitly maps data to a higher-dimensional feature space, then PCA is performed there
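
A classic illustration with scikit-learn (synthetic concentric circles; the `gamma` value is illustrative): linear PCA cannot separate the rings because class membership depends nonlinearly on the coordinates, while RBF-kernel PCA can.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric rings: the class depends on the radius, which is a
# nonlinear function of (x, y), so no linear projection separates them
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Linear PCA: first component is just a direction in the plane
Z_lin = PCA(n_components=1).fit_transform(X)

# RBF-kernel PCA: implicit nonlinear feature map, then PCA in that space
Z_rbf = KernelPCA(n_components=1, kernel="rbf", gamma=10.0).fit_transform(X)

def class_separation(Z):
    # gap between class means, in units of the overall spread
    return abs(Z[y == 0].mean() - Z[y == 1].mean()) / Z.std()
```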

12
Q

What are t-SNE and UMAP?

A

t-SNE: nonlinear DR for visualization, preserves local neighborhood structure

UMAP: faster nonlinear DR, preserves local + some global structure, scalable
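
A minimal t-SNE sketch with scikit-learn (synthetic clusters; `perplexity` and sizes are illustrative). UMAP lives in the separate `umap-learn` package with a similar `fit_transform` API.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

# three well-separated clusters in 50 dimensions
X, y = make_blobs(n_samples=150, n_features=50, centers=3, random_state=0)

# t-SNE: nonlinear, stochastic, intended for 2-D visualization only
Z = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# the clusters should remain visually separated in the 2-D map
sil = silhouette_score(Z, y)
```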

13
Q

What is a risk of using t-SNE for downstream ML tasks?

A

t-SNE is stochastic → inconsistent embeddings

Distances are not globally meaningful

Mainly for visualization, not feature extraction for classifiers

14
Q

How are autoencoders used for DR?

A

Neural networks learn bottleneck representation (compressed latent space)

Nonlinear DR → capture complex manifolds

Can reconstruct original features → minimize information loss
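
Real autoencoders are deep nonlinear networks (e.g. in PyTorch or Keras); as a dependency-free sketch of the bottleneck idea, here is a linear tied-weight autoencoder trained by gradient descent in NumPy (all sizes and the learning rate are illustrative; a linear AE recovers the same subspace PCA finds):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, :3] *= 5                     # three high-variance directions to recover
X = X - X.mean(axis=0)

k = 3                             # bottleneck (latent) dimension
W = rng.normal(scale=0.1, size=(10, k))   # encoder; decoder is W.T (tied)

lr = 1e-3
for _ in range(2000):
    Z = X @ W                     # encode: 10 dims -> k dims
    X_hat = Z @ W.T               # decode: reconstruct the input
    err = X_hat - X
    # gradient of the squared reconstruction loss w.r.t. the tied weights
    grad = 2 * (X.T @ err @ W + err.T @ X @ W) / len(X)
    W -= lr * grad

recon_err = np.mean((X - (X @ W) @ W.T) ** 2)
```

After training, reconstruction error drops to roughly the variance of the discarded low-variance directions.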

15
Q

What is whitening?

A

Scaling PCA components so they have unit variance → decorrelated features, used in some preprocessing pipelines.
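
A minimal PCA-whitening sketch in NumPy (synthetic correlated data): rotate onto the principal axes, then rescale each axis to unit variance.

```python
import numpy as np

rng = np.random.default_rng(3)
# correlated 2-D data (random linear mix of independent sources)
X = rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 2)).T
Xc = X - X.mean(axis=0)

# PCA whitening: project onto principal axes, divide by each axis' scale
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = (Xc @ Vt.T) * (np.sqrt(len(X) - 1) / S)

cov_Z = np.cov(Z, rowvar=False)   # ~ identity: decorrelated, unit variance
```

scikit-learn exposes the same option as `PCA(whiten=True)`.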

16
Q

Why is SVD useful for DR?

A

Decomposes data into orthogonal modes → captures variance

Can compute PCA efficiently via SVD

Handles rectangular matrices (more samples than features or vice versa)
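
Keeping only the top-k singular modes gives the best rank-k approximation (Eckart–Young). A sketch on a synthetic rectangular matrix (shapes and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
# approximately rank-3 rectangular matrix (100 x 40) plus small noise
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 40)) \
    + 0.01 * rng.normal(size=(100, 40))

# full SVD, then keep only the top-k modes
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
X_k = (U[:, :k] * S[:k]) @ Vt[:k]      # best rank-k approximation

rel_err = np.linalg.norm(X - X_k) / np.linalg.norm(X)
```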

17
Q

When should you prefer feature selection over extraction?

A

When interpretability is important

When computation is cheap and features are meaningful individually

18
Q

How do you evaluate PCA or DR performance?

A

Explained variance ratio

Downstream task accuracy

Reconstruction error (for autoencoders)

Visual inspection for clustering or separation

19
Q

When should you prefer feature extraction?

A

High-dimensional data (text, images, embeddings)

Reduce noise, redundancy

Downstream ML benefits from compressed representation

20
Q

How do you decide between linear PCA and nonlinear methods?

A

Start with PCA for interpretability and speed

Use t-SNE, UMAP, or autoencoders if linear PCA fails to capture important structure

21
Q

How is DR evaluated in retrieval/RAG pipelines?

A

Embedding compression → check retrieval recall@k

Clustering quality in latent space → silhouette score or kNN accuracy

Speed vs quality trade-off in vector search
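
A synthetic sanity check of this evaluation loop (recall@k with k=1; all sizes are illustrative): build noisy copies of corpus vectors as queries so the true match is known, then compare recall before and after PCA compression.

```python
import numpy as np

rng = np.random.default_rng(5)
corpus = rng.normal(size=(500, 64))                       # stand-in embeddings
queries = corpus[:50] + 0.1 * rng.normal(size=(50, 64))   # noisy copies: truth known

def nearest(Q, C):
    # cosine similarity -> index of the top-1 corpus vector per query
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
    return (Qn @ Cn.T).argmax(axis=1)

truth = np.arange(50)
recall_full = (nearest(queries, corpus) == truth).mean()  # recall@1, full dims

# compress 64 -> 16 dims with PCA fit on the corpus, then re-check recall
mu = corpus.mean(axis=0)
P = np.linalg.svd(corpus - mu, full_matrices=False)[2][:16].T
recall_16 = (nearest((queries - mu) @ P, (corpus - mu) @ P) == truth).mean()
```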

22
Q

Your linear regression on PCA components fails. Why?

A

PCA maximizes variance, not predictive power

Some important low-variance features may be lost

Consider supervised DR (PLS, LDA, or autoencoder with supervised loss)

23
Q

You reduce embeddings from 4096 → 128 dims via PCA. Recall@10 in retrieval drops slightly. What do you do?

A

Check explained variance → increase components

Try whitening/scaling

Consider nonlinear DR (autoencoder, UMAP)

Evaluate reconstruction or downstream metrics

24
Q

t-SNE shows clusters but nearest neighbors in original space don’t match. Is this a problem?

A

Not necessarily — t-SNE only approximately preserves local structure and is stochastic, so distances and neighbors in the map are not quantitatively reliable → fine for visualization, unsuitable for quantitative neighbor tasks.

25
Q

You’re building an LLM retrieval pipeline and embeddings are too high-dimensional for FAISS. What do you do?

A

Apply PCA/Truncated SVD for linear compression

Possibly use product quantization

Verify retrieval metrics (recall@k, MRR) after compression
26
Q

Autoencoder latent space produces poor downstream classification accuracy. What could be wrong?

A

Latent dimension too small → information loss

Nonlinear features not learned → adjust network depth or activation

Consider hybrid supervised + reconstruction loss