What is the goal of dimensionality reduction?
To reduce the number of features while retaining as much information (variance, structure, or signal) as possible. Benefits include:
Visualization
Noise reduction
Faster computation
Avoiding the curse of dimensionality
Difference between feature selection and feature extraction?
Selection: choose a subset of original features
Extraction: create new features (linear/nonlinear combinations) from original features
What is the “curse of dimensionality”?
As dimensions increase, data becomes sparse, distance metrics lose meaning, and models overfit → DR mitigates this.
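The sparsity effect can be shown numerically; a small sketch (sample sizes and dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
contrasts = {}
# As dimension grows, the gap between the nearest and farthest point
# shrinks relative to the nearest distance ("relative contrast").
for d in (2, 100, 10_000):
    X = rng.random((500, d))              # 500 uniform points in [0, 1]^d
    q = rng.random(d)                     # a query point
    dists = np.linalg.norm(X - q, axis=1)
    contrasts[d] = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:>6}  relative contrast={contrasts[d]:.3f}")
```

The contrast collapses toward 0 as d grows, which is why nearest-neighbor distances stop being informative in high dimensions.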
Why is dimensionality reduction important for LLM embeddings?
Reduce 1536/4096-dim embeddings → faster retrieval and similarity search
Noise reduction → improves clustering & vector search ranking
Visualizing embedding space to debug retrieval or tool routing
What is PCA?
PCA is a linear DR method that projects data onto orthogonal directions (principal components) that maximize variance.
How is PCA computed?
Center data
Compute covariance matrix
Eigen decomposition of covariance matrix
Select top-k eigenvectors → project data
OR via SVD:
X = U Σ V^T  (on the centered data; columns of V are the principal directions)
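The steps above can be sketched with NumPy (a minimal illustration, not a production implementation):

```python
import numpy as np

def pca_svd(X, k):
    """PCA of X (n_samples x n_features) via SVD, keeping the top-k components."""
    Xc = X - X.mean(axis=0)                             # 1. center
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)   # 2. X_c = U S V^T
    components = Vt[:k]                                 # rows = principal directions
    explained_var = S[:k] ** 2 / (len(X) - 1)           # eigenvalues of the covariance
    return Xc @ components.T, components, explained_var

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Z, comps, ev = pca_svd(X, k=2)
print(Z.shape)  # (200, 2)
```

The squared singular values divided by n−1 equal the covariance-matrix eigenvalues, so both routes give the same components.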
What do eigenvalues in PCA represent?
Variance explained by each principal component. Larger eigenvalue → more variance captured.
How do you choose the number of components in PCA?
Cumulative explained variance (e.g., 90–95%)
Scree plot (elbow method)
Downstream task performance
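A sketch of the cumulative-variance rule, assuming scikit-learn is available (the data here is synthetic):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20)) @ rng.normal(size=(20, 20))  # correlated features

pca = PCA().fit(X)
cumvar = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cumvar, 0.95) + 1)   # smallest k reaching 95% variance
print(k, round(cumvar[k - 1], 3))
```

The scree-plot elbow is the same information read visually from `pca.explained_variance_`.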
What preprocessing is required before PCA?
Centering (subtract mean)
Standardization (divide by std) if features have different scales
Optional whitening for decorrelated components
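A sketch of standardize-then-PCA, assuming scikit-learn (the feature scales are deliberately contrived):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two independent features on wildly different scales: without standardization,
# the large-scale feature would dominate the first principal component.
X = np.column_stack([rng.normal(0, 1, 500), rng.normal(0, 1000, 500)])

pipe = make_pipeline(StandardScaler(), PCA(n_components=2))
Z = pipe.fit_transform(X)
print(pipe.named_steps["pca"].explained_variance_ratio_)  # roughly balanced after scaling
```

Without the scaler, the first component would explain essentially all the variance purely because of units.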
What are limitations of PCA?
Linear assumption → cannot capture nonlinear manifolds
Sensitive to outliers
Components may be hard to interpret
Variance ≠ predictive power
When would you use Kernel PCA?
Data lies on a nonlinear manifold
Classical PCA fails to capture structure
Kernel trick maps to higher-dim space before PCA
What are t-SNE and UMAP?
t-SNE: nonlinear DR for visualization, preserves local neighborhood structure
UMAP: faster nonlinear DR, preserves local + some global structure, scalable
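A minimal t-SNE sketch on a digits subsample, assuming scikit-learn (UMAP lives in the third-party `umap-learn` package):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)   # 64-dimensional digit images
X = X[:500]                           # subsample to keep the run fast
Z = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(X)
print(Z.shape)  # (500, 2)
```

UMAP has a similar fit_transform API (`umap.UMAP(n_neighbors=15, min_dist=0.1)`); its `n_neighbors` parameter trades local against global structure.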
What is a risk of using t-SNE for downstream ML tasks?
t-SNE is stochastic → inconsistent embeddings
Distances are not globally meaningful
Mainly for visualization, not feature extraction for classifiers
How are autoencoders used for DR?
Neural networks learn bottleneck representation (compressed latent space)
Nonlinear DR → capture complex manifolds
Can reconstruct original features → minimize information loss
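A minimal linear-autoencoder sketch in plain NumPy (synthetic low-rank data; learning rate and iteration count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with intrinsic dimension 2 embedded in 10-D, plus noise.
Z_true = rng.normal(size=(500, 2)) * np.array([3.0, 1.0])
basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))   # orthonormal 10-D directions
X = Z_true @ basis.T + 0.05 * rng.normal(size=(500, 10))
X -= X.mean(axis=0)

# Encoder (10 -> 2) and decoder (2 -> 10), trained by full-batch
# gradient descent on reconstruction MSE.
W_enc = 0.1 * rng.normal(size=(10, 2))
W_dec = 0.1 * rng.normal(size=(2, 10))
lr = 0.02
losses = []
for _ in range(3000):
    Z = X @ W_enc                    # bottleneck (latent) codes
    err = Z @ W_dec - X              # reconstruction error
    losses.append(np.mean(err ** 2))
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(f"reconstruction MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

With purely linear layers the bottleneck recovers the PCA subspace; adding nonlinear activations (as in a real autoencoder) lets it capture curved manifolds.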
What is whitening?
Scaling PCA components so they have unit variance → decorrelated features, used in some preprocessing pipelines.
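A sketch of PCA whitening, assuming scikit-learn:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 3))   # correlated features

Z = PCA(whiten=True).fit_transform(X)
print(np.cov(Z.T).round(2))  # ≈ identity: decorrelated, unit variance
```

The cost of whitening is that relative variance information across components is discarded.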
Why is SVD useful for DR?
Decomposes data into orthogonal modes → captures variance
Can compute PCA efficiently via SVD
Handles rectangular matrices (more samples than features or vice versa)
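A truncated-SVD sketch in NumPy; by the Eckart–Young theorem the rank-k reconstruction is the best possible in Frobenius norm:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 30))            # rectangular matrix, no covariance needed

U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 10
A_k = U[:, :k] * S[:k] @ Vt[:k]           # best rank-k approximation

# Relative error is governed entirely by the discarded singular values.
rel_err = np.linalg.norm(A - A_k) / np.linalg.norm(A)
print(f"rank-{k} relative error: {rel_err:.3f}")
```

For large sparse matrices, randomized or truncated solvers (e.g. scikit-learn's TruncatedSVD) avoid computing the full decomposition.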
When would you prefer feature selection over extraction?
When interpretability is important
When computation is cheap and features are meaningful individually
How do you evaluate PCA or DR performance?
Explained variance ratio
Downstream task accuracy
Reconstruction error (for autoencoders)
Visual inspection for clustering or separation
When would you prefer feature extraction?
High-dimensional data (text, images, embeddings)
Reduce noise, redundancy
Downstream ML benefits from compressed representation
How do you decide between linear PCA and nonlinear methods?
Start with PCA for interpretability and speed
Use t-SNE, UMAP, or autoencoders if linear PCA fails to capture important structure
How is DR evaluated in retrieval/RAG pipelines?
Embedding compression → check retrieval recall@k
Clustering quality in latent space → silhouette score or kNN accuracy
Speed vs quality trade-off in vector search
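A sketch of measuring recall@k under PCA compression, using random stand-in embeddings (all sizes and the 256 → 32 reduction are arbitrary):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 256))                      # stand-in document embeddings
queries = docs[:50] + 0.1 * rng.normal(size=(50, 256))   # noisy queries

def topk(Q, D, k=10):
    # cosine similarity -> indices of the top-k documents per query
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)
    return np.argsort(-Qn @ Dn.T, axis=1)[:, :k]

pca = PCA(n_components=32).fit(docs)
full = topk(queries, docs)
comp = topk(pca.transform(queries), pca.transform(docs))

# recall@10: fraction of the full-space top-10 retained after compression
recall = np.mean([len(set(a) & set(b)) / 10 for a, b in zip(full, comp)])
print(f"recall@10 after 256 -> 32 dims: {recall:.2f}")
```

Treating the full-dimensional top-k as ground truth lets you sweep `n_components` against recall to find the speed/quality knee.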
Your linear regression on PCA components fails. Why?
PCA maximizes variance, not predictive power
Some important low-variance features may be lost
Consider supervised DR (PLS, LDA, or autoencoder with supervised loss)
You reduce embeddings from 4096 → 128 dims via PCA. Recall@10 in retrieval drops slightly. What do you do?
Check explained variance → increase components
Try whitening/scaling
Consider nonlinear DR (autoencoder, UMAP)
Evaluate reconstruction or downstream metrics
t-SNE shows clusters but nearest neighbors in original space don’t match. Is this a problem?
Not necessarily a problem: t-SNE preserves local structure only approximately, distorts distances, and is stochastic → it is not suitable for quantitative nearest-neighbor tasks, only for qualitative visualization.