What is the geometric interpretation of a matrix multiplying a vector?
A linear transformation that can scale, rotate, shear, reflect, or project the vector in space.
What determines whether a matrix is invertible?
It must be square.
Its determinant must be non-zero.
Its columns/rows must be linearly independent.
What does it mean for vectors to be linearly independent?
No vector can be written as a linear combination of the others.
What is the rank of a matrix, and why is it important in ML?
Rank = number of linearly independent columns.
In ML it affects:
Feature redundancy
Identifiability of solutions
Behavior of least-squares
Condition number (stability)
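A quick NumPy check of rank-revealing redundancy (matrix values are illustrative):

```python
import numpy as np

# A 3x3 matrix with a redundant feature: col3 = col1 + col2,
# so the rank is 2, not 3.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [2.0, 3.0, 5.0]])
print(np.linalg.matrix_rank(A))  # 2
```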
What is the null space of a matrix?
Set of all vectors x such that Ax = 0.
In ML, it identifies feature directions that have no effect on predictions.
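A minimal sketch of finding a null-space direction via the SVD (matrix chosen for illustration):

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])   # rank 2, so a 1-D null space in R^3

# Null-space basis = right singular vectors whose singular values are zero.
_, s, Vt = np.linalg.svd(A)
x = Vt[-1]                        # direction with singular value 0
print(np.allclose(A @ x, 0))      # True: A maps x to the zero vector
```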
What is the determinant intuitively?
A scaling factor for volume under the matrix transformation.
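For example, a diagonal scaling matrix multiplies areas by the product of its diagonal entries:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])        # scales x by 2 and y by 3
# The unit square maps to a 2x3 rectangle: area scales by |det A| = 6.
print(np.linalg.det(A))           # 6.0
```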
Why is linear algebra foundational to gradient-based optimization?
Gradients and Hessians are vectors/matrices; optimization depends on:
Vector calculus
Matrix multiplications
Eigenvalues of Hessian → curvature
Why is linear regression often solved using QR decomposition instead of the normal equation?
Because QR is numerically more stable;
forming AᵀA in the normal equation squares the condition number (κ(AᵀA) = κ(A)²), amplifying conditioning issues.
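A sketch comparing the two solvers on a random (well-conditioned) problem:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))
b = rng.normal(size=100)

# QR-based least squares: solve R x = Q^T b instead of (A^T A) x = A^T b.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# Normal-equation solution for comparison (squares the condition number).
x_ne = np.linalg.solve(A.T @ A, A.T @ b)
print(np.allclose(x_qr, x_ne))   # True here; they diverge as A becomes ill-conditioned
```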
What is the connection between PCA and the SVD?
PCA = SVD of centered data matrix:
X = U Σ Vᵀ
Principal components = columns of V.
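A minimal sketch of PCA via the SVD, cross-checked against the covariance eigendecomposition (random data for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Xc = X - X.mean(axis=0)                 # center the data first

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                          # principal directions (rows of V^T)
explained_var = S**2 / (len(X) - 1)      # variance along each component

# Cross-check against the eigendecomposition of the covariance matrix.
cov = Xc.T @ Xc / (len(X) - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # sorted descending
print(np.allclose(explained_var, eigvals))  # True
```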
Why do eigenvalues matter for training stability in deep learning?
Hessian eigenvalues indicate curvature:
Very large → exploding gradients
Very small → vanishing gradients
Spread → ill-conditioned optimization
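The conditioning point can be seen with gradient descent on a toy quadratic whose Hessian has a wide eigenvalue spread (values are illustrative):

```python
import numpy as np

# Gradient descent on f(x) = 0.5 x^T H x; stability requires lr < 2 / lambda_max.
H = np.diag([100.0, 1.0])        # ill-conditioned: eigenvalue spread 100:1
x = np.array([1.0, 1.0])
lr = 1.9 / 100.0                  # just under the stability threshold
for _ in range(1000):
    x = x - lr * (H @ x)
# Converges, but progress along the small-eigenvalue direction is very slow.
print(np.linalg.norm(x) < 1e-3)   # True
```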
In vectorized ML code, why is broadcasting important?
Allows computation across batches or dimensions without explicit loops → faster GPU/TPU execution.
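A minimal broadcasting example, adding a per-feature bias across a whole batch with no loop:

```python
import numpy as np

batch = np.ones((32, 128))   # batch of 32 vectors, feature dim 128
bias = np.arange(128.0)      # shape (128,)

# Broadcasting stretches `bias` across the batch dimension automatically.
out = batch + bias           # shape (32, 128), no explicit Python loop
print(out.shape)             # (32, 128)
```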
How does matrix multiplication underpin transformer attention?
Attention scores = QKᵀ/√d_k compute pairwise similarities.
Then apply a row-wise softmax and multiply by V.
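A minimal NumPy sketch of scaled dot-product attention (shapes are illustrative):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = attention(Q, K, V)
print(out.shape)  # (4, 8): one output vector per query token
```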
Why is the softmax attention matrix low-rank in many real tasks?
Tokens often lie in a lower-dimensional semantic subspace.
This low-rank structure motivates techniques like:
Low-rank attention approximations (e.g., Linformer)
Low-rank adapters (LoRA)
How does LoRA use linear algebra to reduce fine-tuning cost?
LoRA factorizes the weight update as a low-rank product:
ΔW = BA
where B (d × r) and A (r × d) are small matrices with rank r ≪ d.
Reduces trainable parameters by orders of magnitude.
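A quick parameter-count sketch (the layer width d and rank r below are illustrative, not from any specific model):

```python
import numpy as np

d, r = 512, 8                      # hypothetical layer width and LoRA rank
full_params = d * d                # parameters in a full d x d update
lora_params = 2 * d * r            # B is d x r, A is r x d

print(full_params // lora_params)  # 32x fewer trainable parameters

# The product reconstructs a d x d update of rank at most r:
B = np.random.randn(d, r)
A = np.random.randn(r, d)
print(np.linalg.matrix_rank(B @ A) <= r)  # True
```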
In agentic workflows, why is vector-space embedding similarity critical?
Agents must:
Retrieve memory
Plan next steps
Rank tools/skills
Route tasks between models
Similarity computations reduce to dot products (cosine similarity = dot product of unit-normalized vectors).
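A minimal cosine-similarity helper:

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity = dot product divided by the vector norms.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])
print(round(cosine_sim(a, b), 4))  # 0.7071 (45-degree angle)
```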
How does linear algebra support vector databases used in agent memory?
Vector search = nearest neighbors in high-dimensional space using:
L2 distance
Inner product
Approximate search using low-rank projections
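A brute-force inner-product search sketch, the exact baseline that approximate indexes speed up (data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 64))               # stored embeddings
db /= np.linalg.norm(db, axis=1, keepdims=True)  # unit-normalize rows

query = db[42] + 0.01 * rng.normal(size=64)    # noisy copy of entry 42

# Exact nearest neighbor by inner product (= cosine, since rows are unit norm).
scores = db @ query
print(int(np.argmax(scores)))                  # 42
```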
What is the Moore-Penrose pseudoinverse?
Generalized inverse for non-square or rank-deficient matrices.
Used for:
Linear regression
Solving Ax = b when no exact solution exists
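A minimal example of the pseudoinverse on an overdetermined system (values are illustrative):

```python
import numpy as np

# Overdetermined system: 3 equations, 2 unknowns, no exact solution.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

x = np.linalg.pinv(A) @ b   # least-squares solution via the pseudoinverse
# Matches the dedicated least-squares solver:
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```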
Explain how power iteration finds dominant eigenvalues.
Repeatedly apply A to a vector (renormalizing each step); it converges to the eigenvector of the largest-magnitude eigenvalue.
Useful in:
PageRank
Large-scale graph analysis
Real-time agent routing
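A minimal power-iteration sketch on a small symmetric matrix with known eigenvalues:

```python
import numpy as np

def power_iteration(A, iters=100):
    # Repeatedly apply A and renormalize; v aligns with the dominant eigenvector.
    v = np.ones(A.shape[0])
    for _ in range(iters):
        v = A @ v
        v /= np.linalg.norm(v)
    return v @ A @ v              # Rayleigh quotient = dominant eigenvalue

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # eigenvalues are 3 and 1
print(round(power_iteration(A), 6))  # 3.0
```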
Why do RNNs suffer from vanishing gradients in terms of linear algebra?
Repeated multiplication by weight matrix W:
If all eigenvalues of W have magnitude < 1 (spectral radius < 1), repeated products shrink exponentially → gradients vanish.
The network can't learn long-term dependencies because signals from early timesteps are lost as gradients vanish.
If the spectral radius is > 1 → gradients explode.
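A quick numerical sketch of the shrinking effect: repeated multiplication by a matrix with spectral radius 0.9 (stand-in for backprop through many timesteps; values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 10))
W = 0.9 * W / np.max(np.abs(np.linalg.eigvals(W)))  # rescale to spectral radius 0.9

g = np.ones(10)                   # stand-in for a gradient signal
for _ in range(200):              # 200 "timesteps" of backprop
    g = W.T @ g
print(np.linalg.norm(g))          # tiny: the signal has decayed exponentially
```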
What is the Kronecker product and where is it used?
Block matrix product: A ⊗ B replaces each entry aᵢⱼ of A with the block aᵢⱼ·B.
Used in:
Gaussian processes
Structured neural networks
Fourier features
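A minimal example, including the mixed-product identity that makes Kronecker structure cheap to work with in Gaussian processes:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.eye(2, dtype=int)

K = np.kron(A, B)   # each entry a_ij of A becomes the block a_ij * B
print(K.shape)      # (4, 4): shapes multiply, (2*2, 2*2)

# Mixed-product identity: (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
print(np.allclose(np.kron(A, B) @ np.kron(A, B), np.kron(A @ A, B @ B)))  # True
```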