ML: Linear Algebra Flashcards

(20 cards)

1
Q

What is the geometric interpretation of a matrix multiplying a vector?

A

A linear transformation that can scale, rotate, shear, reflect, or project the vector in space.
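A minimal NumPy sketch of one such transformation, a 90° rotation (values illustrative):

```python
import numpy as np

# A 90-degree rotation: one concrete linear transformation.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([1.0, 0.0])   # unit vector along the x-axis
w = R @ v                  # the matrix-vector product applies the map
print(w)                   # ~[0, 1]: the x-axis rotated onto the y-axis
```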

2
Q

What determines whether a matrix is invertible?

A

It must be square.

Its determinant must be non-zero.

Its columns/rows must be linearly independent.
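All three conditions can be checked in NumPy (the matrices are illustrative):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])         # independent columns
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])         # column 2 = 2 * column 1: dependent

det_A = np.linalg.det(A)           # 5.0, non-zero -> A is invertible
rank_B = np.linalg.matrix_rank(B)  # 1 < 2 -> B is singular
A_inv = np.linalg.inv(A)           # succeeds; np.linalg.inv(B) would raise LinAlgError
```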

3
Q

What does it mean for vectors to be linearly independent?

A

No vector can be written as a linear combination of the others.
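A quick NumPy check (illustrative vectors), using the fact that rank counts independent directions:

```python
import numpy as np

v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = v1 + 2.0 * v2                # a linear combination of v1 and v2

M = np.column_stack([v1, v2, v3])
print(np.linalg.matrix_rank(M))   # 2: the three vectors are dependent
```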

4
Q

What is the rank of a matrix, and why is it important in ML?

A

Rank = number of linearly independent columns (equivalently, rows).

In ML it affects:

Feature redundancy

Identifiability of solutions

Behavior of least-squares

Condition number (stability)
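A sketch of detecting feature redundancy via rank (synthetic data, illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))     # 3 independent features
redundant = X[:, :1] + X[:, 1:2]      # exact combination of features 0 and 1
X = np.hstack([X, redundant])         # now 4 columns, but no new information

r = np.linalg.matrix_rank(X)
print(r)                              # 3: one feature is redundant
```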

5
Q

What is the null space of a matrix?

A

Set of all vectors x such that Ax = 0.

In ML, it identifies feature directions that have no effect on predictions.
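A null-space basis can be computed from the SVD (illustrative rank-1 matrix):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])    # rank 1: row 2 = 2 * row 1

# Right singular vectors with (numerically) zero singular values span the null space.
U, s, Vt = np.linalg.svd(A)
rank = int((s > 1e-10).sum())
null_basis = Vt[rank:].T            # shape (3, 2): a 2-D null space

print(np.allclose(A @ null_basis, 0.0))   # True: A x = 0 for every such x
```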

6
Q

What is the determinant intuitively?

A

A scaling factor for volume under the matrix transformation.
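Sketch (illustrative 2×2 matrices):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])    # stretch x by 2 and y by 3
S = np.array([[1.0, 1.0],
              [0.0, 1.0]])    # shear: changes shape but not area

print(np.linalg.det(A))       # 6.0: the unit square maps to area 6
print(np.linalg.det(S))       # 1.0: shears preserve volume
```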

7
Q

Why is linear algebra foundational to gradient-based optimization?

A

Gradients and Hessians are vectors/matrices; optimization depends on:

Vector calculus

Matrix multiplications

Eigenvalues of the Hessian → curvature

8
Q

Why is linear regression often solved using QR decomposition instead of the normal equation?

A

Because QR is numerically more stable;
forming XᵀX squares the condition number (κ(XᵀX) = κ(X)²), amplifying rounding error.
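The two solvers can be compared directly (synthetic, well-conditioned data; the identity κ(XᵀX) = κ(X)² holds in the 2-norm):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = rng.standard_normal(50)

# Normal equations: solve (X^T X) beta = X^T y -- squares the condition number.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# QR: X = QR, then solve the triangular system R beta = Q^T y.
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

print(np.allclose(beta_normal, beta_qr))                 # agree here; QR wins when X is ill-conditioned
print(np.linalg.cond(X.T @ X) / np.linalg.cond(X) ** 2)  # ~1.0: conditioning squares
```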

9
Q

What is the connection between PCA and the SVD?

A

PCA = SVD of centered data matrix:

𝑋=π‘ˆΞ£π‘‰^𝑇

Principal components = columns of V.
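A PCA-via-SVD sketch (synthetic data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
Xc = X - X.mean(axis=0)              # PCA requires centering first

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                      # rows of Vt = columns of V = principal directions
scores = Xc @ Vt.T                   # coordinates of each sample in PC space

# Sanity checks: the SVD reconstructs Xc, and PC scores are uncorrelated.
print(np.allclose(Xc, (U * S) @ Vt))
print(np.allclose(scores.T @ scores, np.diag(S ** 2)))
```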

10
Q

Why do eigenvalues matter for training stability in deep learning?

A

Hessian eigenvalues indicate curvature:

Very large → exploding gradients

Very small → vanishing gradients

Wide spread → ill-conditioned optimization
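The curvature story in miniature: gradient descent on a quadratic whose Hessian has a wide eigenvalue spread (values illustrative):

```python
import numpy as np

# f(w) = 0.5 w^T H w, gradient = H w.  Eigenvalues 100 and 1: condition number 100.
H = np.diag([100.0, 1.0])
w = np.array([1.0, 1.0])

lr = 0.019                  # must stay below 2 / lambda_max = 0.02 or steps diverge
for _ in range(1000):
    w = w - lr * (H @ w)

print(np.linalg.norm(w))    # converged, but slowly along the lambda = 1 direction
```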

11
Q

In vectorized ML code, why is broadcasting important?

A

Allows computation across batches or dimensions without explicit loops → faster GPU/TPU execution.
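A broadcasting sketch, standardizing a feature batch with no Python loop:

```python
import numpy as np

X = np.arange(6.0).reshape(3, 2)   # 3 samples, 2 features
mu = X.mean(axis=0)                # shape (2,)
sigma = X.std(axis=0)              # shape (2,)

# (3, 2) op (2,): the (2,) arrays broadcast across all rows at once.
Z = (X - mu) / sigma
print(Z.mean(axis=0))              # ~[0, 0]
```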

12
Q

How does matrix multiplication underpin transformer attention?

A

Scores = QKᵀ / √d_k: a single matrix multiply computes all pairwise query–key similarities.

Apply softmax row-wise, then multiply by V to form weighted mixtures of the value vectors.
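A single-head sketch in NumPy (shapes illustrative; no masking or batching):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, d = 4, 8                                   # 4 tokens, head dimension 8
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

scores = Q @ K.T / np.sqrt(d)                 # all pairwise similarities at once
weights = softmax(scores, axis=-1)            # each row is a distribution over tokens
out = weights @ V                             # weighted mixture of value vectors

print(weights.sum(axis=-1))                   # ~[1, 1, 1, 1]
```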

13
Q

Why is the softmax attention matrix low-rank in many real tasks?

A

Tokens often lie in a lower-dimensional semantic subspace, so the attention matrix has rapidly decaying singular values.

This enables low-rank techniques such as:

Low-rank attention approximations (e.g., Linformer)

Low-rank adapters (LoRA)

(FlashAttention, by contrast, computes exact attention; its speedup comes from better memory access, not low rank.)
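The mechanism can be simulated: if token embeddings live in a k-dimensional subspace, the score matrix has rank at most k (synthetic, illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, k = 64, 64, 4
basis = rng.standard_normal((k, d))       # a 4-D subspace of R^64

Q = rng.standard_normal((T, k)) @ basis   # queries confined to the subspace
K = rng.standard_normal((T, k)) @ basis   # keys likewise

scores = Q @ K.T / np.sqrt(d)             # 64 x 64, but rank <= 4
print(np.linalg.matrix_rank(scores))
```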

14
Q

How does LoRA use linear algebra to reduce fine-tuning cost?

A

LoRA factorizes the weight update as a product of two low-rank matrices:

ΔW = BA

where B is d×r and A is r×k, with rank r ≪ min(d, k).

Reduces trainable parameters by orders of magnitude.

15
Q

In agentic workflows, why is vector-space embedding similarity critical?

A

Agents must:

Retrieve memory

Plan next steps

Rank tools/skills

Route tasks between models

Similarity computations reduce to dot products (cosine similarity = dot product of unit-normalized embeddings).
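A sketch of similarity-based ranking (toy embeddings):

```python
import numpy as np

def cosine_sim(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

query = np.array([1.0, 2.0, 0.0])
memories = np.array([[2.0, 4.0, 0.0],    # same direction as the query
                     [0.0, 0.0, 5.0]])   # orthogonal to it

sims = [cosine_sim(query, m) for m in memories]
best = int(np.argmax(sims))
print(sims, best)                        # [1.0, 0.0] 0
```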

16
Q

How does linear algebra support vector databases used in agent memory?

A

Vector search = nearest neighbors in high-dimensional space using:

L2 distance

Inner product

Approximate search using low-rank projections
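Exact search over unit-normalized embeddings is one matrix-vector product (synthetic data, illustrative scale):

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 64))              # stored embeddings
db /= np.linalg.norm(db, axis=1, keepdims=True)   # unit norm: inner product = cosine

query = db[42] + 0.01 * rng.standard_normal(64)   # slightly perturbed copy of item 42
query /= np.linalg.norm(query)

scores = db @ query                               # all 1000 similarities at once
nearest = int(np.argmax(scores))
print(nearest)
```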

17
Q

What is the Moore-Penrose pseudoinverse?

A

Generalized inverse for non-square or rank-deficient matrices.

Used for:

Linear regression

Solving Ax = b when no exact solution exists
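A sketch with an overdetermined system (illustrative numbers):

```python
import numpy as np

# 3 equations, 2 unknowns: no exact solution exists.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

x = np.linalg.pinv(A) @ b                     # least-squares solution
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)  # same answer from the LS solver
print(x)                                      # [1/3, 1/3]
```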

18
Q

Explain how power iteration finds dominant eigenvalues.

A

Repeatedly apply A to a vector and renormalize; the iterate converges to the dominant eigenvector (largest-magnitude eigenvalue), at a rate set by |λ₂/λ₁|.

Useful in:

PageRank

Large-scale graph analysis

Real-time agent routing
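A minimal power-iteration sketch on a small symmetric matrix, so the answer is easy to verify:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])          # eigenvalues (5 +/- sqrt(5)) / 2

v = np.array([1.0, 0.0])
for _ in range(100):
    v = A @ v
    v /= np.linalg.norm(v)          # renormalize to avoid overflow

eig_est = v @ A @ v                 # Rayleigh quotient ~ dominant eigenvalue
print(eig_est)                      # ~3.618
```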

19
Q

Why do RNNs suffer from vanishing gradients in terms of linear algebra?

A

Backpropagation through time repeatedly multiplies gradients by the recurrent weight matrix W (via Wᵀ at each step).

If all eigenvalues of W have magnitude < 1 (spectral radius < 1), gradients shrink exponentially with sequence length → vanishing gradients, so the network cannot learn long-term dependencies.

If the spectral radius is > 1 → gradients explode.
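The exponential shrinkage can be demonstrated directly (random W with its spectral radius forced to 0.5):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((10, 10))
W *= 0.5 / np.max(np.abs(np.linalg.eigvals(W)))   # rescale: spectral radius = 0.5

g = np.ones(10)                  # stand-in for a backpropagated gradient
for _ in range(50):              # 50 steps of backprop through time
    g = W.T @ g

print(np.linalg.norm(g))         # on the order of 0.5^50: effectively zero
```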

20
Q

What is the Kronecker product and where is it used?

A

A ⊗ B replaces each entry aᵢⱼ of A with the block aᵢⱼB, giving an mp × nq block matrix.

Used in:

Gaussian processes

Structured neural networks

Fourier features
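A sketch of the definition (small matrices):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.eye(2)

K = np.kron(A, B)                 # (4, 4): each a_ij becomes the block a_ij * B
print(K.shape)

print(np.allclose(K[:2, :2], A[0, 0] * B))   # True: top-left block is a_11 * B
```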