Chapter 4.4: Regularisation – Advanced analysis of Lasso Flashcards

(28 cards)

1
Q

What is the Lasso optimisation problem?

A

The constant term can be dropped to make the algebra easier.
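A standard formulation (normalisation conventions, e.g. the 1/(2n) factor, vary between courses):

```latex
\hat{\beta}^{L}_{\lambda} \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p}}
\left\{ \frac{1}{2n}\,\lVert Y - X\beta \rVert_{2}^{2} + \lambda \lVert \beta \rVert_{1} \right\}
```

Dropping the constant term (the part of ||Y||₂² not involving β) leaves the minimiser unchanged.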

2
Q

What is the prediction error of the Lasso solution?

A
3
Q

What is Theorem 2: the Slow Rate of the prediction error of the Lasso solution?

A
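A common statement of the slow rate (the constant C and the exact choice of λ vary; this sketch assumes a sub-Gaussian noise setting): with λ of order σ√(log p / n), with high probability

```latex
\frac{1}{n}\,\lVert X(\hat{\beta}^{L}_{\lambda} - \beta^{0}) \rVert_{2}^{2}
\;\le\; C \, \lambda \, \lVert \beta^{0} \rVert_{1} .
```

The bound scales with ||β⁰||₁ and only as √(log p / n), hence "slow".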
4
Q

What is the proof of Theorem 2?

A

CHECK: why does the proof used here not contain the Ȳ part of the proof?

5
Q

What can we note about the slow rate of convergence of the Lasso?

A
6
Q

What is some useful notation we will use for column and component extraction operations?

A
7
Q

How do we denote β* as a sparse vector using signal coordinates?

What can we do once we have set up these sets, in this somewhat artificial situation?

How does this relate to Lasso, and what do we need to do this?

A
8
Q

What is Definition 2: The compatibility condition?

A

Defined as an infimum over all δ ∈ ℝᵖ such that δS is not equal to the zero vector and the ℓ1 norm of the noise coordinates is not too big compared to the ℓ1 norm of the signal coordinates.

numerator:
* Xδ → average magnitude over all the n coordinates

denominator:
* δS → looking at the signal part of δ
* as |S| is the size of S, the denominator is roughly the average over the S coordinates

  • So, roughly speaking, if your vector is not too concentrated on the noise coordinates, it has at least some significant component in the support set (signal coordinates)
  • Then, once you apply the design matrix transformation, the average magnitude will not shrink compared to the average magnitude over the signal set
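One common way to write the condition (the constant 3 and the normalisation differ between treatments; check against the course notes): the compatibility constant φ²(S) is

```latex
\phi^{2}(S) \;=\; \inf \left\{
\frac{ \lvert S \rvert \, \lVert X\delta \rVert_{2}^{2} / n }{ \lVert \delta_{S} \rVert_{1}^{2} }
\;:\; \delta_{S} \neq 0,\; \lVert \delta_{S^{c}} \rVert_{1} \le 3 \lVert \delta_{S} \rVert_{1}
\right\},
```

and the compatibility condition asks that φ²(S) > 0.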
8
Q

If XᵀX/n has a minimum eigenvalue c_min > 0, what do we know about the compatibility condition? How do we verify this?

A

GO THROUGH THIS

The argument is around the minimum eigenvector, with c_min being the answer.

12 Feb, 30 mins
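A sketch of the standard verification (assuming φ² is normalised as in the compatibility definition): by Cauchy–Schwarz and then the eigenvalue bound,

```latex
\lVert \delta_{S} \rVert_{1}^{2} \le \lvert S \rvert \, \lVert \delta_{S} \rVert_{2}^{2}
\le \lvert S \rvert \, \lVert \delta \rVert_{2}^{2},
\qquad
\frac{1}{n}\lVert X\delta \rVert_{2}^{2}
= \delta^{\top}\frac{X^{\top}X}{n}\delta
\ge c_{\min}\lVert \delta \rVert_{2}^{2},
```

so the ratio defining φ²(S) is at least c_min, i.e. the compatibility condition holds with φ² ≥ c_min.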

9
Q

How does this change in high dimensions?

A
10
Q

What is Theorem 3, the fast rate of convergence of the Lasso?

A
  • The first equation is the prediction error of Lasso
  • The second equation is the estimation error of Lasso
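A common form of the two bounds (constants vary; this assumes the compatibility condition holds with constant φ for the support S of β⁰):

```latex
\frac{1}{n}\lVert X(\hat{\beta} - \beta^{0}) \rVert_{2}^{2}
\;\lesssim\; \frac{\lambda^{2} \lvert S \rvert}{\phi^{2}},
\qquad
\lVert \hat{\beta} - \beta^{0} \rVert_{1}
\;\lesssim\; \frac{\lambda \lvert S \rvert}{\phi^{2}}.
```

With λ ≍ √(log p / n) the prediction error is of order |S| log p / n, hence "fast" compared with the slow rate.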
11
Q

What is the proof of Theorem 3 (fast rate) of convergence of Lasso?

A
12
Q

How do we know that the compatibility condition is a property of the design matrix?

A
13
Q

Why does Lasso not have a closed-form solution, unlike ridge regression?

A
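A minimal numerical sketch of the contrast (the variable names, data, and λ normalisation here are illustrative assumptions, not from the notes): the ridge objective is differentiable, so setting its gradient to zero yields a closed-form solution; the ℓ1 penalty is non-differentiable at 0, so no analogous formula exists for Lasso and solvers iterate instead.

```python
import numpy as np

# Hypothetical small data set, for illustration only.
rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))
Y = X @ np.array([1.0, -2.0, 0.0, 0.0, 3.0]) + 0.1 * rng.standard_normal(n)
lam = 0.5

# Ridge: objective (1/2)||Y - X b||^2 + (n*lam/2)||b||^2 is smooth,
# so the gradient condition gives the closed form (X^T X + n*lam*I)^{-1} X^T Y.
beta_ridge = np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ Y)

# The ridge gradient vanishes at the closed-form minimiser.
grad = -X.T @ (Y - X @ beta_ridge) + n * lam * beta_ridge

# Lasso: the l1 term |b|_1 is non-differentiable at 0, so no such formula
# exists; one instead uses subgradients/KKT and iterative solvers
# (e.g. coordinate descent with soft-thresholding).
```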
14
Q

What is the definition of a subgradient of a convex function?

What do we call the set of subgradients of f at x?

A
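The standard definition: g ∈ ℝᵈ is a subgradient of a convex f: ℝᵈ → ℝ at x if

```latex
f(y) \;\ge\; f(x) + g^{\top}(y - x) \quad \text{for all } y,
```

and the set of all subgradients of f at x is called the subdifferential, written ∂f(x).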
15
Q

What is Proposition 3 regarding the value of ∂f(x), given f is convex and differentiable at x ∈ int(C)?
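The standard statement (notation may differ slightly from the course): if f is convex and differentiable at x ∈ int(C), the subdifferential collapses to the gradient,

```latex
\partial f(x) = \{ \nabla f(x) \}.
```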

16
Q

What is the subgradient of:

f: x ↦ |x|
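The standard answer:

```latex
\partial f(x) =
\begin{cases}
\{\operatorname{sign}(x)\} & x \neq 0, \\
[-1, 1] & x = 0.
\end{cases}
```

At 0, any slope between −1 and 1 gives a line lying below |x|, which is why the subdifferential is a whole interval there.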

17
Q

Proposition 4: Let f1, f2: ℝᵈ → ℝ be convex.

a) For any α > 0, what is ∂(αf1)(x)?

b) What is ∂(f1 + f2)(x)?

c) If h(x) = f1(Ax + b), what is ∂h(x)?
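The standard subgradient-calculus rules (stated here for finite convex functions on ℝᵈ, where the sum rule holds with equality):

```latex
\partial(\alpha f_1)(x) = \alpha\,\partial f_1(x),
\qquad
\partial(f_1 + f_2)(x) = \partial f_1(x) + \partial f_2(x),
\qquad
\partial h(x) = A^{\top} \partial f_1(Ax + b).
```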

18
Q

Proposition 5: What is the (simplified) KKT condition?

A
  • Allows us to characterise the minimiser of an objective function which is convex but not necessarily differentiable
  • Compare: for a differentiable function, the minimiser is found by setting the gradient to 0
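The condition itself, in its standard form: for convex f,

```latex
x^{*} \in \operatorname*{arg\,min}_{x} f(x)
\quad \Longleftrightarrow \quad
0 \in \partial f(x^{*}).
```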
19
Q

What is the proof of Proposition 5, the (simplified) KKT condition?

20
Q

What is Proposition 6, relating to the subdifferential of the ℓ1-norm ||·||1?
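The standard statement:

```latex
\partial \lVert \cdot \rVert_{1}(\beta)
= \left\{ v \in \mathbb{R}^{p} :
v_j = \operatorname{sign}(\beta_j) \text{ if } \beta_j \neq 0,\;
v_j \in [-1, 1] \text{ if } \beta_j = 0 \right\}.
```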

21
Q

What is the proof of Proposition 6?

A

12(2) Feb, 20 mins → go through the proof

22
Q

What is the theorem that fully characterises Lasso solutions?
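A standard form of this characterisation, obtained by applying the KKT condition (Proposition 5) and the ℓ1 subdifferential (Proposition 6) to the Lasso objective (the normalisation of λ varies): β̂ is a Lasso solution if and only if there exists ν̂ ∈ ∂||·||1(β̂) with

```latex
\frac{1}{n} X^{\top}\bigl(Y - X\hat{\beta}\bigr) = \lambda \hat{\nu}.
```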

23
Q

What is the proof of Theorem 4, the full characterisation of Lasso solutions?

A

EXPAND AND FILL IN THIS PROOF

24
Q

What is Corollary 3: Lasso solution under orthogonal design?
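A small numerical sketch of the corollary (assuming the objective (1/(2n))||Y − Xβ||² + λ||β||1; under orthogonal design XᵀX = nI the solution decouples into coordinate-wise soft-thresholding). The data here are illustrative, not from the notes:

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0), applied coordinate-wise."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Hypothetical orthogonal design: columns orthogonal with squared norm n,
# so X^T X = n * I.
n = 4
X = np.sqrt(n) * np.eye(n)
beta_star = np.array([2.0, -0.5, 0.0, 1.0])
Y = X @ beta_star            # noiseless, for illustration
lam = 0.8

# Under orthogonal design the Lasso solution is closed-form:
# beta_hat_j = soft_threshold((X^T Y / n)_j, lam).
z = X.T @ Y / n              # here z equals beta_star exactly
beta_hat = soft_threshold(z, lam)
# Small coefficients (|z_j| <= lam) are set exactly to zero;
# large ones are shrunk towards zero by lam.
```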
25
Q

What is Proposition 7 regarding the uniqueness of the Lasso solution?

26
Q

What are the variable selection properties of Lasso? What theorem relates to this, and how can you interpret it?

A

Irrepresentable condition: ||∆||∞ ≤ 1

27
Q

What is the partial converse of the Variable Selection Theorem of Lasso?