Chapter 4.4: Regularisation – Advanced analysis of Lasso Flashcards

(28 cards)

1
Q

What is the Lasso optimisation problem?

A

The constant term can be dropped to make the algebra easier.
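A standard formulation (normalisation conventions, e.g. the 1/(2n) factor, vary between courses):

```latex
\hat{\beta}^{L}_{\lambda} \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p}}
\left\{ \frac{1}{2n}\,\lVert Y - X\beta \rVert_{2}^{2} + \lambda \lVert \beta \rVert_{1} \right\}
```

Dropping the constant term (the part of ||Y||₂² not involving β) leaves the minimiser unchanged.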

2
Q

What is the prediction error of the Lasso solution?

A
3
Q

What is Theorem 2: the Slow Rate of the prediction error of the Lasso solution?

A
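A common statement of the slow rate (the constant C and the exact choice of λ vary; this sketch assumes a sub-Gaussian noise setting): with λ of order σ√(log p / n), with high probability

```latex
\frac{1}{n}\,\lVert X(\hat{\beta}^{L}_{\lambda} - \beta^{0}) \rVert_{2}^{2}
\;\le\; C \, \lambda \, \lVert \beta^{0} \rVert_{1} .
```

The bound scales with ||β⁰||₁ and only as √(log p / n), hence "slow".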
4
Q

What is the proof of Theorem 2?

A

CHECK: why does the proof used here not contain the Ȳ part of the proof?

5
Q

What can we note about the slow rate of convergence of the Lasso?

A
6
Q

What is some useful notation we will use for column and component extraction operations?

A
7
Q

How do we denote β* as a sparse vector using signal coordinates?

What can we do once we have set up these sets, in this somewhat artificial situation?

How does this relate to Lasso, and what do we need to do this?

A
8
Q

What is Definition 2: The compatibility condition?

A

Defined as an infimum over all δ ∈ ℝᵖ such that δS is not equal to the zero vector and the ℓ1 norm of the noise coordinates is not too big compared to the ℓ1 norm of the signal coordinates.

numerator:
* Xδ → average magnitude over all the n coordinates

denominator:
* δS → looking at the signal part of δ
* as |S| is the size of S, the denominator is roughly the average over the S coordinates

  • So, roughly speaking, if your vector is not too concentrated on the noise coordinates, it has at least some significant component in the support set (signal coordinates)
  • Then, once you apply the design matrix transformation, the average magnitude will not shrink compared to the average magnitude over the signal set
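One common way to write the condition (the constant 3 and the normalisation differ between treatments; check against the course notes): the compatibility constant φ²(S) is

```latex
\phi^{2}(S) \;=\; \inf \left\{
\frac{ \lvert S \rvert \, \lVert X\delta \rVert_{2}^{2} / n }{ \lVert \delta_{S} \rVert_{1}^{2} }
\;:\; \delta_{S} \neq 0,\; \lVert \delta_{S^{c}} \rVert_{1} \le 3 \lVert \delta_{S} \rVert_{1}
\right\},
```

and the compatibility condition asks that φ²(S) > 0.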
8
Q

If XᵀX/n has a minimum eigenvalue c_min > 0, what do we know about the compatibility condition? How do we verify this?

A

GO THROUGH THIS

The argument is around the minimum eigenvector, with c_min being the answer.

12 Feb, 30 mins
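A sketch of the standard verification (assuming φ² is normalised as in the compatibility definition): by Cauchy–Schwarz and then the eigenvalue bound,

```latex
\lVert \delta_{S} \rVert_{1}^{2} \le \lvert S \rvert \, \lVert \delta_{S} \rVert_{2}^{2}
\le \lvert S \rvert \, \lVert \delta \rVert_{2}^{2},
\qquad
\frac{1}{n}\lVert X\delta \rVert_{2}^{2}
= \delta^{\top}\frac{X^{\top}X}{n}\delta
\ge c_{\min}\lVert \delta \rVert_{2}^{2},
```

so the ratio defining φ²(S) is at least c_min, i.e. the compatibility condition holds with φ² ≥ c_min.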

9
Q

How does this change in high dimensions?

A
10
Q

What is Theorem 3, the fast rate of convergence of the Lasso?

A
  • The first equation is the prediction error of Lasso
  • The second equation is the estimation error of Lasso
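A common form of the two bounds (constants vary; this assumes the compatibility condition holds with constant φ for the support S of β⁰):

```latex
\frac{1}{n}\lVert X(\hat{\beta} - \beta^{0}) \rVert_{2}^{2}
\;\lesssim\; \frac{\lambda^{2} \lvert S \rvert}{\phi^{2}},
\qquad
\lVert \hat{\beta} - \beta^{0} \rVert_{1}
\;\lesssim\; \frac{\lambda \lvert S \rvert}{\phi^{2}}.
```

With λ ≍ √(log p / n) the prediction error is of order |S| log p / n, hence "fast" compared with the slow rate.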
11
Q

What is the proof of Theorem 3 (fast rate) of convergence of Lasso?

A
12
Q

How do we know that the compatibility condition is a property of the design matrix?

A
13
Q

Why does Lasso not have a closed-form solution, unlike ridge regression?

A
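A minimal numerical sketch of the contrast (the variable names, data, and λ normalisation here are illustrative assumptions, not from the notes): the ridge objective is differentiable, so setting its gradient to zero yields a closed-form solution; the ℓ1 penalty is non-differentiable at 0, so no analogous formula exists for Lasso and solvers iterate instead.

```python
import numpy as np

# Hypothetical small data set, for illustration only.
rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))
Y = X @ np.array([1.0, -2.0, 0.0, 0.0, 3.0]) + 0.1 * rng.standard_normal(n)
lam = 0.5

# Ridge: objective (1/2)||Y - X b||^2 + (n*lam/2)||b||^2 is smooth,
# so the gradient condition gives the closed form (X^T X + n*lam*I)^{-1} X^T Y.
beta_ridge = np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ Y)

# The ridge gradient vanishes at the closed-form minimiser.
grad = -X.T @ (Y - X @ beta_ridge) + n * lam * beta_ridge

# Lasso: the l1 term |b|_1 is non-differentiable at 0, so no such formula
# exists; one instead uses subgradients/KKT and iterative solvers
# (e.g. coordinate descent with soft-thresholding).
```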
14
Q

What is the definition of a subgradient of a convex function?

What do we call the set of subgradients of f at x?

A
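The standard definition: g ∈ ℝᵈ is a subgradient of a convex f: ℝᵈ → ℝ at x if

```latex
f(y) \;\ge\; f(x) + g^{\top}(y - x) \quad \text{for all } y,
```

and the set of all subgradients of f at x is called the subdifferential, written ∂f(x).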
15
Q

What is Proposition 3 regarding the value of ∂f(x), given f is convex and differentiable at x ∈ int(C)?
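The standard statement (notation may differ slightly from the course): if f is convex and differentiable at x ∈ int(C), the subdifferential collapses to the gradient,

```latex
\partial f(x) = \{ \nabla f(x) \}.
```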

16
Q

What is the subgradient of:

f: x ↦ |x|
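The standard answer:

```latex
\partial f(x) =
\begin{cases}
\{\operatorname{sign}(x)\} & x \neq 0, \\
[-1, 1] & x = 0.
\end{cases}
```

At 0, any slope between −1 and 1 gives a line lying below |x|, which is why the subdifferential is a whole interval there.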

17
Q

Proposition 4: Let f1, f2: ℝᵈ → ℝ be convex.

a) For any α > 0, what is ∂(αf1)(x)?

b) What is ∂(f1 + f2)(x)?

c) If h(x) = f1(Ax + b), what is ∂h(x)?
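The standard subgradient-calculus rules (stated here for finite convex functions on ℝᵈ, where the sum rule holds with equality):

```latex
\partial(\alpha f_1)(x) = \alpha\,\partial f_1(x),
\qquad
\partial(f_1 + f_2)(x) = \partial f_1(x) + \partial f_2(x),
\qquad
\partial h(x) = A^{\top} \partial f_1(Ax + b).
```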

18
Q

Proposition 5: What is the (simplified) KKT condition?

A
  • Allows us to characterise the minimiser of an objective function which is convex but not necessarily differentiable
  • Compare: for a differentiable function, the minimiser is found by setting the gradient to 0
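The condition itself, in its standard form: for convex f,

```latex
x^{*} \in \operatorname*{arg\,min}_{x} f(x)
\quad \Longleftrightarrow \quad
0 \in \partial f(x^{*}).
```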
19
Q

What is the proof of Proposition 5, the (simplified) KKT condition?

20
Q

What is Proposition 6, relating to the subdifferential of the ℓ1-norm ||·||1?
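The standard statement:

```latex
\partial \lVert \cdot \rVert_{1}(\beta)
= \left\{ v \in \mathbb{R}^{p} :
v_j = \operatorname{sign}(\beta_j) \text{ if } \beta_j \neq 0,\;
v_j \in [-1, 1] \text{ if } \beta_j = 0 \right\}.
```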

21
Q

What is the proof of Proposition 6?

A

12(2) Feb, 20 mins → go through the proof

22
Q

What is the theorem that fully characterises Lasso solutions?
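A standard form of this characterisation, obtained by applying the KKT condition (Proposition 5) and the ℓ1 subdifferential (Proposition 6) to the Lasso objective (the normalisation of λ varies): β̂ is a Lasso solution if and only if there exists ν̂ ∈ ∂||·||1(β̂) with

```latex
\frac{1}{n} X^{\top}\bigl(Y - X\hat{\beta}\bigr) = \lambda \hat{\nu}.
```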

23
Q

What is the proof of Theorem 4, the full characterisation of Lasso solutions?

A

EXPAND AND FILL IN THIS PROOF

24
Q

What is Corollary 3: Lasso solution under orthogonal design?
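A small numerical sketch of the corollary (assuming the objective (1/(2n))||Y − Xβ||² + λ||β||1; under orthogonal design XᵀX = nI the solution decouples into coordinate-wise soft-thresholding). The data here are illustrative, not from the notes:

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0), applied coordinate-wise."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Hypothetical orthogonal design: columns orthogonal with squared norm n,
# so X^T X = n * I.
n = 4
X = np.sqrt(n) * np.eye(n)
beta_star = np.array([2.0, -0.5, 0.0, 1.0])
Y = X @ beta_star            # noiseless, for illustration
lam = 0.8

# Under orthogonal design the Lasso solution is closed-form:
# beta_hat_j = soft_threshold((X^T Y / n)_j, lam).
z = X.T @ Y / n              # here z equals beta_star exactly
beta_hat = soft_threshold(z, lam)
# Small coefficients (|z_j| <= lam) are set exactly to zero;
# large ones are shrunk towards zero by lam.
```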
25
Q

What is Proposition 7 regarding the uniqueness of the Lasso solution?

26
Q

What are the variable selection properties of Lasso? What theorem relates to this, and how can you interpret it?

A

Irrepresentable condition: ||∆||∞ ≤ 1

27
Q

What is the partial converse of the Variable Selection Theorem of Lasso?