Perceptron Learning Algorithm P2 Flashcards

(9 cards)

1
Q

How do you find the best set of weights (W’)?

A

Define the cost function J(W’)
Apply the gradient descent algorithm (iterative algorithm that finds the minimum of the cost function)
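As a concrete sketch (pure Python, with illustrative toy data and a ±1 label convention not taken from the cards), the two steps — define J(W’), then run gradient descent on it — look like this:

```python
def perceptron_gd(X, y, lr=0.1, epochs=100):
    # Minimise J(w) = sum over misclassified samples of -y_i * (w . x_i)
    # by full-batch gradient descent. Labels are +1/-1 and each sample
    # already carries a constant 1.0 as its last (bias) feature.
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        mis = [i for i in range(len(X)) if y[i] * dot(w, X[i]) <= 0]
        if not mis:
            break  # J(w) = 0: every sample is classified correctly
        # Gradient of J: sum of -y_i * x_i over the misclassified samples
        grad = [-sum(y[i] * X[i][j] for i in mis) for j in range(len(w))]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Toy linearly separable data (last feature is the bias input)
X = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [-1.0, -1.0, 1.0], [-2.0, -0.5, 1.0]]
y = [1, 1, -1, -1]
w = perceptron_gd(X, y)
```

After training, every sample should satisfy y_i * (w . x_i) > 0, i.e. lie on the correct side of the decision boundary.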

2
Q

Define the cost function J(W’)

A

It is the sum of the losses L(W’) over the misclassified samples
If a sample is misclassified as positive but is actually negative, its term is multiplied by -1 before being added to the sum, so every contribution is positive
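Numerically, that definition can be sketched as follows (the score w’ᵀx and the ±1 label convention are assumed here, not taken from the cards):

```python
def cost_J(w, X, y):
    # J(w) = sum of the losses over misclassified samples; each term
    # -y_i * (w . x_i) is positive exactly when sample i is misclassified.
    J = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi))
        if yi * score <= 0:       # misclassified: wrong side of the boundary
            J += -yi * score      # the sign flip makes the term non-negative
    return J

# First sample is misclassified (score +2, true label -1) and adds 2.0;
# second is correct and adds nothing.
print(cost_J([1.0, 0.0], [[2.0, 0.0], [-3.0, 1.0]], [-1, -1]))  # → 2.0
```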

3
Q

Is J(W’) always positive or negative?

A

Positive, because each loss L(W’) is itself positive: if the term for a misclassified sample would be negative, it is multiplied by -1 before being added

4
Q

Define L(W’)

A

This function measures the distance between a misclassified sample and the decision boundary

The first factor on the RHS is either +1 or -1, depending on the true class of the misclassified sample, which makes the loss positive
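In the usual notation (an assumption; the lecture may use different symbols), for a misclassified sample x_i with true label y_i ∈ {+1, -1} this is:

```latex
L(\mathbf{w}') = -\,y_i\,(\mathbf{w}'^{\top}\mathbf{x}_i)
```

Because x_i is misclassified, w’ᵀx_i has the opposite sign to y_i, so L(W’) > 0; dividing by ‖w’‖ would give the actual geometric distance to the boundary.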

5
Q

Describe gradient descent to minimise the cost function

A

The gradient of the cost function, dJ/dw1, gives the slope of J at any point w1

If the gradient is positive, we move left, towards smaller w1

If the gradient is negative, we move right, towards larger w1

In both cases we move downhill, towards the minimum of the cost function
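Both cases collapse into the single update w1 ← w1 − lr · dJ/dw1, which a one-dimensional sketch makes concrete (the toy cost and its gradient below are illustrative only):

```python
def gradient_descent_1d(dJ, w1, lr=0.1, steps=200):
    # Positive slope -> step left, negative slope -> step right:
    # both are the one update w1 <- w1 - lr * dJ/dw1.
    for _ in range(steps):
        w1 -= lr * dJ(w1)
    return w1

# Toy cost J(w1) = (w1 - 3)^2 with gradient dJ/dw1 = 2*(w1 - 3);
# its minimum is at w1 = 3.
w1 = gradient_descent_1d(lambda w: 2 * (w - 3), w1=-5.0)
print(round(w1, 3))  # converges to about 3
```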

6
Q

What is the formula for the new set of weights using the gradient descent rule?

A

W’(t+1) = W’(t) - learning rate * gradient of the cost function

The gradient of the cost function is a vector of partial derivatives that points in the direction of the steepest increase in cost
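Written out, with η denoting the learning rate (a standard symbol choice, not necessarily the lecture's):

```latex
\mathbf{w}'(t+1) = \mathbf{w}'(t) - \eta \, \nabla J\!\left(\mathbf{w}'(t)\right)
```

Since ∇J points towards the steepest increase in cost, subtracting it steps in the direction of steepest decrease.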

7
Q

What is the main effect of choosing too large a learning rate in gradient descent?

A

The updates may overshoot the minimum and bounce back and forth
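The overshoot is easy to demonstrate on a toy one-dimensional cost (the cost, starting point, and learning rate below are illustrative only):

```python
def descend(lr, w=10.0, steps=5):
    # One-dimensional J(w) = w^2 with gradient 2w; update w <- w - lr * 2w.
    history = [w]
    for _ in range(steps):
        w -= lr * 2 * w
        history.append(w)
    return history

# lr = 1.2 multiplies w by (1 - 2.4) = -1.4 each step: the iterate
# overshoots the minimum at 0, flips sign, and grows in magnitude.
print([round(x, 2) for x in descend(lr=1.2)])
```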

8
Q

What are the advantage and disadvantage of having a small learning rate?

A

It is more stable but converges more slowly
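The trade-off shows up directly in the number of updates needed on the same toy cost (all values here are illustrative, not from the lecture):

```python
def steps_to_converge(lr, w=10.0, tol=1e-3, max_steps=10000):
    # Toy cost J(w) = w^2 with gradient 2w; count updates until |w| < tol.
    steps = 0
    while abs(w) >= tol and steps < max_steps:
        w -= lr * 2 * w
        steps += 1
    return steps

# A smaller learning rate converges reliably but needs many more updates.
print(steps_to_converge(0.01), steps_to_converge(0.3))
```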

9
Q

Formula for the new set of weights using gradient descent?

A

New weights = old weights - learning rate * gradient of the loss function
