Reinforcement Learning Flashcards

(20 cards)

1
Q

What is operant conditioning?

A

The learning process by which behavior is shaped in response to reinforcement or punishment

2
Q

What is reinforcement?

A

Making a behavior more likely to occur

3
Q

What is punishment?

A

Making a behavior less likely to occur

4
Q

What are the four types of operant conditioning?

A

Positive reinforcement
Negative reinforcement
Positive punishment
Negative punishment

5
Q

What is positive reinforcement?

A

Provide something good to increase behavior
Ex. Do homework -> get ice cream -> more likely to do homework in future

6
Q

What is negative reinforcement?

A

Take something bad (aversive) away to increase behavior
Ex. Do homework -> don’t have to take out trash -> more likely to do homework in future

7
Q

What is positive punishment?

A

Provide something bad to decrease behavior
Ex. Be mean to your brother -> take out the trash -> less likely to be mean to your brother
Ex. Spanking

8
Q

What is negative punishment?

A

Take something good away to decrease the behavior
Ex. Be mean to your brother -> don’t get any ice cream -> less likely to be mean to your brother
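The four definitions above follow one 2×2 rule: "positive/negative" says whether a stimulus is added or removed, and "reinforcement/punishment" says whether the behavior becomes more or less likely. A minimal Python sketch of that rule (the function name `classify` and its "add"/"remove", "good"/"bad" arguments are hypothetical, chosen only for illustration):

```python
def classify(stimulus_change: str, stimulus_kind: str) -> str:
    """Name the operant-conditioning procedure (illustrative helper).

    stimulus_change: "add" (positive) or "remove" (negative)
    stimulus_kind:   "good" (appetitive) or "bad" (aversive)
    """
    if stimulus_change not in ("add", "remove"):
        raise ValueError("stimulus_change must be 'add' or 'remove'")
    if stimulus_kind not in ("good", "bad"):
        raise ValueError("stimulus_kind must be 'good' or 'bad'")
    sign = "Positive" if stimulus_change == "add" else "Negative"
    # Adding something good or removing something bad makes the behavior MORE likely.
    increases = (stimulus_change == "add") == (stimulus_kind == "good")
    effect = "reinforcement" if increases else "punishment"
    return f"{sign} {effect}"


# The examples from the cards above.
examples = {
    "Do homework -> get ice cream": ("add", "good"),
    "Do homework -> don't take out trash": ("remove", "bad"),
    "Be mean -> take out the trash": ("add", "bad"),
    "Be mean -> no ice cream": ("remove", "good"),
}
for label, (change, kind) in examples.items():
    print(f"{label}: {classify(change, kind)}")
```

Adding something good and removing something bad both increase the behavior, which is why the sketch compares the two conditions for equality.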

9
Q

What is classical conditioning?

A

The learning process by which an initially neutral stimulus (the conditioned stimulus, CS), after being paired with another stimulus (the unconditioned stimulus, US) that naturally evokes some automatic response or reflex (the unconditioned response, UR), starts to evoke that same response or reflex (now called the conditioned response, CR)

10
Q

What is the difference between classical and operant conditioning?

A

The goal of classical conditioning is to predict an outcome (the US that a CS signals)
The goal of operant conditioning is to control behavior

In operant conditioning, but not classical conditioning, reward or punishment is contingent on behavior

11
Q

Kamin Blocking Study

A

Control: Condition the mouse to expect cheese whenever a light flashes and a beep sounds together. When you then test it with the light alone or the sound alone, the mouse expects cheese in either case.

Blocking: First train the mouse to associate the light alone with cheese. Then add a sound, so the light flashes, the sound plays, and the mouse gets cheese.
When you test it with only the light, it expects the cheese. But if you play only the sound, it doesn’t expect the cheese.

This is known as blocking

12
Q

Why is there blocking?

A

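Conditioning only happens when there is surprise. Once the light fully predicts the cheese, the cheese is no longer surprising when the sound is added, so nothing is learned about the sound.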
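A small simulation makes "learning only from surprise" concrete. This is a sketch, assuming an error-driven update in which the prediction on a trial is the sum of the strengths of every cue present (this is how the Rescorla-Wagner model in the next card handles compound stimuli). The learning rate of 0.3, the 50-trial phases, the reward value of 1, and the helper name `train` are arbitrary, hypothetical choices.

```python
def train(trials, V, alpha=0.3, reward=1.0):
    """Error-driven (Rescorla-Wagner style) updates, every trial rewarded.

    trials: list of tuples naming the cues present on each trial.
    V: dict mapping cue name -> associative strength (updated in place).
    """
    for present in trials:
        prediction = sum(V[s] for s in present)  # summed expectation from all cues present
        error = reward - prediction              # surprise on this trial
        for s in present:
            V[s] += alpha * error                # every present cue shares the same error
    return V


# Control: light and tone are paired with cheese from the start.
control = train([("light", "tone")] * 50, {"light": 0.0, "tone": 0.0})

# Blocking: light alone first, then light + tone.
blocked = {"light": 0.0, "tone": 0.0}
train([("light",)] * 50, blocked)
train([("light", "tone")] * 50, blocked)

print("control:", control)
print("blocked:", blocked)
```

In the control, the light and tone split the expectation (each ends near 0.5). In the blocking condition the light already predicts the reward almost perfectly, so the error on light + tone trials is close to zero and the tone stays near 0.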

13
Q

What is the Rescorla-Wagner Model?

A

We update our associations between the conditioned stimulus and the unconditioned stimulus (expected values) only to the extent we are surprised.

Expected value(t+1) = Expected value(t) + Learning rate × Prediction error(t)
where Prediction error(t) = Reward(t) − Expected value(t)
δ (or Δ) = Prediction error
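The update above can be sketched in a few lines of Python for a single CS. Here `alpha` stands for the learning rate; its value of 0.2 and the reward schedule (20 rewarded trials followed by 20 trials where the reward is omitted) are illustrative assumptions.

```python
def rescorla_wagner(rewards, alpha=0.2, v0=0.0):
    """Return (expected values, prediction errors) trial by trial for one CS.

    Expected value(t+1) = Expected value(t) + alpha * delta(t)
    delta(t) = Reward(t) - Expected value(t)
    """
    v = v0
    values, errors = [], []
    for r in rewards:
        delta = r - v          # prediction error: how surprised we are
        v = v + alpha * delta  # move the expectation toward the outcome
        values.append(v)       # expected value after this trial
        errors.append(delta)
    return values, errors


# 20 trials of CS followed by reward (R = 1), then 20 trials with the reward omitted.
rewards = [1.0] * 20 + [0.0] * 20
values, errors = rescorla_wagner(rewards)
for t in (0, 1, 19, 20, 39):
    print(f"trial {t:2d}: R={rewards[t]:.0f}  delta={errors[t]:+.3f}  V after={values[t]:.3f}")
```

The prediction error starts at 1 (fully surprised) and shrinks as the expected value approaches the reward. When the reward is then omitted, the error turns negative and the expected value falls.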

14
Q

What are the strengths of the Rescorla-Wagner Model?

A

Error-driven learning is intuitive
Simple - few variables and parameters
Correctly predicted several experimental results ahead of time

15
Q

What are the shortcomings of the Rescorla-Wagner Model?

A

Can’t explain all classical conditioning findings, specifically those involving timing (conditioning is stronger the closer the CS and US are in time) and higher-order conditioning

16
Q

What is the Temporal Difference (TD) model?

A

It is a generalized version of the Rescorla-Wagner model that calculates predictions and prediction errors, and updates associations, continuously as stimuli are perceived in real time rather than once per trial

17
Q

What is the TD equation?

A

New value of A = V(A) + LR × [R(A) + V(B) − V(A)]

V(A) = Value of A
LR = learning rate
R(A) = Reward for A
V(B) = Value of B, the stimulus (state) that comes right after A
R(A) + V(B) − V(A) = TD prediction error (δ)
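A sketch of the TD update in Python, used here to show higher-order (second-order) conditioning, one of the findings listed as a Rescorla-Wagner shortcoming. Assumptions: the stimulus names, the learning rate 0.1, and the phase lengths are arbitrary; "A then B" means B immediately follows A; and `None` marks the end of a trial (value 0). For the Rescorla-Wagner comparison, the light's strength is assumed to be fully learned (1.0) after phase 1, and both cues are treated as present on each phase-2 trial.

```python
def td_update(V, a, b, reward, lr=0.1):
    """One TD step: New V(a) = V(a) + lr * [R(a) + V(b) - V(a)].

    a: current stimulus/state; b: the stimulus/state that follows (None = trial ends).
    Returns the TD prediction error (the bracketed term).
    """
    delta = reward + (V[b] if b is not None else 0.0) - V[a]
    V[a] += lr * delta
    return delta


V = {"light": 0.0, "tone": 0.0}

# Phase 1: light -> juice. The light is followed by reward.
for _ in range(100):
    td_update(V, "light", None, reward=1.0)

# Phase 2: tone -> light, with no juice at all.
for _ in range(5):
    td_update(V, "tone", "light", reward=0.0)  # tone inherits value from the light it predicts
    td_update(V, "light", None, reward=0.0)    # the unrewarded light slowly loses value

# Rescorla-Wagner comparison for phase 2: tone and light present, no US (target 0).
rw = {"light": 1.0, "tone": 0.0}
for _ in range(5):
    error = 0.0 - (rw["light"] + rw["tone"])
    for s in rw:
        rw[s] += 0.1 * error

print("TD V(tone):", round(V["tone"], 3))   # positive: second-order conditioning
print("RW V(tone):", round(rw["tone"], 3))  # negative: no positive value without a reward
```

Because the tone is followed by a light that already has value, V(B) − V(A) is positive and the tone gains value even though no juice ever follows it. The Rescorla-Wagner rule only compares the prediction with the reward itself, so with no reward in phase 2 the tone's strength goes negative instead of positive.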

18
Q

What are Marr’s levels for classical conditioning?

A

Computational level: Predict the unconditioned stimulus from the conditioned stimulus (usually)

Algorithmic level: The Rescorla-Wagner model and the temporal difference model are possible algorithmic explanations

19
Q

Schultz, Dayan & Montague (Monkey) Study

A

Procedures:
Taught monkeys to associate light bursts with juice rewards (classical conditioning)

Recorded action potentials from dopamine neurons in the ventral tegmental area (VTA)

Findings:
Before conditioning, dopamine neurons spike when monkeys get the (unexpected) reward

After conditioning, dopamine neurons spike in response to the conditioned stimulus (light), but no longer to the reward itself

After conditioning, dopamine neurons actually suppress firing (below baseline) when the expected reward is omitted

Dopamine neuron firing correlated with the prediction error computed by the TD model

Conclusion:
Prediction error can drive decision-making

20
Q

What does the TD model predict about dopamine neuron firing?

A

Reward Unexpected
We weren’t expecting a reward, so when a reward occurs, neurons fire in response to the reward

Reward Expected
After being conditioned, we expect the reward once we see the conditioned stimulus, so the spike happens when we see the light rather than when the reward arrives; at reward time, firing stays at its normal baseline

Reward Absent
After being conditioned, we expect the reward to follow the light, so when the expected reward doesn’t come, the prediction error is negative (the outcome is worse than expected), neuron firing is suppressed, and the value of the light is decreased.
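A minimal TD sketch of the three cases above. It assumes (the flashcards do not specify this) that each time step after CS onset is its own state with its own value, that the reward arrives at the end of the trial, and that the value before the CS is 0 because the CS itself comes at an unpredictable time. The TD prediction error at CS onset and at reward time stands in for dopamine firing; the helper name `run_trial`, the 5 time steps, the learning rate 0.2, and the 500 training trials are hypothetical choices.

```python
def run_trial(V, rewarded=True, lr=0.2, learn=True):
    """Simulate one trial with TD; return (error at CS onset, error at reward time).

    V[i] is the value of the i-th time step after CS onset (assumed representation).
    The reward, if any, arrives as the last state ends.
    """
    n = len(V)
    cs_error = V[0] - 0.0                   # jump from pre-CS value 0 to V[0] (no reward yet)
    for i in range(n - 1):
        delta = 0.0 + V[i + 1] - V[i]       # no reward in the middle of the trial
        if learn:
            V[i] += lr * delta
    reward = 1.0 if rewarded else 0.0
    reward_error = reward + 0.0 - V[n - 1]  # trial ends, so the next value is 0
    if learn:
        V[n - 1] += lr * reward_error
    return cs_error, reward_error


V = [0.0] * 5

# Reward unexpected: a rewarded trial before any learning.
print("unexpected:", run_trial(V, rewarded=True, learn=False))

# Conditioning: many light -> juice trials.
for _ in range(500):
    run_trial(V, rewarded=True)

# Probe trials (learning switched off so they don't change V).
print("expected:  ", run_trial(V, rewarded=True, learn=False))
print("absent:    ", run_trial(V, rewarded=False, learn=False))
```

Before learning, the error is 0 at the light and +1 at the reward. After conditioning, it is about +1 at the light and about 0 at the reward; when the reward is omitted, it is about −1 at reward time, matching the suppressed firing.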