Reinforcement Learning Flashcards

Question 1

Q

What is operant conditioning?

Answer

A

The learning process by which behavior is shaped in response to reinforcement and punishment

Question 2

Q

What is reinforcement?

Answer

A

Making a behavior more likely to occur

Question 3

Q

What is punishment?

Answer

A

Making a behavior less likely to occur

Question 4

Q

What are the four types of operant conditioning?

Answer

A

Positive reinforcement
Negative reinforcement
Positive punishment
Negative punishment

Question 5

Q

What is positive reinforcement?

Answer

A

Provide something good to increase behavior
Ex. Do homework -> get ice cream -> more likely to do homework in future

Question 6

Q

What is negative reinforcement?

Answer

A

Take something bad (aversive) away to increase behavior
Ex. Do homework -> don’t have to take out trash -> more likely to do homework in future

Question 7

Q

What is positive punishment?

Answer

A

Provide something bad to decrease behavior
Ex. Be mean to your brother -> have to take out the trash -> less likely to be mean to your brother
Ex. Spanking

Question 8

Q

What is negative punishment?

Answer

A

Take something good away to decrease the behavior
Ex. Be mean to your brother -> don’t get any ice cream -> less likely to be mean to your brother

Question 9

Q

What is classical conditioning?

Answer

A

The learning process by which an initially neutral stimulus (a conditioned stimulus, CS), after being paired with another stimulus (an unconditioned stimulus, US) that naturally evokes some automatic response or reflex (unconditioned response, UR), starts to evoke that same response or reflex (conditioned response, CR)

Question 10

Q

What is the difference between classical and operant conditioning?

Answer

A

The goal of classical conditioning is to predict a behavior
The goal of operant conditioning is to control behavior

In operant conditioning, but not classical conditioning, reward or punishment is contingent on behavior

Question 11

Q

Kamin Blocking Study

Answer

A

If you condition the mouse so that a light flash and a beep together predict cheese, then when you test with the light or the sound by itself, the mouse expects cheese in either case.

If you first train the mouse to associate the light alone with cheese, and then add a sound so that the light flashes, the sound plays, and the cheese follows, the result is different.
When you test with only the light, the mouse expects the cheese. But if you play only the sound, the mouse doesn’t expect the cheese.

This is known as blocking

Question 12

Q

Why is there blocking?

Answer

A

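Conditioning only happens when there is surprise. Because the light already fully predicts the cheese, adding the sound produces no surprise, so nothing is learned about the sound.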
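
The surprise account of blocking can be sketched in a few lines of Python. This is a minimal illustration, not the original study's analysis: it assumes the prediction on each trial is the sum of the values of all stimuli present (the standard compound form of the Rescorla-Wagner model introduced in the next card), and the stimulus names, learning rate, and trial counts are made-up values.

```python
def train(values, stimuli_present, reward, learning_rate, n_trials):
    """Apply error-driven updates to every stimulus present on each trial."""
    for _ in range(n_trials):
        # Assumption: the prediction is the summed value of all stimuli present.
        prediction = sum(values[s] for s in stimuli_present)
        prediction_error = reward - prediction  # surprise on this trial
        for s in stimuli_present:
            values[s] += learning_rate * prediction_error
    return values


# Blocking group: light alone first, then light + sound together.
blocking = {"light": 0.0, "sound": 0.0}
train(blocking, ["light"], reward=1.0, learning_rate=0.3, n_trials=20)
train(blocking, ["light", "sound"], reward=1.0, learning_rate=0.3, n_trials=20)

# Control group: light + sound together from the start.
control = {"light": 0.0, "sound": 0.0}
train(control, ["light", "sound"], reward=1.0, learning_rate=0.3, n_trials=20)

print("blocking group:", blocking)
print("control group:", control)
```

In the blocking group the sound's value stays near zero, because the light already predicts the reward and leaves almost no prediction error to learn from. In the control group the light and the sound share the prediction about equally.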

Question 13

Q

What is the Rescorla-Wagner Model?

Answer

A

We update our associations between the conditioned stimulus and the unconditioned stimulus (expected values) only to the extent we are surprised.

Expected Value(t+1) = Expected Value(t) + Learning rate*Prediction error(t)
where Prediction error(t) = Reward(t) - Expected value(t)
δ (or Δ) = Prediction error
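
As a worked example, the update rule above can be written as a short Python function. The function names, the reward of 1.0, and the learning rate of 0.5 are illustrative assumptions, not values from the cards.

```python
def rescorla_wagner_update(expected_value, reward, learning_rate):
    """Return the expected value for the next trial."""
    prediction_error = reward - expected_value  # delta: how surprised we are
    return expected_value + learning_rate * prediction_error


def simulate_acquisition(n_trials, reward=1.0, learning_rate=0.5):
    """Track the expected value across repeated CS -> US pairings."""
    expected_value = 0.0  # the CS starts out neutral
    history = [expected_value]
    for _ in range(n_trials):
        expected_value = rescorla_wagner_update(expected_value, reward, learning_rate)
        history.append(expected_value)
    return history


values = simulate_acquisition(n_trials=3)
print(values)  # [0.0, 0.5, 0.75, 0.875]
```

Each trial closes half of the remaining gap between the expected value and the reward, so the updates shrink as the surprise shrinks.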

Question 14

Q

What are the strengths of the Rescorla-Wagner Model?

Answer

A

Error-driven learning is intuitive
Simple - few variables and parameters
Correctly predicted several experimental results ahead of time

Question 15

Q

What are the shortcomings of the Rescorla-Wagner Model?

Answer

A

Can’t explain all classical conditioning findings, specifically those involving timing (stronger conditioning the closer the CS and US are in time) and higher-order conditioning

Question 16

Q

What is the Temporal Difference (TD) model?

Answer

A

It is a generalized version of the Rescorla-Wagner model that calculates predictions and prediction errors and updates associations in real time, as stimuli are perceived

Question 17

Q

What is the TD equation?

Answer

A

New value of A = V(A) + LR[R(A) + V(B) - V(A)]

V(A) = Value of A
LR = learning rate
R(A) = Reward for A
V(B) = Value of B
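
A minimal Python sketch of this update, following the card's equation as written (it has no discount factor). The function name and the numbers in the example are hypothetical: A stands for a light that yields no reward itself but leads to B, a state already valued at 1.0.

```python
def td_update(value_a, reward_a, value_b, learning_rate):
    """Return the new value of A after moving from A to B."""
    # Prediction error: what we got (reward plus the value of where we
    # landed) minus what we expected from A.
    prediction_error = reward_a + value_b - value_a
    return value_a + learning_rate * prediction_error


new_value = td_update(value_a=0.0, reward_a=0.0, value_b=1.0, learning_rate=0.5)
print(new_value)  # 0.5
```

A gains value even though it is never rewarded directly, because the value of B counts toward the prediction error. This is how the TD model can pass value back to earlier stimuli.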

Question 18

Q

What are Marr’s levels for classical conditioning?

Answer

A

Computational level: Predictions of unconditioned stimulus based on conditioned stimulus (usually)

Algorithmic level: The Rescorla-Wagner model and the temporal difference model are possible algorithmic explanations

Question 19

Q

Schultz, Dayan & Montague (Monkey) Study

Answer

A

Procedures:
Taught monkeys to associate light bursts with juice rewards (classical conditioning)

Recorded action potentials from dopamine neurons in the ventral tegmental area (VTA)

Findings:
Before conditioning, dopamine neurons spike more when monkeys get a reward

After conditioning, dopamine neurons spike in response to the conditioned stimulus (light), but no longer to the reward

After conditioning, dopamine neurons actually suppress firing when the expected reward is omitted

Dopamine neuron firing correlated with prediction error from TD model

Conclusion:
Prediction error can drive decision-making

Question 20

Q

What does the TD model predict about dopamine neuron firing?

Answer

A

Reward Unexpected
We weren’t expecting a reward, so when a reward occurs, neurons fire in response to the reward

Reward Expected
After conditioning, we expect the reward to follow the conditioned stimulus, so neurons spike in response to the conditioned stimulus (the light) rather than to the reward itself, and then firing returns to baseline

Reward Absent
After conditioning, we expect the reward to follow the light. When the expected reward doesn’t arrive, the prediction error is negative and the value is decreased, so we see a suppression of neuron firing.
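
The three cases above can be checked with a small Python sketch of the TD prediction error. It is a simplification under stated assumptions: each trial is reduced to two moments (cue onset and reward time), the states before the cue and after the trial have value 0, and a fully conditioned cue has value 1.0. All names and numbers are illustrative.

```python
def td_error(reward, value_next, value_current):
    """TD prediction error: reward + V(next) - V(current)."""
    return reward + value_next - value_current


BASELINE = 0.0  # assumed value before the cue appears
END = 0.0       # assumed value once the trial is over


def trial_errors(cue_value, reward):
    """Return the prediction errors at cue onset and at reward time."""
    at_cue = td_error(0.0, cue_value, BASELINE)
    at_reward_time = td_error(reward, END, cue_value)
    return at_cue, at_reward_time


print(trial_errors(cue_value=0.0, reward=1.0))  # (0.0, 1.0)  reward unexpected
print(trial_errors(cue_value=1.0, reward=1.0))  # (1.0, 0.0)  reward expected
print(trial_errors(cue_value=1.0, reward=0.0))  # (1.0, -1.0) reward absent
```

A positive error corresponds to a burst of firing, zero to baseline firing, and a negative error to suppressed firing.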

(20 cards)