What is operant conditioning?
The learning process by which behavior is shaped in response to reinforcement and punishment
What is reinforcement?
Making a behavior more likely to occur
What is punishment?
Making a behavior less likely to occur
What are the four types of operant conditioning?
Positive reinforcement
Negative reinforcement
Positive punishment
Negative punishment
What is positive reinforcement?
Provide something good to increase behavior
Ex. Do homework -> get ice cream -> more likely to do homework in future
What is negative reinforcement?
Take something bad (aversive) away to increase behavior
Ex. Do homework -> don’t have to take out trash -> more likely to do homework in future
What is positive punishment?
Provide something bad to decrease behavior
Ex. Be mean to your brother -> have to take out the trash -> less likely to be mean to your brother
Ex. Spanking
What is negative punishment?
Take something good away to decrease the behavior
Ex. Be mean to your brother -> don’t get any ice cream -> less likely to be mean to your brother
What is classical conditioning?
The learning process by which an initially neutral stimulus (the conditioned stimulus, CS), after being paired with another stimulus (the unconditioned stimulus, US) that naturally evokes some automatic response or reflex (the unconditioned response, UR), starts to evoke that same response or reflex (now called the conditioned response, CR)
What is the difference between classical and operant conditioning?
The goal of classical conditioning is to predict an outcome (the unconditioned stimulus)
The goal of operant conditioning is to control behavior
In operant conditioning, but not classical conditioning, reward or punishment is contingent on behavior
Kamin Blocking Study
Control condition: Train the mouse on a light flash and a beep presented together, followed by cheese. When tested on the light alone or the sound alone, the mouse expects cheese in either case.
Blocking condition: First train the mouse to associate the light alone with cheese. Then add the sound, so that the light flashes, the sound plays, and the cheese follows.
When tested with only the light, the mouse expects the cheese. When tested with only the sound, it does not expect the cheese.
This is known as blocking
Why is there blocking?
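Conditioning only happens when there is surprise. Because the light already fully predicts the cheese, adding the sound produces no surprise, so nothing is learned about the sound.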
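Blocking can be reproduced in a few lines using the Rescorla-Wagner rule covered later in these notes. This is only a rough sketch: the function name train, the learning rate of 0.3, the 50 trials per phase, and the rule that the prediction is the sum of the values of all cues present are assumptions made for illustration, not details from the Kamin study.

```python
def train(trials, learning_rate=0.3, values=None):
    """Run Rescorla-Wagner updates over a list of (cues, reward) trials."""
    values = dict(values or {})  # copy so the caller's dict is not mutated
    for cues, reward in trials:
        # Assumption: the prediction is the sum of the values of all cues present.
        prediction = sum(values.get(cue, 0.0) for cue in cues)
        prediction_error = reward - prediction  # the "surprise"
        # Every cue present on the trial shares the same prediction error.
        for cue in cues:
            values[cue] = values.get(cue, 0.0) + learning_rate * prediction_error
    return values


compound = [(("light", "sound"), 1.0)] * 50  # light + sound -> cheese
light_only = [(("light",), 1.0)] * 50        # light alone -> cheese

# Control group: light and sound are trained together from the start.
control = train(compound)

# Blocking group: light alone first, then light + sound together.
pretrained = train(light_only)
blocked = train(compound, values=pretrained)

print("control:", control)
print("blocked:", blocked)
```

In the control group both cues pick up value. In the blocking group the sound's value stays near zero, because the light already predicts the cheese and the prediction error is almost zero by the time the sound is added.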
What is the Rescorla-Wagner Model?
We update our associations between the conditioned stimulus and the unconditioned stimulus (expected values) only to the extent we are surprised.
Expected value(t+1) = Expected value(t) + Learning rate * Prediction error(t)
where Prediction error(t) = Reward(t) - Expected value(t)
δ (or Δ) = Prediction error
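The update rule above can be written out directly as code. This is a minimal sketch for a single stimulus; the function names rw_update and rw_learn and the learning rate of 0.5 are illustrative assumptions.

```python
def rw_update(expected_value, reward, learning_rate):
    """One Rescorla-Wagner step: move the expected value toward the reward."""
    prediction_error = reward - expected_value  # delta
    return expected_value + learning_rate * prediction_error


def rw_learn(rewards, learning_rate=0.5, initial_value=0.0):
    """Return the expected value before each trial and after the last one."""
    values = [initial_value]
    for reward in rewards:
        values.append(rw_update(values[-1], reward, learning_rate))
    return values


# Three trials in which the stimulus is always followed by a reward of 1.
history = rw_learn([1.0, 1.0, 1.0], learning_rate=0.5)
print(history)  # [0.0, 0.5, 0.75, 0.875]
```

Each trial closes half of the remaining gap between the expected value and the reward, so the updates shrink as the reward becomes less surprising.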
What are the strengths of the Rescorla-Wagner Model?
Error-driven learning is intuitive
Simple - few variables and parameters
Correctly predicted several experimental results ahead of time
What are the shortcomings of the Rescorla-Wagner Model?
Can’t explain all classical conditioning findings, specifically those involving timing (conditioning is stronger the closer the CS and US are in time) and higher-order conditioning
What is the Temporal Difference (TD) model?
It is a generalized version of the Rescorla-Wagner model that calculates predictions and prediction errors, and updates associations, in real time as stimuli are perceived
What is the TD equation?
New value of A = V(A) + LR * [R(A) + V(B) - V(A)]
V(A) = Value of A
LR = learning rate
R(A) = Reward for A
V(B) = Value of B (the state that follows A)
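The TD equation can be sketched as a small function and applied to a two-step trial. The state names "light" and "juice", the learning rate of 0.1, the 200 trials, and the choice to give the end of the trial a value of 0 are assumptions for illustration. Like the equation in these notes, the sketch leaves out the discount factor that fuller versions of TD learning include.

```python
def td_update(v_a, r_a, v_b, learning_rate):
    """New value of A = V(A) + LR * [R(A) + V(B) - V(A)]."""
    td_error = r_a + v_b - v_a
    return v_a + learning_rate * td_error


# Hypothetical trial: a light is followed by juice, then the trial ends.
values = {"light": 0.0, "juice": 0.0}

for _ in range(200):
    # The light itself carries no reward; the juice state comes next.
    values["light"] = td_update(values["light"], 0.0, values["juice"], 0.1)
    # The juice state delivers a reward of 1; the end of the trial has value 0.
    values["juice"] = td_update(values["juice"], 1.0, 0.0, 0.1)

print(values)
```

The value of the juice state rises first, and the value of the light follows, because the light's update borrows from the value of the state that comes after it. This backward spread of value is what lets TD handle timing within a trial.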
What are Marr’s levels for classical conditioning?
Computational level: Predictions of unconditioned stimulus based on conditioned stimulus (usually)
Algorithmic level: The Rescorla-Wagner model and the temporal difference model are possible algorithmic explanations
Schultz, Dayan & Montague (Monkey) Study
Procedures:
Taught monkeys to associate light bursts with juice rewards (classical conditioning)
Recorded action potentials from dopamine neurons in the ventral tegmental area (VTA)
Findings:
Before conditioning, dopamine neurons spike more when monkeys get a reward
After conditioning, dopamine neurons spike in response to the conditioned stimulus (light), but no longer to the reward
After conditioning, dopamine neurons actually suppress firing when the expected reward is omitted
Dopamine neuron firing correlated with the prediction error from the TD model
Conclusion:
Prediction error can drive decision-making
What does the TD model predict about dopamine neuron firing?
Reward Unexpected
We weren’t expecting a reward, so when a reward occurs, neurons fire in response to the reward
Reward Expected
After being conditioned, we expect the reward whenever the conditioned stimulus appears, so the neurons spike in response to the conditioned stimulus (the light) instead of the reward itself, and then firing goes back to normal
Reward Absent
After being conditioned, we expect the reward to follow the light, so when the expected reward doesn’t arrive, the value is decreased (a negative prediction error) and we see a suppression of neuron firing.
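The three cases can be checked by computing the TD prediction error at two moments in a trial: when the light appears and when the reward is due. This is a simplified sketch. It assumes the moment before the light has a value of 0 (the light arrives unpredictably), the end of the trial has a value of 0, and a fully conditioned light has a value of 1. The function names are hypothetical.

```python
def td_error(reward, next_value, current_value):
    """Prediction error = R + V(next) - V(current)."""
    return reward + next_value - current_value


def trial_errors(v_light, reward):
    """Return (error when the light appears, error when the reward is due)."""
    at_light = td_error(0.0, v_light, 0.0)       # baseline -> light
    at_outcome = td_error(reward, 0.0, v_light)  # light -> end of trial
    return at_light, at_outcome


unexpected = trial_errors(v_light=0.0, reward=1.0)  # before conditioning
expected = trial_errors(v_light=1.0, reward=1.0)    # after conditioning
absent = trial_errors(v_light=1.0, reward=0.0)      # reward omitted

print("Reward unexpected:", unexpected)  # (0.0, 1.0)
print("Reward expected:  ", expected)    # (1.0, 0.0)
print("Reward absent:    ", absent)      # (1.0, -1.0)
```

A positive error corresponds to a burst of firing, zero to baseline firing, and a negative error to suppressed firing, which matches the pattern described for each case.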