Week 5 - Operant Conditioning (Basic Principles) Flashcards

(41 cards)

1
Q

What is Classical Conditioning?

A

Learning via association between stimuli, producing involuntary, reflexive responses

Example: Dog salivating when it hears a bell associated with food.

2
Q

What is Operant Conditioning?

A

Learning via consequences of voluntary behaviour, where behaviours are strengthened by reinforcement and weakened by punishment

Example: A rat pressing a lever to receive food

3
Q

What is the key difference between Classical and Operant Conditioning?

A

Classical = reflexes, associations, involuntary; Operant = decisions, consequences, voluntary.

4
Q

What happens to a behaviour followed by a reward according to operant conditioning?

A

It is repeated/increases in frequency.

5
Q

What happens to a behaviour followed by a punishment according to operant conditioning?

A

It is avoided/decreases in frequency.

6
Q

What is the Law of Effect?

A

Responses followed by satisfaction are more likely to recur, while those without satisfaction are weakened.

7
Q

Who expanded Thorndike’s ideas in operant conditioning?

A

B.F. Skinner.
- saw operant conditioning as a “theory of everything” for behaviour (though later shown to be incomplete)

8
Q

What is the Skinner Box?

A

An automated chamber for observing and measuring behaviour with levers, lights, speakers, and food dispensers.
- allowed continuous observation and measurement of behaviour

9
Q

What is shaping in operant conditioning?

A

Reinforcing behaviours progressively closer to the target behaviour.
- Rats don't naturally press levers, so shaping is how such novel behaviour is trained.

10
Q

What is an example of shaping?

A

Rewarding a rat for moving towards the lever, then only for touching it, and finally for pressing it.

11
Q

What is the significance of Skinner’s experiment with pigeons spinning?

A

Demonstrated that complex behaviours can be broken into simple reinforced steps.
* Pigeon Spin: By reinforcing partial turns, Skinner taught pigeons to spin in circles.
* Pigeon Ping Pong: Two pigeons pecked a ball back and forth, shaped through reinforcement.

12
Q

What do operant principles explain in real-world learning?

A
  • Animals forage where food is plentiful and avoid barren areas.
  • Learning is adaptive: instincts cover predictable environments, but learning is needed for variable environments.
13
Q

What is superstitious behaviour?
Give an example.

A

Behaviour developed from random reinforcement, leading to illusory cause-effect beliefs.
* Skinner (1948): Randomly dispensed food to pigeons every 15 seconds.
* Pigeons developed arbitrary rituals (wing flapping, spinning, pecking), mistakenly believing these behaviours caused food.

14
Q

What is an example of superstitious behaviour in humans?

A

Athletes’ pre-game rituals or pressing pedestrian crossing buttons repeatedly.

15
Q

What are the techniques for teaching behaviour?

A
  • Shaping (Scan and Capture): Reinforce approximations of target behaviour.
  • Baiting: Use food or objects to lure behaviour (e.g., tapping, treats).
  • Mimicry: Reward imitative actions (e.g., parrots copying human sounds).
  • Sculpting: Physically guide an animal into the desired posture/behaviour, then reinforce.
  • Instruction/Language (humans): Verbal cues establish mental connections between actions and outcomes (e.g., “Eat veggies → get dessert”).
16
Q

What is chaining in operant conditioning?

A

Training behaviours in sequence, either forward or backward.

17
Q

What is the most effective method of chaining?

A

Backward chaining (most effective):
- Teach the last step first so learner experiences the end goal quickly.
- Example: Tying shoelaces – the child first pulls the final loop through, then earlier steps are added progressively.

18
Q

What is a reinforcer?

A

An outcome that increases the likelihood of behaviour.

19
Q

What is a punisher?

A

An outcome that decreases the likelihood of behaviour.

20
Q

What is the difference between positive and negative in operant conditioning?

A
  • Positive: Adding something
  • Negative: Removing something

Do NOT confuse positive/negative with good/bad:
Positive = addition; Negative = subtraction

21
Q

What is the Operant Conditioning Contingency Square?

A
  • Positive Reinforcement: Adding a pleasant stimulus increases behaviour (e.g., dog gets a treat for sitting)
  • Negative Reinforcement: Removing an unpleasant stimulus increases behaviour (e.g., car buzzer stops when seatbelt is fastened)
  • Positive Punishment: Adding an unpleasant stimulus decreases behaviour (e.g., barking dog gets a spray collar)
  • Negative Punishment: Removing a pleasant stimulus decreases behaviour (e.g., child loses iPad privileges)

22
Q

What is bridging in operant conditioning?

A
  • Combines classical and operant conditioning.
  • Pairing: The bridge signal is first paired with the actual reward (like a treat) until the learner associates the signal with the good outcome.
  • Marking the behaviour: When the learner performs the correct behaviour, the trainer immediately uses the bridge signal. This tells the learner exactly what they did right.
  • Bridging the delay: Since it’s not always possible to deliver the primary reward at the precise moment, the bridge provides feedback that the reward is on its way.

23
Q

What is Continuous Reinforcement?

A

Reinforcing every response; good for learning new behaviours.
- Schedules of Reinforcement

24
Q

What is Partial Reinforcement?

A

Reinforcing only some responses, creating resistance to extinction.
- Schedules of Reinforcement

25
Q

What is a Fixed Ratio schedule?

A

Reinforced after every nth response (e.g., café loyalty card).
- Types of Partial Schedules

26
Q

What is a Variable Ratio schedule?

A

Reinforced on average every nth response (e.g., gambling machines).
- Types of Partial Schedules

27
Q

Which schedule is most resistant to extinction?

A

Variable Ratio.
- teaches persistence

28
Q

What is a Fixed Interval schedule?

A

Reinforced after a fixed time, provided a response occurs (e.g., weekly allowance).
- Types of Partial Schedules

29
Q

What is a Variable Interval schedule?

A

Reinforced after a time interval that varies around an average (e.g., checking social media).
- Types of Partial Schedules

30
Q

What are the key effects of reinforcement schedules?

A
  • Ratio schedules produce faster learning than interval schedules because reinforcement is tied directly to the number of responses, creating a stronger link between behaviour and reward.
  • Variable Ratio is the most resistant to extinction (the basis of gambling addiction).
  • Fixed schedules show post-reinforcement pauses.

31
Q

Why is reinforcement more effective than punishment?

A

Reinforcement teaches what to do, while punishment only suppresses behaviour, so reinforcement shapes long-term behaviour more reliably.

32
Q

What are some problems with punishment?

A
  • Effects are not as permanent
  • Reduces trust
  • Increases aggression
  • Risk of fear and learned helplessness

33
Q

What are effective punishment guidelines?

A
  • No escape possible
  • Intense (within limits)
  • Continuous
  • Immediate
  • Short in duration
  • No mixed signals
  • Reinforce alternative behaviours
  • Watch for negative side effects

34
Q

What are the reward variables that affect learning?

A
  • Drive: Higher motivation (e.g., hunger) → stronger learning.
  • Magnitude (size): Larger rewards = faster learning, but subject to diminishing returns; big rewards also → faster extinction if removed.
  • Delay: Immediate reinforcement is far more effective than delayed; this explains the difficulty of resisting temptations (immediate reward vs delayed punishment, e.g., partying before an exam).

35
Q

What is the Three-Term Contingency?

A
  • Antecedent (A) / Discriminative Stimulus (Sᴰ): The environmental cue or signal that makes a specific behaviour more likely to be reinforced or punished (e.g., a green traffic light).
  • Behaviour (B) / Operant Response (R): The voluntary action that follows the antecedent (e.g., pressing the accelerator).
  • Consequence (C) / Outcome (Sᴿ or Sᴾ): What immediately follows the behaviour and determines whether it occurs more or less frequently in future (e.g., the car moves forward, which is reinforcing).
- Skinner’s claim: this three-term relationship explains all operant behaviour.

36
Q

What is Stimulus Control?

A

When behaviour is governed by specific stimuli.

37
Q

What is Stimulus Generalisation?

A

Response generalises to similar stimuli (e.g., a rat that presses the lever under a green light also presses under a yellow light).

38
Q

What is Stimulus Discrimination?

A

Learner distinguishes between stimuli (e.g., presses only under the green light, not the red).
- achieved through differential reinforcement

39
Q

What is the key takeaway about operant conditioning?

A

It explains voluntary behaviours through reinforcement and punishment.

40
Q

What is Thorndike's Puzzle Box experiment?

A

Cats placed in puzzle boxes could escape by pulling a string, stepping on a platform, or turning a latch.
- Over time, cats escaped faster due to trial-and-error learning.

41
Q

What is forward chaining?

A

Teach step 1, then steps 1+2, and so on.