Instrumental conditioning Flashcards

(69 cards)

1
Q

Instrumental Conditioning is

A
  • aka operant conditioning= learning of a contingency between a voluntary behaviour & its consequence
  • requires explicit training
    –> An important point to keep in mind for any type of instrumental conditioning is that it proceeds best when the consequence immediately follows the response
2
Q

In the wake of Darwin’s theory of evolution by natural selection, a debate was sparked concerning the extent to which mental abilities, such as problem solving, were conserved across species. Elaborate

A
  • Darwinian naturalists reported seemingly amazing accounts of intellectual achievements in animals, including stories of cats baiting a lawn with scattered breadcrumbs to lure unsuspecting birds.
  • How could this apparently sophisticated problem-solving behaviour in animals be objectively evaluated?
    –>rmr the data emerging from Pavlov’s laboratory concerned only reflexive learning, diff from this type of skill.
    –> A new line of research originating from the lab of Edward L. Thorndike (1874-1949) provided an answer. –> While Pavlov was investigating conditional reflexes with dogs in Russia, Thorndike was investigating a different type of learning with cats in America.
3
Q

psych Edward L. Thorndike

A
  • tho we care mostly bout humans, much of the early work was done in experiments on animals that focused on behaviour & ignored mental processes.
  • Thorndike began his investigations by studying cats in a puzzle box.
    –> Puzzle box= a small chamber with a door that can be opened by performing a specific behaviour like pulling on a rope –> basically an escape room
  • outside was a small dish of food that provided motivation for the hungry cat to escape.
  • Over several trials, Thorndike observed their behaviours & recorded their escape time.
4
Q

Thorndike Puzzle Box experiment: Results

A
  • @ first, the cat performed random behaviours as it tried to escape
    –> it would eventually come upon the solution of pulling on the rope by accident.
  • Thorndike predicted that on trials following this discovery of the correct solution, the cat would then escape immediately when placed in the same puzzle box
  • Instead, he found that the frequency of the random behaviours generally decreased over time
    –> over trials, the random behaviours that didn’t lead to escape would occur less frequently, leaving only the correct target behaviour in place.
  • suggested that animals followed a simple stimulus-response type process with little credit for consciousness.
    –> Unlike in humans, there was never a distinct “aha” moment.
    –> The cat seemed to be working from a long trial-and-error process of discovery.
5
Q

Stamping in & stamping out hypothesis

A
  • Thorndike hypothesized processes called “Stamping In” & “Stamping Out”, which determined whether a behaviour was maintained or eliminated, respectively.
    –> behaviours like rope pulling were stamped in cuz they had the favourable consequence of food access.
  • random behaviours like turning in a circle were stamped out.
  • Eventually, this general process leads to refinement & the cat learns the contingency between the specific behaviour of rope pulling & the specific consequence of food reward.
  • In this view, there was no need to attribute any special intellectual sophistication to the cat.–> Rather, the apparent problem-solving behaviour was more accurately described as a change in the probabilities of the various possible responses.
6
Q

Law of Effect

A
  • Thorndike’s intuitive law: Behaviours with positive consequences r stamped in & performed more frequently. Those with negative consequences r stamped out & performed less frequently
  • Just how strongly these responses r stamped in or out is proportional to the consequences (satisfying or unsatisfying) of the response.
7
Q

Skinner & Operant Behaviour

A
  • If Thorndike was the father of instrumental conditioning, B.F. Skinner was its doting uncle. –> It was Skinner who popularized the Law of Effect & pushed instrumental conditioning to the forefront of learning theory.
  • Coined the term operant, as in operant conditioning
  • coined/preferred the term reinforcer instead of referring to a ‘satisfying effect’ or ‘reward’ that followed a response
8
Q

Operant Chamber

A
  • aka a Skinner Box.
  • A special chamber with a lever or other mechanism by which an animal could respond to produce a reinforcer.
  • Skinner used it for the experimental study of operant (instrumental) conditioning.
  • the response rate (e.g., lever presses over time) was automatically recorded with a device= a cumulative recorder
9
Q

Cumulative recorder

A
  • Records the cumulative response rate during an instrumental conditioning experiment.
    –> In a typical experiment, the animal stayed in the chamber for a set interval of time, & analyzing the output of the cumulative recorder would allow learning to be assessed.
10
Q

The Skinner box had many advantages over Thorndike’s puzzle box. Explain.

A
  • Trials could be shorter, there were no constraints on responding, & after completing a response & experiencing its effect, the animal remains in the box, and is free to respond again (and again and again!)
11
Q

A key diff from classical conditioning is that in the case of instrumental conditioning…

A

…we r considering overt behaviours that r operated by an actor, leading to a reinforcer.
- That’s why it’s aka “operant conditioning”.

12
Q

Similarities between classical conditioning & instrumental conditioning

A
  • in instrumental conditioning, the more pairings there r between an operant response (e.g., lever press) & its consequence (e.g., food), the stronger the acquired learning. –> If conditions change & the operant response is no longer paired with its consequence, the result is a decline in responding leading to extinction. –> Same idea with CS & US
  • like classical conditioning, extinction in instrumental conditioning isn’t ‘unlearning’ the response. –>Rather, new learning is layered on top of the previously learned response. –> supported by the observation that following extinction, an instrumental response can show spontaneous recovery & faster reacquisition.
13
Q

A reinforcer is

A

Any stimulus, presented after a response, that impacts the frequency with which the response is performed

  • tho the Law of Effect is important in understanding learning, it’s not entirely clear how to define the “satisfying or annoying states” which determine the behaviour frequency. –> a more precise strategy is to refer to the reinforcer
14
Q

Primary reinforcers

A

A reinforcer with intrinsic value, such as food, water or a mate.

15
Q

Secondary Reinforcer

A
  • A reinforcer that can be exchanged for a primary reinforcer. Money is the most commonly used for humans.
    –> established through classical conditioning
    –> paper rectangles, round pieces of metal, & small plastic cards have little intrinsic value (i.e., by themselves, they do not produce any particularly satisfying effects), but can be used to obtain items that r natural reinforcers.
    –> Thus, these secondary reinforcers can be powerful motivators of behaviour.
  • Similarly, grades, air miles, gold stars, coupons, & status symbols act as powerful reinforcers & influence ur behaviour, but only to the extent that they have been associated with other primary reinforcers. –> The principle here is the same as in classical conditioning with respect to higher-order conditioning.
16
Q

Behavioural responses r changed by both ________________ & ________________, each of which can either be ____________ or ____________. Elaborate

A
  • positive reinforcers, negative reinforcers, presented, removed

–> leads to 4 diff types of instrumental conditioning: Presenting or removing a positive reinforcer, & presenting or removing a negative reinforcer.
–> Depending on what target behaviour u r trying to affect, various types of instrumental conditioning may be appropriate.

17
Q

Reward training

A
  • the presentation of a positive reinforcer following a response, which increases the frequency of the behaviour being reinforced
    –> Ex, if you present a person with a cold drink every time he puts money into a machine, the frequency of this behaviour is likely to increase.
18
Q

Punishment training

A
  • the presentation of a negative reinforcer following a response, which decreases the frequency of the behaviour being reinforced
  • Ex, if putting money into a vending machine was followed by a mild electric shock, the frequency of the behaviour would likely decrease very quickly.
  • tho the use of punishment training is sound from a theoretical perspective, it can be controversial when applied in the real world. –> use of punishment must consider the ethics of inflicting fear or distress on the recipient
  • Many learning theorists, including B.F. Skinner, have suggested that when punishment is used, the authority figure may become a signal for fear or distress through classical conditioning. –> Consider how such a contingency may impact a parent-child relationship.
19
Q

Omission training

A
  • A positive reinforcer is being removed/omitted (thus the name), leading to a decrease in the behaviour being reinforced
  • Not to be confused with punishment; omission training is something we apply broadly on a daily basis
    –> Ex, the time-out procedure used by parents & schools –> the child must leave the play area & sit alone for some time without access to the toys or friends that the other children r free to enjoy= removal of positive reinforcers
20
Q

Escape training

A
  • a response is followed by the removal of a negative reinforcer (“escaping” an undesired situation), leading to an increase in that response behaviour.
  • it’s the removal of an aversive stimulus (i.e., something unpleasant)
    –> Ex, a child can avoid their chores on the weekend (the negative reinforcer) by performing a response of completing homework.
    –> Ex, in a rat experiment, the floor of 1 side of the cage delivers a constant mild electric shock; it can be avoided if the rat moves to the opposite side of the cage.
21
Q

Immediate & Delayed Consequences

A
  • Training is most effective when the consequence immediately follows the target behaviour rather than being delayed.–> allows an organism to accurately associate the correct behaviour with the reinforcer.
  • This is especially evident when training animals.
22
Q

Delay of gratification

A
  • in some circumstances both human & non-human subjects show the ability to respond to reinforcers that may not be immediately delivered.–> Ex, In most employment situations, u r required to wait until ur scheduled payday to receive ur financial reinforcer–> Ex, rat in operant chamber doesn’t need to be reinforced following every lever press in order to keep pressing.
  • ability to tolerate delay of gratification begins to develop in childhood, & young children who show difficulty tolerating such delays tend to have more difficulty in coping with stress & frustration later as adolescents
    –> In adults, the inability to delay gratification may play an influential role in substance abuse & addiction.–>immediately rewarding consequences of drug taking behaviour may overshadow the delayed, & possibly greater, benefits of abstinence.
23
Q

the Testing Effect

A
  • The Testing Effect= the phenomenon that learning is better facilitated by testing (forced memory recall) than by repeated episodes of studying & reviewing material. –> thus, the best way to learn new concepts is through lots of testing, even pop quizzes
  • In order for it to work tho, test material difficulty level should allow for relatively high success rates. –>if profs make the tests so hard that recall is, on average, nearly impossible, successful learning will not occur.
  • Another important factor to consider is the role of feedback. –> Receiving full feedback after the completion of tests is extremely important cuz how can u improve if u don’t know what u did wrong & why?
24
Q

Roediger & the Testing Effect

A
  • Roediger & Karpicke suggested that the testing effect results from processes of encoding during studying & testing.
  • When material is studied, the processes used to retrieve items when attempting to recall r also encoded.
    –> This process of retrieval is what is activated during testing, & by testing, individuals have practice accessing the material, something that simply reviewing material can’t offer.
  • The testing effect has obvious benefits for testing in educational systems. –> As long as tests r fair & the appropriate feedback is given, frequent testing is actually a beneficial, effective way of learning. –> However, the testing effect is also quite relevant for test preparation. –> Instead of reading info repeatedly, it is best to continuously test urself! –> do it urself or get friends & family to help out
    –> the more practice retrieving the material, the stronger & more long-lasting the learning
25
Acquisition
- the process of acquisition leads to learning the contingency between a response & its consequence.
--> In acquisition studies of instrumental conditioning, psychs r often interested in measuring the rate of responding of the new behaviour.
26
Acquisition: Graphing responses
- The response rate for a given behaviour can be visualized using a cumulative recorder
--> Essentially, a long piece of paper flows through the machine at a constant rate as a pen draws a straight line.
--> With each response made by the subject, the pen moves up a notch, leading to a characteristic pattern of acquisition.
- In modern studies of learning, the og pen & paper model of the cumulative recorder may be replaced by an automated computer system.
27
Acquisition: Graph for a TYPICAL experiment
- The flat horizontal line indicates when the subject is not responding, while an upward slope indicates when a response has been made.
- cuz the y-axis indicates the cumulative record, we see that the graph continues growing in an upwards direction, as it tallies all previous responses in a permanent record.
- The pattern of responding depends on a # of factors including the participant, behaviour complexity, & type of reinforcement used.
- NOTE: the graphs seen in the module show the results of reward training, where the frequency of the behaviour increases. --> Each response is followed by a food reward.
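- a minimal sketch (my own illustration, not from the module) of how a cumulative record could be simulated & plotted in software, standing in for the pen & paper recorder --> the session length, response probabilities & gradual learning curve r assumptions chosen just for the demo

```python
# Toy simulation of a cumulative record during reward training.
# Assumed setup: the subject's response probability slowly rises
# over the session to mimic acquisition.

import random
import matplotlib.pyplot as plt

random.seed(1)

session_length = 600       # seconds in the simulated session
p_start = 0.01              # chance of responding in a given second, early on
p_end = 0.20                # chance late in the session, after learning

cumulative, count = [], 0
for t in range(session_length):
    # Linearly interpolate the response probability to mimic acquisition.
    p = p_start + (p_end - p_start) * t / session_length
    if random.random() < p:
        count += 1            # the "pen moves up a notch" with each response
    cumulative.append(count)  # flat stretch = no responding; upward slope = responding

plt.plot(range(session_length), cumulative)
plt.xlabel("Time (s)")
plt.ylabel("Cumulative # of responses")
plt.title("Simulated cumulative record (reward training)")
plt.show()
```

- flat stretches in the output correspond to pauses in responding & steeper stretches to faster responding, just like on the paper record.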
28
Autoshaping
- essentially it's learning a contingency between a behaviour & its consequence without careful guidance by the researcher (my definition)
--> Ex, Thorndike's puzzle box, where the cat learned to escape by pulling a rope with no guidance.
--> Ex, a pigeon placed in a cage where it gets food every time it pecks the keyhole
--> over time, the pigeon will peck it accidentally & eventually learn the contingency between the behaviour & the consequence
29
Shaping by successive approximation
- not all behaviours can be autoshaped cuz some instrumental responses r far too complex for a participant to discover on their own. --> Ex, teaching a dolphin to do a backflip
- complex behaviours can be shaped by successive approximation= The complex behaviour can be organized into smaller approximations which gradually build up to the full response we hope to condition.
--> EACH of these approximations can be reinforced through the presentation of a reward.
--> Over time, the successive approximations lead to the final complex behaviour.
- even when the target behaviour could otherwise be learned through trial & error, shaping can dramatically reduce acquisition time. --> This technique is used extensively by animal trainers. (& coaches --> textbook)
30
Shaping Def textbook
- Used when a desired behaviour is too complex for a subject to discover on their own in a single step. The behaviour is broken down into smaller, easier steps, eventually leading to the more complex behaviour.
31
A Famous example of shaping
- comes from the noted behaviourist BF Skinner, who set up an unusual display in the lobby of the psych building @ Harvard Uni.
- 2 pigeons were playing a game of table tennis, pecking a ping-pong ball back & forth to each other across a special table that was sloped toward each of them. --> Observers were fascinated & wondered how such complex behaviour could be learned by pigeons.
- Skinner used shaping by successive approximation
--> 1st they learned to simply peck @ the ping pong table to receive a food pellet.
--> Once established, they had to peck a stationary ball, then a moving ball, then finally peck the ball all the way across to the other side of the table.
--> As the pigeons progressed through these stages, the criteria for receiving the reward became stricter.
--> With training complete, all that was left was for Skinner to place the 2 pigeons on opposite sides of the table & start keeping score.
- Similarly, Skinner was also able to train pigeons to walk in figure-eights, dance, & play a critical role in a prototype pigeon-guided missile system in World War 2. --> he lamented that the military never took him seriously.
32
Chaining
A technique used to develop a sequence of behaviours. Each behaviour is reinforced with the opportunity to perform the next behaviour in a sequence --> can be used to produce even more complex behaviours.
- Chaining is basically adding on increasingly complex behavioural requirements to the og requirements in order to receive the og reinforcer.
- We can see how powerful chaining is in learning complex behaviour when suddenly asked to perform an isolated behaviour from a sequence of chained behaviours.
--> Ex, if I asked u to recall the 4 letters preceding the letter "P", u may also feel the urge to quickly recite ALL the letters prior to "P".
33
Chaining examples
- say a rat is initially trained to press a lever for a food pellet as the last step in a chain of responses.
- The next challenge for the rat is an overhanging string placed nearby. --> The rat must pull the string to gain access to the lever. --> The response of pulling the string is reinforced by the opportunity to make the original lever press response that leads to food.
- textbook question ans ex of chaining: “A gerbil is trained to maneuver through a tiny bar-like scene. First, reward the gerbil for jumping in a miniature margarita glass. Next, reward the gerbil for making their way across the dance floor and then get them to jump in the margarita glass. Finally, add your own custom bar element before the dance floor and repeat the process!”
34
Shaping VS chaining
- Both shaping by successive approximation & chaining r used to learn complex behaviours. --> but, the 2 techniques differ in how the desired behaviour is reinforced.
SHAPING - Reinforced for: Improvement --> behaviour is reinforced only if it is a closer approximation of the desired behaviour than the behaviour last reinforced. --> reinforcing on the basis of improvement
CHAINING - Reinforced for: Correct Order --> reinforces the behaviour so long as it is performed in a defined order. --> The behaviours in the chaining sequence, as well as the order, r set prior to the training.
35
Discriminative Stimulus
- symbol: SD or S+
- important to learn the contingency between a response & reinforcement, but also when that contingency is valid
- The SD signals when a contingency between a particular behaviour & reinforcement is “on”.
--> Ex, the environment of a child’s parents’ home becomes an SD for the vegetable eating behaviour, which is reinforced with a dessert reward.
36
S delta
- symbol: Sδ or S-
- is a cue which indicates when the contingent relationship is not valid.
- Ex, the environment of the grandparents’ home becomes an S-delta for the behaviour of vegetable eating. The child learns that under these conditions, eating vegetables will not lead to a dessert reward.
37
TEXTBOOK: Discriminative stimuli
- A signal to the organism when a given response-reinforcer relationship is valid. Can indicate either the presence (S+) or absence (S-) of the relationship.
- positive discriminative stimulus= S+
- negative discriminative stimulus= S-
38
How do the S+ and S- of instrumental conditioning compare with the CS+ & CS- of classical conditioning?
- CS+ informs u about what will happen; “look alive, the US train is about to arrive.”
- S+ informs u about what could happen if u produce the appropriate behaviour; “if you act now, reinforcers are standing by.”
- The CS- informs u of what will not happen; “the US train will definitely not arrive in the next 20 minutes.”
- S- informs u that a response-reinforcer relationship is not currently valid; “there’s no point in acting now, wait for a better opportunity.”
- Despite these qualitative diffs in info, the mechanics of both instrumental & classical conditioning function similarly with respect to stimulus generalization & discrimination.
39
SD generalization gradient
- rmr in classical conditioning, a CR was elicited not only by the CS that the subject was trained with, but also by cues similar to the og CS. --> This range of responding could be graphed on a generalization gradient
- a similar thing happens with the SD in instrumental conditioning.
--> In our pigeon ex, the bird will learn to respond with pecking to the keyhole when the green light is on, but will also respond with pecking behaviour to lights of a similar wavelength to the original SD.
--> This range of responding to lights can be captured on an SD Generalization Gradient.
40
Pigeon and SD & S-
- In a controlled lab, psychs can manipulate variables like the SD, the S-delta, & the presentation of a reinforcer. --> via these manipulations, psychs can train participants to better discriminate between stimuli, an ability that can be measured on a generalization gradient that displays diffs in responding to systematically differing stimuli.
- say we take a pigeon who learned that pecking a keyhole in the presence of a green light leads to food & then teach it that pecking that same keyhole in the presence of a red light does not lead to food. --> The green light would act as an SD & the red light as an S-Delta --> we will have also shifted the generalization gradient
- in the previous gradient, blue & yellow light each led to a moderate level of pecking behaviour. --> Now pecking in the presence of the blue light remains moderate, but is reduced in yellow light cuz it's intermediate between the green light (SD) & the red light (S-delta).
- The takeaway= when the SD & S-Delta r in the same modality (e.g., colours of light or frequencies of sound), intro of training with an S-delta leads to better stimulus discrimination & fine tuning of behaviour that is more sharply directed to the SD.
41
Contrast effects
- Changes in the value of a reward lead to shifts in response rate.
- Negative contrast occurs when a response originally receiving a high reward is shifted to a lower reward; this results in reduced responding.
- Positive contrast occurs when a response originally receiving a low reward is shifted to a higher reward; this results in increased responding.
- Therefore, how powerful a given reinforcer is depends not only on the absolute value of the reinforcer itself, but also on the relative value of that reinforcer compared to other reinforcers that have been experienced. --> Ex, a person whose pay drops from $5 to $1 will show reduced responding, but a person paid $1 all along will respond the same.
- If a particular response leads to less reinforcement than before, it may be optimal to seek out alternate sources of reinforcement. --> Interestingly, animals placed in a negative contrast effect condition also show an increase in exploratory behaviours, akin to ‘shopping around’ for a better reinforcement opportunity
42
Overjustification effect
- A newly introduced reward for a previously unrewarded task can alter an individual’s perception of that task. --> A task that was previously regarded as having intrinsic value (an activity pursued cuz it is, in and of itself, rewarding) now becomes viewed as work with extrinsic value (an activity undertaken only cuz it leads to reward coming from other sources).
- In an experiment by Lepper & Greene (1973), nursery school children were given the opportunity to draw pictures, an activity that the children found to be enjoyable.
- Some of the children were then rewarded for making drawings with a ‘Good Player’ certificate. --> the rewarded children spent more time on drawing than another group of children who were not rewarded. --> However, when the certificates ran out, the previously rewarded children drastically dropped their drawing time to a level below the unrewarded children & chose to pursue other activities instead.
- Important to keep in mind when interviewing candidates
- Reward systems that r not planned properly can have unintended negative effects, which can be especially important for considerations in educational & applied settings.
43
Continuous reinforcement
- aka CRF= a schedule of reinforcement in which a response leads to a reinforcer on every single trial
- However, in the real world, continuous reinforcement is very rare. --> far more likely that a contingent relationship is reinforced on a partial reinforcement schedule.
44
Partial reinforcement schedules
- aka PRF
- can have reinforcement delivery determined by either total responses or time. --> not every time
45
In cases where reinforcement is not continuous, there r 2 basic methods for determining when reinforcement will be delivered: List Them
- Ratio Schedule
- Interval Schedule
46
A ratio schedule of reinforcement
is based on the # of responses made by a subject, which determines when reinforcement is given.
--> a pigeon on an FR-1 schedule is rewarded with food for each pecking response, while a pigeon on an FR-10 schedule is rewarded with food for every 10th pecking response.
47
interval schedule of reinforcement
- is based on the time since the last response that was reinforced.
--> a pigeon on an FI-1 minute schedule is rewarded with food for the first pecking response after a 1 minute period. --> Over an hour, the pigeon has the potential to earn 60 food pellets.
--> A pigeon on an FI-10 minute schedule is rewarded with food for the first pecking response after a 10 minute period. --> over 1 hour, the pigeon has the potential to earn just 6 food pellets.
48
Variable schedule twist
- In contrast to a fixed schedule, rewards on variable ratio & variable interval schedules r provided following a variable amount of work or length of time, respectively.
- Ex, on a VR-10 schedule, the pigeon must peck an average of 10 times to get food reward, but the exact # of pecks that yields a reward changes across trials. --> When u look @ the overall rate of reinforcement, it works out that, on average, 10 pecks were necessary per trial.
- On a VI-10 min schedule, the first response following an average of 10 min will be reinforced, but the exact length of time between rewards changes across trials. --> When u look at the overall rate of reinforcement, it works out that on average 10 min must pass before a pecking response is reinforced.
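- a minimal sketch (my own illustration, not from the module) of how each schedule type decides when a response earns a reinforcer --> the subject's response rate, the 60 s session & helper names like fixed_ratio r assumptions made up for this demo

```python
# Toy decision rules for FR, VR and FI schedules, applied to a simulated
# stream of responses. VI works the same way as FI but with a randomized wait.

import random

random.seed(0)

def fixed_ratio(n):
    """FR-n: reinforce every nth response."""
    count = 0
    def check(responded, t):
        nonlocal count
        if responded:
            count += 1
            if count >= n:
                count = 0
                return True
        return False
    return check

def variable_ratio(mean_n):
    """VR-mean_n: reinforce after a random # of responses averaging mean_n."""
    count, target = 0, random.randint(1, 2 * mean_n - 1)
    def check(responded, t):
        nonlocal count, target
        if responded:
            count += 1
            if count >= target:
                count, target = 0, random.randint(1, 2 * mean_n - 1)
                return True
        return False
    return check

def fixed_interval(seconds):
    """FI: reinforce the 1st response made once `seconds` have passed since the last reinforcer."""
    last_reinforced = 0
    def check(responded, t):
        nonlocal last_reinforced
        if responded and (t - last_reinforced) >= seconds:
            last_reinforced = t
            return True
        return False
    return check

# Simulate 60 seconds of a subject that responds on ~90% of the seconds.
# Swap fixed_ratio(10) for variable_ratio(10) or fixed_interval(10) to compare.
schedule = fixed_ratio(10)
reinforcers = 0
for t in range(60):
    responded = random.random() < 0.9
    if schedule(responded, t):
        reinforcers += 1
print("reinforcers earned in 60 s:", reinforcers)
```

- swapping in the other helpers lets u compare how the delivery rule (counting responses vs. waiting out an interval) changes when reinforcers arrive for the same amount of responding.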
49
What r the 4 basic types of schedules: List Them
Fixed ratio, variable ratio, fixed interval, & variable interval.
50
Fixed Ratio schedule
- this type is readily demoed in the lab where a pigeon must peck 3 times to receive a food reward. --> This pecking behaviour can be elicited even when up to 100 pecks r required to receive a reward.
--> Ex, a shirt manufacturer may pay a set amount of money for every 3 shirts sewed, effectively placing the worker on an FR-3 schedule.
- However, there is a limit to how stingy an FR schedule can be. --> A schedule that is too stingy will lead to ratio strain, & the subject will stop responding.
51
Fixed Ratio: graph
- graph: “Cumulative # of Responses” on the y-axis in relation to “Time” on the x-axis, showing a repeating pause & run pattern: a diagonal rising line labelled “run” (the pigeon pecking the keyhole) followed by a flat horizontal line labelled “pause” (the pigeon facing away from the keyhole).
- Following reinforcement, a participant will pause with inactivity before beginning the next run of responding.
- To understand why, consider a pigeon who receives a food reward after pecking a keyhole 20 times (an FR-20 schedule). --> If the pigeon is not particularly hungry, it will lack the motivation to work hard. --> so, he will pause before starting the next round of 20 food pecks. --> It’s as if the pigeon is procrastinating before having to start his next job.
52
Post-reinforcement pause
- A period during which the organism momentarily stops responding before starting up again. --> Occurs after reinforcement on a fixed ratio schedule.
53
Ratio Strain
- As the # of responses required for reward increases, the post-reinforcement pause tends to get longer.
- If the required responses continue to increase, the organism will eventually reach break point & stop responding completely.
54
variable ratio schedule
- reinforcement is delivered after some random # of responses around a characteristic mean.
--> Ex, the reinforcement players receive from a slot machine in a casino. --> After some random # of plays set around a pre-set mean, the slot machine returns rewards. --> Naturally, the slot machine is set to have a very low mean payout rate, & we know how powerful it is
55
variable ratio: graph
- As the casino slot machine players demo, the variable ratio schedule is capable of supporting very constant & high response rates.
- And so, a cumulative record of responses reinforced on a variable ratio schedule will tend to look like a diagonal line with no pauses between.
--> VR schedules that deliver more frequent reinforcement will support higher response rates. --> a VR-10 schedule will have a steeper slope than a VR-40 schedule.
--> tho the payout of the slot machine is random, the player knows that the only way to possibly get a reward is to continue playing, & so they do.
--> a player may become emotionally attached to the particular slot machine he is playing, feeling that it is “warmed up”, & will want to protect it from another player who may try to intrude on his investment. --> in reality, the VR mean payout is set so low that these concerns r not warranted from a statistical perspective. The payouts also tend to be averaged across several machines, not just the 1 the player is using.
56
fixed interval schedule
- reinforcement is delivered following the 1st response after a set interval of time.
--> Ex, on an FI-1 min schedule, a rat is reinforced for the 1st lever press that occurs at least 1 minute after the last reinforcement was delivered.
--> Note that the subject is free to respond at any time, but these responses will have no effect until the interval has passed. --> thus the scallop shape (I think)
--> A perfect fixed interval schedule is rarely seen outside the lab, but a good ex of the pattern created by a fixed interval schedule would be a course with weekly quizzes --> For many, this means that study behaviour responses will start ramping up just before the quiz. --> Immediately following the quiz, the study response behaviour will likely pause for a period, before starting the process again to ramp up for the next quiz
57
fixed interval: graph
- Fixed interval schedules produce a cumulative record with a characteristic scallop pattern. --> Following reinforcement, there is a lull period in which responding drops, then slowly starts picking up again, peaking just before the next reinforcement is scheduled to be delivered following a response.
--> makes sense, cuz the individual does not want to miss the reinforcement window, but there's no direct reinforcement for responding well beforehand
58
variable interval schedule
- u could receive reinforcement @ any time, tho u do have an idea of how often reinforcement is likely to come up.
--> Ex, a course that has pop quizzes that can happen @ any point in time. So, if you’re a diligent student, this means that study behaviour would continue @ a steady rate to ensure u were prepared when the dreaded pop quiz is announced --> the same principle makes random drug testing of athletes more effective than regularly scheduled testing in promoting drug-free training behaviour.
- a participant on a VI schedule tends to respond @ a very steady rate, which ensures that they will not miss an opportunity for reinforcement. --> This steady rate of responding is shown as an increasing straight line on the cumulative record.
--> As u can imagine, a VI schedule that delivers more frequent reinforcement will support higher response rates. --> On a VI-2 min schedule, a participant can potentially earn 30 reinforcers in 1 hour. --> a VI-2 min schedule leads to a steeper slope than a VI-6 min schedule.
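- a rough toy model (my own, not from the module) of why the FI record scallops while the VI record climbs steadily --> it assumes the subject ramps up responding as a known FI interval elapses but responds at a constant rate on VI; all the #s r made up for the demo

```python
# Compare simulated cumulative records under FI vs VI with a simple
# assumed behaviour model (ramping vs steady response probability).

import random
import matplotlib.pyplot as plt

random.seed(2)
INTERVAL = 30       # FI-30 s, or a VI schedule with a 30 s mean
SESSION = 300       # seconds of simulated session

def simulate(variable):
    wait = INTERVAL                 # current interval that must elapse
    since_reinforcement = 0
    total, record = 0, []
    for t in range(SESSION):
        since_reinforcement += 1
        if variable:
            p = 0.4                                    # steady responding on VI
        else:
            p = min(1.0, since_reinforcement / wait)   # ramp up as the FI interval elapses
        if random.random() < p:                        # the subject responds
            total += 1
            if since_reinforcement >= wait:            # 1st response after the interval is reinforced
                since_reinforcement = 0
                if variable:                           # VI: draw a new random interval (mean = INTERVAL)
                    wait = random.randint(1, 2 * INTERVAL - 1)
        record.append(total)
    return record

plt.plot(simulate(variable=False), label="FI-30 s (scalloped)")
plt.plot(simulate(variable=True), label="VI-30 s (steady)")
plt.xlabel("Time (s)")
plt.ylabel("Cumulative # of responses")
plt.legend()
plt.show()
```

- note the scallop here comes entirely from the assumed ramp-up in responding (the behaviour pattern the cards above describe); the schedule itself only decides which response gets reinforced.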
59
Extinction & Schedules
- behaviours learned on a partial reinforcement schedule r far more robust (meaning more resistant to extinction) than those trained on a continuous reinforcement schedule. --> In instrumental conditioning, extinction refers to the stopping of a desired behaviour once reinforcement is no longer given.
- On a partial reinforcement (PRF) schedule, once reinforcement stops occurring, it is not immediately obvious that an abrupt change has happened & that no further reinforcements will be delivered. --> For this reason, it is often best to train behaviours using PRF rather than CRF schedules if u r interested in having the behaviour maintained over a long period.
- The same logic follows when discussing variable & fixed reinforcement schedules that use the same # of responses or set interval of time. --> Variable schedules r more resistant to extinction than fixed schedules since individuals have fewer expectations about when reinforcement is coming. --> As a result, it takes longer on a variable schedule to realize reinforcement is never coming again, & the learner continues to perform the behaviour. --> So, a VR-5 schedule will be more robust than an FR-5 schedule.
60
If classical conditioning involves _____________________ ___________________, instrumental conditioning involves ______________________ _______________________________
If classical conditioning involves forming new reflexive responses, instrumental conditioning involves forming new voluntary behaviours that direct goal-centered actions.
61
many of us feel the constant need to check our cell phones for any sort of update or bit of communication that keeps us in tune with the greater social network we r a part of. WHY?
- mechanisms of learning, operant conditioning in particular, play a large role.
- Receiving messages is based on a variable interval schedule of reinforcement. --> this is a partial schedule of reinforcement, meaning there is a high resistance towards extinction --> we don’t necessarily get constant messages, so when there is a period of time without messages, we don’t think that we’re never going to get a message again.
- cuz the reinforcement is unpredictable, it encourages a steady rate of responding. --> We don’t always know when messages r going to come, so we constantly check our phones in hopes that one will be there.
62
In both classical & instrumental conditioning, learning occurs as a result of _________ experience
direct -->subjects must actually experience the US or the reinforcer/punisher in order for behaviour to be modified.
63
recalling Latent learning
- Keeping in mind the distinction discussed earlier between learning & performance, we have already seen that learning may remain latent until the subject is put in a context where this learning is relevant. - This latent learning is still based on the subject’s own direct experience, it is just that this experience will not be reflected in performance until the subject is in the appropriate context.
64
Observational Learning
- In many cases, we can see instances where an individual learns by observing the experience of others, especially when we r in unfamiliar situations
- watching others learn results in cultural transmission
- Albert Bandura & colleagues (1960s) conducted the 1st experimental studies on observational learning.
- Purpose: To determine the extent to which children learned to behave aggressively as a result of observing aggressive behaviour in others, a phenomenon that still inspires research & debate today
- placed children in a room with an inflatable Bobo Doll that bounced back when hit & other toys. Some children watched a film of an adult aggressively hitting or kicking the doll; others did not.
- When the children that saw the film entered the room, they showed a strong tendency to immediately begin attacking the doll, often displaying behaviour that was even more aggressive than in the movie clip.
- Other children were much less likely to display this type of behaviour when given the same opportunity to play with the toys. --> Clearly the children were modeling their aggressive behaviour on what they had observed others doing; they had learned by observation to direct aggressive behaviour @ a particular target.
65
Bandura's work's implications
- directly addressed an issue that continues to be of concern today, which is the relationship between viewing violence in media (particularly on television & in video games) & aggressive behaviour in children. --> violent behaviours r encouraged in video games e.g., u won't get very far in GTA by taking a non-violent approach
- Other research suggests that the average North American child spends as much time viewing or engaging with entertainment media each week as the typical working adult spends at their job (about 40 hours)
- Furthermore, children (& adults, for that matter) who r exposed to more aggressive behaviour in the media appear to display more aggressive behaviour & have more aggressive thoughts
66
Final Thoughts on Learning
- Whereas classical conditioning involves elicited behaviour triggered by a stimulus, instrumental conditioning involves voluntary behaviours emitted by the organism, & observational learning need not involve any immediate change in behaviour.
- classical conditioning involves learning the relation between stimuli (CS & US), whereas instrumental conditioning & observational learning involve learning the relation between behaviour & its consequence.
- In some ways these assigned categories may be artificial distinctions, & some learning theorists have suggested that the same underlying mechanisms may be responsible for all forms of learning. --> At the very least, we can observe that classical & instrumental conditioning work together in learning situations in the real world.
--> Ex, the chapter author had a colleague intro him to the world of pungent French cheeses. One day, the author had the unfortunate experience of food poisoning after eating a special type of cheese called Petite Muenster, which has a characteristically strong odor. --> this single trial has had long-lasting effects. To this day, the pungent odor is a CS that automatically triggers a CR of queasiness. The odor also plays another role, acting as a discriminative stimulus for the author to plug his nose (an instrumental response), which is reinforced by the desirable consequence of reducing the inflow of the offending odor (negative reinforcement).
67
SLIDES: Superstition in pigeons
- Skinner 1948
- put a pigeon in a box and would periodically give it food. --> now while waiting for food, the pigeon would do random pigeon behaviour like flapping its wings
- say that it just happens to flap its wings right before the food is given, it'll make a correlation between the flapping behaviour & food --> so now, when the pigeon wants food, it'll flap its wings
68
Why do unboxing videos & pimple-popping videos go viral?
- we feel rewarded --> we gain from the learning trials of others
- humans r highly social animals & we have survived due to this
- we don't have to undergo the trial ourselves to imagine its effects --> benefit of observational learning
69
SLIDES CONCLUSION: Learning is important for anticipating events in our world. (3 POINTS)
1. We learn associative relationships between stimuli, behaviour & consequences.
2. Adaptive functions of learning r embedded in biology & refined by environmental input.
3. We can learn explicitly & implicitly.