Instrumental conditioning Flashcards

(69 cards)

1
Q

Instrumental Conditioning is

A
  • aka operant conditioning= learning of a contingency between a voluntary behaviour & its consequence
  • requires explicit training
    –> An important point to keep in mind for any type of instrumental conditioning is that it proceeds best when the consequence immediately follows the response
2
Q

In the wake of Darwin’s theory of evolution by natural selection, a debate was sparked concerning the extent to which mental abilities, such as problem solving, were conserved across species. Elaborate

A
  • Darwinian naturalists reported seemingly amazing accounts of intellectual achievements in animals, including stories of cats baiting a lawn with scattered breadcrumbs to lure unsuspecting birds.
  • How could this apparently sophisticated problem-solving behaviour in animals be objectively evaluated?
    –>rmr the data emerging from Pavlov’s laboratory concerned only reflexive learning, diff from this type of skill.
    –> A new line of research originating from the lab of Edward L. Thorndike (1874-1949) provided an answer. –> While Pavlov was investigating conditional reflexes with dogs in Russia, Thorndike was investigating a different type of learning with cats in America.
3
Q

psych Edward L. Thorndike

A
  • tho we care mostly bout humans, much of the early work was done in experiments on animals that focused on behaviour & ignored mental processes.
  • Thorndike began his investigations by studying cats in a puzzle box.
    –> Puzzle box= a small chamber with a door that can be opened by performing a specific behaviour like pulling on a rope –> basically an escape room
  • outside was a small dish of food that provided motivation for the hungry cat to escape.
  • Over several trials, Thorndike observed their behaviours & recorded their escape time.
4
Q

Thorndike Puzzle Box experiment: Results

A
  • @ first, the cat performed random behaviours as it tried to escape
    –> it would eventually come upon the solution of pulling on the rope by accident.
  • Thorndike predicted that on trials following this discovery of the correct solution, the cat would then escape immediately when placed in the same puzzle box
  • Instead, he found that the frequency of the random behaviours generally decreased over time
    –> over trials, the random behaviours that didn’t lead to escape would occur less frequently, leaving only the correct target behaviour in place.
  • suggested that animals followed a simple stimulus-response type process with little credit for consciousness.
    –> Unlike in humans, there was never a distinct “aha” moment.
    –> The cat seemed to be working from a long trial-and-error process of discovery.
5
Q

Stamping in & stamping out hypothesis

A
  • Thorndike hypothesized processes called “Stamping In” & “Stamping Out”, which determined whether a behaviour was maintained or eliminated, respectively.
    –> behaviours like rope pulling were stamped in cuz they had the favourable consequence of food access.
  • random behaviours like turning in a circle were stamped out.
  • Eventually, this general process leads to refinement & the cat learns the contingency between the specific behaviour of rope pulling & the specific consequence of food reward.
  • In this view, there was no need to attribute any special intellectual sophistication to the cat.–> Rather, the apparent problem-solving behaviour was more accurately described as a change in the probabilities of the various possible responses.
6
Q

Law of Effect

A
  • Thorndike’s intuitive law: Behaviours with positive consequences r stamped in & performed more frequently. Those with negative consequences r stamped out & performed less frequently
  • Just how strongly these responses r stamped in or out is proportional to the consequences (satisfying or unsatisfying) of the response.
7
Q

Skinner & Operant Behaviour

A
  • If Thorndike was the father of instrumental conditioning, B.F. Skinner was its doting uncle. –> It was Skinner who popularized the Law of Effect & pushed instrumental conditioning to the forefront of learning theory.
  • Coined the term operant, as in operant conditioning
  • coined/preferred the term reinforcer instead of referring to a ‘satisfying effect’ or ‘reward’ that followed a response
8
Q

Operant Chamber

A
  • aka a Skinner Box.
  • A special chamber with a lever or other mechanism by which an animal could respond to produce a reinforcer.
  • Skinner used it for the experimental study of operant (instrumental) conditioning.
  • the response rate (e.g., lever presses over time) was automatically recorded with a device= a cumulative recorder
9
Q

Cumulative recorder

A
  • Records the cumulative response rate during an instrumental conditioning experiment.
    –> In a typical experiment, the animal stayed in the chamber for a set interval of time, & analyzing the output of the cumulative recorder would allow learning to be assessed.
10
Q

The Skinner box had many advantages over Thorndike’s puzzle box. Explain.

A
  • Trials could be shorter, there were no constraints on responding, & after completing a response & experiencing its effect, the animal remains in the box, and is free to respond again (and again and again!)
11
Q

A key diff from classical conditioning is that in the case of instrumental conditioning…

A

…we r considering overt behaviours that r operated by an actor, leading to a reinforcer.
- That’s why it’s aka “operant conditioning”.

12
Q

Similarities between classical conditioning & instrumental conditioning

A
  • in instrumental conditioning, the more pairings there r between an operant response (e.g., lever press) & its consequence (e.g., food), the stronger the acquired learning. –> If conditions change & the operant response is no longer paired with its consequence, the result is a decline in responding leading to extinction. –> Same idea with CS & US
  • like classical conditioning, extinction in instrumental conditioning isn’t ‘unlearning’ the response. –>Rather, new learning is layered on top of the previously learned response. –> supported by the observation that following extinction, an instrumental response can show spontaneous recovery & faster reacquisition.
13
Q

A reinforcer is

A

Any stimulus, presented after a response, that impacts the frequency with which the response is performed

  • tho the Law of Effect is important in understanding learning, it’s not entirely clear how to define the “satisfying or annoying states” which determine the behaviour frequency. –> a more precise strategy is to refer to the reinforcer
14
Q

Primary reinforcers

A

A reinforcer with intrinsic value, such as food, water or a mate.

15
Q

Secondary Reinforcer

A
  • A reinforcer that can be exchanged for a primary reinforcer. Money is the most commonly used for humans.
    –> established through classical conditioning
    –> paper rectangles, round pieces of metal, & small plastic cards have little intrinsic value (i.e., by themselves, they do not produce any particularly satisfying effects), but can be used to obtain items that r natural reinforcers.
    –> Thus, these secondary reinforcers can be powerful motivators of behaviour.
  • Similarly, grades, air miles, gold stars, coupons, & status symbols act as powerful reinforcers & influence ur behaviour, but only to the extent that they have been associated with other primary reinforcers. –> The principle here is the same as in classical conditioning with respect to higher-order conditioning.
16
Q

Behavioural responses r changed by both ________________ & ________________, each of which can either be ____________ or ____________. Elaborate

A
  • positive reinforcers, negative reinforcers, presented, removed

–> leads to 4 diff types of instrumental conditioning: Presenting or removing a positive reinforcer, & presenting or removing a negative reinforcer.
–> Depending on what target behaviour u r trying to affect, various types of instrumental conditioning may be appropriate.

17
Q

Reward training

A
  • the presentation of a positive reinforcer following a response, which increases the frequency of the behaviour being reinforced
    –> Ex, if you present a person with a cold drink every time he puts money into a machine, the frequency of this behaviour is likely to increase.
18
Q

Punishment training

A
  • the presentation of a negative reinforcer following a response, which decreases the frequency of the behaviour being reinforced
  • Ex, if putting money into a vending machine was followed by a mild electric shock, the frequency of the behaviour would likely decrease very quickly.
  • tho the use of punishment training is sound from a theoretical perspective, it can be controversial when applied in the real world. –> use of punishment must consider the ethics of inflicting fear or distress on the recipient
  • Many learning theorists, including B.F. Skinner, have suggested that when punishment is used, the authority figure may become a signal for fear or distress through classical conditioning. –> Consider how such a contingency may impact a parent-child relationship.
19
Q

Omission training

A
  • A positive reinforcer is being removed/omitted (thus the name), leading to a decrease in the behaviour being reinforced
  • Not to be confused with punishment; omission training is something we apply broadly on a daily basis
    –> Ex, the time-out procedure used by parents & schools –> the child must leave the play area & sit alone for some time without access to the toys or friends that the other children r free to enjoy= removal of positive reinforcers
20
Q

Escape training

A
  • a response is followed by the removal of a negative reinforcer (“escaping” an undesired situation), leading to an increase in that response behaviour.
  • it’s the removal of an aversive stimulus (i.e., something unpleasant)
    –> Ex, a child can avoid their chores on the weekend (the negative reinforcer) by performing a response of completing homework.
    –> Ex, in a rat experiment, the floor of 1 side of the cage delivers a constant mild electric shock; it can be avoided if the rat moves to the opposite side of the cage.
21
Q

Immediate & Delayed Consequences

A
  • Training is most effective when the consequence immediately follows the target behaviour rather than being delayed.–> allows an organism to accurately associate the correct behaviour with the reinforcer.
  • This is especially evident when training animals.
22
Q

Delay of gratification

A
  • in some circumstances both human & non-human subjects show the ability to respond to reinforcers that may not be immediately delivered.–> Ex, In most employment situations, u r required to wait until ur scheduled payday to receive ur financial reinforcer–> Ex, rat in operant chamber doesn’t need to be reinforced following every lever press in order to keep pressing.
  • ability to tolerate delay of gratification begins to develop in childhood, & young children who show difficulty tolerating such delays tend to have more difficulty in coping with stress & frustration later as adolescents
    –> In adults, the inability to delay gratification may play an influential role in substance abuse & addiction.–>immediately rewarding consequences of drug taking behaviour may overshadow the delayed, & possibly greater, benefits of abstinence.
23
Q

the Testing Effect

A
  • The Testing Effect= the phenomenon that learning is better facilitated by testing (forced memory recall) than by repeated episodes of studying & reviewing material. –> thus, the best way to learn new concepts is through lots of testing, even pop quizzes
  • In order for it to work tho, test material difficulty level should allow for relatively high success rates. –>if profs make the tests so hard that recall is, on average, nearly impossible, successful learning will not occur.
  • Another important factor to consider is the role of feedback. –> Receiving full feedback after the completion of tests is extremely important cuz how can u improve if u don’t know what u did wrong & why?
24
Q

Roediger & the Testing Effect

A
  • Roediger & Karpicke suggested that the testing effect results from processes of encoding during studying & testing.
  • When material is studied, the processes used to retrieve items when attempting to recall r also encoded.
    –> This process of retrieval is what is activated during testing, & by testing, individuals have practice accessing the material, something that simply reviewing material can’t offer.
  • The testing effect has obvious benefits for testing in educational systems. –> As long as tests r fair & the appropriate feedback is given, frequent testing is actually a beneficial, effective way of learning. –> However, the testing effect is also quite relevant for test preparation. –> Instead of reading info repeatedly, it is best to continuously test urself! –> do it urself or get friends & family to help out
    –> the more practice retrieving the material, the stronger & more long-lasting the learning
25
Acquisition
- the process of acquisition leads to learning the contingency between a response & its consequence.
--> In acquisition studies of instrumental conditioning, psychs r often interested in measuring the rate of responding of the new behaviour.
26
Acquisition: Graphing responses
- The response rate for a given behaviour can be visualized using a cumulative recorder
--> Essentially, a long piece of paper flows through the machine at a constant rate as a pen draws a straight line.
--> With each response made by the subject, the pen moves up a notch, leading to a characteristic pattern of acquisition.
- In modern studies of learning, the og pen & paper model of the cumulative recorder may be replaced by an automated computer system.
27
Acquisition: Graph for a TYPICAL experiment
- The flat horizontal line indicates when the subject is not responding, while an upward slope indicates when a response has been made.
- cuz the y-axis indicates the cumulative record, we see that the graph continues growing in an upwards direction, as it tallies all previous responses in a permanent record.
- The pattern of responding depends on a # of factors including the participant, behaviour complexity, & type of reinforcement used.
- NOTE: the graphs seen in the module show the results of reward training, where the frequency of the behaviour increases. --> Each response is followed by a food reward.
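- a minimal sketch (my own illustration, not from the module) of how a cumulative record could be simulated & plotted in software, standing in for the pen & paper recorder --> the session length, response probabilities & gradual learning curve r assumptions chosen just for the demo

```python
# Toy simulation of a cumulative record during reward training.
# Assumed setup: the subject's response probability slowly rises
# over the session to mimic acquisition.

import random
import matplotlib.pyplot as plt

random.seed(1)

session_length = 600       # seconds in the simulated session
p_start = 0.01              # chance of responding in a given second, early on
p_end = 0.20                # chance late in the session, after learning

cumulative, count = [], 0
for t in range(session_length):
    # Linearly interpolate the response probability to mimic acquisition.
    p = p_start + (p_end - p_start) * t / session_length
    if random.random() < p:
        count += 1            # the "pen moves up a notch" with each response
    cumulative.append(count)  # flat stretch = no responding; upward slope = responding

plt.plot(range(session_length), cumulative)
plt.xlabel("Time (s)")
plt.ylabel("Cumulative # of responses")
plt.title("Simulated cumulative record (reward training)")
plt.show()
```

- flat stretches in the output correspond to pauses in responding & steeper stretches to faster responding, just like on the paper record.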
28
Autoshaping
- essentially it's learning a contingency between a behaviour & its consequence without careful guidance by the researcher (my definition)
--> Ex, Thorndike's puzzle box, where the cat learned to escape by pulling a rope with no guidance.
--> Ex, a pigeon placed in a cage where it gets food every time it pecks the keyhole
--> over time, the pigeon will peck it accidentally & eventually learn the contingency between the behaviour & the consequence
29
Shaping by successive approximation
- not all behaviours can be autoshaped cuz some instrumental responses r far too complex for a participant to discover on their own. --> Ex, teaching a dolphin to do a backflip
- complex behaviours can be shaped by successive approximation= The complex behaviour can be organized into smaller approximations which gradually build up to the full response we hope to condition.
--> EACH of these approximations can be reinforced through the presentation of a reward.
--> Over time, the successive approximations lead to the final complex behaviour.
- even when the target behaviour could otherwise be learned through trial & error, shaping can dramatically reduce acquisition time. --> This technique is used extensively by animal trainers. (& coaches --> textbook)
30
Shaping Def textbook
- Used when a desired behaviour is too complex for a subject to discover on their own in a single step. The behaviour is broken down into smaller, easier steps, eventually leading to the more complex behaviour.
31
A Famous example of shaping
- comes from the noted behaviourist BF Skinner, who set up an unusual display in the lobby of the psych building @ Harvard Uni.
- 2 pigeons were playing a game of table tennis, pecking a ping-pong ball back & forth to each other across a special table that was sloped toward each of them. --> Observers were fascinated & wondered how such complex behaviour could be learned by pigeons.
- Skinner used shaping by successive approximation
--> 1st they learned to simply peck @ the ping pong table to receive a food pellet.
--> Once established, they had to peck a stationary ball, then a moving ball, then finally peck the ball all the way across to the other side of the table.
--> As the pigeons progressed through these stages, the criteria for receiving the reward became stricter.
--> With training complete, all that was left was for Skinner to place the 2 pigeons on opposite sides of the table & start keeping score.
- Similarly, Skinner was also able to train pigeons to walk in figure-eights, dance, & play a critical role in a prototype pigeon-guided missile system in World War 2. --> he lamented that the military never took him seriously.
32
Chaining
A technique used to develop a sequence of behaviours. Each behaviour is reinforced with the opportunity to perform the next behaviour in a sequence --> can be used to produce even more complex behaviours.
- Chaining is basically adding on increasingly complex behavioural requirements to the og requirements in order to receive the og reinforcer.
- We can see how powerful chaining is in learning complex behaviour when suddenly asked to perform an isolated behaviour from a sequence of chained behaviours.
--> Ex, if I asked u to recall the 4 letters preceding the letter "P", u may also feel the urge to quickly recite ALL the letters prior to "P".
33
Chaining examples
- say a rat is initially trained to press a lever for a food pellet as the last step in a chain of responses.
- The next challenge for the rat is an overhanging string placed nearby. --> The rat must pull the string to gain access to the lever. --> The response of pulling the string is reinforced by the opportunity to make the original lever press response that leads to food.
- textbook question ans ex of chaining: “A gerbil is trained to maneuver through a tiny bar-like scene. First, reward the gerbil for jumping in a miniature margarita glass. Next, reward the gerbil for making their way across the dance floor and then get them to jump in the margarita glass. Finally, add your own custom bar element before the dance floor and repeat the process!”
34
Shaping VS chaining
- Both shaping by successive approximation & chaining r used to learn complex behaviours. --> but, the 2 techniques differ in how the desired behaviour is reinforced.
SHAPING - Reinforced for: Improvement --> behaviour is reinforced only if it is a closer approximation of the desired behaviour than the behaviour last reinforced. --> reinforcing on the basis of improvement
CHAINING - Reinforced for: Correct Order --> reinforces the behaviour so long as it is performed in a defined order. --> The behaviours in the chaining sequence, as well as the order, r set prior to the training.
35
Discriminative Stimulus
- symbol: SD or S+
- important to learn the contingency between a response & reinforcement, but also when that contingency is valid
- The SD signals when a contingency between a particular behaviour & reinforcement is “on”.
--> Ex, the environment of a child’s parents’ home becomes an SD for the vegetable eating behaviour, which is reinforced with a dessert reward.
36
S delta
- symbol: Sδ or S-
- is a cue which indicates when the contingent relationship is not valid.
- Ex, the environment of the grandparents’ home becomes an S-delta for the behaviour of vegetable eating. The child learns that under these conditions, eating vegetables will not lead to a dessert reward.
37
TEXTBOOK: Discriminative stimuli
- A signal to the organism when a given response-reinforcer relationship is valid. Can indicate either the presence (S+) or absence (S-) of the relationship.
- positive discriminative stimulus= S+
- negative discriminative stimulus= S-
38
How do the S+ and S- of instrumental conditioning compare with the CS+ & CS- of classical conditioning?
- CS+ informs u about what will happen; “look alive, the US train is about to arrive.”
- S+ informs u about what could happen if u produce the appropriate behaviour; “if you act now, reinforcers are standing by.”
- The CS- informs u of what will not happen; “the US train will definitely not arrive in the next 20 minutes.”
- S- informs u that a response-reinforcer relationship is not currently valid; “there’s no point in acting now, wait for a better opportunity.”
- Despite these qualitative diffs in info, the mechanics of both instrumental & classical conditioning function similarly with respect to stimulus generalization & discrimination.
39
SD generalization gradient
- rmr in classical conditioning, a CR was elicited not only by the CS that the subject was trained with, but also by cues similar to the og CS. --> This range of responding could be graphed on a generalization gradient
- a similar thing happens with the SD in instrumental conditioning.
--> In our pigeon ex, the bird will learn to respond with pecking to the keyhole when the green light is on, but will also respond with pecking behaviour to lights of a similar wavelength to the original SD.
--> This range of responding to lights can be captured on an SD Generalization Gradient.
40
Pigeon and SD & S-
- In a controlled lab, psychs can manipulate variables like the SD, the S-delta, & the presentation of a reinforcer. --> via these manipulations, psychs can train participants to better discriminate between stimuli, an ability that can be measured on a generalization gradient that displays diffs in responding to systematically differing stimuli.
- say we take a pigeon who learned that pecking a keyhole in the presence of a green light leads to food & then teach it that pecking that same keyhole in the presence of a red light does not lead to food. --> The green light would act as an SD & the red light as an S-Delta --> we will have also shifted the generalization gradient
- in the previous gradient, blue & yellow light each led to a moderate level of pecking behaviour. --> Now pecking in the presence of the blue light remains moderate, but is reduced in yellow light cuz it's intermediate between the green light (SD) & the red light (S-delta).
- The takeaway= when the SD & S-Delta r in the same modality (e.g., colours of light or frequencies of sound), intro of training with an S-delta leads to better stimulus discrimination & fine tuning of behaviour that is more sharply directed to the SD.
41
Contrast effects
- Changes in the value of a reward lead to shifts in response rate.
- Negative contrast occurs when a response originally receiving a high reward is shifted to a lower reward; this results in reduced responding.
- Positive contrast occurs when a response originally receiving a low reward is shifted to a higher reward; this results in increased responding.
- Therefore, how powerful a given reinforcer is depends not only on the absolute value of the reinforcer itself, but also on the relative value of that reinforcer compared to other reinforcers that have been experienced. --> Ex, a person whose pay drops from $5 to $1 will show reduced responding, but a person paid $1 all along will respond the same.
- If a particular response leads to less reinforcement than before, it may be optimal to seek out alternate sources of reinforcement. --> Interestingly, animals placed in a negative contrast effect condition also show an increase in exploratory behaviours, akin to ‘shopping around’ for a better reinforcement opportunity
42
Overjustification effect
- A newly introduced reward for a previously unrewarded task can alter an individual’s perception of that task. --> A task that was previously regarded as having intrinsic value (an activity pursued cuz it is, in and of itself, rewarding) now becomes viewed as work with extrinsic value (an activity undertaken only cuz it leads to reward coming from other sources).
- In an experiment by Lepper & Greene (1973), nursery school children were given the opportunity to draw pictures, an activity that the children found to be enjoyable.
- Some of the children were then rewarded for making drawings with a ‘Good Player’ certificate. --> the rewarded children spent more time on drawing than another group of children who were not rewarded. --> However, when the certificates ran out, the previously rewarded children drastically dropped their drawing time to a level below the unrewarded children & chose to pursue other activities instead.
- Important to keep in mind when interviewing candidates
- Reward systems that r not planned properly can have unintended negative effects, which can be especially important for considerations in educational & applied settings.
43
Continuous reinforcement
- aka CRF= a schedule of reinforcement in which a response leads to a reinforcer on every single trial
- However, in the real world, continuous reinforcement is very rare. --> far more likely that a contingent relationship is reinforced on a partial reinforcement schedule.
44
Partial reinforcement schedules
- aka PRF
- can have reinforcement delivery determined by either total responses or time. --> not every time
45
In cases where reinforcement is not continuous, there r 2 basic methods for determining when reinforcement will be delivered: List Them
- Ratio Schedule
- Interval Schedule
46
A ratio schedule of reinforcement
is based on the # of responses made by a subject, which determines when reinforcement is given.
--> a pigeon on an FR-1 schedule is rewarded with food for each pecking response, while a pigeon on an FR-10 schedule is rewarded with food for every 10th pecking response.
47
interval schedule of reinforcement
- is based on the time since the last response that was reinforced.
--> a pigeon on an FI-1 minute schedule is rewarded with food for the first pecking response after a 1 minute period. --> Over an hour, the pigeon has the potential to earn 60 food pellets.
--> A pigeon on an FI-10 minute schedule is rewarded with food for the first pecking response after a 10 minute period. --> over 1 hour, the pigeon has the potential to earn just 6 food pellets.
48
Variable schedule twist
- In contrast to a fixed schedule, rewards on variable ratio & variable interval schedules r provided following a variable amount of work or length of time, respectively.
- Ex, on a VR-10 schedule, the pigeon must peck an average of 10 times to get food reward, but the exact # of pecks that yields a reward changes across trials. --> When u look @ the overall rate of reinforcement, it works out that, on average, 10 pecks were necessary per trial.
- On a VI-10 min schedule, the first response following an average of 10 min will be reinforced, but the exact length of time between rewards changes across trials. --> When u look at the overall rate of reinforcement, it works out that on average 10 min must pass before a pecking response is reinforced.
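- a minimal sketch (my own illustration, not from the module) of how each schedule type decides when a response earns a reinforcer --> the subject's response rate, the 60 s session & helper names like fixed_ratio r assumptions made up for this demo

```python
# Toy decision rules for FR, VR and FI schedules, applied to a simulated
# stream of responses. VI works the same way as FI but with a randomized wait.

import random

random.seed(0)

def fixed_ratio(n):
    """FR-n: reinforce every nth response."""
    count = 0
    def check(responded, t):
        nonlocal count
        if responded:
            count += 1
            if count >= n:
                count = 0
                return True
        return False
    return check

def variable_ratio(mean_n):
    """VR-mean_n: reinforce after a random # of responses averaging mean_n."""
    count, target = 0, random.randint(1, 2 * mean_n - 1)
    def check(responded, t):
        nonlocal count, target
        if responded:
            count += 1
            if count >= target:
                count, target = 0, random.randint(1, 2 * mean_n - 1)
                return True
        return False
    return check

def fixed_interval(seconds):
    """FI: reinforce the 1st response made once `seconds` have passed since the last reinforcer."""
    last_reinforced = 0
    def check(responded, t):
        nonlocal last_reinforced
        if responded and (t - last_reinforced) >= seconds:
            last_reinforced = t
            return True
        return False
    return check

# Simulate 60 seconds of a subject that responds on ~90% of the seconds.
# Swap fixed_ratio(10) for variable_ratio(10) or fixed_interval(10) to compare.
schedule = fixed_ratio(10)
reinforcers = 0
for t in range(60):
    responded = random.random() < 0.9
    if schedule(responded, t):
        reinforcers += 1
print("reinforcers earned in 60 s:", reinforcers)
```

- swapping in the other helpers lets u compare how the delivery rule (counting responses vs. waiting out an interval) changes when reinforcers arrive for the same amount of responding.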
49
What r the 4 basic types of schedules: List Them
Fixed ratio, variable ratio, fixed interval, & variable interval.
50
Fixed Ratio schedule
- this type is readily demoed in the lab where a pigeon must peck 3 times to receive a food reward. --> This pecking behaviour can be elicited even when up to 100 pecks r required to receive a reward.
--> Ex, a shirt manufacturer may pay a set amount of money for every 3 shirts sewed, effectively placing the worker on an FR-3 schedule.
- However, there is a limit to how stingy an FR schedule can be. --> A schedule that is too stingy will lead to ratio strain, & the subject will stop responding.
51
Fixed Ratio: graph
- graph: “Cumulative # of Responses” on the y-axis in relation to “Time” on the x-axis, showing a repeating pause & run pattern: a diagonal rising line labelled “run” (the pigeon pecking the keyhole) followed by a flat horizontal line labelled “pause” (the pigeon facing away from the keyhole).
- Following reinforcement, a participant will pause with inactivity before beginning the next run of responding.
- To understand why, consider a pigeon who receives a food reward after pecking a keyhole 20 times (an FR-20 schedule). --> If the pigeon is not particularly hungry, it will lack the motivation to work hard. --> so, he will pause before starting the next round of 20 food pecks. --> It’s as if the pigeon is procrastinating before having to start his next job.
52
Post-reinforcement pause
- A period during which the organism momentarily stops responding before starting up again. --> Occurs after reinforcement on a fixed ratio schedule.
53
Ratio Strain
- As the # of responses required for reward increases, the post-reinforcement pause tends to get longer.
- If the required responses continue to increase, the organism will eventually reach break point & stop responding completely.
54
variable ratio schedule
- reinforcement is delivered after some random # of responses around a characteristic mean.
--> Ex, the reinforcement players receive from a slot machine in a casino. --> After some random # of plays set around a pre-set mean, the slot machine returns rewards. --> Naturally, the slot machine is set to have a very low mean payout rate, & we know how powerful it is
55
variable ratio: graph
- As the casino slot machine players demo, the variable ratio schedule is capable of supporting very constant & high response rates.
- And so, a cumulative record of responses reinforced on a variable ratio schedule will tend to look like a diagonal line with no pauses between.
--> VR schedules that deliver more frequent reinforcement will support higher response rates. --> a VR-10 schedule will have a steeper slope than a VR-40 schedule.
--> tho the payout of the slot machine is random, the player knows that the only way to possibly get a reward is to continue playing, & so they do.
--> a player may become emotionally attached to the particular slot machine he is playing, feeling that it is “warmed up”, & will want to protect it from another player who may try to intrude on his investment. --> in reality, the VR mean payout is set so low that these concerns r not warranted from a statistical perspective. The payouts also tend to be averaged across several machines, not just the 1 the player is using.
56
fixed interval schedule
- reinforcement is delivered following the 1st response after a set interval of time.
--> Ex, on an FI-1 min schedule, a rat is reinforced for the 1st lever press that occurs at least 1 minute after the last reinforcement was delivered.
--> Note that the subject is free to respond at any time, but these responses will have no effect until the interval has passed. --> thus the scallop shape (I think)
--> A perfect fixed interval schedule is rarely seen outside the lab, but a good ex of the pattern created by a fixed interval schedule would be a course with weekly quizzes --> For many, this means that study behaviour responses will start ramping up just before the quiz. --> Immediately following the quiz, the study response behaviour will likely pause for a period, before starting the process again to ramp up for the next quiz
57
fixed interval: graph
- Fixed interval schedules produce a cumulative record with a characteristic scallop pattern. --> Following reinforcement, there is a lull period in which responding drops, then slowly starts picking up again, peaking just before the next reinforcement is scheduled to be delivered following a response.
--> makes sense, cuz the individual does not want to miss the reinforcement window, but there's no direct reinforcement for responding well beforehand
58
variable interval schedule
- u could receive reinforcement @ any time, tho u do have an idea of how often reinforcement is likely to come up.
--> Ex, a course that has pop quizzes that can happen @ any point in time. So, if you’re a diligent student, this means that study behaviour would continue @ a steady rate to ensure u were prepared when the dreaded pop quiz is announced --> the same principle makes random drug testing of athletes more effective than regularly scheduled testing in promoting drug-free training behaviour.
- a participant on a VI schedule tends to respond @ a very steady rate, which ensures that they will not miss an opportunity for reinforcement. --> This steady rate of responding is shown as an increasing straight line on the cumulative record.
--> As u can imagine, a VI schedule that delivers more frequent reinforcement will support higher response rates. --> On a VI-2 min schedule, a participant can potentially earn 30 reinforcers in 1 hour. --> a VI-2 min schedule leads to a steeper slope than a VI-6 min schedule.
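- a rough toy model (my own, not from the module) of why the FI record scallops while the VI record climbs steadily --> it assumes the subject ramps up responding as a known FI interval elapses but responds at a constant rate on VI; all the #s r made up for the demo

```python
# Compare simulated cumulative records under FI vs VI with a simple
# assumed behaviour model (ramping vs steady response probability).

import random
import matplotlib.pyplot as plt

random.seed(2)
INTERVAL = 30       # FI-30 s, or a VI schedule with a 30 s mean
SESSION = 300       # seconds of simulated session

def simulate(variable):
    wait = INTERVAL                 # current interval that must elapse
    since_reinforcement = 0
    total, record = 0, []
    for t in range(SESSION):
        since_reinforcement += 1
        if variable:
            p = 0.4                                    # steady responding on VI
        else:
            p = min(1.0, since_reinforcement / wait)   # ramp up as the FI interval elapses
        if random.random() < p:                        # the subject responds
            total += 1
            if since_reinforcement >= wait:            # 1st response after the interval is reinforced
                since_reinforcement = 0
                if variable:                           # VI: draw a new random interval (mean = INTERVAL)
                    wait = random.randint(1, 2 * INTERVAL - 1)
        record.append(total)
    return record

plt.plot(simulate(variable=False), label="FI-30 s (scalloped)")
plt.plot(simulate(variable=True), label="VI-30 s (steady)")
plt.xlabel("Time (s)")
plt.ylabel("Cumulative # of responses")
plt.legend()
plt.show()
```

- note the scallop here comes entirely from the assumed ramp-up in responding (the behaviour pattern the cards above describe); the schedule itself only decides which response gets reinforced.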
59
Extinction & Schedules
- behaviours learned on a partial reinforcement schedule r far more robust (meaning more resistant to extinction) than those trained on a continuous reinforcement schedule. --> In instrumental conditioning, extinction refers to the stopping of a desired behaviour once reinforcement is no longer given.
- On a partial reinforcement (PRF) schedule, once reinforcement stops occurring, it is not immediately obvious that an abrupt change has happened & that no further reinforcements will be delivered. --> For this reason, it is often best to train behaviours using PRF rather than CRF schedules if u r interested in having the behaviour maintained over a long period.
- The same logic follows when discussing variable & fixed reinforcement schedules that use the same # of responses or set interval of time. --> Variable schedules r more resistant to extinction than fixed schedules since individuals have fewer expectations about when reinforcement is coming. --> As a result, it takes longer on a variable schedule to realize reinforcement is never coming again, & the learner continues to perform the behaviour. --> So, a VR-5 schedule will be more robust than an FR-5 schedule.
60
If classical conditioning involves _____________________ ___________________, instrumental conditioning involves ______________________ _______________________________
If classical conditioning involves forming new reflexive responses, instrumental conditioning involves forming new voluntary behaviours that direct goal-centered actions.
61
many of us feel the constant need to check our cell phones for any sort of update or bit of communication that keeps us in tune with the greater social network we r a part of. WHY?
- mechanisms of learning, operant conditioning in particular, play a large role.
- Receiving messages is based on a variable interval schedule of reinforcement. --> this is a partial schedule of reinforcement, meaning there is a high resistance towards extinction --> we don’t necessarily get constant messages, so when there is a period of time without messages, we don’t think that we’re never going to get a message again.
- cuz the reinforcement is unpredictable, it encourages a steady rate of responding. --> We don’t always know when messages r going to come, so we constantly check our phones in hopes that one will be there.
62
In both classical & instrumental conditioning, learning occurs as a result of _________ experience
direct -->subjects must actually experience the US or the reinforcer/punisher in order for behaviour to be modified.
63
recalling Latent learning
- Keeping in mind the distinction discussed earlier between learning & performance, we have already seen that learning may remain latent until the subject is put in a context where this learning is relevant. - This latent learning is still based on the subject’s own direct experience, it is just that this experience will not be reflected in performance until the subject is in the appropriate context.
64
Observational Learning
- In many cases, we can see instances where an individual learns by observing the experience of others, especially when we r in unfamiliar situations
- watching others learn results in cultural transmission
- Albert Bandura & colleagues (1960s) conducted the 1st experimental studies on observational learning.
- Purpose: To determine the extent to which children learned to behave aggressively as a result of observing aggressive behaviour in others, a phenomenon that still inspires research & debate today
- placed children in a room with an inflatable Bobo Doll that bounced back when hit & other toys. Some children watched a film of an adult aggressively hitting or kicking the doll; others did not.
- When the children that saw the film entered the room, they showed a strong tendency to immediately begin attacking the doll, often displaying behaviour that was even more aggressive than in the movie clip.
- Other children were much less likely to display this type of behaviour when given the same opportunity to play with the toys. --> Clearly the children were modeling their aggressive behaviour on what they had observed others doing; they had learned by observation to direct aggressive behaviour @ a particular target.
65
Bandura's work's implications
- directly addressed an issue that continues to be of concern today, which is the relationship between viewing violence in media (particularly on television & in video games) & aggressive behaviour in children. --> violent behaviours r encouraged in video games e.g., u won't get very far in GTA by taking a non-violent approach
- Other research suggests that the average North American child spends as much time viewing or engaging with entertainment media each week as the typical working adult spends at their job (about 40 hours)
- Furthermore, children (& adults, for that matter) who r exposed to more aggressive behaviour in the media appear to display more aggressive behaviour & have more aggressive thoughts
66
Final Thoughts on Learning
- Whereas classical conditioning involves elicited behaviour triggered by a stimulus, instrumental conditioning involves voluntary behaviours emitted by the organism, & observational learning need not involve any immediate change in behaviour.
- classical conditioning involves learning the relation between stimuli (CS & US), whereas instrumental conditioning & observational learning involve learning the relation between behaviour & its consequence.
- In some ways these assigned categories may be artificial distinctions, & some learning theorists have suggested that the same underlying mechanisms may be responsible for all forms of learning. --> At the very least, we can observe that classical & instrumental conditioning work together in learning situations in the real world.
--> Ex, the chapter author had a colleague intro him to the world of pungent French cheeses. One day, the author had the unfortunate experience of food poisoning after eating a special type of cheese called Petite Muenster, which has a characteristically strong odor. --> this single trial has had long-lasting effects. To this day, the pungent odor is a CS that automatically triggers a CR of queasiness. The odor also plays another role, acting as a discriminative stimulus for the author to plug his nose (an instrumental response), which is reinforced by the desirable consequence of reducing the inflow of the offending odor (negative reinforcement).
67
SLIDES: Superstition in pigeons
- Skinner 1948
- put a pigeon in a box and would periodically give it food. --> now while waiting for food, the pigeon would do random pigeon behaviour like flapping its wings
- say that it just happens to flap its wings right before the food is given, it'll make a correlation between the flapping behaviour & food --> so now, when the pigeon wants food, it'll flap its wings
68
Why do unboxing videos & pimple-popping videos go viral?
- we feel rewarded --> we gain from the learning trials of others
- humans r highly social animals & we have survived due to this
- we don't have to undergo the trial ourselves to imagine its effects --> benefit of observational learning
69
SLIDES CONCLUSION: Learning is important for anticipating events in our world. (3 POINTS)
1. We learn associative relationships between stimuli, behaviour & consequences.
2. Adaptive functions of learning r embedded in biology & refined by environmental input.
3. We can learn explicitly & implicitly.