Andrade
Evaluation
Strengths
Reliability:
* Standardised procedure (same instructions, same tape, same volume)
Internal validity:
* Operationalisation of variables
* Counterbalancing (reduces order effects)
* Use of lures causing false alarms
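Counterbalancing can be illustrated with a short sketch. The two task orders and the participant labels below are hypothetical (the notes do not specify them). The point is only that the orders are rotated, so each one is used equally often and order effects cancel out across the sample.

```python
from itertools import cycle

# Hypothetical task orders (assumption: not taken from the notes).
ORDERS = [("names", "places"), ("places", "names")]

def counterbalance(participants, orders=ORDERS):
    """Assign orders in rotation so each order is used equally often."""
    return {p: order for p, order in zip(participants, cycle(orders))}

# Eight hypothetical participants, P1 to P8.
participants = [f"P{i}" for i in range(1, 9)]
assignment = counterbalance(participants)

for participant, order in assignment.items():
    print(participant, "->", " then ".join(order))
```

With an even number of participants, half receive each order, so any practice or fatigue effect is spread evenly across both tasks.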
Weaknesses
Generalisability:
* Sample was gynocentric (88% female), all taken from university participant panel
* Individual explanations as opposed to situational ones (variation between participants)
Validity:
* Lack of mundane realism (ecological)
* Demand characteristics (internal)
Baron-Cohen
Evaluation
Strengths
Reliability:
* Standardised procedure (same RET)
* Extraneous variables controlled using IQ test
Internal validity:
* Quasi-lab experiment
* Operationalisation of IV/DV
* High construct validity: can discriminate between those with AS/HFA and those without
Weaknesses
Generalisability:
* All Group 1 were male and only 15 participants
* Most were from Cambridge or Exeter
Ecological validity:
* Low mundane realism (black-and-white, still images, can only see eyes, no body language)
Pozzulo
Evaluation
Strengths
Reliability:
* Standardised procedure (same clips shown, same instructions, same lineups)
Ethics:
* consent obtained for all participants (parents of children)
* children only asked to complete the task when they were ready
* children monitored for fatigue, anxiety and stress
* ensured participants had the right to withdraw
* no deception and a debrief was given at the end of the study
Weaknesses
Generalisability:
* children aged 4-7 only
* university participant panel for adults, all under 30 so limited life experience
Ecological validity:
* lack of mundane realism
* emotions the participant felt would be different to a real lineup
Milgram
Evaluation
Strengths
Quantitative data:
* Data collected on maximum shock delivered, as well as the latency and duration of shocks
* Easy to compare and draw conclusions
Reliability:
* Standardised procedure is easy to replicate
Weaknesses
Generalisability:
* Sample — all men and from the same area
Ethics:
* No informed consent given, deception involved
* Participants were arguably denied their right to withdraw (through the prods)
* Permanent psychological damage may have been done despite the debrief and friendly reconciliation (signs of nervous tension and distress were shown)
Piliavin
Evaluation
Strengths
Quantitative data:
* Data collected on the number of people in the critical and adjacent areas and their race and sex; on the helpers, including the latency and frequency of help offered; and on anyone who left the critical area or the car.
* Spontaneous help: 62/65 trials (cane) and 19/38 trials (drunk), 81/103 in total.
* First helpers: 90% male, 64% white.
* Critical area: 5% left for cane vs 9% left for drunk. 34 people left in total.
* Response time for 7+ males was 80s, for 1-3 males was 300s.
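The spontaneous-help figures above are easier to compare as percentages. This is a minimal sketch, assuming the first pair (62/65) refers to the cane condition and the second (19/38) to the drunk condition. The function and variable names are illustrative only.

```python
# Spontaneous-help counts from the notes: (trials with help, total trials).
# Assumption: 62/65 is the cane condition and 19/38 is the drunk condition.
trials = {"cane": (62, 65), "drunk": (19, 38)}

def help_rate(helped, total):
    """Percentage of trials with spontaneous help, rounded to 1 d.p."""
    return round(100 * helped / total, 1)

rates = {condition: help_rate(h, n) for condition, (h, n) in trials.items()}

total_helped = sum(h for h, _ in trials.values())
total_trials = sum(n for _, n in trials.values())
overall = help_rate(total_helped, total_trials)

print(rates)    # {'cane': 95.4, 'drunk': 50.0}
print(overall)  # 78.6
```

Expressed this way, the cane victim was helped spontaneously on about 95% of trials, against 50% for the drunk victim.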
Ecological validity:
* Field experiment using independent measures design
* Low risk of demand characteristics
* High degree of mundane realism: the gender breakdown (60% male) and racial composition (55% white) were typical of the real-world NYC subway at the time
Weaknesses
Internal validity:
* Field experiment so could not control for extraneous variables (e.g. noise on the train, observers’ lines of sight, brightness of the lighting on the train, etc.)
* Possible demand characteristics: some passengers may have witnessed the staged emergency more than once, which could have made them more or less likely to help.
Ethics:
* Participants did not give informed consent.
* Deception was present as participants thought the victim actually required help, which may have caused psychological distress that was not relieved by a debrief.
* Any comments were recorded, as well as their race/sex, which may be a privacy concern.
* Their right to withdraw was arguably denied, although they could leave the car entirely.
Perry
Evaluation
Strengths
Internal validity:
* Extraneous variables controlled (repeated measures, multiple trials)
* Counterbalancing to reduce order effects
* Double-blind technique
* Use of paradigms like CID and IRI that are already established
Reliability:
* Standardised procedure (time between approaching figures, set pictures for rooms)
* Lab experiment and control of extraneous variables
Weaknesses
Generalisability:
* All were university students
* All male (androcentric)
* All had normal visual acuity with no past history of psychological disorders
Ecological validity:
* CID was computerised
* People do not normally take oxytocin (OT) in everyday life
* People are unlikely to choose between room layouts when arranging a personal conversation in everyday life
Bandura
Evaluation
Strengths
Internal validity:
* lab experiment
* control of extraneous variables (e.g. baseline aggression)
* low demand characteristics (covert observation through a one-way mirror)
Reliability:
* standardised procedure (model’s aggressive actions towards the Bobo doll, aggression arousal, timed observation)
* inter-rater reliability (baseline aggression scores, performances of half the subjects also scored independently by a second observer)
Weaknesses
Ecological validity:
* aggression arousal was artificial and lacked mundane realism
* watching a model behave aggressively towards a toy is unrealistic
* demand characteristics: the children may have believed the Bobo doll was designed to be hit, so they may not actually have learned aggression and the findings may not generalise to real life
Ethics:
* informed consent not given as true aims of the study were not disclosed to participants
* right to withdraw was not explicitly given as children were subject to experimental conditions without their agreement and were not made aware of their right to withdraw
* permanent psychological harm may have been caused by the aggression arousal and the subsequent learning of aggression from the model
Fagen
Evaluation
Strengths
Reliability:
* same 5 actions taught: trunk here, trunk up, bucket, blow, steady
* same verbal cues and primary and secondary reinforcers (chopped bananas, whistle)
* same performance test schedule and scoring system
* same training techniques (capture, shaping, luring)
Quantitative data:
* sessions were timed from first cue to elephant’s last offer, times ranged from 257 to 451 minutes
* number of sessions ranged from 25 to 35
* performance tests were scored and 80% was required to achieve mastery
Weaknesses
Generalisability:
* small sample of only 5 elephants used, 4 of them juvenile, and all female
* lack representation from other age ranges of elephants, and males
Internal validity:
* tourists and elephant calves are distractions during the training, so the elephants may not be fully focused
* the 80% pass rate means that they may have been assumed to have mastered the trunk wash when they actually cannot do 20% of the procedure
* some unnecessary training e.g. “trunk out” was included in training time for 4/5 elephants
* researcher bias during the performance test, e.g. angle of “steady” is subjective
Saavedra and Silverman
Evaluation
Strengths
Ethics:
* informed consent
* confidentiality
* protection from harm using gradual exposure, going up the boy’s personalised rating from the Feelings Thermometer
Qualitative data:
* boy developed the phobia at age 5 when a bowl of buttons fell on him as he reached into it
* use of interviews
* case study
Weaknesses
Generalisability:
* case study
* one 9-year-old Hispanic American boy
Internal validity:
* lack of a control group
* subjectivity of measurements on the Feelings Thermometer
* social desirability, pretending to show less fear
Dement and Kleitman
Evaluation
Strengths
Reliability:
* Standardised procedure
* Participants woken after 5 or 15 minutes of REM sleep; forced-choice question on dream length (5 or 15 minutes)
Internal validity:
* Use of a doorbell to wake participants
* Use of recording device (ensuring no one else was in the room during dream recall)
Weaknesses
Ecological validity:
* Lack of mundane realism
Hassett
Evaluation
Strengths
Internal validity:
* Left/right placement was counterbalanced each trial
* 53 animals excluded from sample for having had prenatal hormone treatments, or being infant monkeys who were difficult to individually identify
* video cameras recorded any interaction
Quantitative data:
* behavioural checklist used
* Male monkeys showed a significant preference for wheeled toys (73%) over plush toys (9%), both in terms of frequency and duration of interactions. 18% showed no preference.
* Female monkeys did not show a statistically significant preference for either plush toys (30%) or wheeled toys (39%), with some having no preference (30%).
Weaknesses
Generalisability:
* only on monkeys in captivity
* only on rhesus monkeys
* sample was 61 female and 21 male monkeys
Ecological validity:
* monkeys lived in an artificial environment (an enclosure) rather than in the wild, where limited foraging opportunities may have caused boredom
Hölzel
Evaluation
Strengths
Reliability:
* use of MBSR program (regular group meetings for all experimental group participants)
* use of FFMQ
* same audio recordings sent to all experimental group participants
Experimental design:
* longitudinal, so can track long-term changes and helps establish a cause-and-effect relationship between mindfulness and grey matter density in the brain
* repeated measures, so reduces the effect of participant variables
Weaknesses
Generalisability:
* all right-handed
* small sample
* minimal previous exposure to meditation
Internal validity:
* self-reports and social desirability
* confounding variables e.g. lifestyle differences, social interactions, lack of standardisation of at-home practice