Many Labs 1 (2014) - Investigating Variation in Replicability:
A ‘Many Labs’ Replication Project
• Took 13 classic and contemporary studies. Many of these were ‘known’ to replicate.
So this really was looking at the conditions that allow replicability.
• 36 attempted replications
• 10 of the findings replicated consistently – a decent amount
• A narrow and deep approach – looked at a few findings and tried to replicate each one multiple times.
• A useful approach, but it is not clear how generalisable the results are across the field…
Open Science Collaboration (2015) –
Estimating the reproducibility of psychological science
• Replications of 100 experimental and correlational studies (by 270 people)
• Designed a protocol that needed to be followed
o Contacting original authors for materials
o Registering the protocol of the design, participant numbers and analysis
• Studies from 3 journals - Psychological Science, Journal of Personality and Social Psychology, and Journal of Experimental Psychology: Learning, Memory, and Cognition
• Used high powered designs
• Focused on the size of the replication effects relative to the published original effects
• Also focused on replication in terms of the number of effects that were statistically significant
Effect size
Effect size is a measurement of the magnitude of an effect.
It is important to know about the size of an effect, e.g., how much an experimental manipulation influences the data.
• It might not matter whether your effect is large or small. Both can be interesting (and a small effect spread over millions of people can have large impacts)
effect size equation (Cohen’s d):

    effect size = (mean of condition 1 − mean of condition 2) / pooled SD

where the pooled SD combines the SDs of the condition 1 and condition 2 scores.
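As a rough sketch, this is how the calculation might look in R (the scores below are made-up numbers, purely for illustration):

    # Hypothetical scores from two conditions (invented for illustration)
    cond1 <- c(5.1, 6.0, 5.5, 6.2, 5.8)
    cond2 <- c(4.2, 4.9, 5.0, 4.5, 4.8)
    # Pooled SD for equal group sizes: square root of the average of the two variances
    pooled_sd <- sqrt((var(cond1) + var(cond2)) / 2)
    # Effect size (Cohen's d) = difference in means / pooled SD
    d <- (mean(cond1) - mean(cond2)) / pooled_sd
    d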
Why can’t we use p values? A p value reflects both the size of the effect and the sample size, so on its own it cannot tell us how big an effect is (see the sketch below).
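A quick way to see this in R – the same true effect size gives very different p values depending on the sample size (simulated data, purely illustrative):

    set.seed(1)
    # Two simulated experiments with the SAME true effect size (d = 0.5)
    t.test(rnorm(20, mean = 0.5), rnorm(20))$p.value      # small n: may well be non-significant
    t.test(rnorm(2000, mean = 0.5), rnorm(2000))$p.value  # large n: essentially always significant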
power
Once we have
a) A design (e.g., number of participants)
b) Knowledge of the Effect Size (from previous literature)
We can calculate ‘Power’ – which refers to the likelihood of getting a statistically significant result in this study given a) and b).
Typically in the behavioural sciences we might use a power of 80%.
So in the literature surrounding replication, you will hear a lot about Power
a) Replication studies should have the appropriate power to find the effect.
b) Issues with replication often arise because the initial studies were underpowered. Combined with publication bias towards exciting findings, this means things get published that might not be as reliable as they claim.
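As an illustrative sketch, base R’s power.t.test can do this calculation – here solving for the sample size needed per group, assuming an effect size of d = 0.5 (an assumed value for illustration, not one from the lecture):

    # How many participants per group for 80% power to detect d = 0.5 at alpha = .05?
    power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)
    # n comes out at roughly 64 per group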
REPLICATION CRISIS: Open Science Collaboration (2015) –
Estimating the reproducibility of psychological science
Difference between social psychology and cognitive psychology (open science collab 2015)
• First study of its kind – unclear what the replication rate should be.
There is always a chance that things don’t replicate.
• The project tried to maximise the chance of replication – but things might not have gone as they should.
• The finding that effect sizes seemed substantially inflated might be due to systematic biases in publication:
o Journals liking “significant” and “interesting” results
o This means authors not publishing when they fail to find significant effects
• Low-powered research designs
There are lots of reasons why you might not replicate findings
Many Labs 2 (2018)
• 14 out of 28 effects replicated – significant effects in the same direction as the original
• 8 of these found the effects in 89% to 100% of the different samples
• 6 of these found the effects in 11% to 46% of the different samples
• For those that did not replicate, on average 90% of the samples showed
non-significant effects
How did situation influence the findings?
• 11 of the 28 showed significant heterogeneity across samples (differences) –
but only one of these was in an effect that did not replicate
• Only 1 effect showed a difference between online and in the lab versions of the test
• Identified samples in WEIRD cultures - Western, educated, industrialized,
rich and democratic- and compared to samples from less WEIRD cultures
• 13 out of the 14 effects that replicated showed no difference between WEIRD and less WEIRD samples.
The one that did showed the effect in the WEIRD sample but not in the less WEIRD sample
• In total there were only 3 differences found between WEIRD and less WEIRD
• Explored task order (as each lab did multiple tests) and found no real effect of this
• Overall 7 of the replications had larger effect sizes than originally reported, 21 had smaller
effect sizes
• Conclusion – although the situation does have an effect, situational differences are not large enough to
explain failures to replicate
Replication crisis
Forsell et al (2019)
Interestingly – academics are quite good at predicting replication success …
• In this study academics had to predict the success of replication from Many Labs 2
• Via a questionnaire – 0.731 correlation with replication outcome
• Via a prediction market (where psychologists trade on the outcome) – 0.755 correlation with replication outcome
• Via a questionnaire on predictions of effect sizes – 0.614 correlation
• So academics have a decent understanding of when a study may or may not replicate? How and why? Potentially if a result is “surprising” or very novel…
• There are studies I have read and been quite sceptical of, either because the result seemed a bit surprising or unlikely, the sample looked small, the design didn’t seem great etc.
a potential solution is open science
open science - why share
Sharing the study rationale, hypotheses and plan for analysis before data collection, somewhere that is openly accessible (and time stamped), fixes the authors to one ‘story’ and one set of analyses.
• This prevents people making up a theory to explain the data
• It stops people analysing the data in lots of different ways until they find something ‘significant’. If we run lots of different analyses, chance alone will lead to one of them being ‘significant’. If that significant result is reported and written up as though it were the only analysis run, it looks convincing in the paper, but it might not be replicable.
• This type of statistics is called ‘null hypothesis significance testing’.
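A small simulation makes the point – running many analyses on pure noise will often turn up a ‘significant’ result. A sketch in R (all data simulated, purely illustrative):

    set.seed(1)
    # 20 independent 'analyses' of data with NO true effect anywhere
    p_values <- replicate(20, t.test(rnorm(30), rnorm(30))$p.value)
    sum(p_values < 0.05)  # often at least one false positive
    1 - 0.95^20           # probability of >= 1 false positive across 20 tests: about 0.64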
sharing the data allows others to check and reuse it
sharing the analysis allows others to reproduce the results exactly
Open Science – how?
See Klein et al (2018)
Write a Study Protocol – a detailed written specification of
• Hypothesis
• Methods
• Analysis
• Ideally the level of detail would allow replication without instruction from you
This can be logged in various places, examples include (see the links below; a sketch of a minimal protocol follows them)
• https://aspredicted.org/ - asks 9 questions and creates a time-stamped document that is deposited and can be linked to privately via a URL
• https://osf.io/ - much more comprehensive place to aid with study design, storing study code and data and preprints of the final article
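As a sketch, a minimal protocol entry might look something like this (every detail below is hypothetical, just to show the level of specification):

    Hypothesis: condition 1 will produce higher scores than condition 2.
    Method: two groups of 64 participants (from a power analysis assuming d = 0.5 and 80% power).
    Analysis: independent-samples t-test on mean scores; exclude participants who fail an attention check.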
materials
• In psychology this could be stimuli (words, pictures, videos, vignettes),
forms used, questionnaires used
data and metadata
• Raw data
• Anonymised data
• Anonymised data in the form submitted to statistical analysis
(if there are problems sharing the above)
• Needs to be in a sharable format (e.g., not an Excel file but a .csv file)
• If possible, the script for creating this processed data from the raw data
should also be made available (a sketch follows this list)
• Metadata – the documentation that explains the dataset, e.g., who collected and how, how many variables are in the dataset and what they refer to
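For that processing script, a hypothetical sketch in R (file and column names are invented for illustration):

    # Create a sharable, anonymised .csv from the raw data
    raw <- read.csv("raw_data.csv")
    # Drop identifying columns before sharing (hypothetical column names)
    anon <- subset(raw, select = -c(participant_name, date_of_birth))
    write.csv(anon, "anonymised_data.csv", row.names = FALSE)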
analysis procedure
• An exact specification of how you moved from the raw data to the results of the statistical analysis
• This includes how the data were cleaned
• E.g., share an R script
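As a sketch, a shared analysis script might look like this (the file name, variable names and cleaning rule are hypothetical, just to show the idea):

    # analysis.R – from the anonymised data to the reported statistical result
    data <- read.csv("anonymised_data.csv")
    # Data cleaning, documented in the script itself (hypothetical rule)
    clean <- subset(data, !is.na(score))
    # The pre-registered test: compare scores across the two conditions
    t.test(score ~ condition, data = clean)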
This sounds like a lot, and for many academics (including me) it is a big change – mostly in terms of organising and documenting a workflow and setting things up in a way such that they are ‘sharable’ from the outset. There are also challenges:
• Ethical issues – informed consent forms
• Copyright
• Intellectual property (e.g., of materials you might have spent years developing)
• Getting all the above right!
Klein – “Do not let the perfect be the enemy of the good”
The publication process
Where to submit
• Journal esteem / impact factor
• What’s the journal’s remit?
• What’s the word length of the articles?
• Which journal might be sympathetic
– does that journal publish this ‘type’ of work?
• Who are the editorial board?
The publication process – what happens once you submit?
- Who picks it up (which editor)?
- What is their bias?
- Which people do they ‘know’ that might review
this, given their knowledge of the literature?