Many Labs 1 (2014) - Investigating Variation in Replicability:
A ‘Many Labs’ Replication Project
• Took 13 classic and contemporary studies. Many of these were ‘known’ to replicate.
So this really was looking at the conditions that allow replicability.
• 36 attempted replications
• 10 of the findings replicated consistently – a decent amount
• A narrow and deep approach – looked at a few findings and tried to replicate each one multiple times.
• A useful approach, but it is not clear how generalisable the results are across the field…
Open Science Collaboration (2015) –
Estimating the reproducibility of psychological science
• Replications of 100 experimental and correlational studies (by 270 people)
• Designed a protocol that needed to be followed
o Contacting original authors for materials
o Registering the protocol of the design, participant numbers and analysis
• Studies from 3 journals - Psychological Science, Journal of Personality and Social Psychology, and Journal of Experimental Psychology: Learning, Memory, and Cognition
• Used high powered designs
• Focused on the size of the replication effects relative to the published original effects
• Also focused on replication in terms of the number of effects that were statistically significant
Effect size
Effect size is a measurement of the magnitude of an effect.
It is important to know about the size of an effect, e.g., how much an experimental manipulation influences the data.
• It might not matter whether your effect is large or small. Both can be interesting (and a small effect spread over millions of people can have large impacts)
effect size equation (Cohen’s d):

    effect size = (mean of condition 1 − mean of condition 2) / pooled SD

where the pooled SD combines the SDs of the condition 1 and condition 2 scores.
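As a rough sketch, this is how the calculation might look in R (the scores below are made-up numbers, purely for illustration):

    # Hypothetical scores from two conditions (invented for illustration)
    cond1 <- c(5.1, 6.0, 5.5, 6.2, 5.8)
    cond2 <- c(4.2, 4.9, 5.0, 4.5, 4.8)
    # Pooled SD for equal group sizes: square root of the average of the two variances
    pooled_sd <- sqrt((var(cond1) + var(cond2)) / 2)
    # Effect size (Cohen's d) = difference in means / pooled SD
    d <- (mean(cond1) - mean(cond2)) / pooled_sd
    d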
Why can’t we use p values? A p value reflects both the size of the effect and the sample size, so on its own it cannot tell us how big an effect is (see the sketch below).
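A quick way to see this in R – the same true effect size gives very different p values depending on the sample size (simulated data, purely illustrative):

    set.seed(1)
    # Two simulated experiments with the SAME true effect size (d = 0.5)
    t.test(rnorm(20, mean = 0.5), rnorm(20))$p.value      # small n: may well be non-significant
    t.test(rnorm(2000, mean = 0.5), rnorm(2000))$p.value  # large n: essentially always significant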
power
Once we have
a) A design (e.g., number of participants)
b) Knowledge of the Effect Size (from previous literature)
We can calculate ‘Power’ – which refers to the likelihood of getting a statistically significant result in this study given a) and b).
Typically in the behavioural sciences we might use a power of 80%.
So in the literature surrounding replication, you will hear a lot about Power
a) Replication studies should have the appropriate power to find the effect.
b) Issues with replication often arise because the initial studies were underpowered. Combined with publication bias towards exciting findings, this means things get published that might not be as reliable as they claim.
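As an illustrative sketch, base R’s power.t.test can do this calculation – here solving for the sample size needed per group, assuming an effect size of d = 0.5 (an assumed value for illustration, not one from the lecture):

    # How many participants per group for 80% power to detect d = 0.5 at alpha = .05?
    power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)
    # n comes out at roughly 64 per group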
REPLICATION CRISIS: Open Science Collaboration (2015) –
Estimating the reproducibility of psychological science
Difference between social psychology and cognitive psychology (open science collab 2015)
• First study of its kind – unclear what the replication rate should be.
There is always a chance that things don’t replicate.
• The project tried to maximise the chance of replication – but things might not have gone as they should.
• The finding that effect sizes seemed substantially inflated might be due to systematic biases in publication:
o Journals liking “significant” and “interesting” results
o This means authors not publishing when they fail to find significant effects
• Low-powered research designs
There are lots of reasons why you might not replicate findings
Many Labs 2 (2018)
• 14 out of 28 effects replicated – significant effects in the same direction as the original
• 8 of these found the effects in 89% to 100% of the different samples
• 6 of these found the effects in 11% to 46% of the different samples
• For those that did not replicate, on average 90% of the samples showed
non-significant effects
How did situation influence the findings?
• 11 of the 28 showed significant heterogeneity across samples (differences) –
but only one of these was in an effect that did not replicate
• Only 1 effect showed a difference between online and in the lab versions of the test
• Identified samples in WEIRD cultures - Western, educated, industrialized,
rich and democratic- and compared to samples from less WEIRD cultures
• 13 out of the 14 effects that replicated showed no difference between WEIRD and less WEIRD samples.
The one that did showed the effect in the WEIRD sample but not in the less WEIRD sample
• In total there were only 3 differences found between WEIRD and less WEIRD
• Explored task order (as each lab did multiple tests) and found no real effect of this
• Overall 7 of the replications had larger effect sizes than originally reported, 21 had smaller
effect sizes
• Conclusion – although the situation does have an effect, situational differences are not large enough to
explain failures to replicate
Replication crisis
Forsell et al (2019)
Interestingly – academics are quite good at predicting replication success …
• In this study academics had to predict the success of replication from Many Labs 2
• Via a questionnaire – 0.731 correlation with replication outcome
• Via a prediction market (where psychologists trade on the outcome) – 0.755 correlation with replication outcome
• Via a questionnaire on predictions of effect sizes – 0.614 correlation
• So academics have a decent understanding of when a study may or may not replicate? How and why? Potentially if a result is “surprising” or very novel…
• There are studies I have read and been quite sceptical of, either because the result seemed a bit surprising or unlikely, the sample looked small, the design didn’t seem great etc.
a potential solution is open science
open science - why share
Sharing the study rationale, hypotheses and plan for analysis before data collection, somewhere that is openly accessible (and time stamped), fixes the authors to one ‘story’ and one set of analyses.
• This prevents people making up a theory to explain the data
• It stops people analysing the data in lots of different ways until they find something ‘significant’. If we run lots of different analyses, chance alone will lead to one of them being ‘significant’. If that significant result is reported and written up as though it were the only analysis run, it looks convincing in the paper, but it might not be replicable.
• This type of statistics is called ‘null hypothesis significance testing’.
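A small simulation makes the point – running many analyses on pure noise will often turn up a ‘significant’ result. A sketch in R (all data simulated, purely illustrative):

    set.seed(1)
    # 20 independent 'analyses' of data with NO true effect anywhere
    p_values <- replicate(20, t.test(rnorm(30), rnorm(30))$p.value)
    sum(p_values < 0.05)  # often at least one false positive
    1 - 0.95^20           # probability of >= 1 false positive across 20 tests: about 0.64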
sharing the data allows others to check and reuse it
sharing the analysis allows others to reproduce the results exactly
Open Science – how?
See Klein et al (2018)
Write a Study Protocol – a detailed written specification of
• Hypothesis
• Methods
• Analysis
• Ideally the level of detail would allow replication without instruction from you
This can be logged in various places, examples include (see the links below; a sketch of a minimal protocol follows them)
• https://aspredicted.org/ - asks 9 questions and creates a time-stamped document that is deposited and can be linked to privately via a URL
• https://osf.io/ - much more comprehensive place to aid with study design, storing study code and data and preprints of the final article
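As a sketch, a minimal protocol entry might look something like this (every detail below is hypothetical, just to show the level of specification):

    Hypothesis: condition 1 will produce higher scores than condition 2.
    Method: two groups of 64 participants (from a power analysis assuming d = 0.5 and 80% power).
    Analysis: independent-samples t-test on mean scores; exclude participants who fail an attention check.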
materials
• In psychology this could be stimuli (words, pictures, videos, vignettes),
forms used, questionnaires used
data and metadata
• Raw data
• Anonymised data
• Anonymised data in the form submitted to statistical analysis
(if there are problems sharing the above)
• Needs to be in a sharable format (e.g., not an Excel file but a .csv file)
• If possible, the script for creating this processed data from the raw data
should also be made available (a sketch follows this list)
• Metadata – the documentation that explains the dataset, e.g., who collected and how, how many variables are in the dataset and what they refer to
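For that processing script, a hypothetical sketch in R (file and column names are invented for illustration):

    # Create a sharable, anonymised .csv from the raw data
    raw <- read.csv("raw_data.csv")
    # Drop identifying columns before sharing (hypothetical column names)
    anon <- subset(raw, select = -c(participant_name, date_of_birth))
    write.csv(anon, "anonymised_data.csv", row.names = FALSE)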
analysis procedure
• An exact specification of how you moved from the raw data to the results of the statistical analysis
• This includes how the data were cleaned
• E.g., share an R script
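As a sketch, a shared analysis script might look like this (the file name, variable names and cleaning rule are hypothetical, just to show the idea):

    # analysis.R – from the anonymised data to the reported statistical result
    data <- read.csv("anonymised_data.csv")
    # Data cleaning, documented in the script itself (hypothetical rule)
    clean <- subset(data, !is.na(score))
    # The pre-registered test: compare scores across the two conditions
    t.test(score ~ condition, data = clean)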
This sounds like a lot, and for many academics (including me) it is a big change – mostly in terms of organising and documenting a workflow and setting things up in a way such that they are ‘sharable’ from the outset. There are also challenges:
• Ethical issues – informed consent forms
• Copyright
• Intellectual property (e.g., of materials you might have spent years developing)
• Getting all the above right!
Klein – “Do not let the perfect be the enemy of the good”
The publication process
Where to submit
• Journal esteem / impact factor
• What’s the journal’s remit?
• What’s the word length of the articles?
• Which journal might be sympathetic
– does that journal publish this ‘type’ of work?
• Who are the editorial board?
The publication process – what happens once you submit?
- Who picks it up (which editor)?
- What is their bias?
- Which people do they ‘know’ that might review
this, given their knowledge of the literature?