what is missing data?
We often do not get information for every measure from every participant
The variables we study may contain missing values or missing information
This includes errors in the data
why is missing data important?
Missing data points could have been meaningful for the analysis had they been observed
This can affect our analysis, as a missing value may conceal something meaningful for understanding the problem
We need to identify this and determine the nature of the missing data
what are the different reasons why missing data may occur?
non-response
data entry errors
instrumentation issues
privacy concerns
natural causes
what is non-response?
a reason why missing data may occur if participants choose not to answer certain questions or fail to provide information
what are data entry errors?
a reason why missing data may occur if there are mistakes made by researchers during the data collection or entry process
what are instrumentation issues?
a reason why missing data may occur if there are problems with the tools or instruments used to collect data
what are privacy concerns?
a reason why missing data may occur if sensitive information is omitted from a dataset to protect participants and to maintain ethics
what are natural causes?
a reason why missing data may occur if there are events beyond the researchers' control, e.g. technical issues, power outages or environmental factors
what are the different types of missing data?
non-systematic
systematic
what are the types of non-systematic missing data?
missing completely at random (MCAR)
missing at random (MAR)
what are the different types of systematic missing data?
missing not at random (MNAR)
explain what MCAR is
The fact that data is missing is independent of both the observed and the unobserved values
The missing data reduces the analysable population of the study
This reduces statistical power but does not introduce bias
The complete cases can be considered a simple random sample of the full data set
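A small simulation can make the MCAR idea concrete. This is only a sketch under stated assumptions: the scores are made-up normal draws (mean 20, SD 5), and `simulate_mcar` and `p_missing` are hypothetical names chosen for illustration.

```python
import random
import statistics

def simulate_mcar(n=10_000, p_missing=0.3, seed=0):
    """Delete each score with the same probability, regardless of any value."""
    rng = random.Random(seed)
    # Hypothetical scores; mean 20 and SD 5 are illustrative assumptions.
    full = [rng.gauss(20, 5) for _ in range(n)]
    # MCAR: the chance of being missing is identical for every participant.
    observed = [None if rng.random() < p_missing else x for x in full]
    return full, observed

full, observed = simulate_mcar()
kept = [x for x in observed if x is not None]

print(len(full), len(kept))                           # fewer analysable cases
print(statistics.mean(full), statistics.mean(kept))   # the two means stay close
```

Fewer cases remain, so power drops, but the mean of the remaining cases stays close to the mean of the full data.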
explain what MAR is
The missingness is systematically related to the observed values but not to the unobserved values
e.g. the probability of completing a survey is related to participants' sex, which is fully observed, but not to the severity of their depression
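The survey example can be sketched as a simulation. Everything numeric here is an assumption made for illustration (group means of 22 and 18, missingness rates of 0.1 and 0.5), and `simulate_mar` is a hypothetical helper, not part of the notes.

```python
import random
import statistics

def simulate_mar(n=20_000, seed=1):
    """Missingness depends on sex (always observed), not on the score itself."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        sex = rng.choice(["F", "M"])
        # Illustrative assumption: the two groups have different mean scores.
        score = rng.gauss(22 if sex == "F" else 18, 5)
        # MAR: the probability of missingness is driven only by observed sex.
        p_missing = 0.1 if sex == "F" else 0.5
        observed = None if rng.random() < p_missing else score
        rows.append((sex, score, observed))
    return rows

rows = simulate_mar()
group_means = {}
for sex in ("F", "M"):
    true_mean = statistics.mean(s for g, s, o in rows if g == sex)
    seen_mean = statistics.mean(o for g, s, o in rows if g == sex and o is not None)
    group_means[sex] = (true_mean, seen_mean)
    print(sex, round(true_mean, 2), round(seen_mean, 2))

overall_true = statistics.mean(s for _, s, _ in rows)
overall_seen = statistics.mean(o for _, _, o in rows if o is not None)
print("overall", round(overall_true, 2), round(overall_seen, 2))
```

Under these assumptions, the observed mean within each sex tracks the full mean for that sex, while the pooled complete-case mean drifts towards the group that responds more often.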
explain what MNAR is
The missingness is systematically related to the unobserved values themselves
Analysis of a dataset containing MNAR data is likely to result in biased estimates
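The bias can be illustrated with a simulation in which higher scores are more likely to go unrecorded. This is a sketch only: the score distribution and the logistic missingness rule are assumptions chosen for illustration, and `simulate_mnar` is a hypothetical name.

```python
import math
import random
import statistics

def simulate_mnar(n=10_000, seed=2):
    """Missingness depends on the value that would have been recorded."""
    rng = random.Random(seed)
    # Hypothetical scores; mean 20 and SD 5 are illustrative assumptions.
    full = [rng.gauss(20, 5) for _ in range(n)]
    observed = []
    for x in full:
        # MNAR: the higher the (unobserved) score, the likelier it is missing.
        p_missing = 1 / (1 + math.exp(-(x - 20) / 2))
        observed.append(None if rng.random() < p_missing else x)
    return full, observed

full_mnar, observed_mnar = simulate_mnar()
kept_mnar = [x for x in observed_mnar if x is not None]

print(statistics.mean(full_mnar), statistics.mean(kept_mnar))
```

Because high scores are preferentially lost, the mean of the remaining cases underestimates the mean of the full data.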
why is missing data a problem?
Can have significant effects on statistical analyses
May lead to biased estimates
Can affect the story we try to tell about our participants and who they represent in the wider population
why do we report missing data?
what missing data do we report?
why do we specify the reasons for the missing data?
to provide insights into whether the missingness is random or systematic
what are some ways in which we can identify missing data?
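One possible way, not necessarily the one these notes had in mind, is to count the missing entries per variable. The sketch below assumes missing values are stored as `None`; the records and the helper `missing_summary` are hypothetical.

```python
# Hypothetical survey records; None marks a missing answer.
records = [
    {"age": 25, "income": None, "score": 12.0},
    {"age": 31, "income": 42000, "score": None},
    {"age": None, "income": None, "score": 18.0},
    {"age": 40, "income": 38000, "score": 15.0},
]

def missing_summary(rows):
    """Return {variable: (number missing, proportion missing)}."""
    n = len(rows)
    summary = {}
    for col in rows[0].keys():
        n_missing = sum(1 for r in rows if r.get(col) is None)
        summary[col] = (n_missing, n_missing / n)
    return summary

summary = missing_summary(records)
for col, (count, prop) in summary.items():
    print(f"{col}: {count} missing ({prop:.0%})")
```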
what are some other approaches to missing data?
complete case analysis (CCA)
mean/median/mode imputation
multiple imputation
weighted estimation
model-based methods
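Two of the approaches listed above, complete case analysis and mean imputation, can be sketched in a few lines. This assumes missing values are stored as `None`; the records and the helpers `complete_cases` and `impute_mean` are hypothetical names used only for illustration.

```python
import statistics

# Hypothetical records; None marks a missing value.
records = [
    {"age": 25, "score": 12.0},
    {"age": 31, "score": None},
    {"age": None, "score": 18.0},
    {"age": 40, "score": 15.0},
]

def complete_cases(rows):
    """Complete case analysis: keep only rows with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def impute_mean(rows, column):
    """Mean imputation: fill gaps in one column with the mean of its observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = statistics.mean(observed)
    # Build new rows so the original records are left untouched.
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

cca = complete_cases(records)
imputed = impute_mean(records, "score")

print(len(cca))             # rows that survive complete case analysis
print(imputed[1]["score"])  # the gap filled with the observed mean
```

Swapping `statistics.mean` for `statistics.median` or `statistics.mode` would give median or mode imputation.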
what is complete case analysis?
what is mean/median/mode imputation?
what is multiple imputation?
creating, analysing and combining multiple complete datasets with imputed values
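The three steps can be sketched for a single variable. This is a deliberately simplified illustration: the imputation model (random draws from a normal distribution with the observed mean and SD) is an assumption, and a fuller procedure would also reflect uncertainty in those parameters and use other variables. The combining step uses Rubin's rules; `multiple_imputation_mean` and the scores are hypothetical.

```python
import random
import statistics

def multiple_imputation_mean(values, m=5, seed=0):
    """Estimate a mean from data with gaps using m imputed datasets."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    mu, sd = statistics.mean(observed), statistics.stdev(observed)

    estimates, variances = [], []
    for _ in range(m):
        # 1. Create: fill each gap with a random draw (simplified model).
        completed = [rng.gauss(mu, sd) if v is None else v for v in values]
        # 2. Analyse: estimate the mean and its sampling variance.
        estimates.append(statistics.mean(completed))
        variances.append(statistics.variance(completed) / len(completed))

    # 3. Combine (Rubin's rules): pooled estimate and total variance.
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)
    between = statistics.variance(estimates)
    total_var = within + (1 + 1 / m) * between
    return pooled, total_var

scores = [12.0, None, 18.0, 15.0, None, 21.0, 17.0, 14.0]
pooled, total_var = multiple_imputation_mean(scores)
print(pooled, total_var)
```

The between-imputation term is what separates this from single imputation: it carries the extra uncertainty due to the missing values into the final variance.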
what is weighted estimation?
what are model-based methods?
using statistical models to estimate missing values