different types of missing values
MCAR
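missing completely at random: there is no pattern in the missing data, and missingness is unrelated to any observed or unobserved values. Cases with missing data can be ignored or removed without biasing results, at the cost of a smaller sample
MNAR
missing not at random: missingness depends on the unobserved (missing) values themselves, so it cannot be fully explained by the other variables in the dataset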
MAR
missing values can be explained by other (observed) variables. Missingness predictable from other variables in the dataset
how to identify how much data is missing
1) Frequencies function > statistics table
2) Explore function > case processing table
3) Missing value analysis > univariate statistics table
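Outside SPSS, the same count of missing values per variable can be sketched in a few lines of Python. This is only an illustration: the dataset, the variable names and the helper missing_summary are hypothetical, and None is assumed to mark a missing response.

```python
# Hypothetical survey data; None marks a missing response.
rows = [
    {"age": 23, "stress": 4, "sleep": 7.5},
    {"age": 31, "stress": None, "sleep": 6.0},
    {"age": None, "stress": 2, "sleep": None},
    {"age": 45, "stress": 5, "sleep": 8.0},
    {"age": 28, "stress": None, "sleep": 6.5},
]

def missing_summary(rows):
    """Return {variable: (n_missing, percent_missing)} for each variable."""
    n = len(rows)
    summary = {}
    for var in rows[0]:
        n_missing = sum(1 for row in rows if row[var] is None)
        summary[var] = (n_missing, 100.0 * n_missing / n)
    return summary

summary = missing_summary(rows)
for var, (count, pct) in summary.items():
    print(f"{var}: {count} missing ({pct:.1f}%)")
```

The per-variable count and percentage correspond roughly to what the univariate statistics table reports.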
how to detect patterns
Little's MCAR test
a multivariate test that evaluates the subgroups of cases that share the same missing data pattern. It tests for differences between the observed and estimated means in each missing data pattern.
It is not a definitive test
* the test result is reported with the EM means table
* If the p-value is above .05 (non-significant), the data are consistent with MCAR (the test fails to reject MCAR; it does not prove it)
* If the p-value is below .05 (significant), the data are not MCAR
t-tests for missing values
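create a dummy variable coding each case as missing or not missing on a variable, then use t-tests to compare the two groups on the other variables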
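A minimal sketch of the dummy-variable approach, assuming a hypothetical dataset where income is sometimes missing and stress is fully observed. The helper welch_t is an illustrative name, and using Welch's (separate variance) t statistic is an assumption made for this example.

```python
import math
import statistics

# Hypothetical data: (income, stress); None marks a missing income value.
cases = [
    (32000, 3.1), (None, 4.8), (41000, 2.9), (None, 4.5),
    (38000, 3.4), (29000, 3.0), (None, 5.1), (45000, 2.7),
]

# Dummy variable: 1 if income is missing, 0 if it is observed.
dummy = [1 if income is None else 0 for income, _ in cases]

# Split the fully observed variable (stress) by the dummy variable.
stress_missing = [s for (_, s), d in zip(cases, dummy) if d == 1]
stress_observed = [s for (_, s), d in zip(cases, dummy) if d == 0]

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t, df = welch_t(stress_missing, stress_observed)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large absolute t suggests that cases with missing income differ on stress, which would count as evidence against MCAR.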
listwise deletion
deleting all cases (participants) that have data missing on any variable in your dataset. The resulting dataset is complete for every participant included in it
appropriate when only a few cases have missing data
You may end up with a smaller and/or biased sample to work with
pairwise deletion
Lets you keep more of your data by excluding a case only from the analyses that involve its missing values. It conserves more of your data because all available data from each case are included
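The difference between the two deletion strategies can be illustrated with a small sketch. The data and the helper pairwise_n are hypothetical; None is assumed to mark a missing value.

```python
# Hypothetical data; None marks a missing value.
rows = [
    {"x": 2.0, "y": 10.0, "z": 1.0},
    {"x": 4.0, "y": None, "z": 3.0},
    {"x": 6.0, "y": 14.0, "z": None},
    {"x": 8.0, "y": 18.0, "z": 7.0},
]
variables = ["x", "y", "z"]

# Listwise deletion: keep only cases that are complete on every variable.
listwise = [r for r in rows if all(r[v] is not None for v in variables)]

def pairwise_n(rows, a, b):
    """Number of cases usable in an analysis of variables a and b only."""
    return sum(1 for r in rows if r[a] is not None and r[b] is not None)

print("listwise n:", len(listwise))
for a, b in [("x", "y"), ("x", "z"), ("y", "z")]:
    print(f"pairwise n for {a}-{b}:", pairwise_n(rows, a, b))
```

Here listwise deletion keeps two of the four cases, while the x-y and x-z analyses under pairwise deletion each keep three. The catch is that each pairwise analysis can end up based on a different subset of cases.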
estimating data
mean substitution
Common point replacement
replacing the value with the midpoint of the scale
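Both single-value replacements can be sketched as follows, assuming hypothetical responses on a 1-to-7 scale with None marking a missing value. The scale endpoints are an assumption for this example.

```python
import statistics

# Hypothetical responses on a 1-to-7 scale; None marks a missing value.
scores = [2, 5, None, 7, 4, None, 6]
SCALE_MIN, SCALE_MAX = 1, 7  # assumed scale endpoints

observed = [s for s in scores if s is not None]
mean_value = statistics.mean(observed)   # mean of the observed scores
midpoint = (SCALE_MIN + SCALE_MAX) / 2   # midpoint of the scale

# Mean substitution: replace each missing value with the observed mean.
mean_filled = [mean_value if s is None else s for s in scores]

# Common point replacement: replace each missing value with the midpoint.
midpoint_filled = [midpoint if s is None else s for s in scores]

print("variance, observed only:", statistics.variance(observed))
print("variance, mean-filled:  ", statistics.variance(mean_filled))
```

The mean-filled variance is smaller than the variance of the observed scores, which is the usual caution about mean substitution: it shrinks variability.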
imputation
regression imputation
expectation maximisation
multiple imputation
univariate outliers
cases with large standardised scores (z-scores) on a single variable.
how to detect univariate outliers
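One common approach is to standardise the variable and flag cases whose absolute z-score exceeds a cut-off. The sketch below uses hypothetical scores, and the cut-off of 3.29 (p < .001, two-tailed) is an assumed convention, not a value taken from these notes.

```python
import statistics

# Hypothetical scores on a single variable; the last case is extreme.
scores = [12, 14, 13, 15, 11, 13, 14, 12, 13, 14,
          12, 13, 15, 11, 13, 14, 12, 13, 14, 40]
Z_CUTOFF = 3.29  # assumed convention (p < .001, two-tailed)

mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation

# Standardise every score, then keep the cases beyond the cut-off.
z_scores = [(s - mean) / sd for s in scores]
outliers = [
    (i, s, round(z, 2))
    for i, (s, z) in enumerate(zip(scores, z_scores))
    if abs(z) > Z_CUTOFF
]
print(outliers)
```

Only the last case (score 40) is flagged in this example.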
multivariate outliers
a case with a strange combination of scores on two or more variables
Calculating Mahalanobis distance
a measure of the distance of each case from the centroid of cases, where the centroid is determined from the combination of scores on the variables. The squared distance is evaluated against a chi-square distribution.
Mahalanobis distance cut-off for 2 predictor variables
13.816 (chi-square critical value, df = 2, p < .001)
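A sketch of where the 13.816 value comes from and how it is used, assuming two variables and hypothetical data. For df = 2 the chi-square upper-tail probability is exp(-x / 2), so the p = .001 critical value has a closed form. The helper mahalanobis_sq is an illustrative name.

```python
import math
import statistics

# Chi-square critical value for df = 2 at p = .001 (closed form for df = 2).
cutoff = -2 * math.log(0.001)

def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each (x, y) case from the centroid."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    n = len(points)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = statistics.variance(xs)
    syy = statistics.variance(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    result = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form using the inverse of the 2x2 covariance matrix.
        result.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return result

# Hypothetical data: 24 cases where y tracks x, plus one case (3, 22)
# whose combination of scores breaks that pattern.
points = [(i, i + (1 if i % 2 == 0 else -1)) for i in range(1, 25)]
points.append((3, 22))

d2 = mahalanobis_sq(points)
flagged = [i for i, d in enumerate(d2) if d > cutoff]
print(f"cutoff = {cutoff:.3f}")
print("flagged cases:", flagged)
```

Neither score of the flagged case is extreme on its own (x = 3 and y = 22 both fall inside the observed ranges). Only the combination is unusual, which is what makes it a multivariate outlier.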
leverage
an unusual predictor (x) value; high-leverage cases have the potential to be influential individual points
discrepancy
the extent to which a case is in line with others (unusual y value given its x value)
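Leverage and discrepancy can be contrasted in a simple regression with one predictor. This is a sketch on hypothetical data: leverage is computed as the hat value 1/n + (x - mean_x)^2 / Sxx, and discrepancy is represented by the raw residual, which is a simplifying assumption (standardised or studentised residuals are often used instead).

```python
# Hypothetical (x, y) data. Case 5 has an unusual x value but follows the
# trend; case 6 has a typical x value but an unusual y value.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1),
        (15, 30.0), (3, 15.0)]

n = len(data)
xs = [x for x, _ in data]
ys = [y for _, y in data]
mx, my = sum(xs) / n, sum(ys) / n

sxx = sum((x - mx) ** 2 for x in xs)
sxy = sum((x - mx) * (y - my) for x, y in data)

# Ordinary least squares line: y = intercept + slope * x
slope = sxy / sxx
intercept = my - slope * mx

# Leverage (hat value): depends only on how unusual the x value is.
leverage = [1 / n + (x - mx) ** 2 / sxx for x in xs]

# Discrepancy: how far the observed y is from the y predicted for its x.
residuals = [y - (intercept + slope * x) for x, y in data]

for i, (h, e) in enumerate(zip(leverage, residuals)):
    print(f"case {i}: leverage = {h:.3f}, residual = {e:.2f}")
```

In this example case 5 has the highest leverage but a small residual, while case 6 has the largest residual but low leverage.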