how to verify data quality? (3 criteria)
DOCUMENTATION
– CONTENT:
- Entity (what is it about?)
- Property
- Measurement (units), type of variable
- Time
– TECHNICAL
- Abbreviations, codes
- Program code for data set creation, conversion
TRUTHFULNESS
- verify from other sources
- check plausibility
COMPLETENESS
– TALL:
- missing observations?
– WIDE:
- encoding for missing values (NaN, 0, ...)
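A minimal sketch of the completeness check, assuming the data are held as a list of dict rows. The names `MISSING_CODES`, `is_missing` and `missing_report` are hypothetical, and whether 0 counts as "missing" is an assumption that has to come from the data set's documentation.

```python
import math

# Hypothetical encodings treated as "missing"; adapt to the documentation.
MISSING_CODES = {None, "", "NaN", "NA"}

def is_missing(value, zero_is_missing=False):
    """Return True if the value looks like a missing-value encoding."""
    if isinstance(value, float) and math.isnan(value):
        return True
    if zero_is_missing and value == 0:
        return True
    return value in MISSING_CODES

def missing_report(rows, columns, zero_is_missing=False):
    """Count missing cells per column (an absent key also counts as missing)."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if col not in row or is_missing(row[col], zero_is_missing):
                counts[col] += 1
    return counts

rows = [
    {"id": 1, "income": 1200.0},
    {"id": 2, "income": float("nan")},  # explicit NaN encoding
    {"id": 3},                          # value absent altogether
    {"id": 4, "income": 0},             # 0: real value or missing code?
]
print(missing_report(rows, ["id", "income"]))
print(missing_report(rows, ["id", "income"], zero_is_missing=True))
```

The two calls differ only in how 0 is read, which is why the encoding has to be documented.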
how to preserve data quality?
CONVERSION:
– DATA TRANSFORMATION
- convert unit
- aggregation
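A small sketch of both transformations, with a check that the total survives them. The records, the field names and the helpers `grams_to_kg` and `aggregate_by` are made up for illustration.

```python
import math
from collections import defaultdict

# Hypothetical records: weight reported in grams per region and month.
records = [
    {"region": "A", "month": "2020-01", "weight_g": 1500.0},
    {"region": "A", "month": "2020-02", "weight_g": 2500.0},
    {"region": "B", "month": "2020-01", "weight_g": 750.0},
]

def grams_to_kg(grams):
    """Unit conversion: grams to kilograms."""
    return grams / 1000.0

def aggregate_by(rows, key, value):
    """Aggregation: sum `value` over all rows sharing the same `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)

# Keep the original column next to the converted one so the step can be traced.
converted = [{**r, "weight_kg": grams_to_kg(r["weight_g"])} for r in records]
totals_kg = aggregate_by(converted, "region", "weight_kg")
print(totals_kg)

# Sanity check: the grand total must be the same before and after.
total_before_kg = sum(r["weight_g"] for r in records) / 1000.0
assert math.isclose(sum(totals_kg.values()), total_before_kg)
```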
MERGING:
- the ‘key’ is key (records are matched on it)
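One way to sketch why the key matters: before merging, check that the key is unique in the lookup table, and after merging, report which records found no match. The tables and the helper names `duplicate_keys` and `merge_on_key` are hypothetical.

```python
def duplicate_keys(rows, key):
    """Return the key values that occur more than once."""
    seen, duplicates = set(), set()
    for row in rows:
        if row[key] in seen:
            duplicates.add(row[key])
        seen.add(row[key])
    return duplicates

def merge_on_key(left, right, key):
    """Inner-join two lists of dicts; `key` must be unique in `right`."""
    dups = duplicate_keys(right, key)
    if dups:
        raise ValueError(f"duplicate keys in right table: {sorted(dups)}")
    lookup = {row[key]: row for row in right}
    merged = [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]
    unmatched = [row[key] for row in left if row[key] not in lookup]
    return merged, unmatched

# Hypothetical tables sharing the key "region_id".
persons = [
    {"name": "Ann", "region_id": 1},
    {"name": "Bob", "region_id": 2},
    {"name": "Cid", "region_id": 9},  # no matching region
]
regions = [
    {"region_id": 1, "region": "North"},
    {"region_id": 2, "region": "South"},
]
merged, unmatched = merge_on_key(persons, regions, "region_id")
print(merged)
print(unmatched)
```

Reporting the unmatched keys, rather than silently dropping them, keeps the loss of observations visible.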
– DATA CLEANING
- limit the scope of the analysis (focus on the scope!)
- check the realistic/possible range of values
- check the origin of the data
- eliminate outliers
- eliminate border observations
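A minimal sketch of the range check and the outlier step. The notes do not name an outlier rule; Tukey's 1.5 × IQR fences are used here purely as an assumption, and the ages and the range 0 to 120 are invented.

```python
import statistics

def within_range(values, low, high):
    """Keep only values inside the realistic/possible range [low, high]."""
    return [v for v in values if low <= v <= high]

def split_outliers(values, k=1.5):
    """Split values into (kept, flagged) using IQR fences (an assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if lower <= v <= upper]
    flagged = [v for v in values if v < lower or v > upper]
    return kept, flagged

# Hypothetical ages of a student sample; -5 is impossible, 95 is possible but unusual.
ages = [21, 23, 22, 24, 25, 22, 23, 95, -5]
plausible = within_range(ages, 0, 120)     # removes the impossible value
kept, flagged = split_outliers(plausible)  # flags the unusual one
print(plausible)
print(kept, flagged)
```

Returning the flagged values instead of discarding them lets each removal be checked against the origin of the data and the scope of the analysis.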
how can we treat the missing data?
ELIMINATE: drop observations or variables (horizontally or vertically), but then no statement can be made about them
IMPUTE: make a statement based on other variables (estimation)
INTERPOLATION: make a statement based on the same variable. OK in cross-section, not OK for interpolation in time
IMPROVE ECONOMETRICS: other methods - different from interpolation
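The first three treatments can be sketched as follows. The data, the column names `x` and `y`, and the specific techniques are assumptions: imputation is shown as a least-squares line fitted on the complete cases, interpolation as the mean of the two neighbouring values. The sketch shows only the mechanics, not when each treatment is appropriate.

```python
def eliminate(rows, column):
    """ELIMINATE: drop every row whose `column` is missing (None)."""
    return [r for r in rows if r[column] is not None]

def fit_line(rows, x, y):
    """Least-squares intercept and slope of y on x, using complete cases only."""
    complete = eliminate(rows, y)
    n = len(complete)
    mean_x = sum(r[x] for r in complete) / n
    mean_y = sum(r[y] for r in complete) / n
    sxy = sum((r[x] - mean_x) * (r[y] - mean_y) for r in complete)
    sxx = sum((r[x] - mean_x) ** 2 for r in complete)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def impute(rows, x, y):
    """IMPUTE: estimate a missing y from another variable x."""
    intercept, slope = fit_line(rows, x, y)
    return [
        {**r, y: intercept + slope * r[x]} if r[y] is None else dict(r)
        for r in rows
    ]

def interpolate(values):
    """INTERPOLATE: fill a single gap from the same variable's neighbours."""
    filled = list(values)
    for i in range(1, len(values) - 1):
        if values[i] is None and values[i - 1] is not None and values[i + 1] is not None:
            filled[i] = (values[i - 1] + values[i + 1]) / 2
    return filled

# Hypothetical data with one missing y.
rows = [
    {"x": 1, "y": 2.0}, {"x": 2, "y": 4.0}, {"x": 3, "y": None},
    {"x": 4, "y": 9.0}, {"x": 5, "y": 10.0},
]
print(len(eliminate(rows, "y")))
print(impute(rows, "x", "y")[2]["y"])
print(interpolate([r["y"] for r in rows])[2])
```

On this data the two filled-in values differ, which illustrates that imputation and interpolation are different statements about the same gap.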
what is meant by audit trail?
document the entire process, from the original data to the final result
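One possible way to keep such a trail in code, as a sketch only: every processing step is run through a small recorder that stores what was done, the row counts before and after, and a fingerprint of the result. The class `AuditTrail`, the helper `fingerprint` and the two example steps are hypothetical.

```python
import hashlib
import json

def fingerprint(rows):
    """Short, reproducible hash of the data after a step."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]

class AuditTrail:
    """Records each transformation applied between original data and result."""

    def __init__(self):
        self.steps = []

    def apply(self, description, func, rows):
        result = func(rows)
        self.steps.append({
            "step": len(self.steps) + 1,
            "description": description,
            "rows_in": len(rows),
            "rows_out": len(result),
            "hash_out": fingerprint(result),
        })
        return result

# Hypothetical original data and processing steps.
original = [{"id": 1, "weight_g": 1500.0}, {"id": 2, "weight_g": None}]

trail = AuditTrail()
data = trail.apply(
    "drop rows with missing weight",
    lambda rows: [r for r in rows if r["weight_g"] is not None],
    original,
)
data = trail.apply(
    "convert grams to kilograms",
    lambda rows: [{"id": r["id"], "weight_kg": r["weight_g"] / 1000.0} for r in rows],
    data,
)
for step in trail.steps:
    print(step)
```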
6 dimensions of data quality