choose your rows: decide what a single row (one example) represents
one and only one target value for each row (in the multilabel case a row can carry several labels, but each label must still answer a precise question, just as in the single-label case)
inspect the data
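A minimal sketch of a first inspection step: count the missing values per column. It assumes the data is held as a list of row dicts with `None` (or an absent key) marking a missing value. With pandas, `df.isna().sum()` gives the same per-column count.

```python
from typing import Any, Dict, List

def missing_report(rows: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count missing values per column; None and absent keys both count."""
    columns = sorted({key for row in rows for key in row})
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}

rows = [
    {"age": 34, "city": "Rome", "income": 52000},
    {"age": None, "city": "Oslo", "income": 61000},
    {"age": 29, "city": None},  # "income" key absent
]
print(missing_report(rows))  # {'age': 1, 'city': 1, 'income': 1}
```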
correct the data
- if a value cannot be corrected, delete it and leave it as missing
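The correct-or-blank-out rule can be sketched as follows. The `fix` and `is_valid` functions, and the height rule in the usage example (values under 3 assumed to be metres, anything outside 50 to 250 cm treated as uncorrectable), are hypothetical illustrations, not rules from the notes.

```python
from typing import Any, Callable, Dict, List

def correct_column(
    rows: List[Dict[str, Any]],
    column: str,
    fix: Callable[[Any], Any],
    is_valid: Callable[[Any], bool],
) -> List[Dict[str, Any]]:
    """Try to fix each value in `column`; if it is still invalid, set it to None."""
    cleaned = []
    for row in rows:
        row = dict(row)  # work on a copy so the input rows are not modified
        value = row.get(column)
        if value is not None:
            value = fix(value)
            row[column] = value if is_valid(value) else None
        cleaned.append(row)
    return cleaned

# Hypothetical rule: heights under 3 were entered in metres, so convert to cm;
# anything outside 50-250 cm afterwards is treated as uncorrectable.
rows = [{"height_cm": 182}, {"height_cm": 1.75}, {"height_cm": 9999}, {"height_cm": None}]
cleaned = correct_column(
    rows,
    "height_cm",
    fix=lambda v: v * 100 if v < 3 else v,
    is_valid=lambda v: 50 <= v <= 250,
)
print([row["height_cm"] for row in cleaned])  # [182, 175.0, None, None]
```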
missing data -> mean
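A minimal sketch of mean imputation for one numeric column, assuming the column is a list with `None` for missing entries:

```python
from statistics import mean
from typing import List, Optional

def fill_mean(values: List[Optional[float]]) -> List[float]:
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("every value is missing; no mean to fill with")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

print(fill_mean([1.0, None, 3.0, 8.0]))  # [1.0, 4.0, 3.0, 8.0]
```

In a train/test setting the fill value is usually computed on the training rows only and then reused for new data, so test statistics do not leak into training.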
missing data -> median, mode
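The same pattern works with the median (robust to outliers) and the mode (also usable for categorical values). A sketch using Python's `statistics` module, with the column again assumed to be a list using `None` for missing:

```python
from statistics import median, mode
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")

def fill_with(values: List[Optional[T]], stat: Callable[[List[T]], T]) -> List[T]:
    """Replace each None with stat(observed values), e.g. median or mode."""
    observed = [v for v in values if v is not None]
    fill = stat(observed)  # statistics raises StatisticsError if observed is empty
    return [fill if v is None else v for v in values]

# The median ignores the outlier 100 (the mean would be about 34.3).
print(fill_with([1, None, 2, 100], median))  # [1, 2, 2, 100]
# The mode also works for categorical values.
print(fill_with(["red", None, "blue", "red"], mode))  # ['red', 'red', 'blue', 'red']
```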
missing data -> interpolated value
- useful with time-series data
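For time series, a sketch of linear interpolation, assuming readings are evenly spaced in time and stored as a list with `None` for gaps. Gaps before the first or after the last observation are left as `None` in this version, since there is nothing on one side to interpolate from.

```python
from typing import List, Optional

def interpolate_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps on a straight line between the nearest observed
    neighbours, assuming evenly spaced time steps. Leading and trailing gaps
    stay None because there is no value on one side to interpolate from."""
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = right - left
        for i in range(left + 1, right):
            result[i] = values[left] + (values[right] - values[left]) * (i - left) / step
    return result

readings = [None, 10.0, None, None, 16.0, None]
print(interpolate_linear(readings))  # [None, 10.0, 12.0, 14.0, 16.0, None]
```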
missing data -> constant
missing data -> missing rank
missing data -> 0
- for categorical data, encode it as the (key, value) pair (0, ‘missing’)
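A sketch of the (0, ‘missing’) convention for categorical data: real categories get integer codes starting at 1, and code 0 is reserved for missing values. The `MISSING_CODE` constant and the function name are illustrative choices.

```python
from typing import Dict, List, Optional, Tuple

MISSING_CODE = 0  # reserved key of the (0, 'missing') pair

def encode_categories(values: List[Optional[str]]) -> Tuple[List[int], Dict[int, str]]:
    """Give each category an integer code starting at 1; missing values get 0."""
    vocabulary: Dict[str, int] = {}
    codes = []
    for v in values:
        if v is None:
            codes.append(MISSING_CODE)
            continue
        if v not in vocabulary:
            vocabulary[v] = len(vocabulary) + 1  # real categories start at 1
        codes.append(vocabulary[v])
    lookup = {MISSING_CODE: "missing"}
    lookup.update({code: category for category, code in vocabulary.items()})
    return codes, lookup

codes, lookup = encode_categories(["cat", None, "dog", "cat"])
print(codes)   # [1, 0, 2, 1]
print(lookup)  # {0: 'missing', 1: 'cat', 2: 'dog'}
```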
missing data -> 0 - confidence
missing data -> ‘missing’
missing data -> delete columns
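Deleting columns usually targets columns that are mostly empty. A sketch assuming row dicts and a hypothetical cut-off: a column is dropped when more than half of its values are missing.

```python
from typing import Any, Dict, List

def drop_sparse_columns(
    rows: List[Dict[str, Any]], max_missing_fraction: float = 0.5
) -> List[Dict[str, Any]]:
    """Remove columns whose share of missing values exceeds the threshold."""
    if not rows:
        return []
    columns = {key for row in rows for key in row}
    keep = {
        col
        for col in columns
        if sum(1 for row in rows if row.get(col) is None) / len(rows) <= max_missing_fraction
    }
    return [{k: v for k, v in row.items() if k in keep} for row in rows]

rows = [
    {"age": 34, "fax": None},
    {"age": 29, "fax": None},
    {"age": 41, "fax": "555-0100"},
]
print(drop_sparse_columns(rows))  # [{'age': 34}, {'age': 29}, {'age': 41}]
```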
missing data -> delete rows
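A sketch of row deletion, assuming row dicts and a caller-supplied list of columns that must be present:

```python
from typing import Any, Dict, Iterable, List

def drop_incomplete_rows(
    rows: List[Dict[str, Any]], required: Iterable[str]
) -> List[Dict[str, Any]]:
    """Keep only rows that have a value in every required column."""
    required = list(required)  # a generator would otherwise be used up on the first row
    return [row for row in rows if all(row.get(col) is not None for col in required)]

rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29},  # income absent
]
print(drop_incomplete_rows(rows, ["age", "income"]))  # [{'age': 34, 'income': 52000}]
```

Dropping rows shrinks the data set, and when missingness depends on the data itself (rather than being completely random) it can also bias what remains.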
missing data -> imputation
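The notes do not say which imputation model is meant. As one common example, here is a sketch of k-nearest-neighbour imputation, swapped in for illustration: each missing cell gets the average of that column over the k most similar rows, with similarity measured on the columns both rows have observed. Rows are assumed to be lists of floats with `None` for missing.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]

def knn_impute(rows: List[Row], k: int = 2) -> List[Row]:
    """Fill each missing cell with the average of that column over the k
    nearest rows that have it, using RMS distance over shared observed columns.
    Distances are always computed on the original rows, not on filled values."""
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_idx, other in enumerate(rows):
                if other_idx == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue  # no common columns, so no way to measure distance
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
                candidates.append((dist, other[j]))
            nearest = sorted(candidates, key=lambda pair: pair[0])[:k]
            if nearest:  # otherwise the cell stays None
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed

data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 7.0],
    [0.9, 1.9, 2.9],
]
result = knn_impute(data, k=2)
print(round(result[1][2], 2))  # 2.95, the mean of 3.0 and 2.9 from the two closest rows
```

Because the distance mixes columns, features are normally put on comparable scales first. scikit-learn provides a ready-made `KNNImputer` in `sklearn.impute`.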