Chain of assumptions - orthogonalization
Concept similar to independent (orthogonalized) TV-set adjustment knobs: one knob fixes exactly one problem
Early-stopping cons
It is a ‘knob’ with 2 functions (one stopping decision affects both training-set fit and dev-set performance), which contradicts orthogonalization
Using metric goals
Using TRAIN/DEV/TEST sets
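A minimal sketch of why early stopping is a two-function knob (toy data and thresholds are assumptions, not from the course): gradient descent lowers TRAIN error, while the stopping rule watches DEV error, so one decision moves both at once:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: y = 2x + noise, split into train and dev
X = rng.normal(size=200)
y = 2.0 * X + rng.normal(scale=0.5, size=200)
X_tr, y_tr, X_dev, y_dev = X[:150], y[:150], X[150:], y[150:]

w, lr = 0.0, 0.05
best_dev, best_w, patience = float("inf"), w, 0

for step in range(500):
    w -= lr * 2 * np.mean((w * X_tr - y_tr) * X_tr)  # gradient step on TRAIN error
    dev_err = np.mean((w * X_dev - y_dev) ** 2)      # monitored DEV error
    if dev_err < best_dev - 1e-6:
        best_dev, best_w, patience = dev_err, w, 0
    else:
        patience += 1
        if patience >= 10:  # this single stop limits train fit AND dev fit together
            break

print(best_w, best_dev)
```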
Orthogonalization for the metric
Human-level performance
Improving algo to human-level performance
Bias/variance
Human-level performance
The importance of HL performance lies in its use as a proxy for Bayes error in human-perception tasks; in other papers, state-of-the-art performance is also used as a proxy for Bayes error
Once HL performance is surpassed, it is much more difficult to improve the ML algorithm further
Techniques for supervised learning
Simple error analysis
Incorrectly labeled samples - training set
Incorrectly labeled samples - dev set
review corrected dev and test sets
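A quick way to run the simple error analysis above (the category names and counts here are invented for illustration): tally misclassified dev examples by apparent cause and read off the ceiling on improvement each fix could buy:

```python
from collections import Counter

# hypothetical tags assigned while eyeballing 100 misclassified dev examples
tags = (["blurry"] * 43 + ["incorrect label"] * 6 +
        ["lookalike class"] * 27 + ["other"] * 24)

counts = Counter(tags)
total = len(tags)
for cause, n in counts.most_common():
    # the fraction is an upper bound on how much fixing this cause can help
    print(f"{cause:16s} {n:3d}  ceiling on improvement: {n / total:.0%}")
```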
different training and testing data distributions - bad option
different training and testing data distributions - good option
handling mismatched data distribution for training and dev sets
performance/error levels - mismatched data distributions
bias/variance - mismatched data distributions
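The good option above can be sketched as follows (dataset names and sizes are assumptions): dev and test come only from the target distribution, and a training-dev set is held out from the same distribution as training, so the variance and data-mismatch gaps stay separately measurable:

```python
import numpy as np

rng = np.random.default_rng(0)

web = rng.normal(size=(200_000, 4))              # plentiful, off-target distribution
mobile = rng.normal(loc=1.0, size=(10_000, 4))   # scarce, target distribution

rng.shuffle(mobile)
# dev and test come ONLY from the target distribution
dev, test = mobile[:2_500], mobile[2_500:5_000]
# the remaining target-distribution data is mixed into training
train = np.concatenate([web, mobile[5_000:]])
rng.shuffle(train)
# training-dev: same distribution as training, but never trained on
train_dev, train = train[:5_000], train[5_000:]

print(len(train), len(train_dev), len(dev), len(test))
```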
avoidable bias = TRE - HLE (training error - human-level error)
variance = TRDE - TRE (training-dev error - training error)
data mismatch = DE - TRDE (dev error - training-dev error)
overfitting to dev set = TSE - DE (test error - dev error)
table representation - bias/variance analysis data-mismatch
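The four-gap decomposition above can be computed directly; a small helper (the error values here are invented) makes the table concrete and points at the largest gap to work on next:

```python
def diagnose(hle, tre, trde, de, tse):
    """Decompose errors into the four gaps.
    HLE = human-level, TRE = training, TRDE = training-dev,
    DE = dev, TSE = test error (all as fractions)."""
    return {
        "avoidable bias":     tre - hle,
        "variance":           trde - tre,
        "data mismatch":      de - trde,
        "overfitting to dev": tse - de,
    }

gaps = diagnose(hle=0.01, tre=0.06, trde=0.08, de=0.15, tse=0.16)
worst = max(gaps, key=gaps.get)   # biggest gap = next thing to address
print(gaps, worst)
```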
addressing data mismatch training/test sets
building the first system
transfer learning
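A minimal numpy sketch of the transfer-learning idea (everything here is a stand-in: the frozen random projection plays the role of a network pretrained on task A): reuse the pretrained representation unchanged and train only a new output head on the small task-B dataset:

```python
import numpy as np

rng = np.random.default_rng(1)

# stand-in for layers pretrained on task A: a fixed (frozen) feature layer
W_pre = 0.3 * rng.normal(size=(10, 32))

def features(X):
    return np.tanh(X @ W_pre)   # frozen pretrained representation

# task B: small labeled dataset; only the new head is trained
X_b = rng.normal(size=(300, 10))
y_b = (X_b[:, 0] + 0.1 * rng.normal(size=300) > 0).astype(float)

H = features(X_b)
# ridge least-squares fit of the new head on top of frozen features
w_head = np.linalg.solve(H.T @ H + 1e-2 * np.eye(32), H.T @ (2 * y_b - 1))
acc = ((features(X_b) @ w_head > 0) == (y_b == 1)).mean()
print(acc)
```

In practice the head would be retrained (or the whole network fine-tuned) with gradient descent; the least-squares fit just keeps the sketch short.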
multi-task learning
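One defining detail of multi-task learning is the joint loss: a single network emits one logit per task, the per-task losses are summed, and entries with no label (common when annotators tag only some tasks) are masked out of the sum. A minimal sketch, with shapes and values assumed:

```python
import numpy as np

def multitask_loss(logits, labels, mask):
    """Mean logistic loss over (example, task) pairs; mask == 0 marks
    entries with no label, which are skipped entirely."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-9
    nll = -(labels * np.log(probs + eps) + (1 - labels) * np.log(1 - probs + eps))
    return float((nll * mask).sum() / mask.sum())

# 2 examples x 3 tasks; the second example has no label for task 2
logits = np.array([[2.0, -1.0, 0.5], [0.0, 3.0, -2.0]])
labels = np.array([[1.0,  0.0, 1.0], [1.0, 1.0,  0.0]])
mask   = np.array([[1.0,  1.0, 1.0], [1.0, 0.0,  1.0]])
loss = multitask_loss(logits, labels, mask)
print(loss)
```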