Quantitative Methods
Characterized by objective measurements
Qualitative Methods
Emphasizes the understanding of human experience
Descriptive statistics
Methods for summarizing a sample or a distribution of values; used to describe phenomena
Inferential statistics
Methods for drawing conclusions based on values; used to generalize inferences beyond a given sample. Example: "The average number is significantly higher than 5"
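The difference between the two kinds of statistics can be sketched in a few lines. This is a minimal illustration under stated assumptions: the sample values are made up, the test is a one-sided one-sample t-test, and the critical value 1.833 (alpha = 0.05, 9 degrees of freedom) is read from a standard t-table rather than computed.

```python
import math
import statistics

# Hypothetical sample of 10 measurements (illustrative values only).
sample = [6, 7, 5, 8, 6, 7, 9, 6, 7, 8]

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(sample)
std_dev = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)

# Inferential statistics: one-sample t-test against the hypothesized mean 5.
hypothesized_mean = 5
standard_error = std_dev / math.sqrt(len(sample))
t_statistic = (mean - hypothesized_mean) / standard_error

# Assumption: one-sided critical value for alpha = 0.05 and 9 degrees of
# freedom, taken from a standard t-table.
critical_value = 1.833
significantly_higher = t_statistic > critical_value

print(f"mean={mean:.2f}, std_dev={std_dev:.2f}, t={t_statistic:.2f}")
print("significantly higher than 5:", significantly_higher)
```

The mean and standard deviation only describe these ten values; the t-test is what supports a statement about the population the sample was drawn from.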
Elements of empirical methods in NLP
Evaluation measures
Effectiveness:
Efficiency:
Classification Effectiveness
Instance types in the evaluation:
When to use accuracy?
When not to use accuracy?
Precision
Recall
F1-score
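The standard definitions are precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 * precision * recall / (precision + recall), with accuracy = (TP + TN) / (TP + FP + TN + FN). A minimal sketch for a binary task follows; the function names, the example labels, and the convention of reporting 0.0 for an undefined ratio are assumptions made for illustration.

```python
def count_instance_types(gold, predicted, positive=1):
    """Count TP, FP, TN, FN with respect to one positive class."""
    tp = fp = tn = fn = 0
    for g, p in zip(gold, predicted):
        if p == positive:
            if g == positive:
                tp += 1
            else:
                fp += 1
        else:
            if g == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def safe_divide(numerator, denominator):
    # Assumption: an undefined ratio (denominator 0) is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def evaluate(gold, predicted, positive=1):
    tp, fp, tn, fn = count_instance_types(gold, predicted, positive)
    precision = safe_divide(tp, tp + fp)
    recall = safe_divide(tp, tp + fn)
    return {
        "accuracy": safe_divide(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_divide(2 * precision * recall, precision + recall),
    }


# Hypothetical gold labels and predictions (1 = positive, 0 = negative).
gold = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = evaluate(gold, predicted)
print(scores)
```

In this example, accuracy is 0.8 while precision, recall, and F1 are each about 0.67, because the many true negatives raise accuracy but do not enter the other three measures.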
Boundary errors and Issues
A common error in tasks where text spans need to be annotated is to choose a wrong boundary of the span
Issues
- A boundary error leads to both an FP and an FN
- Identifying nothing as positive would yield a higher F1-score than a span with a wrong boundary, because it causes only an FN
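Both issues can be reproduced with a small sketch. It assumes exact-match evaluation, where a predicted span counts as a TP only if its start and end both agree with a gold span; the spans are hypothetical character offsets and `span_scores` is an illustrative helper, not a library function.

```python
def span_scores(gold_spans, predicted_spans):
    """Exact-match span evaluation: returns (TP, FP, FN, F1)."""
    gold_spans, predicted_spans = set(gold_spans), set(predicted_spans)
    tp = len(gold_spans & predicted_spans)
    fp = len(predicted_spans - gold_spans)
    fn = len(gold_spans - predicted_spans)
    denominator = 2 * tp + fp + fn
    # Assumption: F1 is reported as 0.0 when there are no spans at all.
    f1 = 2 * tp / denominator if denominator else 0.0
    return tp, fp, fn, f1


# Hypothetical (start, end) offsets of two gold spans.
gold = [(0, 3), (10, 14)]

# The second span is found, but its end boundary is off by one.
with_boundary_error = [(0, 3), (10, 13)]

# The second span is not predicted at all.
without_second_span = [(0, 3)]

print(span_scores(gold, with_boundary_error))
print(span_scores(gold, without_second_span))
```

The boundary error is counted twice, once as an FP and once as an FN, giving an F1 of 0.5. Skipping the span leaves only the FN and gives an F1 of about 0.67, although the system with the boundary error was arguably closer to the truth.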
How to deal with boundary errors
Evaluation of multi-class tasks
Micro-averaged precision
Micro-averaging takes into account the number of instances per class, so larger classes get more importance
Macro-averaged precision:
Macro-averaging computes the mean result over all classes, so each class gets the same importance
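The two averaging schemes can be compared on a small example. This is a sketch with made-up per-class counts; the class names and the `(TP, FP)` tuple layout are assumptions for illustration.

```python
def micro_precision(counts):
    """Pool TP and FP over all classes, then compute one precision value."""
    total_tp = sum(tp for tp, fp in counts.values())
    total_fp = sum(fp for tp, fp in counts.values())
    return total_tp / (total_tp + total_fp)


def macro_precision(counts):
    """Compute precision per class, then take the unweighted mean."""
    per_class = [tp / (tp + fp) for tp, fp in counts.values()]
    return sum(per_class) / len(per_class)


# Hypothetical (TP, FP) counts: one large class, one small class.
counts = {
    "large_class": (90, 10),  # precision 0.9
    "small_class": (1, 9),    # precision 0.1
}

print(f"micro: {micro_precision(counts):.3f}")
print(f"macro: {macro_precision(counts):.3f}")
```

Micro-averaged precision is 91 / 110, about 0.827, because the large class dominates the pooled counts. Macro-averaged precision is (0.9 + 0.1) / 2 = 0.5, because the poorly handled small class weighs as much as the large one.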
Confusion matrix
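A confusion matrix counts how often each gold class was predicted as each class. The following is a minimal sketch using only the standard library; the labels are hypothetical, and placing gold classes in rows and predicted classes in columns is one common convention, assumed here.

```python
from collections import Counter


def confusion_matrix(gold, predicted):
    """Map each (gold label, predicted label) pair to its count."""
    return Counter(zip(gold, predicted))


def format_matrix(matrix, labels):
    """Render the matrix as text: rows are gold, columns are predicted."""
    lines = ["gold/pred " + "".join(label.rjust(6) for label in labels)]
    for g in labels:
        cells = "".join(str(matrix[(g, p)]).rjust(6) for p in labels)
        lines.append(g.ljust(10) + cells)
    return "\n".join(lines)


# Hypothetical three-class example.
gold = ["pos", "pos", "neg", "neu", "neg", "pos"]
predicted = ["pos", "neg", "neg", "neu", "neu", "pos"]

matrix = confusion_matrix(gold, predicted)
print(format_matrix(matrix, ["pos", "neg", "neu"]))
```

Cells on the diagonal are correct predictions; every off-diagonal cell shows which class was confused with which, information that a single aggregate score does not reveal.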
Why confusion matrices
Types of prediction errors
Sometimes the root mean squared error (RMSE) is also computed, defined as RMSE = sqrt(MSE)
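As a small worked example, the sketch below computes the MSE as the mean of the squared differences between predicted and gold values (the usual definition) and the RMSE as its square root. The values are made up for illustration.

```python
import math


def mean_squared_error(gold, predicted):
    """Mean of the squared differences between predictions and gold values."""
    squared_errors = [(p - g) ** 2 for g, p in zip(gold, predicted)]
    return sum(squared_errors) / len(squared_errors)


def root_mean_squared_error(gold, predicted):
    """Square root of the MSE, expressed in the unit of the original values."""
    return math.sqrt(mean_squared_error(gold, predicted))


# Hypothetical gold values and predictions of a regression task.
gold = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

mse = mean_squared_error(gold, predicted)
rmse = root_mean_squared_error(gold, predicted)
print(f"MSE={mse:.3f}, RMSE={rmse:.3f}")
```

The squared errors are 1, 0, 4, and 1, so the MSE is 6 / 4 = 1.5 and the RMSE is about 1.225.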
Empirical Experiments:
Intrinsic vs extrinsic effectiveness evaluation:
Text corpora
Need for text corpora:
Annotation: