Describe the task of sentence segmentation
Determine where one sentence ends and the next begins in running text
Describe the ambiguity problem in sentence segmentation
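There are many different end-of-sentence (EOS) characters and punctuation styles, and the same character can also mark something other than a sentence end (e.g. the period in an abbreviation).
e.g. “xyz”. in British style (period outside the closing quote) but “xyz.” in American style (period inside)
In medical writing, a sentence may not begin with a capital letter, e.g. when it starts with a protein name such as p53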
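A minimal sketch of the ambiguity, assuming two naive rules that are not from the notes: split after every EOS character, and split only when the next word is capitalised. The function names and the example text are hypothetical, chosen only to show how each rule fails.

```python
import re

def naive_split(text):
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def capital_split(text):
    """Split only when the EOS character is followed by a capitalised word."""
    return [s for s in re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip()) if s]

# Hypothetical example: an abbreviation plus a sentence starting in lowercase.
text = "Dr. Smith measured p53 levels. p53 is a tumour suppressor."

print(naive_split(text))    # over-segments after the abbreviation "Dr."
print(capital_split(text))  # still splits after "Dr." and misses the lowercase start
```

Neither rule recovers the two real sentences: the period in "Dr." looks like a boundary, and the real boundary before "p53" does not look like one.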
Describe the approaches to sentence segmentation (5)
Give 2 example rules for sentence segmentation
- dictionary of abbreviations: a period after a known abbreviation (e.g. "Dr.") is not treated as a sentence boundary
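The abbreviation-dictionary rule can be sketched as follows. This is an illustrative sketch only: the abbreviation set, the `segment` function, and the whitespace tokenisation are assumptions, not a reference implementation.

```python
# Illustrative abbreviation dictionary (an assumption; real lists are much longer).
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "prof.", "e.g.", "i.e.", "etc."}

def segment(text, abbreviations=ABBREVIATIONS):
    """Split on tokens ending in '.', '!' or '?', unless the token is a known abbreviation."""
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        ends_with_eos = token.endswith((".", "!", "?"))
        if ends_with_eos and token.lower() not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without a final EOS character
        sentences.append(" ".join(current))
    return sentences

for sentence in segment("Dr. Smith measured p53 levels. p53 is a tumour suppressor."):
    print(sentence)
```

One limitation of this sketch: an abbreviation that really does end a sentence (e.g. a final "etc.") is not treated as a boundary, so the rule trades one kind of error for another.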
What tools are available for sentence segmentation
Apache OpenNLP sentence detector (ML-based)
spaCy (ML-based or rule-based)
What is domain dependence
This is the idea that language use differs by domain, so a segmenter built or trained for one domain (e.g. news text) may perform poorly on another (e.g. medical writing)