Why annotate?
Leech’s 7 maxims of annotation
Types of annotations across linguistic levels
Phonological level
• Syllable boundaries (phonetic/phonemic annotation)
• Prosodic or suprasegmental features (prosodic annotation, e.g. pitch,
loudness, intonation)
Morphological level
• Prefixes, suffixes, stems (morphological annotation)
Lexical level
• Tokenization (essential for languages such as Chinese, which are written without spaces between words)
• Parts of speech (POS tagging), e.g. present: NN1, VVB, JJ
• Lemmas (lemmatization), e.g. stop, stopped, stops, stopping → stop
• Semantic fields (semantic annotation), e.g. cricket: sport, insect
Basic annotations on word level
• Plain text (also raw text): only sequences of characters, without explicit
information about words or sentences
• Tokenization: segmentation of raw text into words/tokens and sentences
• sequences of characters are divided into words/tokens
• sequences of tokens are divided into sentences
• Stemming and lemmatization
• stemming: cutting off suffixes (no lexicon involved)
• lemmatization: base form taken from a lexicon
• POS tagging
• labeling each word in a sequence of words with the appropriate part of speech (POS)
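The first three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the function names, the suffix list and the tiny LEMMA_LEXICON are all assumptions made up for this example, and the regular expressions only handle simple English text.

```python
import re

# Hypothetical toy lexicon mapping inflected forms to lemmas
# (an assumption for illustration; a real lexicon is far larger).
LEMMA_LEXICON = {
    "stopped": "stop",
    "stops": "stop",
    "stopping": "stop",
    "killed": "kill",
    "swallowed": "swallow",
}

# Suffixes the toy stemmer strips, tried in this order.
SUFFIXES = ("ing", "ed", "s")


def split_sentences(raw_text):
    """Split raw text into sentences at ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", raw_text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def stem(token):
    """Cut off the first matching suffix; no lexicon is consulted."""
    lowered = token.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words stay intact.
        if lowered.endswith(suffix) and len(lowered) > len(suffix) + 2:
            return lowered[: -len(suffix)]
    return lowered


def lemmatize(token):
    """Look the token up in the lexicon; fall back to the lowercased form."""
    lowered = token.lower()
    return LEMMA_LEXICON.get(lowered, lowered)
```

With these definitions, stem("stopped") returns "stopp" because the suffix is cut off blindly, whereas lemmatize("stopped") returns "stop" because the base form is taken from the lexicon.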
Annotation strategies
Syntactic annotation
Tree diagram
Labelled bracketing
Sentence level:
[s1The snake killed the rat and swallowed it]
Clause level:
[s1[c1The snake killed the rat] and [c2swallowed it]]
Phrase level:
[s1[c1[NPThe snake] [VPkilled [NPthe rat]]] and [c2[VPswallowed [NPit]]]]
Word level:
[s1[c1[NP[DTThe] [Nsnake]] [VP[Vkilled] [NP[DTthe] [Nrat]]]] [Conjand] [c2[VP[Vswallowed] [NP[PPit]]]]]
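Labelled bracketing and tree diagrams encode the same structure, which a short parser can make explicit. The sketch below is one possible reading of the notation, under the assumption that the set of labels is known in advance (the label is written directly before the first word, with no separator). The names LABELS, parse_brackets and leaves are invented for this example.

```python
# Labels used in the bracketed examples, longest first so that
# "NP" is tried before "N" and "VP" before "V".
LABELS = sorted(
    ["s1", "c1", "c2", "NP", "VP", "DT", "N", "V", "Conj", "PP"],
    key=len,
    reverse=True,
)


def parse_brackets(text):
    """Parse labelled bracketing into nested (label, children) tuples."""
    pos = 0

    def parse_node():
        nonlocal pos
        pos += 1  # skip the opening "["
        label = next((lab for lab in LABELS if text.startswith(lab, pos)), None)
        if label is None:
            raise ValueError(f"unknown label at position {pos}")
        pos += len(label)
        children = []
        while pos < len(text) and text[pos] != "]":
            if text[pos] == "[":
                children.append(parse_node())
            else:
                # Plain words up to the next bracket become leaf children.
                start = pos
                while pos < len(text) and text[pos] not in "[]":
                    pos += 1
                children.extend(text[start:pos].split())
        if pos >= len(text):
            raise ValueError("missing closing bracket")
        pos += 1  # skip the closing "]"
        return (label, children)

    if not text.startswith("["):
        raise ValueError("expected an opening bracket")
    return parse_node()


def leaves(node):
    """Collect the words of a parsed tree from left to right."""
    _label, children = node
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words
```

Calling leaves(parse_brackets(...)) on any of the four bracketings above gives back the same eight words, which shows that each level only adds structure and never changes the text. Because labels and words are not separated, a word-level label N followed by a word beginning with "P" would be misread as NP. A scheme that avoids this needs a separator between label and word.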
Semantic annotation
Synonymy -> similar contexts
Homonymy, polysemy -> different contexts
semantic fields: sense relations (word senses) and some other kinds
of relations (e.g., part-of, related-to etc.)
• annotation (cf. POS tagging):
• definition of tagging scheme (labels and their meanings)
• tagging scheme: guidelines for application
• in semantics: this is not as easy and straightforward as for POS
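A toy example may show why semantic tagging is harder than POS tagging. The sketch below defines a tiny tagging scheme (labels and their meanings) and picks a semantic field for the ambiguous word "cricket" by counting context cue words. The labels SPORT and INSECT, the cue lists and the function name are all hypothetical, and the overlap heuristic is only one simple way to use context.

```python
# Hypothetical tagging scheme: labels and their meanings.
TAGGING_SCHEME = {
    "SPORT": "sports and games",
    "INSECT": "insects",
}

# Hypothetical lexicon: candidate semantic fields per word.
SENSE_LEXICON = {
    "cricket": ["SPORT", "INSECT"],
}

# Hypothetical cue words that suggest each field in context.
CONTEXT_CUES = {
    "SPORT": {"bat", "match", "team", "wicket"},
    "INSECT": {"chirp", "grass", "legs", "wings"},
}


def tag_semantic_field(word, context_tokens):
    """Return the best-supported label, or None if the context does not decide."""
    candidates = SENSE_LEXICON.get(word.lower(), [])
    if not candidates:
        return None
    context = {token.lower() for token in context_tokens}
    # Score each candidate by the number of its cue words found in the context.
    scores = {label: len(CONTEXT_CUES[label] & context) for label in candidates}
    best = max(candidates, key=lambda label: scores[label])
    tied = [label for label in candidates if scores[label] == scores[best]]
    if scores[best] == 0 or len(tied) > 1:
        return None  # no evidence, or several fields equally likely
    return best
```

A sentence such as "I like cricket" contains no cue words, so the function returns None. This is the kind of case where the guidelines of the tagging scheme have to tell the annotator what to do.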
Discourse annotation
• coherence: what makes a text hang together in terms of content
• cohesion: the means of making a text hang together
reference,
lexical cohesion,
substitution/ellipsis,
conjunctive relations (cause, result, effect etc.),
thematic development
Anaphoric relations
Links between a pro-form (e.g. a pronoun) and its antecedent
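One common way to store such links is as stand-off annotation: the tokens stay untouched and each link points to token positions. The sketch below assumes that "it" in the snake sentence refers back to "the rat". The class AnaphoricLink and the helper names are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnaphoricLink:
    """A link between two token spans, given as (start, end) with end exclusive."""
    proform: tuple
    antecedent: tuple


TOKENS = ["The", "snake", "killed", "the", "rat", "and", "swallowed", "it"]

# Assumed annotation: "it" (token 7) refers back to "the rat" (tokens 3-4).
LINKS = [AnaphoricLink(proform=(7, 8), antecedent=(3, 5))]


def span_text(tokens, span):
    """Return the words covered by a (start, end) span."""
    start, end = span
    return " ".join(tokens[start:end])


def is_anaphoric(link):
    """True if the antecedent ends before the pro-form starts."""
    return link.antecedent[1] <= link.proform[0]


def resolve(tokens, links):
    """List (pro-form, antecedent) text pairs for all links."""
    return [
        (span_text(tokens, link.proform), span_text(tokens, link.antecedent))
        for link in links
    ]
```

Here resolve(TOKENS, LINKS) yields the single pair ("it", "the rat"). Keeping the links separate from the text means that several annotation layers (POS, syntax, anaphora) can refer to the same tokens.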