Define the eight features used for pronoun resolution. State the extraction method if the feature is hard to get.
What is a baseline?
What is a ceiling?
A baseline is a score given by a relatively simple approach which is used as a standard against which the approach under investigation is compared.
A ceiling is the maximum performance that could be expected, generally the agreement achieved between two or more humans performing the task.
Why might a discourse model be used over a Naive Bayes model in resolving pronouns?
A Naive Bayes classifier decides each pronoun-antecedent pair independently, so it may not produce a globally consistent answer.
In the example given, it is quite likely that the classifier would propose that both ‘he’ and ‘it’ refer to Burns.
In a discourse model, a binding is fixed once it is made and informs later decisions, which gives global consistency. The model also allows a ‘repeated mention’ heuristic, which is impossible for a single-pass classifier.
Define morphological ambiguity. Give an example.
Arises when a word can be decomposed into morphemes in more than one way.
For example, ‘unionised’ can be analysed as un-ion-ise-ed or as union-ise-ed.
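The two analyses of ‘unionised’ can be reproduced with a minimal sketch that enumerates every way of splitting a word into known morphemes. The lexicon and the function name `segmentations` are hypothetical, and the entry mapping the surface form "is" to the morpheme "ise" is an assumed shortcut for e-deletion before "ed", not a real spelling-rule component.

```python
# Hypothetical toy lexicon: surface form -> underlying morpheme.
# Assumption: "is" is the surface form of "ise" after e-deletion before "ed".
LEXICON = {"un": "un", "ion": "ion", "union": "union", "is": "ise", "ed": "ed"}

def segmentations(word):
    """Return every way to split `word` into lexicon entries."""
    if not word:
        return [[]]  # one analysis of the empty string: no morphemes
    results = []
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if prefix in LEXICON:
            # Combine this prefix with every analysis of the remainder.
            for rest in segmentations(word[i:]):
                results.append([LEXICON[prefix]] + rest)
    return results

analyses = ["-".join(s) for s in segmentations("unionised")]
print(analyses)  # ['un-ion-ise-ed', 'union-ise-ed']
```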
Define lexical ambiguity. Give an example.
Arises when a word has multiple senses.
For example, the word ‘duck’ could be an action or an animal.
Define syntactic/structural ambiguity. Give an example.
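Arises when an expression can be bracketed in more than one way.
He ate the pizza with a fork.
The prepositional phrase ‘with a fork’ can attach to the verb ‘ate’ (the fork is the instrument) or to the noun phrase ‘the pizza’ (the pizza comes with a fork).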
Define discourse relation ambiguity. Give an example.
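Arises when the relationship between sentences is implicit and can be read in more than one way.
Max fell. John pushed him.
The second sentence can be read as an explanation (Max fell because John pushed him) or as narration (Max fell, and then John pushed him).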
Describe the packing algorithm. What is it good for?
Packing is an optimization of chart parsing: multiple derivations of the same phrase are recorded in a single edge.
It works because rule application is not sensitive to the internal structure of an edge.
It can be proven that the algorithm then runs in cubic time, because packing stops the number of edges from growing exponentially. However, unpacking takes exponential time…
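Packing can be illustrated with a minimal sketch on the sentence ‘He ate the pizza with a fork’. A CKY-style bottom-up chart is used here for brevity, as a stand-in for whichever chart parser is intended. The toy grammar, the lexicon and the names `parse`, `count_trees` and `unpack` are all assumptions made for this example. Each edge is identified by (start, end, category) and holds a list of derivations, so the two readings of the verb phrase share one edge.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (left, right) -> parents.
BINARY = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],
    ("Det", "N"): ["NP"],
    ("NP", "PP"): ["NP"],
    ("P", "NP"): ["PP"],
}
LEXICAL = {"he": "NP", "ate": "V", "the": "Det", "a": "Det",
           "pizza": "N", "fork": "N", "with": "P"}

def parse(words):
    """Fill a packed chart: (start, end) -> {category: [derivations]}."""
    n = len(words)
    chart = defaultdict(dict)
    for i, w in enumerate(words):
        chart[(i, i + 1)][LEXICAL[w]] = [w]
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for left in chart[(start, mid)]:
                    for right in chart[(mid, end)]:
                        for parent in BINARY.get((left, right), []):
                            # Packing: a further derivation is appended to the
                            # existing edge instead of creating a new edge.
                            chart[(start, end)].setdefault(parent, []).append(
                                (mid, left, right))
    return chart

def count_trees(chart, start, end, cat):
    """Count the trees under an edge without building them."""
    total = 0
    for d in chart[(start, end)].get(cat, []):
        if isinstance(d, str):  # lexical derivation
            total += 1
        else:
            mid, left, right = d
            total += (count_trees(chart, start, mid, left)
                      * count_trees(chart, mid, end, right))
    return total

def unpack(chart, start, end, cat):
    """Enumerate every tree under an edge; this is the expensive step."""
    trees = []
    for d in chart[(start, end)].get(cat, []):
        if isinstance(d, str):
            trees.append(f"({cat} {d})")
        else:
            mid, left, right = d
            for l in unpack(chart, start, mid, left):
                for r in unpack(chart, mid, end, right):
                    trees.append(f"({cat} {l} {r})")
    return trees

words = "he ate the pizza with a fork".split()
chart = parse(words)
print(len(chart[(1, 7)]["VP"]))           # 2 derivations packed into one VP edge
print(count_trees(chart, 0, 7, "S"))      # 2 parses in total
for tree in unpack(chart, 0, 7, "S"):
    print(tree)
```

The S edge over the whole sentence stores a single derivation even though there are two parses, because the ambiguity is packed inside the VP edge; only `unpack` pays for listing the trees separately.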
Given a string of words, how do we compute the most likely tags?

Define:
Describe Yarowsky’s minimally-supervised learning approach to word sense disambiguation.

What can we do to avoid P(w_n | w_{n-1}) bigrams being zero?
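One standard answer is smoothing. The sketch below shows add-one (Laplace) smoothing, chosen here only because it is the simplest option; backoff and other smoothing schemes are alternatives. The function name `bigram_model` and the six-word corpus are made up for illustration.

```python
from collections import Counter

def bigram_model(tokens):
    """Return a function giving add-one smoothed P(word | prev)."""
    history = Counter(tokens[:-1])             # counts of w_{n-1} as a history
    bigrams = Counter(zip(tokens, tokens[1:]))  # counts of (w_{n-1}, w_n)
    vocab_size = len(set(tokens))

    def prob(prev, word):
        # Add one to every bigram count, seen or unseen, and add the
        # vocabulary size to the denominator so the distribution sums to 1.
        return (bigrams[(prev, word)] + 1) / (history[prev] + vocab_size)

    return prob

tokens = "the cat sat on the mat".split()  # hypothetical toy corpus
prob = bigram_model(tokens)
print(prob("the", "cat"))  # seen bigram: (1 + 1) / (2 + 5)
print(prob("the", "sat"))  # unseen bigram: (0 + 1) / (2 + 5), no longer zero
```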
Define four notions of context.
Define three ways to weigh context.

How do we combine visual and text words?
How do you learn adjective matrices from corpus data?

Define the seven major tasks in content generation.
What is a chart? What is it useful for?
Distinguish inflectional and derivational morphology.