Information extraction
The process of Information Extraction turns the unstructured information embedded in texts into structured data, e.g. populating a relational database to enable further processing.
Relation Extraction
Finding and classifying semantic relations among entities mentioned in a text.
RDF triple
A tuple of entity-relation-entity, called a subject-predicate-object expression.
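As a rough illustration, a triple can be held in a three-field record. The class name `Triple`, the field names, and the example values are assumptions made for this sketch; they do not come from any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical container for a subject-predicate-object expression."""
    subject: str
    predicate: str
    obj: str


# Example values are illustrative only.
triple = Triple("Golden Gate Park", "location", "San Francisco")
print(triple.subject, triple.predicate, triple.obj)
```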
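5 Classes of algorithms for relation extraction
Hand-written patterns, supervised machine learning, semisupervised learning via bootstrapping, distant supervision, and unsupervised extraction.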
Semisupervised Relation Extraction via Bootstrapping
If we have a few high-precision seed patterns, or seed tuples, we can bootstrap a classifier.
Bootstrapping proceeds by taking the entities in the seed pair, and then finding sentences (e.g. on the web) that contain both entities.
From all such sentences, we extract and generalize the context around the entities to learn new patterns.
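The loop described above can be sketched as follows. This is a toy version under strong assumptions: a "pattern" is just the literal text between the two entities, and an entity is any run of capitalized words. The function names (`learn_patterns`, `apply_patterns`, `bootstrap`) and the example corpus are hypothetical.

```python
import re

# Toy assumption: an entity is a run of capitalized words.
ENTITY = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"


def learn_patterns(tuples, sentences):
    """Collect the text between both entities of each known tuple."""
    patterns = set()
    for x, y in tuples:
        for s in sentences:
            i, j = s.find(x), s.find(y)
            if i != -1 and j != -1 and i + len(x) <= j:
                middle = s[i + len(x):j]
                if middle.strip():  # skip empty contexts
                    patterns.add(middle)
    return patterns


def apply_patterns(patterns, sentences):
    """Find new (x, y) pairs joined by a learned context string."""
    found = set()
    for middle in patterns:
        rx = re.compile(ENTITY + re.escape(middle) + ENTITY)
        for s in sentences:
            found.update(rx.findall(s))
    return found


def bootstrap(seeds, sentences, rounds=1):
    """Alternate between learning patterns and harvesting tuples."""
    tuples = set(seeds)
    for _ in range(rounds):
        patterns = learn_patterns(tuples, sentences)
        tuples |= apply_patterns(patterns, sentences)
    return tuples


corpus = [
    "Ryanair has a hub at Charleroi.",
    "Lufthansa has a hub at Frankfurt.",
]
print(sorted(bootstrap({("Ryanair", "Charleroi")}, corpus)))
```

A real system would generalize the context (for example with wildcards or part-of-speech tags) instead of matching it literally, and would score each pattern before trusting its tuples.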
Semantic drift
In semantic drift, an erroneous pattern introduces erroneous tuples, which in turn lead to further problematic patterns, so that the meaning of the extracted relations ‘drifts’.
Relation Extraction
Confidence values in bootstrapping
Bootstrapping systems assign confidence values to new tuples to avoid semantic drift.
Given a document collection D, a current set of tuples, T, and a proposed pattern p, we need to track two factors:
hits(p): the set of tuples in T that p matches while looking in D.
finds(p): the total set of tuples that p finds in D.
Conf(p) = (|hits(p)| / |finds(p)|) × log(|finds(p)|)
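A minimal sketch of this score, assuming tuples are stored in Python sets and that the logarithm is the natural log (the notes do not state the base). The function name `pattern_confidence` and the sample tuples are hypothetical.

```python
import math


def pattern_confidence(known_tuples, finds):
    """Score a pattern as (|hits| / |finds|) * log(|finds|).

    known_tuples: the current tuple set T.
    finds: every tuple the pattern extracts from the collection D.
    """
    if not finds:
        return 0.0
    hits = finds & known_tuples  # tuples the pattern finds that are already in T
    return len(hits) / len(finds) * math.log(len(finds))


T = {("Ryanair", "Charleroi"), ("Lufthansa", "Frankfurt")}
found = T | {("Sydney", "Circular Quay"), ("KLM", "Schiphol")}
print(pattern_confidence(T, found))
```

The ratio rewards precision against the known tuples, while the log factor favors patterns that match many tuples over those that match only one or two.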
Distant Supervision for Relation Extraction
Distant supervision combines the advantages of bootstrapping with supervised learning.
Instead of just a handful of seeds, distant supervision uses a large database to acquire a huge number of seed examples, creates lots of noisy pattern features from all these examples, and then combines them in a supervised classifier.
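The labeling step can be sketched as below, assuming the database is a list of (entity, relation, entity) tuples and that the only feature is the text between the two mentions. The function name `distant_labels` and the example data are hypothetical, and the classifier that would consume these examples is left out.

```python
def distant_labels(kb, sentences):
    """Label every sentence that mentions both entities of a KB tuple."""
    examples = []
    for e1, relation, e2 in kb:
        for s in sentences:
            i, j = s.find(e1), s.find(e2)
            if i != -1 and j != -1 and i + len(e1) <= j:
                # Noisy feature: the words between the two mentions.
                feature = s[i + len(e1):j].strip()
                examples.append((feature, relation))
    return examples


kb = [("Edwin Hubble", "born-in", "Marshfield")]
sentences = [
    "Edwin Hubble was born in Marshfield, Missouri.",
    "Edwin Hubble returned to Marshfield.",
]
for feature, relation in distant_labels(kb, sentences):
    print(relation, "<-", feature)
```

The second sentence shows where the noise comes from: it mentions both entities and so receives the `born-in` label even though it does not express that relation. The supervised classifier is expected to down-weight such features once it has seen enough examples.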
Unsupervised Relation Extraction
Open Information Extraction
A task which has the goal of extracting relations from the web when we have no labeled training data, and not even any list of relations.
Open Information Extraction
ReVerb 4 Steps
1. Run a part-of-speech tagger and entity chunker over the sentence s.
2. For each verb in s, find the longest sequence of words w that start with a verb and satisfy syntactic and lexical constraints, merging adjacent matches.
3. For each phrase w, find the nearest noun phrase x to the left which is not a relative pronoun, wh-word or existential “there”. Find the nearest noun phrase y to the right.
4. Assign confidence c to the relation r = (x, w, y) using a confidence classifier and return it.
Temporal expressions
Expressions that refer to absolute points in time, relative times, durations and sets of those.
Absolute temporal expressions can be mapped directly to calendar dates, times of day, or both.
Relative temporal expressions map to particular times through some other reference point.
Durations denote spans of time at varying levels of granularity.
Temporal Normalization
The process of mapping a temporal expression to either a specific point in time, or to a duration.
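A small sketch of normalizing relative expressions against a reference date, assuming ISO 8601 strings as the output format. The `OFFSETS` table and the `normalize` function are hypothetical and cover only three expressions.

```python
from datetime import date, timedelta

# Hypothetical lookup from relative expressions to day offsets.
OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def normalize(expression, anchor):
    """Map a relative expression to an ISO 8601 date, given an anchor date."""
    key = expression.strip().lower()
    if key not in OFFSETS:
        return None  # expression not covered by this toy table
    return (anchor + timedelta(days=OFFSETS[key])).isoformat()


# The anchor would typically be the document's publication date.
print(normalize("tomorrow", date(2024, 2, 28)))
```

The same expression normalizes to different dates under different anchors, which is why relative expressions need a reference point.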
Fully qualified date expression
Contains a year, month and day in some conventional form.
Event Extraction
The task of identifying mentions of events in texts.
7 Allen Relations
A before B
A overlaps B
A meets B
A equals B
A starts B
A finishes B
A during B
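The seven relations listed above can be checked with simple endpoint comparisons. This sketch assumes each interval is a `(start, end)` pair with `start < end`. The function name `allen_relation` is hypothetical. It returns `None` when none of the seven holds in the given order, in which case swapping the arguments yields the inverse relation.

```python
def allen_relation(a, b):
    """Return which of the seven listed relations holds for A relative to B."""
    a_start, a_end = a
    b_start, b_end = b
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start and a_end < b_end:
        return "starts"
    if a_end == b_end and a_start > b_start:
        return "finishes"
    if a_start > b_start and a_end < b_end:
        return "during"
    if a_start < b_start < a_end < b_end:
        return "overlaps"
    return None  # an inverse relation; try allen_relation(b, a)


print(allen_relation((1, 5), (3, 8)))
```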