What is NLP?
Natural Language Processing
What do we mean by Natural Language?
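Natural vs. artificial language: a natural language is one people speak and write (e.g., English), as opposed to an artificial language such as a programming language.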
What are the two sides of NLP?
NLP is either Natural Language Understanding (NLU) or Natural Language Generation (NLG).
Who was the first person to ask whether AI can exist?
Aristotle, about 2,000 years ago, thinking about toys and statues: could a creation be built that would fool people into thinking it was real? Aristotle said no. Leibniz took up the question as well, in the late 1600s.
What are the applications of NLU?
Automated Text Annotation
Corpus analytics (corpus is document collection)
Search applications
Advanced applications
What are the applications of NLG?
Often a way of making NLU output more digestible to humans.
Text annotation (a fancier version of the NLU task), e.g., document summarization: make a new sentence out of an old sentence and use it as the title of the article
Corpus analytics
Search applications
Advanced applications
What is a token?
A token is the technical name for a sequence of characters — such as hairy, his, or :) — that we want to treat as a group.
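As a minimal sketch of the idea, the snippet below groups characters into tokens with a regular expression. The pattern and the `tokenize` helper are illustrative assumptions, not NLTK's tokenizer; the pattern only knows a handful of simple emoticons.

```python
import re

# Hypothetical pattern: a simple emoticon such as :) or ;-( first,
# then a run of word characters, then any single leftover symbol.
TOKEN_PATTERN = re.compile(r"[:;]-?[()DP]|\w+|[^\w\s]")


def tokenize(text):
    """Split text into tokens, keeping simple emoticons together as one group."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("his hairy dog :)"))  # ['his', 'hairy', 'dog', ':)']
```

Listing the emoticon alternative first matters: otherwise `:` and `)` would be emitted as two separate one-character tokens.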
What is a collocation?
A collocation is a sequence of words that occur together unusually often. Examples: red wine, machine learning, social media, flat screen. When you split them up, the words mean something different. NLTK's Text objects have a collocations() method.
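One common way to make "unusually often" concrete is pointwise mutual information (PMI), which compares how often a pair occurs with how often its words would co-occur by chance. The sketch below is an assumption about how such a ranking could look, written in plain Python; it is not how NLTK's collocations() is implemented, and `rank_collocations` and `min_count` are hypothetical names.

```python
import math
from collections import Counter


def rank_collocations(tokens, min_count=2):
    """Rank adjacent word pairs by PMI, ignoring pairs seen fewer than min_count times."""
    word_counts = Counter(tokens)
    pair_counts = Counter(zip(tokens, tokens[1:]))
    n_words = len(tokens)
    n_pairs = max(len(tokens) - 1, 1)

    scored = []
    for (w1, w2), count in pair_counts.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_pairs
        p_w1 = word_counts[w1] / n_words
        p_w2 = word_counts[w2] / n_words
        scored.append(((w1, w2), math.log2(p_pair / (p_w1 * p_w2))))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up example text, for illustration only.
text = "she drank red wine and he drank red wine while the red car waited"
for pair, score in rank_collocations(text.split()):
    print(pair, round(score, 2))
```

A positive score means the pair occurs more often than its individual word frequencies would predict.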
What is a bigram?
A pair of adjacent words in a text. NLTK has a bigrams() function that returns all such pairs for a sequence of tokens.
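A minimal sketch of extracting bigrams without NLTK: pair each token with the one that follows it. The helper name `adjacent_pairs` is hypothetical.

```python
def adjacent_pairs(tokens):
    """Return the list of bigrams: each token paired with its successor."""
    return list(zip(tokens, tokens[1:]))


print(adjacent_pairs(["red", "wine", "is", "good"]))
# [('red', 'wine'), ('wine', 'is'), ('is', 'good')]
```

A sequence of n tokens yields n - 1 bigrams, so a single token yields none.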
What is a Turing test?
Can a dialogue system, responding to a user’s text input, perform so naturally that we cannot distinguish it from a human-generated response?
What is RoBERTa?
An optimized method for pretraining self-supervised NLP systems
What is NER?
Named Entity Recognition. Locating and classifying entities (e.g., people, organizations, places) in unstructured text.
What is Document Classification?
An NLU application. A spam filter is one example. Amazon does massive-scale classification of products into categories.
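To illustrate the spam-filter example, here is a minimal sketch of a Naive Bayes text classifier with add-one smoothing. Naive Bayes is an assumed technique chosen for brevity (the notes do not say which method a spam filter uses), and the training messages are made up.

```python
import math
from collections import Counter


def train(examples):
    """Count words per label and documents per label from (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the label with the highest log-probability under Naive Bayes."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)  # class prior
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data, for illustration only.
examples = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(examples)
print(classify("free prize money", *model))        # spam
print(classify("team meeting on monday", *model))  # ham
```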
What is Search: Query Understanding?
Inferring the intent and meaning of a search engine user's queries.
Query Segmentation: partition the query into semantic units
Query Scoping (NER): map query segments to entity types
Query Expansion: broaden the query by adding additional phrases/tokens (usually synonyms and abbreviations, e.g., ML or Machine Learning, developer or engineer)
Query Relaxation: make the query less restrictive by removing tokens (black propane grill vs. propane grill)
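Query expansion and query relaxation can be sketched with simple lookups. Everything named here (`SYNONYMS`, `OPTIONAL_MODIFIERS`, `expand_query`, `relax_query`) is a hypothetical illustration; a production search engine would learn or curate these resources rather than hard-code them.

```python
# Hypothetical synonym/abbreviation table.
SYNONYMS = {
    "ml": ["machine learning"],
    "developer": ["engineer"],
}

# Hypothetical set of modifier tokens that relaxation is allowed to drop.
OPTIONAL_MODIFIERS = {"black", "red", "large", "small"}


def expand_query(query):
    """Return the query plus variants with one token swapped for a synonym."""
    tokens = query.lower().split()
    variants = [" ".join(tokens)]
    for i, token in enumerate(tokens):
        for synonym in SYNONYMS.get(token, []):
            variants.append(" ".join(tokens[:i] + [synonym] + tokens[i + 1:]))
    return variants


def relax_query(query):
    """Drop optional modifiers; keep the query unchanged if nothing would remain."""
    tokens = query.lower().split()
    kept = [t for t in tokens if t not in OPTIONAL_MODIFIERS]
    return " ".join(kept) if kept else " ".join(tokens)


print(expand_query("ML developer"))
# ['ml developer', 'machine learning developer', 'ml engineer']
print(relax_query("Black propane grill"))
# propane grill
```

Expansion increases recall by matching more documents; relaxation does the same by loosening the constraints, usually as a fallback when the original query returns too few results.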
What are Dialog Systems?
Conversational agents, conversational AI, chatbots.
What is machine translation?
An NLG application. Translates source text from one language into another, using ML techniques and NLP. Google is a leader in translation at scale.
What is Document Summarization?
Generating accurate summaries of longer text, e.g., a model-written headline.
What is lexical diversity?
A measure of the range of different words used in a text. A greater range means more lexical diversity.
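One simple way to put a number on this is the ratio of distinct tokens to total tokens (the type-token ratio). The sketch below assumes that definition; other measures of lexical diversity exist.

```python
def lexical_diversity(tokens):
    """Ratio of distinct tokens to total tokens; 0.0 for an empty text."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


tokens = "the cat sat on the mat".split()
print(round(lexical_diversity(tokens), 2))  # 0.83 (5 distinct words / 6 tokens)
```

The ratio ranges from just above 0 (heavy repetition) to 1.0 (every token is different). It tends to fall as texts get longer, so it is most meaningful when comparing texts of similar length.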
What is WSD?
Word sense disambiguation: identifying which meaning of a word with multiple senses is intended in a given context (see levels of NLP for more)
What is a polysemic word?
A word that has multiple meanings depending on context, e.g., “The bank will be closed this Saturday” vs. “The river overflowed the bank.” A lexicon tries to list a word's senses in order from most common to least common.
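The two "bank" sentences can be disambiguated with a minimal sketch in the spirit of the simplified Lesk algorithm: pick the sense whose signature words overlap most with the sentence. The `SENSES` table and its signature words are hypothetical and hand-made for this example; a real system would draw them from a lexicon.

```python
# Hypothetical sense inventory for "bank", most common sense listed first.
SENSES = {
    "bank": [
        ("financial institution", {"money", "account", "loan", "deposit", "open", "closed"}),
        ("river edge", {"river", "water", "shore", "stream", "overflowed"}),
    ]
}


def disambiguate(word, sentence):
    """Choose the sense whose signature shares the most words with the sentence."""
    context = set(sentence.lower().replace(".", "").split())
    best_sense, best_overlap = None, -1
    for sense, signature in SENSES[word]:
        overlap = len(context & signature)
        # Strict '>' means ties fall back to the earlier, more common sense.
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate("bank", "The bank will be closed this Saturday"))  # financial institution
print(disambiguate("bank", "The river overflowed the bank."))         # river edge
```

Because senses are listed from most to least common, a sentence with no helpful context words defaults to the most common sense.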