Natural Language Processing Flashcards

(19 cards)

1
Q

Define natural language processing

A

The goal of NLP is to make machines understand and interpret human language the way it is written or spoken.

2
Q

What are the different levels of linguistic analysis used in NLP?

A
  • syntax – which parts of the given text are grammatically correct
  • semantics – what the given text means
3
Q

What does natural language understanding (NLU) do?

A

It tries to understand the meaning of the given text; to do this, NLU must know the nature and structure of each word in the text.

4
Q

What are the applications of natural language understanding?

A
  • Search and Information Retrieval – enhancing search results by identifying key information, such as people or locations, for better relevance.
  • Word Prediction – predicting the next word in a sentence based on context and prior words.
  • Text Classification – categorising text into predefined categories (e.g., spam vs. not spam, topic classification).
5
Q

What is sentence segmentation?

A

Identifying sentence boundaries in the given text, i.e., where one sentence ends and where the next one begins; the end of a sentence is often marked with the punctuation mark ‘.’
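A minimal sketch of rule-based segmentation, assuming that a sentence ends wherever ‘.’, ‘!’ or ‘?’ is followed by whitespace. The function name split_sentences is hypothetical. This simple rule will wrongly split abbreviations such as ‘Dr. Smith’, which is why practical segmenters add exception rules or use trained models.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    # The lookbehind keeps the punctuation attached to the sentence it ends.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings (e.g. when the input is empty).
    return [p for p in parts if p]

print(split_sentences("NLP is fun. It has many uses! Do you agree?"))
# ['NLP is fun.', 'It has many uses!', 'Do you agree?']
```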

6
Q

What is tokenisation?

A

Identifying the individual words, numbers, and punctuation symbols in a text.
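A minimal regex-based tokeniser sketch. The pattern is an assumption chosen for illustration: it treats runs of ASCII letters (optionally with one internal apostrophe) as words, digits with an optional decimal part as numbers, and any other single non-space symbol as punctuation. Non-ASCII letters and underscores are not handled.

```python
import re

# Word (letters, optional internal apostrophe) | number (optional decimals) | single punctuation symbol.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+(?:\.\d+)?|[^\w\s]")

def tokenise(text: str) -> list[str]:
    """Return the words, numbers and punctuation symbols of `text` in order."""
    return TOKEN_PATTERN.findall(text)

print(tokenise("It costs 3.50 dollars, doesn't it?"))
# ['It', 'costs', '3.50', 'dollars', ',', "doesn't", 'it', '?']
```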

7
Q

What is stemming?

A

Stripping the endings of words; for example, ‘eating’ is reduced to ‘eat’.

8
Q

What is part-of-speech (POS) tagging?

A

Assigning each word in a sentence its part-of-speech tag, such as designating a word as a noun or an adverb.
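A toy sketch of the simplest possible tagger: look each word up in a lexicon and fall back to a default tag for unknown words. The lexicon, the default tag, and the Universal-Dependencies-style tag names (DET, NOUN, VERB, ADV, ADJ) are all hypothetical choices here. Real taggers (e.g. HMM or neural taggers) use the surrounding context to choose between the possible tags of ambiguous words, which a pure lookup cannot do.

```python
# Hypothetical toy lexicon: each known word gets a single fixed tag.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "ball": "NOUN",
    "chased": "VERB", "runs": "VERB",
    "quickly": "ADV", "red": "ADJ",
}

def pos_tag(tokens: list[str], default: str = "NOUN") -> list[tuple[str, str]]:
    """Tag each token by case-insensitive lexicon lookup, using `default` for unknown words."""
    return [(tok, LEXICON.get(tok.lower(), default)) for tok in tokens]

print(pos_tag(["The", "dog", "quickly", "chased", "a", "red", "ball"]))
# [('The', 'DET'), ('dog', 'NOUN'), ('quickly', 'ADV'), ('chased', 'VERB'), ('a', 'DET'), ('red', 'ADJ'), ('ball', 'NOUN')]
```

Defaulting unknown words to NOUN is a common baseline choice, since open-class unknown words are often nouns.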

9
Q

What is named entity recognition?

A

Identifying entities such as persons, locations, and times within documents.
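A minimal gazetteer (lookup-list) sketch of entity recognition, assuming the text is already tokenised. The gazetteer entries, the labels PERSON, LOCATION and TIME, and the function name find_entities are hypothetical. It greedily matches the longest known span starting at each token; real NER systems are trained on annotated data and can recognise names that are not in any list.

```python
# Hypothetical gazetteer mapping lower-cased entity strings to entity types.
GAZETTEER = {
    "ada lovelace": "PERSON",
    "london": "LOCATION",
    "monday": "TIME",
}

def find_entities(tokens: list[str], max_len: int = 3) -> list[tuple[str, str]]:
    """Greedy longest-match lookup of token spans (up to `max_len` tokens) in the gazetteer."""
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest span first so "Ada Lovelace" wins over a shorter match.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i : i + n])
            label = GAZETTEER.get(span.lower())
            if label is not None:
                entities.append((span, label))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return entities

print(find_entities(["Ada", "Lovelace", "visited", "London", "on", "Monday", "."]))
# [('Ada Lovelace', 'PERSON'), ('London', 'LOCATION'), ('Monday', 'TIME')]
```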

10
Q

What is coreference (discourse) resolution?

A

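Determining how a word in a sentence relates to words in the previous and next sentences, in particular which expressions refer to the same entity (e.g. which earlier name a pronoun such as ‘she’ refers to).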

11
Q

What does stemming do?

A

It tries to find the base form of words.

Stemming usually refers to a crude heuristic process that chops off the ends of words
in the hope of achieving this goal correctly most of the time, and often includes the removal
of derivational affixes.
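A deliberately crude suffix-stripping sketch that illustrates the heuristic described above. The suffix list and the minimum stem length are hypothetical; it strips both inflectional endings (‘-ing’, ‘-ed’, ‘-s’) and derivational ones (‘-ness’, ‘-ment’, ‘-ly’). It is not the Porter stemmer, which applies many more conditional rules.

```python
# Hypothetical suffix list, checked in order; longer suffixes come before their shorter endings.
SUFFIXES = ["ness", "ment", "ing", "ly", "ed", "es", "s"]

def crude_stem(word: str, min_stem: int = 3) -> str:
    """Chop off the first matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

print([crude_stem(w) for w in ["eating", "kindness", "quickly", "running", "studies"]])
# ['eat', 'kind', 'quick', 'runn', 'studi']
```

The outputs ‘runn’ and ‘studi’ show why stemming is only right "most of the time": the stem need not be a real word.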

12
Q

What does a lemmatiser do?

A

Lemmatization usually refers to doing things properly with the use of a vocabulary and
morphological analysis of words, normally aiming to remove inflectional endings only
and to return the base or dictionary form of a word, which is known as the lemma.
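A toy sketch of dictionary-based lemmatisation, assuming a hypothetical miniature lexicon keyed by (word, part of speech). Because it uses a vocabulary rather than chopping suffixes, it can map irregular forms such as ‘mice’ to ‘mouse’, and the part of speech decides between readings such as ‘meeting’ (noun) and ‘meeting’ (verb, lemma ‘meet’). Real lemmatisers (for example ones backed by WordNet) combine a full vocabulary with morphological rules.

```python
# Hypothetical miniature lexicon: (inflected form, part of speech) -> lemma.
LEXICON = {
    ("ate", "VERB"): "eat",
    ("eating", "VERB"): "eat",
    ("better", "ADJ"): "good",
    ("mice", "NOUN"): "mouse",
    ("studies", "NOUN"): "study",
    ("studies", "VERB"): "study",
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
}

def lemmatise(word: str, pos: str) -> str:
    """Return the dictionary form of `word` for `pos`, or the lower-cased word if it is unknown."""
    return LEXICON.get((word.lower(), pos), word.lower())

print(lemmatise("mice", "NOUN"), lemmatise("ate", "VERB"), lemmatise("meeting", "NOUN"))
# mouse eat meeting
```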

13
Q

What are stop words?

A

Stop words usually refer to the most common words in a language, such as “and”, “the”, and “a”, but there is no single universal list of stop words. The list of stop words can change depending on your application.
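A minimal sketch of stop-word removal on an already tokenised text. The stop-word set is a small hypothetical list; as the card says, the right list depends on the application, so it is passed in as a parameter.

```python
# Small hypothetical stop-word list; real lists differ between applications.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}

def remove_stop_words(tokens: list[str], stop_words: set[str] = STOP_WORDS) -> list[str]:
    """Drop tokens that are stop words, comparing case-insensitively."""
    return [t for t in tokens if t.lower() not in stop_words]

print(remove_stop_words(["The", "cat", "and", "the", "hat"]))
# ['cat', 'hat']
```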

14
Q

What is a part of speech?

A

The part of speech explains how a word is used in a sentence. Traditional English grammar has eight main parts of speech: nouns, pronouns, adjectives, verbs, adverbs, prepositions, conjunctions, and interjections.

15
Q

What is bag of words?

A

A model that represents a document only by the words it contains. Any information about the order or structure of words is discarded, which is why it is called a bag of words. The model captures whether a known word occurs in a document, but not where in the document it occurs. The intuition is that similar documents have similar content, and that from the content alone we can learn something about the meaning of the document.
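A minimal bag-of-words sketch, assuming a fixed, hypothetical vocabulary and pre-tokenised input. This version counts occurrences; a binary variant that only records whether each word occurs would use `int(counts[word] > 0)` instead. The two example sentences contain the same words in a different order and get identical vectors, which is exactly the information the model throws away.

```python
from collections import Counter

def bag_of_words(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary word occurs; word order is discarded."""
    counts = Counter(t.lower() for t in tokens)
    # Counter returns 0 for words that never occurred.
    return [counts[word] for word in vocabulary]

vocab = ["cat", "dog", "sat", "the"]
print(bag_of_words("the cat sat".split(), vocab))  # [1, 0, 1, 1]
print(bag_of_words("sat the cat".split(), vocab))  # [1, 0, 1, 1]
```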

16
Q

What is term frequency (TF)?

A

A score of how frequently a word occurs in the current document; this is the TF part of TF-IDF. The more often a word appears in a document, the higher its TF score for that document.
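A minimal sketch of one common TF variant: the raw count of the term divided by the number of tokens in the document. Other variants (raw counts, log-scaled counts) exist; the normalisation chosen here is an assumption. Dividing by the length keeps long documents from getting high scores just for being long.

```python
def term_frequency(term: str, document: list[str]) -> float:
    """Raw count of `term` in `document`, divided by the document length."""
    if not document:
        return 0.0  # avoid dividing by zero for an empty document
    return document.count(term) / len(document)

doc = "the cat sat on the mat".split()
print(term_frequency("the", doc))  # 2/6, about 0.333
print(term_frequency("cat", doc))  # 1/6, about 0.167
```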

17
Q

What is inverse document frequency (IDF)?

A

A score of how rare a word is across documents; this is the IDF part of TF-IDF. It measures how unique a word is across the documents in the corpus: the rarer the word, the higher its IDF score.
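A minimal sketch of the classic unsmoothed form, idf = log(N / df), where N is the number of documents and df is the number of documents containing the word, combined with a length-normalised TF into a TF-IDF score. The toy corpus is hypothetical, and returning 0 for words that occur in no document is an assumption made here to avoid dividing by zero; libraries often use smoothed variants instead.

```python
import math

def inverse_document_frequency(term: str, corpus: list[list[str]]) -> float:
    """log(N / df): N = number of documents, df = number of documents containing `term`."""
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0  # assumption: unseen terms score 0 instead of raising ZeroDivisionError
    return math.log(len(corpus) / df)

def tf_idf(term: str, document: list[str], corpus: list[list[str]]) -> float:
    """Length-normalised term frequency multiplied by inverse document frequency."""
    if not document:
        return 0.0
    tf = document.count(term) / len(document)
    return tf * inverse_document_frequency(term, corpus)

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
print(inverse_document_frequency("the", corpus))  # log(3/3) = 0.0: occurs in every document
print(inverse_document_frequency("mat", corpus))  # log(3/1), about 1.0986: occurs in one document
```

A word like ‘the’ gets a TF-IDF of 0 in every document here, however often it appears, because it does not help tell the documents apart.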

18
Q

What does an N-gram model do?

A

The basic idea underlying the statistical approach to word prediction is to use the probabilities of sequences of words to choose the most likely next word.
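A minimal bigram (N = 2) sketch of this idea, using maximum-likelihood estimates from counts: P(word | prev) is the number of times ‘prev word’ occurs divided by the number of times any word follows ‘prev’. The training text and function names are hypothetical, and the sketch uses no smoothing, so unseen word pairs get probability 0.

```python
from collections import Counter, defaultdict
from typing import Optional

def train_bigrams(tokens: list[str]) -> dict[str, Counter]:
    """Count how often each word follows each preceding word: C(prev, next)."""
    following: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        following[prev][nxt] += 1
    return following

def bigram_probability(model: dict[str, Counter], prev: str, word: str) -> float:
    """MLE of P(word | prev) = C(prev, word) / sum over w of C(prev, w)."""
    counts = model.get(prev)
    if not counts:
        return 0.0
    return counts[word] / sum(counts.values())

def predict_next(model: dict[str, Counter], prev: str) -> Optional[str]:
    """Return the most probable next word after `prev`, or None if `prev` was never seen."""
    counts = model.get(prev)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

tokens = "i like green tea . i like green apples . i drink green tea .".split()
model = train_bigrams(tokens)
print(predict_next(model, "green"))               # tea
print(bigram_probability(model, "green", "tea"))  # 2/3, about 0.667
```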

19
Q

What is the Markov assumption?

A

Only the prior local context (the last few words) affects the next word.
  • Making the Markov assumption for word prediction means assuming that the probability of a word depends only on the previous N-1 words (the N-gram model).
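Written as a formula, with w_1, …, w_n denoting the words of a sequence, the assumption replaces the full history with the last N-1 words; for a bigram model (N = 2) only the immediately preceding word is kept, and its probability can be estimated from counts C(·):

```latex
P(w_n \mid w_1, \ldots, w_{n-1}) \approx P(w_n \mid w_{n-N+1}, \ldots, w_{n-1})

% Bigram case (N = 2), with the maximum-likelihood estimate from counts:
P(w_n \mid w_1, \ldots, w_{n-1}) \approx P(w_n \mid w_{n-1}) = \frac{C(w_{n-1}\, w_n)}{C(w_{n-1})}
```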