b. ChatGPT uses an encoder-based architecture.
d. TF-IDF stands for Term Frequency-Inverse Document Frequency; this technique takes into account how often a word occurs in a document as well as the number of documents in which it occurs, and normalizes for the latter.
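As a worked illustration of the TF-IDF weighting described above, a minimal pure-Python sketch (the toy documents are made up for the example):

```python
import math

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    TF counts how often a term occurs within a document; IDF down-weights
    terms that occur in many documents (the normalization for the latter).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        tf = {t: doc.count(t) / len(doc) for t in set(doc)}
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
w = tf_idf(docs)
# "the" occurs in every document, so its IDF (and hence TF-IDF weight) is 0
```

Note that rarer terms such as "dog" keep a positive weight, while a term present in every document is zeroed out entirely.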
c. BERT is not able to understand words in their context.
c. Recurrent Neural Network
c. RoBERTa is an encoder-based model.
a. Natural language processing is a collection of computational techniques for the automatic analysis and representation of human language.
a. One-hot encodings are an example of sparse distributed representations.
b. Dense distributed representations typically have a representation dimension smaller than the size of the vocabulary.
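To make the sparse/dense contrast concrete, a small sketch of a one-hot encoding (the toy vocabulary is invented for the example; a dense embedding would instead map each word to a short vector of real numbers):

```python
vocab = ["cat", "dog", "sat", "ran"]

def one_hot(word, vocab):
    """Sparse representation: the dimension equals the vocabulary size,
    with a single 1 at the word's index and 0 everywhere else."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

one_hot("dog", vocab)  # [0, 1, 0, 0]
```

With a realistic vocabulary of tens of thousands of words, each one-hot vector is almost entirely zeros, which is exactly what dense representations of a few hundred dimensions avoid.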
d. Pretrained word embeddings trained with the Word2Vec architecture can complete analogies such as vec(Paris) - vec(France) + vec(Belgium) ≈ vec(Brussels).
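A sketch of how such an analogy query is answered: combine the query vectors and return the nearest remaining word by cosine similarity. The 2-d vectors below are toy values constructed so the analogy holds; real Word2Vec embeddings have hundreds of dimensions learned from text.

```python
def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

# Toy vectors: first coordinate ~ "is a capital", second ~ "country identity"
emb = {
    "paris":    [1.0, 1.0],
    "france":   [0.0, 1.0],
    "belgium":  [0.0, 2.0],
    "brussels": [1.0, 2.0],
    "germany":  [0.0, 3.0],
    "berlin":   [1.0, 3.0],
}

def analogy(a, b, c, emb):
    """Return the word closest to vec(a) - vec(b) + vec(c),
    excluding the three query words themselves."""
    target = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

analogy("paris", "france", "belgium", emb)  # "brussels"
```

Excluding the query words is important in practice, since vec(a) itself is often the nearest neighbour of the combined vector.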
b. FastText can make word representations for unknown words because the word representations are based on the full word.
d. RoBERTa
a. A Bidirectional LSTM is a type of Recurrent Neural Network.
c. Attention mechanisms are located in the output layer of a transformer.
d. RobBERT is a Dutch variant of the BERT model.
b. Jailbreaking is circumventing limits that were placed on the model.
d. Make models smaller so that they need less computational power and fewer resources.
b. Large language models are a type of chatbot.
a. It is in the training data used to train the models.
c. (The annoying people, annoying people in, people in the, in the train, the train talked, train talked so, talked so loud, so loud that, loud that I, that I could, I could not, could not concentrate, not concentrate well)
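The trigram list above can be reproduced by sliding a window of three tokens over the sentence, e.g. with a short Python sketch:

```python
def ngrams(tokens, n):
    """Slide a window of size n over the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = ("The annoying people in the train talked so loud "
            "that I could not concentrate well")
trigrams = ngrams(sentence.split(), 3)
# 15 tokens yield 15 - 3 + 1 = 13 trigrams,
# from ('The', 'annoying', 'people') to ('not', 'concentrate', 'well')
```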
b. ‘child’, ‘care’, ‘bump’
c. ChatGPT might overuse certain sentences, such as stating that it is a language model developed by OpenAI, due to biases in the training data.
c. Generative Pre-trained Transformer
d. Bidirectional Encoder Representations from Transformers
b. ChatGPT is fine-tuned in an unsupervised manner.