b. ChatGPT uses an encoder-based architecture.
d. TF-IDF stands for Term Frequency-Inverse Document Frequency; this technique takes into account how often a word occurs in a document as well as the number of documents in which it occurs, and normalizes for the latter.
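As a worked illustration of the TF-IDF weighting described above, a minimal pure-Python sketch (the toy documents are made up for the example):

```python
import math

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    TF counts how often a term occurs within a document; IDF down-weights
    terms that occur in many documents (the normalization for the latter).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        tf = {t: doc.count(t) / len(doc) for t in set(doc)}
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
w = tf_idf(docs)
# "the" occurs in every document, so its IDF (and hence TF-IDF weight) is 0
```

Note that rarer terms such as "dog" keep a positive weight, while a term present in every document is zeroed out entirely.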
c. BERT is not able to understand words in their context.
c. Recurrent Neural Network
c. RoBERTa is an encoder-based model.
a. Natural language processing is a collection of computational techniques for the automatic analysis and representation of human language.
a. One-hot encodings are an example of sparse distributed representations.
b. Dense distributed representations typically have a representation dimension smaller than the size of the vocabulary.
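To make the sparse/dense contrast concrete, a small sketch of a one-hot encoding (the toy vocabulary is invented for the example; a dense embedding would instead map each word to a short vector of real numbers):

```python
vocab = ["cat", "dog", "sat", "ran"]

def one_hot(word, vocab):
    """Sparse representation: the dimension equals the vocabulary size,
    with a single 1 at the word's index and 0 everywhere else."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

one_hot("dog", vocab)  # [0, 1, 0, 0]
```

With a realistic vocabulary of tens of thousands of words, each one-hot vector is almost entirely zeros, which is exactly what dense representations of a few hundred dimensions avoid.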
d. Pretrained word embeddings trained with the Word2Vec architecture can complete analogies such as vec(Paris) - vec(France) + vec(Belgium) ≈ vec(Brussels).
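A sketch of how such an analogy query is answered: combine the query vectors and return the nearest remaining word by cosine similarity. The 2-d vectors below are toy values constructed so the analogy holds; real Word2Vec embeddings have hundreds of dimensions learned from text.

```python
def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

# Toy vectors: first coordinate ~ "is a capital", second ~ "country identity"
emb = {
    "paris":    [1.0, 1.0],
    "france":   [0.0, 1.0],
    "belgium":  [0.0, 2.0],
    "brussels": [1.0, 2.0],
    "germany":  [0.0, 3.0],
    "berlin":   [1.0, 3.0],
}

def analogy(a, b, c, emb):
    """Return the word closest to vec(a) - vec(b) + vec(c),
    excluding the three query words themselves."""
    target = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

analogy("paris", "france", "belgium", emb)  # "brussels"
```

Excluding the query words is important in practice, since vec(a) itself is often the nearest neighbour of the combined vector.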
b. FastText can make word representations for unknown words because the word representations are based on the full word.
d. RoBERTa
a. A Bidirectional LSTM is a type of Recurrent Neural Network.
c. Attention mechanisms are located in the output layer of a transformer.
d. RobBERT is a Dutch variant of the BERT model.
b. Jailbreaking is circumventing limits that were placed on the model.
d. Make models smaller so that they need less computational power and fewer resources.
b. Large language models are a type of chatbot.
a. It is in the training data used to train the models.
c. (The annoying people, annoying people in, people in the, in the train, the train talked, train talked so, talked so loud, so loud that, loud that I, that I could, I could not, could not concentrate, not concentrate well)
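The trigram list above can be reproduced by sliding a window of three tokens over the sentence, e.g. with a short Python sketch:

```python
def ngrams(tokens, n):
    """Slide a window of size n over the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = ("The annoying people in the train talked so loud "
            "that I could not concentrate well")
trigrams = ngrams(sentence.split(), 3)
# 15 tokens yield 15 - 3 + 1 = 13 trigrams,
# from ('The', 'annoying', 'people') to ('not', 'concentrate', 'well')
```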
b. ‘child’, ‘care’, ‘bump’
c. ChatGPT might overuse certain sentences, such as stating that it is a language model developed by OpenAI, due to biases in the training data.
c. Generative Pre-trained Transformer
d. Bidirectional Encoder Representations from Transformers
b. ChatGPT is fine-tuned in an unsupervised manner.