Algorithms and Models Flashcards

(47 cards)

1
Q

What is a Foundation Model?

A

A broad, general-purpose model trained on a very large amount of data, which can be adapted to many different tasks

2
Q

What is an LLM?

A

Large Language Model - designed to generate human-like text

3
Q

Are LLMs deterministic or non-deterministic?

A

Non-deterministic - they can generate a different response each time, even for the same prompt
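
A minimal sketch of why the same prompt can produce different text, assuming the model picks each next token by sampling from a probability distribution. The token scores and the `sample_next_token` helper are hypothetical, made up for illustration. At temperature 0 the sketch falls back to always taking the top-scoring token, which is repeatable.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, rng=random):
    """Sample one token from a dict of token -> score (logit)."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: always pick the top token.
        return max(logits, key=logits.get)
    # Softmax with temperature: a higher temperature flattens the distribution.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())
    weights = {tok: math.exp(s - top) for tok, s in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]

# Hypothetical next-token scores after the prompt "The sky is".
logits = {"blue": 2.0, "clear": 1.5, "falling": 0.2}

# Greedy decoding gives the same answer every time.
greedy = {sample_next_token(logits, temperature=0) for _ in range(20)}

# Sampling with differently seeded generators gives varied answers.
sampled = {
    sample_next_token(logits, temperature=1.0, rng=random.Random(i))
    for i in range(200)
}
```

Here `greedy` contains a single token, while `sampled` contains more than one.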

4
Q

What is a Diffusion Model used for?

A

Generating images from text prompts (generative AI)

5
Q

How does a Diffusion Model work?

A

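During training, noise is gradually added to an image (forward diffusion) and the model learns to undo it. To generate, it starts from pure noise and removes the noise step by step (reverse diffusion) to create an image.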
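
A toy sketch of the two directions, assuming the common formulation in which a noisy sample is a weighted mix of the clean signal and Gaussian noise. The names `forward_diffuse`, `denoise_with_known_noise`, and `alpha_bar` are illustrative assumptions. A real diffusion model trains a neural network to predict the noise, whereas this sketch simply reuses the noise it added.

```python
import math
import random

def forward_diffuse(x0, alpha_bar, rng):
    """Mix a clean signal with Gaussian noise.

    alpha_bar in (0, 1] controls how much of the original signal survives.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [
        math.sqrt(alpha_bar) * x + math.sqrt(1 - alpha_bar) * n
        for x, n in zip(x0, noise)
    ]
    return xt, noise

def denoise_with_known_noise(xt, noise, alpha_bar):
    """Invert the mixing step. A trained model would have to predict `noise`."""
    return [
        (x - math.sqrt(1 - alpha_bar) * n) / math.sqrt(alpha_bar)
        for x, n in zip(xt, noise)
    ]

rng = random.Random(0)
image = [0.0, 0.5, 1.0, 0.5]  # a tiny 4-pixel "image"
noisy, noise = forward_diffuse(image, alpha_bar=0.1, rng=rng)
restored = denoise_with_known_noise(noisy, noise, alpha_bar=0.1)
```

With the exact noise available, `restored` matches `image` up to floating-point error. The hard part a real model learns is estimating that noise from `noisy` alone.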

6
Q

What is the name of the high-performing FM from AWS?

A

Amazon Titan

7
Q

What is RAG?

A

Retrieval-Augmented Generation - allows an FM to reference a data source outside of its training data
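
A minimal sketch of the idea, assuming retrieval by simple word overlap. Production systems typically use embeddings and a vector database instead, and the `retrieve` and `build_prompt` helpers and the sample documents are hypothetical. The retrieved text is pasted into the prompt so the model can answer from data it was never trained on.

```python
def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, documents):
    """Augment the question with the retrieved context."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Hypothetical knowledge base that was not part of the model's training data.
docs = [
    "The cafeteria opens at 8 am on weekdays.",
    "Expense reports are due on the 5th of each month.",
]
prompt = build_prompt("When are expense reports due?", docs)
```

The resulting `prompt` would then be sent to the foundation model in place of the bare question.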

8
Q

What is Multi-modal?

A

A model that can take multiple types of input (text, images, etc.) and generate a variety of outputs (e.g., a cat photo with accompanying audio)

9
Q

What is GPT?

A

Generative Pre-trained Transformer - generates human-like text or computer code based on input prompts

10
Q

What is BERT often used for?

A

Bidirectional Encoder Representations from Transformers - often used for text analysis (sentiment analysis, understanding documents)

11
Q

What is RNN used for?

A

Recurrent Neural Network - meant for sequential data such as time series or text

Often used for speech recognition and time-series prediction

12
Q

What is ResNet used for?

A

Residual Network - Deep Convolutional Neural Network (CNN) used for image recognition tasks, object detection and facial recognition

13
Q

What is SVM used for?

A

Support Vector Machine - ML algorithm used for classification and regression

14
Q

What is WaveNet used for?

A

A model that generates raw audio waveforms, often used for speech synthesis

15
Q

What is GAN used for?

A

Generative Adversarial Network - generates synthetic data, such as images or videos, that resembles the training data

Often used for Data augmentation.

16
Q

What is XGBoost used for?

A

Extreme Gradient Boosting - an implementation of gradient-boosted decision trees, used for classification and regression

17
Q

Type of machine learning that needs labeled data for training

A

Supervised Learning

18
Q

Downside of Supervised Learning

A

Often difficult to perform on millions of data points, because every data point needs a label

19
Q

Type of supervised learning that predicts numeric values based on input. The output is continuous, meaning it can take any value within a range

A

Regression (e.g., predicting house prices, weather forecasts)
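
A small sketch of regression on made-up house data, assuming the simplest case of fitting a straight line by least squares. The `fit_line` helper and all the numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical labeled training data: house size -> price (in thousands).
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

slope, intercept = fit_line(sizes, prices)

# The prediction is continuous: any size maps to some numeric price.
predicted_price = slope * 90 + intercept
```

Because the made-up prices are exactly 3 times the size, the fitted line predicts a price of about 270 for a size of 90.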

20
Q

Type of supervised learning that predicts the categorical label of the input data

A

Classification (e.g., spam vs. not spam, identifying animals in a zoo)

21
Q

Type of machine learning that uses unlabeled input data (the model finds the groups or patterns, and humans can label the output afterward)

A

Unsupervised Learning

22
Q

Unsupervised Learning Technique that groups similar data points together based on their features

A

Clustering Technique

23
Q

Example of Clustering Technique (K means clustering)

A

Customer Segmentation - a business wants to segment its customers to understand different purchasing behaviors (e.g., by looking at customer purchase history)
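
A minimal K-means sketch for this kind of segmentation, assuming a single feature (monthly spend) and two starting centroids chosen by hand. The `k_means_1d` helper and the spend figures are hypothetical. Real data would usually have several features and use a library implementation.

```python
def k_means_1d(values, centroids, iterations=10):
    """Plain K-means on 1-D data, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical monthly spend per customer; no labels are provided.
monthly_spend = [12, 15, 14, 95, 102, 98]
centroids, clusters = k_means_1d(monthly_spend, centroids=[0.0, 50.0])
```

The algorithm separates low spenders from high spenders without being told that those groups exist. Naming the segments is left to a human.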

24
Q

Technique that associates one item with another

A

Association Rule Learning Technique

25
Q

What algorithm uses the association rule learning technique?

A

Apriori algorithm
26
Q

Example of the Association Rule Learning Technique

A

Market Basket Analysis - a supermarket wants to understand which products are frequently bought together, using past customer purchase records (e.g., bread and butter are often bought together). Outcome: the supermarket can place associated products together to boost sales
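
A short sketch of the two measures association rules are usually built on, support and confidence, using made-up baskets. This is not the full Apriori algorithm, which also prunes candidate itemsets level by level. The `support` and `confidence` helpers and the 0.4 threshold are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical purchase records, one set of items per basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset, transactions):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    return support(antecedent | consequent, transactions) / support(
        antecedent, transactions
    )

# Keep only item pairs that appear together in at least 40% of baskets.
items = sorted(set().union(*transactions))
frequent_pairs = {
    pair: support(set(pair), transactions)
    for pair in combinations(items, 2)
    if support(set(pair), transactions) >= 0.4
}

bread_to_butter = confidence({"bread"}, {"butter"}, transactions)
```

In this data only bread and butter clear the threshold, and roughly three out of four baskets with bread also contain butter.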
27
Q

Unsupervised Learning Technique often used to detect fraud

A

Anomaly Detection Technique
28
Q

Algorithm associated with the Anomaly Detection Technique

A

Isolation Forest
29
Q

Type of machine learning that uses a small amount of labeled data and a large amount of unlabeled data to train a system

A

Semi-supervised learning
30
Q

Steps of Semi-Supervised Learning

A

1) Train a model on the labeled data
2) Use that model to label the unlabeled data (pseudo-labeling)
3) Retrain on the whole data set
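
These steps can be sketched with pseudo-labeling, where the model trained on the labeled data assigns labels to the unlabeled data before retraining. The sketch assumes a toy nearest-centroid classifier on a single numeric feature. `train_centroids`, `predict`, and the data are hypothetical.

```python
def train_centroids(points):
    """'Train' a nearest-centroid classifier: one mean value per label."""
    by_label = {}
    for value, label in points:
        by_label.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in by_label.items()}

def predict(centroids, value):
    """Return the label whose centroid is closest to `value`."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))

# A small labeled set and a larger unlabeled set (both made up).
labeled = [(1.0, "small"), (2.0, "small"), (9.0, "large"), (10.0, "large")]
unlabeled = [1.5, 2.5, 8.0, 11.0, 3.0]

# Step 1: train on the labeled data only.
model = train_centroids(labeled)

# Step 2: let the model label the unlabeled data (pseudo-labels).
pseudo_labeled = [(v, predict(model, v)) for v in unlabeled]

# Step 3: retrain on labeled + pseudo-labeled data together.
final_model = train_centroids(labeled + pseudo_labeled)
```

In practice, pseudo-labels are often kept only when the model is confident about them. This sketch keeps all of them for brevity.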
31
Q

Type of AI learning where the model generates pseudo-labels for its own data, without humans labeling ANY data

A

Self-supervised Learning
32
Q

Use Cases for Self-supervised Learning

A

You have a huge amount of text data and want the model to learn the English language: grammar, the meaning of words, and the relationships between words
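
A minimal sketch of how raw text can supply its own labels, assuming a next-word prediction task. The `next_word_pairs` helper and its `context_size` parameter are hypothetical. Each training target is simply the word that follows the context, so no human annotation is needed.

```python
def next_word_pairs(text, context_size=2):
    """Turn raw text into (context, target) training pairs with no human labeling."""
    words = text.lower().split()
    return [
        (tuple(words[i : i + context_size]), words[i + context_size])
        for i in range(len(words) - context_size)
    ]

pairs = next_word_pairs("the cat sat on the mat")
first_context, first_target = pairs[0]
```

Here the first pair uses the context `("the", "cat")` with the target `"sat"`. A model trained on many such pairs picks up grammar and word relationships as a side effect of predicting the missing word.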
33
Q

What models use self-supervised learning?

A

BERT and GPT
34
Q

Type of machine learning where an agent learns to make decisions by performing actions to maximize cumulative rewards

A

Reinforcement Learning (RL)
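
A toy sketch of an agent learning from rewards, assuming tabular Q-learning. That is one RL algorithm chosen here for illustration, since the card does not name one. The corridor environment, the constants, and the `q_learning` helper are hypothetical. The agent explores a 5-cell corridor at random and is rewarded only when it reaches the rightmost cell.

```python
import random

ACTIONS = {"left": -1, "right": +1}
GOAL = 4  # rightmost cell of a 5-cell corridor (states 0..4)

def q_learning(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values Q(state, action) from random exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.choice(list(ACTIONS))  # explore at random
            next_state = min(max(state + ACTIONS[action], 0), GOAL)
            reward = 1.0 if next_state == GOAL else 0.0
            future = (
                0.0
                if next_state == GOAL
                else max(q[(next_state, a)] for a in ACTIONS)
            )
            # Nudge the estimate toward reward + discounted best future value.
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q

q = q_learning()

# The learned policy: in each cell, take the action with the highest value.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
```

After training, the learned policy moves right in every cell, because that maximizes the discounted cumulative reward.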
35
Q

4 steps of creating a Reinforcement Learning from Human Feedback (RLHF) model

A

1) Data collection - a set of human-generated prompts and responses
2) Supervised fine-tuning of a language model - fine-tune an existing model with internal knowledge
3) Build a separate reward model - humans indicate which responses they prefer for the same prompt
4) Optimize the language model using the reward model
36
Q

What uses a confusion matrix to compare predicted values with actual values, aiming to maximize the number of true positives and minimize the number of false positives and false negatives?

A

Binary Classification Evaluation Metrics
37
Q

Best way to evaluate the performance of a model that performs classification

A

Binary Classification Evaluation Metrics
38
Q

Binary Classification Evaluation Metric - best when false positives are costly

A

Precision
39
Q

Binary Classification Evaluation Metric - best when false negatives are costly

A

Recall
40
Q

Binary Classification Evaluation Metric - best when you want to balance precision and recall (e.g., on imbalanced datasets)

A

F1 Score
41
Q

Binary Classification Evaluation Metric - best for balanced datasets

A

Accuracy
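
The four metrics above can all be computed from confusion-matrix counts. A minimal sketch, assuming labels are 0/1 with 1 as the positive class and that there is at least one actual and one predicted positive. The helper names and the sample labels are made up.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(actual, predicted):
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    precision = tp / (tp + fp)  # of predicted positives, how many were right
    recall = tp / (tp + fn)     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)          # overall fraction correct
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical spam labels: 1 = spam, 0 = not spam.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)
metrics = classification_metrics(actual, predicted)
```

In this example the two false positives pull precision below recall, which is the situation where the choice between the two metrics matters.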
42
Q

Metric used to compare models and find the best one when performing binary classification

A

AUC-ROC
43
Q

What does AUC-ROC stand for? And what values does it use?

A

Area Under the Curve - Receiver Operating Characteristic. It plots sensitivity (true positive rate) against 1 - specificity (false positive rate, e.g., how often not-spam is classified as spam). The AUC value ranges from 0 to 1
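
A small sketch of how the curve and its area can be computed, assuming the model outputs a score per example and the threshold is swept from strictest to loosest. The `roc_points` and `auc` helpers and the scores are hypothetical, and the area uses the trapezoidal rule.

```python
def roc_points(actual, scores):
    """(false positive rate, true positive rate) at each score threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 0)
        # x = 1 - specificity (FPR), y = sensitivity (TPR)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )

# Hypothetical labels (1 = spam) and model scores (higher = more likely spam).
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(actual, scores)
area = auc(curve)
```

An area of 1 would mean every spam example is scored above every non-spam example, and 0.5 is what random scoring would give. This example falls in between because one non-spam message outranks one spam message.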
44
Q

Metrics used to evaluate models that predict continuous values (regression)

A

MAE - Mean Absolute Error
MAPE - Mean Absolute Percentage Error
RMSE - Root Mean Squared Error
R squared (R²)
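
A minimal sketch of the four regression metrics using their standard definitions. The `regression_metrics` helper and the sample values are hypothetical, and MAPE as written assumes no actual value is zero.

```python
import math

def regression_metrics(actual, predicted):
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    # Percentage error relative to each actual value (undefined if an actual is 0).
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # R squared: 1 minus (unexplained variation / total variation).
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2}

# Hypothetical house prices (in thousands) and a model's predictions.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

metrics = regression_metrics(actual, predicted)
perfect = regression_metrics(actual, actual)
```

MAE and RMSE are in the same units as the target, MAPE is a percentage, and R squared is unitless. With perfect predictions the errors are all zero and R squared equals 1.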
45
Q

Regression metric that measures how much of the variance in the target your model explains

A

R squared (R²)
46
Q

What does an R squared (R²) value of 1 mean?

A

The model explains all of the variance in the target, meaning its predictions match the actual values exactly
47
Q

Difference between an SLM and an LLM

A

LLMs are more powerful, higher-latency models that are typically run on remote servers and accessed over the internet. SLMs (Small Language Models) are lightweight and can run on edge devices with offline capabilities