General expressions Flashcards

(65 cards)

1
Q

What is a GAN?

A

A generative model with two neural networks — a Generator (creates fake data) and a Discriminator (detects fake vs real) — trained in competition to produce realistic synthetic data (e.g., images).

2
Q

Key AWS-relevant idea about GANs?

A

Used for synthetic data generation, image creation, and data augmentation in ML workflows.

3
Q

What is a VAE?

A

A generative model that encodes input data into a probability distribution (latent space) and then decodes samples from that distribution to generate new data.

4
Q

GAN vs VAE difference?

A

VAEs optimize probabilistic reconstruction (more stable training), while GANs use adversarial training (often sharper outputs).

5
Q

What is a Diffusion Model?

A

A generative model that gradually adds noise to data and then learns to reverse the noise process to generate high-quality synthetic outputs.

6
Q

Why are diffusion models important?

A

Powers modern image generators (e.g., Stable Diffusion-type models). Known for stability and high-quality outputs.

7
Q

What is RAG?

A

A technique that combines retrieval systems (e.g., vector databases) with LLMs to fetch relevant documents before generating a response.

8
Q

Why use RAG?

A

Improves accuracy, reduces hallucinations, and enables models to use up-to-date or private enterprise data.

9
Q

What is BERT?

A

BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based language model that understands context by reading text bidirectionally (left and right).

10
Q

What is BERT mainly used for?

A

Text classification, sentiment analysis, question answering, and named entity recognition.

11
Q

What is ROUGE?

A

A metric used to evaluate text summarization by measuring overlap between generated text and reference summaries (recall-focused).

12
Q

What does ROUGE measure?

A

N-gram overlap, longest common subsequence, and recall similarity.

13
Q

What is BLEU?

A

A metric used to evaluate machine translation by measuring precision-based n-gram overlap between generated and reference text.

14
Q

BLEU vs ROUGE difference?

A

BLEU focuses on precision (common in translation). ROUGE focuses on recall (common in summarization).

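The precision/recall contrast can be made concrete with unigram overlap. This is a toy sketch, not the real metrics (BLEU combines several n-gram orders with a brevity penalty; ROUGE has several variants):

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """BLEU-style: what fraction of candidate tokens appear in the reference?"""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum(min(c, ref[w]) for w, c in cand.items())  # clipped counts
    return overlap / max(sum(cand.values()), 1)

def unigram_recall(candidate, reference):
    """ROUGE-style: what fraction of reference tokens are covered by the candidate?"""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum(min(c, cand[w]) for w, c in ref.items())
    return overlap / max(sum(ref.values()), 1)

ref = "the cat sat on the mat"
cand = "the cat sat"
print(unigram_precision(cand, ref))  # 1.0 — every candidate word is in the reference
print(unigram_recall(cand, ref))     # 0.5 — half of the reference words are covered
```

A short candidate can score perfect precision while recall stays low, which is exactly why translation (penalizing invented words) leans on precision and summarization (penalizing missing content) leans on recall.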
15
Q

What is SVD?

A

A matrix factorization technique that decomposes a matrix into three matrices: A = U Σ V^T

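The decomposition can be verified in a few lines with NumPy (a minimal sketch using `np.linalg.svd`; the matrix values are arbitrary examples):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# full_matrices=False gives the compact ("economy") SVD
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruct A from its factors: U @ diag(s) @ V^T
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt))  # True

# Truncating to the largest singular value gives the best rank-1 approximation,
# which is the basis of SVD-based dimensionality reduction
A_rank1 = s[0] * np.outer(U[:, 0], Vt[0, :])
```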
16
Q

Why is SVD important in ML?

A

Used for dimensionality reduction, noise reduction, and recommendation systems.

17
Q

Where is SVD commonly applied in NLP?

A

In Latent Semantic Analysis (LSA).

18
Q

What is Word2Vec?

A

A neural network model that learns dense vector representations (embeddings) of words based on context.

19
Q

Two Word2Vec architectures?

A
* CBOW (predict the word from its context)
* Skip-gram (predict the context from the word)
20
Q

Word2Vec key idea?

A

Words appearing in similar contexts have similar vector representations.

21
Q

What is PCA?

A

A dimensionality reduction technique that transforms data into new orthogonal axes (principal components) that maximize variance.

22
Q

Core idea formula:

A

Maximize the variance of the projected data: \max_{\|w\|=1} w^\top \Sigma w = \lambda, where \lambda is the largest eigenvalue of the covariance matrix \Sigma.

23
Q

Why use PCA?

A

Reduce features while retaining most important information.

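The PCA cards above can be sketched numerically with NumPy: center the data, eigendecompose the covariance matrix, and project onto the top eigenvector (a minimal sketch with made-up data, not a production implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data: most variance lies along one direction
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

Xc = X - X.mean(axis=0)                 # center the data
cov = np.cov(Xc, rowvar=False)          # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues

# Principal component = eigenvector with the largest eigenvalue
w = eigvecs[:, -1]
projected = Xc @ w                      # 1-D representation of the data

# Variance along the component equals its eigenvalue (the λ on the card above)
print(np.isclose(projected.var(ddof=1), eigvals[-1]))  # True
```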
24
Q

What is LSA?

A

An NLP technique that uses SVD on a term-document matrix to uncover hidden (latent) relationships between words and documents.

25
Why is LSA useful?
Improves semantic search and topic modeling by reducing noise and dimensionality.
26
What is an SVM?
A supervised learning algorithm that finds the optimal hyperplane separating classes with maximum margin.
27
Key idea?
Maximize the margin between support vectors and decision boundary.
28
What allows SVMs to handle non-linear data?
The kernel trick (e.g., RBF kernel).
29
What is Logistic Regression?
A classification algorithm that predicts probabilities using the sigmoid function.
30
Sigmoid function:
f(x) = \frac{1}{1+e^{-x}}
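The sigmoid in code (pure-Python sketch):

```python
import math

def sigmoid(x):
    """Map any real number to (0, 1) — interpreted as a probability."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))   # 0.5 — the decision boundary
print(sigmoid(4))   # ≈ 0.982 — confident positive class
print(sigmoid(-4))  # ≈ 0.018 — confident negative class
```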
31
Output range?
Between 0 and 1 (probability).
32
Used for?
Binary classification problems.
33
What is a Decision Tree?
A supervised learning model that splits data based on feature values to make predictions.
34
How does it decide splits?
Using metrics like Gini impurity or Information Gain.
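Gini impurity in a few lines, plus the weighted-average score a tree uses to compare candidate splits (a toy sketch with made-up labels):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2). 0 = pure node, 0.5 = 50/50 binary node."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini(["a", "a", "a", "a"]))  # 0.0 — pure
print(gini(["a", "a", "b", "b"]))  # 0.5 — maximally impure (binary)

# A candidate split is scored by the size-weighted impurity of its children;
# the tree picks the split that lowers this the most
left, right = ["a", "a", "a"], ["b", "b", "a"]
n = len(left) + len(right)
weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
```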
35
Advantage?
Easy to interpret and explain.
36
What is a vector in ML?
An ordered list of numbers representing data in multi-dimensional space.
37
Why are vectors important?
ML models operate on numerical vector representations.
38
Common operation?
Dot product (measures similarity).
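Dot product and cosine similarity in pure Python (a minimal sketch; cosine similarity is just the dot product of unit-normalized vectors, and the example vectors are made up):

```python
import math

def dot(u, v):
    """Sum of elementwise products — larger when vectors point the same way."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Dot product normalized by magnitudes: 1 = same direction, 0 = orthogonal."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, different magnitude
c = [-2.0, 1.0, 0.0]  # orthogonal to a

print(dot(a, b))                # 28.0
print(cosine_similarity(a, b))  # ≈ 1.0 — direction matters, magnitude doesn't
print(cosine_similarity(a, c))  # 0.0
```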
39
What are embeddings?
Dense vector representations of objects (text, images, users, etc.) that capture semantic meaning.
40
Why are embeddings important in generative AI?
Enable similarity search, clustering, and Retrieval-Augmented Generation (RAG).
41
Example use case?
Converting text into vectors for vector databases.
42
Batch Size
The number of training samples processed before the model updates its weights.
43
Small vs Large batch size?
* Small → noisier updates, better generalization
* Large → faster training, more stable gradients
44
Top-K Sampling
A text generation method that selects the next token from the **K most probable tokens**.
45
Effect of smaller K?
Smaller K → more deterministic output; larger K → more diverse output.
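Top-K over a toy probability distribution (a pure-Python sketch: real decoders operate on logits inside the model framework, and the token names here are invented):

```python
import random

def top_k_sample(probs, k):
    """Keep the K most probable tokens, then sample among them."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens = [t for t, _ in top]
    weights = [p for _, p in top]  # random.choices renormalizes relative weights
    return random.choices(tokens, weights=weights, k=1)[0]

probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "xylophone": 0.05}
random.seed(0)

# K=1 is greedy decoding: always the single most probable token
print(top_k_sample(probs, k=1))  # 'cat'

# K=3 can never emit 'xylophone' — the low-probability tail is cut off
samples = {top_k_sample(probs, k=3) for _ in range(100)}
print("xylophone" in samples)    # False
```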
46
Top-P (Nucleus Sampling)
Selects the smallest set of tokens whose cumulative probability ≥ P, then samples from them.
47
Why use Top-P instead of Top-K?
It dynamically adjusts the candidate pool based on probability distribution.
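Top-P with toy distributions, showing the dynamic pool size: a peaked distribution needs one token to cover P, a flat one needs several (same hedges as the Top-K sketch: invented token names, illustration only):

```python
import random

def top_p_pool(probs, p):
    """Smallest set of tokens, taken in probability order, with cumulative prob >= p."""
    pool, cum = [], 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        pool.append(token)
        cum += prob
        if cum >= p:
            break
    return pool

def top_p_sample(probs, p):
    pool = top_p_pool(probs, p)
    weights = [probs[t] for t in pool]
    return random.choices(pool, weights=weights, k=1)[0]

peaked = {"cat": 0.9, "dog": 0.05, "fish": 0.05}
flat = {"cat": 0.3, "dog": 0.3, "fish": 0.2, "bird": 0.2}

print(top_p_pool(peaked, 0.9))       # ['cat'] — one token already covers P
print(len(top_p_pool(flat, 0.9)))    # 4 — a flat distribution needs more tokens
```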
48
Temperature
A parameter that controls randomness in token selection.
49
Low vs High temperature?
* Low (e.g., 0.1) → deterministic, focused
* High (e.g., 1.0+) → creative, diverse
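Temperature rescales logits before the softmax: T < 1 sharpens the distribution toward the top token, T > 1 flattens it toward uniform (pure-Python sketch with made-up logits):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by T before softmax: T < 1 sharpens, T > 1 flattens."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.1)   # near one-hot on the top token
hot = softmax_with_temperature(logits, 10.0)   # near uniform

print(round(cold[0], 3))            # ≈ 1.0
print([round(p, 2) for p in hot])   # roughly [0.36, 0.33, 0.31]
```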
50
Confusion Matrix
A table used to evaluate classification performance by comparing predicted vs actual values.
51
What are the 4 components?
* True Positive (TP)
* True Negative (TN)
* False Positive (FP)
* False Negative (FN)
52
Why important?
Helps compute accuracy, precision, recall, and F1-score.
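Computing those metrics from the four counts (the TP/TN/FP/FN numbers below are made-up examples):

```python
def classification_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)  # of predicted positives, how many were right
    recall = tp / (tp + fn)     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = classification_metrics(tp=80, tn=90, fp=10, fn=20)
print(acc)           # 0.85
print(round(prec, 3))  # 0.889
print(rec)           # 0.8
print(round(f1, 3))  # 0.842
```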
53
Root Mean Squared Error (RMSE)
A regression metric measuring the square root of average squared prediction errors.
54
Formula:
RMSE = \sqrt{\frac{1}{n}\sum (y_i - \hat{y}_i)^2}
55
Why square errors?
Penalizes large errors more heavily.
56
Mean Absolute Error (MAE)
A regression metric measuring the average absolute difference between predictions and actual values.
57
Formula:
MAE = \frac{1}{n}\sum |y_i - \hat{y}_i|
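Both error metrics side by side, showing how a single large miss dominates RMSE but not MAE (toy numbers):

```python
import math

def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean of the absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [3.0, 5.0, 7.0, 1.0]  # three exact hits, one miss of 8

print(mae(y_true, y_pred))   # 2.0 — the miss is averaged away
print(rmse(y_true, y_pred))  # 4.0 — squaring amplifies the large error
```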
58
RMSE vs MAE difference?
RMSE penalizes large errors more; MAE treats all errors equally.
59
Correlation Matrix
A table showing correlation coefficients between multiple variables.
60
Correlation coefficient range?
From -1 to +1.
* +1 → perfect positive correlation
* -1 → perfect negative correlation
* 0 → no linear correlation
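Building a correlation matrix with NumPy (a minimal sketch; note `np.corrcoef` treats each row of its input as one variable):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + 1  # perfectly positively correlated with x
z = -x         # perfectly negatively correlated with x

corr = np.corrcoef([x, y, z])  # 3x3 symmetric matrix, diagonal all 1.0

print(np.isclose(corr[0, 1], 1.0))   # True — perfect positive correlation
print(np.isclose(corr[0, 2], -1.0))  # True — perfect negative correlation
```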
61
Shapley Values
A model explainability method that assigns each feature a contribution value to a specific prediction.
62
Why important?
Provides fair, mathematically grounded feature importance.
63
Used in?
Model interpretability for complex models (e.g., tree ensembles, neural networks).
64
Partial Dependence Plots (PDP)
A visualization showing the marginal effect of one feature on model predictions.
65
Why useful?
Helps understand how a feature influences output while averaging out other features.