General expressions Flashcards

(65 cards)

1
Q

What is a GAN?

A

A generative model with two neural networks — a Generator (creates fake data) and a Discriminator (detects fake vs real) — trained in competition to produce realistic synthetic data (e.g., images).

2
Q

Key AWS-relevant idea about GANs?

A

Used for synthetic data generation, image creation, and data augmentation in ML workflows.

3
Q

What is a VAE?

A

A generative model that encodes input data into a probability distribution (latent space) and then decodes samples from that distribution to generate new data.

4
Q

GAN vs VAE difference?

A

VAEs optimize probabilistic reconstruction (more stable training), while GANs use adversarial training (often sharper outputs).

5
Q

What is a Diffusion Model?

A

A generative model that gradually adds noise to data and then learns to reverse the noise process to generate high-quality synthetic outputs.

6
Q

Why are diffusion models important?

A

Powers modern image generators (e.g., Stable Diffusion-type models). Known for stability and high-quality outputs.

7
Q

What is RAG?

A

A technique that combines retrieval systems (e.g., vector databases) with LLMs to fetch relevant documents before generating a response.

8
Q

Why use RAG?

A

Improves accuracy, reduces hallucinations, and enables models to use up-to-date or private enterprise data.

9
Q

What is BERT?

A

BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based language model that understands context by reading text bidirectionally (left and right).

10
Q

What is BERT mainly used for?

A

Text classification, sentiment analysis, question answering, and named entity recognition.

11
Q

What is ROUGE?

A

A metric used to evaluate text summarization by measuring overlap between generated text and reference summaries (recall-focused).

12
Q

What does ROUGE measure?

A

N-gram overlap, longest common subsequence, and recall similarity.

13
Q

What is BLEU?

A

A metric used to evaluate machine translation by measuring precision-based n-gram overlap between generated and reference text.

14
Q

BLEU vs ROUGE difference?

A

BLEU focuses on precision (common in translation). ROUGE focuses on recall (common in summarization).

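The precision/recall contrast can be made concrete with unigram overlap. This is a toy sketch, not the real metrics (BLEU combines several n-gram orders with a brevity penalty; ROUGE has several variants):

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """BLEU-style: what fraction of candidate tokens appear in the reference?"""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum(min(c, ref[w]) for w, c in cand.items())  # clipped counts
    return overlap / max(sum(cand.values()), 1)

def unigram_recall(candidate, reference):
    """ROUGE-style: what fraction of reference tokens are covered by the candidate?"""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum(min(c, cand[w]) for w, c in ref.items())
    return overlap / max(sum(ref.values()), 1)

ref = "the cat sat on the mat"
cand = "the cat sat"
print(unigram_precision(cand, ref))  # 1.0 — every candidate word is in the reference
print(unigram_recall(cand, ref))     # 0.5 — half of the reference words are covered
```

A short candidate can score perfect precision while recall stays low, which is exactly why translation (penalizing invented words) leans on precision and summarization (penalizing missing content) leans on recall.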
15
Q

What is SVD?

A

A matrix factorization technique that decomposes a matrix into three matrices: A = U Σ V^T

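The decomposition can be verified in a few lines with NumPy (a minimal sketch using `np.linalg.svd`; the matrix values are arbitrary examples):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# full_matrices=False gives the compact ("economy") SVD
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruct A from its factors: U @ diag(s) @ V^T
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt))  # True

# Truncating to the largest singular value gives the best rank-1 approximation,
# which is the basis of SVD-based dimensionality reduction
A_rank1 = s[0] * np.outer(U[:, 0], Vt[0, :])
```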
16
Q

Why is SVD important in ML?

A

Used for dimensionality reduction, noise reduction, and recommendation systems.

17
Q

Where is SVD commonly applied in NLP?

A

In Latent Semantic Analysis (LSA).

18
Q

What is Word2Vec?

A

A neural network model that learns dense vector representations (embeddings) of words based on context.

19
Q

Two Word2Vec architectures?

A
* CBOW (predict the word from its context)
* Skip-gram (predict the context from the word)
20
Q

Word2Vec key idea?

A

Words appearing in similar contexts have similar vector representations.

21
Q

What is PCA?

A

A dimensionality reduction technique that transforms data into new orthogonal axes (principal components) that maximize variance.

22
Q

Core idea formula:

A

Maximize the variance of the projected data: \max_{\|w\|=1} w^\top \Sigma w = \lambda, where \lambda is the largest eigenvalue of the covariance matrix \Sigma.

23
Q

Why use PCA?

A

Reduce features while retaining most important information.

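The PCA cards above can be sketched numerically with NumPy: center the data, eigendecompose the covariance matrix, and project onto the top eigenvector (a minimal sketch with made-up data, not a production implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data: most variance lies along one direction
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

Xc = X - X.mean(axis=0)                 # center the data
cov = np.cov(Xc, rowvar=False)          # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues

# Principal component = eigenvector with the largest eigenvalue
w = eigvecs[:, -1]
projected = Xc @ w                      # 1-D representation of the data

# Variance along the component equals its eigenvalue (the λ on the card above)
print(np.isclose(projected.var(ddof=1), eigvals[-1]))  # True
```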
24
Q

What is LSA?

A

An NLP technique that uses SVD on a term-document matrix to uncover hidden (latent) relationships between words and documents.

25
Why is LSA useful?
Improves semantic search and topic modeling by reducing noise and dimensionality.
26
What is an SVM?
A supervised learning algorithm that finds the optimal hyperplane separating classes with maximum margin.
27
Key idea?
Maximize the margin between support vectors and decision boundary.
28
What allows SVMs to handle non-linear data?
The kernel trick (e.g., RBF kernel).
29
What is Logistic Regression?
A classification algorithm that predicts probabilities using the sigmoid function.
30
Sigmoid function:
f(x) = \frac{1}{1+e^{-x}}
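The sigmoid in code (pure-Python sketch):

```python
import math

def sigmoid(x):
    """Map any real number to (0, 1) — interpreted as a probability."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))   # 0.5 — the decision boundary
print(sigmoid(4))   # ≈ 0.982 — confident positive class
print(sigmoid(-4))  # ≈ 0.018 — confident negative class
```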
31
Output range?
Between 0 and 1 (probability).
32
Used for?
Binary classification problems.
33
What is a Decision Tree?
A supervised learning model that splits data based on feature values to make predictions.
34
How does it decide splits?
Using metrics like Gini impurity or Information Gain.
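Gini impurity in a few lines, plus the weighted-average score a tree uses to compare candidate splits (a toy sketch with made-up labels):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2). 0 = pure node, 0.5 = 50/50 binary node."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini(["a", "a", "a", "a"]))  # 0.0 — pure
print(gini(["a", "a", "b", "b"]))  # 0.5 — maximally impure (binary)

# A candidate split is scored by the size-weighted impurity of its children;
# the tree picks the split that lowers this the most
left, right = ["a", "a", "a"], ["b", "b", "a"]
n = len(left) + len(right)
weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
```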
35
Advantage?
Easy to interpret and explain.
36
What is a vector in ML?
An ordered list of numbers representing data in multi-dimensional space.
37
Why are vectors important?
ML models operate on numerical vector representations.
38
Common operation?
Dot product (measures similarity).
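Dot product and cosine similarity in pure Python (a minimal sketch; cosine similarity is just the dot product of unit-normalized vectors, and the example vectors are made up):

```python
import math

def dot(u, v):
    """Sum of elementwise products — larger when vectors point the same way."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Dot product normalized by magnitudes: 1 = same direction, 0 = orthogonal."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, different magnitude
c = [-2.0, 1.0, 0.0]  # orthogonal to a

print(dot(a, b))                # 28.0
print(cosine_similarity(a, b))  # ≈ 1.0 — direction matters, magnitude doesn't
print(cosine_similarity(a, c))  # 0.0
```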
39
What are embeddings?
Dense vector representations of objects (text, images, users, etc.) that capture semantic meaning.
40
Why are embeddings important in generative AI?
Enable similarity search, clustering, and Retrieval-Augmented Generation (RAG).
41
Example use case?
Converting text into vectors for vector databases.
42
Batch Size
The number of training samples processed before the model updates its weights.
43
Small vs Large batch size?
* Small → noisier updates, better generalization
* Large → faster training, more stable gradients
44
Top-K Sampling
A text generation method that selects the next token from the **K most probable tokens**.
45
Effect of smaller K?
Smaller K → more deterministic output; larger K → more diverse output.
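Top-K over a toy probability distribution (a pure-Python sketch: real decoders operate on logits inside the model framework, and the token names here are invented):

```python
import random

def top_k_sample(probs, k):
    """Keep the K most probable tokens, then sample among them."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens = [t for t, _ in top]
    weights = [p for _, p in top]  # random.choices renormalizes relative weights
    return random.choices(tokens, weights=weights, k=1)[0]

probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "xylophone": 0.05}
random.seed(0)

# K=1 is greedy decoding: always the single most probable token
print(top_k_sample(probs, k=1))  # 'cat'

# K=3 can never emit 'xylophone' — the low-probability tail is cut off
samples = {top_k_sample(probs, k=3) for _ in range(100)}
print("xylophone" in samples)    # False
```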
46
Top-P (Nucleus Sampling)
Selects the smallest set of tokens whose cumulative probability ≥ P, then samples from them.
47
Why use Top-P instead of Top-K?
It dynamically adjusts the candidate pool based on probability distribution.
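Top-P with toy distributions, showing the dynamic pool size: a peaked distribution needs one token to cover P, a flat one needs several (same hedges as the Top-K sketch: invented token names, illustration only):

```python
import random

def top_p_pool(probs, p):
    """Smallest set of tokens, taken in probability order, with cumulative prob >= p."""
    pool, cum = [], 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        pool.append(token)
        cum += prob
        if cum >= p:
            break
    return pool

def top_p_sample(probs, p):
    pool = top_p_pool(probs, p)
    weights = [probs[t] for t in pool]
    return random.choices(pool, weights=weights, k=1)[0]

peaked = {"cat": 0.9, "dog": 0.05, "fish": 0.05}
flat = {"cat": 0.3, "dog": 0.3, "fish": 0.2, "bird": 0.2}

print(top_p_pool(peaked, 0.9))       # ['cat'] — one token already covers P
print(len(top_p_pool(flat, 0.9)))    # 4 — a flat distribution needs more tokens
```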
48
Temperature
A parameter that controls randomness in token selection.
49
Low vs High temperature?
* Low (e.g., 0.1) → deterministic, focused
* High (e.g., 1.0+) → creative, diverse
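Temperature rescales logits before the softmax: T < 1 sharpens the distribution toward the top token, T > 1 flattens it toward uniform (pure-Python sketch with made-up logits):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by T before softmax: T < 1 sharpens, T > 1 flattens."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.1)   # near one-hot on the top token
hot = softmax_with_temperature(logits, 10.0)   # near uniform

print(round(cold[0], 3))            # ≈ 1.0
print([round(p, 2) for p in hot])   # roughly [0.36, 0.33, 0.31]
```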
50
Confusion Matrix
A table used to evaluate classification performance by comparing predicted vs actual values.
51
What are the 4 components?
* True Positive (TP)
* True Negative (TN)
* False Positive (FP)
* False Negative (FN)
52
Why important?
Helps compute accuracy, precision, recall, and F1-score.
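Computing those metrics from the four counts (the TP/TN/FP/FN numbers below are made-up examples):

```python
def classification_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)  # of predicted positives, how many were right
    recall = tp / (tp + fn)     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = classification_metrics(tp=80, tn=90, fp=10, fn=20)
print(acc)           # 0.85
print(round(prec, 3))  # 0.889
print(rec)           # 0.8
print(round(f1, 3))  # 0.842
```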
53
Root Mean Squared Error (RMSE)
A regression metric measuring the square root of average squared prediction errors.
54
Formula:
RMSE = \sqrt{\frac{1}{n}\sum (y_i - \hat{y}_i)^2}
55
Why square errors?
Penalizes large errors more heavily.
56
Mean Absolute Error (MAE)
A regression metric measuring the average absolute difference between predictions and actual values.
57
Formula:
MAE = \frac{1}{n}\sum |y_i - \hat{y}_i|
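Both error metrics side by side, showing how a single large miss dominates RMSE but not MAE (toy numbers):

```python
import math

def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean of the absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [3.0, 5.0, 7.0, 1.0]  # three exact hits, one miss of 8

print(mae(y_true, y_pred))   # 2.0 — the miss is averaged away
print(rmse(y_true, y_pred))  # 4.0 — squaring amplifies the large error
```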
58
RMSE vs MAE difference?
RMSE penalizes large errors more; MAE treats all errors equally.
59
Correlation Matrix
A table showing correlation coefficients between multiple variables.
60
Correlation coefficient range?
From -1 to +1.
* +1 → perfect positive correlation
* -1 → perfect negative correlation
* 0 → no linear correlation
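Building a correlation matrix with NumPy (a minimal sketch; note `np.corrcoef` treats each row of its input as one variable):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + 1  # perfectly positively correlated with x
z = -x         # perfectly negatively correlated with x

corr = np.corrcoef([x, y, z])  # 3x3 symmetric matrix, diagonal all 1.0

print(np.isclose(corr[0, 1], 1.0))   # True — perfect positive correlation
print(np.isclose(corr[0, 2], -1.0))  # True — perfect negative correlation
```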
61
Shapley Values
A model explainability method that assigns each feature a contribution value to a specific prediction.
62
Why important?
Provides fair, mathematically grounded feature importance.
63
Used in?
Model interpretability for complex models (e.g., tree ensembles, neural networks).
64
Partial Dependence Plots (PDP)
A visualization showing the marginal effect of one feature on model predictions.
65
Why useful?
Helps understand how a feature influences output while averaging out other features.