AWS MLE Flashcards

(36 cards)

1
Q

What is regularization in machine learning and why is it important?

A

Regularization is the addition of a penalty term to a model’s loss function to discourage overfitting by limiting model complexity. Common methods include L1 (lasso) for feature selection, L2 (ridge) for weight shrinkage, and dropout in neural networks.

2
Q

When would you prefer L1 over L2 regularization?

A

Use L1 when feature sparsity and interpretability are desired, since it drives some coefficients exactly to zero. L2 is better for stability and when all features contribute small effects.
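A minimal sketch of why L1 yields exact zeros while L2 only shrinks, using the one-dimensional penalized least-squares problem (closed forms derived here, not tied to any library):

```python
def l1_solution(a, lam):
    """argmin over w of (w - a)**2 + lam * |w| (soft-thresholding).

    The kink of |w| at zero lets the optimum land exactly at w = 0,
    which is why L1 (lasso) performs feature selection.
    """
    if a > lam / 2:
        return a - lam / 2
    if a < -lam / 2:
        return a + lam / 2
    return 0.0


def l2_solution(a, lam):
    """argmin over w of (w - a)**2 + lam * w**2 (smooth shrinkage).

    L2 (ridge) scales the weight toward zero but never makes it
    exactly zero for nonzero a.
    """
    return a / (1 + lam)


# A small coefficient is zeroed out by L1 but only shrunk by L2.
print(l1_solution(0.2, 1.0))  # 0.0
print(l2_solution(0.2, 1.0))  # ~0.1
```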

3
Q

What is model drift and what types are there?

A

Model drift is the degradation of model performance over time as data distribution changes. Two main types:

Data drift (covariate shift): Input feature distribution changes.

Concept drift: Relationship between input and target changes.

4
Q

How can you detect and mitigate model drift?

A

Monitor prediction quality and the statistical distributions of inputs and outputs, set alerts for significant changes, and retrain on fresh data via online learning or scheduled retraining pipelines.
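One concrete input-drift check is the Population Stability Index (PSI) between a training-time baseline and live data; the implementation and the 0.1/0.25 alert thresholds below are a common rule of thumb, not a formal statistical test:

```python
import math


def psi(baseline, live, bins=10):
    """Population Stability Index between a baseline sample and live data.

    Rule-of-thumb reading: < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 significant drift (alert / consider retraining).
    Bins come from the baseline range; live values outside it are
    ignored in this simple sketch.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0

    def bin_frac(sample, i):
        left, right = lo + i * width, lo + (i + 1) * width
        n = sum(1 for x in sample
                if left <= x < right or (i == bins - 1 and x == hi))
        return max(n / len(sample), 1e-6)  # floor avoids log(0)

    return sum((bin_frac(baseline, i) - bin_frac(live, i))
               * math.log(bin_frac(baseline, i) / bin_frac(live, i))
               for i in range(bins))
```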

5
Q

What are common strategies for hyperparameter tuning?

A

Grid search, random search, Bayesian optimization, Hyperband, population-based training. Random search often outperforms grid for high-dimensional spaces.
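A sketch of random search over a log-uniform space; `val_loss` here is a hypothetical stand-in for actually training and evaluating a model:

```python
import random


def val_loss(lr, reg):
    """Hypothetical validation loss; in practice this trains a model."""
    return (lr - 0.1) ** 2 + (reg - 0.01) ** 2


def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    best_loss, best_params = float("inf"), None
    for _ in range(n_trials):
        # Log-uniform sampling: the usual choice for scale-type
        # hyperparameters like learning rate and regularization strength.
        lr = 10 ** rng.uniform(-4, 0)
        reg = 10 ** rng.uniform(-4, 0)
        loss = val_loss(lr, reg)
        if loss < best_loss:
            best_loss, best_params = loss, {"lr": lr, "reg": reg}
    return best_loss, best_params
```

With a fixed seed, a longer run extends the shorter one, so the best loss found can only improve with more trials.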

6
Q

How does Bayesian optimization improve tuning efficiency?

A

It builds a surrogate model (like Gaussian Process) of the objective function, then uses acquisition functions (e.g., expected improvement) to balance exploration vs. exploitation, reducing number of evaluations needed.

7
Q

What are the key components of a scalable ML system design?

A
  1. Data ingestion & preprocessing
  2. Feature store
  3. Training pipeline
  4. Model registry
  5. Deployment endpoints
  6. Monitoring (data quality, drift, performance)
  7. Retraining/CI/CD integration.
8
Q

What role does a feature store play in ML systems?

A

It centralizes, versions, and serves consistent features for training and inference, reducing data leakage and ensuring reproducibility.

9
Q

Why is model registry important?

A

It stores model versions, metadata, performance metrics, and lineage, enabling reproducible deployments, auditing, and rollback if needed.

10
Q

How would you design monitoring for a real-time fraud detection system?

A

Collect prediction + input logs, monitor latency and throughput, implement drift detection on inputs, track precision/recall using delayed ground truth, and trigger retraining when thresholds are exceeded.

11
Q

What are the main stages of a SageMaker pipeline?

A
  1. Data preprocessing (Processing Jobs)
  2. Training (Training Jobs)
  3. Hyperparameter tuning (HPO Jobs)
  4. Model evaluation
  5. Model registration (Model Registry)
  6. Deployment (Endpoints/Batch Transform)
  7. Monitoring (Model Monitor).
12
Q

How does SageMaker simplify ML workflow automation?

A

It provides managed services for data processing, distributed training, automatic scaling, experiment tracking, built-in algorithms, and integrates with CI/CD via SageMaker Pipelines and Step Functions.

13
Q

Why is data versioning critical in ML?

A

It ensures reproducibility, traceability, and auditability of experiments—allowing you to link model artifacts to the exact dataset and features used.

14
Q

What are common tools and approaches for data versioning?

A

DVC, LakeFS, MLflow with artifact tracking, Delta Lake for time travel, or AWS solutions like S3 versioning + SageMaker Lineage Tracking.

15
Q

What is the bias–variance tradeoff?

A

It describes the balance between underfitting and overfitting. High bias models are too simple and underfit, while high variance models are too complex and overfit. The goal is to minimize total error by balancing both.

16
Q

Give an example of a high-bias algorithm and a high-variance algorithm.

A

High-bias: Linear regression on a non-linear dataset.
High-variance: Deep decision trees without pruning or regularization.

17
Q

How can you reduce variance without increasing bias too much?

A

Use regularization (L1/L2, dropout), ensemble methods (bagging, random forests), cross-validation, or more training data.

18
Q

How can you reduce bias in a model?

A

Use a more complex model (e.g., deeper neural nets, more features), reduce regularization strength, or engineer more informative features.

19
Q

What is precision and what is recall?

A

Precision = TP / (TP + FP), the fraction of positive predictions that are correct.

Recall = TP / (TP + FN), the fraction of actual positives correctly identified.

20
Q

What is the precision–recall tradeoff?

A

Increasing precision usually decreases recall, and vice versa. Adjusting the decision threshold shifts the balance. The optimal point depends on the problem’s cost of false positives vs. false negatives.
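The threshold effect can be shown directly by scoring the same predictions at two cutoffs (the toy scores and labels below are illustrative):

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when `score >= threshold` counts as positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
# A high threshold is precise but misses positives; lowering it
# recovers recall at the cost of precision.
print(precision_recall(scores, labels, 0.85))  # (1.0, 0.333...)
print(precision_recall(scores, labels, 0.30))  # (0.75, 1.0)
```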

21
Q

When would you prioritize recall over precision?

A

When missing a positive case is more costly than flagging false positives. Example: detecting cancer or fraud.

22
Q

When would you prioritize precision over recall?

A

When false positives are very costly or harmful. Examples: automated loan approvals or automated hiring filters.

23
Q

How does the PR curve differ from the ROC curve?

A

PR curve plots precision vs. recall; it’s more informative for imbalanced datasets. ROC curve plots TPR vs. FPR, but can be misleading when negatives vastly outnumber positives.

24
Q

What is the F1 score and why is it useful?

A

The harmonic mean of precision and recall (2 × P × R / (P + R)), useful when you want a balance and the class distribution is skewed.

25
Q

Why do we use the harmonic mean in the F1 score?

A

The harmonic mean (rather than the arithmetic mean) penalizes extreme imbalance between precision and recall. The arithmetic mean can look deceptively high when one metric is large and the other small: with precision = 1.0 and recall = 0.0, the arithmetic mean is 0.5 (misleadingly high), while F1 = 0.0 (correctly reflecting poor performance). Because the harmonic mean emphasizes the lower value, a high F1 requires both precision and recall to be reasonably high, making it the better summary when both matter.

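The precision = 1.0, recall = 0.0 case, checked numerically:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def arithmetic_mean(precision, recall):
    return (precision + recall) / 2


# One useless metric: the arithmetic mean looks respectable,
# while the harmonic mean (F1) correctly collapses to zero.
print(arithmetic_mean(1.0, 0.0))  # 0.5
print(f1(1.0, 0.0))               # 0.0
```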
26
Q

How would you interpret coefficients of logistic regression for categorical and boolean variables?

A

Focus on interpreting odds ratios, the meaning of coefficients relative to a baseline for categorical variables, and how boolean inputs affect log-odds. Be clear about dummy variable encoding and interaction effects. Use real-world analogies if possible, like conversion rates or click-through probabilities.

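A tiny numeric illustration of the odds-ratio reading (the feature name and coefficient value are made up):

```python
import math


def odds_ratio(coef):
    """exp(beta): multiplicative change in the odds of the positive class
    when a boolean feature flips from 0 to 1 (or versus the baseline
    category for a dummy-encoded categorical level), holding other
    features fixed."""
    return math.exp(coef)


# Hypothetical: an "is_mobile_user" coefficient of 0.693 in a
# click-through model roughly doubles the odds of a click.
print(round(odds_ratio(0.693), 3))  # ~2.0
```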
27
Q

Let’s say we want to build a model to predict booking prices. Between linear regression and random forest regression, which model would perform better and why?

A

Start by comparing assumptions—linear regression needs linear relationships, random forest doesn’t. Factor in interpretability vs. performance, training data size, and handling of outliers or skewed distributions. At Amazon, model choice often balances explainability (for stakeholders) with performance (for automated decisions).

28
Q

Say you are given a dataset of perfectly linearly separable data. What would happen when you run logistic regression?

A

Tests edge-case awareness in optimization. Explain how logistic regression fails to converge in this case, as it endlessly increases weights to find a “better” separation. Discuss how regularization reintroduces a trade-off, allowing convergence. Useful for scenarios involving imbalanced or clean datasets—common in anomaly detection or QA tagging.

29
Q

What is the K-means algorithm?

A

K-means is an unsupervised clustering algorithm that partitions data into k clusters. It iteratively assigns points to the nearest cluster centroid, then updates centroids as the mean of assigned points, until convergence. It minimizes within-cluster variance.

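A minimal 1-D sketch of the assign/update loop (naive initialization; real implementations use k-means++ and a convergence check):

```python
def kmeans(points, k, iters=20):
    """Toy 1-D k-means: assign to nearest centroid, update as cluster mean."""
    centroids = points[:k]  # naive init: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda j: (x - centroids[j]) ** 2)
            clusters[nearest].append(x)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)
```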
30
Q

What is the difference between SVM and logistic regression?

A

Logistic regression is probabilistic and linear, optimizing log-loss to estimate class probabilities. SVM is margin-based, focusing on finding the maximum margin hyperplane. SVM can use kernels to model non-linear boundaries, while logistic regression remains linear unless features are engineered.

31
Q

Describe normalization and Bayes’ rule.

A

Normalization rescales data to a common scale (e.g., min-max or z-score) to improve training stability. Bayes’ rule relates conditional probabilities: P(A|B) = [P(B|A) * P(A)] / P(B), useful in Bayesian models and Naïve Bayes classifiers.

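Both ideas in a few lines; the screening-test numbers in the example are hypothetical:

```python
import math


def z_score(values):
    """Normalize to zero mean and unit variance (population std)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def posterior(prior, sensitivity, false_pos_rate):
    """P(condition | positive test) via Bayes' rule.
    Denominator P(positive) comes from the law of total probability."""
    p_pos = sensitivity * prior + false_pos_rate * (1 - prior)
    return sensitivity * prior / p_pos


# Classic base-rate example (hypothetical numbers): a 99%-sensitive test
# with a 5% false-positive rate at 1% prevalence still leaves only a
# ~17% chance of having the condition after a positive result.
print(round(posterior(0.01, 0.99, 0.05), 3))  # 0.167
```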
32
Q

Describe linear regression versus logistic regression.

A

Linear regression predicts continuous outputs using least squares. Logistic regression predicts probabilities of categorical outcomes using a sigmoid function and cross-entropy loss. Linear assumes Gaussian residuals; logistic assumes Bernoulli outcomes.

33
Q

What kind of different loss functions do you know?

A

Regression: MSE, MAE, Huber loss. Classification: cross-entropy, hinge loss, focal loss. Ranking: pairwise losses, listwise losses that approximate NDCG (NDCG itself is a metric, not a differentiable loss). Computer vision: IoU loss, Dice loss, perceptual loss. Regularization penalties: L1, L2.

34
Q

How do you measure the performance of computer vision models?

A

Use accuracy, precision/recall, F1, IoU (Intersection over Union), mAP (mean Average Precision), top-k accuracy, or confusion matrices depending on task (classification, detection, segmentation).

35
Q

How do you deal with a troublesome dataset?

A

Diagnose issues (missing values, noise, outliers, imbalance), apply preprocessing (imputation, normalization, augmentation), perform feature engineering, and consult domain experts. For persistent issues, re-collect or refine labeling.

36
Q

How do you deal with misrepresentative training data (imbalanced dataset, overfitting, explain how L1/L2 regularization work at an optimization level)?

A

Imbalanced data: resampling, class weights, anomaly-detection framing.

Overfitting: dropout, weight decay (L2), early stopping, data augmentation.

L1 regularization (adds a |w| term): promotes sparsity by pushing weights exactly to zero.

L2 regularization (adds a w² term): shrinks weights smoothly, distributing importance.

Both modify the optimization objective by penalizing large weights.

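At the optimization level, the two penalties change a gradient step differently; a sketch for a single weight under plain SGD:

```python
def sgd_step_l2(w, grad, lr, lam):
    """L2: the penalty lam * w**2 contributes 2 * lam * w to the gradient,
    i.e. multiplicative 'weight decay' -- shrinks w, never to exactly 0."""
    return w - lr * (grad + 2 * lam * w)


def sgd_step_l1(w, grad, lr, lam):
    """L1: proximal (soft-threshold) step for lam * |w| -- small weights
    snap exactly to zero, which is where sparsity comes from."""
    w = w - lr * grad
    if w > lr * lam:
        return w - lr * lam
    if w < -lr * lam:
        return w + lr * lam
    return 0.0


# With no data gradient, a small weight survives L2 decay but is
# zeroed by the L1 soft-threshold.
print(sgd_step_l2(0.01, 0.0, 0.1, 0.5))  # ~0.009
print(sgd_step_l1(0.01, 0.0, 0.1, 0.5))  # 0.0
```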