What is regularization in machine learning and why is it important?
Regularization is the addition of a penalty term to a model’s loss function to discourage overfitting by limiting model complexity. Common methods include L1 (lasso) for feature selection, L2 (ridge) for weight shrinkage, and dropout in neural networks.
When would you prefer L1 over L2 regularization?
Use L1 when feature sparsity and interpretability are desired, since it drives some coefficients exactly to zero. L2 is better for stability and when all features contribute small effects.
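The sparsity effect is easy to see empirically. A minimal sketch with scikit-learn (assuming it is installed), fitting Lasso (L1) and Ridge (L2) on synthetic data where only a few features matter:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only 5 of 20 features are actually informative.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# L1 drives uninformative coefficients exactly to zero; L2 only shrinks them.
print("Lasso zero coefficients:", int(np.sum(lasso.coef_ == 0)))
print("Ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)))
```

Lasso typically zeroes out most of the uninformative coefficients here, while Ridge leaves all 20 nonzero but small.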
What is model drift and what types are there?
Model drift is the degradation of model performance over time as data distribution changes. Two main types:
Data drift (covariate shift): Input feature distribution changes.
Concept drift: Relationship between input and target changes.
How can you detect and mitigate model drift?
Monitor prediction quality and statistical distributions of inputs/outputs, set alerts for significant changes, retrain on updated data, use online learning or scheduled retraining pipelines.
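One common way to monitor input distributions is a two-sample statistical test between a reference window (training data) and a recent production window. A sketch using SciPy's Kolmogorov–Smirnov test on a single feature (the 0.01 alert threshold is an illustrative choice, not a standard):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Reference window (training-time feature) vs. current production window.
reference = rng.normal(loc=0.0, scale=1.0, size=1000)
production = rng.normal(loc=0.5, scale=1.0, size=1000)  # mean has shifted

stat, p_value = ks_2samp(reference, production)
drift_detected = p_value < 0.01  # alert threshold is a tuning choice
print(f"KS statistic={stat:.3f}, p={p_value:.2e}, drift={drift_detected}")
```

In a real pipeline this check would run per feature on a schedule, with alerts feeding the retraining trigger described above.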
What are common strategies for hyperparameter tuning?
Grid search, random search, Bayesian optimization, Hyperband, and population-based training. Random search often outperforms grid search in high-dimensional spaces because it samples more distinct values per dimension for the same budget.
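A minimal random-search sketch with scikit-learn's RandomizedSearchCV (the model, distributions, and budget here are illustrative):

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Sample hyperparameters from distributions rather than a fixed grid.
param_dist = {
    "n_estimators": randint(50, 200),
    "max_depth": randint(2, 10),
    "max_features": uniform(0.1, 0.9),  # fraction of features per split
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=10, cv=3, random_state=0,
)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```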
How does Bayesian optimization improve tuning efficiency?
It builds a surrogate model (e.g., a Gaussian process) of the objective function, then uses an acquisition function (e.g., expected improvement) to balance exploration vs. exploitation, reducing the number of expensive evaluations needed.
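The loop above can be sketched end-to-end on a toy 1-D problem, using scikit-learn's Gaussian process as the surrogate and expected improvement as the acquisition function (the quadratic objective, candidate grid, and iteration budget are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):
    """Stand-in for an expensive black-box function we want to minimize."""
    return (x - 2.0) ** 2

rng = np.random.default_rng(0)
X_obs = rng.uniform(0, 5, size=3).reshape(-1, 1)   # initial random evaluations
y_obs = objective(X_obs).ravel()

candidates = np.linspace(0, 5, 200).reshape(-1, 1)

for _ in range(10):
    gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X_obs, y_obs)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y_obs.min()
    # Expected improvement (minimization): expected amount we beat `best` by.
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (best - mu) / sigma
        ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    ei[sigma == 0] = 0.0
    x_next = candidates[np.argmax(ei)]          # exploit/explore tradeoff
    X_obs = np.vstack([X_obs, x_next.reshape(1, -1)])
    y_obs = np.append(y_obs, objective(x_next[0]))

print("best x:", X_obs[np.argmin(y_obs)][0], "best f:", y_obs.min())
```

With only ~13 total evaluations the loop typically lands close to the true minimum at x = 2, which is the point of the approach.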
What are the key components of a scalable ML system design?
Data ingestion and validation, a feature store, a training pipeline with experiment tracking, a model registry, a serving layer (batch and/or real-time), and monitoring with feedback loops that trigger retraining.
What role does a feature store play in ML systems?
It centralizes, versions, and serves consistent features for training and inference, reducing data leakage and ensuring reproducibility.
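The core idea can be illustrated with a toy in-memory sketch (a hypothetical `FeatureStore` class, not any real product's API): training and inference read the same versioned value through one lookup path, which is what prevents train/serve skew.

```python
from dataclasses import dataclass, field

@dataclass
class FeatureStore:
    """Toy in-memory feature store keyed by (entity, feature, version)."""
    _data: dict = field(default_factory=dict)

    def put(self, entity_id: str, feature: str, value, version: int = 1):
        self._data[(entity_id, feature, version)] = value

    def get(self, entity_id: str, feature: str, version: int = 1):
        return self._data[(entity_id, feature, version)]

store = FeatureStore()
store.put("user_42", "avg_txn_amount", 103.5, version=1)
# Training and serving both resolve the same versioned value.
print(store.get("user_42", "avg_txn_amount", version=1))  # 103.5
```

Real systems (Feast, SageMaker Feature Store, etc.) add offline/online storage, point-in-time joins, and TTLs on top of this basic contract.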
Why is model registry important?
It stores model versions, metadata, performance metrics, and lineage, enabling reproducible deployments, auditing, and rollback if needed.
How would you design monitoring for a real-time fraud detection system?
Collect prediction + input logs, monitor latency and throughput, implement drift detection on inputs, track precision/recall using delayed ground truth, and trigger retraining when thresholds are exceeded.
What are the main stages of a SageMaker pipeline?
Typical stages: data processing/feature engineering, model training, evaluation, a condition step that gates on evaluation metrics, model registration, and deployment (real-time endpoint or batch transform).
How does SageMaker simplify ML workflow automation?
It provides managed services for data processing, distributed training, automatic scaling, experiment tracking, built-in algorithms, and integrates with CI/CD via SageMaker Pipelines and Step Functions.
Why is data versioning critical in ML?
It ensures reproducibility, traceability, and auditability of experiments—allowing you to link model artifacts to the exact dataset and features used.
What are common tools and approaches for data versioning?
DVC, LakeFS, MLflow with artifact tracking, Delta Lake for time travel, or AWS solutions like S3 versioning + SageMaker Lineage Tracking.
What is the bias–variance tradeoff?
It describes the balance between underfitting and overfitting. High bias models are too simple and underfit, while high variance models are too complex and overfit. The goal is to minimize total error by balancing both.
Give an example of a high-bias algorithm and a high-variance algorithm.
High-bias: Linear regression on a non-linear dataset.
High-variance: Deep decision trees without pruning or regularization.
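Both failure modes can be seen with polynomial fits of different flexibility; a small sketch with NumPy (the sine target, noise level, and degrees are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 30))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 30)

def train_mse(degree):
    coeffs = np.polyfit(x, y, degree)
    pred = np.polyval(coeffs, x)
    return np.mean((y - pred) ** 2)

# High-bias (degree 1) underfits the sine; high-variance (degree 9)
# chases the noise and achieves near-zero training error.
print("degree 1 train MSE:", round(train_mse(1), 4))
print("degree 9 train MSE:", round(train_mse(9), 4))
```

The degree-9 model's low training error is deceptive: on held-out data it generally does worse than a model of intermediate flexibility, which is the tradeoff in action.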
How can you reduce variance without increasing bias too much?
Use regularization (L1/L2, dropout), ensemble methods (bagging, random forests), cross-validation, or more training data.
How can you reduce bias in a model?
Use a more complex model (e.g., deeper neural nets, more features), reduce regularization strength, or engineer more informative features.
What is precision and what is recall?
Precision = TP / (TP + FP), the fraction of positive predictions that are correct.
Recall = TP / (TP + FN), the fraction of actual positives correctly identified.
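The two formulas in plain Python, on illustrative confusion-matrix counts:

```python
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)  # of everything flagged positive, how much was right
    recall = tp / (tp + fn)     # of all actual positives, how many we caught
    return precision, recall

# Example: 8 true positives, 2 false positives, 4 false negatives.
p, r = precision_recall(tp=8, fp=2, fn=4)
print(p, r)  # 0.8 0.666...
```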
What is the precision–recall tradeoff?
Increasing precision usually decreases recall, and vice versa. Adjusting the decision threshold shifts the balance. The optimal point depends on the problem’s cost of false positives vs. false negatives.
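The threshold effect can be demonstrated directly on a small set of illustrative scores and labels: raising the threshold makes predictions more conservative, improving precision at the cost of recall.

```python
def precision_recall_at(threshold, scores, labels):
    preds = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))
    fn = sum((not p) and l for p, l in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25]
labels = [1,    1,    0,    1,    0,    1,    0,    0]

for t in (0.3, 0.6):
    p, r = precision_recall_at(t, scores, labels)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
```

At threshold 0.3 everything positive-ish gets flagged (perfect recall, mediocre precision); at 0.6 precision rises while recall drops.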
When would you prioritize recall over precision?
When missing a positive case is more costly than flagging false positives. Example: detecting cancer or fraud.
When would you prioritize precision over recall?
When false positives are very costly or harmful. Examples: automatically approving loans or filtering job candidates, where a wrong positive decision is expensive or hard to undo.
How does the PR curve differ from the ROC curve?
PR curve plots precision vs. recall; it’s more informative for imbalanced datasets. ROC curve plots TPR vs. FPR, but can be misleading when negatives vastly outnumber positives.
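The gap between the two views shows up in their summary scores. A sketch with scikit-learn on an illustrative imbalanced toy set where one negative outranks every positive:

```python
from sklearn.metrics import roc_auc_score, average_precision_score

# 8 negatives, 2 positives; the top-scoring example is a negative.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.10, 0.20, 0.15, 0.30, 0.25, 0.40, 0.35, 0.90, 0.80, 0.70]

roc_auc = roc_auc_score(y_true, scores)
pr_auc = average_precision_score(y_true, scores)  # area under the PR curve
print("ROC AUC:", roc_auc)  # looks strong
print("PR  AUC:", pr_auc)   # reveals the weak precision at the top of the ranking
```

Because FPR divides by the large negative count, ROC AUC stays high (0.875) while average precision is far lower, which is why PR is preferred under heavy imbalance.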
What is the F1 score and why is it useful?
The harmonic mean of precision and recall (2 × P × R / (P + R)), useful when you want a balance and the class distribution is skewed.
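Because it is a harmonic mean, F1 is pulled toward the smaller of the two values, so one weak metric cannot be masked by a strong one:

```python
def f1_score(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.8, 0.6))   # ~0.686, below the arithmetic mean of 0.7
print(f1_score(1.0, 0.01))  # ~0.0198: near-zero recall drags F1 down
```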