What is regularization in machine learning and why is it important?
Regularization is the addition of a penalty term to a model’s loss function to discourage overfitting by limiting model complexity. Common methods include L1 (lasso) for feature selection, L2 (ridge) for weight shrinkage, and dropout in neural networks.
When would you prefer L1 over L2 regularization?
Use L1 when feature sparsity and interpretability are desired, since it drives some coefficients exactly to zero. L2 is better for stability and when all features contribute small effects.
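The sparsity effect is easy to see empirically. A minimal sketch with scikit-learn (assuming it is installed), fitting Lasso (L1) and Ridge (L2) on synthetic data where only a few features matter:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only 5 of 20 features are actually informative.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# L1 drives uninformative coefficients exactly to zero; L2 only shrinks them.
print("Lasso zero coefficients:", int(np.sum(lasso.coef_ == 0)))
print("Ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)))
```

Lasso typically zeroes out most of the uninformative coefficients here, while Ridge leaves all 20 nonzero but small.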
What is model drift and what types are there?
Model drift is the degradation of model performance over time as data distribution changes. Two main types:
Data drift (covariate shift): Input feature distribution changes.
Concept drift: Relationship between input and target changes.
How can you detect and mitigate model drift?
Monitor prediction quality and statistical distributions of inputs/outputs, set alerts for significant changes, retrain on updated data, use online learning or scheduled retraining pipelines.
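One common way to monitor input distributions is a two-sample statistical test between a reference window (training data) and a recent production window. A sketch using SciPy's Kolmogorov–Smirnov test on a single feature (the 0.01 alert threshold is an illustrative choice, not a standard):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Reference window (training-time feature) vs. current production window.
reference = rng.normal(loc=0.0, scale=1.0, size=1000)
production = rng.normal(loc=0.5, scale=1.0, size=1000)  # mean has shifted

stat, p_value = ks_2samp(reference, production)
drift_detected = p_value < 0.01  # alert threshold is a tuning choice
print(f"KS statistic={stat:.3f}, p={p_value:.2e}, drift={drift_detected}")
```

In a real pipeline this check would run per feature on a schedule, with alerts feeding the retraining trigger described above.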
What are common strategies for hyperparameter tuning?
Grid search, random search, Bayesian optimization, Hyperband, and population-based training. Random search often outperforms grid search in high-dimensional spaces because it samples more distinct values per dimension for the same budget.
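A minimal random-search sketch with scikit-learn's RandomizedSearchCV (the model, distributions, and budget here are illustrative):

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Sample hyperparameters from distributions rather than a fixed grid.
param_dist = {
    "n_estimators": randint(50, 200),
    "max_depth": randint(2, 10),
    "max_features": uniform(0.1, 0.9),  # fraction of features per split
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=10, cv=3, random_state=0,
)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```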
How does Bayesian optimization improve tuning efficiency?
It builds a surrogate model (e.g., a Gaussian process) of the objective function, then uses an acquisition function (e.g., expected improvement) to balance exploration vs. exploitation, reducing the number of expensive evaluations needed.
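The loop above can be sketched end-to-end on a toy 1-D problem, using scikit-learn's Gaussian process as the surrogate and expected improvement as the acquisition function (the quadratic objective, candidate grid, and iteration budget are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):
    """Stand-in for an expensive black-box function we want to minimize."""
    return (x - 2.0) ** 2

rng = np.random.default_rng(0)
X_obs = rng.uniform(0, 5, size=3).reshape(-1, 1)   # initial random evaluations
y_obs = objective(X_obs).ravel()

candidates = np.linspace(0, 5, 200).reshape(-1, 1)

for _ in range(10):
    gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X_obs, y_obs)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y_obs.min()
    # Expected improvement (minimization): expected amount we beat `best` by.
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (best - mu) / sigma
        ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    ei[sigma == 0] = 0.0
    x_next = candidates[np.argmax(ei)]          # exploit/explore tradeoff
    X_obs = np.vstack([X_obs, x_next.reshape(1, -1)])
    y_obs = np.append(y_obs, objective(x_next[0]))

print("best x:", X_obs[np.argmin(y_obs)][0], "best f:", y_obs.min())
```

With only ~13 total evaluations the loop typically lands close to the true minimum at x = 2, which is the point of the approach.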
What are the key components of a scalable ML system design?
Data ingestion and validation, a feature store, a training pipeline with experiment tracking, a model registry, a serving layer (batch and/or real-time), and monitoring with feedback loops that trigger retraining.
What role does a feature store play in ML systems?
It centralizes, versions, and serves consistent features for training and inference, reducing data leakage and ensuring reproducibility.
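The core idea can be illustrated with a toy in-memory sketch (a hypothetical `FeatureStore` class, not any real product's API): training and inference read the same versioned value through one lookup path, which is what prevents train/serve skew.

```python
from dataclasses import dataclass, field

@dataclass
class FeatureStore:
    """Toy in-memory feature store keyed by (entity, feature, version)."""
    _data: dict = field(default_factory=dict)

    def put(self, entity_id: str, feature: str, value, version: int = 1):
        self._data[(entity_id, feature, version)] = value

    def get(self, entity_id: str, feature: str, version: int = 1):
        return self._data[(entity_id, feature, version)]

store = FeatureStore()
store.put("user_42", "avg_txn_amount", 103.5, version=1)
# Training and serving both resolve the same versioned value.
print(store.get("user_42", "avg_txn_amount", version=1))  # 103.5
```

Real systems (Feast, SageMaker Feature Store, etc.) add offline/online storage, point-in-time joins, and TTLs on top of this basic contract.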
Why is model registry important?
It stores model versions, metadata, performance metrics, and lineage, enabling reproducible deployments, auditing, and rollback if needed.
How would you design monitoring for a real-time fraud detection system?
Collect prediction + input logs, monitor latency and throughput, implement drift detection on inputs, track precision/recall using delayed ground truth, and trigger retraining when thresholds are exceeded.
What are the main stages of a SageMaker pipeline?
Typical stages: data processing/feature engineering, model training, evaluation, a condition step that gates on evaluation metrics, model registration, and deployment (real-time endpoint or batch transform).
How does SageMaker simplify ML workflow automation?
It provides managed services for data processing, distributed training, automatic scaling, experiment tracking, built-in algorithms, and integrates with CI/CD via SageMaker Pipelines and Step Functions.
Why is data versioning critical in ML?
It ensures reproducibility, traceability, and auditability of experiments—allowing you to link model artifacts to the exact dataset and features used.
What are common tools and approaches for data versioning?
DVC, LakeFS, MLflow with artifact tracking, Delta Lake for time travel, or AWS solutions like S3 versioning + SageMaker Lineage Tracking.
What is the bias–variance tradeoff?
It describes the balance between underfitting and overfitting. High bias models are too simple and underfit, while high variance models are too complex and overfit. The goal is to minimize total error by balancing both.
Give an example of a high-bias algorithm and a high-variance algorithm.
High-bias: Linear regression on a non-linear dataset.
High-variance: Deep decision trees without pruning or regularization.
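Both failure modes can be seen with polynomial fits of different flexibility; a small sketch with NumPy (the sine target, noise level, and degrees are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 30))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 30)

def train_mse(degree):
    coeffs = np.polyfit(x, y, degree)
    pred = np.polyval(coeffs, x)
    return np.mean((y - pred) ** 2)

# High-bias (degree 1) underfits the sine; high-variance (degree 9)
# chases the noise and achieves near-zero training error.
print("degree 1 train MSE:", round(train_mse(1), 4))
print("degree 9 train MSE:", round(train_mse(9), 4))
```

The degree-9 model's low training error is deceptive: on held-out data it generally does worse than a model of intermediate flexibility, which is the tradeoff in action.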
How can you reduce variance without increasing bias too much?
Use regularization (L1/L2, dropout), ensemble methods (bagging, random forests), cross-validation, or more training data.
How can you reduce bias in a model?
Use a more complex model (e.g., deeper neural nets, more features), reduce regularization strength, or engineer more informative features.
What is precision and what is recall?
Precision = TP / (TP + FP), the fraction of positive predictions that are correct.
Recall = TP / (TP + FN), the fraction of actual positives correctly identified.
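The two formulas in plain Python, on illustrative confusion-matrix counts:

```python
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)  # of everything flagged positive, how much was right
    recall = tp / (tp + fn)     # of all actual positives, how many we caught
    return precision, recall

# Example: 8 true positives, 2 false positives, 4 false negatives.
p, r = precision_recall(tp=8, fp=2, fn=4)
print(p, r)  # 0.8 0.666...
```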
What is the precision–recall tradeoff?
Increasing precision usually decreases recall, and vice versa. Adjusting the decision threshold shifts the balance. The optimal point depends on the problem’s cost of false positives vs. false negatives.
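The threshold effect can be demonstrated directly on a small set of illustrative scores and labels: raising the threshold makes predictions more conservative, improving precision at the cost of recall.

```python
def precision_recall_at(threshold, scores, labels):
    preds = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))
    fn = sum((not p) and l for p, l in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25]
labels = [1,    1,    0,    1,    0,    1,    0,    0]

for t in (0.3, 0.6):
    p, r = precision_recall_at(t, scores, labels)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
```

At threshold 0.3 everything positive-ish gets flagged (perfect recall, mediocre precision); at 0.6 precision rises while recall drops.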
When would you prioritize recall over precision?
When missing a positive case is more costly than flagging false positives. Example: detecting cancer or fraud.
When would you prioritize precision over recall?
When false positives are very costly or harmful. Examples: automatically approving loans or filtering job candidates, where a wrong positive decision is expensive or hard to undo.
How does the PR curve differ from the ROC curve?
PR curve plots precision vs. recall; it’s more informative for imbalanced datasets. ROC curve plots TPR vs. FPR, but can be misleading when negatives vastly outnumber positives.
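The gap between the two views shows up in their summary scores. A sketch with scikit-learn on an illustrative imbalanced toy set where one negative outranks every positive:

```python
from sklearn.metrics import roc_auc_score, average_precision_score

# 8 negatives, 2 positives; the top-scoring example is a negative.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.10, 0.20, 0.15, 0.30, 0.25, 0.40, 0.35, 0.90, 0.80, 0.70]

roc_auc = roc_auc_score(y_true, scores)
pr_auc = average_precision_score(y_true, scores)  # area under the PR curve
print("ROC AUC:", roc_auc)  # looks strong
print("PR  AUC:", pr_auc)   # reveals the weak precision at the top of the ranking
```

Because FPR divides by the large negative count, ROC AUC stays high (0.875) while average precision is far lower, which is why PR is preferred under heavy imbalance.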
What is the F1 score and why is it useful?
The harmonic mean of precision and recall (2 × P × R / (P + R)), useful when you want a balance and the class distribution is skewed.
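Because it is a harmonic mean, F1 is pulled toward the smaller of the two values, so one weak metric cannot be masked by a strong one:

```python
def f1_score(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.8, 0.6))   # ~0.686, below the arithmetic mean of 0.7
print(f1_score(1.0, 0.01))  # ~0.0198: near-zero recall drags F1 down
```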