What is the difference between Data Drift and Concept Drift?
Data Drift (Feature Drift) happens when the distribution of the input data changes over time (e.g., a demographic shift in your user base). Concept Drift happens when the underlying relationship between the inputs and the target variable changes (e.g., consumer purchasing behavior changes abruptly due to a pandemic).
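As a sketch, data drift in a single numeric feature can be flagged with the Population Stability Index (PSI). The function below is illustrative, not any particular library's API; the threshold of 0.2 is a common rule of thumb for "significant drift."

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample (e.g. training
    data) and a live sample of the same feature."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1
        # add-one smoothing so log() never sees a zero
        return [(c + 1) / (len(sample) + bins) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

random.seed(0)
train      = [random.gauss(0, 1) for _ in range(5000)]
live_same  = [random.gauss(0, 1) for _ in range(5000)]
live_shift = [random.gauss(1.5, 1) for _ in range(5000)]  # demographic shift

print(f"PSI (no drift):   {psi(train, live_same):.3f}")
print(f"PSI (with drift): {psi(train, live_shift):.3f}")
```

Note this only detects *data* drift: the feature distribution moved. Concept drift (the input-to-target relationship changing) can occur even when every feature distribution looks stable, which is why labeled production data is eventually needed.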
What is Continuous Training (CT) in the context of MLOps?
While standard software relies on CI/CD (Continuous Integration/Continuous Deployment), MLOps adds CT. It is the automated process of evaluating model performance in production and automatically triggering a pipeline to retrain and deploy the model when performance degrades or new data arrives.
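A minimal sketch of the CT trigger logic, with an assumed accuracy threshold of 0.90 (names like `continuous_training_step` and `trigger_pipeline` are hypothetical, not a real framework's API):

```python
RETRAIN_THRESHOLD = 0.90  # assumed service-level objective

def production_accuracy(predictions, labels):
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def continuous_training_step(predictions, labels, trigger_pipeline):
    """Evaluate the live model; kick off the retraining pipeline on degradation."""
    accuracy = production_accuracy(predictions, labels)
    if accuracy < RETRAIN_THRESHOLD:
        trigger_pipeline(reason=f"accuracy {accuracy:.2f} below {RETRAIN_THRESHOLD}")
        return "retrain_triggered"
    return "healthy"

# Simulated check: 8 of the last 10 predictions were correct -> 0.80 < 0.90
events = []
status = continuous_training_step(
    predictions=[1, 1, 0, 0, 1, 1, 1, 0, 1, 1],
    labels=     [1, 1, 0, 1, 1, 1, 1, 1, 1, 1],
    trigger_pipeline=lambda reason: events.append(reason),
)
print(status, events)
```

In practice the trigger would start an orchestrated pipeline (e.g. in Kubeflow or Airflow) rather than call a lambda, and triggers can also fire on a schedule or on new-data arrival rather than only on degradation.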
What is a Feature Store and why is it necessary?
A Feature Store is a centralized repository for storing, processing, and serving engineered features. It prevents Training-Serving Skew by ensuring the exact same feature transformation code is used during batch training and real-time inference, and it allows multiple teams to reuse features.
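The core idea can be sketched as a toy in-memory store where a single registered transformation backs both the batch (training) path and the online (serving) path. The `FeatureStore` class and its method names here are illustrative, not any real product's API:

```python
from datetime import datetime

class FeatureStore:
    """Toy feature store: one registered transformation, shared by
    batch training and online serving."""

    def __init__(self):
        self._transforms = {}  # feature name -> function
        self._online = {}      # (entity_id, feature name) -> value

    def register(self, name, fn):
        self._transforms[name] = fn

    def materialize(self, name, rows):
        """Batch path: compute the feature over historical rows for training."""
        return [self._transforms[name](r) for r in rows]

    def ingest(self, name, entity_id, raw):
        """Online path: the SAME transform, cached for low-latency lookup."""
        self._online[(entity_id, name)] = self._transforms[name](raw)

    def get_online(self, name, entity_id):
        return self._online[(entity_id, name)]

store = FeatureStore()
store.register("days_since_signup", lambda r: (r["now"] - r["signup"]).days)

row = {"signup": datetime(2024, 1, 1), "now": datetime(2024, 1, 31)}
train_values = store.materialize("days_since_signup", [row])
store.ingest("days_since_signup", entity_id="user-42", raw=row)
print(train_values[0], store.get_online("days_since_signup", "user-42"))
```

Because both paths run the one registered function, the training value and the served value cannot diverge, which is exactly the skew-prevention guarantee real feature stores (e.g. Feast) aim for.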
What is Training-Serving Skew?
A drop in a model’s performance when deployed, caused by a mismatch between the training environment and the production environment. Common causes include using different data pipelines (e.g., Pandas for training, Java for serving) or having subtle differences in how missing values are handled.
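A tiny illustration of the missing-value case, with made-up income figures: the training pipeline imputes with the mean, while a separately written serving path substitutes zero. The model then sees feature values at inference time it never saw in training.

```python
# Training pipeline (hypothetical): missing income imputed with the mean.
train_incomes = [40_000, 60_000, None, 80_000]
known = [x for x in train_incomes if x is not None]
train_mean = sum(known) / len(known)
train_features = [x if x is not None else train_mean for x in train_incomes]

# Serving pipeline, reimplemented by another team: missing income becomes 0.
def serve_feature(income):
    return income if income is not None else 0  # subtle skew!

print(train_features[2], serve_feature(None))  # mean-imputed vs. zero-imputed
```

The mismatch is silent: both pipelines run without errors, and only the model's production metrics reveal the problem, which is why a shared feature pipeline (or a feature store) is the usual fix.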
Explain “Shadow Deployment” (Shadow Mode) for ML models.
In a shadow deployment, a new model is deployed alongside the production model and receives the exact same live traffic. However, the new model’s predictions are not sent to the user; they are only logged. This allows you to evaluate its real-world performance with zero risk.
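A minimal request-handler sketch (the function and its parameters are illustrative): the shadow model runs on the same input, its output is only logged, and a shadow failure must never affect the user-facing path.

```python
def handle_request(features, prod_model, shadow_model, shadow_log):
    """Serve the production prediction; log, but never return, the shadow's."""
    prod_pred = shadow_pred = None
    prod_pred = prod_model(features)
    try:
        shadow_pred = shadow_model(features)  # shadow errors must not reach users
    except Exception:
        shadow_pred = None
    shadow_log.append({"features": features,
                       "prod": prod_pred,
                       "shadow": shadow_pred})
    return prod_pred  # the user only ever sees the production model

log = []
result = handle_request([1.0, 2.0],
                        prod_model=lambda f: "approve",
                        shadow_model=lambda f: "deny",
                        shadow_log=log)
print(result, log[0])
```

Offline, the logged pairs can later be joined with ground-truth labels to compare the two models on identical real traffic before the new one serves a single user.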
How does a Canary Deployment work?
A new model is deployed, and a small, controlled fraction of live traffic (e.g., 5%) is routed to it. If the model performs well and system metrics remain stable, the traffic allocation is gradually increased until the new model handles 100% of requests, replacing the old one.
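Routing is often done by hashing a stable identifier so each user consistently hits the same model during the rollout. A sketch of that bucketing logic (the `route` function is illustrative, not a real gateway's API):

```python
import hashlib

def route(user_id, canary_fraction):
    """Deterministically bucket users: the same user always gets the same
    model for a given canary fraction."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_fraction * 100 else "stable"

# With a 5% canary, roughly 5 in every 100 users hit the new model.
routed = [route(f"user-{i}", 0.05) for i in range(1000)]
print(routed.count("canary"), "of 1000 users routed to the canary")
```

Increasing the rollout is then just raising `canary_fraction` (0.05 → 0.25 → 1.0); sticky, hash-based assignment avoids users flip-flopping between model versions mid-session.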
What is the purpose of a Model Registry?
A Model Registry is a centralized tracking system (like a catalog) for managing the lifecycle of ML models. It tracks model lineage, versions, hyperparameters, artifacts, and lifecycle stages (e.g., “Staging”, “Production”, “Archived”). MLflow is a popular tool for this.
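The lifecycle bookkeeping can be sketched with a toy in-memory registry. This is not MLflow's API; the class, method names, and the "auto-archive the old Production version" rule are illustrative assumptions:

```python
class ModelRegistry:
    """Toy registry: versions, artifacts, hyperparameters, lifecycle stages."""
    STAGES = {"Staging", "Production", "Archived"}

    def __init__(self):
        self._models = {}  # model name -> list of version records

    def register(self, name, artifact_uri, params):
        versions = self._models.setdefault(name, [])
        versions.append({"version": len(versions) + 1,
                         "artifact": artifact_uri,
                         "params": params,
                         "stage": "Staging"})
        return versions[-1]["version"]

    def transition(self, name, version, stage):
        assert stage in self.STAGES
        if stage == "Production":           # keep a single live version
            for v in self._models[name]:
                if v["stage"] == "Production":
                    v["stage"] = "Archived"
        self._models[name][version - 1]["stage"] = stage

    def production_version(self, name):
        return next(v for v in self._models[name] if v["stage"] == "Production")

reg = ModelRegistry()
v1 = reg.register("churn", "s3://models/churn/1", {"max_depth": 6})
v2 = reg.register("churn", "s3://models/churn/2", {"max_depth": 8})
reg.transition("churn", v1, "Production")
reg.transition("churn", v2, "Production")  # v1 is automatically archived
print(reg.production_version("churn"))
```

A serving system that always loads `production_version("churn")` gets promotion and rollback "for free": both become a single stage transition in the registry rather than a redeploy.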
What is DVC (Data Version Control) and how is it different from Git?
Git is designed for versioning lightweight code, not massive datasets. DVC versions data, models, and intermediate files. It works alongside Git by storing lightweight metadata pointers in your Git repository, while storing the actual heavy data files in remote storage like AWS S3 or Google Cloud Storage.
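The pointer mechanism can be illustrated in a few lines of Python. This is a conceptual sketch of what DVC does, not its implementation or CLI: hash the file, push the bytes to content-addressed "remote" storage, and keep only a tiny metadata record under Git.

```python
import hashlib
import os
import shutil
import tempfile

def dvc_add(path, remote_dir):
    """Sketch of the idea behind `dvc add` + `dvc push`: copy the content to
    remote storage under its hash, return a tiny pointer for Git to track."""
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    shutil.copy(path, os.path.join(remote_dir, digest))
    return {"md5": digest, "path": os.path.basename(path)}  # committed to Git

def dvc_pull(pointer, remote_dir, dest_dir):
    """Sketch of `dvc pull`: restore the exact bytes the pointer refers to."""
    shutil.copy(os.path.join(remote_dir, pointer["md5"]),
                os.path.join(dest_dir, pointer["path"]))

workdir, remote, clone = (tempfile.mkdtemp() for _ in range(3))
data = os.path.join(workdir, "train.csv")
with open(data, "w") as f:
    f.write("id,label\n1,0\n2,1\n")

pointer = dvc_add(data, remote)   # the small pointer dict is what Git versions
dvc_pull(pointer, remote, clone)  # a teammate restores the identical dataset
print(pointer)
```

Because the pointer is keyed by content hash, checking out an old Git commit retrieves the exact dataset that commit was trained on, which is what makes experiments reproducible.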
Why is monitoring an ML model harder than monitoring traditional software?
Traditional software monitoring focuses on operational metrics (CPU usage, memory, latency, 500 errors). ML monitoring requires tracking predictive performance. This means monitoring the statistical distributions of incoming data to catch drift, and eventually joining predictions with delayed ground-truth labels to calculate accuracy/recall in production.
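The delayed-label join is the step with no analogue in traditional monitoring. A sketch, with made-up request IDs: predictions are logged at serve time, labels trickle in later, and production accuracy can only be computed over the subset that has been labeled so far.

```python
# Predictions logged at serve time; ground truth arrives days or weeks later
# (e.g. "did the user churn within 30 days?").
predictions = {"req-1": 1, "req-2": 0, "req-3": 1}
delayed_labels = {"req-1": 1, "req-3": 0}  # req-2's label hasn't arrived yet

def production_accuracy(predictions, labels):
    """Join predictions with whatever labels exist; report coverage too."""
    joined = [(predictions[k], labels[k]) for k in predictions if k in labels]
    if not joined:
        return None, 0
    correct = sum(p == y for p, y in joined)
    return correct / len(joined), len(joined)

acc, n = production_accuracy(predictions, delayed_labels)
print(f"accuracy={acc} over {n} labeled requests")
```

Because of this label delay, drift detection on input distributions serves as the early-warning signal, with label-based metrics confirming (or refuting) the degradation later.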
Explain A/B Testing in the context of model deployment.
Two versions of a model (Model A and Model B) are deployed simultaneously, and live traffic is split between them (evenly, or at some chosen ratio). Unlike shadow mode, both models' predictions are actually served to users, and business metrics (like click-through rate or revenue) are compared to determine which model drives better real-world outcomes.
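Deciding the winner is a statistics question. As a sketch, click-through rates for the two arms can be compared with a standard two-proportion z-test; the counts below are made up for illustration.

```python
import math

def ab_summary(clicks_a, n_a, clicks_b, n_b):
    """Two-proportion z-test on click-through rate (sketch)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return p_a, p_b, z

# Hypothetical experiment: 10,000 impressions per arm.
p_a, p_b, z = ab_summary(clicks_a=500, n_a=10_000, clicks_b=590, n_b=10_000)
print(f"CTR A={p_a:.3f}, CTR B={p_b:.3f}, z={z:.2f}")  # |z| > 1.96 ~ significant at 5%
```

In practice the sample size (and hence test duration) should be chosen up front via a power analysis, and stopping the test early the moment z crosses the threshold inflates the false-positive rate.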