What is CRISP-DM?
Cross-Industry Standard Process for Data Mining
Why is it important to monitor model performance after deployment in model engineering?
It is important to track and evaluate model performance against dynamic data to ensure the model remains relevant.
Real-world data are dynamic and change over time.
Monitoring also helps identify model performance decay and gives the data science team early notice that the model needs to be rebuilt.
What does five-number summary refer to in the context of data ingestion and EDA?
For numeric data: the minimum, first quartile (Q1), median, third quartile (Q3), and maximum.
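A quick standard-library sketch on a toy numeric sample (the data values are assumed for illustration); `statistics.quantiles` with the inclusive method yields the three quartile cut points:

```python
from statistics import quantiles

data = [2, 4, 4, 5, 7, 9, 11, 12, 15]  # toy numeric sample (assumed)

# Inclusive method treats the sample min/max as the 0th/100th percentiles
q1, median, q3 = quantiles(data, n=4, method="inclusive")
five_num = (min(data), q1, median, q3, max(data))
```

The resulting tuple is the five-number summary in ascending order, which is also what a box plot visualizes.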
Please explain what makes TDSP an Agile-based methodology.
It is an iterative, adaptive and incremental process that allows the data science team to loop back to previous steps at any stage of the project.
What are benchmark models used for in the context of model engineering?
a baseline reference point for demonstrating to stakeholders whether the additional modeling process adds value, and whether a newly built model is worth putting into deployment
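A minimal sketch of one common benchmark, a majority-class baseline, on hypothetical validation labels; a candidate model should clearly beat this accuracy before deployment is considered:

```python
from collections import Counter

# Hypothetical labels from a validation set (assumed for illustration)
y_true = ["no", "no", "yes", "no", "yes", "no", "no", "no"]

# Majority-class baseline: always predict the most frequent label
majority = Counter(y_true).most_common(1)[0][0]
baseline_acc = sum(y == majority for y in y_true) / len(y_true)
```

With unbalanced labels, this baseline can look deceptively strong, which is exactly why it is a useful reference point for stakeholders.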
What does HLP stand for in the context of model engineering?
Human Level Performance
the accuracy or error rate achieved when the task is handled by a human workforce
What is TFX in the context of MLOps?
TensorFlow Extended
What does TFT stand for in the context of MLOps?
TensorFlow Transform
with data preprocessing components that can be embedded into a TFX project
What does TFMA stand for in the context of MLOps?
TensorFlow Model Analysis
What are the two main types of deployment strategies when implementing scoring models through API serve?
batch scoring: model inference executed on large numbers of data points at once, triggered by a schedule or a specific event
real-time scoring: predictions served almost instantly after requests arrive from clients
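The two strategies can be sketched as follows; the `score` function is a hypothetical stand-in for a real model, and the request handler is a simplification of what an API endpoint would do:

```python
import time

def score(features):
    # Stand-in model: a fixed threshold rule (assumed for illustration)
    return sum(features) > 1.0

# Batch scoring: run inference over many stored records at once,
# e.g. triggered by a nightly schedule
batch = [[0.2, 0.3], [0.9, 0.8], [1.5, 0.1]]
batch_predictions = [score(f) for f in batch]

# Real-time scoring: score a single request as it arrives and return immediately
def handle_request(features):
    start = time.perf_counter()
    pred = score(features)
    latency_ms = (time.perf_counter() - start) * 1000
    return pred, latency_ms
```

The practical difference is the optimization target: batch scoring optimizes throughput over large volumes, while real-time scoring optimizes per-request latency.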
What does DVC stand for in the context of model engineering?
Data Version Control
List examples of what is included in model metadata.
model name, version ID, model registry location, and model input and output directories
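Such a record might be sketched as a simple serializable mapping; the field names and values below are illustrative, not a standard schema:

```python
import json

# Hypothetical model-metadata record (all names/paths are assumed examples)
metadata = {
    "model_name": "churn-classifier",
    "version_id": "1.3.0",
    "registry_location": "models/churn-classifier/1.3.0",
    "input_dir": "data/processed/train",
    "output_dir": "artifacts/churn-classifier/1.3.0",
}
serialized = json.dumps(metadata, indent=2)  # e.g. stored alongside the model
```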
Please explain why model versioning is important to the model building process.
It ensures that changes to the models are tracked properly, and previously generated models are reproducible. It helps to increase the efficiency of team collaboration and allows continuous model experimentations and improvements.
What does Eclat/ECLAT stand for in the context of frequent itemset mining (association rule mining)?
Equivalence Class Transformation; the expansion "Equivalence Class Clustering and bottom-up Lattice Traversal" also appears in the literature.
A depth-first search algorithm that uses vertical data format and set intersection to mine frequent itemsets efficiently.
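A minimal sketch of the vertical-format idea on toy transactions (data assumed for illustration): each item maps to its tidset, and the support of an itemset is the size of the intersection of its members' tidsets:

```python
from itertools import combinations

# Toy transactions (assumed); Eclat works on the vertical representation below
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Vertical data format: item -> set of transaction IDs (tidset)
tidsets = {}
for tid, items in enumerate(transactions):
    for item in items:
        tidsets.setdefault(item, set()).add(tid)

min_support = 2
# Support of a 2-itemset = size of the intersection of the two tidsets
frequent_pairs = {
    frozenset((a, b)): len(tidsets[a] & tidsets[b])
    for a, b in combinations(sorted(tidsets), 2)
    if len(tidsets[a] & tidsets[b]) >= min_support
}
```

The full algorithm extends this depth-first: it intersects the tidsets of k-itemsets within an equivalence class to produce (k+1)-itemsets, never rescanning the raw transactions.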
What does FP stand for in the FP-growth algorithm?
frequent pattern
In the first pass, the algorithm counts item frequencies and stores them in a header table. In the second pass, it builds a compressed FP-tree (a prefix tree) by inserting transactions in descending frequency order. Frequent itemsets are then mined by recursively constructing conditional FP-trees — without candidate generation (unlike Apriori). Proposed by Han, Pei & Yin (2000).
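The first pass, and the reordering of transactions that precedes FP-tree insertion, can be sketched on toy data as follows (the recursive conditional-tree mining step is omitted for brevity; transactions are assumed examples):

```python
from collections import Counter

# Toy transactions (assumed for illustration)
transactions = [
    ["a", "b", "c"],
    ["b", "c"],
    ["a", "b", "d"],
    ["b"],
]
min_support = 2

# Pass 1: count item frequencies (these counts populate the header table)
counts = Counter(item for t in transactions for item in t)
frequent = {i: c for i, c in counts.items() if c >= min_support}

# Pass 2 (preparation): drop infrequent items and reorder each transaction
# by descending frequency, so shared prefixes compress well in the FP-tree
ordered = [
    sorted((i for i in t if i in frequent), key=lambda i: -frequent[i])
    for t in transactions
]
```

Each list in `ordered` would then be inserted path-by-path into the prefix tree, incrementing node counts along shared prefixes.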
What does SARSA stand for in the context of reinforcement learning?
state–action–reward–state–action
an algorithm for learning a Markov decision process policy
Please explain how overfitting and underfitting are related to the concepts of bias and variance.
Overfitting produces low-bias, high-variance models: error is low on training data but high on test data. Underfitting produces high-bias, low-variance models: error is high on both training and test data.
What does LOOCV stand for in model engineering?
Leave-One-Out Cross-Validation
Please explain the advantages and disadvantages of using cross validation.
Cross validation has the advantage of generating more reliable model evaluation because it performs multiple iterations of training and testing on different portions of sample data. However, it has the disadvantage of being time-consuming.
Which cross validation technique works best with classification models with unbalanced labels?
stratified K-fold cross validation
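A minimal sketch of the stratification idea, assuming toy unbalanced labels: group indices by label, then deal each group round-robin across the K folds so every fold preserves the overall class ratio:

```python
labels = ["pos"] * 3 + ["neg"] * 9   # unbalanced toy labels (assumed)
k = 3

folds = [[] for _ in range(k)]
by_label = {}
for idx, y in enumerate(labels):
    by_label.setdefault(y, []).append(idx)

# Deal each label group round-robin so class proportions match in every fold
for group in by_label.values():
    for j, idx in enumerate(group):
        folds[j % k].append(idx)
```

Plain K-fold could by chance place all minority-class examples in one fold, giving some iterations no positive examples to train or test on; stratification prevents that.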
How many iterations of model testing are involved in a K-Fold Cross Validation on a dataset with N observations?
K — one iteration per fold, regardless of N (LOOCV is the special case where K = N).
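A short sketch (toy sizes assumed) showing that the number of train/test iterations equals K, not N: each fold is held out exactly once.

```python
n, k = 10, 5                 # N observations, K folds (toy sizes, assumed)
indices = list(range(n))
fold_size = n // k

iterations = 0
for i in range(k):
    test_idx = indices[i * fold_size:(i + 1) * fold_size]          # held-out fold
    train_idx = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
    assert len(train_idx) + len(test_idx) == n
    iterations += 1          # one model fit and evaluation per fold
```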
Please explain the reasons for building interpretable models.
Interpretable models help with debugging and with reasoning about model predictions. They make communication with stakeholders who have different requirements more effective. More importantly, building interpretable models helps build trust with end users and decision makers by providing transparency and visibility into the model's internal process.
What does CAM stand for in the context of CNNs?
class activation map
What does LIME stand for in the context of surrogate models in model engineering?
local interpretable model-agnostic explanations