AI & ML Flashcards

(201 cards)

1
Q

What is the difference between Supervised and Unsupervised Machine Learning?

A

Supervised Learning - Utilises labeled input and output data.

Unsupervised Learning - Discovers hidden patterns in data without any human-provided labels.

2
Q

What are some of the benefits and drawbacks of Supervised and Unsupervised machine learning when compared to one another?

A

Supervised Learning:
Tends to be more accurate than unsupervised models.
Requires historical data, or humans to manually label data.

Unsupervised Learning:
Does not predict; it simply groups data together.

3
Q

Within supervised machine learning models, what are features vs labels?

A

Supervised Machine learning models “learn” the association between known features and unknown labels.

Each column of data that will help us determine the outcome (win or loss for a tournament game) is called a feature.

The column of data that you are trying to predict is called the label. Machine learning models “learn” the association between the features and the label in order to predict the label for new data.

4
Q

If we were training a classification supervised machine learning model (e.g. a logistic regression) on historic team sports results to predict the outcome of future games, why should we NOT use the points scored (win_pts or lose_pts) as a feature in our training dataset, even though we have the data available?

A

This feature is only available at the END of the game and for future games we are making predictions before a game begins.

This is called data leakage.

5
Q

Define Responsible AI

A

The development and use of AI in a way that prioritises ethical considerations, fairness, accountability, safety and transparency.

6
Q

What is a TPU?

A

A Tensor Processing Unit (TPU) is Google’s custom-developed application-specific integrated circuit (ASIC), designed to accelerate AI workloads (such as training and inference) at scale.

7
Q

What are the 4 storage classes in GCS?

A

1) Standard Storage - Hot data, accessed in real time.

2) Nearline Storage - Data accessed less than once per month.

3) Coldline Storage - Data accessed less than once every 90 days.

4) Archive Storage - Data accessed less than once a year.

8
Q

On Cloud Storage, which data storage class is best for storing data that needs to be accessed less than once a year?

A

Archive Storage

9
Q

What are the 4 products that should be considered in the Data Ingestion & Process phase of the Data-to-AI Workflow?

A

Pub/Sub
Dataflow
Dataproc
Cloud Data Fusion

10
Q

What are the 6 products that should be considered in the Data Storage phase of the Data-to-AI Workflow?

A

Cloud Storage
BigQuery
Cloud SQL
Cloud Spanner
Cloud Bigtable
Cloud Firestore

11
Q

What are the 2 products that should be considered in the Data Analytics phase of the Data-to-AI Workflow?

A

BigQuery - Fully Managed Data Warehouse solution
Looker - BI layer for visualising and governing data across your organisation

12
Q

What makes a Machine Learning model a Deep Learning model?

A

Deep Learning is a subset of machine learning that adds layers in between input data and output results to make a machine learn at more depth. This is usually in the form of a neural-network architecture.

13
Q

You want to use machine learning to discover the underlying pattern and group a collection of unlabeled photos into different sets. Which should you use?

A

Unsupervised Learning - Cluster Analysis

14
Q

Which SQL command would you use to create an ML model in BigQuery ML?

A

CREATE OR REPLACE MODEL

15
Q

Describe the Machine Learning / MLOPs workflow as though you were comparing it to running a restaurant.

A

1) Data Preparation - Prepare your ingredients
a) Data Import - Batch vs Streaming. Structured vs Unstructured.
b) Feature Engineering - Chopping the onions, peeling the carrots etc. before you start cooking.

2) Model Development - Experiment with different recipes. Loop: Train the Model –> Evaluate the model

3) Model Serving - Finalise and iterate on the menu to meet customers’ changing needs
a) Deploy the Model
b) Monitor the Model

16
Q

What are the 3 stages of maturity in MLOps?

A

MLOps Level 0: Manual Process
At this initial level, the workflow for building and deploying models is entirely manual, script-driven, and interactive. It is characterized by a disconnect between data scientists and operations, infrequent release iterations, and a lack of active performance monitoring or CI/CD practices.

​MLOps Level 1: ML Pipeline Automation
This level focuses on performing continuous training (CT) of the model by automating the machine learning pipeline itself. It enables rapid experimentation and the continuous delivery of fresh prediction models trained on new live data, often employing triggers like data validation or drift detection.

​MLOps Level 2: CI/CD Pipeline Automation
The most mature level introduces a robust CI/CD system to automatically test and deploy new implementations of the ML pipelines themselves, not just the models. This allows organizations to reliably update pipeline architecture and code in production, enabling them to cope quickly with changing data and business environments.

17
Q

What are the 4 types of Machine Learning options for model development and usage via Google Cloud?

A

1) Pre-Trained Models
2) BigQuery ML
3) AutoML
4) Custom Training

18
Q

What are the four phases of the AutoML Pipeline?

A

Phase 1 - Automatic data pre-processing using TensorFlow Transform

Phase 2 - Architecture Search, Selection & Tuning

Phase 3 - Cross Validation & Bagging Ensemble

Phase 4 - Deploy & Predict

19
Q

What 2 critical technologies support auto search and architecture selection for AutoML?

A

Neural Architecture Search - Helps search the best models and tune the parameters automatically.

Transfer Learning - AutoML has already trained many different models with large amounts of data. These trained models can be used as foundational models to reach higher accuracy with much less data and computational training time. This allows you to train models with smaller datasets by leveraging inherent knowledge within models that were trained on larger datasets.

20
Q

What is the purpose of Phase 3: Bagging Ensemble within AutoML?

A

AutoML does not rely on one single model, but on the top models selected during Phase 2. The number of models depends on the training budget, but is typically around ten.

The ensemble can be as simple as averaging the predictions of those top models. Relying on multiple top models instead of one greatly improves the accuracy of prediction.

21
Q

When would you choose to use Colab Enterprise over Vertex Workbench?

A

1) When you want to avoid managing compute.

2) When your logic can be housed within a single notebook.

3) Collaboration - When you don’t want to worry about utilising Git, as there are built in version control and sharing capabilities.

22
Q

When would you choose to use Vertex Workbench over Colab Enterprise?

A

1) When you’re migrating an existing Jupyter notebook from your local environment to the cloud.

2) Vertex Workbench is better for complex projects that span over multiple files and directories.

3) When you need native support for GitHub.

23
Q

What is the benefit of using Vertex AI Workbench over a Jupyter Notebook run locally?

A

1) Scalability and Performance - Releasing yourself from the resource constraints of your local machine

2) Collaboration & Reproducibility - Shared Environment

3) Scheduled Executions

4) Seamless Integration with GCP Services

24
Q

What happens when you execute a cell on Colab Enterprise?

A

Colab Enterprise connects to a Python kernel on a runtime, and the code is executed by that kernel.

25
You have several complex projects spanning multiple files, with complex dependencies. You also need to collaborate and share notebooks. Which notebook solution offers the best option?
Vertex Workbench
26
You have an existing Workbench instance and want to add a GPU to the instance. How can you modify a Workbench instance configuration after it has been created?
1) Stop the instance 2) Modify the hardware configuration 3) Click the submit button.
27
Colab Enterprise provides a default runtime and runs your code on it. To configure a runtime for specific needs, you must:
Use Runtime Templates. Create a runtime template with the configuration that you need, then create a runtime based on that template, then connect to the runtime from your notebook and run your code.
28
What are the 3 primary benefits of VertexAI?
1) Fast experimentation 2) Accelerated deployment 3) Simplified model management to achieve your ML goals.
29
What are the primary categories of Supervised Learning?
Regression - Continuous labels. Classification - Discrete labels (categories).
30
What are some different types of Regression models?
Linear Regression - Predicts a continuous output from a continuous input by attempting to model the line of best fit.
Polynomial Regression - Captures non-linear relationships between features and labels within a continuous dataset.
Time Series Regression / AutoRegressive Integrated Moving Average (ARIMA) - Predicts future values in a time-dependent dataset; often employed to forecast future values based on past observations.
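The linear-regression case above can be sketched with a closed-form ordinary-least-squares fit (a minimal illustration with made-up data points, not a production implementation):

```python
# Minimal ordinary-least-squares sketch: fit y = slope * x + intercept.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Points lying exactly on y = 2x + 1 recover slope 2 and intercept 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```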
31
What are some use cases of Time Series Regression / AutoRegressive Integrated Moving Average (ARIMA)?
Sales Forecasting, Inventory Forecasting, Stock Market Analysis
32
What are some different types of Classification models?
Logistic Regression - Although Logistic Regression uses regression techniques, the outcome is actually binary classification.
Random Forest & Gradient Boosted Trees - Use a concept called collective intelligence: build many independent decision trees and aggregate their results into a single result, usually a score.
K-Nearest Neighbors (KNN) - Classifies data points based on their nearest neighbours; essentially K-Means without centroids. KNN is useful for pre-processing and populating labels on data points that have not yet been classified before other machine learning techniques are applied.
Support Vector Machine (SVM) - Attempts to draw an optimal separating boundary (hyperplane) between classes in high-dimensional datasets.
33
What is the classification threshold within Logistic Regression?
A human-set threshold that determines where on the logistic regression curve the classifier will predict a positive value in favour of a negative value.
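The threshold idea can be shown with a few lines of Python (illustrative probabilities and a hypothetical `classify` helper, not a library API):

```python
# Apply a human-set classification threshold to predicted probabilities.
def classify(probability, threshold=0.5):
    return "positive" if probability >= threshold else "negative"

# The same predicted score flips class as the threshold moves.
labels_low = [classify(p, threshold=0.3) for p in (0.2, 0.4, 0.9)]
labels_high = [classify(p, threshold=0.7) for p in (0.2, 0.4, 0.9)]
```

With the lower threshold, 0.4 is counted as positive; with the higher threshold it is not.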
34
What are the primary categories of Unsupervised Learning?
Clustering - Involves grouping data points together so that objects in the same group (cluster) are more similar to each other than to those in other groups.
Association - Attempts to learn the relationship between variables and objects within a dataset.
Dimensionality Reduction / Matrix Factorisation - An unsupervised learning technique that reduces the number of features, or dimensions, in a dataset for better visualisation.
35
What use case is Matrix Factorisation good for?
Recommendations
36
Describe how K-Means Clustering works?
1) Place centroids randomly within the data space.
2) Measure the distance between each data point and each cluster centre using the Euclidean distance.
3) Assign each data point to its nearest centroid.
4) Recalculate the centroids by taking the mean of all data values within each cluster.
5) Repeat until every data point remains in the same cluster as in the previous iteration.
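The steps above can be sketched in plain Python (a toy 1-D version with made-up points; real workloads would use a library implementation):

```python
import random

# Toy 1-D K-Means following the steps above.
def k_means(points, k, iterations=100):
    centroids = random.sample(points, k)            # 1) random centroids
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                            # 2-3) assign to nearest
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]   # 4) recompute
        if new_centroids == centroids:              # 5) stop when stable
            break
        centroids = new_centroids
    return sorted(centroids)

random.seed(0)
# Two obvious groups around 1.0 and 10.0.
centers = k_means([1.0, 1.1, 0.9, 10.0, 10.2, 9.8], k=2)
```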
37
What are some use cases for K-Means Clustering?
Market & Customer Segmentation, Computer Vision, Fraud Detection
38
What are some of the use cases for Association Training?
Market basket analysis, Stock Analysis, anomaly detection, Medical diagnosis
39
What are some of the use cases for Dimensionality Reduction?
Data Visualisation, Image Compression & Noise Filtering
40
What is Topic Modelling?
Topic modelling is an unsupervised learning technique for discovering abstract "topics" hidden within a large collection of text documents.
41
What is Tensorflow?
TensorFlow is an end-to-end, open-source platform for machine learning that provides a comprehensive ecosystem for building and deploying models at scale. It is renowned for its production readiness.
42
Match the three types of data ingest with an appropriate source of training data. 1) Streaming batch (Dataflow), structured batch (BigQuery), stochastic (App Engine) 2) Streaming (BigQuery), structured batch (Pub/Sub), unstructured batch (Cloud Storage) 3) Streaming (Pub/Sub), structured batch (BigQuery), unstructured batch (Cloud Storage)
Streaming (Pub/Sub), structured batch (BigQuery), unstructured batch (Cloud Storage)
43
What are training checkpoints and what do they capture within Vertex?
Training checkpoints are snapshots of a model's state at specific points during the training process. Checkpoints enable model state to be persisted even if a training job is interrupted / fails. Checkpoints capture essential information like model weights, optimizer states, and the current training epoch or step.
44
What is a feature in Machine Learning?
A feature refers to a factor that contributes to the prediction. This is like an independent variable in statistics, or a column in a table.
45
What is the difference between Univariate Analysis and Bivariate Analysis within data pre-processing?
Univariate Analysis - Each of the features is analysed independently. Bivariate Analysis - We compare 2 features to identify correlations.
46
How would you go about removing skewness from your feature data distribution within feature engineering?
By applying transformations such as a log transformation and normalisation, you can bring a skewed distribution closer to a normal distribution and reduce the influence of outliers.
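As an illustration (made-up income values, not from the source), a log transformation compresses the long tail of a skewed feature:

```python
import math

# One extreme outlier dominates the raw scale of this feature.
incomes = [20_000, 35_000, 50_000, 80_000, 5_000_000]
log_incomes = [math.log(x) for x in incomes]

# Spread on the raw scale vs the log scale.
raw_spread = max(incomes) / min(incomes)            # 250x
log_spread = max(log_incomes) / min(log_incomes)    # under 2x
```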
47
How would you go about removing skewness within your target variable?
By resampling: undersampling, oversampling, or the Synthetic Minority Oversampling Technique (SMOTE). SMOTE does not create duplicates. Instead, it uses linear interpolation to create new data points between existing minority samples. It relies on the k-Nearest Neighbors (k-NN) algorithm.
48
When would you use Log Scaling over Scaling in feature engineering?
Log scaling is used when some of the data samples follow a power law, or are very large (a heavily skewed distribution with a long tail), for example annual income.
49
How do you calculate Z-Score when using it for scaling in feature engineering?
Scaled value = (value − mean) / stddev
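The formula above can be sketched directly (a minimal helper, with made-up values):

```python
import math

# Z-score scaling: scaled = (value - mean) / stddev.
def z_score_scale(values):
    mean = sum(values) / len(values)
    stddev = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / stddev for v in values]

# After scaling, the feature has mean 0 and standard deviation 1.
scaled = z_score_scale([10, 20, 30, 40, 50])
```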
50
What is clipping within feature engineering?
Capping all feature values above or below a certain fixed value to remove outliers.
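A minimal sketch of clipping (illustrative bounds, chosen for the example):

```python
# Clip caps values outside fixed bounds to tame outliers.
def clip(value, low, high):
    return max(low, min(high, value))

clipped = [clip(v, 0, 100) for v in [-5, 42, 250]]  # → [0, 42, 100]
```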
51
What are Scaling, Log-Scaling, Z-Score and Clipping all forms of within Feature Engineering?
Normalisation
52
What are the 2 reasons you would perform normalisation on a feature?
1) Numeric features that have distinctly different ranges (for example, age and income) 2) Numeric features that cover a wide range such as a city
53
What is undersampling / downsampling within Feature Engineering?
This technique involves reducing the number of examples in the majority class to match the size of the minority class to reduce bias. If you have 1,000 "Normal" transactions and 100 "Fraud" transactions:
1) You keep all 100 "Fraud" cases.
2) You randomly select only 100 "Normal" cases from the original 1,000.
3) You discard the remaining 900 "Normal" cases.
4) Result: A balanced dataset of 200 total rows.
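The 1,000/100 example above can be sketched with random sampling (toy rows, stdlib only):

```python
import random

# Undersampling: keep all 100 "Fraud" rows, randomly keep 100 of the
# 1,000 "Normal" rows, discard the rest.
random.seed(42)
normal = [("Normal", i) for i in range(1000)]
fraud = [("Fraud", i) for i in range(100)]

balanced = fraud + random.sample(normal, len(fraud))
```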
54
What is a downside of undersampling / downsampling within Feature Engineering?
Data Loss: You are throwing away potentially valuable information. The model might miss important patterns present in the discarded examples.
55
What is oversampling within Feature Engineering?
Balancing an imbalanced dataset by interpolating new samples for the minority class.
56
What is the number 1 mistake people make when feature engineering?
The number one mistake beginners make with feature engineering is applying preprocessing techniques (normalising features, removing outliers etc.) before splitting the data. For example, if you oversample the entire dataset first, and then split it into Training and Testing sets, the synthetic samples created in the training set will be based on—and therefore extremely similar to, or even near-identical to—original data points that end up in the test set. This is called data leakage and impacts model evaluation and production quality.
57
What is Upweighting within Feature Engineering?
Also known as Class Weighting or Cost-Sensitive Learning, this technique does not change the dataset size. Instead, it tells the model that errors made on the minority class are more expensive than errors made on the majority class. This can be useful to remove bias when training on an unbalanced dataset.
58
What is Dimensionality Reduction in Feature Extraction?
Dimensionality Reduction is the process by which an initial set of raw data is reduced to more manageable groups for training. In technical terms, you want to reduce the dimension of your feature space. By reducing the dimension of your feature space, you have fewer relationships between variables to consider, and you are less likely to overfit your model.
59
What is One-Hot Encoding and give an example?
Converting categorical data into a numerical format that can be fed into machine learning algorithms to improve prediction accuracy. For example, if you have 3 employee IDs: 101, 113 and 129, you would split the employee ID column into 3 separate features during feature engineering and represent the employee who was active for that line of data with a 1 and the rest with a 0. This increases the sparsity of your dataset.
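The employee-ID example above can be sketched with a tiny helper (hypothetical `one_hot` function, not a library API):

```python
# One-hot encode a categorical value against a known category list.
def one_hot(value, categories):
    return [1 if value == c else 0 for c in categories]

ids = [101, 113, 129]
encoded = one_hot(113, ids)  # → [0, 1, 0]
```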
60
What is Feature Hashing?
Feature hashing is a technique in Feature Engineering used to turn categorical features into a vector of a fixed size. It is essentially a space-efficient alternative to One-Hot Encoding, so useful for categories with lots of values.
61
What is bucketised within Feature Engineering?
Bucketising converts continuous numeric data into discrete intervals (categorical string features), which can then be one-hot encoded. For example, you could set age ranges of 18-25, 26-30, 31-40, etc.
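A minimal sketch of the age-range example (the bucket boundaries are the illustrative ones above):

```python
# Map a continuous age to a discrete bucket label.
def bucketise(age):
    if 18 <= age <= 25:
        return "18-25"
    if 26 <= age <= 30:
        return "26-30"
    if 31 <= age <= 40:
        return "31-40"
    return "other"

buckets = [bucketise(a) for a in (19, 28, 35)]
```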
62
What are the 4 characteristics to look out for with regards to Good Feature Identification?
1) Be related to the objective 2) Be known at prediction-time 3) Be numeric with meaningful magnitude 4) Have enough examples
63
In Feature Engineering, what do we mean by “Collapsing the Long-Tail”?
Group categories in the long tail (bucketisation) where data variables aren’t continuous to avoid outliers overfitting the model.
64
What is Vertex AI Feature Store?
A central store to aggregate, manage, serve and share features. These features are stored as a time-series, allowing traceability and searchability of features over time.
65
What are the 4 benefits of Feature Store?
Features are shareable for training and serving.
Features are reusable, reducing duplicate effort.
Features are scalable - fully managed, and served at low latency.
Mitigate training-serving skew - track and monitor for drift between training and serving.
66
What is a Feature View within Vertex Feature Store?
A FeatureView is a logical grouping of features from a BigQuery table or view that you want to serve together. For example, if features are spread across multiple entity types, you can retrieve them in a single request that you can feed to a machine learning or batch prediction request. For the exam, this may be called an EntityType, as this is what it was previously known as.
67
What is an Offline Store in Vertex Feature Store?
Your BigQuery table or view is the offline store. This eliminates data duplication and allows you to use the full power of BigQuery for feature engineering, analysis, and batch serving.
68
How is Offline Serving used for training models via Vertex Feature Store?
Offline serving is done directly from your BigQuery tables using standard BigQuery APIs and capabilities. This provides more flexibility and control over data access.
69
What are the 2 options for Online Store in Vertex Feature Store?
1) Optimised online serving (for ultra-low latency scenarios and embedding management). 2) Bigtable online serving (for large data volumes, similar to the legacy online store)
70
What are the 4 levels of Feature Store hierarchy?
Level 1: Data Source - Traditionally BigQuery, especially for offline serving for model training.
Level 2: Feature Registry - For management & governance. The feature registry contains feature groups, each corresponding to a BigQuery source table or view containing feature data.
Level 3: Online Feature Store - Stores a copy of the latest feature values for low-latency serving.
Level 4: Feature View - Configures which features from your BigQuery source should be regularly synchronised and made available in a specific online feature store.
71
How does Feature Store maintain traceability of features over time?
Feature Store uses a time series data model to store a series of values for features. This model enables Feature Store to maintain feature values as they change over time.
72
For maximum speed, how do we want to store data on BigQuery for a Vertex AI workload?
For maximum speed, it's better to store materialized data instead of using views or subqueries for training data.
73
What is a Baseline Model within BigQuery ML and how do you create one?
A baseline model is a solution to a problem without applying any machine learning techniques. You can get started with a baseline model using a simple “CREATE OR REPLACE MODEL” statement.
74
When training a model in BigQueryML, why do we need to convert numeric values (such as DayofWeek, or employee ID) to string before training?
BQML by default assumes that numbers are numeric features and strings are categorical features. The model (e.g. a neural network) will automatically treat any integer as a numerical value rather than a categorical value, meaning it will carry meaningful magnitude rather than being one-hot encoded.
75
What is the tradeoff between Static and Dynamic Training?
Static training is easier to build and test, but is likely to become stale quickly in high data drift environments. Dynamic is harder to build and test but will adapt to changes as you’re constantly retraining the model based on live usage data.
76
What’s the primary question you need to ask when making the decision over whether to embrace Static or Dynamic training?
Do the model features change like Science (slowly - Static) or fashion (quickly - dynamic)?
77
Which Google Vertex AI Service provides a toolkit to automate, monitor, and govern machine learning systems by orchestrating the workflow in a serverless manner?
Vertex AI Pipelines
78
Which type of logging should be enabled in Vertex AI Online Prediction that logs the stderr and stdout streams from your prediction nodes to Cloud Logging and can be useful for debugging?
Container Logging - Specifically designed to log the standard output (stdout) and standard error (stderr) streams from the containers running your model on the prediction nodes.
79
Vertex AI has a unified data preparation tool that supports image, tabular, text, and video content. Where are uploaded datasets stored in Vertex AI?
A Google Cloud Storage bucket that acts as an input for both AutoML and custom training jobs.
80
What are the three primary measurements for classification models in Vertex AI?
1) Confusion Matrix - Recall vs Precision 2) Precision/Recall Curve 3) Feature Importance - Bar chart to illustrate the feature attribution to a prediction
81
Why does Model Evaluation matter?
Performance - Assessing accuracy and alignment to business objectives.
Generalisation - Ensuring the model works on new, unseen data and isn’t overfitting.
Model Selection - When there are multiple models to choose from, we use evaluation to select how much we rely on each model to solve a problem.
Improvement - Performance can be tracked after deployment to identify when retraining the model is necessary.
82
Where can Model Evaluation go wrong?
Overfitting - A model performs exceptionally well on its specific training data but struggles to generalise to new, unseen data. Data validation & splitting, such as stratified sampling and cross-validation, can be used to mitigate this.
Data or Concept Drift - When the distribution of real-world data changes over time. Continuous monitoring and deployment can be used to mitigate drift.
Metric Choice - Relying on one metric alone, or on metrics that do not relate to the project goals, can make evaluation a bottleneck.
83
What is Tensorboard Profiler?
A tool to identify performance bottlenecks and optimize hardware resource utilization across CPUs, GPUs, and TPUs.
83
What is the difference between Training, test and Validation data within Machine Learning?
Training data is used to fit the model's parameters.
Validation data is used to tune hyperparameters and monitor for overfitting during development.
Test data is reserved for the final, unbiased evaluation of performance on unseen examples.
84
Define the quadrants of a Confusion Matrix in model evaluation?
True Positive - The model correctly predicted the positive outcome.
False Positive (Type 1 Error) - The model incorrectly predicted the positive outcome, when the outcome should have been negative.
False Negative (Type 2 Error) - The model incorrectly predicted the negative outcome, when the outcome should have been positive.
True Negative - The model correctly predicted the negative outcome.
85
What are some metrics used to evaluation Regression Models?
Mean Absolute Error (MAE) - The average distance from the line of best fit.
Mean Squared Error (MSE) - Like MAE, but punishes larger errors more.
R2 Score - Determines how well the model predicts the actual data. An R2 of 1 means the model perfectly fits the data, while an R2 of 0 indicates the model explains none of the variability around the mean. For example, an R2 value of 0.85 for exam scores might indicate that “hours studied” has a strong relationship to the final score.
86
What is R2 within model evaluation?
R2 represents the proportion of the variance in the label that can be predicted from the features. An R2 of 1 means the model perfectly fits the data, while an R2 of 0 indicates the model explains none of the variability around the mean. A negative R2 means the model fits worse than using the average.
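R² can be computed as 1 minus the ratio of residual to total sum of squares (a minimal sketch with toy values):

```python
# R² = 1 - (residual sum of squares / total sum of squares).
def r_squared(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

perfect = r_squared([1, 2, 3], [1, 2, 3])    # perfect fit → 1.0
baseline = r_squared([1, 2, 3], [2, 2, 2])   # predicting the mean → 0.0
```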
87
What is Precision in model evaluation and how would you calculate it?
“What percentage of everything we caught in the ocean were fish?” A metric for classification models that answers the question: out of all the labels marked as positive by the model, how many were correct? Precision = True Positives / (True Positives + False Positives)
88
What is Recall in model evaluation and how would you calculate it?
“What percentage of fish in the entire ocean did we catch?” A metric for classification models that answers the question: out of all the possible positive labels, how many did the model correctly identify? Recall = True Positives / (True Positives + False Negatives)
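Both formulas can be sketched from confusion-matrix counts (illustrative numbers, not from the source):

```python
# Precision: of everything we caught, how much was actually fish?
def precision(tp, fp):
    return tp / (tp + fp)

# Recall: of all the fish in the ocean, how many did we catch?
def recall(tp, fn):
    return tp / (tp + fn)

p = precision(tp=80, fp=20)   # 80 / 100 → 0.8
r = recall(tp=80, fn=120)     # 80 / 200 → 0.4
```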
89
What are the different goals of precision and recall in model evaluation?
Precision and Recall are often a tradeoff, and given your use case you may wish to optimise for one or the other. Consider the binary classification use case of Gmail separating mail into “Spam” and “Not Spam”. If the goal is to catch as many spam emails as possible, then it may want to prioritise Recall. In contrast, if the goal is to only catch the messages that are DEFINITELY spam, then it may want to prioritise Precision.
90
What does the Confidence Threshold determine on the Precision/Recall curve when evaluating a model?
The confidence threshold determines how a ML model counts the positive cases. A higher threshold increases the precision, but decreases recall. A lower threshold decreases the precision, but increases recall. You can manually adjust the threshold to observe its impact on precision and recall and find the best tradeoff point between the two to meet your business needs.
91
How do you increase precision via the Classification Threshold?
If you raise the Classification Threshold, then it increases precision. For example, in detecting Spam where Spam is your positive classifier, you would increase the classification threshold so that more emails get identified as “Not Spam”. Conversely, if you want to increase Recall, you would lower the Classification Threshold.
92
What is the F1 score within Model Evaluation?
A measure of the accuracy of classification models. The f1 score is the harmonic average of the precision and recall. An f1 score's best value is 1. The worst value is 0.
93
What is Accuracy within Model Evaluation and how would you calculate it?
A metric for classification models that measures the percentage of all predictions (positive & negative) that the model got correct. Accuracy = (TP + TN) / (TP + FP + TN + FN)
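Accuracy and the F1 score (the harmonic mean of precision and recall, from the earlier card) can be sketched together (illustrative counts):

```python
# Accuracy: fraction of all predictions the model got correct.
def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

# F1: harmonic mean of precision and recall.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

acc = accuracy(tp=80, fp=20, tn=880, fn=20)  # 960 / 1000 → 0.96
score = f1(0.8, 0.8)                         # → 0.8
```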
94
What does AUC-ROC stand for and what does it do?
Area Under Curve - Receiver Operating Characteristic Plots the True Positive Rate (Recall) against the False Positive Rate. AUC-ROC measures: How well does the model separate the two classes?
95
What does AUC-PR stand for and what does it do?
Area Under Curve - Precision-Recall Plots Precision against Recall. AUC-PR measure: How well does the model find the positives without false alarms?
96
Which stage of the Machine Learning lifecycle is Feature Importance used?
Explainability
97
When should you use AUC-PR vs AUC-ROC within evaluation of classification models?
Use AUC-ROC when your classes are balanced, i.e. you care equally about positive and negative classes. Use AUC-PR when you have a "needle in a haystack" problem (highly imbalanced data). For example, spam filtering may have a 1:1000 positive-to-negative ratio, making AUC-PR the preferred option.
98
What can the feature importance be used for when evaluating your trained model?
The feature importance values could be used to help you improve your model and have more confidence in its predictions. You might decide to remove the least important features next time you train a model or to combine two of the more significant features into a feature cross to see if this improves model performance.
99
Within BigQuery ML, what does the ML.EVALUATE function do?
The ML.EVALUATE function calculates evaluation metrics against your model type, given a dataset you pass to it.
100
What are Feature Attributions within Explainable AI?
Feature attributions are an explainability method that indicate how much each input feature contributed to your model’s predictions and to the model’s overall predictive power.
101
What is the difference between drift and skew within Machine Learning?
Drift is a function of time (the world changes), for example changing consumer behaviours. Skew is a function of data composition (the representation is distorted). For example, you train a self-driving car in California, but it fails in London because it needs to work on the other side of the road and with different road signs.
102
What are TFRecords?
TFRecords are primarily used to optimize the data input pipeline in machine learning models. They are designed to solve the "I/O bottleneck"—the problem where your powerful GPU or TPU sits idle, waiting for the hard drive to find and read thousands of individual files. You host these in GCS before loading them into a training pipeline. TFRecords are effectively "containers" that hold serialized data. While the outer shell is structured (following a Protocol Buffer format), the content inside can be raw binary blobs, such as JPEG or PNG data.
103
What are the 3 most popular Feature Attribution methods within Vertex AI and when should you use them?
1) Shapley (SHAP): Classification and regression on tabular data 2) Integrated Gradients (IG): Classification and regression on tabular data. Classification on image data 3) XRAI (eXplanation with Ranked Area Integrals): Classification on image data
104
What are the different ways you can measure data bias and fairness within Explainable AI?
1) Requesting feature attributions as part of your prediction request. 2) Using the What-If Tool
105
When calculating loss, when should you use Categorical Cross-Entropy vs Sparse Categorical Cross-Entropy?
Mathematically, they calculate the exact same loss value. The only difference is how you feed the ground truth labels into the function. Use Categorical Cross-Entropy if your labels are one-hot encoded vectors (e.g., [0, 1, 0]). Use Sparse Categorical Cross-Entropy if your labels are provided as simple integers (e.g., 1, 3, 14), which is generally the more memory-efficient option.
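A minimal pure-Python sketch (with illustrative probabilities) showing the two losses agree when the labels encode the same class:

```python
import math

def categorical_ce(one_hot, probs):
    # expects a one-hot label vector, e.g. [0, 1, 0]
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))

def sparse_categorical_ce(label, probs):
    # expects the label as a plain integer index, e.g. 1
    return -math.log(probs[label])

probs = [0.1, 0.7, 0.2]
print(categorical_ce([0, 1, 0], probs))  # ~0.3567
print(sparse_categorical_ce(1, probs))   # ~0.3567 (same value)
```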
106
What is feature-cross in feature engineering?
Creating a synthetic feature by combining two variables into a co-dependent feature. For example, hour of day and day of week might be two separate features that have a large impact on a demand forecast model. Therefore you may wish to feature cross them into monday-9am, thursday-5pm etc.
107
What operation does feature-crossing use?
Multiplication. E.g: [A X B]: a feature cross formed by multiplying the values of two features. [A x B x C x D x E]: a feature cross formed by multiplying the values of five features. [A x A]: a feature cross formed by squaring a single feature.
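The multiplication view can be sketched with one-hot vectors. A minimal illustration, with the 7×24 day/hour sizes chosen as an example:

```python
def one_hot(index, size):
    return [1 if i == index else 0 for i in range(size)]

def feature_cross(a, b):
    # every pairwise multiplication of the two one-hot vectors (an outer product)
    return [x * y for x in a for y in b]

day = one_hot(0, 7)    # e.g. Monday
hour = one_hot(9, 24)  # e.g. 9am
crossed = feature_cross(day, hour)
# exactly one of the 7 * 24 = 168 crossed features fires: "monday-9am"
print(len(crossed), sum(crossed))  # 168 1
```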
108
What is the difference between Spatial and Temporal Functions in Feature Engineering? What’s a good analogy?
Spatial Functions deal with space and geography, whereas temporal functions deal with time-series data. Think of it like a football game: Spatial analysis would look at a single moment in time. You would analyze the positions of all the players on the field to understand the team's formation, how far apart they are, and which players are near the ball. This is a snapshot in space. Temporal analysis would look at a single player over the entire game. You would analyze their movement, their speed over time, and the sequence of their actions (e.g., a pass followed by a run). This is a progression over time.
109
A hospital uses the machine learning technology of Google to help pre-diagnose cancer by feeding historical patient medical data to the model. The goal is to identify as many potential cases as possible. Which metric should the model focus on?
Recall
110
A farm uses the machine learning technology of Google to detect defective apples in their crop, like those with irregular sizes or scratches. The goal is to identify only the apples that are actually bad so that no good apples are wasted. Which metric should the model focus on?
Precision
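For reference, both metrics come straight from confusion-matrix counts (a minimal sketch; the counts are made up):

```python
def recall(tp, fn):
    # of all actual positives, how many did we catch? (the hospital example)
    return tp / (tp + fn)

def precision(tp, fp):
    # of everything we flagged positive, how much was right? (the apple example)
    return tp / (tp + fp)

print(recall(80, 20))     # 0.8
print(precision(90, 10))  # 0.9
```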
111
What are the two frameworks supported by Vertex AI Pipelines?
Kubeflow Pipelines (KFP) TensorFlow Extended (TFX)
112
What TensorFlow Extended libraries are there for supporting each stage of the ML pipeline?
Pre-Processing - Tensorflow Data Validation Feature Engineering - Tensorflow Transform Model Evaluation - Tensorflow Model Analysis Serving - Tensorflow Serving
113
What is the difference between the Primary Goals of Vertex AI Pipelines and Ray on Vertex AI?
Vertex AI Pipelines - ML Workflow Orchestration & Automation (e.g. MLOps). This is used for the entire ML Lifecycle steps (data prep, train, deploy). You can view it as the Assembly Line in a factory. Ray on Vertex AI - Distributed Python & ML Compute. This is used for computationally intensive tasks WITHIN the ML lifecycle defined by Vertex AI Pipelines. You can view it as a single powerful multi-worker station on the assembly line.
114
What are the different types of components within Vertex AI Pipelines?
Function Components - Simply write a python function and add the @component decorator Container-Based Components - Anything that can be packed into a Docker container can be orchestrated.
115
What are parameters used for in Vertex AI Pipelines?
Passing data between components
116
What are artifacts used for in Vertex AI Pipelines?
To pass larger datasets between components, such as training data, that cannot be handled by parameters alone.
117
What are conditions used for in Vertex AI Pipelines?
To set rules where a component only runs if certain conditions are met. For example, only deploying the model if certain thresholds are met.
118
What decorators and compile step are required in order to create a Vertex AI Pipeline using code?
You create your function-based and container-based components with the @component decorator. You string together your components in the order in which you want them using the @pipeline decorator. You then use the compiler (kfp.compiler.Compiler) to take your pipeline function and compile it into a pipeline specification as a JSON file. This JSON file can be used to execute the pipeline.
119
What is cross-validation?
Cross-validation ensures that every single data point gets a chance to be in the "test set" at least once. In simple terms, instead of training your model once on one set of data and testing it once on another, cross-validation repeats the training and testing process multiple times on different subsets of your data to ensure your results aren't just a fluke. It operates across full training runs, not within each epoch.
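A minimal sketch of generating k-fold splits by hand (in practice a library helper such as sklearn's KFold does this; the sketch assumes n_samples divides evenly by k):

```python
def k_fold_splits(n_samples, k):
    # yields (train, test) index lists so that every sample
    # lands in exactly one test fold across the k runs
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for fold in range(k):
        test = indices[fold * fold_size:(fold + 1) * fold_size]
        train = [i for i in indices if i not in test]
        yield train, test

for train, test in k_fold_splits(10, 5):
    print(len(train), test)  # 8 samples to train on, 2 held out each run
```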
120
From a folder structure perspective, what do you need to do differently when running a training pipeline against a Custom Container instead of a Pre-Built container?
For Pre-Built Containers, you use a setup.py file to specify all your libraries and dependencies before submitting a training pipeline, using Google’s pre-built containers for PyTorch, Tensorflow, Scikit and XGBoost in Artifact Registry. For Custom Containers, you use a Dockerfile to specify your dependencies. This Dockerfile is pushed to Artifact Registry before you reference this container within your training pipeline.
121
In a neural network, what are the parameters that are learned by the model during training?
Weights and Biases. When we say we have a 10 billion parameter model, we are literally counting the number of weights and biases.
122
What are the hyper-parameters in a Neural Network that a human can decide before training?
1) Layers and neurons 2) Activation functions 3) Learning rate 4) Epochs
123
What is the difference between Vertex AI Hyperparameter Tuning Job vs Vertex AI Vizier?
Vertex AI Hyperparameter Tuning Job is a wrapper around your training code. You give it a Docker container (your model code) and say, "Maximize accuracy by changing the learning rate between 0.01 and 0.1." It spins up the infrastructure, runs the trials, and shuts them down. Vertex AI Vizier is a standalone "optimization engine" (API). It tells you what parameters to try next, but you are responsible for actually running the trial and reporting the result back. This can also therefore be used for non-ML use cases.
124
What are the 3 concepts in training Neural Networks that are utilised for amending the weights and biases?
1) Backpropagation - Modify the weights and biases if the difference is significant. 2) Cost or Loss Functions - Measure the distance between the predicted and actual value. 3) Gradient Descent - Decide how to tune the weights, and when to stop, once the data point reaches the base of the curve.
125
What is the difference between a Convolutional Neural Network and a Recurrent Neural Network?
CNNs process Space, making them good for Image Classification. RNNs process Time, constantly relying on their memory of what just happened to understand the present. This makes them good for NLP, where large sequences of text are provided as input to the model. They are also powerful in Time-Series Forecasting and Speech Recognition.
126
Within Neural Networks, what is a signal of overfitting and how would you go about remedying overfitting?
Over epochs, the evaluation loss should ideally decrease as the training loss decreases. However, if the evaluation loss starts to increase while the training loss continues to decrease, it's a sign of overfitting. Regularisation (through L1 and L2) can be used to help reduce the model’s complexity, making it better at generalising.
127
What is Transfer Learning?
One or more layers from a previously trained model are lifted into a new model that will be used as a starting point for training a new model. For example, knowledge gained while learning to recognise cars could apply when trying to recognise trucks.
128
What are the 2 benefits of Transfer Learning?
1) You can use an available pretrained model, which can be used as a starting point for training your own model. 2) Transfer learning can enable you to develop models even for problems where you may not have very much data.
129
What are some ways to combat model underfitting?
1) Increase model complexity 2) Increase the number of features by performing feature engineering. 3) Remove noise from the data 4) Increase the number of epochs or increase the duration of training to get better results
130
What are some ways to combat model overfitting?
Regularization technique Dropout: Probabilistically remove inputs during training Noise: Add statistical noise to inputs during training Early stopping: Monitor model performance on a validation set and stop training when performance degrades. Data augmentation. Cross‐validation.
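Of these, dropout is easy to sketch in pure Python. A minimal "inverted dropout" variant, with a hypothetical seed parameter for reproducibility:

```python
import random

def dropout(inputs, rate, seed=None):
    # probabilistically zero inputs during training; survivors are scaled
    # by 1/keep so the expected activation magnitude is unchanged
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in inputs]

print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, seed=42))
```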
131
In the context of the bias-variance trade-off, what does an underfit model have?
High Bias and Low Variance.
132
In the context of the bias-variance trade-off, what does an overfit model have?
Low bias and high variance
133
What is regularization in model training?
Regularisation is a part of the loss function intended to keep model weights close to zero, ensuring no single weight overpowers the rest in any layer of the neural network. This enables models to generalise better and helps reduce overfitting.
134
What’s the difference between L1 regularisation and L2 regularisation?
L1 Regularisation tends to produce sparse weights, meaning many of the weights become exactly zero. L2 Regularisation prefers to keep all weights small but not exactly zero, maintaining density of weights and leading to a more balanced, but more complex, solution.
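The two penalties are simple to write down. A minimal sketch, where lam is the hypothetical regularisation strength added to the loss:

```python
def l1_penalty(weights, lam):
    # sum of absolute values -> gradient pushes weights to exactly zero (sparse)
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    # sum of squares -> keeps weights small, but rarely exactly zero (dense)
    return lam * sum(w * w for w in weights)

weights = [0.5, -2.0, 0.0, 1.0]
print(l1_penalty(weights, 0.1))  # ~0.35  (0.1 * 3.5)
print(l2_penalty(weights, 0.1))  # ~0.525 (0.1 * 5.25)
```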
135
Simply put, what does Gradient measure?
A gradient simply measures the change in all weights with regard to the change in error
136
What is Batch Normalisation and what is it used for?
Batch Normalisation is used in neural networks in order to force the input of every layer to have a mean of 0 and a variance of approx. 1. By normalizing the activations at every step, the weights don't need to be extremely small or large to handle the data, keeping gradients stable.
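The core idea in a minimal sketch (omitting the learnable scale and shift parameters that the full technique also includes):

```python
def batch_norm(batch, eps=1e-5):
    # normalise a batch of activations to mean 0 and variance ~1
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]

out = batch_norm([10.0, 20.0, 30.0, 40.0])
print(sum(out) / len(out))  # ~0.0 (the batch is now mean-centred)
```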
137
What is a computational graph in machine learning?
It is a Blueprint of Operations. At its core, a computational graph is a way to represent a series of mathematical operations. Think of it as a flowchart for your machine learning model to follow during training.
138
What are the 2 primary characteristics of a computational graph?
Nodes: These represent the operations themselves, like addition, multiplication, or more complex functions. Edges: These represent the flow of data (tensors) between the operations.
139
What are the 2 approaches to Distributed Training architectures?
Data Parallelism - the dataset is divided into smaller chunks, and each worker node processes a different subset of the data. Model Parallelism - Model parallelism is employed when a model is too large to fit into the memory of a single worker node. In this approach, the model itself is partitioned at particular layers (say 1 GPU handles Layer 1-5, the next GPU handles Layer 6-10 etc.), and different parts of the model are placed on different workers. All workers process the same batch of data.
140
In Distributed Training architectures, what is the Tensorflow command for Model Parallelism?
tf.distribute.Strategy
141
In Distributed Training architectures, what are the steps undertaken when utilising Data Parallelism?
1) Replicate the Model: An identical copy of the model is loaded onto each worker node. 2) Split the Data: The training dataset is partitioned, and each worker receives a unique portion. 3) Parallel Processing: Each worker independently computes the forward and backward passes on its data subset to calculate the gradients. 4) Gradient Synchronisation: The gradients from all workers are aggregated to update the model's parameters.
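Step 4 (gradient synchronisation) in synchronous data parallelism reduces, conceptually, to averaging. A minimal sketch with made-up gradients:

```python
def average_gradients(worker_grads):
    # aggregate the gradients computed by each worker so all model
    # replicas apply the same parameter update (an all-reduce, conceptually)
    n_workers = len(worker_grads)
    n_params = len(worker_grads[0])
    return [sum(g[i] for g in worker_grads) / n_workers for i in range(n_params)]

# two workers, each reporting gradients for the same two parameters
print(average_gradients([[0.2, -0.4], [0.4, -0.2]]))  # ~[0.3, -0.3]
```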
142
In Distributed Training architectures, what are the 2 approaches to Data Parallelism? What is the primary benefit and what is the primary drawback of each?
Synchronous Training: All workers must finish processing their data batch and report their gradients before the model parameters are updated. This ensures consistency but can lead to bottlenecks if some workers are slower than others. Asynchronous Training: Each worker updates the model parameters independently without waiting for the others. This can lead to faster training times but may result in less stable convergence as some workers might be using stale model parameters.
143
What are the 3 principles of Machine Learning in a Hybrid Environment?
Composability, Portability and Scalability
144
What is Tensorflow Lite?
TensorFlow Lite is a specialised version of Google's open-source machine learning framework designed to run machine learning models on mobile and embedded devices.
145
What is Quantisation in Machine Learning?
In the realm of machine learning, quantisation is a technique used to reduce the computational and memory costs of running models. It achieves this by converting the numerical precision of a model's parameters (weights) and activations from high-precision floating-point numbers to lower-precision data types, such as 8-bit integers. This process makes models smaller, faster, and more energy-efficient, with a minimal impact on accuracy.
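A minimal sketch of symmetric 8-bit quantisation with one global scale factor (real schemes are usually per-channel and more careful):

```python
def quantise(values, num_bits=8):
    # map floats onto signed integers; keep one float 'scale' for dequantising
    qmax = 2 ** (num_bits - 1) - 1           # 127 for int8
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale

def dequantise(quantised, scale):
    return [q * scale for q in quantised]

weights = [0.5, -1.0, 0.25]
q, scale = quantise(weights)
print(q)                     # [64, -127, 32]
print(dequantise(q, scale))  # close to the originals, small rounding error
```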
146
What are the 3 forms of prompt design for guiding the output of a model?
Zero-Shot prompting - Providing a single command to the LLM without any examples One-Shot Prompting - Providing a single example of the task to the LLM. Few-Shot Prompting - Providing a few examples of the task to the LLM.
147
What is the Prompt Gallery in Vertex AI?
A curated collection of sample prompts that show how generative AI models can work for a variety of use cases.
148
What is temperature within generative models?
The degree of randomness in token selection. A temperature of 0 is deterministic, always selecting the highest-probability token, whereas a temperature of 1 introduces more creativity…but also holds greater risk of unexpectedness and hallucination.
149
What is Top-K within generative models?
A top-k of 1 means the selected token is the most probable among all tokens in the model’s vocabulary (also called greedy decoding), while a top-k of 40 means that the next token is selected from among the 40 most probable tokens (using temperature).
150
What is Top-P within generative models?
Tokens are selected from a set of tokens with the sum of the probabilities not exceeding P. For example, if tokens A, B, and C have a probability of .3, .2, and .1 and the top-p value is .5, then the model will select either A or B as the next token (using temperature) as their probability totals up to .5. This prevents the model from returning a response with extremely low probability, even when you want the range of responses to be high through Top-K.
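The filtering step can be sketched in a few lines (reusing the card's A/B/C probabilities):

```python
def top_p_tokens(token_probs, p):
    # keep the most probable tokens until their cumulative probability
    # reaches p; sampling (with temperature) then happens within this set
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        cumulative += prob
        if cumulative >= p:
            break
    return kept

print(top_p_tokens({"A": 0.3, "B": 0.2, "C": 0.1}, p=0.5))  # ['A', 'B']
```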
151
What are the 4 umbrella methodologies for improving a Generative AI model’s performance?
1) Prompt Design 2) Fine Tuning 3) Reinforcement Learning 4) Distilling
152
What are the benefits of using prompt design to improve Generative AI performance as opposed to more technical methods like fine-tuning, reinforcement or distillation?
Inexpensive vs more training runs Allows fast experimentation and customisation Doesn’t require ML background or complex technical skills
153
What is the difference between Fine-Tuning and Parameter-Efficient Tuning?
Fine-Tuning updates all the parameters of a pre-trained model for a new task, which is computationally expensive but can achieve maximum performance. In contrast, Parameter-Efficient Tuning (PEFT) freezes most of the model's original weights and only trains a very small fraction of new or existing parameters. This makes PEFT vastly more efficient in terms of computational cost and storage, while still delivering strong, competitive results.
154
What are the benefits of Parameter-Efficient Fine-Tuning?
1) Aims to reduce the challenges of fine-tuning 2) Only trains a subset of parameters of a much larger foundational model, making it less computationally expensive 3) Can require smaller datasets, making it more accessible
155
What are the supervised and unsupervised versions of Parameter-Efficient Fine-Tuning?
Adapter-Tuning - involves inserting small, fully-connected neural network layers, called adapter modules, between the existing layers of a pre-trained model. Only the parameters of these new adapter layers are trained, while the original model remains unchanged. Reinforcement - Unsupervised reinforcement learning with human feedback.
156
What is the downside of Adapter-Tuning?
It can introduce latency as you are adding more layers to the neural network.
157
What is distillation in model training?
Transferring knowledge from a larger model to a smaller model to optimise performance, latency and cost.
158
What is the difference between Transfer Learning and Distillation?
Transfer Learning is about Adaptation. You want to take a smart model and teach it a new task. Distillation is about Compression. You want to make a heavy model smaller and faster while keeping the same task.
159
What are the three types of models you can interact with in Model Garden?
Pre-Trained Models - From Google, third party and open source Task Specific Models - Like Entity Extraction, Sentiment Analysis etc. Fine-Tunable Models - Mostly open source
160
What is Vertex AI Studio?
A tool that lets you quickly test and customise generative AI models so you can leverage their capabilities in your applications.
161
What type of data does Vertex AI Metadata track?
Metadata produced by your machine learning (ML) systems, such as parameters, artifacts (like datasets and models), metrics, and the lineage of components, so that you can track and analyse it.
162
What is Model Registry?
A centralised tracking system that stores lineage, versioning and related metadata for published machine learning models.
163
What are the 2 types of serving on Vertex Endpoints?
Online Predictions - Synchronous requests. When a request is sent, the service processes it immediately and returns the prediction in the same response. Batch Predictions - Asynchronous requests. You submit a "job" with a large dataset (from Cloud Storage or BigQuery). Vertex AI processes the data in bulk, and the results are written to a specified output location (e.g., Cloud Storage or BigQuery).
164
When should you use Batch vs Online predictions for Vertex Endpoints?
Online Predictions - Interactive applications, Mobile app backend, or when predictions are generated in a one-by-one workflow. Batch Predictions - Offline analysis, scoring an entire dataset at a point in time and / or scheduled workloads
165
What are Vertex AI Private Endpoints?
Vertex AI Private Endpoints provide a way to access Vertex AI online prediction services using private IP addresses within your Virtual Private Cloud (VPC) network. Instead of sending prediction requests over the public internet, traffic remains within the Google Cloud network, enhancing security and potentially reducing latency. Vertex AI Private Endpoints can be good to use when your backend client that is calling your endpoint is hosted in GCP.
166
How do we ensure that a Model state is not lost by interruptions / failures during a training job?
Applying Training Checkpoints
167
What 2 ways does Vertex AI provide for you to monitor your ML models?
Skew detection: This approach looks for the degree of distortion between your model training and production data Drift detection: In this type of monitoring, you're looking for drift in your production data. Drift occurs when the statistical properties of the inputs and the target, which the model is trying to predict, change over time in unforeseen ways. This causes problems because the predictions could become less accurate as time passes.
168
In a perfect scenario, would you prefer to monitor for skew detection or drift detection and why?
As much as possible, use skew detection, because knowing that your production data has deviated from your training data is a strong indicator that your model isn't performing as expected in production. If you don't have access to the training data, turn on drift detection so that you'll know when the inputs change over time. For drift detection, enable the features you want to monitor and the corresponding thresholds to trigger an alert.
169
How do you enable skew detection on your Vertex AI Monitoring service?
For skew detection, set up the model monitoring job by providing a pointer to the training data that you used to train your model.
170
When it comes to adaptive models, what changing variables are you attempting to mitigate risk against?
1) An upstream model changing 2) A data source maintained by another team changing 3) Data drift - the relationship between features and labels changing
171
What are the different types of data drift?
Changes in label distribution - E.g. A model that predicts how long humans live for has a label that has inherently increased over time. Changes in feature distribution - E.g. a model that predicts population movement patterns using postal code as a feature. Postal codes aren’t fixed and can therefore drift.
172
When predictions are made by a machine learning model, what is meant by extrapolation?
Extrapolation is the process of making predictions on new data that falls outside the range of the training data. The model has to make assumptions about how the patterns it learned from the training data will continue in uncharted territory.
173
When predictions are made by a machine learning model, what is meant by interpolation?
Interpolation is the process of making predictions on new, unseen data that falls WITHIN the boundaries of the training data. For example, if you have a model trained to predict house prices based on square footage, and the training data includes houses between 1,000 and 3,000 square feet, predicting the price of a 2,200 square foot house would be interpolation.
174
What is the difference between Data Drift and Model / Concept Drift?
Data Drift (the features change) - The inputs into the model change. For example, an IoT device changes reporting from degrees Fahrenheit to degrees Celsius. Model / Concept Drift (the relationship between features and labels changes) - Occurs when the relationship between the input data and the output (the target variable) changes. The features themselves might not have changed, but what they mean in relation to the prediction has. For example, say you build a model to classify positive and negative sentiment of a Reddit feed around certain topics. Over time, people's sentiments about these topics change. Or in email spam, malicious actors will adapt their language to bypass spam filters, causing concept drift for your spam detection model.
175
What is Training-Serving Skew?
Any scenario in which the training data is generated differently from how the data is generated / collected in production.
176
What is Ablation Analysis?
Ablation analysis is the process of systematically removing parts of a machine learning model or algorithm to understand the contribution of each component.
177
How do we protect from changing distributions / data drift in Machine Learning?
Monitoring - Look at the descriptive summaries of your inputs and compare them to what the model has seen. For example, if the mean or the variance has changed substantially, then you can analyse this new segment of the input space, to see if the relationships learned still hold. Monitor Residuals - Residuals are the difference between the predictions and the labels. If errors are increasing, or have moved to a different area of the curve, this could be evidence of a change in relationship between features & labels. Custom Loss Function - To emphasise data recency. Regularly Retrain Models - Applying a Dynamic Training principle to retrain at intervals or when data drift is detected by thresholds breaking.
178
What is Data Leakage?
Where data from outside the training dataset is improperly used to create the model. The model essentially learns from information it wouldn't have access to in a real-world scenario, such as averages from its testing dataset or variables that are derived from the label.
179
What are the two types of Data Leakage?
Target Leakage - When training data includes features that are "contaminated" with information about the target variable, but this information won't be available when you actually need to make a prediction. E.g. "Weekly Wages" being a feature of an "Annual Wages" label. Train-Test Contamination - This occurs when you don't properly separate your training and testing datasets before pre-processing / feature engineering. E.g. calculating averages used to impute missing values across both the training and test sets.
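A minimal sketch of avoiding train-test contamination when imputing missing values (None marks a missing entry; the function name is illustrative):

```python
def impute_with_train_mean(train, test):
    # the imputation statistic is computed on the TRAINING split only,
    # then re-used unchanged on the test split -- never on the combined data
    observed = [v for v in train if v is not None]
    mean = sum(observed) / len(observed)
    fill = lambda split: [mean if v is None else v for v in split]
    return fill(train), fill(test)

train, test = impute_with_train_mean([1.0, None, 3.0], [None, 5.0])
print(train)  # [1.0, 2.0, 3.0]
print(test)   # [2.0, 5.0]  (filled with the *training* mean)
```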
180
What is VM Cohosting?
By default, a Vertex AI model is deployed to its own virtual machine (VM) instance. Cohosting enables models to share resources so that CPU, GPU and memory are fully utilised across traffic.
181
What are the benefits of VM Cohosting?
Resource sharing across multiple deployments. Cost-effective model serving. Improved utilization of memory and computational resources.
182
What is the shift in the actual relationship between the model inputs and the output called?
Concept / Model Drift
183
Where does the name Apache Beam come from?
The name Beam comes from a combination of the words “Batch” and “Stream”.
184
What is one key advantage of preprocessing your ML features using Apache Beam?
The same code you use to preprocess features in training and evaluation can also be used in serving.
185
What are the different TPU configurations?
A single TPU device A TPU Pod (a group of TPU devices connected by high‐speed interconnects) A TPU slice (a subdivision of a TPU Pod) A TPU VM
186
What is a tensor?
A tensor is a multi-dimensional array or an N-dimensional list of numbers.
187
What are the different types of tensor?
Scalar (a 0-dimensional array): A single value, like the number 7, has a shape of (), making it a 0-dimensional tensor. Vector (a 1D array): A list of numbers, such as [1, 2, 3], has one dimension and a length (e.g., 3), making it a 1-dimensional tensor. Matrix (a 2D array): A grid of numbers, like a spreadsheet with rows and columns, has two dimensions (rows and columns). For example, a 2x3 matrix is a 2-dimensional tensor with a shape of (2, 3). Higher-Dimensional Tensors: Any array with three or more dimensions is a tensor. For example, a 3D tensor could represent a grayscale video (with dimensions for frames, height, and width), while a 4D tensor could represent a batch of color images.
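These ranks can be checked with a small helper over nested Python lists (a sketch assuming regular, non-ragged nesting):

```python
def shape(tensor):
    # derive the shape of a regularly nested list, stopping at non-lists
    if not isinstance(tensor, list):
        return ()
    return (len(tensor),) + shape(tensor[0])

print(shape(7))                       # ()     -> scalar, 0-D
print(shape([1, 2, 3]))               # (3,)   -> vector, 1-D
print(shape([[1, 2, 3], [4, 5, 6]]))  # (2, 3) -> matrix, 2-D
```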
188
What is tf.Transform a hybrid of?
1) Apache Beam / Dataflow 2) Tensorflow
189
Which Tensorflow component identifies anomalies between training and serving data and can automatically create a schema by examining the data?
Data Validation
190
What is Tensorflow Extended - Data Validation used for?
Can be used for generating schemas and statistics about the distribution of every feature in the dataset. This can be used to ensure that the data used for training a model is consistent with the data the model will see in production.
191
What is the usual workflow when working with Tensorflow Extended - Data Validation?
1) StatisticsGen - Generate statistics for the data 2) SchemaGen - Use those statistics to generate a schema for each feature 3) Visualise the schema and statistics and manually inspect them 4) Update the schema if needed
192
What is the difference between Pairwise and Pointwise evaluation within Generative AI evaluation?
The difference lies in their scope. In pointwise evaluation, the evaluator looks at a single prompt and a single response, and assigns a score based on a specific rubric or scale (e.g., a 1–5 Likert scale, Pass/Fail, or binary Correct/Incorrect). In pairwise evaluation, the evaluator is presented with one prompt and two different responses and picks the better one (like the LMSYS Chatbot Arena).
193
What are Rubrics within GenAI Evaluation?
Rubrics are a set of instructions and criteria given to the evaluation (whether a human or an LLM-as-a-Judge) to define exactly how to score a response. Without a rubric, evaluation is subjective "vibes-based" checking. With a rubric, evaluation becomes a measurable, reproducible metric.
194
What are metrics within GenAI Evaluation?
A score that measures the model output against the rating rubrics.
195
What do BLEU, ROUGE and METEOR focus on within Generative AI evaluation?
ROUGE - Focuses on recall BLEU - Focuses on precision METEOR - Focuses on precision and recall
196
Within GenAI Evaluation, what does perplexity measure?
Quantifies how well the language model predicts the next word in a sequence.
197
What is the difference between Data Engineering and Feature Engineering?
Preprocessing the data for ML involves both data engineering and feature engineering. Data engineering is the process of converting raw data into prepared data. Feature engineering then tunes the prepared data to create the features that are expected by the ML model.
198
What is PCA in feature engineering?
Principal Component Analysis. It is a statistical technique used in feature engineering for dimensionality reduction. Essentially, it simplifies complex data by reducing the number of variables (features) while retaining as much of the original information (variance) as possible.
199
What is MinMax Scaling also known as?
MinMax Scaling (also commonly referred to as Normalization) is a feature scaling technique that shifts and rescales the values of a numeric feature so they end up ranging between two fixed numbers, typically 0 and 1.
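A minimal sketch of the formula (in practice sklearn's MinMaxScaler is the usual tool):

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    # shift and rescale so the feature spans [new_min, new_max]
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

print(min_max_scale([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
```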
200
If I am using TensorFlow Transform, do I need to deploy pre-processing in front of my model endpoint at serving time?
No, you generally do not need to deploy a separate pre-processing service in front of your model endpoint. One of the primary value propositions of TensorFlow Transform (TFT) is that it allows you to "bake" your pre-processing logic directly into the model graph that you export for serving. This ensures that the exact same transformations used during training are applied during serving, eliminating training-serving skew.