What is the difference between Supervised and Unsupervised Machine Learning?
Supervised Learning - Utilises labeled input and output data.
Unsupervised Learning - Discovers hidden patterns in data without any human-provided labels.
What are some of the benefits and drawbacks of Supervised and Unsupervised machine learning when compared to one another?
Supervised Learning:
Tends to produce more accurate models than unsupervised learning.
Requires historical labeled data, or humans to manually label the data.
Unsupervised Learning:
Does not predict a label; it simply groups similar data points together (e.g. clustering).
Within supervised machine learning models, what are features vs labels?
Supervised machine learning models “learn” the association between known features and unknown labels.
Each column of data that will help us determine the outcome (win or loss for a tournament game) is called a feature.
The column of data that you are trying to predict is called the label. Machine learning models “learn” the association between the features and the label so they can predict the label for new data.
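As a minimal sketch (the column names are hypothetical, not from any real dataset), splitting a historic game record into features and a label might look like:

```python
# Hypothetical historic game record: every column except the outcome
# is a feature; the outcome ("win") is the label we want to predict.
game = {"home_ranking": 3, "away_ranking": 11, "home_rest_days": 4, "win": 1}

LABEL = "win"
features = {k: v for k, v in game.items() if k != LABEL}
label = game[LABEL]

print(features)  # the known inputs
print(label)     # the value the model learns to predict
```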
If we were training a classification supervised machine learning model (e.g. a logistic regression) on historic team sports results to predict the outcome of future games, why should we NOT use the points scored (win_pts or lose_pts) as a feature in our training dataset, even though we have the data available?
These features are only available at the END of a game, whereas predictions for future games are made before the game begins. Training on information that will not be available at prediction time is called data leakage.
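A sketch of guarding against this kind of leakage when selecting training features (column names are hypothetical):

```python
# Hypothetical column list: win_pts and lose_pts only exist AFTER a
# game ends, so training on them would leak the outcome into the features.
columns = ["home_ranking", "away_ranking", "venue", "win_pts", "lose_pts", "result"]

POST_GAME = {"win_pts", "lose_pts"}  # leaky: unavailable at prediction time
LABEL = "result"

train_features = [c for c in columns if c != LABEL and c not in POST_GAME]
print(train_features)  # only columns known before kick-off remain
```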
Define Responsible AI
The development and use of AI in a way that prioritises ethical considerations, fairness, accountability, safety and transparency.
What is a TPU?
A Tensor Processing Unit (TPU) is Google’s custom-developed application-specific integrated circuit (ASIC), designed to accelerate AI workloads (such as training and inference) at scale.
What are the 4 storage classes in GCS?
1) Standard Storage - Hot data, accessed frequently or in real time.
2) Nearline Storage - Data accessed less than once per month.
3) Coldline Storage - Data accessed less than once every 90 days.
4) Archive Storage - Data accessed less than once a year.
On Cloud Storage, which data storage class is best for storing data that needs to be accessed less than once a year?
Archive Storage
What are the 4 products that should be considered in the Data Ingestion & Process phase of the Data-to-AI Workflow?
Pub/Sub
Dataflow
Dataproc
Cloud Data Fusion
What are the 6 products that should be considered in the Data Storage phase of the Data-to-AI Workflow?
Cloud Storage
BigQuery
Cloud SQL
Cloud Spanner
Cloud Bigtable
Cloud Firestore
What are the 2 products that should be considered in the Data Analytics phase of the Data-to-AI Workflow?
BigQuery - Fully Managed Data Warehouse solution
Looker - BI layer for visualising and governing data across your organisation
What makes a Machine Learning model a Deep Learning model?
Deep Learning is a subset of machine learning that adds many layers of processing (hidden layers) between the input data and the output, allowing the model to learn increasingly abstract representations of the data. This usually takes the form of a neural-network architecture.
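As an illustrative sketch (weights chosen arbitrarily), the layers in between are just repeated weighted sums passed through a non-linearity:

```python
def relu(x):
    # Simple non-linearity: negative values become zero.
    return max(0.0, x)

def layer(inputs, weights, biases):
    # One dense layer: weighted sum of inputs plus bias, then ReLU.
    return [relu(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [1.0, 2.0]                                        # input data
h = layer(x, [[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1])   # hidden layer
y = layer(h, [[1.0, 1.0]], [0.0])                     # output layer
print(y)
```

Stacking more calls to `layer` is what makes the network "deep".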
You want to use machine learning to discover the underlying pattern and group a collection of unlabeled photos into different sets. Which should you use?
Unsupervised Learning - Cluster Analysis
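A toy sketch of cluster analysis (minimal 1-D k-means on made-up numbers standing in for photo features; real photo clustering would use high-dimensional embeddings):

```python
def kmeans_1d(points, centroids, iters=10):
    # Group unlabeled points around the nearest centroid, then move
    # each centroid to the mean of its group; repeat until stable.
    for _ in range(iters):
        groups = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(p - c))
            groups[nearest].append(p)
        centroids = [sum(g) / len(g) if g else c
                     for c, g in groups.items()]
    return sorted(centroids)

photo_embeddings = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]  # stand-in for image features
print(kmeans_1d(photo_embeddings, [0.0, 5.0]))      # two discovered group centres
```

No labels are supplied; the two groups emerge from the data itself.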
Which SQL command would you use to create an ML model in BigQuery ML?
CREATE OR REPLACE MODEL
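A hedged sketch of the full statement (the dataset, model, and column names here are hypothetical):

```sql
CREATE OR REPLACE MODEL `my_dataset.game_outcome_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['result']) AS
SELECT
  home_ranking,
  away_ranking,
  result
FROM
  `my_dataset.historic_games`;
```

The `AS SELECT` clause supplies the training data, and `input_label_cols` names the label column.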
Describe the Machine Learning / MLOPs workflow as though you were comparing it to running a restaurant.
1) Data Preparation - Prepare your ingredients
a) Data Import - Batch vs Streaming. Structured vs Unstructured.
b) Feature Engineering - Chopping the onions, peeling the carrots etc. before you start cooking.
2) Model Development - Experiment with different recipes. Loop: Train the Model –> Evaluate the model
3) Model Serving - Finalise and iterate on the menu to meet customers’ changing needs
a) Deploy the Model
b) Monitor the Model
What are the 3 stages of maturity in MLOps?
MLOps Level 0: Manual Process
At this initial level, the workflow for building and deploying models is entirely manual, script-driven, and interactive. It is characterized by a disconnect between data scientists and operations, infrequent release iterations, and a lack of active performance monitoring or CI/CD practices.
MLOps Level 1: ML Pipeline Automation
This level focuses on performing continuous training (CT) of the model by automating the machine learning pipeline itself. It enables rapid experimentation and the continuous delivery of fresh prediction models trained on new live data, often employing triggers like data validation or drift detection.
MLOps Level 2: CI/CD Pipeline Automation
The most mature level introduces a robust CI/CD system to automatically test and deploy new implementations of the ML pipelines themselves, not just the models. This allows organizations to reliably update pipeline architecture and code in production, enabling them to cope quickly with changing data and business environments.
What are the 4 types of Machine Learning options for model development and usage via Google Cloud?
1) Pre-Trained Models
2) BigQuery ML
3) AutoML
4) Custom Training
What are the four phases of the AutoML Pipeline?
Phase 1 - Auto data pre-processing using TensorFlow Transform
Phase 2 - Architecture Search, Selection & Tuning
Phase 3 - Cross Validation & Bagging Ensemble
Phase 4 - Deploy & Predict
What 2 critical technologies support auto search and architecture selection for AutoML?
Neural Architecture Search - Searches for the best model architectures and tunes their parameters automatically.
Transfer Learning - AutoML has already trained many different models with large amounts of data. These trained models can be used as foundational models to reach higher accuracy with much less data and computational training time. This allows you to train models with smaller datasets by leveraging inherent knowledge within models that were trained on larger datasets.
What is the purpose of Phase 3: Bagging Ensemble within AutoML?
AutoML does not rely on one single model, but on the top N models selected during Phase 2. N depends on the training budget, but is typically around ten.
The ensembling can be as simple as averaging the predictions of those top models; relying on multiple top models instead of one greatly improves the accuracy of prediction.
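A minimal sketch of that averaging idea (the "models" here are hypothetical stand-in functions, not real trained models):

```python
def ensemble_predict(models, x):
    # Average the predictions of every model in the ensemble.
    preds = [m(x) for m in models]
    return sum(preds) / len(preds)

# Three hypothetical top models, each with a slightly different estimate.
models = [lambda x: 0.70, lambda x: 0.80, lambda x: 0.75]
print(ensemble_predict(models, x=None))
```

Averaging smooths out the individual models' errors, which is why the ensemble tends to beat any single member.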
When would you choose to use Colab Enterprise over Vertex Workbench?
1) When you want to avoid managing compute.
2) When your logic can be housed within a single notebook.
3) Collaboration - When you don’t want to worry about utilising Git, as there are built-in version control and sharing capabilities.
When would you choose to use Vertex Workbench over Colab Enterprise?
1) When you’re migrating an existing Jupyter notebook from your local environment to the cloud.
2) Vertex Workbench is better for complex projects that span over multiple files and directories.
3) When you need native support for GitHub.
What is the benefit of using Vertex AI Workbench over a Jupyter Notebook run locally?
1) Scalability and Performance - Releasing yourself from the resource constraints of your local machine
2) Collaboration & Reproducibility - Shared Environment
3) Scheduled Executions
4) Seamless Integration with GCP Services
What happens when you execute a cell on Colab Enterprise?
Colab Enterprise connects to a Python kernel running on a runtime, and the code in the cell is executed by that kernel.