[4] SageMaker Flashcards

Question 1

Q

How can models being built using SageMaker Notebooks be rapidly iterated?

Answer

A

Using SageMaker Local Mode to train models on the notebook instance itself, avoiding the overhead of provisioning infrastructure and moving data
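
A minimal sketch of what Local Mode looks like with the SageMaker Python SDK, assuming SDK v2 and Docker on the notebook instance. The helper names, the image URI, the role ARN and the data directory are hypothetical placeholders; the relevant parts are the `"local"` instance type and the `file://` input.

```python
def local_mode_settings(train_dir):
    """Return Estimator and fit() settings that keep training on the notebook host."""
    estimator_kwargs = {
        "instance_count": 1,
        "instance_type": "local",  # "local_gpu" on a GPU-backed notebook
    }
    # file:// inputs are read from local disk, so nothing is copied to S3 first.
    fit_inputs = {"train": f"file://{train_dir}"}
    return estimator_kwargs, fit_inputs


def train_locally(image_uri, role_arn, train_dir):
    """Run the training container locally; needs the sagemaker SDK and Docker."""
    from sagemaker.estimator import Estimator  # third-party, imported lazily

    estimator_kwargs, fit_inputs = local_mode_settings(train_dir)
    estimator = Estimator(image_uri=image_uri, role=role_arn, **estimator_kwargs)
    estimator.fit(fit_inputs)
    return estimator
```

Switching `instance_type` back to a real instance type such as `ml.m5.xlarge`, and the input to an S3 URI, should turn the same code into a managed training job.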

Question 2

Q

What is Amazon Ground Truth?

Answer

A

A service (Amazon SageMaker Ground Truth) that uses human labellers (in-house staff, specialist vendors or Mechanical Turk) to label data, and trains a model on those labels

It uses this model to automatically label ‘easy’ cases, reducing labelling cost

Question 3

Q

What are .lst files?

Answer

A

Tab-separated files used to list data, such as images and their labels (one line per item: index, label, relative path)
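
A small sketch of writing and reading such a file, assuming the tab-separated three-column layout (index, label, relative path) used by the built-in image algorithms. The file names and class labels are made up for illustration.

```python
import csv
import io


def write_lst(rows, handle):
    """Write (index, label, relative_path) rows as tab-separated lines."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for index, label, path in rows:
        writer.writerow([index, label, path])


def read_lst(handle):
    """Parse tab-separated lines back into (index, label, relative_path) tuples."""
    reader = csv.reader(handle, delimiter="\t")
    return [(int(index), int(label), path) for index, label, path in reader]


# Hypothetical file names and class labels, for illustration only.
example_rows = [(0, 1, "cats/cat_001.jpg"), (1, 0, "dogs/dog_001.jpg")]
buffer = io.StringIO()
write_lst(example_rows, buffer)
lst_text = buffer.getvalue()
```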

Question 4

Q

Where can SageMaker algorithms be sourced?

Answer

A

They can be custom-built, bought from the AWS Marketplace, or built in (provided by SageMaker)

Question 5

Q

What are the main built-in SageMaker Algorithms?

Answer

A

BlazingText - Word2vec and text classification for NLP, sentiment analysis etc.

Image Classification Algorithm - general purpose CNN

K-Means - optimised for ‘web scale’

Latent Dirichlet Allocation (LDA) - perform text analysis and topic discovery

XGBoost - gradient boosted trees algorithm; used on tabular datasets

Question 6

Q

Where do the assets for custom SageMaker Algorithms exist?

Answer

A

The code is packaged as a container image hosted on ECR; the model artefacts etc. are stored on S3

Question 7

Q

Can you view the code for SageMaker Algorithms from the Marketplace?

Question 8

Q

What services does SageMaker support as data sources?

Answer

A

S3, EFS and FSx for Lustre

Question 9

Q

How are parts of the data in the dataset (i.e. train vs validation) managed?

Answer

A

With channels
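
As a rough illustration, each channel is one entry in the `InputDataConfig` list passed to the `CreateTrainingJob` API. The helper name, bucket and prefixes below are hypothetical; the dictionary keys follow the API's request shape.

```python
def s3_channel(name, s3_uri, content_type=None):
    """Build one InputDataConfig entry that points a named channel at an S3 prefix."""
    channel = {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }
    if content_type:
        channel["ContentType"] = content_type
    return channel


# Hypothetical bucket and prefixes: one channel per part of the dataset.
input_data_config = [
    s3_channel("train", "s3://example-bucket/train/", "text/csv"),
    s3_channel("validation", "s3://example-bucket/validation/", "text/csv"),
]
```

Inside the training container, each channel typically appears as its own directory under `/opt/ml/input/data/`, named after the channel.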

Question 10

Q

How should failed training jobs be debugged?

Answer

A

Look at the CloudWatch Logs
Use the DescribeTrainingJob API and check the FailureReason

However, don’t use the SageMaker Console
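
A sketch of the two checks combined, assuming a boto3 SageMaker client is passed in (for example `boto3.client("sagemaker")`). The function name and the shape of the returned summary are illustrative choices, not part of any AWS API.

```python
# Training job logs are written to this CloudWatch Logs group.
TRAINING_LOG_GROUP = "/aws/sagemaker/TrainingJobs"


def failure_summary(sagemaker_client, job_name):
    """Return the job status, its FailureReason (if any) and where its logs live."""
    response = sagemaker_client.describe_training_job(TrainingJobName=job_name)
    return {
        "status": response["TrainingJobStatus"],
        # FailureReason is only present when the job has failed.
        "reason": response.get("FailureReason"),
        "log_group": TRAINING_LOG_GROUP,
        # Log stream names for a job start with the job name.
        "log_stream_prefix": job_name,
    }
```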

Question 11

Q

What are the general types of hyper parameters?

Answer

A

Model hyper-parameters - how the model is structured e.g. filter size
Optimiser hyper-parameters - how the model is trained e.g. step size
Data hyper-parameters - modify the data itself e.g. data augmentation

Question 12

Q

What technique does SageMaker Automatic Model Tuning use?

Answer

A

Bayesian optimisation

Question 13

Q

What are the general steps for performing hyper parameter tuning?

Answer

A

(decide on the model to use)
Set the ranges of the hyper parameters e.g. max depth from 3 to 9
Choose the metric to maximise e.g. AUC
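
The last two steps can be sketched as the `HyperParameterTuningJobConfig` structure that the `CreateHyperParameterTuningJob` API expects. The helper name and the job limits are assumptions; `validation:auc` is the metric name the built-in XGBoost algorithm emits for AUC.

```python
def tuning_job_config(metric_name, integer_ranges, max_jobs=20, max_parallel=2):
    """Build a tuning config that maximises one metric over integer ranges."""
    return {
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {
            "Type": "Maximize",
            "MetricName": metric_name,
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel,
        },
        "ParameterRanges": {
            # The API expects range bounds as strings.
            "IntegerParameterRanges": [
                {"Name": name, "MinValue": str(low), "MaxValue": str(high)}
                for name, (low, high) in integer_ranges.items()
            ]
        },
    }


# Max depth from 3 to 9, maximising AUC, as in the card above.
config = tuning_job_config("validation:auc", {"max_depth": (3, 9)})
```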

Question 14

Q

Which SageMaker tool is used for hyper parameter optimisation?

Answer

A

SageMaker Automatic Model Tuning

Question 15

Q

What are the steps to hosting a model with SageMaker?

Answer

A

Create a model in SageMaker - specify the S3 and ECR paths
Create an endpoint configuration that references that model and sets the number of instances etc.
Create an HTTPS endpoint using that endpoint configuration
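
These three steps map roughly onto three calls on the boto3 SageMaker client. The sketch below assumes such a client is passed in; the function name, the naming scheme for the configuration and endpoint, and the default instance type are illustrative choices.

```python
def deploy_model(client, name, image_uri, model_data_url, role_arn,
                 instance_type="ml.m5.large", instance_count=1):
    """Create a model, an endpoint configuration and an endpoint, in that order."""
    # Step 1: the model ties the ECR image to the S3 model artefacts.
    client.create_model(
        ModelName=name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )
    # Step 2: the endpoint configuration picks the model and the instances.
    config_name = f"{name}-config"
    client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
        }],
    )
    # Step 3: the HTTPS endpoint is created from the configuration.
    endpoint_name = f"{name}-endpoint"
    client.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=config_name)
    return endpoint_name
```

Endpoint creation is asynchronous, so a real script would usually wait for the endpoint to reach `InService` before invoking it.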

Question 16

Q

What are some key considerations when managing SageMaker deployments?

Answer

A

Decouple the ETL and ML pipelines as the former is IO intensive while the latter needs GPU
Endpoints support auto-scaling and deployments across AZs (for HA)
A single endpoint can’t serve multiple models - use a Lambda function to perform an ensemble etc.

Question 17

Q

What are the key considerations when securing SageMaker Notebooks?

Answer

A

Restrict the sagemaker:CreatePresignedNotebookInstanceUrl IAM permission
Restrict root access to notebooks
Narrow the scope of instance profiles attached to notebook instances
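
One way to restrict the presigned URL permission is an IAM policy that denies it outside a trusted network. The sketch below builds such a policy document; the function name, the statement ID and the CIDR range are placeholders.

```python
import json


def presigned_url_policy(allowed_cidrs):
    """Deny presigned notebook URLs unless the request comes from an allowed CIDR."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyPresignedUrlOutsideTrustedNetwork",
            "Effect": "Deny",
            "Action": "sagemaker:CreatePresignedNotebookInstanceUrl",
            "Resource": "*",
            "Condition": {"NotIpAddress": {"aws:SourceIp": list(allowed_cidrs)}},
        }],
    }


# 203.0.113.0/24 is a documentation-only range, standing in for an office network.
policy_json = json.dumps(presigned_url_policy(["203.0.113.0/24"]), indent=2)
```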

Question 18

Q

Can you lock down access per SageMaker Notebook using IAM?

Answer

A

No

Question 19

Q

What are the key considerations when securing SageMaker models?

Answer

A

Models are hosted in a public VPC by default, but a private one can be configured
Data and models are stored in S3 - encrypt this and restrict access to a trusted VPC endpoint
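
Restricting the bucket to a trusted VPC endpoint is commonly done with a bucket policy along these lines. This is a sketch only: the function name, bucket name and endpoint ID are placeholders, and a blanket deny like this also blocks console access to the bucket.

```python
def vpce_only_bucket_policy(bucket, vpc_endpoint_id):
    """Deny all S3 access to the bucket unless it arrives via the given VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyAccessOutsideVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/*",
            ],
            "Condition": {"StringNotEquals": {"aws:SourceVpce": vpc_endpoint_id}},
        }],
    }


# Hypothetical bucket name and VPC endpoint ID.
bucket_policy = vpce_only_bucket_policy("example-model-bucket", "vpce-0123456789abcdef0")
```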

(19 cards)