Sagemaker Features Flashcards by Katie D

SageMaker Autopilot

Automates building, tuning, and deploying machine learning models from a tabular dataset (CSV or Parquet).
Automatically explores different solutions to find the best model.

SageMaker Ground Truth

A data labeling service that lets you use a human workforce: your own private annotators, Amazon Mechanical Turk, or third-party vendors.

SageMaker Data Wrangler

a visual data preparation and cleaning tool that allows data scientists and engineers to easily clean and prepare data for machine learning.

SageMaker Neo

allows you to optimize machine learning models for deployment on edge devices to run faster with no loss in accuracy.

SageMaker Automatic Model Tuning

automates the process of hyperparameter tuning based on the algorithm and hyperparameter ranges you specify.
This can result in saving a significant amount of time for data scientists and engineers.
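As a rough illustration of "the algorithm and hyperparameter ranges you specify", the sketch below builds the configuration dictionary that the boto3 `create_hyper_parameter_tuning_job` call accepts as `HyperParameterTuningJobConfig`. The helper name `build_tuning_config`, the hyperparameter names (`learning_rate`, `max_depth`), and the metric name are assumptions chosen for the example, not values from the deck.

```python
def build_tuning_config(metric_name, max_jobs=20, max_parallel=2):
    """Build a HyperParameterTuningJobConfig dict (hypothetical helper)."""
    if max_parallel > max_jobs:
        raise ValueError("max_parallel cannot exceed max_jobs")
    return {
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {
            "Type": "Minimize",
            "MetricName": metric_name,
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel,
        },
        # Range bounds are passed as strings in the API request.
        # The hyperparameter names below are illustrative assumptions.
        "ParameterRanges": {
            "ContinuousParameterRanges": [
                {
                    "Name": "learning_rate",
                    "MinValue": "0.001",
                    "MaxValue": "0.1",
                    "ScalingType": "Logarithmic",
                }
            ],
            "IntegerParameterRanges": [
                {
                    "Name": "max_depth",
                    "MinValue": "3",
                    "MaxValue": "10",
                    "ScalingType": "Linear",
                }
            ],
        },
    }


tuning_config = build_tuning_config("validation:rmse")
```

The tuning job then launches training jobs with values drawn from these ranges and keeps the one with the best objective metric.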

Amazon SageMaker Debugger

provides real-time insights into the training process of machine learning models, enabling rapid iteration.
It allows you to monitor and debug training issues, optimize model performance, and improve accuracy by analyzing various model-related metrics, such as weights, gradients, and biases.

Managed Spot Training

Lets data scientists and engineers save up to 90% on the cost of training machine learning models by using spare compute capacity (Amazon EC2 Spot Instances).
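A minimal sketch of how the savings figure can be worked out, assuming you already have a job's total training time and its billable time (a completed training job reports these as `TrainingTimeInSeconds` and `BillableTimeInSeconds`). The helper name, the bucket path, and the time limits are hypothetical; the `spot_settings` keys are the request fields that turn Spot training on.

```python
def spot_savings_percent(training_seconds, billable_seconds):
    """Percentage saved: the share of training time that was not billed."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return 100 * (training_seconds - billable_seconds) / training_seconds


# Request fields that enable Managed Spot Training (values are illustrative).
spot_settings = {
    "EnableManagedSpotTraining": True,
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 3600,
        # Must be at least MaxRuntimeInSeconds; includes time spent waiting for capacity.
        "MaxWaitTimeInSeconds": 7200,
    },
    # Checkpoints let an interrupted job resume instead of restarting.
    "CheckpointConfig": {"S3Uri": "s3://example-bucket/checkpoints/"},
}

savings = spot_savings_percent(training_seconds=1000, billable_seconds=300)
```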

SageMaker Studio

A web-based IDE for machine learning. It provides tools for the entire ML lifecycle, including data wrangling, model training, and deployment, all in one unified interface. Helps data scientists and developers quickly build and train models and streamline ML workflows.

SageMaker Notebooks

A fully managed, scalable Jupyter notebook for quick data exploration, model building, and training. It helps you start working on ML models immediately without managing infrastructure.

SageMaker Distributed Data Parallelism (SMDDP)

A feature that enables efficient distributed training of deep learning models by automatically parallelizing data across multiple GPUs and instances.
Speeds up the training of large models on massive datasets, improving scalability and reducing training time.
It supports frameworks like TensorFlow and PyTorch, making it ideal for large-scale deep-learning tasks that require intensive computational resources.

SageMaker Pipelines

A fully managed CI/CD service for automating the end-to-end machine learning workflow, including data preprocessing, model training, and deployment. It helps automate and streamline the ML lifecycle, ensuring consistency and efficiency.

SageMaker Model Monitor

Monitors models in production to detect issues such as data drift or model performance degradation.
Ensures that models continue to perform accurately after deployment.

SageMaker Model Registry

A centralized repository for managing ML models, including tracking versions and promoting models for deployment. Ensures proper model version control and governance across teams.

SageMaker Edge Manager

offers model management for edge devices, enabling you to optimize, secure, monitor, and manage machine learning models on various edge device fleets, including smart cameras, robots, PCs, and mobile devices.

SageMaker Feature Store

a fully managed repository designed to store, share, and manage features for machine learning models. It ensures high-quality, standardized features are available for both training and real-time inference, helping teams keep their feature data synchronized and consistent.

SageMaker JumpStart

Provides pre-trained foundation models and ready-to-use solutions for common machine learning tasks like text summarization, image generation, and object detection, so users can quickly deploy and experiment without deep expertise.

What are the two input modes for transferring training data?

File mode and Pipe mode

File mode

Downloads data into the SageMaker instance volume before model training commences
Slower than pipe mode
Used for incremental training

Pipe mode

Directly stream data from Amazon S3 into the training algorithm container.
There’s no need to procure large volumes to store large datasets.
Provides shorter startup and training times.
Higher I/O throughputs
Faster than File mode.
Many built-in algorithms require protobuf RecordIO as the training data format before you can take advantage of Pipe mode.
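The choice between the two modes is a single field on the input channel of a training job request. The sketch below builds one entry of `InputDataConfig` as accepted by boto3 `create_training_job`; the helper name `build_channel`, the channel name, and the S3 path are assumptions for illustration.

```python
def build_channel(s3_uri, input_mode="File"):
    """Build one InputDataConfig channel for File or Pipe mode (hypothetical helper)."""
    # Only the two modes covered by these cards are accepted here.
    if input_mode not in ("File", "Pipe"):
        raise ValueError("input_mode must be 'File' or 'Pipe'")
    channel = {
        "ChannelName": "train",
        "InputMode": input_mode,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }
    if input_mode == "Pipe":
        # Assumes the data was already converted to protobuf RecordIO.
        channel["ContentType"] = "application/x-recordio-protobuf"
    return channel


pipe_channel = build_channel("s3://example-bucket/train/", input_mode="Pipe")
file_channel = build_channel("s3://example-bucket/train/")
```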

What should you do if training via File mode is too slow?

Convert training data into a protobuf RecordIO format to make use of Pipe mode.
Use Amazon FSx for Lustre to accelerate File mode training jobs.

What are the two ways to deploy a model for inference?

Amazon SageMaker Hosting Services
Amazon SageMaker Batch Transform

Amazon SageMaker Batch Transform

Batch inference
Doesn’t need a persistent endpoint
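Because no endpoint is involved, a batch job is described entirely by where the input lives, where the output goes, and what compute to use. A minimal sketch of such a request for boto3 `create_transform_job` follows; the helper name, job and model names, S3 paths, CSV content type, and instance type are all assumptions for the example.

```python
def build_transform_job_request(job_name, model_name, input_s3_uri, output_s3_uri):
    """Build a create_transform_job request dict (hypothetical helper)."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            # Assumes line-delimited CSV input; each line is one record.
            "ContentType": "text/csv",
            "SplitType": "Line",
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        # Instances exist only for the duration of the job.
        "TransformResources": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
    }


transform_request = build_transform_job_request(
    "example-batch-job",
    "example-model",
    "s3://example-bucket/batch-input/",
    "s3://example-bucket/batch-output/",
)
```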

Amazon SageMaker Hosting Services

Provides a persistent HTTPS endpoint for getting predictions one at a time.
Suited for web applications that need sub-second latency response.

SageMaker Content-Based Filtering

Only applicable to recommendation systems
If you watch sci-fi, content-based filtering will recommend more sci-fi
Not a good solution for image content moderation (use Rekognition for that)

What 4 pieces of information does a training job need?

- The URL of the Amazon Simple Storage Service (Amazon S3) bucket where you've stored the training data.
- The compute resources that you want SageMaker AI to use for model training. Compute resources are ML compute instances that are managed by SageMaker AI.
- The URL of the S3 bucket where you want to store the output of the job.
- The Amazon Elastic Container Registry path where the training code is stored.
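The four items map onto fields of a boto3 `create_training_job` request, sketched below. The helper name and every concrete value (bucket paths, image URI, role ARN, instance type) are hypothetical. The API also expects an IAM role and a stopping condition, which are included here even though the card does not list them.

```python
def build_training_job_request(job_name, role_arn, image_uri, train_s3_uri,
                               output_s3_uri, instance_type="ml.m5.xlarge",
                               instance_count=1):
    """Build a create_training_job request dict (hypothetical helper)."""
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,
        # 4. ECR path of the image that holds the training code.
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        # 1. S3 location of the training data.
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        # 3. S3 location for the job output.
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # 2. ML compute instances to train on.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


training_request = build_training_job_request(
    job_name="example-training-job",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-image:latest",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
)
```

With valid values, a dictionary like this could be passed as `boto3.client("sagemaker").create_training_job(**training_request)`.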

What should you do to make sure a training dataset isn't accessible over the public internet?

- In the S3 bucket policy, restrict access to the source VPC endpoint and the VPC ID.
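One common shape for such a bucket policy is a `Deny` statement that applies to every request not arriving through a given VPC endpoint. The sketch below builds that policy as a Python dictionary and serializes it to JSON; the helper name, bucket name, and endpoint ID are placeholders, and a real policy needs care so that administrators are not locked out.

```python
import json


def build_vpce_only_policy(bucket_name, vpce_id):
    """Deny all S3 access to the bucket unless it comes through vpce_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AccessViaVpcEndpointOnly",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                # Use the aws:sourceVpc key instead to match on a VPC ID.
                "Condition": {"StringNotEquals": {"aws:sourceVpce": vpce_id}},
            }
        ],
    }


policy = build_vpce_only_policy("example-training-bucket", "vpce-1a2b3c4d")
policy_json = json.dumps(policy, indent=2)
```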

What does a VPC endpoint do?

- A VPC endpoint enables private connections between your VPC and supported AWS services.
- Traffic between your VPC and the other service does not leave the Amazon network.
- A VPC endpoint does not require an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection.

How do you encrypt training data?

- Use an AWS KMS (Key Management Service) key.
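In a training job request, a KMS key can be attached in two places: the output location and the storage volume on the training instances. The sketch below adds both to an existing request dictionary; the helper name and the key ARN are hypothetical, and encryption of the input data at rest in S3 is configured on the bucket itself rather than here.

```python
import copy


def with_kms_encryption(request, kms_key_id):
    """Return a copy of a training job request with KMS keys set (hypothetical helper)."""
    encrypted = copy.deepcopy(request)
    # Encrypts model artifacts written to the S3 output path.
    encrypted.setdefault("OutputDataConfig", {})["KmsKeyId"] = kms_key_id
    # Encrypts the ML storage volume attached to the training instances.
    encrypted.setdefault("ResourceConfig", {})["VolumeKmsKeyId"] = kms_key_id
    return encrypted


base_request = {
    "OutputDataConfig": {"S3OutputPath": "s3://example-bucket/output/"},
    "ResourceConfig": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
}
example_key = "arn:aws:kms:us-east-1:123456789012:key/example-key-id"
encrypted_request = with_kms_encryption(base_request, example_key)
```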

Sagemaker Features Flashcards

(28 cards)