SageMaker Features Flashcards

(28 cards)

1
Q

SageMaker Autopilot

A
  • automates the process of building, tuning, and deploying machine learning models based on a tabular dataset (CSV or Parquet).
  • automatically explores different solutions to find the best model.
2
Q

SageMaker Ground Truth

A
  • a data labeling service that lets you use a human workforce: your own private annotators, Amazon Mechanical Turk, or third-party vendors.
3
Q

SageMaker Data Wrangler

A

a visual data preparation and cleaning tool that allows data scientists and engineers to easily clean and prepare data for machine learning.

4
Q

SageMaker Neo

A

optimizes machine learning models for deployment on cloud instances and edge devices so that they run faster with no loss in accuracy.

5
Q

SageMaker Automatic Model Tuning

A
  • automates the process of hyperparameter tuning based on the algorithm and hyperparameter ranges you specify.
  • This can result in saving a significant amount of time for data scientists and engineers.
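
As a rough illustration of "the algorithm and hyperparameter ranges you specify", the sketch below builds a tuning configuration as a plain Python dict. The field names follow the shape of the `CreateHyperParameterTuningJob` request as I understand it; the metric name, hyperparameter names, and limits are made-up placeholders, and nothing here calls AWS.

```python
# Hypothetical tuning configuration; every value below is a placeholder.
tuning_job_config = {
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
        "Type": "Maximize",
        "MetricName": "validation:auc",  # assumed metric emitted by the algorithm
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 20,
        "MaxParallelTrainingJobs": 2,
    },
    "ParameterRanges": {
        # The API expects range bounds as strings.
        "ContinuousParameterRanges": [
            {"Name": "eta", "MinValue": "0.01", "MaxValue": "0.3",
             "ScalingType": "Logarithmic"},
        ],
        "IntegerParameterRanges": [
            {"Name": "max_depth", "MinValue": "3", "MaxValue": "10",
             "ScalingType": "Linear"},
        ],
    },
}


def tuned_parameter_names(config):
    """Return the names of all hyperparameters the tuner is allowed to vary."""
    names = []
    for key in ("ContinuousParameterRanges",
                "IntegerParameterRanges",
                "CategoricalParameterRanges"):
        for param_range in config["ParameterRanges"].get(key, []):
            names.append(param_range["Name"])
    return names


print(tuned_parameter_names(tuning_job_config))
```

Hyperparameters left out of the ranges stay fixed at whatever static value the training job definition gives them.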
6
Q

Amazon SageMaker Debugger

A
  • provides real-time insights into the training process of machine learning models, enabling rapid iteration.
  • It allows you to monitor and debug training issues, optimize model performance, and improve accuracy by analyzing various model-related metrics, such as weights, gradients, and biases.
7
Q

Managed Spot Training

A

allows data scientists and engineers to save up to 90% on the cost of training machine learning models by running training jobs on spare EC2 capacity (Spot Instances).
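
A minimal sketch of how the savings figure can be worked out for a single job, assuming you already have the job's total training time and billable time (the two durations reported when you describe a training job). The settings dict shows the request fields that, to my knowledge, switch Spot training on; the bucket name and durations are placeholders.

```python
def spot_savings_percent(training_seconds, billable_seconds):
    """Savings from Spot training: (1 - billable / training) * 100."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100


# Hypothetical request fragment; the S3 URI is a placeholder.
spot_settings = {
    "EnableManagedSpotTraining": True,
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 3600,
        # Wait time covers time spent waiting for Spot capacity,
        # so it must be at least as large as the runtime limit.
        "MaxWaitTimeInSeconds": 7200,
    },
    # Checkpoints let an interrupted job resume instead of starting over.
    "CheckpointConfig": {"S3Uri": "s3://example-bucket/checkpoints/"},
}

print(spot_savings_percent(1000, 250))  # 75.0
```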

8
Q

SageMaker Studio

A

A web-based IDE for machine learning. It provides tools for the entire ML lifecycle, including data wrangling, model training, and deployment, all in one unified interface. Helps data scientists and developers quickly build and train models and streamline ML workflows.

9
Q

SageMaker Notebooks

A
  • A fully managed, scalable Jupyter notebook for quick data exploration, model building, and training. It helps you start working on ML models immediately without managing infrastructure.
9
Q

SageMaker Distributed Data Parallelism (SMDDP)

A
  • A feature that enables efficient distributed training of deep learning models by automatically parallelizing data across multiple GPUs and instances.
  • Speeds up the training of large models on massive datasets, improving scalability and reducing training time.
  • It supports frameworks like TensorFlow and PyTorch, making it ideal for large-scale deep-learning tasks that require intensive computational resources.
10
Q

SageMaker Pipelines

A

A fully managed CI/CD service for automating the end-to-end machine learning workflow, including data preprocessing, model training, and deployment. It helps automate and streamline the ML lifecycle, ensuring consistency and efficiency.

11
Q

SageMaker Model Monitor

A
  • Monitors models in production to detect issues such as data drift or model performance degradation.
  • Ensures that models continue to perform accurately after deployment.
12
Q

SageMaker Model Registry

A

A centralized repository for managing ML models, including tracking versions and promoting models for deployment. Ensures proper model version control and governance across teams.

13
Q

SageMaker Edge Manager

A

offers model management for edge devices, enabling you to optimize, secure, monitor, and manage machine learning models on various edge device fleets, including smart cameras, robots, PCs, and mobile devices.

14
Q

SageMaker Feature Store

A

a fully managed repository designed to store, share, and manage features for machine learning models. It ensures high-quality, standardized features are available for both training and real-time inference, helping teams keep their feature data synchronized and consistent.

15
Q

SageMaker JumpStart

A

provides pre-trained foundation models and ready-to-use solutions for common machine learning tasks such as text summarization, image generation, and object detection, so users can quickly deploy and experiment without deep expertise.

16
Q

What are the two input modes for transferring training data?

A

File mode and Pipe mode (SageMaker also offers a newer FastFile mode, which these cards do not cover)

17
Q

File mode

A
  • Downloads the data onto the SageMaker instance volume before model training begins.
  • Slower to start than Pipe mode.
  • Used for incremental training (the existing model is supplied through a File mode channel).
18
Q

Pipe mode

A
  • Directly stream data from Amazon S3 into the training algorithm container.
  • There’s no need to procure large volumes to store large datasets.
  • Provides shorter startup and training times.
  • Higher I/O throughputs
  • Faster than File mode.
  • Most built-in algorithms need the training data in protobuf RecordIO format before they can take advantage of Pipe mode (some also accept CSV).
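
The input mode is chosen per data channel. The helper below is a hypothetical sketch (the function name and S3 URIs are mine) that builds a channel entry in the shape used by the training job request's `InputDataConfig`, restricted to the two modes these cards cover.

```python
RECORDIO_PROTOBUF = "application/x-recordio-protobuf"


def make_channel(name, s3_uri, input_mode="File", content_type=None):
    """Build one InputDataConfig channel entry as a plain dict."""
    if input_mode not in ("File", "Pipe"):
        raise ValueError("input_mode must be 'File' or 'Pipe'")
    channel = {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "InputMode": input_mode,
    }
    if content_type is not None:
        channel["ContentType"] = content_type
    return channel


# File mode: data is copied to the instance volume first.
file_channel = make_channel("train", "s3://example-bucket/train-csv/")

# Pipe mode: data is streamed from S3, here as RecordIO-protobuf.
pipe_channel = make_channel(
    "train", "s3://example-bucket/train-recordio/",
    input_mode="Pipe", content_type=RECORDIO_PROTOBUF,
)
print(file_channel["InputMode"], pipe_channel["InputMode"])
```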
19
Q

What should you do if training via File mode is too slow?

A
  • Convert the training data to protobuf RecordIO format so that you can use Pipe mode.
  • Use Amazon FSx for Lustre to accelerate File mode training jobs.
20
Q

What are the two ways to deploy a model for inference?

A
  • Amazon SageMaker Hosting Services
  • Amazon SageMaker Batch Transform
21
Q

Amazon SageMaker Batch Transform

A
  • Batch inference
  • Doesn’t need a persistent endpoint
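
A sketch of what a batch job looks like as a request, assuming the field layout of the `CreateTransformJob` API; the job, model, and bucket names are placeholders. Note that no endpoint appears anywhere: the job reads from S3, writes to S3, and releases its instances when it finishes.

```python
# Hypothetical batch transform request; all names and URIs are placeholders.
transform_job_request = {
    "TransformJobName": "example-batch-job",
    "ModelName": "example-model",
    "TransformInput": {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://example-bucket/batch-input/",
            }
        },
        "ContentType": "text/csv",
        "SplitType": "Line",  # treat each line as one record
    },
    "TransformOutput": {"S3OutputPath": "s3://example-bucket/batch-output/"},
    "TransformResources": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
}

print(sorted(transform_job_request))
```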
22
Q

Amazon SageMaker Hosting Services

A
  • Provides a persistent HTTPS endpoint for getting predictions one at a time.
  • Suited for web applications that need sub-second latency response.
23
Q

SageMaker Content-Based Filtering

A
  • Only applicable to recommendation systems
  • If you watch sci-fi, content-based filtering will recommend more sci-fi.
  • Not a good solution for image content moderation (use Rekognition for that)
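
The sci-fi example can be made concrete with a toy recommender that scores each title by how many genres it shares with what the user has watched. This is only an illustration of the idea, not how any AWS service implements it; the catalog and function name are invented.

```python
def recommend(watched_genres, catalog, top_n=2):
    """Rank titles by the number of genres shared with the user's history."""
    watched = set(watched_genres)
    scored = []
    for title, genres in catalog.items():
        overlap = len(watched & set(genres))
        if overlap:
            scored.append((overlap, title))
    # Highest overlap first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_n]]


# Invented catalog for illustration.
catalog = {
    "Star Voyage": ["sci-fi", "adventure"],
    "Robot Dawn": ["sci-fi"],
    "Baking Duel": ["reality"],
}

picks = recommend(["sci-fi"], catalog)
print(picks)  # ['Robot Dawn', 'Star Voyage']
```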
24
Q

What 4 pieces of information does a training job need?

A
  • The URL of the Amazon Simple Storage Service (Amazon S3) bucket where you've stored the training data.
  • The compute resources that you want SageMaker AI to use for model training. Compute resources are ML compute instances that are managed by SageMaker AI.
  • The URL of the S3 bucket where you want to store the output of the job.
  • The Amazon Elastic Container Registry path where the training code is stored.
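
To show where each of the four pieces lives, here is a sketch of a training job request as a plain dict, following the field layout of the `CreateTrainingJob` API as I understand it. The account ID, role, image, bucket, and instance type are all placeholders, and the request also needs a few fields beyond the four (a job name, an IAM role, a stopping condition).

```python
# Hypothetical request; every identifier below is a placeholder.
training_job_request = {
    "TrainingJobName": "example-training-job",
    "RoleArn": "arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    "AlgorithmSpecification": {
        # Piece 4: ECR path of the training image.
        "TrainingImage": "111122223333.dkr.ecr.us-east-1.amazonaws.com/example-algo:latest",
        "TrainingInputMode": "File",
    },
    "InputDataConfig": [
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    # Piece 1: S3 location of the training data.
                    "S3Uri": "s3://example-bucket/train/",
                }
            },
        }
    ],
    # Piece 2: compute resources.
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    # Piece 3: S3 location for the job output.
    "OutputDataConfig": {"S3OutputPath": "s3://example-bucket/output/"},
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}

four_pieces = {
    "training_data": training_job_request["InputDataConfig"][0]["DataSource"]["S3DataSource"]["S3Uri"],
    "compute": training_job_request["ResourceConfig"]["InstanceType"],
    "output": training_job_request["OutputDataConfig"]["S3OutputPath"],
    "training_image": training_job_request["AlgorithmSpecification"]["TrainingImage"],
}
print(four_pieces)
```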
25
Q

What should you do to make sure a training dataset isn't accessible over the public internet?

A
  • In the S3 bucket policy, restrict access to the source VPC endpoint or the VPC ID, so that requests arriving any other way are denied.
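
One common way to express this is a bucket policy with a `Deny` statement conditioned on the `aws:SourceVpce` key (or `aws:SourceVpc` for a whole VPC). The function below is a sketch that generates such a policy document; the function name, bucket, and endpoint ID are placeholders. Be aware that a blanket deny like this also blocks access from the console and from anything else outside the endpoint.

```python
import json


def vpc_only_bucket_policy(bucket, vpce_id):
    """Build a policy that denies S3 access unless it comes via one VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


# Placeholder bucket name and endpoint ID.
policy = vpc_only_bucket_policy("example-training-bucket", "vpce-1a2b3c4d")
print(json.dumps(policy, indent=2))
```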
26
Q

What does a VPC endpoint do?

A
  • Enables private connections between your VPC and supported AWS services.
  • Traffic between your VPC and the other service does not leave the Amazon network.
  • Does not require an internet gateway, NAT device, or VPN connection. A gateway endpoint (available for S3) is controlled through route tables and endpoint policies rather than a security group of its own.
27
Q

How do you encrypt training data?

A
  • Use an AWS Key Management Service (AWS KMS) key.
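
In a training job request, the KMS key is typically referenced in two places: the output location and the storage volume attached to the training instances (the input data is encrypted at rest by the S3 bucket's own settings). The helper below is a hypothetical sketch that adds both fields to a request dict; the key ARN is a placeholder.

```python
import copy


def with_kms_encryption(request, kms_key_id):
    """Return a copy of a training job request with KMS keys set."""
    encrypted = copy.deepcopy(request)
    # Encrypts the model artifacts written to S3.
    encrypted.setdefault("OutputDataConfig", {})["KmsKeyId"] = kms_key_id
    # Encrypts the ML storage volume attached to the training instances.
    encrypted.setdefault("ResourceConfig", {})["VolumeKmsKeyId"] = kms_key_id
    return encrypted


# Placeholder request fragment and key ARN.
base_request = {
    "OutputDataConfig": {"S3OutputPath": "s3://example-bucket/output/"},
    "ResourceConfig": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1,
                       "VolumeSizeInGB": 50},
}
key_arn = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
secured = with_kms_encryption(base_request, key_arn)
print(secured["OutputDataConfig"]["KmsKeyId"])
```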