SageMaker Built-In Algorithms Flashcards

(27 cards)

1
Q

XGBoost

A
  • supervised learning
  • an implementation of the gradient-boosted trees algorithm
  • combines an ensemble of estimates from a set of simpler and weaker models.
  • highly effective for structured/tabular data
  • applicable for both classification and regression
2
Q

Linear Learner

A
  • supervised learning
  • used for binary and multi-class classification as well as linear regression
3
Q

Factorization Machines

A
  • supervised learning
  • designed for high-dimensional sparse datasets
  • ideal for recommendation systems and click prediction
  • good at capturing interactions between features
4
Q

K-Means

A
  • unsupervised learning
  • For clustering data points into groups based on similarity.
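The assign/update loop at the heart of k-means can be sketched in a few lines. This is a pure-Python, one-dimensional toy (real workloads would use the SageMaker built-in or scikit-learn); the function name and fixed initial centroids are illustrative:

```python
# Toy k-means: alternate between assigning points to the nearest
# centroid and moving each centroid to its cluster's mean.
def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[i].append(p)
        # update step: move each centroid to its cluster's mean
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# two obvious groups, around 1 and around 10
print(kmeans([1.0, 1.2, 0.8, 9.8, 10.0, 10.2], [0.0, 5.0]))
```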
5
Q

Random Cut Forest

A
  • unsupervised learning
  • designed for anomaly detection in large datasets.
6
Q

Image Classification

A
  • Computer vision
  • Based on ResNet architecture for categorizing entire images.
7
Q

Object Detection

A
  • Computer vision
  • Utilizes Single Shot Detector (SSD) to identify and locate multiple objects within an image.
8
Q

Semantic Segmentation

A
  • Computer vision
  • Employs the Fully Convolutional Network (FCN) algorithm for pixel-level image classification, providing detailed object shapes.
  • Segmentation is a pixel-wise classification problem: it predicts a label for every pixel, producing a mask showing which parts of the image correspond to each label.
9
Q

BlazingText

A
  • NLP
  • Highly optimized implementations of Word2Vec (word embeddings) and supervised text classification
  • Cannot discover topics (use LDA or Neural Topic Model for that)
10
Q

LDA

A
  • For discovering topics within text documents.
  • NLP
  • unsupervised
  • Latent Dirichlet Allocation (not to be confused with SageMaker's separate Neural Topic Model, which also does topic modeling)
11
Q

Sequence2Sequence

A
  • NLP
  • Summarization and machine translation.
12
Q

DeepAR

A
  • Time series forecasting
  • Uses recurrent neural networks (RNNs) to forecast future values of one-dimensional time series; can train a single model across many related series.
13
Q

LightGBM

A
  • Light Gradient Boosting Machine
  • faster training speed and lower memory usage compared to other gradient boosting frameworks like XGBoost
  • good for classification and regression
14
Q

AutoGluon-Tabular

A
  • open-source AutoML library that automates various aspects of ML, such as model selection, hyperparameter tuning, and ensembling, to achieve high accuracy with minimal effort.
  • tabular data
15
Q

CatBoost

A
  • gradient-boosting algorithm designed to handle categorical features efficiently
  • doesn’t require explicit one-hot encoding
16
Q

TabTransformer

A
  • supervised learning for tabular classification and regression tasks
  • uses Transformer-style self-attention to embed categorical features
17
Q

Regression

A
  • a type of supervised learning used to predict continuous numerical values
  • aims to find the best-fitting curve (or line in simpler cases) that represents the relationship between independent variables (features) and a dependent (target) variable.
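For the simplest one-feature case, the best-fitting line has a closed form (slope and intercept that minimize squared error); a minimal sketch with an illustrative function name:

```python
# Closed-form least-squares fit for simple (one-feature) linear
# regression: returns the slope and intercept minimizing squared error.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

print(fit_line([0, 1, 2], [1, 3, 5]))  # points on y = 2x + 1 → (2.0, 1.0)
```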
18
Q

Levenshtein distance

A
  • sometimes called ‘edit distance’
  • measures how close two strings are. For example, the word “raed” is closer to the word “read” than the word “baed” is.
  • used in applications like autocorrect to match the correct spelling of a given word.
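The distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other, and is computed with a classic dynamic-programming recurrence; a minimal sketch:

```python
# Levenshtein (edit) distance via dynamic programming: prev[j] holds
# the distance from the current prefix of `a` to b[:j].
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

print(levenshtein("raed", "read"))  # → 2 (two substitutions)
print(levenshtein("baed", "read"))  # → 3
```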
19
Q

How do you enable XGBoost to perform classification tasks?

A

To enable XGBoost to perform multi-class classification, set the objective hyperparameter to multi:softmax and specify the number of classes in the num_class hyperparameter. (For binary classification, use binary:logistic.)
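As a sketch, the hyperparameters you would pass to a SageMaker XGBoost estimator for a hypothetical 3-class problem might look like the following (estimator setup, image URI, and data channels omitted; key names follow the built-in XGBoost hyperparameter reference):

```python
# Hypothetical hyperparameters for 3-class classification with the
# SageMaker built-in XGBoost algorithm.
hyperparameters = {
    "objective": "multi:softmax",  # predict the class label directly
    "num_class": 3,                # required alongside multi:softmax
    "num_round": 100,              # number of boosting rounds
    # use "multi:softprob" instead to get per-class probabilities
}
print(hyperparameters)
```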

20
Q

Support Vector Machines (SVM)

A
  • supervised algorithm mainly used for classification tasks
  • It uses decision boundaries to separate groups of data.
21
Q

SVM with Radial Basis Function (RBF) kernel

A
  • a variation of the linear SVM used to separate data that is not linearly separable.
  • Separating randomly distributed data in a two-dimensional space can be a daunting task for a linear boundary.
  • The RBF kernel provides an efficient way of implicitly mapping data (e.g., 2-D) into a higher dimension (e.g., 3-D), where a separating hyperplane (decision surface) can be applied and used for the model's predictions.
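The kernel itself is just a similarity function between two points; a minimal sketch (the function name and the width parameter gamma are illustrative):

```python
import math

# RBF (Gaussian) kernel: similarity of two points, equivalent to an
# inner product in an implicit higher-dimensional feature space.
def rbf_kernel(x, y, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel((0, 0), (0, 0)))  # → 1.0 (identical points)
print(rbf_kernel((0, 0), (3, 4), gamma=0.1))  # decays with distance
```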
22
Q

Multinomial logistic regression

A
  • used to predict category membership (or the probability of membership) on a dependent variable with more than two categories, based on multiple independent variables
  • example: understanding which ice cream flavors are most preferred by different groups of people
23
Q

What input data types are supported?

A

Supported:
  • recordIO-protobuf
  • JSON
  • JPEG
  • PNG
  • CSV
  • libSVM

Not supported:
  • TSV, TFRecord, HDF5

24
Q

Logistic Regression

A

  • predicts only a binary output such as “0” or “1”
  • not suitable for multi-class or continuous targets (use multinomial logistic regression for multi-class)

25
Q

kNN

A
  • supervised learning
  • k-nearest neighbors: index-based algorithm for classification and regression
26
Q

Word2Vec

A
  • text classification, sentiment analysis, entity recognition, translation
  • not for summarization! use Seq2Seq for that
27
Q

Object2Vec

A
  • use when you have labelled objects/abstractions (e.g. claims)
  • learns low-dimensional embeddings of pairs of objects