SageMaker Built-In Algorithms Flashcards

(27 cards)

1
Q

XGBoost

A
  • supervised learning
  • an implementation of the gradient-boosted trees algorithm
  • combines an ensemble of estimates from a set of simpler and weaker models.
  • highly effective for structured/tabular data
  • applicable for both classification and regression
2
Q

Linear Learner

A
  • supervised learning
  • used for binary and multi-class classification as well as linear regression
3
Q

Factorization Machines

A
  • supervised learning
  • designed for high-dimensional sparse datasets
  • ideal for recommendation systems and click prediction
  • good at capturing interactions between features
4
Q

K-Means

A
  • unsupervised learning
  • For clustering data points into groups based on similarity.
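The assign/update loop at the heart of k-means can be sketched in a few lines. This is a pure-Python, one-dimensional toy (real workloads would use the SageMaker built-in or scikit-learn); the function name and fixed initial centroids are illustrative:

```python
# Toy k-means: alternate between assigning points to the nearest
# centroid and moving each centroid to its cluster's mean.
def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[i].append(p)
        # update step: move each centroid to its cluster's mean
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# two obvious groups, around 1 and around 10
print(kmeans([1.0, 1.2, 0.8, 9.8, 10.0, 10.2], [0.0, 5.0]))
```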
5
Q

Random Cut Forest

A
  • unsupervised learning
  • designed for anomaly detection in large datasets.
6
Q

Image Classification

A
  • Computer vision
  • Based on ResNet architecture for categorizing entire images.
7
Q

Object Detection

A
  • Computer vision
  • Utilizes Single Shot Detector (SSD) to identify and locate multiple objects within an image.
8
Q

Semantic Segmentation

A
  • Computer vision
  • Employs the Fully Convolutional Network (FCN) algorithm for pixel-level image classification, providing detailed object shapes.
  • Segmentation is a pixel-wise classification problem: it predicts a label for every pixel, producing a mask showing which parts of the image correspond to each label.
9
Q

BlazingText

A
  • NLP
  • Highly optimized implementations of Word2Vec (word embeddings) and supervised text classification
  • Cannot discover topics (use LDA or Neural Topic Model for that)
10
Q

LDA

A
  • For discovering topics within text documents.
  • NLP
  • unsupervised
  • Latent Dirichlet Allocation (not to be confused with SageMaker's separate Neural Topic Model, which also does topic modeling)
11
Q

Sequence2Sequence

A
  • NLP
  • Summarization and machine translation.
12
Q

DeepAR

A
  • Time series forecasting
  • Uses recurrent neural networks (RNNs) to forecast future values of one-dimensional time series; can train a single model across many related series.
13
Q

LightGBM

A
  • Light Gradient Boosting Machine
  • faster training speed and lower memory usage compared to other gradient boosting frameworks like XGBoost
  • good for classification and regression
14
Q

AutoGluon-Tabular

A
  • open-source AutoML library that automates various aspects of ML, such as model selection, hyperparameter tuning, and ensembling, to achieve high accuracy with minimal effort.
  • tabular data
15
Q

CatBoost

A
  • gradient-boosting algorithm designed to handle categorical features efficiently
  • doesn’t require explicit one-hot encoding
16
Q

TabTransformer

A
  • supervised learning for tabular classification and regression tasks
  • uses Transformer-style self-attention to embed categorical features
17
Q

Regression

A
  • a type of supervised learning used to predict continuous numerical values
  • aims to find the best-fitting curve (or line in simpler cases) that represents the relationship between independent variables (features) and a dependent (target) variable.
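For the simplest one-feature case, the best-fitting line has a closed form (slope and intercept that minimize squared error); a minimal sketch with an illustrative function name:

```python
# Closed-form least-squares fit for simple (one-feature) linear
# regression: returns the slope and intercept minimizing squared error.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

print(fit_line([0, 1, 2], [1, 3, 5]))  # points on y = 2x + 1 → (2.0, 1.0)
```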
18
Q

Levenshtein distance

A
  • sometimes called ‘edit distance’
  • measures how close two strings are. For example, the word “raed” is closer to the word “read” than the word “baed” is.
  • used in applications like autocorrect to match the correct spelling of a given word.
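The distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other, and is computed with a classic dynamic-programming recurrence; a minimal sketch:

```python
# Levenshtein (edit) distance via dynamic programming: prev[j] holds
# the distance from the current prefix of `a` to b[:j].
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

print(levenshtein("raed", "read"))  # → 2 (two substitutions)
print(levenshtein("baed", "read"))  # → 3
```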
19
Q

How do you enable XGBoost to perform classification tasks?

A

To enable XGBoost to perform multi-class classification, set the objective hyperparameter to multi:softmax and specify the number of classes in the num_class hyperparameter. (For binary classification, use binary:logistic.)
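As a sketch, the hyperparameters you would pass to a SageMaker XGBoost estimator for a hypothetical 3-class problem might look like the following (estimator setup, image URI, and data channels omitted; key names follow the built-in XGBoost hyperparameter reference):

```python
# Hypothetical hyperparameters for 3-class classification with the
# SageMaker built-in XGBoost algorithm.
hyperparameters = {
    "objective": "multi:softmax",  # predict the class label directly
    "num_class": 3,                # required alongside multi:softmax
    "num_round": 100,              # number of boosting rounds
    # use "multi:softprob" instead to get per-class probabilities
}
print(hyperparameters)
```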

20
Q

Support Vector Machines (SVM)

A
  • supervised algorithm mainly used for classification tasks
  • It uses decision boundaries to separate groups of data.
21
Q

SVM with Radial Basis Function (RBF) kernel

A
  • a variation of the linear SVM used to separate data that is not linearly separable.
  • Separating randomly distributed data in a two-dimensional space can be a daunting task for a linear boundary.
  • The RBF kernel provides an efficient way of implicitly mapping data (e.g., 2-D) into a higher dimension (e.g., 3-D), where a separating hyperplane (decision surface) can be applied and used for the model's predictions.
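The kernel itself is just a similarity function between two points; a minimal sketch (the function name and the width parameter gamma are illustrative):

```python
import math

# RBF (Gaussian) kernel: similarity of two points, equivalent to an
# inner product in an implicit higher-dimensional feature space.
def rbf_kernel(x, y, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel((0, 0), (0, 0)))  # → 1.0 (identical points)
print(rbf_kernel((0, 0), (3, 4), gamma=0.1))  # decays with distance
```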
22
Q

Multinomial logistic regression

A
  • used to predict category membership (or the probability of membership) on a dependent variable with more than two categories, based on multiple independent variables
  • example: understanding which ice cream flavors are most preferred by different groups of people
23
Q

What input data types are supported?

A

Supported:
  • recordIO-protobuf
  • JSON
  • JPEG
  • PNG
  • CSV
  • libSVM

Not supported:
  • TSV, TFRecord, HDF5

24
Q

Logistic Regression

A

  • predicts only a binary output such as “0” or “1”
  • not suitable for multi-class or continuous targets (use multinomial logistic regression for multi-class)

25
Q

kNN

A
  • supervised learning
  • k-nearest neighbors: index-based algorithm for classification and regression
26
Q

Word2Vec

A
  • text classification, sentiment analysis, entity recognition, translation
  • not for summarization! use Seq2Seq for that
27
Q

Object2Vec

A
  • use when you have labelled objects/abstractions (e.g. claims)
  • learns low-dimensional embeddings of pairs of objects