Linear Learner
linear regression
can handle both regression and classification
for classification, a linear threshold is used
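The thresholding idea can be sketched in plain Python (hypothetical weights, not a SageMaker API):

```python
# Linear model: score = w . x + b
# Regression returns the raw score; classification thresholds it.
def linear_score(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def classify(weights, bias, x, threshold=0.0):
    # Predict class 1 when the linear score exceeds the threshold.
    return 1 if linear_score(weights, bias, x) > threshold else 0
```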
Linear Learner Input Format
recordIO/protobuf, csv
file or pipe mode supported
Linear Learner Usage
preprocessing: data must be normalized and shuffled
training: choose optimization algorithm; multiple models optimized in parallel; tune L1, L2 regularization
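The preprocessing step (normalize, then shuffle) can be sketched with the standard library alone — a minimal illustration, not the SageMaker preprocessing pipeline:

```python
import random

def normalize(column):
    # Scale one feature column to zero mean and unit variance.
    mean = sum(column) / len(column)
    var = sum((v - mean) ** 2 for v in column) / len(column)
    std = var ** 0.5 or 1.0  # guard against a constant column
    return [(v - mean) / std for v in column]

def shuffle_rows(rows, seed=42):
    # Shuffle training rows so mini-batches are not in data-collection order.
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled
```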
XGBoost
eXtreme Gradient Boosting
boosted group of decision trees
gradient descent to minimize loss
can be used for classification and regression
XGBoost Input
CSV, libsvm
recently recordIO/protobuf, Parquet
XGBoost Usage
Models are serialized/deserialized with Pickle
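The Pickle round trip looks like this; a dict stands in for the trained booster here (a real XGBoost model object serializes the same way):

```python
import pickle

# Stand-in for a trained model object.
model = {"trees": 100, "eta": 0.3}

blob = pickle.dumps(model)      # serialize to bytes
restored = pickle.loads(blob)   # deserialize back into an object
```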
can be used within a notebook or as a built-in SageMaker algorithm
HPs: Subsample, eta, gamma, alpha, lambda
uses CPUs only; training is memory-bound rather than compute-bound
Seq2Seq
Input is a sequence of tokens, output is a sequence of tokens
good for machine translation, text summarization, speech-to-text
Seq2Seq Input
recordIO/protobuf - tokens must be integers
start with tokenized text files
NEED TO PROVIDE TRAINING DATA, VALIDATION DATA, AND VOCAB FILES
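Building a vocabulary file and mapping tokens to integers can be sketched as follows (hypothetical helper names; the special-token choices are an assumption, not a Seq2Seq requirement):

```python
def build_vocab(tokenized_sentences, specials=("<pad>", "<unk>", "<s>", "</s>")):
    # Assign every token an integer id; special tokens get the lowest ids.
    vocab = {tok: i for i, tok in enumerate(specials)}
    for sent in tokenized_sentences:
        for tok in sent:
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(sentence, vocab):
    # Seq2Seq consumes sequences of integer ids, not raw strings.
    return [vocab.get(tok, vocab["<unk>"]) for tok in sentence]
```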
Seq2Seq Usage
Training can take days
Pretrained models available
Public training datasets available for specific translation tasks
HPs: batch, optimizer, # layers
can optimize on accuracy, BLEU score, perplexity
trains on a single machine only (GPU; can use multiple GPUs on one machine)
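Of the optimization metrics above, perplexity is just the exponentiated average negative log-probability of the reference tokens — a minimal sketch:

```python
import math

def perplexity(token_probs):
    # token_probs: model probability assigned to each reference token.
    # Perplexity = exp(mean negative log-probability); lower is better.
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)
```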
DeepAR
forecasting 1D time-series data
uses RNNs
allows you to train the same model on several related time series
finds frequency and seasonality
DeepAR Input
JSON lines (gzip or parquet)
each record must contain: start, target
can contain dynamic/categorical features
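A single JSON Lines record might look like this (field names `start`, `target`, `cat`, `dynamic_feat` follow the DeepAR input format; the values are made up):

```python
import json

# One record per time series: required start timestamp and target values,
# plus optional categorical and dynamic features.
record = {
    "start": "2024-01-01 00:00:00",
    "target": [5.0, 7.0, 6.5, 8.0],
    "cat": [0],                        # optional categorical feature(s)
    "dynamic_feat": [[1, 0, 0, 1]],    # optional; each series matches target length
}
line = json.dumps(record)  # one such line per series in the training file
```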
DeepAR Usage
HPs: epochs, batch size, learning rate, # cells, context length
GPU or CPU for training, CPU only for inference
BlazingText
text classification (supervised, sentence-level labels) and Word2Vec (word embeddings)
BlazingText Input
BlazingText Usage
Word2Vec has multiple modes:
- cbow - continuous bag of words (order doesn't matter)
- skip-gram (order matters)
- batch skip-gram (distributed over CPU nodes)
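The CBOW vs. skip-gram distinction can be sketched by how training examples are generated (a conceptual illustration, not BlazingText internals):

```python
def skipgram_pairs(tokens, window=1):
    # Skip-gram: predict each context word from the center word;
    # each (center, context) position yields its own training pair.
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_example(tokens, i, window=1):
    # CBOW: predict the center word from the *bag* of context words,
    # so the order of the context is discarded (sorted here to show that).
    context = [tokens[j]
               for j in range(max(0, i - window), min(len(tokens), i + window + 1))
               if j != i]
    return (sorted(context), tokens[i])
```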
HPs:
cbow and skip-gram use GPU (can use CPU)
batch skip-gram uses single or multiple CPU instances
text classification: CPU for smaller datasets, GPU for larger
Object2Vec
Object2Vec Input
Object2Vec Usage
HPs: usual deep learning ones:
- dropout, early stopping, epochs, learning rate, batch size, layers, activation function, optimizer, weight decay
- encoder1 network, encoder2 network
single machine, multi GPU
use the INFERENCE_PREFERRED_MODE environment variable to optimize for encoder embeddings rather than classification or regression
Object Detection
Object Detection Input
Object Detection Usage
HPs: batch size, learning rate, optimizer
GPU for training, CPU for inference
Image Classification
- doesn't tell you where the objects are (no bounding boxes)
Image Classification Input
Image Classification Usage
HPs: batch size, learning rate, optimizer (weight decay, beta1, beta2, eps, gamma)
GPU for training, GPU or CPU for inference