What are the 5 Areas of Algorithms?
Clustering
Anomaly Detection
Regression
Multi-Class Classification
Two-Class Classification
Algorithm Cheat Sheet:
Anomaly Detection
( when to use )
( One-Class SVM & PCA-Based Anomaly Detection )
Anomaly Detection: find unusual data points
One-Class SVM: >100 features, aggressive boundary
PCA-Based Anomaly Detection: fast training
Algorithm Cheat Sheet:
Clustering
( when to use )
Clustering: identify data structure
Algorithm Cheat Sheet:
Multi-Class Classification
( when to use )
Multiclass Logistic Regression
Multiclass Neural Network
Multiclass Decision Forest
Multiclass Decision Jungle
One-V-All Multiclass
Multi-Class Classification: predict one of three or more classes
Multiclass Logistic Regression: fast-training linear models
Multiclass Neural Network: accuracy, long training times
Multiclass Decision Forest: accuracy, fast training
Multiclass Decision Jungle: accuracy, small memory footprint
One-V-All Multiclass: combines multiple two-class classifiers, one per class
Algorithm Cheat Sheet:
Two-Class Classification
( when to use )
Two-Class SVM
Two-Class Locally Deep SVM
Two-Class Averaged Perceptron
Two-Class Logistic Regression
Two-Class Bayes Point Machine
Two-Class Decision Forest
Two-Class Boosted Decision Tree
Two-Class Decision Jungle
Two-Class Neural Network
Algorithm Cheat Sheet:
Regression
(when to use)
Ordinal Regression
Poisson Regression
Fast Forest Quantile Regression
Linear Regression
Bayesian Linear Regression
Neural Network Regression
Decision Forest Regression
Boosted Decision Tree Regression
What is a Pipeline?
Automated workflow of the machine learning steps
data processing to deployment
data processing > build & train > deploy and monitor
Image Classification:
Set Up Your Development Environment Steps
1. import packages
2. connect to workspace
3. create an experiment to track runs
4. create a remote compute target
Stratified Random Sampling
division of a population into smaller sub-groups based on a shared characteristic (location, income, education, etc.)
splitting the data this way gives proportional representation of each sub-group in both the train and test splits
e.g. if 5% of the population has income > $125K, the train split should have 5% > $125K and the test split should have 5% > $125K
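As a sketch of the idea (plain Python, no Azure ML dependency; the toy data, 5% proportion, and 0.75 split ratio are made up for illustration):

```python
import random
from collections import defaultdict

def stratified_split(rows, key, train_frac=0.75, seed=0):
    """Split rows into train/test so each sub-group (stratum)
    keeps the same proportion in both splits."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)   # group by the shared characteristic
    train, test = [], []
    for group in strata.values():
        rng.shuffle(group)
        cut = round(len(group) * train_frac)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

# toy population: 5% high income (> $125K), 95% not
people = [{"income_gt_125k": i < 5} for i in range(100)]
train, test = stratified_split(people, key=lambda p: p["income_gt_125k"])
# both splits keep roughly the 5% proportion of high earners
```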
SDK: Experiment
function name?
function to run an experiment?
function to run experiment from ScriptRunConfig( )?
log metrics?
Experiment Function
Experiment(workspace, name)
Run an Experiment
name_of_experiment.start_logging()
Run Experiment from ScriptRunConfig
Run.get_context( )
Log Metrics
name_of_experiment.log('metric name', metric_value)
not log_metric(), which is the metric-logging function for MLflow
End Run
name_of_experiment.complete()
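Put together, a minimal interactive-run sketch using the v1 SDK (the workspace object `ws`, the experiment name, and the metric are placeholder assumptions):

```python
from azureml.core import Experiment

# ws: an existing Workspace object, e.g. from Workspace.from_config()
exp = Experiment(workspace=ws, name="my-experiment")

run = exp.start_logging()       # start an interactive run
run.log("accuracy", 0.93)       # log a named metric (value is illustrative)
run.complete()                  # end the run
```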
SDK: What class is used to create a script configuration?
what is it used for?
how to submit the class?
script_config = ScriptRunConfig(source_directory, script)
new_run = experiment_name.submit(config=script_config)
new_run.wait_for_completion( )
blocks until the remote run finishes, so the local session does not exit before the script completes
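A fuller sketch of the script-run pattern (the source directory, script name, workspace object `ws`, and experiment name are assumptions):

```python
from azureml.core import Experiment, ScriptRunConfig

# folder and script name are placeholders
script_config = ScriptRunConfig(source_directory="./src",
                                script="train.py")

exp = Experiment(workspace=ws, name="script-experiment")
new_run = exp.submit(config=script_config)
new_run.wait_for_completion(show_output=True)  # block until the remote run finishes
```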
SDK: .get_context( )
retrieve and access the current run
used inside the training script to log metrics when the experiment is submitted via the ScriptRunConfig( ) class
SDK: Environment
what’s the function?
add dependencies from conda (2 steps)?
Function
Environment()
Bring in Packages
deps = CondaDependencies.create( conda_packages = [] )
env_name.python.conda_dependencies = deps
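The two steps in sequence (the environment name and package names are illustrative assumptions):

```python
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

env = Environment(name="training-env")

# step 1: declare conda packages (names here are placeholders)
deps = CondaDependencies.create(conda_packages=["scikit-learn", "pandas"])

# step 2: attach the dependencies object to the environment
env.python.conda_dependencies = deps
```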
SDK: Compute Cluster
package used?
how to provision?
how to create?
package used
from azureml.core.compute import AmlCompute
provisioning method
AmlCompute.provisioning_configuration( )
creation method
AmlCompute.create()
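Sketch of provisioning and creating a cluster (VM size, node counts, and cluster name are placeholder choices; `ComputeTarget.create` is the usual entry point, which `AmlCompute` inherits):

```python
from azureml.core.compute import AmlCompute, ComputeTarget

# provisioning configuration (sizes are illustrative)
compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D2_V2",
                                                       min_nodes=0,
                                                       max_nodes=4)

# create the cluster in the workspace ws (dispatches to AmlCompute here)
cluster = ComputeTarget.create(ws, "cpu-cluster", compute_config)
cluster.wait_for_completion(show_output=True)
```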
SDK: Automate Model Training AzureML
configuration function?
assign cluster function?
pipeline data function?
pipeline step function?
build pipeline function?
experiment & run functions?
Build Pipeline Preliminary Steps
Build Pipeline Additional Steps
Step One: Configuration Function
from azureml.core.runconfig import RunConfiguration
RunConfiguration( )
Step Two: Assign Cluster
config_name.target = name_of_compute_cluster
config_name.environment = env_name
Step Three: Pipeline Data Function
from azureml.pipeline.core import PipelineData
PipelineData( )
Step Four: Define Pipeline Steps Function
from azureml.pipeline.steps import PythonScriptStep
PythonScriptStep( )
use it twice: once for data prep and once for training the model
Step Five: Build Pipeline Function
Pipeline( )
Step Six: Experiment Function & Submit
Experiment( )
experiment_name.submit( )
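The steps above strung together as one hedged sketch (script names, directories, and the pre-existing `ws`, `cluster`, and `env` objects are placeholder assumptions):

```python
from azureml.core import Experiment
from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

# steps 1-2: run configuration bound to a compute cluster and environment
run_config = RunConfiguration()
run_config.target = cluster          # existing AmlCompute cluster
run_config.environment = env         # existing Environment

# step 3: pipeline data passed between steps
prepped = PipelineData("prepped", datastore=ws.get_default_datastore())

# step 4: two script steps - data prep, then training
prep_step = PythonScriptStep(name="prep data",
                             source_directory="./src",
                             script_name="prep.py",
                             outputs=[prepped],
                             runconfig=run_config)
train_step = PythonScriptStep(name="train model",
                              source_directory="./src",
                              script_name="train.py",
                              inputs=[prepped],
                              runconfig=run_config)

# step 5: build the pipeline
pipeline = Pipeline(workspace=ws, steps=[prep_step, train_step])

# step 6: run the pipeline as an experiment
exp = Experiment(workspace=ws, name="training-pipeline")
run = exp.submit(pipeline)
run.wait_for_completion(show_output=True)
```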
Command Line Arguments
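Hyperparameters are typically passed to the training script through the arguments of ScriptRunConfig and read with the standard argparse module; a minimal sketch (the flag names and values are made up):

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # flag names and defaults here are illustrative, not from the source
    parser.add_argument("--reg-rate", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=10)
    return parser.parse_args(argv)

# simulate the command line a run would pass in
args = parse_args(["--reg-rate", "0.1", "--epochs", "5"])
# args.reg_rate == 0.1, args.epochs == 5
```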