How to specify the data set as input using Python SDK code
The data must be in this form and must specify a certain column…
Create an Input instance, initializing it with an AssetTypes value and the path to your data asset:
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

training_data_input = Input(
    type=AssetTypes.MLTABLE,
    path="azureml:input-data-automl:1",
)
For AutoML tasks such as classification, the data must be in tabular form (an MLTable data asset) and include a target column.
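Before registering data as an MLTable asset, it can help to confirm locally that the table actually contains the target column. The sketch below is a minimal pre-flight check that uses only Python's csv module. validate_training_table and the sample columns (Pregnancies, BMI, Age) are hypothetical and made up for illustration; they are not part of the Azure ML SDK.

```python
import csv
import io


def validate_training_table(csv_text: str, target_column: str) -> list[str]:
    """Hypothetical helper (not an Azure ML API).

    Returns the feature column names, or raises ValueError if the table
    lacks the target column or has no data rows.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    if target_column not in columns:
        raise ValueError(f"target column {target_column!r} not found in {columns}")
    rows = list(reader)
    if not rows:
        raise ValueError("table has a header but no data rows")
    # Every column other than the target is treated as a feature.
    return [c for c in columns if c != target_column]


# Made-up sample data; "Diabetic" is the target column used in these notes.
sample = "Pregnancies,BMI,Age,Diabetic\n2,31.2,45,1\n0,22.5,29,0\n"
print(validate_training_table(sample, "Diabetic"))
```

Passing a column name that is not in the header raises ValueError, which is the same kind of mistake that target_column_name would otherwise surface only after the job is submitted.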
Explain what this code is doing:
from azure.ai.ml import automl
classification_job = automl.classification(
compute="aml-cluster",
experiment_name="auto-ml-class-dev",
training_data=my_training_data_input,
target_column_name="Diabetic",
primary_metric="accuracy",
n_cross_validations=5,
enable_model_explainability=True
)
What my_training_data_input and primary_metric are.
This code uses the automl module from the Python SDK v2 to create a classification job instance. Notably, it:
- Uses my_training_data_input as the training data source. This should reference an MLTable data asset in your workspace, since AutoML requires MLTable input.
- Sets primary_metric to “accuracy”, the performance metric AutoML uses to determine the best model.
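To make the role of primary_metric concrete, here is a simplified sketch of choosing the best model by accuracy. The trial names and predictions are made-up toy data, and pick_best_trial is a hypothetical helper. The real AutoML job scores each trial on validation data (in this example, via 5-fold cross-validation) and performs this selection internally.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def pick_best_trial(trial_predictions: dict[str, list[int]],
                    y_true: list[int]) -> tuple[str, float]:
    """Hypothetical helper: score every trial on the primary metric, keep the top one."""
    scores = {name: accuracy(y_true, preds) for name, preds in trial_predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy labels and predictions from three imaginary trials.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
trials = {
    "trial_1": [1, 0, 0, 1, 0, 1, 1, 0],  # 6 of 8 correct
    "trial_2": [1, 0, 1, 1, 0, 0, 1, 1],  # 7 of 8 correct
    "trial_3": [0, 0, 1, 0, 0, 0, 1, 0],  # 6 of 8 correct
}
print(pick_best_trial(trials, y_true))  # ('trial_2', 0.875)
```

Swapping accuracy for a different scoring function is the conceptual equivalent of choosing a different primary_metric: the same trials could rank differently.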
Get a list of available metrics for training a classification model
Use the ClassificationPrimaryMetrics enum to get a list of them:
from azure.ai.ml.automl import ClassificationPrimaryMetrics

list(ClassificationPrimaryMetrics)
TM TTM MT EET
Four limits you can set once you instantiate an AutoML experiment or job
The four limits you’d set for the job:
* timeout_minutes - int. Maximum minutes before the whole AutoML experiment is terminated
* trial_timeout_minutes - int. Maximum minutes a single trial can take
* max_trials - int. Maximum number of trials (models) that will be trained
* enable_early_termination - bool. End the experiment if the score isn’t improving over the short term
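The toy simulation below sketches how the four limits might interact during a sweep. It is a simplified model, not AutoML's implementation. SweepLimits and run_sweep are hypothetical, each trial is represented as a made-up (score, minutes) pair, and the patience rule for early termination is an assumption, since AutoML's actual early-termination policy is internal.

```python
from dataclasses import dataclass


@dataclass
class SweepLimits:
    """Hypothetical stand-in for the four AutoML limits (not an SDK class)."""
    timeout_minutes: int
    trial_timeout_minutes: int
    max_trials: int
    enable_early_termination: bool
    patience: int = 3  # assumption: non-improving trials tolerated before stopping


def run_sweep(trials: list[tuple[float, int]], limits: SweepLimits) -> list[float]:
    """Simulate a sweep over (score, minutes) pairs; return scores of completed trials."""
    elapsed = 0
    best = float("-inf")
    since_improvement = 0
    completed: list[float] = []
    # max_trials: never start more than this many trials.
    for score, minutes in trials[: limits.max_trials]:
        # trial_timeout_minutes: a slow trial is cut off at the per-trial limit.
        duration = min(minutes, limits.trial_timeout_minutes)
        # timeout_minutes: stop if this trial would overrun the experiment budget.
        if elapsed + duration > limits.timeout_minutes:
            break
        elapsed += duration
        if minutes > limits.trial_timeout_minutes:
            continue  # the timed-out trial produces no score
        completed.append(score)
        if score > best:
            best, since_improvement = score, 0
        else:
            since_improvement += 1
        # enable_early_termination: stop once the score stalls for `patience` trials.
        if limits.enable_early_termination and since_improvement >= limits.patience:
            break
    return completed


# Made-up (score, minutes) pairs for eight candidate trials.
results = [(0.80, 5), (0.84, 6), (0.90, 12), (0.83, 4),
           (0.82, 5), (0.88, 3), (0.91, 4), (0.92, 4)]
limits = SweepLimits(timeout_minutes=60, trial_timeout_minutes=10,
                     max_trials=8, enable_early_termination=True, patience=2)
print(run_sweep(results, limits))  # [0.8, 0.84, 0.83, 0.82]
```

In this toy run, the 12-minute trial is cut off by trial_timeout_minutes, and early termination stops the sweep after two non-improving trials, before it reaches the later, better-scoring ones. That illustrates the trade-off: early termination saves compute but can miss late improvements.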
The method you call when you want to set limits on your AutoML job
Call the job’s set_limits method:
classification_job.set_limits(
    timeout_minutes=10,
    trial_timeout_minutes=10,
    max_trials=5,
    enable_early_termination=True,
)
Code to submit your AutoML Job
# submit the new job
returned_job = ml_client.jobs.create_or_update(classification_job)
Code to get the URL to monitor your job
# get the studio URL so you can monitor your job
aml_url = returned_job.studio_url
print("Monitor job here:", aml_url)
The method you call when you want to set optional training properties on your AutoML job
Call the job’s set_training method to set optional training properties:
classification_job.set_training(
    blocked_training_algorithms=["LogisticRegression"],
    enable_onnx_compatible_models=True,
)
The code above blocks LogisticRegression from being used to train models and enables the creation of ONNX-compatible models.
See the set_training reference documentation for the full list of training properties.