Creating a Data Asset: URI File supported paths
./<path to file>
wasbs://<account>.blob.core.windows.net/<container>/<folder>/<file>
abfss://<file_system>@<account>.dfs.core.windows.net/<folder>/<file>
azureml://datastores/<name>/paths/<folder>/<file>

Behavior when creating a Local Data Asset
A copy of the local data is uploaded to the default datastore (workspaceblobstore) under the LocalUpload folder, making it available even when the local device is unavailable.
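For illustration, the uploaded copy then becomes addressable through the datastore URI scheme from the supported-paths list above; the LocalUpload subfolder name is generated at upload time, so the placeholders below are only indicative:

```
azureml://datastores/workspaceblobstore/paths/LocalUpload/<folder>/<file>
```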
The context for using an MLTable Data Asset
When the schema of your data is complex or frequently changes.
For MLTable Data Assets, you specify the schema definition for reading the data in the asset itself. Instead of changing how each script that uses the data reads it, you change only the schema stored in the Data Asset.
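For example, the read logic can live in a small MLTable file stored with the asset, so a schema change is made once there rather than in every consuming script (the file name below is illustrative):

```
type: mltable
paths:
  - file: ./sales.csv
transformations:
  - read_delimited:
      delimiter: ','
      header: all_files_same_headers
```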
(T/F):
- Certain Azure ML features like Automated ML require an MLTable Data Asset to understand how to read its data
- MLTable Schemas are stored in an Azure Blob, then pulled in by your job via parameter input
Describe what this code is doing:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes
my_data = Data(
    path="<supported-path>",
    type=AssetTypes.URI_FILE,
    description="<description>",
    name="<name>",
    version="<version>",
)
ml_client.data.create_or_update(my_data)

Creates a URI_FILE Data Asset (set via the type parameter). Uses <supported-path> as a placeholder for one of the supported paths, such as a local device path.
Describe three things that this code is doing and give an alternative for when the input is in JSON:
import argparse
import pandas as pd
parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
args = parser.parse_args()
df = pd.read_csv(args.input_data)
print(df.head(10))

Parses an --input_data argument pointing to your URI FILE data asset, reads it into a DataFrame with pd.read_csv, and prints the first 10 rows. If the input were JSON, use pd.read_json() instead.

Describe what this code is doing:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes
my_data = Data(
    path="<supported-path>",
    type=AssetTypes.URI_FOLDER,
    description="<description>",
    name="<name>",
    version="<version>",
)
ml_client.data.create_or_update(my_data)

Creates a URI_FOLDER Data Asset (set via the type parameter); <supported-path> points to a folder rather than a single file.

Describe what this code is doing:
import argparse
import glob
import pandas as pd
parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
args = parser.parse_args()
data_path = args.input_data
all_files = glob.glob(data_path + "/*.csv")
df = pd.concat((pd.read_csv(f) for f in all_files), sort=False)

Parses an --input_data argument pointing to your URI FOLDER data asset, globs all the .csv files under that path to create a collection of them, and concatenates them into a single DataFrame with pd.concat.

Describe what this code is doing:
type: mltable
paths:
  - pattern: ./*.txt
transformations:
  - read_delimited:
      delimiter: ','
      encoding: ascii
      header: all_files_same_headers

CLI YAML for creating an MLTable: for all the .txt files in the current folder, read them as comma-separated files encoded in ASCII.
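For intuition only, the read_delimited transformation above behaves roughly like this pandas sketch; the file names and columns are illustrative, and in Azure ML the mltable engine does this work for you:

```python
import glob
import os
import tempfile

import pandas as pd

# Stand-in folder with two comma-delimited .txt files that share a header,
# mimicking what the MLTable pattern ./*.txt with read_delimited would match
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "a.txt"), "w", encoding="ascii") as f:
    f.write("id,score\n1,0.5\n2,0.9\n")
with open(os.path.join(folder, "b.txt"), "w", encoding="ascii") as f:
    f.write("id,score\n3,0.1\n")

# Read every matching file with the delimiter/encoding from the MLTable,
# treating the first row of each file as the (shared) header
files = sorted(glob.glob(os.path.join(folder, "*.txt")))
df = pd.concat((pd.read_csv(f, sep=",", encoding="ascii") for f in files),
               ignore_index=True)
print(df)
```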
Describe what this code is doing:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes
my_data = Data(
    path="<path-including-mltable-file>",
    type=AssetTypes.MLTABLE,
    description="<description>",
    name="<name>",
    version="<version>",
)
ml_client.data.create_or_update(my_data)

Creates an MLTable Data Asset (set via the type parameter); the path points to the folder containing the MLTable file.

Describe what this code is doing:
import argparse
import mltable
import pandas
parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
args = parser.parse_args()
tbl = mltable.load(args.input_data)
df = tbl.to_pandas_dataframe()
print(df.head(10))

Loads the data asset passed via --input_data with mltable.load, then converts it to a Pandas DataFrame with to_pandas_dataframe() (a common conversion approach) and prints the first 10 rows.