Three Processes we combine when discussing MLOps
Describe CI/CD w.r.t. MLOps
It is preferable that Training Data is not stored in Source Control. Store it in a database or data lake and retrieve it directly from the source via a Data Asset (T/F)
FALSE. Retrieval should be through a Datastore since the Datastore will store all credentials for connecting to the storage source.
Otherwise TRUE: you should NOT store your training data in Source Control.
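A minimal sketch of what that looks like in practice, assuming the Azure ML CLI v2 YAML schema: a data asset definition that points at a path on a registered datastore (here the default `workspaceblobstore`; the asset name and file path are placeholders). The datastore, not the repo, holds the connection credentials.

```yaml
# Hypothetical data-asset definition (e.g. train-data.yml); name and path are illustrative.
$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: diabetes-training
version: 1
type: uri_file
# azureml://datastores/... resolves through the datastore, which stores the credentials
path: azureml://datastores/workspaceblobstore/paths/training-data/train.csv
```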
BRP IRA
DevOps tools in Azure DevOps and GitHub and what we get out of them
Azure DevOps:
* Azure Boards - sprint planning and tracking boards
* Azure Repos - code repositories
* Azure Pipelines - CI/CD
GitHub:
* Issues - sprint planning and tracking boards
* Repos - code repositories
* Actions - automated workflows
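As a sketch of the GitHub side, a minimal Actions workflow that runs tests on every push (file path, `requirements.txt`, and the `tests/` folder are assumptions, not from the source):

```yaml
# Hypothetical .github/workflows/ci.yml
name: ci
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: pytest tests/
```

Azure Pipelines plays the equivalent role in Azure DevOps, with its own YAML syntax.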
c ad/gh s d p t n
Best Practices for MLOps Repository Structure (what to put in which folders)
.cloud: cloud-specific code (ARM templates, etc.)
.ad/.github: Azure DevOps or GitHub artifacts like YAML pipelines to automate workflows.
src: contains any code (Python) used for ML workloads
docs: Markdown files or other documentation describing the project.
pipelines: Azure Machine Learning pipelines definitions.
tests: contains unit and integration tests
notebooks: contains notebooks, mostly used for experimentation.
What to use:
- for building Azure ML Pipeline templates
- for Local Development
The Python SDK or YAML + the CLI; and VS Code, respectively.
non func mod unit wkfl
Steps for refactoring Notebooks to production Python code
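The mnemonic above likely expands to: remove nonessential code, extract functions, organize into modules, add unit tests, and wire the result into a workflow. A minimal sketch of the "extract functions + unit test" step, with an illustrative helper name not taken from the source: a notebook cell that mutated a dataframe inline becomes a pure function you can test.

```python
# Hypothetical refactor: notebook cell logic pulled into a pure, testable function.
from statistics import median
from typing import Optional


def fill_missing_ages(ages: list[Optional[float]]) -> list[float]:
    """Replace missing (None) ages with the median of the observed ages."""
    observed = [a for a in ages if a is not None]
    med = median(observed)
    return [a if a is not None else med for a in ages]


if __name__ == "__main__":
    print(fill_missing_ages([25.0, None, 35.0]))  # median of 25 and 35 is 30.0
```

Because the function takes plain inputs and returns plain outputs, a unit test in `tests/` can assert on it directly, with no notebook kernel involved.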
Three ways to create an Azure ML Pipeline (aka Pipeline Job)
What YAML would go into pipeline-job.yml
For the pipeline-job.yml approach you’d define:
- type: pipeline
- jobs for transform-job and train-job
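A sketch of that file, assuming the Azure ML CLI v2 pipeline job schema; the component paths, input names, and data-asset reference are placeholders:

```yaml
# Hypothetical pipeline-job.yml; component files and inputs are illustrative.
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: transform-and-train
jobs:
  transform-job:
    type: command
    component: ./components/transform.yml
    inputs:
      raw_data:
        type: uri_file
        path: azureml:diabetes-training:1
    outputs:
      transformed_data:
  train-job:
    type: command
    component: ./components/train.yml
    inputs:
      training_data: ${{parent.jobs.transform-job.outputs.transformed_data}}
```

You would then submit it with `az ml job create --file pipeline-job.yml` (CLI v2 with the ml extension).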