An open-source CLI tool for implementing CI/CD workflows with a focus on MLOps, automating ML experiments and reporting.
Continuous Machine Learning (CML) is an open-source CLI tool that brings continuous integration and continuous delivery (CI/CD) practices to machine learning projects. It automates key MLOps workflows such as model training, evaluation, and reporting directly within Git-based platforms like GitHub, GitLab, and Bitbucket. CML automatically generates visual reports with metrics and plots in pull requests, helping teams make data-driven decisions.
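A typical setup runs training in a CI pipeline and posts a CML report on each push. The following GitHub Actions workflow is a minimal sketch: the workflow name, `train.py`, and the `metrics.txt`/`plot.png` filenames are illustrative, while `iterative/setup-cml` and `cml comment create` are the real action and CLI command.

```yaml
# .github/workflows/cml.yaml -- hypothetical minimal workflow
name: train-and-report
on: [push]
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: iterative/setup-cml@v2   # installs the CML CLI
      - name: Train model
        run: |
          pip install -r requirements.txt
          python train.py   # assumed to write metrics.txt and plot.png
      - name: Create CML report
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          cat metrics.txt >> report.md
          echo '![](./plot.png "Training plot")' >> report.md
          cml comment create report.md   # posts the report as a PR/commit comment
```

The report lands as a comment on the triggering commit or pull request, so reviewers see metrics and plots without leaving the Git platform.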
Machine learning engineers and data science teams who use Git-based platforms (GitHub, GitLab, Bitbucket) for version control and want to automate and standardize their ML experimentation, training, and reporting workflows. It is particularly useful for teams practicing MLOps and seeking to integrate rigorous engineering practices into their ML development.
Developers choose CML because it enables building custom ML platforms using existing Git and cloud infrastructure without requiring additional databases or services. Its unique selling point is the seamless automation of ML reporting within CI/CD pipelines, combined with native integration with DVC for data versioning and the ability to provision cloud runners (AWS, Azure, GCP, Kubernetes) for compute-intensive tasks directly from workflows.
♾️ CML - Continuous Machine Learning | CI/CD for ML
Generates visual reports with metrics and plots automatically in pull requests, as shown in the example where metrics.txt and plot.png are appended to a markdown file and posted via cml comment create.
Manages ML experiments with Git branching strategies, bringing rigorous engineering practices and data-driven decision-making into existing development workflows, as emphasized in the CML principles.
Launches cloud instances (AWS, Azure, GCP, Kubernetes) for compute-intensive tasks directly from CI/CD pipelines, with support for spot instances and automatic shutdown, demonstrated in the Train-in-the-cloud workflow.
Builds ML platforms using existing Git and cloud infrastructure without requiring extra databases or services, reducing vendor lock-in, as stated in the philosophy section.
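The cloud-runner flow described above can be sketched as a two-job workflow: the first job provisions a runner with `cml runner launch`, and the second runs on it. This is a sketch assuming AWS; the region, instance type, label, and `train.py` script are placeholder values, and the secrets must be configured in the repository.

```yaml
jobs:
  launch-runner:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: iterative/setup-cml@v2
      - name: Provision a spot instance on AWS
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          cml runner launch \
            --cloud=aws \
            --cloud-region=us-east-1 \
            --cloud-type=g4dn.xlarge \
            --cloud-spot \
            --labels=cml-gpu
  train:
    needs: launch-runner
    runs-on: [self-hosted, cml-gpu]   # picks up the runner registered above
    steps:
      - uses: actions/checkout@v4
      - name: Train on the provisioned instance
        run: python train.py   # hypothetical compute-intensive training step
```

The provisioned instance registers itself as a self-hosted runner under the given label and, per CML's documented behavior, shuts itself down after an idle period, so compute is only paid for while jobs run.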
Requires configuration of personal access tokens, cloud credentials, and environment variables, a setup that can be error-prone and adds a security-management burden, especially for teams new to CI/CD.
Advanced features like data versioning and experiment comparison rely heavily on DVC, adding another layer to the stack that teams must install and maintain, as seen in the DVC-integrated workflow examples.
As a CLI tool, it lacks built-in GUIs for monitoring workflows or visualizing results, which might not suit teams accustomed to dashboard-based platforms like MLflow or Kubeflow.