An open-source machine learning system for the end-to-end data science lifecycle from data preparation to model serving.
Apache SystemDS is an open-source machine learning system that supports the entire data science lifecycle from data preparation and cleaning to model training, debugging, and serving. It allows users to specify ML algorithms in a high-level language with R-like syntax or through Python and Java APIs, while automatically generating optimized runtime plans for local, distributed, or GPU-based execution.
It is aimed at data scientists, ML engineers, and researchers who need a unified system for developing, optimizing, and deploying machine learning pipelines across different computational environments.
Developers choose SystemDS for its automatic optimization of ML workflows across multiple backends (including Spark and GPUs), its support for declarative programming, and its comprehensive coverage of the end-to-end data science lifecycle in a single open-source platform.
Supports R-like syntax and Python/Java APIs with built-in ML primitives, making it accessible for data scientists familiar with these languages, as highlighted in the README.
Automatically generates hybrid runtime plans combining local and distributed operations on Apache Spark, optimizing execution without manual tuning, per the project's key features.
Includes backends for GPUs and federated learning, providing flexibility for high-performance computing and privacy-preserving scenarios, as noted in the documentation.
Unifies data preparation, model training, debugging, and serving in one system, streamlining the ML workflow from start to finish, based on the overview.
Setting up multiple backends such as Spark and GPUs means managing extra dependencies (a Java runtime at minimum) or building from source, which can be more cumbersome than installing a self-contained pip package like scikit-learn.
Has a smaller user base and fewer pre-built models or integrations than mainstream ML frameworks, limiting out-of-the-box functionality and community support.
Automatic optimization and distributed backends may introduce unnecessary overhead for small-scale or straightforward ML operations, making it less efficient than lightweight libraries.
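As a rough illustration of the hybrid planning idea mentioned above, the following Python sketch assigns each operation to local or Spark execution by comparing a worst-case dense size estimate against a memory budget. It is a toy model and not SystemDS code: `MatrixOp`, `plan`, and the 2 GiB budget are hypothetical names and values chosen for illustration, and a real optimizer would weigh additional factors such as sparsity and intermediate results.

```python
from dataclasses import dataclass

BYTES_PER_DOUBLE = 8  # size of one dense double-precision cell


@dataclass(frozen=True)
class MatrixOp:
    """Hypothetical description of one operation's output matrix."""
    name: str
    rows: int
    cols: int

    def dense_bytes(self) -> int:
        # Worst-case dense size; ignores sparsity and intermediates.
        return self.rows * self.cols * BYTES_PER_DOUBLE


def plan(ops, local_budget_bytes):
    """Assign each op to 'LOCAL' or 'SPARK' based on its size estimate."""
    return [
        (op.name, "LOCAL" if op.dense_bytes() <= local_budget_bytes else "SPARK")
        for op in ops
    ]


if __name__ == "__main__":
    ops = [
        MatrixOp("t(X) %*% X", 100, 100),      # 80,000 bytes
        MatrixOp("X %*% W", 50_000_000, 100),  # 40,000,000,000 bytes
    ]
    # Assumed 2 GiB budget for local, in-memory execution.
    for name, backend in plan(ops, local_budget_bytes=2 * 1024**3):
        print(f"{name}: {backend}")
```

Under these assumptions the small product stays local while the large one is sent to Spark. This also shows why small workloads gain little from a distributed backend: the planner would never choose it for them.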