An in-memory machine learning library for Scala, built on Breeze, with a scikit-learn-like API designed for use in parallel and distributed systems.
doddle-model is a Scala-based machine learning library designed for in-memory processing, built on top of the Breeze numerical library. It provides a lightweight alternative to heavier frameworks like Spark ML, focusing on immutability and ease of use in parallel and distributed environments. The library exposes functionality through a scikit-learn-like API implemented with typeclasses for idiomatic Scala usage.
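To illustrate how a scikit-learn-like `fit`/`predict` API can be expressed with typeclasses while keeping estimators immutable, here is a minimal plain-Scala sketch. It is not doddle-model's actual API. The names `Regressor`, `MeanRegressor` and `RegressorSyntax` are hypothetical, and the "model" simply predicts the mean of the training targets. The point is the pattern: the typeclass instance supplies the behaviour, and `fit` returns a new value rather than mutating the old one.

```scala
// Minimal sketch of a scikit-learn-style API built on Scala typeclasses.
// All names here are hypothetical illustrations, NOT doddle-model's real API.
object TypeclassEstimatorSketch {

  // Typeclass describing how a regressor of type A is fitted and used.
  trait Regressor[A] {
    def fit(model: A, x: Vector[Vector[Double]], y: Vector[Double]): A
    def predict(model: A, x: Vector[Vector[Double]]): Vector[Double]
  }

  // Immutable estimator: `mean` is None until the model has been fitted.
  final case class MeanRegressor(mean: Option[Double] = None)

  // Typeclass instance: fitting returns a copy; the original stays unfitted.
  implicit val meanRegressor: Regressor[MeanRegressor] = new Regressor[MeanRegressor] {
    def fit(model: MeanRegressor, x: Vector[Vector[Double]], y: Vector[Double]): MeanRegressor =
      model.copy(mean = Some(y.sum / y.length))

    def predict(model: MeanRegressor, x: Vector[Vector[Double]]): Vector[Double] = {
      val m = model.mean.getOrElse(throw new IllegalStateException("model is not fitted"))
      x.map(_ => m) // one prediction (the training mean) per input row
    }
  }

  // Extension methods so callers can write model.fit(x, y) and model.predict(x).
  implicit class RegressorSyntax[A](val model: A) extends AnyVal {
    def fit(x: Vector[Vector[Double]], y: Vector[Double])(implicit r: Regressor[A]): A =
      r.fit(model, x, y)
    def predict(x: Vector[Vector[Double]])(implicit r: Regressor[A]): Vector[Double] =
      r.predict(model, x)
  }

  def main(args: Array[String]): Unit = {
    val x = Vector(Vector(1.0), Vector(2.0), Vector(3.0))
    val y = Vector(2.0, 4.0, 6.0)
    val unfitted = MeanRegressor()
    val fitted = unfitted.fit(x, y)
    println(fitted.predict(x)) // predictions from the fitted copy
    println(unfitted.mean)     // the original estimator is unchanged
  }
}
```

Because `fit` hands back a new `MeanRegressor`, the unfitted and fitted estimators can coexist and be shared freely. Adding a new model type only requires a new `Regressor` instance, not a change to a base class.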
It is aimed at Scala developers building machine learning applications that require in-memory processing, especially those working in concurrent or distributed systems built with frameworks such as Akka or Apache Beam. It suits projects where datasets fit into RAM and immutability is a priority.
Developers choose doddle-model for its lightweight, immutable design that simplifies parallel programming and its familiar scikit-learn-inspired API tailored for Scala. It offers deployment flexibility, allowing fitted models to be used anywhere from standalone apps to distributed systems, without the overhead of larger frameworks.
🍰 doddle-model: machine learning in Scala.
Built on Breeze for high-performance linear algebra and numerical computation, with optional native BLAS/LAPACK libraries (via breeze-natives) that can substantially speed up computation.
Estimators are immutable, so they carry no shared mutable state and are safe to use in parallel code, which suits concurrent systems such as Akka.
Exposes a scikit-learn-like API implemented with Scala typeclasses, easing adoption for those already comfortable with Python's scikit-learn.
Fitted models can be deployed anywhere from standalone applications to distributed systems, avoiding the overhead of heavier frameworks like Spark ML.
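The immutability point above can be made concrete with a small plain-Scala sketch that shares one fitted model across several concurrent `Future`s. `FittedLinearModel` is a hypothetical stand-in for a fitted estimator, not a doddle-model class. Because none of its fields can change after construction, every task reads the same instance without locks or defensive copies.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelPredictionSketch {
  // Hypothetical fitted model: an immutable linear function y = weight * x + bias.
  final case class FittedLinearModel(weight: Double, bias: Double) {
    def predict(xs: Vector[Double]): Vector[Double] = xs.map(x => weight * x + bias)
  }

  // Scores each batch in its own Future. All tasks share one model instance,
  // which is safe because nothing in it can be mutated.
  def predictInParallel(model: FittedLinearModel,
                        batches: Vector[Vector[Double]]): Vector[Vector[Double]] = {
    val futures = batches.map(batch => Future(model.predict(batch)))
    // Future.sequence keeps results in the same order as the input batches.
    Await.result(Future.sequence(futures), 10.seconds)
  }

  def main(args: Array[String]): Unit = {
    val model = FittedLinearModel(weight = 2.0, bias = 1.0)
    val batches = Vector(Vector(0.0, 1.0), Vector(2.0), Vector(3.0, 4.0))
    println(predictInParallel(model, batches))
  }
}
```

Running `main` prints `Vector(Vector(1.0, 3.0), Vector(5.0), Vector(7.0, 9.0))`. The same model value could equally be passed to Akka actors or serialized into a distributed job. The blocking `Await` is used only to keep the sketch short.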
Training is entirely in-memory, which limits use to datasets that fit into RAM and can cause OutOfMemoryErrors, as the project's performance documentation acknowledges.
Limited to Scala projects, making it less suitable for teams using other languages or relying on the extensive tooling around Python-based ML libraries.
Compared to scikit-learn or Spark ML, it has a smaller community and fewer built-in algorithms, and it is less battle-tested in production environments.
Optimal performance may require additional setup like breeze-natives, adding complexity compared to drop-in solutions with pre-optimized defaults.
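Whether a dataset "fits into RAM" can be checked up front with simple arithmetic: a dense n × d matrix of `Double`s needs about 8·n·d bytes for its values alone. The sketch below is a rough, hypothetical helper rather than anything provided by doddle-model. It ignores JVM object overhead and any intermediate copies made during training, and the 50% heap budget is an arbitrary assumption.

```scala
object MemoryEstimateSketch {
  // Approximate payload size of a dense rows x cols matrix of Doubles
  // (8 bytes each), ignoring JVM headers and copies made during training.
  def denseMatrixBytes(rows: Long, cols: Long): Long =
    rows * cols * java.lang.Double.BYTES

  // True when the raw feature matrix alone fits within the given fraction of
  // the JVM's maximum heap. The default 0.5 budget is an arbitrary assumption.
  def fitsInHeap(rows: Long, cols: Long, heapFraction: Double = 0.5): Boolean =
    denseMatrixBytes(rows, cols) <= (Runtime.getRuntime.maxMemory * heapFraction).toLong

  def main(args: Array[String]): Unit = {
    val bytes = denseMatrixBytes(rows = 10000000L, cols = 100L)
    println(f"${bytes / math.pow(1024, 3)}%.2f GiB")
    println(fitsInHeap(10000000L, 100L))
  }
}
```

For example, 10 million rows with 100 features is 8 × 10^9 bytes, about 7.45 GiB, before training allocates anything else. On a default-sized JVM heap that would likely end in an OutOfMemoryError.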
doddle-model is an open-source alternative to the following products: