Elephas is a Keras extension for distributed deep learning on Apache Spark, enabling data-parallel training at scale.
Elephas is a Python library that extends Keras to enable distributed deep learning on Apache Spark. It allows users to train Keras models in a data-parallel fashion across Spark clusters, making it possible to scale deep learning workflows to large datasets. The library integrates seamlessly with Spark's RDDs, DataFrames, and MLlib, providing a bridge between Keras' ease of use and Spark's distributed computing power.
Elephas is aimed at data scientists and machine learning engineers who use Keras for deep learning and need to scale training to large datasets on Apache Spark clusters. It also suits teams already invested in the Spark ecosystem who want to add deep learning to their workflows.
Elephas offers a straightforward way to distribute Keras model training without leaving the familiar Keras API, reducing the complexity of implementing distributed deep learning. It leverages Spark's robust distributed data processing capabilities, making it a practical choice for organizations with existing Spark infrastructure.
Distributed Deep Learning with Keras & Spark
Elephas preserves the familiar Keras API, allowing users to define and compile models as usual, then distribute training with minimal code changes, as shown in the basic Spark integration example.
Integrates with Spark RDDs, DataFrames, and MLlib, enabling deep learning within existing Spark pipelines, such as using the ElephasEstimator for Spark ML workflows.
Supports data-parallel training and distributed prediction/evaluation on large datasets, leveraging Spark's cluster capabilities for scalable deep learning without leaving the Keras framework.
Facilitates fast iteration on distributed models by maintaining Keras' simplicity, ideal for experimenting with large-scale deep learning in Spark environments.
Elephas only supports data-parallel algorithms, not model-parallelism, which restricts its use for complex models that need to be split across workers, as acknowledged in the README's discussion section.
Elephas requires a Spark cluster to be set up and can hit driver-memory bottlenecks: the README notes that driver memory must be increased for large models, which adds operational complexity.
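In practice, driver memory is raised through the standard Spark submission knobs; the values and the script name below are illustrative placeholders, not Elephas recommendations:

```shell
# Illustrative only: give the Spark driver more headroom when the
# serialized model and collected results are large.
spark-submit \
  --driver-memory 8g \
  --conf spark.driver.maxResultSize=4g \
  your_training_script.py  # placeholder for your Elephas job
```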
Hyper-parameter optimization features were removed in version 3.0.0, and maintenance has moved to a new repository, which may lead to instability or slower updates for users relying on these capabilities.