Enables distributed TensorFlow training and inferencing on Apache Spark and Hadoop clusters with minimal code changes.
TensorFlowOnSpark is an open-source framework that enables distributed TensorFlow programs to run on Apache Spark and Hadoop clusters. It allows data scientists and engineers to perform scalable deep learning training and inferencing directly within their existing big data infrastructure, eliminating the need for separate machine learning clusters. The framework minimizes code changes required to adapt existing TensorFlow applications for distributed execution.
Data engineers and machine learning practitioners working in organizations with existing Hadoop/Spark clusters who need to run TensorFlow workloads at scale. It's particularly valuable for teams looking to integrate deep learning into their big data pipelines without managing separate infrastructure.
Developers choose TensorFlowOnSpark because it provides a seamless bridge between TensorFlow and big data ecosystems, allowing them to leverage existing cluster resources while maintaining full TensorFlow functionality. Its minimal code change requirement and support for flexible data ingestion make it practical for production deployments.
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.
Migrating existing TensorFlow programs requires fewer than 10 lines of code changes, as emphasized in key features, reducing development time for integration.
Supports all TensorFlow functionalities including synchronous/asynchronous training, model/data parallelism, inferencing, and TensorBoard, ensuring no feature loss in distributed environments.
Offers multiple data ingestion options: direct HDFS access via TensorFlow APIs or Spark RDD feeding through TFNode.DataFeed, accommodating various big data sources.
Enables direct server-to-server communication for faster distributed training when available, optimizing performance in cluster settings.
Does not support Windows due to an unresolved issue, restricting development and deployment options for teams on that OS.
TensorFlow 2.x breaks API compatibility, requiring specific versioning and adjustments, which complicates upgrades and maintenance for existing codebases.
Deployment on distributed clusters like YARN or AWS EC2 requires detailed environment-specific guides, adding initial configuration overhead and learning curve.