Apache Spark is an open-source unified analytics engine for large-scale data processing, providing high-level APIs in Java, Scala, Python, and R. There are currently 5 open-source alternatives to Apache Spark, with a combined total of 74.2k GitHub stars. The most common language among these projects is Python.
A Python ETL framework for stream processing, real-time analytics, and building live LLM/RAG pipelines, powered by a scalable Rust engine.
A high-performance distributed map/reduce system with DAG execution, written in Go, supporting standalone or distributed modes.
A distributed computation system written in Go for parallel and cluster processing, similar to Hadoop MapReduce and Spark.
A distributed query execution engine that extends Apache DataFusion to run SQL queries in parallel across multiple nodes.
A Python framework and Rust-based distributed processing engine for stateful event and stream processing.