A masterless, cloud-scale, fault-tolerant distributed computation system for batch and stream processing written in Clojure.
Onyx is a distributed computation system built in Clojure that handles both batch and stream processing workloads. It provides a masterless, fault-tolerant architecture for building scalable data pipelines and workflows. The system solves the problem of processing large volumes of data reliably across distributed clusters without a single point of failure.
Data engineers and Clojure developers building scalable data processing pipelines, ETL systems, or real-time stream processing applications.
Developers choose Onyx for its masterless architecture that eliminates coordination bottlenecks, its hybrid processing model that unifies batch and stream workloads, and its native Clojure implementation that offers expressive workflow definitions.
Distributed, masterless, high performance, fault tolerant data processing
Eliminates single points of failure with a masterless, horizontally scalable architecture, ensuring high availability and resilience in cloud environments.
Unifies batch and stream processing in one framework, simplifying pipeline design for diverse workloads like real-time event processing and ETL.
Leverages Clojure's functional programming strengths to provide a declarative information model in which workflows are constructed as plain data, offering a cohesive API.
Supports official plugins for Kafka, Datomic, SQL, and Amazon services, with a template for custom integrations, enhancing adaptability to various data sources.
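To make the idea of a declarative, data-described workflow concrete, here is a minimal sketch in Python. It is not Onyx's API (Onyx workflows are written as Clojure data structures); the names `workflow`, `catalog`, and `run` are hypothetical and exist only to illustrate describing a pipeline as plain data and interpreting it separately.

```python
from collections import defaultdict, deque

# Hypothetical workflow: a list of [from, to] edges. The pipeline's shape is
# plain data, not a chain of function calls.
workflow = [["in", "inc"], ["inc", "double"], ["double", "out"]]

# Hypothetical catalog: maps each processing task name to an ordinary function
# that takes one segment (a dict) and returns a new one.
catalog = {
    "inc": lambda segment: {**segment, "n": segment["n"] + 1},
    "double": lambda segment: {**segment, "n": segment["n"] * 2},
}


def run(workflow, catalog, segments, source="in", sink="out"):
    """Push each segment along the workflow edges; collect what reaches the sink."""
    downstream = defaultdict(list)
    for src, dst in workflow:
        downstream[src].append(dst)

    results = []
    queue = deque((source, segment) for segment in segments)
    while queue:
        task, segment = queue.popleft()
        if task == sink:
            results.append(segment)
            continue
        # Source and sink tasks have no catalog function; they pass data through.
        if task in catalog:
            segment = catalog[task](segment)
        for next_task in downstream[task]:
            queue.append((next_task, segment))
    return results


output = run(workflow, catalog, [{"n": 1}, {"n": 2}])
print(output)
```

Because the workflow is only a list of edges, it can be inspected, generated, or modified by other code before anything runs, which is the main appeal of the data-driven approach.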
Because it is written in pure Clojure, it requires team proficiency in the language, which limits adoption in polyglot environments and steepens the learning curve for non-Clojure developers.
Has a smaller community and fewer third-party plugins compared to giants like Spark or Flink, with some plugins unsupported in the latest version per the README.
Masterless architecture demands careful cluster configuration and management, increasing DevOps burden compared to more managed alternatives.
Onyx is an open-source alternative to the following products:
Apache Flink is an open-source, distributed stream processing framework for stateful computations over data streams, designed for high performance and low latency.
Cascading is a Java-based abstraction layer for building data processing applications on Apache Hadoop, providing a higher-level API for defining complex data workflows.
Map/Reduce is a programming model and software framework for processing large datasets in parallel across distributed clusters, popularized by Google and implemented in Hadoop.
Storm is a distributed real-time computation system for processing large streams of data, designed to be fast, scalable, and fault-tolerant.
Apache Spark is an open-source, distributed engine for large-scale data processing, supporting batch jobs, SQL queries, stream processing, and machine learning workloads.
Cascalog is a Clojure-based query language for processing data on Hadoop, offering a declarative syntax for batch processing and analytics similar to SQL or Datalog.