An open-source, in-memory, distributed batch and stream processing engine for Java applications.
Hazelcast Jet is a distributed stream and batch processing engine built for high-performance data pipelines. It processes large volumes of real-time events or static datasets with predictable low latency, automatically scaling across a cluster. Developers use its Java API and dataflow programming model to build applications that aggregate, transform, and analyze data from various sources.
Jet is aimed at Java developers and data engineers building high-throughput, low-latency data processing applications that must scale across multiple nodes. It particularly suits teams that need exactly-once processing guarantees without complex infrastructure.
Jet combines sub-10ms latency at millions of events per second, built-in fault tolerance without external dependencies, and automatic cluster scaling. Unlike some alternatives, it provides exactly-once processing guarantees by keeping state in distributed in-memory storage rather than requiring a distributed file system.
Distributed Stream and Batch Processing
Achieves predictable low latency under load, processing millions of events per second on a single node with cooperative multithreading, as benchmarked in the README.
Provides exactly-once processing guarantees without external dependencies such as ZooKeeper, using distributed in-memory storage and the Chandy-Lamport algorithm, as highlighted in the features.
Applications scale up or down automatically as nodes are added or removed, preserving computational state and ensuring continuous processing, as described in the architecture.
First-class support for event time processing with distributed watermarks, effectively managing out-of-order event data, which is crucial for real-time streams.
Out-of-the-box support for Kafka, Hadoop, JDBC, Elasticsearch, and more, simplifying integration with common data sources and sinks.
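The event-time point above (watermarks that tolerate out-of-order events) can be illustrated with a small, library-free sketch. It does not use Jet's API and is not how Jet implements distributed watermarks. It is a single-threaded illustration of the general idea, and the class name `WatermarkSketch` and the parameters `windowSize` and `allowedLag` are hypothetical names chosen for this example. Events are counted in fixed-size (tumbling) windows, the watermark trails the newest timestamp seen by `allowedLag`, and a window is emitted only once the watermark passes its end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Single-threaded illustration of event-time tumbling windows closed by a watermark. */
public class WatermarkSketch {
    private final long windowSize;  // window length, in the same unit as the timestamps
    private final long allowedLag;  // how far the watermark trails the newest timestamp
    private final TreeMap<Long, Long> openWindows = new TreeMap<>(); // window start -> count
    private long watermark = Long.MIN_VALUE;
    private long dropped = 0;

    public WatermarkSketch(long windowSize, long allowedLag) {
        this.windowSize = windowSize;
        this.allowedLag = allowedLag;
    }

    /** Accepts one event timestamp; returns the windows it closed as "[start,end)=count". */
    public List<String> accept(long timestamp) {
        List<String> closed = new ArrayList<>();
        long windowStart = Math.floorDiv(timestamp, windowSize) * windowSize;

        // The event's window was already emitted, so the event is too late to count.
        if (windowStart + windowSize <= watermark) {
            dropped++;
            return closed;
        }
        openWindows.merge(windowStart, 1L, Long::sum);

        // The watermark only moves forward, so out-of-order events never pull it back.
        watermark = Math.max(watermark, timestamp - allowedLag);

        // Emit every open window whose end is at or before the watermark.
        while (!openWindows.isEmpty() && openWindows.firstKey() + windowSize <= watermark) {
            Map.Entry<Long, Long> e = openWindows.pollFirstEntry();
            closed.add("[" + e.getKey() + "," + (e.getKey() + windowSize) + ")=" + e.getValue());
        }
        return closed;
    }

    public long dropped() {
        return dropped;
    }

    public static void main(String[] args) {
        WatermarkSketch sketch = new WatermarkSketch(10, 5);
        long[] timestamps = {1, 4, 12, 3, 9, 16, 2, 27}; // deliberately out of order
        for (long ts : timestamps) {
            for (String window : sketch.accept(ts)) {
                System.out.println(window);
            }
        }
        System.out.println("dropped=" + sketch.dropped());
    }
}
```

In this run the events with timestamps 3 and 9 arrive after 12 but are still counted in the first window, because the watermark has not yet passed that window's end. The event with timestamp 2 arrives after the first window was emitted and is dropped. A larger `allowedLag` tolerates more disorder at the cost of later results.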
Limited to Java development, which may not suit teams using other programming languages or seeking a polyglot data processing framework, as evidenced by the API examples.
Requires manual setup and maintenance of the Jet cluster, including deployment and scaling, unlike managed services that handle infrastructure automatically.
Relies on in-memory storage for state, which can be costly or impractical for datasets that exceed available RAM, limiting use for extremely large-scale batch jobs.
With development moved to the core Hazelcast repository, the standalone Jet version might face reduced updates or breaking changes, as noted in the README, potentially affecting long-term stability.
Hazelcast Jet is an open-source alternative to the following products:
Apache Spark is an open-source unified analytics engine for large-scale data processing, providing high-level APIs in Java, Scala, Python, and R.
Apache Storm is a distributed real-time computation system for processing large volumes of high-velocity data, commonly used for stream processing.