A unified platform for big data stream and batch processing on Hadoop YARN with enterprise-grade operability.
Apache Apex is a unified big data processing platform that handles both stream and batch workloads natively on Hadoop YARN. It provides enterprise-grade operability features such as fault tolerance and state management while simplifying the development of applications for ingestion, ETL, real-time analytics, and alerting. The platform uses HDFS by default and achieves high throughput through in-memory processing and horizontal scaling.
Big data engineers and developers building production-grade stream and batch processing applications on Hadoop infrastructure, particularly those needing enterprise reliability features and YARN integration.
Developers choose Apache Apex for its unified approach to stream and batch processing, its native Hadoop YARN integration, and enterprise-grade features such as fault tolerance and processing guarantees that prevent data loss, all of which simplify building reliable big data applications.
Mirror of Apache Apex core
Handles both stream and batch workloads in a single framework, which simplifies architecture and reduces operational overhead.
Built-in fault tolerance, state management, and event processing guarantees ensure no data loss and production-grade operability, key for use cases like ingestion and real-time analytics.
Runs natively on YARN and uses HDFS by default, so it behaves as a first-class application in Hadoop infrastructure and reduces integration complexity.
Supports high-throughput processing with in-memory computations and horizontal scaling on YARN, enabling real-time analytics and alerts.
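The fault tolerance and state management mentioned above generally rest on checkpointing operator state and replaying input from the last checkpoint after a failure. The following is a minimal conceptual sketch of that idea in plain Java. It does not use the Apache Apex API; the names `CheckpointSketch`, `SumOperator`, `Checkpoint`, and `run` are hypothetical and exist only for illustration.

```java
import java.util.List;

// Conceptual sketch only: plain Java, NOT the Apache Apex API.
// All class and method names here are hypothetical.
public class CheckpointSketch {

    // A stateful operator that keeps a running sum of the events it has seen.
    static final class SumOperator {
        long sum = 0;        // operator state
        int nextOffset = 0;  // index of the next unprocessed event

        void process(long value) {
            sum += value;
            nextOffset++;
        }
    }

    // Immutable snapshot of the operator state at a given input position.
    static final class Checkpoint {
        final long sum;
        final int nextOffset;

        Checkpoint(long sum, int nextOffset) {
            this.sum = sum;
            this.nextOffset = nextOffset;
        }
    }

    // Processes all events, checkpointing every `checkpointEvery` events
    // (must be > 0). A single crash is simulated just before the event at
    // `failAtOffset` is processed; pass -1 to run without a failure.
    static long run(List<Long> events, int checkpointEvery, int failAtOffset) {
        SumOperator op = new SumOperator();
        Checkpoint last = new Checkpoint(0, 0);
        boolean failed = false;

        while (op.nextOffset < events.size()) {
            if (!failed && op.nextOffset == failAtOffset) {
                // Simulated crash: in-memory state is lost, so rebuild the
                // operator from the last checkpoint and replay from there.
                failed = true;
                op = new SumOperator();
                op.sum = last.sum;
                op.nextOffset = last.nextOffset;
                continue;
            }
            op.process(events.get(op.nextOffset));
            if (op.nextOffset % checkpointEvery == 0) {
                last = new Checkpoint(op.sum, op.nextOffset);
            }
        }
        return op.sum;
    }
}
```

Under these assumptions, `CheckpointSketch.run(List.of(1L, 2L, 3L, 4L, 5L), 2, 3)` returns 15, the same result as a run without a failure, because events processed after the last checkpoint are replayed rather than lost or double-counted.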
Designed specifically for YARN and HDFS, which limits portability to non-Hadoop or modern cloud-native environments, since it lacks out-of-the-box support for alternatives to either.
Requires configuration and management of Hadoop clusters, including YARN and HDFS, adding overhead compared to more lightweight or managed streaming platforms.
Smaller community and ecosystem mean fewer ready-to-use connectors and integrations. Developers rely heavily on the Malhar library for building blocks, which may not cover all use cases.