A curated list of amazingly awesome Hadoop and Hadoop ecosystem resources.
An open-source workflow management system for bioinformatics that scales from one-off use cases to massive production environments.
A distributed stream processing framework built on Apache Kafka and Apache Hadoop YARN for fault-tolerant, stateful processing.
A header-only C++14 library providing push-based pipelines for expressive collection processing with operators like filter, transform, and fork.
A Python library that brings R's dplyr data manipulation syntax to pandas DataFrames using a pipe operator.
A self-service IoT toolbox enabling non-technical users to connect, analyze, and explore industrial IoT data streams.
A pure Go task-parallel programming framework with integrated visualizer and profiler for managing complex task dependencies.
A Clojure library for writing map-reduce queries that compile to Apache Pig or Cascading, enabling distributed data processing with Clojure syntax.
A .NET stream processing library for Apache Kafka, providing a Kafka Streams-like API for building real-time applications.
An open-source MLOps framework for defining and deploying machine learning and LLM workloads across any cloud infrastructure.
A clean and powerful Haskell stream processing library for building and connecting reusable streaming components.
A Python data validation toolkit that finds data quality issues and generates beautiful, shareable reports for team collaboration.
A Clojure framework for building stateless stream processing applications on Kafka with built-in retry mechanisms.
A robust, high-performance JSON parser and generator for R, optimized for statistical data and web APIs.
A Ruby gem providing TensorFlow bindings for basic tensor operations and machine learning tasks.
A distributed, scalable database built for stream processing applications on Apache Kafka using SQL syntax.
A Go library providing efficient, parallel, lazy map, reduce, filter, and other functional operations on sequences with built-in error handling.
A stream processing tool with a web interface for building and monitoring Apache Storm workflows using drag-and-drop components.
A Python toolkit for developing, testing, and managing Apache Storm streaming data processing topologies.
A Python wrapper for Cascading that enables building and controlling Hadoop data processing workflows entirely in Python.
A Scala and JVM machine learning toolbox for research, education, and industry with an interactive REPL and end-to-end pipelines.
A DataOps-friendly data quality monitoring platform with customizable checks, dashboards, and incident management for multiple data sources.
A Python library implementing CSP-style concurrency with channels, inspired by Go channels and Clojure's core.async.
A type-safe functional stream processing library for Go, inspired by the Java Streams API.
A blazingly fast, highly scalable graph-based stream processing framework for latency-critical applications like electronic trading and real-time AI.
A tool for running Jupyter notebooks as REST API endpoints, enabling programmatic execution of notebook workflows.
A legacy, Java-based, model-driven tool for generating, anonymizing, and migrating test data for development and testing.
A lightweight, typed Actor library for Scala and Scala.js to build concurrent data pipelines and state machines.
A Python tool for validating data using JSON Schema and converting schemas into data-interchange formats like Avro.
A Ruby interface for the Amazon Kinesis Client Library, enabling developers to build robust streaming data applications.
A Rust DataFrame and data engineering library with PySpark/SQL-like syntax, built for business data pipelines with Microsoft stack integration.
A Go library for writing Storm spouts and bolts that communicate with Storm shells via the multilang protocol.
A Java library for enriching, transforming, and filtering JSON documents using configurable pipelines.
A PMML evaluator library for Apache Spark that provides ML-compatible transformers for deploying predictive models.
A distributed stream processing system written in Haskell that guarantees exactly-once semantics.
A TypeScript/JavaScript library providing Python-inspired iteration utilities for working with iterables, streams, and pipes.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.