A fully asynchronous, futures-based Apache Kafka client library for Rust built on librdkafka.
A fault-tolerant service that persists Kafka log data to cloud storage like S3, GCS, Azure Blob Storage, and OpenStack Swift.
A parallel bulk data loader that transfers data between storage systems, databases, NoSQL stores, and cloud services via plugins.
A Ruby framework for writing reliable, concise, and maintainable ETL (Extract-Transform-Load) data processing jobs.
A federated Big Data orchestration service that simplifies job execution across distributed clusters by abstracting infrastructure complexity.
A command-line utility for processing JSON and JavaScript data, inspired by Perl and Unix tools like sed and awk.
A Java library for declarative JSON-to-JSON transformations using JSON-based specifications.
An open-source Reverse ETL platform for syncing data from warehouses to business tools like Salesforce, HubSpot, and Slack.
A unified data pipeline tool for ingestion, transformation with SQL/Python/R, and data quality checks across major platforms.
A library enabling MongoDB to serve as input source or output destination for Hadoop MapReduce tasks and ecosystem tools.
A high-performance CSV ingestion and generation library for Ruby with C acceleration, designed for real-world data with intelligent defaults.
A Reactive Streams connector for Apache Kafka built on Akka Streams, enabling back-pressured integration for Java and Scala.
Train neural networks with OpenStreetMap data and satellite imagery to classify roads and map features.
A collection of pre-built Google Cloud Dataflow templates for common data import/export, backup, and bulk API operations.
A high-performance Rust stream processing engine with integrated AI capabilities for real-time data processing and intelligent analysis.
An open-source, in-memory, distributed batch and stream processing engine for Java applications.
Run lambda functions over S3 objects with concurrency control for data pipelining and analytics.
An R package providing the %>% pipe operator to improve code readability by structuring data operations left-to-right.
A scalable n:m message multiplexer written in Go for routing messages from multiple sources to multiple destinations.
LinkedIn's previous-generation Kafka-to-HDFS pipeline for batch data ingestion.
A fast, secure, and standalone log collector written in Rust that parses, validates, and forwards log data.
A distributed data pipeline service for collecting, aggregating, and dispatching large volumes of application events and log data.
A data pipeline engine for security teams to collect, transform, enrich, and route telemetry data at scale.
A simple, lightweight batch processing framework for Java designed for ETL jobs.
A .NET stream processing library for Apache Kafka, providing a Kafka Streams-like API for building real-time applications.
Sample AWS Data Pipeline templates for automating data movement and transformation workflows.
A visualization framework for Apache Pig workflows that combines graphical depictions with real-time execution information.
A Python library for constructing reactive dataflow graphs and streaming computations as data models.
A unified data replication platform for TiDB, providing MySQL/MariaDB migration and change data capture to downstream systems.
A serverless toolkit for routing, normalizing, and enriching security event and audit logs in AWS.
A RESTful engine for orchestrating sequential Docker container workflows, marshaling data between steps.
A Python interface to the Amazon Kinesis Client Library for building distributed applications that process streaming data reliably at scale.
A Java library for building data pipelines that connect Amazon Kinesis streams to AWS and non-AWS services like DynamoDB, Redshift, S3, and Elasticsearch.
A Kotlin library for extracting path-based code representations and ASTs from multiple languages to prepare code for machine learning models.
A curated reference hub of tools and real-world examples for designing effective threat detection and response pipelines.