A distributed event streaming platform for building high-performance data pipelines, streaming analytics, and data integration.
A high-performance, end-to-end observability data pipeline for collecting, transforming, and routing logs and metrics.
A curated list of awesome open-source libraries for deploying, monitoring, versioning, and scaling production machine learning systems.
A plugin-driven agent for collecting, processing, aggregating, and writing metrics, logs, and arbitrary data.
A server-side data processing pipeline that ingests, transforms, and ships logs and events from multiple sources.
A curated list of awesome big data frameworks, resources, and tools across various categories.
An open-source log collector that unifies logging infrastructure by collecting events from various sources and routing them to multiple destinations.
A low-latency platform for change data capture (CDC) that streams row-level changes from databases to applications.
A practical booklet covering the four main steps of designing machine learning systems, with 27 interview questions.
A high-performance, declarative stream processor that connects various sources and sinks with built-in data transformation capabilities.
A high-performance, resilient stream processor that connects various sources and sinks, performs data transformations, and guarantees at-least-once delivery.
An open-source ETL (Extract, Transform, Load) tool for data integration and migration.
A polyglot document intelligence framework with a Rust core for extracting text, metadata, and structured data from 91+ file formats.
Open-source customer data infrastructure that collects, validates, and enriches behavioral event data for AI and analytics.
Open-source data pipelines for cloud asset inventory, CSPM, FinOps, and vulnerability management across AWS, Azure, GCP, and 70+ sources.
Open-source data pipelines to sync cloud infrastructure metadata from AWS, Azure, GCP, and 70+ sources into your data warehouse.
An easy-to-use, powerful, and reliable system to process and distribute data across cybersecurity, observability, and AI pipelines.
A lightweight, non-JVM command-line tool for producing, consuming, and inspecting Apache Kafka messages.
A lightweight command-line tool for producing, consuming, and inspecting Apache Kafka messages, similar to netcat for Kafka.
A distributed data streaming engine with stateful stream processing for building responsive data-intensive applications.
An open-source, privacy-focused customer data platform (CDP) that collects, processes, and routes event data to warehouses and tools.
A lightweight, efficient, and fast high-level web crawling and scraping framework for .NET.
A fast, embeddable scripting language for Go applications, compiled to bytecode and executed on a stack-based VM.
A source-agnostic distributed change data capture system for reliably capturing and streaming primary data changes.
A CLI tool to copy data between any databases and platforms with a single command, no code required.
A local-first, single-binary workflow orchestration engine that runs declarative DAGs anywhere from a laptop to a distributed cluster.
An AI-native modular infrastructure for quantitative trading, featuring a weight-centric architecture for building, testing, and deploying algorithmic strategies.
A curated list of awesome streaming frameworks, applications, readings, and resources for stream processing.
An Elixir library for building concurrent, multi-stage data ingestion and processing pipelines, with support for back-pressure, batching, and fault tolerance.
A Scala API for Apache Beam and Google Cloud Dataflow, enabling unified batch and streaming data processing.
A Kubernetes-native, serverless platform for running massively parallel data and streaming jobs with exactly-once semantics.
A SQL engine, shipped as a single C++ binary, for high-performance stream processing, analytics, observability, and AI/ML pipelines.
A lightweight and efficient stream processing library for Go, providing a declarative DSL to build data pipelines.
A native integration library for using Elasticsearch with Hadoop, Spark, and Hive, enabling real-time search and analytics on big data.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.