The "Awesome Streaming" project is a curated collection of resources focused on streaming technologies, which enable the real-time processing and distribution of data. This list spans a variety of categories, including frameworks, libraries, tools, tutorials, and community resources, covering different streaming protocols and architectures. It is useful for developers, data engineers, and researchers who want to implement or improve streaming solutions in their applications. With this wealth of information and tools, users can explore new ways to manage and analyze streaming data effectively.
The "Awesome Public Datasets" project is a curated collection of publicly available datasets across various domains, including government, healthcare, finance, and social sciences. This list features datasets in multiple formats, along with links to tools and platforms that facilitate data analysis and visualization. It is an invaluable resource for researchers, data scientists, and students looking to access high-quality data for their projects or studies. By providing a wide array of datasets, this collection empowers users to explore, analyze, and derive insights from real-world data. Dive in to discover the wealth of information available for your next data-driven endeavor!
The "Awesome Big Data" project is a curated collection of resources focused on big data technologies and practices that enable the processing and analysis of vast amounts of data. This list encompasses a variety of categories, including frameworks, tools, libraries, databases, and tutorials that cater to both beginners and experienced data professionals. Users can explore resources related to data storage, processing, analytics, and visualization, making it an invaluable asset for data scientists, engineers, and researchers. Whether you're looking to enhance your big data skills or find the right tools for your projects, this collection provides a comprehensive guide to navigating the big data landscape.
The "Awesome Data Engineering" project is a curated collection of resources aimed at supporting professionals in the field of data engineering, which involves the design and construction of systems for collecting, storing, and analyzing data. This list encompasses a variety of categories, including data pipelines, ETL tools, data warehousing solutions, frameworks, and best practices, as well as tutorials and community resources. Whether you are a beginner looking to understand the fundamentals or an experienced engineer seeking advanced techniques, this list offers valuable insights and tools to enhance your data engineering projects. Dive into this collection to discover the tools and methodologies that can streamline your data workflows and improve your data management capabilities.
The "Awesome Network Analysis" project is a curated collection of resources focused on the study and analysis of networks, which are structures made up of interconnected elements. This list encompasses a variety of tools, libraries, datasets, and tutorials that facilitate the exploration of network theory, graph analysis, and visualization techniques. It serves as a valuable resource for researchers, data scientists, and enthusiasts interested in understanding complex systems, social networks, and data relationships. Whether you are a beginner looking to grasp the basics or an experienced analyst seeking advanced methodologies, this collection provides essential tools and insights to enhance your network analysis projects.
A unified platform for big data stream and batch processing on Hadoop YARN with enterprise-grade operability.
A distributed query execution engine that extends Apache DataFusion to run SQL queries in parallel across multiple nodes.
Apache Heron is a real-time, distributed, fault-tolerant stream processing engine developed by Twitter.
A distributed stream processing framework built on Apache Kafka and Apache Hadoop YARN for fault-tolerant, stateful processing.
A high-performance Rust stream processing engine with integrated AI capabilities for real-time data processing and intelligent analysis.
A distributed stream processing engine in Rust that performs stateful computations on real-time data with subsecond results.
A SQL-based streaming analytics platform that scales to process hundreds of billions of real-time events daily.
A Python framework and Rust-based distributed processing engine for stateful event and stream processing.
An ultra-performant data transformation framework for AI, with incremental processing and data lineage built-in.
A lightweight real-time big data streaming engine built on Akka for high-throughput, low-latency data processing.
An open-source, in-memory, distributed batch and stream processing engine for Java applications.
A distributed stream processing system written in Haskell that guarantees exactly-once semantics.
A platform for building real-time, cost-effective, operations-focused applications.
A MapReduce-style framework for processing fast/streaming data, implementing the MapUpdate model.
An end-to-end data management system for IoT, optimizing stream processing across cloud, edge, and sensor deployments.
A Kubernetes-native, serverless platform for running massively parallel data and streaming jobs with exactly-once semantics.
A masterless, cloud-scale, fault-tolerant distributed computation system for batch and stream processing written in Clojure.
A Python ETL framework for stream processing, real-time analytics, and building live LLM/RAG pipelines, powered by a scalable Rust engine.
A runtime supervisor for deploying and running data processing programs called Sequences on Linux servers, Docker, and Kubernetes clusters.
An open-source real-time stream processing framework combining high-throughput event processing with low-latency SQL-like streaming queries.
A high-performance one-pass in-memory streaming analytics engine for temporal and streaming data.
A fast, resilient distributed stream processing framework that simplifies real-time data applications with high performance and easy scaling.
A multi-core stream processing engine for high-throughput window aggregation with optional exactly-once fault tolerance.
An open-source, cloud-native streaming database designed for real-time data processing and IoT applications.
A lightweight IoT data analytics and stream processing engine for resource-constrained edge devices.
An enterprise-grade event streaming platform that ingests, processes, and manages real-time event data with PostgreSQL compatibility and Apache Iceberg™ integration.
A distributed event streaming platform for building high-performance data pipelines, streaming analytics, and data integration.
A .NET stream processing library for Apache Kafka, providing a Kafka Streams-like API for building real-time applications.
A platform for building highly responsive, resilient, and scalable distributed systems using the actor model.
A high-performance, resilient stream processor that connects various sources and sinks, performs data transformations, and guarantees at-least-once delivery.
A purely functional, effectful, and polymorphic stream processing library for Scala built on Cats and Cats-Effect.
An asynchronous Python framework for building services that interact with Apache Kafka, RabbitMQ, NATS, and Redis event streams.
A high-performance Scala library for composing asynchronous, event-based programs with strong functional programming influences.
A Python framework for building real-time data pipelines and event-driven microservices on Apache Kafka using a Streaming DataFrame API.
A visual development platform for building, deploying, and managing streaming analytics applications with multiple engine bindings.
A serverless toolkit for routing, normalizing, and enriching security event and audit logs in AWS.
A Python library for constructing reactive dataflow graphs and streaming computations as data models.
An open-source LLM function calling framework for building scalable, low-latency AI agents with geo-distributed edge infrastructure.
A cross-platform framework for building customizable on-device machine learning pipelines for live and streaming media.
A scalable real-time search platform for streaming data using Apache Storm, Kafka, and Lucene.
A scalable, mature, and versatile web crawler built on Apache Storm for building low-latency, distributed crawling systems.
A stateless, multi-protocol proxy that bridges web apps, IoT devices, and microservices directly to Apache Kafka via declarative APIs.
A lightweight stream processing engine designed specifically for IoT data processing and analytics.
An open-source programming model and runtime for analyzing data and events on edge devices, reducing data transmission and storage costs.
A self-service IoT toolbox enabling non-technical users to connect, analyze, and explore industrial IoT data streams.
A Java/.NET component for complex event processing (CEP), streaming SQL, and event series analysis.
A library for writing MapReduce programs that execute on distributed platforms like Storm and Scalding using Scala/Java collection-like syntax.