The "Awesome Streaming" project is a curated collection of resources focused on streaming technologies, which enable the real-time processing and distribution of data. This list encompasses a variety of categories, including frameworks, libraries, tools, tutorials, and community resources that cater to different streaming protocols and architectures. It is beneficial for developers, data engineers, and researchers who are looking to implement or enhance streaming solutions in their applications. With this wealth of information and tools at their disposal, users can explore effective ways to manage and analyze streaming data.
The "Awesome Public Datasets" project is a curated collection of publicly available datasets across various domains, including government, healthcare, finance, and social sciences. This list features datasets in multiple formats, along with links to tools and platforms that facilitate data analysis and visualization. It is an invaluable resource for researchers, data scientists, and students looking to access high-quality data for their projects or studies. By providing a wide array of datasets, this collection empowers users to explore, analyze, and derive insights from real-world data for their next data-driven endeavor.
The "Awesome Big Data" project is a curated collection of resources focused on big data technologies and practices that enable the processing and analysis of vast amounts of data. This list encompasses a variety of categories, including frameworks, tools, libraries, databases, and tutorials that cater to both beginners and experienced data professionals. Users can explore resources related to data storage, processing, analytics, and visualization, making it an invaluable asset for data scientists, engineers, and researchers. Whether you're looking to enhance your big data skills or find the right tools for your projects, this collection provides a comprehensive guide to navigating the big data landscape.
The "Awesome Data Engineering" project is a curated collection of resources aimed at supporting professionals in the field of data engineering, which involves the design and construction of systems for collecting, storing, and analyzing data. This list encompasses a variety of categories, including data pipelines, ETL tools, data warehousing solutions, frameworks, and best practices, as well as tutorials and community resources. Whether you are a beginner looking to understand the fundamentals or an experienced engineer seeking advanced techniques, this list offers valuable insights and tools to enhance your data engineering projects. Dive into this collection to discover the tools and methodologies that can streamline your data workflows and improve your data management capabilities.
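To make the extract-transform-load pattern mentioned above concrete, here is a minimal sketch in plain Python. The records, field names, and in-memory "warehouse" are invented for illustration; real ETL tools add scheduling, connectors, and error handling.

```python
# A minimal extract-transform-load (ETL) sketch in plain Python.
# All data and names here are invented for illustration.

def extract():
    """Simulate pulling raw rows from a source system."""
    return [
        {"user": "ada", "amount": "20.0"},
        {"user": "bob", "amount": "5.0"},
        {"user": "ada", "amount": "7.5"},
    ]

def transform(rows):
    """Clean and aggregate: parse amounts, total per user."""
    totals = {}
    for row in rows:
        totals[row["user"]] = totals.get(row["user"], 0.0) + float(row["amount"])
    return totals

def load(totals, sink):
    """Write results to a destination (here, an in-memory dict)."""
    sink.update(totals)

warehouse = {}
load(transform(extract()), warehouse)
print(warehouse)  # {'ada': 27.5, 'bob': 5.0}
```

The three stages are deliberately decoupled functions, which is the same separation the ETL tools in this list formalize at scale.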
The "Awesome Network Analysis" project is a curated collection of resources focused on the study and analysis of networks, which are structures made up of interconnected elements. This list encompasses a variety of tools, libraries, datasets, and tutorials that facilitate the exploration of network theory, graph analysis, and visualization techniques. It serves as a valuable resource for researchers, data scientists, and enthusiasts interested in understanding complex systems, social networks, and data relationships. Whether you are a beginner looking to grasp the basics or an experienced analyst seeking advanced methodologies, this collection provides essential tools and insights to enhance your network analysis projects.
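As a taste of the kind of graph analysis these resources cover, here is a toy degree computation using only the Python standard library. The edge list is invented for illustration; dedicated libraries provide far richer centrality measures.

```python
# Toy graph analysis: compute node degrees from an edge list and
# rank nodes by degree (a crude centrality measure).
# The edge list is invented for illustration.

from collections import defaultdict

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

degree = defaultdict(int)
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

ranked = sorted(degree.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # [('c', 3), ('a', 2), ('b', 2), ('d', 1)]
```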
A distributed query execution engine that extends Apache DataFusion to run SQL queries in parallel across multiple nodes.
Apache Heron is a real-time, distributed, fault-tolerant stream processing engine developed by Twitter.
A distributed stream processing engine in Rust that performs stateful computations on real-time data with sub-second results.
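"Stateful computation" in engines like the one above means maintaining per-key state as events arrive, for example counting events per key in fixed time windows. A minimal sketch of the idea in plain Python (the events and window size are invented; real engines add distribution, fault tolerance, and event-time handling):

```python
# Stateful stream processing sketch: count events per key in
# tumbling 10-second windows. Events are (timestamp, key) pairs;
# the data below is invented for illustration.

WINDOW = 10  # window length in seconds

def window_counts(events):
    state = {}  # (window_start, key) -> count
    for ts, key in events:
        window_start = (ts // WINDOW) * WINDOW
        state[(window_start, key)] = state.get((window_start, key), 0) + 1
    return state

events = [(1, "click"), (4, "view"), (7, "click"), (12, "click")]
print(window_counts(events))
# {(0, 'click'): 2, (0, 'view'): 1, (10, 'click'): 1}
```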
A Python framework and Rust-based distributed processing engine for stateful event and stream processing.
An ultra-performant data transformation framework for AI, with incremental processing and data lineage built-in.
A Kubernetes-native, serverless platform for running massively parallel data and streaming jobs with exactly-once semantics.
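One common pattern behind exactly-once processing guarantees (not necessarily this platform's internal mechanism) is to accept at-least-once delivery and make the consumer idempotent, deduplicating by message ID so redeliveries have no extra effect. A sketch with invented message IDs:

```python
# Sketch of exactly-once *effects* on top of at-least-once delivery:
# deduplicate by message ID before applying each message.
# Message IDs and the aggregation are invented for illustration.

seen = set()
total = 0

def process_once(msg_id, value):
    """Apply a message's effect at most once, even on redelivery."""
    global total
    if msg_id in seen:
        return False  # duplicate redelivery: skip
    seen.add(msg_id)
    total += value
    return True

# The same message delivered twice only counts once.
for msg_id, value in [(1, 5), (2, 3), (1, 5)]:
    process_once(msg_id, value)
print(total)  # 8
```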
A masterless, cloud-scale, fault-tolerant distributed computation system for batch and stream processing written in Clojure.
A Python ETL framework for stream processing, real-time analytics, and building live LLM/RAG pipelines, powered by a scalable Rust engine.
A lightweight IoT data analytics and stream processing engine for resource-constrained edge devices.
An enterprise-grade event streaming platform that ingests, processes, and manages real-time event data with PostgreSQL compatibility and Apache Iceberg™ integration.
A distributed event streaming platform for building high-performance data pipelines, streaming analytics, and data integration.
A platform for building highly responsive, resilient, and scalable distributed systems using the actor model.
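The actor model mentioned above gives each actor private state and a mailbox of messages processed one at a time, so no locks are needed around that state. A minimal sketch in plain Python (class and message names are illustrative, not this platform's API):

```python
# Minimal actor-model sketch: an actor owns private state and
# processes mailbox messages sequentially on its own thread.

import queue
import threading

class CounterActor:
    def __init__(self):
        self.mailbox = queue.Queue()
        self.count = 0  # private state, touched only by the actor thread
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg == "stop":
                break
            self.count += msg  # one message at a time: no data races

    def send(self, msg):
        self.mailbox.put(msg)

actor = CounterActor()
for n in [1, 2, 3]:
    actor.send(n)
actor.send("stop")
actor.thread.join()
print(actor.count)  # 6
```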
A high-performance, resilient stream processor that connects various sources and sinks, performs data transformations, and guarantees at-least-once delivery.
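At-least-once delivery, as guaranteed by processors like the one above, means resending until the sink acknowledges and accepting possible duplicates downstream. A self-contained sketch with an invented flaky sink:

```python
# At-least-once delivery sketch: retry until the sink acknowledges.
# The flaky sink below is invented for illustration.

def make_flaky_sink(fail_times):
    """A sink that fails `fail_times` times before acknowledging."""
    state = {"fails_left": fail_times, "received": []}
    def sink(msg):
        if state["fails_left"] > 0:
            state["fails_left"] -= 1
            return False  # no ack: delivery attempt failed
        state["received"].append(msg)
        return True
    return sink, state

def deliver_at_least_once(msg, sink, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        if sink(msg):
            return attempt
    raise RuntimeError("gave up: message may be lost")

sink, state = make_flaky_sink(fail_times=2)
attempts = deliver_at_least_once("event-1", sink)
print(attempts, state["received"])  # 3 ['event-1']
```

If the sink had acknowledged but the ack were lost, the sender would retry anyway, which is why at-least-once consumers must tolerate duplicates (see the idempotent-processing sketch above).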
A purely functional, effectful, and polymorphic stream processing library for Scala built on Cats and Cats-Effect.
An asynchronous Python framework for building services that interact with Apache Kafka, RabbitMQ, NATS, and Redis event streams.
A high-performance Scala library for composing asynchronous, event-based programs with strong functional programming influences.
An open-source LLM function calling framework for building scalable, low-latency AI agents with geo-distributed edge infrastructure.
Cross-platform framework for building customizable on-device machine learning pipelines for live and streaming media.