A high-performance, S3-compatible distributed object storage system built in Rust, optimized for data lakes and AI workloads.
An enterprise distributed database ecosystem that enhances heterogeneous databases with sharding, scalability, and security via JDBC and Proxy access layers.
A curated list of awesome big data frameworks, resources, and tools across various categories.
A curated list of awesome big data frameworks, resources, and tools across various categories.
An open-source enterprise data warehouse built in Rust for AI agents, analytics, vector search, and full-text search.
A high-performance Python DataFrame library for lazy out-of-core processing and visualization of billion-row datasets at interactive speeds.
An open data lakehouse platform for incremental data processing with upserts, deletes, and time-travel queries.
.NET for Apache Spark provides high-performance .NET APIs for Apache Spark, enabling C# and F# developers to work with structured and streaming data.
An easy-to-use, self-hosted SQL reporting application for creating interactive business intelligence dashboards.
A federated Big Data orchestration service that simplifies job execution across distributed clusters by abstracting infrastructure complexity.
A Python library for agile data preparation workflows that works with Pandas, Dask, cuDF, Dask-cuDF, Vaex, and PySpark.
A Kubernetes batch scheduler for high-performance workloads such as AI/ML, big data, and HPC.
A REST interface for interacting with Apache Spark from anywhere, enabling remote code execution and job submissions.
A lightweight real-time big data streaming engine built on Akka for high-throughput, low-latency data processing.
A fast, open-source platform for topic modeling using Additive Regularization of Topic Models (ARTM).
Kotlin bindings and extensions for Apache Spark, enabling idiomatic Kotlin development with data classes, lambdas, and null safety.
A command-line interface for AWS Athena with auto-completion and syntax highlighting.
A collection of robust and fast Python tools for parsing, extracting, and analyzing web archive data, including a high-performance WARC parser.
An end-to-end data management system for IoT, optimizing stream processing across cloud, edge, and sensor deployments.
A collection of interactive Jupyter notebooks for learning Hadoop, Spark, and MapReduce with hands-on tutorials and demos.