Showing 17 of 17 projects
An extremely fast query engine for DataFrames, written in Rust, with multi-language frontends.
A Python library that enables conversational data analysis on SQL, CSV, and Parquet files using LLMs and RAG.
A drop-in replacement for pandas that scales data analysis workflows across all CPU cores and handles datasets larger than memory.
A GPU-accelerated DataFrame library for tabular data processing, part of the RAPIDS data science suite.
An extensible SQL query engine written in Rust, using Apache Arrow as its in-memory format for building fast database and analytic systems.
A high-performance Python DataFrame library for lazy out-of-core processing and visualization of billion-row datasets at interactive speeds.
A flexible and expressive API for performing statistical data validation on dataframe-like objects.
A high-performance R package for fast data manipulation of large datasets, extending data.frame with concise syntax and memory efficiency.
A Java dataframe and visualization library for data loading, cleaning, transformation, and analysis.
Koalas provides the pandas DataFrame API on Apache Spark, enabling data scientists to work with big data using familiar pandas syntax.
A Go library providing DataFrames, Series, and data wrangling operations for tabular data manipulation.
A Python package that automatically accelerates pandas and Modin DataFrame apply operations by choosing the fastest available method.
A lightweight Python library for creating portable, expressive, and testable data transformation DAGs with built-in lineage and metadata.
A Python library for defining portable, modular, and testable data transformation DAGs with built-in lineage and metadata.
.NET for Apache Spark provides high-performance .NET APIs for Apache Spark, enabling C# and F# developers to work with structured and streaming data.
A distributed query execution engine that extends Apache DataFusion to run SQL queries in parallel across multiple nodes.
A high-performance Python package for fast, multi-threaded manipulation of large tabular datasets, inspired by R's data.table.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.