An extremely fast query engine for DataFrames, written in Rust, with multi-language frontends.
A Python library that enables conversational data analysis on SQL, CSV, and Parquet files using LLMs and RAG.
A drop-in replacement for pandas that scales data analysis workflows to use all CPU cores and handle out-of-memory datasets.
A GPU-accelerated DataFrame library for tabular data processing, part of the RAPIDS data science suite.
An extensible SQL query engine written in Rust, using Apache Arrow as its in-memory format for building fast database and analytic systems.
A high-performance Python DataFrame library for lazy out-of-core processing and visualization of billion-row datasets at interactive speeds.
A flexible and expressive API for performing statistical data validation on dataframe-like objects.
A high-performance R package for fast data manipulation of large datasets, extending data.frame with concise syntax and memory efficiency.
A Java dataframe and visualization library for data loading, cleaning, transformation, and analysis.
Koalas provides the pandas DataFrame API on Apache Spark, enabling data scientists to work with big data using familiar pandas syntax.
A Go library providing DataFrames, Series, and data wrangling operations for tabular data manipulation.
A Python package that automatically accelerates pandas and Modin DataFrame apply operations by choosing the fastest available method.
A lightweight Python library for creating portable, expressive, and testable data transformation DAGs with built-in lineage and metadata.
A Python library for defining portable, modular, and testable data transformation DAGs with built-in lineage and metadata.
.NET for Apache Spark provides high-performance .NET APIs for Apache Spark, enabling C# and F# developers to work with structured and streaming data.
A distributed query execution engine that extends Apache DataFusion to run SQL queries in parallel across multiple nodes.
A high-performance Python package for fast, multi-threaded manipulation of large tabular datasets, inspired by R's data.table.
A Python library providing clean, chainable functions for data cleaning and manipulation with pandas DataFrames.
A pandas DataFrame wrapper for calculating over 70 stock market indicators and statistics with inline column access.
Query pandas DataFrames using SQL syntax, similar to sqldf in R.
A lightweight and intuitive Go library for data manipulation, statistics, and machine learning using DataFrames.
A DataFrame-based graph processing library for Apache Spark, enabling scalable graph analytics and algorithms.
A high-performance, fully-featured CSV parser and serializer for modern C++ with streaming, random access, and robust format handling.
A Ruby library for data analysis with DataFrame and Vector structures, offering storage, manipulation, and visualization.
Display pandas and Polars DataFrames as interactive, sortable, and searchable DataTables in Jupyter notebooks and Python applications.
C# and F# language binding and extensions for Apache Spark, enabling .NET developers to write Spark driver programs and data processing operations.
A PySpark testing library providing fast helper methods with descriptive, color-coded error messages for DataFrame and column comparisons.
A Scala library providing essential Spark extensions, helper methods, and custom transformations to maximize developer productivity.
A high-performance, functional tabular data processing library for Clojure, similar to Python's pandas or R's data.table.
A Rust numeric library for linear algebra, numerical analysis, statistics, and machine learning with high performance and syntax inspired by R, MATLAB, and Python.
A quick reference guide to the most commonly used patterns and functions in PySpark SQL.
A Python library for comparing pandas, Polars, Spark, and Snowpark DataFrames with detailed reporting and flexible matching.
A library enabling Apache Spark to read from and write to Apache HBase tables as external data sources using DataFrames and SQL.
A library for parsing and querying XML data with Apache Spark SQL and DataFrames.
A fast Apache Spark testing helper library with beautifully formatted error messages for Scala applications.
A Clojure dataset manipulation library providing a dplyr-like API on top of tech.ml.dataset.