Dataframe

55 projects

polars · Rust

An extremely fast query engine for DataFrames, written in Rust, with multi-language frontends.

#out-of-core #apache-arrow #simd

Stars: 39.1k

Forks: 3.0k

Last commit: 3 days ago

PandasAI · Python

A Python library that enables conversational data analysis on SQL, CSV, and Parquet files using LLMs and RAG.

#ai #database #python-library

Stars: 23.7k

Forks: 2.3k

Last commit: 8 months ago

modin · Python

A drop-in replacement for pandas that scales data analysis workflows to use all CPU cores and handle out-of-memory datasets.

#parallel-computing #distributed #data-science

Stars: 10.4k

Forks: 676

Last commit: 5 months ago

cudf · C++

A GPU-accelerated DataFrame library for tabular data processing, part of the RAPIDS data science suite.

#cudf #cuda #apache-arrow

An extensible SQL query engine written in Rust, using Apache Arrow as its in-memory format for building fast database and analytic systems.

#columnar-database #apache-arrow #dataframe

A high-performance Python DataFrame library for lazy out-of-core processing and visualization of billion-row datasets at interactive speeds.

#out-of-core #python-dataframe #apache-arrow

Stars: 8.5k

Forks: 603

Last commit: 3 months ago

pandera · Python

A flexible and expressive API for performing statistical data validation on dataframe-like objects.

#data-cleaning #pandas-validation #python-library

Stars: 4.4k

Forks: 422

Last commit: 3 days ago

data.table · R

A high-performance R package for fast data manipulation of large datasets, extending data.frame with concise syntax and memory efficiency.

#parallel-computing #high-performance #r-package

A Java dataframe and visualization library for data loading, cleaning, transformation, and analysis.

#statistical-analysis #chart #data-science

Stars: 3.8k

Forks: 649

Last commit: 21 days ago

Koalas · Python

Koalas provides the pandas DataFrame API on Apache Spark, enabling data scientists to work with big data using familiar pandas syntax.

#apache-spark #spark #mlflow

Stars: 3.4k

Forks: 369

Last commit: 2 years ago

gota · Go

A Go library providing DataFrames, Series, and data wrangling operations for tabular data manipulation.

#dataframe #data-wrangling #series

Stars: 3.3k

Forks: 290

Last commit: 2 years ago

swifter · Python

A Python package that automatically accelerates pandas and Modin DataFrame apply operations by choosing the fastest available method.

#parallelization #parallel-computing #data-science

Stars: 2.6k

Forks: 104

Last commit: 2 years ago

Hamilton · Jupyter Notebook

A Python library for defining portable, modular, and testable data transformation DAGs with built-in lineage and metadata.

#data-lineage #etl-pipeline #python-library

Stars: 2.6k

Forks: 200

Last commit: 2 days ago

.NET for Apache Spark · C#

.NET for Apache Spark provides high-performance .NET APIs for Apache Spark, enabling C# and F# developers to work with structured and streaming data.

#apache-spark #spark #dataframe

Stars: 2.1k

Forks: 332

Last commit: 2 months ago

Apache Ballista · Rust

A distributed query execution engine that extends Apache DataFusion to run SQL queries in parallel across multiple nodes.

#parallel-computing #distributed #dataframe

A high-performance Python package for fast, multi-threaded manipulation of large tabular datasets, inspired by R's data.table.

#data-science #multi-threading #dataframe

Stars: 1.9k

Forks: 166

Last commit: 1 year ago

skrub · Python

Machine learning with dataframes

#data-cleaning #data-science #dataframe

Stars: 1.6k

Forks: 268

Last commit: 2 days ago

pyjanitor · Python

A Python library providing clean, chainable functions for data cleaning and manipulation with pandas DataFrames.

#data-cleaning #hacktoberfest #python-library

A pandas DataFrame wrapper for calculating over 70 stock market indicators and statistics with inline column access.

#technical-indicators #algorithmic-trading #stock-market

Stars: 1.5k

Forks: 318

Last commit: 1 month ago

pandasql · Python

Query pandas DataFrames using SQL syntax, similar to sqldf in R.

#data-querying #sqlite-syntax #python-library

Stars: 1.4k

Forks: 183

Last commit: 2 years ago

dataframe-go · Go

A lightweight and intuitive Go library for data manipulation, statistics, and machine learning using DataFrames.

#data-science #statistics #dataframe

Stars: 1.3k

Forks: 100

Last commit: 4 years ago

GraphFrames · Scala

A DataFrame-based graph processing library for Apache Spark, enabling scalable graph analytics and algorithms.

#graph-processing #apache-spark #network-motifs

Stars: 1.2k

Forks: 268

Last commit: 28 days ago

Vince's CSV Parser · C++

A high-performance, fully-featured CSV parser and serializer for modern C++ with streaming, random access, and robust format handling.

#csv-reader #tab-separated #high-performance

A Ruby library for data analysis with DataFrame and Vector structures, offering storage, manipulation, and visualization.

#scientific-computing #vector #statistics

Stars: 1.1k

Forks: 140

Last commit: 2 years ago

ITables · Python

Display Pandas and Polars DataFrames as interactive, sortable, and searchable DataTables in Jupyter notebooks and Python applications.

#notebook-tools #streamlit-component #dataframe

Stars: 967

Forks: 61

Last commit: 1 day ago

Mobius: C# API for Spark · C#

C# and F# language binding and extensions for Apache Spark, enabling .NET developers to write Spark driver programs and data processing operations.

#rdd #apache-spark #spark

Stars: 947

Forks: 209

Last commit: 7 months ago

chispa · Python

A PySpark testing library providing fast helper methods with descriptive, color-coded error messages for DataFrame and column comparisons.

#apache-spark #unit-testing #dataframe

Stars: 772

Forks: 80

Last commit: 9 days ago

spark-daria · Scala

A Scala library providing essential Spark extensions, helper methods, and custom transformations to maximize developer productivity.

#apache-spark #spark-extensions #spark

Stars: 767

Forks: 150

Last commit: 29 days ago

tech.ml.dataset · Clojure

A high-performance, functional tabular data processing library for Clojure, similar to Python's Pandas or R's data.table.

#etl-pipeline #functional-programming #high-performance

Stars: 750

Forks: 33

Last commit: 1 month ago

pdpipe · Jupyter Notebook

Easy pipelines for pandas DataFrames.

#data-science #pipeline #dataframe

Stars: 729

Forks: 48

Last commit: 15 days ago

Peroxide · Rust

A Rust numeric library for linear algebra, numerical analysis, statistics, and machine learning with high performance and syntax inspired by R, MATLAB, and Python.

#scientific-computing #spline #high-performance

Stars: 721

Forks: 43

Last commit: 10 days ago

PySpark Cheatsheet

A quick reference guide to the most commonly used patterns and functions in PySpark SQL.

#apache-spark #reference-guide #data-science

Stars: 696

Forks: 211

Last commit: 3 years ago

datacompy · Python

A Python library for comparing Pandas, Polars, Spark, and Snowpark DataFrames with detailed reporting and flexible matching.

#apache-spark #fugue #spark

Stars: 654

Forks: 161

Last commit: 1 month ago

Spark · Scala

A library enabling Apache Spark to read from and write to Apache HBase tables as external data sources using DataFrames and SQL.

#apache-spark #data-integration #dataframe

Stars: 546

Forks: 273

Last commit: 5 years ago

Spark XML · Scala

A library for parsing and querying XML data with Apache Spark SQL and DataFrames.

#apache-spark #dataframe #xml-parser

Stars: 513

Forks: 223

Last commit: 1 year ago
