Open-Awesome
© 2026 Open-Awesome. Curated for the developer elite.

Dataframe

47 projects

Showing 36 of 47 projects

polars (Rust)

An extremely fast query engine for DataFrames, written in Rust, with multi-language frontends.

#out-of-core #apache-arrow #simd
Stars: 38.7k
Forks: 2.9k
Last commit: 3 days ago
PandasAI (Python)

A Python library that enables conversational data analysis on SQL, CSV, and Parquet files using LLMs and RAG.

#ai #database #python-library
Stars: 23.6k
Forks: 2.3k
Last commit: 7 months ago
modin (Python)

A drop-in replacement for pandas that scales data analysis workflows to use all CPU cores and handle out-of-memory datasets.

#parallel-computing #distributed #data-science
Stars: 10.4k
Forks: 676
Last commit: 3 months ago
cudf (C++)

A GPU-accelerated DataFrame library for tabular data processing, part of the RAPIDS data science suite.

#cudf #cuda #apache-arrow
Stars: 9.7k
Forks: 1.1k
Last commit: 1 day ago
datafusion (Rust)

An extensible SQL query engine written in Rust, using Apache Arrow as its in-memory format for building fast database and analytic systems.

#columnar-database #apache-arrow #dataframe
Stars: 8.9k
Forks: 2.2k
Last commit: 21 hours ago
vaex (Python)

A high-performance Python DataFrame library for lazy out-of-core processing and visualization of billion-row datasets at interactive speeds.

#out-of-core #python-dataframe #apache-arrow
Stars: 8.5k
Forks: 602
Last commit: 2 months ago
pandera (Python)

A flexible and expressive API for performing statistical data validation on dataframe-like objects.

#data-cleaning #pandas-validation #python-library
Stars: 4.4k
Forks: 401
Last commit: 1 day ago
data.table (R)

A high-performance R package for fast data manipulation of large datasets, extending data.frame with concise syntax and memory efficiency.

#parallel-computing #high-performance #r-package
Stars: 3.9k
Forks: 1.0k
Last commit: 1 day ago
Tablesaw (Java)

A Java dataframe and visualization library for data loading, cleaning, transformation, and analysis.

#statistical-analysis #chart #data-science
Stars: 3.8k
Forks: 649
Last commit: 3 months ago
Koalas (Python)

Koalas provides the pandas DataFrame API on Apache Spark, enabling data scientists to work with big data using familiar pandas syntax.

#apache-spark #spark #mlflow
Stars: 3.4k
Forks: 371
Last commit: 2 years ago
gota (Go)

A Go library providing DataFrames, Series, and data wrangling operations for tabular data manipulation.

#dataframe #data-wrangling #series
Stars: 3.3k
Forks: 290
Last commit: 2 years ago
swifter (Python)

A Python package that automatically accelerates pandas and Modin DataFrame apply operations by choosing the fastest available method.

#parallelization #parallel-computing #data-science
Stars: 2.6k
Forks: 104
Last commit: 2 years ago
Hamilton (Jupyter Notebook)

A lightweight Python library for creating portable, expressive, and testable data transformation DAGs with built-in lineage and metadata.

#data-lineage #etl-pipeline #python-library
Stars: 2.5k
Forks: 192
Last commit: 1 day ago
Hamilton (Jupyter Notebook)

A Python library for defining portable, modular, and testable data transformation DAGs with built-in lineage and metadata.

#data-lineage #etl-pipeline #python-library
Stars: 2.5k
Forks: 192
Last commit: 1 day ago
.NET for Apache Spark (C#)

.NET for Apache Spark provides high-performance .NET APIs for Apache Spark, enabling C# and F# developers to work with structured and streaming data.

#apache-spark #spark #dataframe
Stars: 2.1k
Forks: 329
Last commit: 25 days ago
Apache Ballista (Rust)

A distributed query execution engine that extends Apache DataFusion to run SQL queries in parallel across multiple nodes.

#parallel-computing #distributed #dataframe
Stars: 2.1k
Forks: 284
Last commit: 2 days ago
datatable (C++)

A high-performance Python package for fast, multi-threaded manipulation of large tabular datasets, inspired by R's data.table.

#data-science #multi-threading #dataframe
Stars: 1.9k
Forks: 167
Last commit: 1 year ago
pyjanitor (Python)

A Python library providing clean, chainable functions for data cleaning and manipulation with pandas DataFrames.

#data-cleaning #hacktoberfest #python-library
Stars: 1.5k
Forks: 186
Last commit: 5 days ago
stockstats (Python)

A pandas DataFrame wrapper for calculating over 70 stock market indicators and statistics with inline column access.

#technical-indicators #algorithmic-trading #stock-market
Stars: 1.5k
Forks: 317
Last commit: 2 months ago
pandasql (Python)

Query pandas DataFrames using SQL syntax, similar to sqldf in R.

#data-querying #sqlite-syntax #python-library
Stars: 1.3k
Forks: 183
Last commit: 1 year ago
dataframe-go (Go)

A lightweight and intuitive Go library for data manipulation, statistics, and machine learning using DataFrames.

#data-science #statistics #dataframe
Stars: 1.3k
Forks: 99
Last commit: 4 years ago
GraphFrames (Scala)

A DataFrame-based graph processing library for Apache Spark, enabling scalable graph analytics and algorithms.

#graph-processing #apache-spark #network-motifs
Stars: 1.2k
Forks: 268
Last commit: 2 days ago
Vince's CSV Parser (C++)

A high-performance, fully-featured CSV parser and serializer for modern C++ with streaming, random access, and robust format handling.

#csv-reader #tab-separated #high-performance
Stars: 1.1k
Forks: 196
Last commit: 13 days ago
daru (Ruby)

A Ruby library for data analysis with DataFrame and Vector structures, offering storage, manipulation, and visualization.

#scientific-computing #vector #statistics
Stars: 1.1k
Forks: 140
Last commit: 2 years ago
ITables (Python)

Display pandas and Polars DataFrames as interactive, sortable, and searchable DataTables in Jupyter notebooks and Python applications.

#notebook-tools #streamlit-component #dataframe
Stars: 963
Forks: 62
Last commit: 3 days ago
Mobius: C# API for Spark (C#)

C# and F# language bindings and extensions for Apache Spark, enabling .NET developers to write Spark driver programs and data processing operations.

#rdd #apache-spark #spark
Stars: 946
Forks: 208
Last commit: 5 months ago
chispa (Python)

A PySpark testing library providing fast helper methods with descriptive, color-coded error messages for DataFrame and column comparisons.

#apache-spark #unit-testing #dataframe
Stars: 769
Forks: 80
Last commit: 19 days ago
spark-daria (Scala)

A Scala library providing essential Spark extensions, helper methods, and custom transformations to maximize developer productivity.

#apache-spark #spark-extensions #spark
Stars: 767
Forks: 150
Last commit: 8 months ago
tech.ml.dataset (Clojure)

A high-performance, functional tabular data processing library for Clojure, similar to Python's pandas or R's data.table.

#etl-pipeline #functional-programming #high-performance
Stars: 748
Forks: 34
Last commit: 18 days ago
Peroxide (Rust)

A Rust numeric library for linear algebra, numerical analysis, statistics, and machine learning with high performance and syntax inspired by R, MATLAB, and Python.

#scientific-computing #spline #high-performance
Stars: 703
Forks: 40
Last commit: 23 days ago
PySpark Cheatsheet

A quick reference guide to the most commonly used patterns and functions in PySpark SQL.

#apache-spark #reference-guide #data-science
Stars: 687
Forks: 210
Last commit: 3 years ago
datacompy (Python)

A Python library for comparing pandas, Polars, Spark, and Snowpark DataFrames with detailed reporting and flexible matching.

#apache-spark #fugue #spark
Stars: 647
Forks: 161
Last commit: 3 days ago
Spark (Scala)

A library enabling Apache Spark to read from and write to Apache HBase tables as external data sources using DataFrames and SQL.

#apache-spark #data-integration #dataframe
Stars: 547
Forks: 274
Last commit: 5 years ago
Spark XML (Scala)

A library for parsing and querying XML data with Apache Spark SQL and DataFrames.

#apache-spark #dataframe #xml-parser
Stars: 513
Forks: 223
Last commit: 1 year ago
spark-fast-tests (Scala)

A fast Apache Spark testing helper library with beautifully formatted error messages for Scala applications.

#apache-spark #spark #unit-testing
Stars: 456
Forks: 77
Last commit: 2 months ago
Tablecloth (Clojure)

A Clojure dataset manipulation library providing a dplyr-like API on top of tech.ml.dataset.

#columnar-data #dataframe #dataset-api
Stars: 363
Forks: 29
Last commit: 1 month ago
Page 1 of 2

Related Tags

#Python (22), #Data Analysis (21), #Big Data (21), #Data Science (18), #Apache Spark (16), #Data Processing (16), #Pandas (15), #Spark (10), #Machine Learning (10), #Distributed Computing (10), #Data Engineering (9), #Data Manipulation (8)