Pandas

#automl-algorithms #python-library #data-science

AutoViz (Python)

Automatically visualize any dataset with a single line of code, including data quality assessment and fixes.

Stars: 1.9k

Forks: 214

#educational-resources #data-science #statistics

Awesome Data Analysis

A curated collection of 500+ resources for data analysis and data science, covering Python, SQL, ML, visualization, roadmaps, and interview prep.

Stars: 1.7k

Forks: 248

#data-cleaning #cudf #spark

Optimus (Python)

A Python library for agile data preparation workflows that works with Pandas, Dask, cuDF, Dask-cuDF, Vaex, and PySpark.

#data-cleaning #hacktoberfest #python-library

Forks: 232

Last commit: 1 year ago

pyjanitor (Python)

A Python library providing clean, chainable functions for data cleaning and manipulation with pandas DataFrames.

#risk-metrics #backtesting #performance-analysis

Forks: 189

Last commit: 4 days ago
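The chainable style described above can be approximated in plain pandas with `DataFrame.pipe`, which passes the frame to a function and returns that function's result. The sketch below uses pandas only, not pyjanitor's own functions; `clean_column_names` and `drop_empty_rows` are hypothetical helpers written for illustration.

```python
import pandas as pd


def clean_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: strip, lowercase, and snake_case column names."""
    out = df.copy()
    out.columns = [str(c).strip().lower().replace(" ", "_") for c in out.columns]
    return out


def drop_empty_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop rows in which every value is missing."""
    return df.dropna(how="all")


raw = pd.DataFrame({"First Name": ["Ada", None, "Grace"], "Age ": [36, None, 45]})

# Each step takes a DataFrame and returns a new one, so the calls chain left to right.
cleaned = raw.pipe(clean_column_names).pipe(drop_empty_rows)
print(cleaned)
```

Because each helper returns a new DataFrame instead of mutating its input, `raw` keeps its original column names.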

empyrical (Python)

A Python library for calculating common financial risk and performance metrics used in quantitative finance.

Forks: 454
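To make "risk and performance metrics" concrete, here is a standard-library sketch of two common ones, maximum drawdown and an annualized Sharpe ratio, computed from a list of periodic returns. It is not empyrical's implementation or API; the zero risk-free rate and the 252-periods-per-year annualization factor (typical for daily equity returns) are assumptions of this sketch.

```python
import math
import statistics


def max_drawdown(returns: list[float]) -> float:
    """Largest peak-to-trough drop of the cumulative wealth curve, as a negative fraction."""
    wealth = 1.0
    peak = 1.0
    worst = 0.0
    for r in returns:
        wealth *= 1.0 + r  # compound the periodic return
        peak = max(peak, wealth)  # highest wealth seen so far
        worst = min(worst, wealth / peak - 1.0)  # drawdown from that peak
    return worst


def sharpe_ratio(returns: list[float], periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio, assuming a zero risk-free rate (sketch assumption)."""
    mean = statistics.mean(returns)
    std = statistics.stdev(returns)  # sample standard deviation; needs 2+ values
    return mean / std * math.sqrt(periods_per_year)


daily = [0.01, -0.02, 0.015, -0.03, 0.02]
print(max_drawdown(daily))
print(sharpe_ratio(daily))
```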

#technical-indicators #algorithmic-trading #stock-market

stockstats (Python)

A pandas DataFrame wrapper for calculating over 70 stock market indicators and statistics with inline column access.

Forks: 317
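"Inline column access" suggests that an indicator column is computed the first time it is requested by name. The standard-library sketch below mimics that idea with a hypothetical `IndicatorFrame` dict and a single indicator, a simple moving average; the `<column>_<window>_sma` naming scheme is this sketch's own assumption, not a description of stockstats' API.

```python
class IndicatorFrame(dict):
    """Hypothetical mapping of column name -> list of floats that derives
    '<column>_<window>_sma' columns the first time they are looked up."""

    def __missing__(self, key):
        # dict calls __missing__ only from obj[key]; .get() and `in` bypass it.
        parts = key.rsplit("_", 2) if isinstance(key, str) else []
        if (
            len(parts) != 3
            or parts[2] != "sma"
            or not parts[1].isdigit()
            or int(parts[1]) == 0
        ):
            raise KeyError(key)
        column, window = parts[0], int(parts[1])
        values = self[column]  # raises KeyError if the base column is absent
        sma = []
        for i in range(len(values)):
            if i + 1 < window:
                sma.append(None)  # not enough history for a full window yet
            else:
                sma.append(sum(values[i + 1 - window : i + 1]) / window)
        self[key] = sma  # cache, so later lookups are ordinary dict hits
        return sma


prices = IndicatorFrame(close=[10.0, 11.0, 12.0, 13.0])
print(prices["close_3_sma"])
```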

#jupyterlab-extension #analytics-automation #datatransformation

d3py (Python)

A Python plotting library that generates interactive D3.js visualizations from pandas DataFrames.

#svg #python #plotting

Stars: 1.4k

Forks: 200

Last commit: 5 years ago

amphi-etl (TypeScript)

A visual, low-code data preparation tool that generates Python code for ETL, reporting, and AI-assisted workflows.

Query pandas DataFrames using SQL syntax, similar to sqldf in R.

#data-querying #sqlite-syntax #python-library

Stars: 1.4k

Forks: 183
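One common way to run SQL against a DataFrame, in the spirit of R's sqldf, is to copy the frame into an in-memory SQLite database and read the query result back as a new DataFrame. The sketch below uses only pandas and the standard-library `sqlite3` module; it is not the listed library's API, and `query_frames` and the `sales` table are hypothetical.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def query_frames(sql: str, **frames: pd.DataFrame) -> pd.DataFrame:
    """Run `sql` against DataFrames exposed as SQLite tables named by keyword."""
    # closing() guarantees the connection is closed; sqlite3's own context
    # manager only commits or rolls back.
    with closing(sqlite3.connect(":memory:")) as con:
        for name, frame in frames.items():
            frame.to_sql(name, con, index=False)
        return pd.read_sql_query(sql, con)


sales = pd.DataFrame({"region": ["north", "south", "north"], "amount": [100, 250, 50]})
totals = query_frames(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region",
    sales=sales,
)
print(totals)
```

Copying the data is the main cost of this design: it is convenient for ad hoc queries, but every call serializes each frame into SQLite.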

#data-science #statistics #dataframe

dataframe-go (Go)

A lightweight and intuitive Go library for data manipulation, statistics, and machine learning using DataFrames.

Stars: 1.3k

Forks: 100

Last commit: 4 years ago

fecon235 (Jupyter Notebook)

A collection of Jupyter notebooks for financial economics, providing high-level APIs to retrieve, analyze, and visualize economic data from sources like FRED.

#fx #gold #financial-economics

Stars: 1.3k

Forks: 348

Last commit: 3 years ago

Covid-19 (Python)

A cleaned and normalized time series dataset of global COVID-19 confirmed cases, deaths, and recoveries, updated daily.

#epidemiology #data-cleaning #data-science

Stars: 1.2k

Forks: 599

Last commit: 4 months ago

ITables (Python)

Display Pandas and Polars DataFrames as interactive, sortable, and searchable DataTables in Jupyter notebooks and Python applications.

#notebook-tools #streamlit-component #dataframe

Stars: 969

Forks: 62

Last commit: 2 days ago

data_hacking (Jupyter Notebook)

A collection of IPython notebooks demonstrating data analysis and machine learning techniques on security datasets.

#security-analytics #educational #python

Stars: 783

Forks: 297

Last commit: 7 years ago

Dplython (Python)

A Python library that brings R's dplyr data manipulation syntax to pandas DataFrames using a pipe operator.

#dplyr #python-library #data-science

Stars: 761

Forks: 52

Last commit: 9 years ago
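A dplyr-style pipe can be built on Python's reflected-operator protocol: when the left operand does not implement `>>`, Python calls `__rrshift__` on the right operand. The standard-library sketch below assumes a `>>` pipe and works on a list of dicts rather than a DataFrame; `Verb`, `where`, and `select` are hypothetical names, not Dplython's API.

```python
class Verb:
    """Hypothetical wrapper that makes `data >> verb` evaluate to verb.func(data)."""

    def __init__(self, func):
        self.func = func

    def __rrshift__(self, data):
        # Called because list defines no `>>`, so Python falls back to the right operand.
        return self.func(data)


def where(predicate):
    """Keep only the rows for which predicate(row) is true."""
    return Verb(lambda rows: [row for row in rows if predicate(row)])


def select(*keys):
    """Keep only the named fields of each row."""
    return Verb(lambda rows: [{k: row[k] for k in keys} for row in rows])


rows = [
    {"name": "Ada", "dept": "eng", "age": 36},
    {"name": "Bob", "dept": "ops", "age": 29},
    {"name": "Cy", "dept": "eng", "age": 41},
]

result = rows >> where(lambda r: r["dept"] == "eng") >> select("name", "age")
print(result)
```

Each verb returns a `Verb` object rather than data, so the chain reads left to right like a dplyr pipeline.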

pdpipe (Jupyter Notebook)

Easy pipelines for pandas DataFrames.

#data-science #pipeline #dataframe

Stars: 729

Forks: 48

Last commit: 18 days ago

datacompy (Python)

A Python library for comparing Pandas, Polars, Spark, and Snowpark DataFrames with detailed reporting and flexible matching.

#apache-spark #fugue #spark

Stars: 654

Forks: 162

Last commit: 3 days ago
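datacompy's reports are far more detailed, but the core of a keyed DataFrame comparison can be sketched with plain pandas `merge(..., how="outer", indicator=True)`, whose `_merge` column marks each row as `left_only`, `right_only`, or `both`. `compare_frames` is a hypothetical helper, not datacompy's API; note that it reports two missing values in the same cell as a mismatch, because `NaN != NaN`.

```python
import pandas as pd


def compare_frames(left: pd.DataFrame, right: pd.DataFrame, key: str) -> dict:
    """Summarize keys unique to each side and value mismatches on shared keys."""
    # An outer merge keeps every key; indicator=True adds the `_merge` column.
    # Columns on the side missing a key are filled with NaN (ints become floats).
    merged = left.merge(
        right, on=key, how="outer", suffixes=("_left", "_right"), indicator=True
    )
    shared_cols = [c for c in left.columns if c != key and c in right.columns]
    both = merged[merged["_merge"] == "both"]
    mismatches = {}
    for col in shared_cols:
        diff = both[both[f"{col}_left"] != both[f"{col}_right"]]
        if not diff.empty:
            mismatches[col] = diff[key].tolist()
    return {
        "left_only": merged.loc[merged["_merge"] == "left_only", key].tolist(),
        "right_only": merged.loc[merged["_merge"] == "right_only", key].tolist(),
        "mismatches": mismatches,
    }


old = pd.DataFrame({"id": [1, 2, 3], "price": [10, 20, 30]})
new = pd.DataFrame({"id": [2, 3, 4], "price": [20, 35, 40]})
report = compare_frames(old, new, key="id")
print(report)
```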

Dora (Python)

A Python library that automates the tedious parts of exploratory data analysis with cleaning, feature engineering, visualization, and versioning.

#data-cleaning #data-versioning #python

Stars: 647

Forks: 75

Last commit: 11 months ago

UStore (C++)

A modular, multi-modal transactional database for AI and semantic search, intended to replace MongoDB, Neo4j, and Elastic with a single ACID solution.

#networkx #semantic-search #database

Stars: 636

Forks: 36

Data science your way (Jupyter Notebook)

A tutorial series comparing how to implement data science concepts and build applications in both Python and R ecosystems.

#notebook #educational #data-science

Stars: 617

Forks: 253

Last commit: 5 years ago

Data Wrangler

A VS Code extension for visually exploring, cleaning, and transforming tabular data with automatic Pandas code generation.

#data-cleaning #vscode-extension #data-science

Stars: 606

Forks: 41

Last commit: 7 months ago

pandas_summary (Python)

An engine for ML/data tracking, visualization, explainability, drift detection, and dashboards, integrated with Polyaxon.

#spark #matplotlib #data-science

Stars: 534

Forks: 47

Documentation website from Jupyter Notebook (Python)

A lightweight Python tool for generating rich summary statistics of pandas and Polars dataframes directly in the console.

#data-science #statistics #eda

Stars: 514

Forks: 29

Last commit: 4 days ago

dopanda (Python)

An overlay companion for pandas that provides real-time hints and tips to improve data analysis code.

#productivity-tool #data-science #code-assistance

Stars: 476

Forks: 22

Last commit: 1 year ago

meza (Python)

A Python toolkit for processing tabular data.

#functional-programming #library #csv

Stars: 423

Forks: 29

#apache-spark #dataframe #python

SparklingPandas (Python)

A Python library that provides a Pandas-like API on top of Apache Spark DataFrames for distributed data analysis.

Stars: 361

Forks: 79

Last commit: 3 years ago

pandaset-devkit (Jupyter Notebook)

A Python devkit for loading, exploring, and manipulating the PandaSet, a large-scale autonomous driving dataset with LiDAR, camera, and annotations.

#lidar #autonomous-driving #sensor-fusion

Stars: 278

Forks: 74

#scientific-computing #ai #fft

scirs (Rust)

A comprehensive scientific computing and AI/ML library in pure Rust, offering SciPy-compatible APIs and claiming 10-100x performance gains.

Stars: 274

Forks: 33

Last commit: 2 days ago

Rosetta (Jupyter Notebook)

A Python toolkit for text-focused data science on medium-sized datasets, bridging memory and cluster-scale processing.

#stream-processing #multiprocessing #scientific-computing

A Python package for automated univariate and bivariate data analysis and visualization to streamline machine learning workflows.

#statistical-analysis #statisics #feature-selection

Stars: 205

Forks: 29

Last commit: 9 years ago

Panthera (Clojure)

A Clojure library providing data-frames and arrays through Python's pandas and numpy.

#array #data-science #dataframe

Stars: 191

Forks: 15

Last commit: 6 years ago

partridge (Python)

A fast, forgiving GTFS reader built on pandas DataFrames.

#python #gtfs #pandas

Stars: 185

Forks: 24