Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.


Data Science

506 projects

Showing 36 of 506 projects

DuckDB.NET
Language: C#

ADO.NET provider and native bindings for DuckDB, enabling C# applications to interact with the in-process analytical database.

#database-driver #hacktoberfest #duckdb-database
Stars: 676
Forks: 91
Last commit: 3 days ago
lineapy
Language: Jupyter Notebook

Capture, analyze, and transform messy Jupyter notebooks into production data pipelines with just two lines of code.

#data-science #model-deployment #reproducible-research
Stars: 670
Forks: 59
Last commit: 1 year ago
RStartHere
Language: R

A curated guide to essential R packages organized by their role in the data science workflow.

#workflow-guide #r-ecosystem #data-science
Stars: 663
Forks: 217
Last commit: 6 years ago
datacompy
Language: Python

A Python library for comparing Pandas, Polars, Spark, and Snowpark DataFrames with detailed reporting and flexible matching.

#apache-spark #fugue #spark
Stars: 647
Forks: 161
Last commit: 3 days ago
Learning

A curated collection of free resources to help deepen your understanding of the R programming language.

#community #data-science #r-programming
Stars: 643
Forks: 104
Last commit: 1 year ago
SparkR
Language: R

An R package providing a lightweight frontend to use Apache Spark for distributed data processing from R.

#apache-spark #r-package #data-science
Stars: 642
Forks: 322
Last commit: 10 years ago
GLM
Language: Julia

A Julia package for fitting linear and generalized linear models with comprehensive statistical functionality.

#statistical-models #regression-analysis #scientific-computing
Stars: 636
Forks: 117
Last commit: 11 days ago
Data Science Roadmap

A comprehensive roadmap chart and resource guide for aspiring data scientists, based on insights from Silicon Valley tech companies.

#educational-resources #data-science #skill-development
Stars: 635
Forks: 125
Last commit: 2 years ago
rio
Language: R

An R package that simplifies data import and export by automatically selecting the correct function based on file extension.

#stata #statistical-analysis #spss
Stars: 619
Forks: 77
Last commit: 2 months ago
Data science your way
Language: Jupyter Notebook

A tutorial series comparing how to implement data science concepts and build applications in both Python and R ecosystems.

#notebook #educational #data-science
Stars: 616
Forks: 253
Last commit: 5 years ago
Data Wrangler

A VS Code extension for visually exploring, cleaning, and transforming tabular data with automatic Pandas code generation.

#data-cleaning #vscode-extension #data-science
Stars: 596
Forks: 40
Last commit: 6 months ago
Food-Recipe-CNN
Language: Jupyter Notebook

A deep learning system that classifies food images into 230 categories and retrieves matching recipes using convolutional neural networks.

#inceptionv3 #transfer-learning #food-recognition
Stars: 586
Forks: 131
Last commit: 3 years ago
recmetrics
Language: Jupyter Notebook

A Python library providing evaluation metrics and diagnostic tools for recommender systems.

#evaluation-metrics #python-library #data-science
Stars: 583
Forks: 101
Last commit: 2 years ago
xgboost
Language: C++

An optimized distributed gradient boosting library for fast and accurate machine learning on large datasets.

#parallel-computing #gbdt #ml-library
Stars: 578
Forks: 258
Last commit: 8 years ago
PyToune
Language: Python

A simplified Keras-like framework for PyTorch that reduces boilerplate code for training neural networks.

#callbacks #neural-network #model-training
Stars: 578
Forks: 64
Last commit: 2 days ago
ipython-notebooks
Language: Jupyter Notebook

A collection of IPython notebooks containing machine learning experiments and examples using scikit-learn and related Python libraries.

#data-science #jupyter #python
Stars: 575
Forks: 198
Last commit: 1 month ago
TabGAN
Language: Python

A Python library for generating high-quality synthetic tabular data using GANs, diffusion models, and large language models.

#gans #privacy-preservation #train-dataframe
Stars: 570
Forks: 83
Last commit: 2 months ago
tidygraph
Language: R

A tidy API for graph manipulation in R, providing dplyr verbs and igraph algorithms for network analysis.

#graph-manipulation #igraph #r-package
Stars: 566
Forks: 61
Last commit: 1 year ago
TypeDB-ML
Language: Python

A machine learning integrations library for TypeDB, enabling graph algorithms and Graph Neural Networks on strongly-typed graph data.

#networkx #graph-neural-networks #knowledge-graph-completion
Stars: 552
Forks: 93
Last commit: 2 years ago
kand
Language: Rust

A modern, high-performance technical analysis library built in Rust with Python and WebAssembly bindings.

#technical-analysis #webassembly #high-performance
Stars: 551
Forks: 24
Last commit: 4 months ago
DataExplorer
Language: R

An R package that automates exploratory data analysis and data treatment with one-line reports and visualizations.

#r-package #data-science #statistics
Stars: 544
Forks: 95
Last commit: 3 months ago
pandas_summary
Language: Python

An engine for ML/data tracking, visualization, explainability, drift detection, and dashboards, integrated with Polyaxon.

#spark #matplotlib #data-science
Stars: 533
Forks: 47
Last commit: 1 month ago
Neovim
Language: Lua

A Neovim plugin providing language support, code execution, and preview features for working with Quarto documents.

#document-preview #code-execution #language-server
Stars: 522
Forks: 22
Last commit: 1 month ago
Aqueduct
Language: Go

An open-source MLOps framework for defining and deploying machine learning and LLM workloads across any cloud infrastructure.

#cloud-infrastructure #ai #open-source
Stars: 519
Forks: 20
Last commit: 3 years ago
Open Data Sources

A curated collection of open data sources across government, academic, and private sectors for data science and research.

#data-curation #data-science #government-data
Stars: 517
Forks: 192
Last commit: 8 years ago
Documentation website from Jupyter Notebook
Language: Python

A lightweight Python tool for generating rich summary statistics of pandas and Polars dataframes directly in the console.

#data-science #statistics #eda
Stars: 512
Forks: 28
Last commit: 4 days ago
Lightwood
Language: Python

An AutoML framework that generates and customizes machine learning pipelines using declarative JSON-AI syntax.

#hacktoberfest #ml-pipeline #data-science
Stars: 507
Forks: 101
Last commit: 3 months ago
Data Frames Meta
Language: Julia

A Julia package providing metaprogramming macros to simplify DataFrame manipulation with a more concise syntax.

#hacktoberfest #julia #metaprogramming
Stars: 498
Forks: 56
Last commit: 6 months ago
desbordante
Language: C++

A high-performance data profiler for discovering and validating complex patterns like functional dependencies, inclusion dependencies, and association rules.

#data-cleaning #pattern-discovery #data-science
Stars: 482
Forks: 100
Last commit: 7 days ago
Desbordante
Language: C++

A high-performance data profiler for discovering and validating complex patterns in datasets, enabling data cleaning and quality analysis.

#data-cleaning #cpp-library #data-science
Stars: 482
Forks: 100
Last commit: 7 days ago
leaves
Language: Go

A pure Go library for making predictions with Gradient Boosting Regression Trees models from LightGBM, XGBoost, and scikit-learn.

#gbdt #data-science #go-library
Stars: 479
Forks: 87
Last commit: 1 year ago
dopanda
Language: Python

An overlay companion for pandas that provides real-time hints and tips to improve data analysis code.

#productivity-tool #data-science #code-assistance
Stars: 478
Forks: 22
Last commit: 1 year ago
open-solution-home-credit
Language: Python

An open-source machine learning solution for the Home Credit Default Risk Kaggle competition, providing reproducible code and experiments.

#pipeline-framework #reproducible-experiments #hyperparameter-tuning
Stars: 464
Forks: 171
Last commit: 4 years ago
IfSharp
Language: Jupyter Notebook

F# kernel for Jupyter notebooks, enabling interactive data science and exploration with F#.

#mono #notebook #f-sharp
Stars: 442
Forks: 70
Last commit: 4 years ago
sql
Language: TypeScript

A SQL GUI extension for JupyterLab that enables point-and-click database exploration and query execution.

#jupyterlab-extension #database #sql-gui
Stars: 433
Forks: 51
Last commit: 3 years ago
imbalanced-ensemble
Language: Python

A Python library for class-imbalanced ensemble learning with 30+ algorithms, built on scikit-learn.

#ensemble-learning #imbalanced-data #imbalanced-learning
Stars: 426
Forks: 60
Last commit: 3 months ago
Page 10 of 15

Related Tags

#Machine Learning (288)
#Python (245)
#Deep Learning (84)
#Data Analysis (79)
#Data Visualization (79)
#Statistics (61)
#Python Library (55)
#Jupyter Notebook (53)
#R (52)
#Jupyter (49)
#Scikit Learn (48)
#Pandas (43)