Open-Awesome
© 2026 Open-Awesome. Curated for the developer elite.

Data Science

252 projects

Showing 36 of 252 projects

data-diff (Python)

A fast tool for comparing datasets within or across SQL databases to identify differences.

#database #python-library #data-science
Stars: 3.0k
Forks: 305
Last commit: 1 year ago
Deepnote (TypeScript)

An open-source, AI-first data notebook that extends Jupyter with a sleek UI, reactive execution, and native data integrations.

#version-control #jupyterhub #data-science
Stars: 2.8k
Forks: 192
Last commit: 2 days ago
AI Fairness 360 (Python)

An extensible open-source toolkit for detecting, mitigating, and explaining bias in machine learning datasets and models.

#fairness-awareness-model #bias-mitigation #ai
Stars: 2.8k
Forks: 910
Last commit: 5 months ago
ml.js (JavaScript)

A comprehensive collection of machine learning algorithms and mathematical utilities implemented in JavaScript for the browser and Node.js.

#browser-ml #regression-analysis #data-science
Stars: 2.7k
Forks: 213
Last commit: 1 year ago
docker-python (Python)

Docker image providing the Python environment used by Kaggle Notebooks for data science competitions.

#data-science #kaggle #reproducible-research
Stars: 2.7k
Forks: 1.0k
Last commit: 1 month ago
swifter (Python)

A Python package that automatically accelerates pandas and Modin DataFrame apply operations by choosing the fastest available method.

#parallelization #parallel-computing #data-science
Stars: 2.6k
Forks: 104
Last commit: 2 years ago
Data Science Projects (Jupyter Notebook)

A curated collection of hands-on data science project ideas and resources for learning machine learning and AI concepts.

#data-science #kaggle #deep-learning
Stars: 2.6k
Forks: 623
Last commit: 2 years ago
Sungjoon's TensorFlow-101 (Jupyter Notebook)

A collection of beginner-friendly TensorFlow tutorials using Jupyter Notebook, covering deep learning fundamentals and practical applications.

#python-tutorials #data-science #deep-learning
Stars: 2.6k
Forks: 734
Last commit: 6 years ago
lifelines (Python)

A pure Python library for survival analysis, modeling time-to-event data with censoring.

#maximum-likelihood #kaplan-meier #data-science
Stars: 2.6k
Forks: 566
Last commit: 1 month ago
Bayesian Modelling in Python (Jupyter Notebook)

A Python tutorial and cookbook for implementing Bayesian modeling techniques using PyMC3.

#pymc3 #bayesian-statistics #data-science
Stars: 2.5k
Forks: 406
Last commit: 9 years ago
PyGraphistry (Python)

A Python library for loading, shaping, embedding, and exploring large graphs with GPU-accelerated visualization and analytics.

#networkx #graph #graph-query-language
Stars: 2.5k
Forks: 226
Last commit: 2 days ago
lgo (Go)

A Jupyter Notebook kernel and interactive REPL for Go (golang) that enables interactive programming and data analysis.

#jupyter-kernel #notebook #data-science
Stars: 2.5k
Forks: 118
Last commit: 5 years ago
Hamilton (Jupyter Notebook)

A Python library for defining portable, modular, and testable data transformation DAGs with built-in lineage and metadata.

#data-lineage #etl-pipeline #python-library
Stars: 2.5k
Forks: 184
Last commit: 2 days ago
Hamilton (Jupyter Notebook)

A lightweight Python library for creating portable, expressive, and testable data transformation DAGs with built-in lineage and metadata.

#data-lineage #etl-pipeline #python-library
Stars: 2.5k
Forks: 184
Last commit: 2 days ago
hyperlearn (Jupyter Notebook)

HyperLearn provides 2-2000x faster machine learning algorithms with 50% less memory usage, optimized for all hardware.

#parallel-computing #high-performance #python-library
Stars: 2.4k
Forks: 158
Last commit: 1 year ago
scikit-plot (Python)

An intuitive Python library that adds single-line plotting functions for scikit-learn and other machine learning objects.

#classification-metrics #plot #python-library
Stars: 2.4k
Forks: 288
Last commit: 1 year ago
Probability and Statistics Cookbook (TeX)

A concise mathematical reference covering essential topics in probability theory and statistics.

#data-science #statistics #probability-theory
Stars: 2.3k
Forks: 347
Last commit: 3 years ago
radian (Python)

A modern R console with multiline editing, syntax highlighting, and improved REPL features.

#python-integration #data-science #syntax-highlighting
Stars: 2.3k
Forks: 87
Last commit: 1 month ago
SGX-Full-OrderBook-Tick-Data-Trading-Strategy (Jupyter Notebook)

A machine learning framework for developing high-frequency trading strategies using full orderbook tick data.

#market-microstructure #time-series-prediction #high-frequency-trading
Stars: 2.3k
Forks: 694
Last commit: 3 years ago
dataprep (Python)

An open-source Python library for low-code data preparation, offering fast EDA, data cleaning, and collection from APIs and databases.

#data-cleaning #datacleaning #connector
Stars: 2.2k
Forks: 222
Last commit: 1 year ago
Feature Engine (Python)

A Python library for feature engineering and selection with scikit-learn compatible transformers.

#open-source #feature-selection #data-science
Stars: 2.2k
Forks: 340
Last commit: 27 days ago
ML with Ruby (Ruby)

A curated list of libraries, tutorials, and resources for implementing machine learning in the Ruby programming language.

#ai #open-source #data-science
Stars: 2.2k
Forks: 181
Last commit: 1 year ago
Awesome Machine Learning with Ruby (Ruby)

A curated list of awesome libraries, data sources, tutorials, and resources for machine learning using the Ruby programming language.

#ruby-ecosystem #data-science #deep-learning
Stars: 2.2k
Forks: 181
Last commit: 1 year ago
RAPIDS cuGraph (CUDA)

A collection of GPU-accelerated graph analytics libraries for creating, manipulating, and executing scalable graph algorithms.

#cuda #high-performance-computing #graph
Stars: 2.2k
Forks: 350
Last commit: 10 hours ago
PyFlux (Python)

An open-source Python library for probabilistic time series modeling with both frequentist and Bayesian inference methods.

#state-space-models #probabilistic-modeling #python-library
Stars: 2.1k
Forks: 247
Last commit: 2 years ago
Curated list of R tutorials for Data Science, NLP and Machine Learning (R)

A curated collection of R tutorials, packages, and resources for Data Science, NLP, and Machine Learning.

#data-science #statistics #r-programming
Stars: 2.1k
Forks: 879
Last commit: 3 years ago
TileDB (C++)

An embeddable C++ storage engine for dense and sparse multi-dimensional arrays, dataframes, and key-value stores.

#multi-dimensional-arrays #c-plus-plus-library #scientific-computing
Stars: 2.0k
Forks: 210
Last commit: 2 days ago
Bytewax (Python)

A Python framework and Rust-based distributed processing engine for stateful event and stream processing.

#stream-processing #event-driven #data-science
Stars: 2.0k
Forks: 107
Last commit: 1 year ago
MLJ (Julia)

A Julia machine learning framework providing a unified interface and meta-algorithms for over 200 models.

#scientific-computing #julia #pipelines
Stars: 1.9k
Forks: 159
Last commit: 8 days ago
AutoViz (Python)

Automatically visualize any dataset with a single line of code, including data quality assessment and fixes.

#automl-algorithms #python-library #data-science
Stars: 1.9k
Forks: 215
Last commit: 1 year ago
Szilard's machine learning benchmark (R)

A minimal benchmark comparing scalability, speed, and accuracy of popular open-source machine learning libraries for binary classification.

#h2o #random-forest #open-source
Stars: 1.9k
Forks: 330
Last commit: 3 years ago
datatable (C++)

A high-performance Python package for fast, multi-threaded manipulation of large tabular datasets, inspired by R's data.table.

#data-science #multi-threading #dataframe
Stars: 1.9k
Forks: 167
Last commit: 1 year ago
Apache Spark (Shell)

A curated list of awesome Apache Spark packages, libraries, and resources for data engineers and scientists.

#apache-spark #data-science #spark-ecosystem
Stars: 1.9k
Forks: 344
Last commit: 1 month ago
NGBoost (Jupyter Notebook)

A Python library for probabilistic prediction using natural gradient boosting, built on scikit-learn.

#natural-gradients #ngboost #uncertainty-estimation
Stars: 1.9k
Forks: 248
Last commit: 1 month ago
DataFrames (Julia)

A flexible and fast package for in-memory tabular data manipulation and analysis in the Julia programming language.

#hacktoberfest #julia #missing-data
Stars: 1.8k
Forks: 374
Last commit: 12 days ago
CausalImpact (R)

An R package for estimating causal effects in time series using Bayesian structural time-series models.

#bayesian-statistics #r-package #data-science
Stars: 1.8k
Forks: 261
Last commit: 24 days ago
Page 6 of 7

Related Tags

#Machine Learning (160)
#Python (154)
#Deep Learning (59)
#Data Visualization (50)
#Python Library (39)
#Data Analysis (38)
#Scikit Learn (34)
#Statistics (32)
#Jupyter Notebook (32)
#Jupyter Notebooks (27)
#Jupyter (26)
#R (25)