Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.


Data Science

506 projects

Showing 36 of 484 projects

iRuby
Language: Ruby

A Ruby kernel for Jupyter notebooks, enabling interactive data science and computational workflows in Ruby.

#scientific-computing #jupyter-kernel #notebook
Stars: 925
Forks: 35
Last commit: 4 days ago
smartcore
Language: Rust

A fast, ergonomic machine learning library for Rust with broad algorithm coverage and WASM-first defaults.

#statistical-models #data-science #machine-learning-algorithms
Stars: 923
Forks: 95
Last commit: 1 month ago
Learn Statistics Using Python
Language: Jupyter Notebook

Learn statistics through Python with real-world examples like analyzing marijuana price data across US states.

#regression-analysis #educational #data-science
Stars: 920
Forks: 381
Last commit: 5 years ago
rumale
Language: Ruby

A Ruby machine learning library with a scikit-learn-like interface for classification, regression, clustering, and dimensionality reduction.

#random-forest #data-science #dimensionality-reduction
Stars: 912
Forks: 35
Last commit: 20 days ago
gopherdata

A curated collection of resources for Go-based data analysis, visualization, machine learning, and data science.

#data-science #developer-resources #tooling
Stars: 888
Forks: 83
Last commit: 2 years ago
quanteda
Language: R

An R package for the quantitative analysis of textual data, providing comprehensive tools for natural language processing and text management.

#computational-linguistics #parallel-computing #r-package
Stars: 883
Forks: 191
Last commit: 19 hours ago
OpenFE
Language: Python

An automated feature generation framework for tabular data that discovers expert-level features to boost machine learning model performance.

#parallel-computing #python-library #data-science
Stars: 872
Forks: 112
Last commit: 2 years ago
Cheminformatics

A curated list of awesome cheminformatics software, libraries, resources, and tools, primarily command-line based and open-source.

#scientific-computing #cheminformatics #open-source
Stars: 868
Forks: 143
Last commit: 2 years ago
clojupyter
Language: Clojure

A Jupyter kernel for Clojure, enabling Clojure code execution in Jupyter Lab, Notebook, and Console.

#jupyter-lab #hacktoberfest #jupyter-kernel
Stars: 865
Forks: 95
Last commit: 1 year ago
Awesome R Shiny
Language: R

A curated list of resources for R Shiny, including tutorials, packages, deployment guides, and app examples.

#deployment #open-source #data-science
Stars: 864
Forks: 147
Last commit: 3 years ago
notedown
Language: Jupyter Notebook

Convert IPython/Jupyter notebooks to markdown and back, enabling seamless editing of notebooks as markdown files.

#python-tool #data-science #workflow-automation
Stars: 859
Forks: 112
Last commit: 4 years ago
Chapyter
Language: Python

A JupyterLab extension that integrates GPT-4 as a code interpreter, translating natural language to Python and executing it automatically.

#jupyterlab-extension #data-science #productivity-tools
Stars: 831
Forks: 69
Last commit: 2 years ago
statrs
Language: Rust

A comprehensive statistical computation library for Rust, providing distributions, functions, and utilities for scientific computing.

#scientific-computing #data-science #statistics
Stars: 804
Forks: 109
Last commit: 1 month ago
JSAT
Language: Java

A pure Java machine learning library with no external dependencies, offering a wide collection of algorithms and parallel execution support.

#gpl-licensed #statistical-analysis #parallel-computing
Stars: 796
Forks: 207
Last commit: 3 years ago
PlantCV
Language: Python

An open-source image analysis software package for plant phenotyping using computer vision.

#image-analysis #science #agricultural-technology
Stars: 794
Forks: 284
Last commit: 5 days ago
PyTorch Frame
Language: Python

A modular deep learning framework for PyTorch for building neural networks on heterogeneous tabular data.

#data-science #deep-learning #data-frame
Stars: 786
Forks: 71
Last commit: 2 days ago
engsoccerdata
Language: R

An R package providing comprehensive historical soccer match datasets and analysis functions for European and MLS leagues.

#league-tables #football-statistics #soccer-data
Stars: 781
Forks: 190
Last commit: 3 months ago
Rgee
Language: R

An R binding package for calling the Google Earth Engine API from within R, integrating with the R spatial ecosystem.

#environmental-data #r-package #google-earth-engine
Stars: 775
Forks: 159
Last commit: 3 days ago
PrimeKG
Language: Jupyter Notebook

A biomedical knowledge graph integrating 20 resources to describe 17,080 diseases with over 4 million relationships across ten biological scales.

#biomedical-data #disease-modeling #therapeutics
Stars: 771
Forks: 151
Last commit: 2 years ago
jupynium.nvim
Language: Python

A Neovim plugin that provides real-time, bidirectional synchronization with Jupyter Notebook using Selenium automation.

#interactive-computing #data-science #vim
Stars: 767
Forks: 17
Last commit: 1 month ago
datascience
Language: Jupyter Notebook

A Python library for introductory data science education, developed for Berkeley's Data 8 course.

#data-science #statistics #education
Stars: 764
Forks: 337
Last commit: 4 months ago
BreakoutDetection
Language: C++

An R package for detecting statistically significant breakpoints in time series using robust energy statistics.

#statistical-analysis #r-package #data-science
Stars: 762
Forks: 179
Last commit:
Dplython
Language: Python

A Python library that brings R's dplyr data manipulation syntax to pandas DataFrames using a pipe operator.

#dplyr #python-library #data-science
Stars: 761
Forks: 52
Last commit: 9 years ago
aequitas
Language: Python

An open-source toolkit for auditing bias and experimenting with fairness methods in machine learning models.

#bias #data-science #fairness
Stars: 760
Forks: 123
Last commit: 23 days ago
RHadoop

A collection of R packages for interacting with the Hadoop ecosystem, enabling big data analysis from R.

#mapreduce #data-science #hbase
Stars: 760
Forks: 275
Last commit: 10 years ago
Apache Toree
Language: Scala

A Jupyter Notebook kernel for interactive data exploration and analysis using Apache Spark with Scala.

#apache-spark #spark-integration #jupyter-kernel
Stars: 750
Forks: 226
Last commit: 5 days ago
tech.ml.dataset
Language: Clojure

A high-performance, functional tabular data processing library for Clojure, similar to Python's Pandas or R's data.table.

#etl-pipeline #functional-programming #high-performance
Stars: 747
Forks: 34
Last commit: 14 days ago
JupyterWith
Language: Nix

A Nix-based framework for creating declarative and reproducible Jupyter environments with configurable kernels and extensions.

#data-science #reproducible-environments #jupyterlab
Stars: 739
Forks: 153
Last commit: 4 days ago
Interactive Web Plotting with Bokeh
Language: Jupyter Notebook

A collection of Jupyter notebooks providing examples and tutorials for the Bokeh interactive visualization library.

#data-science #interactive-plots #python
Stars: 734
Forks: 663
Last commit: 2 years ago
MLEM
Language: Python

A tool to package, serve, and deploy any ML model on any platform using a GitOps approach.

#model-packaging #deployment #developer-tools
Stars: 718
Forks: 42
Last commit: 2 years ago
MeTA
Language: C++

A modern C++ toolkit for text retrieval and analysis, featuring indexing, ranking, topic modeling, classification, and language models.

#information-retrieval #text-classification #graph-algorithms
Stars: 714
Forks: 237
Last commit: 3 years ago
Reproducible Experiment Platform (REP)
Language: Jupyter Notebook

An IPython-based environment for reproducible machine learning research, with unified wrappers for multiple ML libraries.

#parallel-computing #data-science #experiment-tracking
Stars: 700
Forks: 148
Last commit:
vecstack
Language: Python

A Python package for stacking (stacked generalization) with both functional and scikit-learn compatible APIs.

#ensemble-learning #blending #stacked-generalization
Stars: 699
Forks: 81
Last commit: 7 months ago
Awesome Data Science Ideas

A curated list of proven AI use cases that generate business value across departments and industries.

#use-cases #data-science #business-intelligence
Stars: 696
Forks: 90
Last commit: 2 years ago
perpetual
Language: Rust

A hyperparameter-free gradient boosting machine with a simple budget parameter, built for high performance with Rust and bindings for Python and R.

#causal-ml #gbdt #data-science
Stars: 691
Forks: 40
Last commit: 2 months ago
PySpark Cheatsheet

A quick reference guide to the most commonly used patterns and functions in PySpark SQL.

#apache-spark #reference-guide #data-science
Stars: 682
Forks: 210
Last commit: 3 years ago
Page 9 of 14

Related Tags

#Machine Learning (288)
#Python (245)
#Deep Learning (84)
#Data Analysis (79)
#Data Visualization (79)
#Statistics (61)
#Python Library (55)
#Jupyter Notebook (53)
#R (52)
#Jupyter (49)
#Scikit Learn (48)
#Pandas (43)