Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.


Data Science

506 projects

Showing 36 of 506 projects

Jupyter Notebook REST API
Language: Jupyter Notebook

Run Jupyter notebooks as REST API endpoints, enabling programmatic execution of notebook workflows.

#fastapi #uvicorn #data-science
Stars: 166
Forks: 11
Last commit: 3 years ago

RDataSets
Language: R

Julia package providing easy access to 700+ standard R datasets for data analysis and statistical learning.

#julia #data-science #statistics
Stars: 166
Forks: 54
Last commit: 1 month ago

jRuby Mahout
Language: Ruby

A JRuby gem that provides Ruby-friendly access to Apache Mahout's scalable machine learning capabilities for recommendations.

#jruby #data-science #ruby-gem
Stars: 165
Forks: 14
Last commit: 10 years ago

checkpoint
Language: R

An R package that installs packages from MRAN snapshots to ensure reproducible environments by locking package versions to a specific date.

#mran #version-control #r-package
Stars: 165
Forks: 37
Last commit: 4 years ago

DistributedR
Language: R

A scalable high-performance platform for R that enables large-scale machine learning, statistical analysis, and graph processing across clusters.

#statistical-analysis #graph-processing #high-performance-computing
Stars: 162
Forks: 54
Last commit: 10 years ago

OpenBioLink
Language: Python

A resource and evaluation framework for benchmarking link prediction models on large-scale, heterogeneous biomedical knowledge graphs.

#heterogeneous-graphs #knowledge-graphs #data-science
Stars: 161
Forks: 24
Last commit: 2 years ago

DataDeps
Language: Julia

A Julia package for reproducible data setup, automating dataset downloads and management for scientific computing.

#scientific-computing #dataset-download #julia
Stars: 160
Forks: 43
Last commit: 5 days ago

ClojisR
Language: Clojure

A bridge library enabling Clojure to call R functions and use R objects for statistical computing and data science.

#data-science #rlang #r-language
Stars: 159
Forks: 11
Last commit: 8 days ago

A list of colleges and universities offering degrees in data science.
Language: Python

A curated list of colleges and universities worldwide offering data science degrees.

#higher-education #universities #data-science
Stars: 159
Forks: 196
Last commit: 5 years ago

open-solution-data-science-bowl-2018
Language: Python

Open-source implementation of the winning solution for the 2018 Data Science Bowl Kaggle competition using PyTorch and U-Net.

#data-science #kaggle #deep-learning
Stars: 155
Forks: 42
Last commit: 4 years ago

open-solution-toxic-comments
Language: Python

An open-source starter solution for the Kaggle Toxic Comment Classification Challenge, providing ready-to-use machine learning pipelines for detecting online harassment.

#ensemble-learning #text-classification #data-science
Stars: 155
Forks: 55
Last commit: 4 years ago

MLKit
Language: Swift

A simple machine learning framework written in Swift, currently focusing on regression algorithms.

#genetic-algorithms #machine-learning-library #data-science
Stars: 153
Forks: 14
Last commit: 7 years ago

crowdAI
Language: JavaScript

An open platform for hosting and participating in data science challenges focused on open science and open data.

#data-science #open-science #challenge-platform
Stars: 152
Forks: 30
Last commit: 3 years ago

Automatically Dockerize A Data-Science Repo As A Jupyter Server
Language: Shell

A GitHub Action to build and push Jupyter-enabled Docker images from data science repositories using repo2docker.

#actions #containerization #devops
Stars: 152
Forks: 34
Last commit: 3 months ago

treebeard
Language: TypeScript

A GitHub Action that automatically tests Jupyter notebooks from top to bottom using nbmake and pytest.

#scientific-computing #pytest #notebook
Stars: 151
Forks: 8
Last commit: 4 years ago

networkdata
Language: R

An R package providing 2,260 network datasets in igraph format from diverse sources like social networks, animal interactions, and movie co-stars.

#igraph #r-package #data-science
Stars: 146
Forks: 16
Last commit: 1 month ago

RJulia
Language: C

An R package that provides a bidirectional interface for calling Julia code from R and mapping objects between both languages.

#scientific-computing #julia #r-package
Stars: 145
Forks: 23
Last commit: 8 years ago

Stats
Language: Julia

A convenience meta-package that loads essential Julia packages for statistics with a single import.

#julia #meta-package #data-science
Stars: 143
Forks: 14
Last commit: 3 years ago

elusion
Language: Rust

A Rust DataFrame and data engineering library with PySpark/SQL-like syntax, built for business data pipelines with Microsoft stack integration.

#pyspark-alternative #sql-like #data-science
Stars: 141
Forks: 4
Last commit: 2 months ago

doddle-model
Language: Scala

An in-memory machine learning library for Scala with a scikit-learn-like API, built on Breeze for parallel and distributed systems.

#parallel-computing #in-memory #data-science
Stars: 139
Forks: 22
Last commit: 1 year ago

UCLA: Tools in Data Science (STATS 418)
Language: HTML

Course materials for UCLA's STATS 418 (Tools in Data Science), covering R packages, machine learning libraries, databases, and reproducibility tools.

#analytical-databases #data-science #r-programming
Stars: 138
Forks: 63
Last commit: 9 years ago

steppy
Language: Python

A lightweight Python library for building reproducible machine learning pipelines with minimal interface constraints.

#experimentation #python-library #data-science
Stars: 136
Forks: 32
Last commit: 7 years ago

PAQUO
Language: Python

A Python library for interacting with QuPath, providing a pythonic interface to manage and analyze digital pathology projects.

#image-analysis #qupath #open-source
Stars: 135
Forks: 19
Last commit: 1 month ago

clj-ml
Language: Clojure

A machine learning library for Clojure built on top of Weka, providing filters, classifiers, regression, and clustering algorithms.

#data-science #classification #weka
Stars: 134
Forks: 20
Last commit: 4 years ago

Numsw
Language: Swift

A Swift library for numerical computing with numpy-like APIs and Jupyter-like playground notebooks.

#scientific-computing #data-science #linearalgebra
Stars: 132
Forks: 9
Last commit: 8 years ago

Azure Machine Learning With GitHub Actions
Language: Python

A GitHub template for automating machine learning workflows on Azure using GitHub Actions.

#devops #azure #data-science
Stars: 131
Forks: 90
Last commit: 4 years ago

ipychart
Language: Python

A Python library that brings Chart.js interactive charts to Jupyter notebooks with a familiar API.

#python-library #data-science #jupyter
Stars: 131
Forks: 11
Last commit: 1 year ago

enlighten-apply
Language: SAS

Example code and materials demonstrating practical applications of SAS machine learning techniques.

#sas-aiml #educational-resources #data-science
Stars: 130
Forks: 111
Last commit: 2 years ago

ML_Tables
Language: SAS

Example code and materials demonstrating practical applications of SAS machine learning techniques.

#statistical-analysis #sas-aiml #data-science
Stars: 130
Forks: 111
Last commit: 2 years ago

Reco4PHP
Language: PHP

A PHP framework for building complex recommendation engines on top of Neo4j graph databases.

#personalization #data-science #recommendation-engine
Stars: 129
Forks: 22
Last commit: 3 years ago

nteract
Language: Rust

A desktop application for interactive computing with Jupyter notebooks, supporting multiple kernels and rich outputs.

#desktop-application #scientific-computing #interactive-computing
Stars: 126
Forks: 7
Last commit: 1 day ago

themis-ml
Language: Jupyter Notebook

A Python library implementing fairness-aware machine learning algorithms for measuring and mitigating discrimination in predictive models.

#algorithmic-bias #data-science #machine-learning-algorithms
Stars: 126
Forks: 27
Last commit: 5 years ago

open-solution-salt-identification
Language: Python

An open-source benchmark solution for the Kaggle TGS Salt Identification Challenge using semantic segmentation.

#pipeline-framework #data-science #pipeline
Stars: 121
Forks: 44
Last commit: 5 years ago

xgb
Language: Ruby

A Ruby interface to XGBoost, providing high-performance gradient boosting for machine learning tasks.

#data-science #ffi #classification
Stars: 120
Forks: 8
Last commit: 2 months ago

MachineLearning
Language: Julia

A Julia library providing a consistent API for common machine learning algorithms, designed for practitioners working with in-memory datasets.

#random-forest #julia #ml-library
Stars: 119
Forks: 26
Last commit: 10 years ago

zenoh-flow
Language: Rust

A declarative data-flow programming framework built on Zenoh for building applications that span from cloud to edge devices.

#robotics #iot #dataflow-programming
Stars: 119
Forks: 25
Last commit: 1 year ago

Page 13 of 15

Related Tags

#Machine Learning (288)
#Python (245)
#Deep Learning (84)
#Data Analysis (79)
#Data Visualization (79)
#Statistics (61)
#Python Library (55)
#Jupyter Notebook (53)
#R (52)
#Jupyter (49)
#Scikit Learn (48)
#Pandas (43)