Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.

Data Science

506 projects

Showing 36 of 506 projects

Knet.jl · Jupyter Notebook

A deep learning framework for Julia with GPU support and automatic differentiation using dynamic computational graphs.

#research-tool #knet #julia
Stars: 1.4k
Forks: 224
Last commit: 1 year ago
awesome-python-chemistry

A curated list of awesome Python frameworks, libraries, software, and resources for chemistry and cheminformatics.

#scientific-computing #cheminformatics #simulation
Stars: 1.4k
Forks: 229
Last commit: 8 months ago
amphi-etl · TypeScript

A visual, low-code data preparation tool that generates Python code for ETL, reporting, and AI-assisted workflows.

#jupyterlab-extension #analytics-automation #datatransformation
Stars: 1.4k
Forks: 105
Last commit: 1 day ago
CV-pretrained-model

A curated collection of open-source computer vision pre-trained models across TensorFlow, Keras, PyTorch, Caffe, and MXNet frameworks.

#mxnet #data-science #deep-learning
Stars: 1.4k
Forks: 192
Last commit: 5 years ago
sparkmagic · Python

Jupyter magics and kernels for interactively working with remote Spark clusters via Livy, Lighter, or Ilum.

#apache-spark #spark #notebook
Stars: 1.4k
Forks: 446
Last commit: 9 months ago
awesome-earthobservation-code · HTML

A curated collection of tools, tutorials, code, and resources for Earth Observation and geospatial satellite imagery analysis.

#satellite-data #geospatial-data #google-earth-engine
Stars: 1.4k
Forks: 247
Last commit
Awesome Data Analysis

A curated collection of 500+ resources for data analysis and data science, covering Python, SQL, ML, visualization, roadmaps, and interview prep.

#educational-resources #data-science #statistics
Stars: 1.3k
Forks: 192
Last commit: 6 days ago
Intel(R) Extension for Scikit-learn · Python

A free software AI accelerator that speeds up scikit-learn applications by 10-100x on CPUs and GPUs with no code changes.

#oneapi #ai-machine-learning #ai-accelerator
Stars: 1.3k
Forks: 187
Last commit
Dex · JavaScript

A Java/Groovy/JavaFX data visualization tool for ETL, machine learning, and publishing web visualizations.

#desktop-application #datavis #dataviz
Stars: 1.3k
Forks: 307
Last commit: 7 years ago
sklearn-porter · Python

Transpile trained scikit-learn estimators to C, Java, JavaScript, Go, PHP, and Ruby for embedded systems and performance-critical applications.

#deployment #embedded-systems #sklearn
Stars: 1.3k
Forks: 169
Last commit: 2 years ago
Hopsworks - A Feature Store for ML and Data-Intensive AI · Java

A real-time AI lakehouse platform with a Python-centric feature store and comprehensive MLOps capabilities.

#azure #data-science #kserve
Stars: 1.3k
Forks: 158
Last commit: 1 year ago
dataframe-go · Go

A lightweight and intuitive Go library for data manipulation, statistics, and machine learning using DataFrames.

#data-science #statistics #dataframe
Stars: 1.3k
Forks: 99
Last commit: 4 years ago
Xcessiv · Python

A web-based tool for automated hyperparameter tuning and stacked ensemble creation in Python.

#ensemble-learning #hyperparameter-optimization #hyperparameter-tuning
Stars: 1.3k
Forks: 106
Last commit: 8 years ago
rusty-machine · Rust

A general-purpose machine learning library for Rust, focusing on speed and ease of use with minimal dependencies.

#data-science #statistics #neural-networks
Stars: 1.3k
Forks: 150
Last commit: 5 years ago
Awesome Random Forest (GitHub)

A curated list of resources for random forest and other tree-based machine learning methods.

#random-forest #ensemble-methods #data-science
Stars: 1.2k
Forks: 333
Last commit: 2 years ago
mlforecast · Python

A Python framework for scalable time series forecasting using machine learning models, designed for production environments.

#data-science #time-series-forecasting #production-ml
Stars: 1.2k
Forks: 125
Last commit: 4 days ago
ml-tooling/best-of-jupyter, "Notebook Environments"

A ranked list of 300+ awesome Jupyter Notebook, Hub, and Lab projects (extensions, kernels, tools) updated weekly.

#jupyterlab-extension #developer-tools #jupyterhub
Stars: 1.2k
Forks: 92
Last commit: 4 days ago
NFStream · Python

A flexible Python framework for fast network flow data analysis, offering encrypted application identification, statistical feature extraction, and extensibility via plugins.

#pcap #data-science #network-monitoring
Stars: 1.2k
Forks: 143
Last commit: 3 days ago
plotly-resampler · Python

A Python library that adds dynamic data aggregation to Plotly figures for scalable visualization of large time series.

#large-datasets #data-science #downsampling
Stars: 1.2k
Forks: 74
Last commit: 6 months ago
Distributions · Julia

A comprehensive Julia package for probability distributions, providing properties, PDFs, sampling, and maximum likelihood estimation.

#julia #data-science #statistics
Stars: 1.2k
Forks: 440
Last commit: 5 days ago
molten-nvim · Python

A Neovim plugin for interactively running code with Jupyter kernels, providing a REPL and notebook-like experience directly in the editor.

#code-runner #jupyter-kernel #notebook
Stars: 1.2k
Forks: 63
Last commit: 3 months ago
PyMC-Marketing · Python

A Bayesian marketing analytics toolbox for Media Mix Modeling (MMM), Customer Lifetime Value (CLV), and customer choice analysis.

#mmm #probabilistic-modeling #media-mix-modeling
Stars: 1.2k
Forks: 381
Last commit: 1 day ago
Covid-19 · Python

A cleaned and normalized time series dataset of global COVID-19 confirmed cases, deaths, and recoveries, updated daily.

#epidemiology #data-cleaning #data-science
Stars: 1.2k
Forks: 600
Last commit: 2 months ago
renv · R

renv creates isolated, portable, and reproducible project environments for R by managing private package libraries and lockfiles.

#environment-management #lockfile #r-package
Stars: 1.2k
Forks: 165
Last commit
PyCall · C

A Ruby library that enables direct calling of Python functions and modules with automatic type conversion.

#ruby-python-bridge #scientific-computing #rubydatascience
Stars: 1.1k
Forks: 88
Last commit: 1 month ago
SystemML · Java

An open-source machine learning system for the end-to-end data science lifecycle from data preparation to model serving.

#federated-learning #apache-spark #data-science
Stars: 1.1k
Forks: 536
Last commit: 1 day ago
Datumbox · Java

An open-source Java framework for rapid development of machine learning and statistical applications with large dataset support.

#regression-analysis #statistical-analysis #large-datasets
Stars: 1.1k
Forks: 280
Last commit: 2 years ago
mlr3 · R

A modern, object-oriented machine learning framework for R, providing efficient building blocks for ML workflows.

#hyperparameter-tuning #model-training #r-package
Stars: 1.1k
Forks: 98
Last commit
GoNB · Go

A Go kernel for Jupyter notebooks that compiles each cell for fast execution and full Go compatibility.

#gonb #jupyter-kernel #notebook
Stars: 1.0k
Forks: 56
Last commit: 1 day ago
graspologic · Python

A Python package providing specialized statistical algorithms for graph and network analysis.

#graph #networks #data-science
Stars: 1.0k
Forks: 171
Last commit: 6 months ago
pyGAM · Python

A Python library for building Generalized Additive Models (GAMs) with a scikit-learn-like API, emphasizing interpretability and performance.

#scientific-computing #data-science #interpretable-ai
Stars: 1.0k
Forks: 288
Last commit: 1 month ago
SciRuby · Ruby

A meta gem that bundles scientific computing and visualization libraries for Ruby, enabling data analysis and plotting.

#scientific-computing #data-science #statistics
Stars: 1.0k
Forks: 79
Last commit: 6 years ago
OpenMetricLearning · Python

A PyTorch-based framework for training and validating models that produce high-quality embeddings for metric learning and retrieval tasks.

#hacktoberfest #similarity-learning #image-retrieval
Stars: 991
Forks: 78
Last commit
sparklyr · R

An R interface for Apache Spark that enables distributed data processing, machine learning, and SQL queries using familiar R syntax.

#apache-spark #distributed #dplyr
Stars: 970
Forks: 308
Last commit: 25 days ago
scikit-multilearn · Python

A scikit-learn compatible Python module for multi-label classification tasks.

#scikit #scipy #data-science
Stars: 953
Forks: 175
Last commit: 2 years ago
TensorFlow Scala · Scala

A strongly-typed Scala API for TensorFlow, providing functionality similar to the official Python API with additional features.

#data-science #deep-learning #neural-networks
Stars: 940
Forks: 91
Last commit: 4 years ago
Page 8 of 15

Related Tags

#Machine Learning (288)
#Python (245)
#Deep Learning (84)
#Data Analysis (79)
#Data Visualization (79)
#Statistics (61)
#Python Library (55)
#Jupyter Notebook (53)
#R (52)
#Jupyter (49)
#Scikit Learn (48)
#Pandas (43)