Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.

Data Science

506 projects

Showing 36 of 506 projects

the-elements-of-statistical-learning (Jupyter Notebook)

Jupyter notebooks implementing algorithms, proofs, and summaries from 'The Elements of Statistical Learning' textbook.

#algorithm-implementation #data-science #statistics
Stars: 426
Forks: 84
Last commit
cdlib (Python)

A Python meta-library for community detection in complex networks, implementing algorithms, fitness functions, and visualization.

#networkx #igraph #python-library
Stars: 425
Forks: 77
Last commit: 5 months ago
scikit-rebate (Python)

A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for machine learning.

#relief-algorithms #feature-selection #data-science
Stars: 421
Forks: 72
Last commit: 3 years ago
jupyterlab_templates (JavaScript)

A JupyterLab extension that adds support for creating notebooks from customizable templates.

#jupyterlab-extension #notebook #dataviz
Stars: 413
Forks: 70
Last commit: 2 days ago
LotteryPredict (Jupyter Notebook)

A practical demo using LSTM neural networks with TensorFlow to predict lottery numbers.

#time-series-prediction #data-science #neural-networks
Stars: 412
Forks: 199
Last commit: 7 years ago
H2O

A curated list of research, applications, tutorials, and software built using the H2O open-source machine learning platform.

#h2o #data-science #deep-learning
Stars: 393
Forks: 72
Last commit: 3 years ago
TDSP-Utilities (HTML)

A collection of utilities and scripts for interactive data exploration, analysis, and automated modeling within Microsoft's Team Data Science Process.

#team-data-science-process #microsoft-r-server #reporting
Stars: 378
Forks: 266
Last commit
goro (Go)

A high-level machine learning library for Go with a Keras-like API, built on Gorgonia.

#model-training #data-science #deep-learning
Stars: 374
Forks: 19
Last commit: 2 years ago
Clustering (Julia)

A Julia package providing comprehensive clustering algorithms and validation metrics for data analysis.

#julia #k-means #data-science
Stars: 373
Forks: 123
Last commit: 6 months ago
Time Series (Julia)

A lightweight Julia toolkit for working with time series data, providing efficient data structures and operations.

#julia #quantitative-analysis #data-science
Stars: 369
Forks: 75
Last commit: 2 months ago
IElixir (Jupyter Notebook)

A Jupyter kernel that enables interactive computing with the Elixir programming language.

#elixir #jupyter-kernel #notebook
Stars: 368
Forks: 41
Last commit: 2 years ago
Upgini (Python)

An intelligent data search and enrichment library for machine learning that automatically finds and adds relevant external features to ML pipelines.

#python-library #data-science #kaggle
Stars: 350
Forks: 26
Last commit: 3 days ago
hypergraph (Rust)

A Rust library for creating directed hypergraphs where hyperedges can connect any number of vertices.

#parallel-computing #hypergraphs #relational-modeling
Stars: 346
Forks: 15
Last commit: 14 days ago
RPostgres (R)

A DBI-compliant R interface to PostgreSQL, rewritten in C++ for improved performance and reliability.

#database #postgres #r-package
Stars: 338
Forks: 81
Last commit: 15 days ago
scrape (Elixir)

An Elixir library for structured data extraction from websites, articles, and RSS/Atom feeds using information-retrieval techniques.

#readability #elixir #information-retrieval
Stars: 337
Forks: 41
Last commit: 5 years ago
skpro (Python)

A scikit-learn compatible Python library for probabilistic regression, survival analysis, and probability distributions.

#ai #distributional-regression #probabilistic-machine-learning
Stars: 327
Forks: 187
Last commit: 6 days ago
Parris (Python)

Automated infrastructure setup tool for training machine learning algorithms on AWS.

#devops #data-science #infrastructure-automation
Stars: 314
Forks: 23
Last commit: 3 months ago
Hivemall (Java)

A scalable machine learning library that runs on Apache Hive, Spark, and Pig for distributed ML directly in SQL.

#apache-spark #data-science #apache-hive
Stars: 313
Forks: 111
Last commit: 3 years ago
Jupyter (Jupyter Notebook)

An OCaml kernel for Jupyter notebooks, providing an OCaml REPL with markdown/HTML documentation, LaTeX, and image embedding.

#functional-programming #jupyter-kernel #notebook
Stars: 311
Forks: 47
Last commit: 2 months ago
ChatGPT for Jupyter (TypeScript)

A browser extension that adds AI-powered code assistance to Jupyter Notebooks and Jupyter Lab using ChatGPT/GPT-4.

#jupyter-lab #debugging-tools #browser-extension
Stars: 308
Forks: 56
Last commit
gapminder (R)

An R data package providing an excerpt from Gapminder's global development data for teaching and examples.

#gapminder-data #r-package #teaching
Stars: 304
Forks: 676
Last commit: 1 year ago
voyager (TypeScript)

A JupyterLab extension to visualize CSV and JSON data interactively using Voyager 2.

#jupyterlab-extension #vega-voyager #data-science
Stars: 304
Forks: 35
Last commit: 3 years ago
Node-SVM (JavaScript)

A Node.js library implementing Support Vector Machines (SVM) for classification and regression tasks.

#libsvm #data-science #classification
Stars: 301
Forks: 46
Last commit: 7 years ago
R package (R)

R package containing datasets and code examples for the book 'Statistical Analysis of Network Data with R, 2nd Edition'.

#statistical-analysis #igraph #r-package
Stars: 301
Forks: 188
Last commit: 6 years ago
Geni (Clojure)

An idiomatic Clojure dataframe library that runs on Apache Spark, providing a seamless interface for data processing and machine learning.

#apache-spark #high-performance-computing #spark
Stars: 295
Forks: 26
Last commit: 2 years ago
SuperLearner (R)

An R package for automatic optimal predictor ensembling via cross-validation with dozens of machine learning algorithms.

#ensemble-learning #parallel-computing #hyperparameter-optimization
Stars: 294
Forks: 76
Last commit: 5 months ago
terraform-provider-iterative (Go)

A Terraform plugin for managing machine learning compute resources across AWS, GCP, Azure, and Kubernetes with spot instance recovery and auto-termination.

#developer-tools #devops #multi-cloud
Stars: 294
Forks: 30
Last commit
Maze (Python)

An application-oriented Deep Reinforcement Learning framework for real-world decision problems, covering simulation to deployment.

#hydra-config #simulation #distributed
Stars: 290
Forks: 12
Last commit: 8 days ago
ipycytoscape (Python)

A Jupyter widget for interactive graph visualization using cytoscape.js in notebooks and JupyterLab.

#notebook-tools #data-science #cytoscape
Stars: 289
Forks: 61
Last commit: 2 months ago
JuliaCall (HTML)

An R package that embeds Julia for high-performance numerical computing, enabling seamless interoperability between R and Julia.

#scientific-computing #julia #high-performance
Stars: 286
Forks: 41
Last commit: 13 days ago
mlr3book (TeX)

Free online version of the 'Applied Machine Learning Using mlr3 in R' textbook, built with Quarto.

#bookdown #mlr3 #data-science
Stars: 280
Forks: 70
Last commit: 6 days ago
rb-libsvm (C++)

Ruby language bindings for the LIBSVM library, enabling support vector machine (SVM) classification and regression in Ruby.

#libsvm #svm-learning #ruby-bindings
Stars: 279
Forks: 34
Last commit: 2 years ago
R Books List (R)

A curated, categorized collection of books about the R programming language for data science, statistics, and visualization.

#data-science #statistics #r-programming
Stars: 276
Forks: 29
Last commit: 8 years ago
R Books (R)

A curated, categorized collection of books about the R programming language for data science, statistics, and visualization.

#data-science #statistics #r-programming
Stars: 276
Forks: 29
Last commit: 8 years ago
pandaset-devkit (Jupyter Notebook)

A Python devkit for loading, exploring, and manipulating the PandaSet, a large-scale autonomous driving dataset with LiDAR, camera, and annotations.

#lidar #autonomous-driving #sensor-fusion
Stars: 276
Forks: 73
Last commit: 2 years ago
Probably (Swift)

A Swift library providing probability distributions and statistical functions for probabilistic computing.

#data-science #statistics #swift-package-manager
Stars: 268
Forks: 9
Last commit: 9 years ago
Page 11 of 15

Related Tags

#Machine Learning (288)
#Python (245)
#Deep Learning (84)
#Data Analysis (79)
#Data Visualization (79)
#Statistics (61)
#Python Library (55)
#Jupyter Notebook (53)
#R (52)
#Jupyter (49)
#Scikit Learn (48)
#Pandas (43)