Open-Awesome

Data Science

"Awesome Data Science" is a curated collection of resources for data science, covering data analysis and machine learning. It lists libraries, frameworks, tutorials, datasets, and tools for extracting insight from data, and serves both beginners learning the basics and experienced data scientists looking for advanced techniques.

data-analysis · machine-learning · data-visualization · statistics · python · r · big-data · data-engineering
3.4k stars · 432 forks · 0 contributors
Community-curated · Updated weekly · 100% open source

Table of Contents

46 sections · 302 projects

General Purpose Machine Learning

18 projects
SciPy
scipy.org
scikit-learn
scikit-learn.org
PyCaret

An open-source, low-code Python library that automates end-to-end machine learning workflows.

Python · 9,808 · 4 days ago
shogun

A unified and efficient machine learning toolbox with C++ core and multi-language interfaces, developed since 1999.

C++ · 3,067 · 2 years ago
xLearn

A high-performance, easy-to-use, and scalable machine learning package for linear models, factorization machines, and field-aware factorization machines.

C++ · 3,095 · 2 years ago
cuML

A suite of GPU-accelerated machine learning algorithms with scikit-learn compatible APIs; the project reports 10-50x speedups over CPU equivalents on large datasets.

Python · 5,207 · 3 days ago
modAL

A modular active learning framework for Python built on scikit-learn, enabling rapid creation of custom workflows.

Python · 2,352 · 2 years ago
mlpack

A fast, header-only C++ machine learning library with bindings for Python, R, Julia, and Go.

C++ · 5,654 · 1 day ago
DLIB

A modern C++ toolkit for machine learning, computer vision, and data analysis applications.

C++ · 14,393 · 1 month ago
MLxtend

A Python library providing extensions and utilities for data science and machine learning tasks.

Python · 5,151 · 2 days ago
hyperlearn

A machine learning library whose authors claim 2-2000x faster algorithms and 50% less memory usage across hardware types.

Jupyter Notebook · 2,466 · 1 year ago
Reproducible Experiment Platform (REP)

An IPython-based environment for reproducible machine learning research with unified wrappers for multiple ML libraries.

Jupyter Notebook · 700 · 1 year ago
scikit-multilearn

A scikit-learn compatible Python module for multi-label classification tasks.

Python · 953 · 2 years ago
pystruct

A Python library for structured learning and prediction with max-margin methods and a scikit-learn compatible interface.

Python · 670 · 4 years ago
sklearn-expertsys

A scikit-learn compatible classifier that produces human-interpretable decision rules instead of black box models.

Python · 490 · 8 years ago
RuleFit

A Python implementation of the RuleFit algorithm for interpretable machine learning predictions using rule ensembles.

Python · 446 · 2 years ago
pyGAM

A Python library for building Generalized Additive Models (GAMs) with a scikit-learn-like API, emphasizing interpretability and performance.

Python · 1,004 · 1 month ago
causalml

A Python package for uplift modeling and causal inference using machine learning algorithms to estimate treatment effects.

Python · 5,859 · 3 days ago

Gradient Boosting

5 projects
XGBoost

A scalable, portable, and distributed gradient boosting library for efficient machine learning across multiple languages and platforms.

C++ · 28,451 · 3 days ago
CatBoost

A high-performance gradient boosting library with best-in-class handling of categorical features and support for CPU/GPU training.

C++ · 8,973 · 1 day ago
ThunderGBM

A fast GPU-accelerated library for training Gradient Boosting Decision Trees (GBDT) and Random Forests.

C++ · 713 · 1 year ago
NGBoost

A Python library for probabilistic prediction using natural gradient boosting, built on scikit-learn.

Jupyter Notebook · 1,877 · 2 months ago
TensorFlow Decision Forests

A TensorFlow library for training, serving, and interpreting decision forest models like Random Forests and Gradient Boosted Trees.

Python · 694 · 20 days ago
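
The libraries in this section share one core idea: fit a sequence of small trees, each trained on the errors left by the ensemble so far. The sketch below illustrates that idea for squared-error regression using one-split "stumps" on 1-D inputs. It is a teaching toy that uses only the standard library, not the API or algorithm of any listed project; the names `fit_stump`, `boost`, and `mse` and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals (squared error)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost(xs, ys, rounds=50, lr=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)  # initial prediction is the mean target
    stumps = []
    preds = [base] * len(xs)
    for _ in range(rounds):
        # For squared loss the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + lr * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data (assumed for illustration): a quadratic curve.
xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]
model = boost(xs, ys)
baseline = sum(ys) / len(ys)
print(mse(model, xs, ys) < mse(lambda x: baseline, xs, ys))
```

The learning rate `lr` shrinks each stump's contribution, which is why many rounds are needed; production libraries add regularization, multi-feature trees, and second-order information on top of this loop.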

Ensemble Methods

3 projects
ML-Ensemble
ml-ensemble.com
Stacking

A Python library for stacked generalization (ensemble learning) that supports scikit-learn, XGBoost, and Keras models with out-of-fold prediction saving.

Python · 230 · 8 years ago
vecstack

A Python package for stacking (stacked generalization) with both functional and scikit-learn compatible APIs.

Python · 699 · 7 months ago
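
Stacked generalization, as described in the entries above, trains a meta-learner on out-of-fold predictions so that no base model is scored on data it was fitted to. The following is a minimal standard-library sketch of that workflow, not the API of any listed package; the helper names (`fit_mean`, `fit_line`, `out_of_fold`, `blend_weight`) and the toy data are hypothetical, and the meta-learner is deliberately reduced to a single blending weight.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Least-squares line through 1-D inputs."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)


def out_of_fold(fit, xs, ys, k=5):
    """Predict each point with a model that never saw it during fitting."""
    oof = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):  # the held-out indices of this fold
            oof[i] = model(xs[i])
    return oof


def blend_weight(p1, p2, ys):
    """Meta-learner: weight a minimising squared error of a*p1 + (1 - a)*p2."""
    num = sum((y - b) * (a - b) for a, b, y in zip(p1, p2, ys))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    return num / den


# Toy data (assumed): points on an exact line, so the result is easy to check.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
oof_line = out_of_fold(fit_line, xs, ys)
oof_mean = out_of_fold(fit_mean, xs, ys)
alpha = blend_weight(oof_line, oof_mean, ys)
print(alpha)  # weight given to the line model; close to 1 on this data
```

Because the line model reproduces the held-out targets on this data, the meta-learner puts essentially all of its weight on it. With real data the base models would typically be fitted estimators from a library and the meta-learner a regression over all out-of-fold columns.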

Imbalanced Datasets

0 projects

Kernel Methods

6 projects
pyFM

A Python implementation of Factorization Machines for recommendation and classification tasks using stochastic gradient descent with adaptive regularization.

Python · 925 · 5 years ago
fastFM

A Python library implementing Factorization Machines with a scikit-learn compatible API for regression, classification, and ranking tasks.

Python · 1,088 · 3 years ago
tffm

A TensorFlow implementation of arbitrary order (≥2) Factorization Machines for classification and regression tasks.

Jupyter Notebook · 778 · 4 years ago
liquidSVM

A fast and versatile implementation of support vector machines with integrated hyper-parameter selection and support for multiple learning scenarios.

C++ · 71 · 6 years ago
scikit-rvm

A scikit-learn compatible Python implementation of the Relevance Vector Machine for sparse Bayesian learning.

Python · 237 · 9 months ago
ThunderSVM

A fast Support Vector Machine (SVM) library that leverages GPUs and multi-core CPUs for high-performance machine learning.

C++ · 1,624 · 2 years ago
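
Several entries in this section (pyFM, fastFM, tffm) implement Factorization Machines. As a rough illustration of what a second-order model computes, the sketch below scores one feature vector as a bias, a linear term, and pairwise interactions weighted by dot products of per-feature latent vectors. It shows the explicit pairwise sum alongside the standard linear-time rewrite. This is plain Python with made-up parameters, not code from any listed library; the function names and the values of `x`, `w0`, `w`, and `V` are hypothetical.

```python
from itertools import combinations


def fm_predict_naive(x, w0, w, V):
    """Second-order FM score with the pairwise sum written out explicitly."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i, j in combinations(range(len(x)), 2):
        dot = sum(a * b for a, b in zip(V[i], V[j]))  # <v_i, v_j>
        pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


def fm_predict_fast(x, w0, w, V):
    """Same score in O(k * n) time using the sum-of-squares identity."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):  # loop over the k latent dimensions
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Made-up parameters: 3 features, 2 latent dimensions.
x = [1.0, 0.0, 2.0]
w0 = 0.5
w = [0.1, -0.2, 0.3]
V = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]

naive = fm_predict_naive(x, w0, w, V)
fast = fm_predict_fast(x, w0, w, V)
print(abs(naive - fast) < 1e-9)
```

The two functions agree because the sum over pairs of `<v_i, v_j> x_i x_j` equals half of the squared sum minus the sum of squares in each latent dimension, which is what makes these models practical on sparse, high-dimensional inputs.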

Related Awesome Lists

Asyncio

The "Awesome Asyncio" project is a curated collection of resources dedicated to Asyncio, an asynchronous I/O framework in Python 3 that enables concurrent code execution using the async/await syntax. This list encompasses a variety of categories, including libraries, frameworks, tutorials, and tools that facilitate asynchronous programming. It is beneficial for both beginners looking to understand the fundamentals of asynchronous programming and experienced developers seeking advanced techniques and libraries to enhance their applications. Users can explore a wealth of information and tools that empower them to build efficient, non-blocking applications in Python.

5.0k
Typing

The "Awesome Typing" project is a curated collection of resources focused on optional static typing in Python, a feature that enhances code quality and maintainability. This list encompasses type checkers, libraries, tools, tutorials, and community resources that support developers in implementing type hints and static analysis in their Python projects. Beneficial for both beginners looking to understand typing concepts and experienced developers aiming to improve their codebases, this collection provides valuable insights and practical tools. Users can explore various resources to effectively leverage typing in Python, ultimately leading to more robust and error-free applications.

2.0k
MicroPython

The "Awesome MicroPython" project is a curated collection of resources aimed at developers using MicroPython, a lean and efficient implementation of Python 3 specifically designed for microcontrollers. This list includes libraries, tools, tutorials, and community resources that help users leverage MicroPython's capabilities for embedded systems and IoT applications. Whether you are a beginner looking to get started with microcontroller programming or an experienced developer seeking advanced techniques, this list provides valuable insights and tools to enhance your projects. Dive into the world of MicroPython and discover how to bring your hardware projects to life with ease and efficiency.

1.8k
Scientific Audio

The "Awesome Scientific Audio" project is a curated collection of resources focused on the intersection of audio technology and scientific research. This list encompasses a wide range of topics including audio analysis, sound synthesis, music perception, and signal processing, featuring libraries, software tools, research papers, and tutorials. It is particularly beneficial for researchers, audio engineers, and music technologists who seek to deepen their understanding of audio science and its applications. Whether you are exploring new algorithms or studying the psychological effects of sound, this collection provides a wealth of information to enhance your projects and research endeavors.

1.7k