Fast tool for comparing datasets within or across SQL databases to identify differences.
An open-source, AI-first data notebook that extends Jupyter with a sleek UI, reactive execution, and native data integrations.
An extensible open-source toolkit for detecting, mitigating, and explaining bias in machine learning datasets and models.
A comprehensive collection of machine learning algorithms and mathematical utilities implemented in JavaScript for browser and Node.js.
Docker image providing the Python environment used by Kaggle Notebooks for data science competitions.
A Python package that automatically accelerates pandas and Modin DataFrame apply operations by choosing the fastest available method.
A curated collection of hands-on data science project ideas and resources for learning machine learning and AI concepts.
A collection of beginner-friendly TensorFlow tutorials using Jupyter Notebook, covering deep learning fundamentals and practical applications.
A pure Python library for survival analysis, modeling time-to-event data with censoring.
A Python tutorial and cookbook for implementing Bayesian modeling techniques using PyMC3.
A Python library for loading, shaping, embedding, and exploring large graphs with GPU-accelerated visualization and analytics.
A Jupyter Notebook kernel and interactive REPL for Go (golang) that enables interactive programming and data analysis.
A Python library for defining portable, modular, and testable data transformation DAGs with built-in lineage and metadata.
A lightweight Python library for creating portable, expressive, and testable data transformation DAGs with built-in lineage and metadata.
HyperLearn is a machine learning library that claims algorithms 2-2000x faster with 50% less memory usage, intended to run on all hardware.
An intuitive Python library that adds single-line plotting functions for scikit-learn and other machine learning objects.
A concise mathematical reference covering essential topics in probability theory and statistics.
A modern R console with multiline editing, syntax highlighting, and improved REPL features.
A machine learning framework for developing high-frequency trading strategies using full orderbook tick data.
An open-source Python library for low-code data preparation, offering fast EDA, data cleaning, and collection from APIs and databases.
A Python library for feature engineering and selection with scikit-learn compatible transformers.
A curated list of libraries, tutorials, and resources for implementing machine learning in the Ruby programming language.
A curated list of awesome libraries, data sources, tutorials, and resources for machine learning using the Ruby programming language.
A collection of GPU-accelerated graph analytics libraries for creating, manipulating, and executing scalable graph algorithms.
An open-source Python library for probabilistic time series modeling with both frequentist and Bayesian inference methods.
A curated collection of R tutorials, packages, and resources for Data Science, NLP, and Machine Learning.
An embeddable C++ storage engine for dense and sparse multi-dimensional arrays, dataframes, and key-value stores.
A Python framework and Rust-based distributed processing engine for stateful event and stream processing.
A Julia machine learning framework providing a unified interface and meta-algorithms for over 200 models.
Automatically visualize any dataset with a single line of code, including data quality assessment and fixes.
A minimal benchmark comparing scalability, speed, and accuracy of popular open-source machine learning libraries for binary classification.
A high-performance Python package for fast, multi-threaded manipulation of large tabular datasets, inspired by R's data.table.
A curated list of awesome Apache Spark packages, libraries, and resources for data engineers and scientists.
A Python library for probabilistic prediction using natural gradient boosting, built on scikit-learn.
A flexible and fast package for in-memory tabular data manipulation and analysis in the Julia programming language.
An R package for estimating causal effects in time series using Bayesian structural time-series models.