An ADO.NET provider and native bindings for DuckDB, enabling C# applications to interact with the in-process analytical database.
Capture, analyze, and transform messy Jupyter notebooks into production data pipelines with just two lines of code.
A curated guide to essential R packages organized by their role in the data science workflow.
A Python library for comparing Pandas, Polars, Spark, and Snowpark DataFrames with detailed reporting and flexible matching.
A curated collection of free resources to help deepen your understanding of the R programming language.
An R package providing a lightweight frontend to use Apache Spark for distributed data processing from R.
A Julia package for fitting linear and generalized linear models with comprehensive statistical functionality.
A comprehensive roadmap chart and resource guide for aspiring data scientists, based on insights from Silicon Valley tech companies.
An R package that simplifies data import and export by automatically selecting the correct function based on file extension.
A tutorial series comparing how to implement data science concepts and build applications in both Python and R ecosystems.
A VS Code extension for visually exploring, cleaning, and transforming tabular data with automatic Pandas code generation.
A deep learning system that classifies food images into 230 categories and retrieves matching recipes using convolutional neural networks.
A Python library providing evaluation metrics and diagnostic tools for recommender systems.
An optimized distributed gradient boosting library for fast and accurate machine learning on large datasets.
A simplified Keras-like framework for PyTorch that reduces boilerplate code for training neural networks.
A collection of IPython notebooks containing machine learning experiments and examples using scikit-learn and related Python libraries.
A Python library for generating high-quality synthetic tabular data using GANs, diffusion models, and large language models.
A tidy API for graph manipulation in R, providing dplyr verbs and igraph algorithms for network analysis.
A machine learning integrations library for TypeDB, enabling graph algorithms and Graph Neural Networks on strongly-typed graph data.
A modern, high-performance technical analysis library built in Rust with Python and WebAssembly bindings.
An R package that automates exploratory data analysis and data treatment with one-line reports and visualizations.
An engine for ML/data tracking, visualization, explainability, drift detection, and dashboards, integrated with Polyaxon.
A Neovim plugin providing language support, code execution, and preview features for working with Quarto documents.
An open-source MLOps framework for defining and deploying machine learning and LLM workloads across any cloud infrastructure.
A curated collection of open data sources across government, academic, and private sectors for data science and research.
A lightweight Python tool for generating rich summary statistics of pandas and Polars dataframes directly in the console.
An AutoML framework that generates and customizes machine learning pipelines using declarative JSON-AI syntax.
A Julia package providing metaprogramming macros to simplify DataFrame manipulation with a more concise syntax.
A high-performance data profiler for discovering and validating complex patterns like functional dependencies, inclusion dependencies, and association rules.
A high-performance data profiler for discovering and validating complex patterns in datasets, enabling data cleaning and quality analysis.
A pure Go library for making predictions with gradient boosting regression tree (GBRT) models from LightGBM, XGBoost, and scikit-learn.
An overlay companion for pandas that provides real-time hints and tips to improve data analysis code.
An open-source machine learning solution for the Home Credit Default Risk Kaggle competition, providing reproducible code and experiments.
An F# kernel for Jupyter notebooks, enabling interactive data science and exploration with F#.
A SQL GUI extension for JupyterLab that enables point-and-click database exploration and query execution.
A Python library for class-imbalanced ensemble learning with 30+ algorithms, built on scikit-learn.