Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.

Data Science

252 projects

Showing 36 of 252 projects

ArviZ (TeX)

A Python library for exploratory analysis, diagnostics, and visualization of Bayesian models.

#bayesian-statistics #statistical-inference #data-science
Stars: 1.8k
Forks: 493
Last commit: 19 hours ago
Awesome Fraud Detection Research Papers (Python)

A curated collection of academic papers on data mining and machine learning techniques for fraud detection across various domains.

#graph-neural-networks #fraud-checker #financial-security
Stars: 1.8k
Forks: 330
Last commit
blogdown (R)

Create blogs and websites with R Markdown, integrating dynamic R code, graphics, and technical writing elements.

#website-generation #bookdown #knitr
Stars: 1.8k
Forks: 324
Last commit: 3 months ago
tidyverse (R)

A meta-package for installing and loading core R packages for data science that share common design principles.

#data-tidying #data-science #r packages
Stars: 1.8k
Forks: 294
Last commit: 10 months ago
tidyverse (R)

A collection of R packages for data science that share common design principles and work together seamlessly.

#dplyr #data-science #r packages
Stars: 1.8k
Forks: 294
Last commit: 10 months ago
AI Explainability 360 (Python)

An open-source Python toolkit providing a comprehensive collection of algorithms for interpreting and explaining machine learning models and datasets.

#codait #explainabil #python-library
Stars: 1.8k
Forks: 328
Last commit: 1 month ago
Variety (JavaScript)

A lightweight MongoDB schema analyzer that reveals document structure, field frequencies, and data outliers.

#bson #devops #schema-analyzer
Stars: 1.8k
Forks: 242
Last commit: 9 hours ago
reticulate (R)

A comprehensive R package that embeds Python within R sessions, enabling seamless interoperability between the two languages.

#conda #python-integration #r-package
Stars: 1.7k
Forks: 348
Last commit: 13 hours ago
IRkernel (Jupyter Notebook)

A native R kernel for Jupyter notebooks, enabling R programming within the Jupyter ecosystem.

#jupyter-kernel #notebook #data-science
Stars: 1.7k
Forks: 298
Last commit
thinking bayes (TeX)

Python code and examples for Bayesian statistics from the book 'Think Bayes: Bayesian Statistics Made Simple'.

#scientific-computing #bayesian-statistics #educational
Stars: 1.7k
Forks: 1.9k
Last commit: 5 years ago
mlr (R)

A unified interface and infrastructure for machine learning in R, supporting classification, regression, clustering, and survival analysis.

#hyperparameter-tuning #feature-selection #r-package
Stars: 1.7k
Forks: 404
Last commit: 8 months ago
Auto ML (Python)

Automated machine learning library for production and analytics, handling feature engineering, model selection, and hyperparameter optimization.

#hyperparameter-optimization #machine-learning-library #data-science
Stars: 1.7k
Forks: 309
Last commit
hyperopt-sklearn (Python)

Hyperopt-sklearn automates hyperparameter optimization and model selection for scikit-learn machine learning pipelines.

#hyperparameter-optimization #data-science #bayesian-optimization
Stars: 1.6k
Forks: 275
Last commit
boruta_py (Python)

Python implementation of the Boruta all-relevant feature selection method with scikit-learn compatibility.

#statistical-analysis #random-forest #ensemble-methods
Stars: 1.6k
Forks: 267
Last commit: 5 months ago
Enterprise™ (JavaScript)

A satirical programming language designed to mock enterprise software development culture with intentionally cumbersome syntax and corporate jargon.

#buzzword-bingo #enterprise-software #programming-language
Stars: 1.6k
Forks: 37
Last commit
goml (Go)

A Go machine learning library with online learning capabilities and a variety of implemented models.

#text-classification #data-science #statistics
Stars: 1.6k
Forks: 135
Last commit: 3 years ago
Readings in Applied Data Science (R)

A curated reading list and syllabus for a Stanford discussion class on applied data science topics.

#applied-statistics #course-syllabus #discussion-class
Stars: 1.6k
Forks: 223
Last commit: 7 years ago
ChatGPT Prompts for Data Science

A curated collection of 60 ChatGPT prompts for data science tasks, from model building to code explanation.

#productivity #ai-assistant #data-science
Stars: 1.6k
Forks: 279
Last commit: 2 years ago
jupyterlab-git (TypeScript)

A JupyterLab extension for version control using Git, enabling Git operations directly within the JupyterLab interface.

#jupyterlab-extension #version-control #developer-tools
Stars: 1.6k
Forks: 399
Last commit: 7 days ago
imodels (Jupyter Notebook)

A Python package for concise, transparent, and accurate predictive modeling with sklearn-compatible interpretable models.

#ai #rule-based-models #data-science
Stars: 1.6k
Forks: 138
Last commit: 11 days ago
keras-contrib (Python)

A deprecated repository for community-contributed Keras extensions like layers, activations, and loss functions.

#experimental-features #python-library #data-science
Stars: 1.6k
Forks: 643
Last commit: 3 years ago
scikit-feature (Python)

An open-source Python repository providing around 40 feature selection algorithms for machine learning applications.

#feature-selection #scipy #data-science
Stars: 1.6k
Forks: 442
Last commit: 1 year ago
Data Profiler (Python)

A Python library that automatically extracts schema, statistics, and sensitive entities (PII/NPI) from datasets.

#sensitive-data-detection #data-labels #python-library
Stars: 1.6k
Forks: 186
Last commit: 17 days ago
Quix Streams (Python)

A Python framework for building real-time data pipelines and event-driven microservices on Apache Kafka using a Streaming DataFrame API.

#stream-processing #streaming-data-processing #event-driven-architecture
Stars: 1.5k
Forks: 105
Last commit
ipyleaflet (TypeScript)

Interactive maps in Jupyter notebooks using Leaflet.js with Python bindings.

#jupyterlab-extension #data-science #geospatial-visualization
Stars: 1.5k
Forks: 363
Last commit: 1 month ago
Optimus (Python)

A Python library for agile data preparation workflows that works with Pandas, Dask, cuDF, Dask-cuDF, Vaex, and PySpark.

#data-cleaning #cudf #spark
Stars: 1.5k
Forks: 232
Last commit: 1 year ago
Julia (Julia)

A curated, categorized directory of packages, libraries, and resources for the Julia programming language.

#programming-language #scientific-computing #julia
Stars: 1.5k
Forks: 203
Last commit: 2 years ago
PyCM (Python)

A comprehensive Python library for generating and analyzing multi-class confusion matrices with extensive statistical metrics.

#classification-metrics #statistical-analysis #ai
Stars: 1.5k
Forks: 125
Last commit: 3 days ago
pyjanitor (Python)

Python library providing clean, chainable functions for data cleaning and manipulation with pandas DataFrames.

#data-cleaning #hacktoberfest #python-library
Stars: 1.5k
Forks: 184
Last commit: 13 days ago
skforecast (Python)

A Python library for time series forecasting using scikit-learn compatible machine learning models.

#data-science #lightgbm #python
Stars: 1.5k
Forks: 187
Last commit: 10 hours ago
skforecast (Python)

A Python library for time series forecasting using scikit-learn compatible machine learning models.

#data-science #time-series-forecasting #catboost
Stars: 1.5k
Forks: 187
Last commit: 10 hours ago
eBay's TSV utilities (D)

A suite of high-performance command line tools for filtering, summarizing, joining, and manipulating large tabular data files.

#delimited-files #command-line-tools #data-science
Stars: 1.5k
Forks: 83
Last commit: 3 years ago
dalex (Python)

A model-agnostic toolkit for exploring and explaining the behavior of complex machine learning models in R and Python.

#explainable-artificial-intelligence #xai #r-package
Stars: 1.5k
Forks: 169
Last commit: 3 months ago
igraph (Python)

A Python interface for the igraph library, enabling fast creation, manipulation, and analysis of large graphs and networks.

#scientific-computing #mathematics #python-library
Stars: 1.4k
Forks: 266
Last commit: 18 days ago
plumber (R)

An R package that converts R functions into web APIs using special code annotations.

#deployment #api #api-framework
Stars: 1.4k
Forks: 258
Last commit: 2 months ago
Knet.jl (Jupyter Notebook)

A deep learning framework for Julia with GPU support and automatic differentiation using dynamic computational graphs.

#research-tool #knet #julia
Stars: 1.4k
Forks: 226
Last commit: 1 year ago
Page 7 of 7

Related Tags

#Machine Learning (160)
#Python (154)
#Deep Learning (59)
#Data Visualization (50)
#Python Library (39)
#Data Analysis (38)
#Scikit Learn (34)
#Statistics (32)
#Jupyter Notebook (32)
#Jupyter Notebooks (27)
#Jupyter (26)
#R (25)