Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.


Data Science

506 projects

Showing 36 of 506 projects

kaggle-blackbox
Language: MATLAB

A collection of scripts for training random forests and sparse filtering models on Kaggle datasets.

#random-forest #model-training #data-science
Stars: 116
Forks: 61
Last commit: 12 years ago
weightedcalcs
Language: Python

A pandas-based Python library for calculating weighted statistics like means, medians, standard deviations, and distributions.

#data-science #statistics #census-data
Stars: 113
Forks: 7
Last commit: 1 year ago
BayesPy
Language: HTML

Bayesian inference tools in Python for estimating Dirichlet priors and multinomial mixture models from discrete event data.

#mixture-models #gradient-descent #python-library
Stars: 110
Forks: 33
Last commit: 3 years ago
NitroFE
Language: Python

A Python feature engineering engine that internally manages past dependent values for continuous calculation of time-based features.

#features #technical-indicators #data-science
Stars: 109
Forks: 7
Last commit: 4 years ago
Pink Gorilla Notebook
Language: Clojure

A lightweight, extensible web-based notebook REPL for Clojure and ClojureScript with rich UI visualizations.

#clojurescript #reagent #notebook
Stars: 107
Forks: 10
Last commit: 5 years ago
Ark-Analysis
Language: Jupyter Notebook

A Python toolbox for analyzing multiplexed imaging data, featuring segmentation, pixel/cell clustering, and spatial analysis.

#bioimage-analysis #data-science #deep-learning
Stars: 106
Forks: 30
Last commit: 5 months ago
rb-gsl
Language: C

A Ruby interface to the GNU Scientific Library (GSL) for numerical computing.

#scientific-computing #mathematics #data-science
Stars: 104
Forks: 50
Last commit: 2 years ago
Forecast the US demand for electricity
Language: R

A dashboard for real-time tracking and 72-hour forecasting of US electricity demand using open-source tools.

#h2o #data-science #dashboard
Stars: 100
Forks: 14
Last commit: 4 years ago
dl4clj
Language: Clojure

A Clojure wrapper for Deeplearning4j, providing idiomatic access to neural networks, data import, and distributed training.

#spark #wrapper-library #data-science
Stars: 99
Forks: 18
Last commit: 8 years ago
topik
Language: Python

A high-level Python toolbox for topic modeling with easy-to-use functions and a command-line interface.

#text-analysis #data-science #natural-language-processing
Stars: 93
Forks: 23
Last commit: 10 years ago
jupyterlab-tensorboard-pro
Language: TypeScript

A TensorBoard JupyterLab plugin that integrates TensorBoard directly into JupyterLab with improved user experience and long-term maintenance.

#jupyterlab-extension #data-science #jupyterlab
Stars: 93
Forks: 11
Last commit: 1 year ago
MonkeyLearn
Language: R

Archived R package for accessing the MonkeyLearn API for text classification and extraction.

#text-extraction #peer-reviewed #text-classification
Stars: 92
Forks: 16
Last commit: 4 years ago
influxdbr
Language: R

An R package providing an interface to InfluxDB for fetching, writing, and managing time series data.

#database #r-package #data-science
Stars: 92
Forks: 34
Last commit: 1 year ago
Synthetic Adversarial Log Objects (SALO) | Splunk
Language: Python

A Python framework for generating synthetic log events without requiring actual infrastructure or actions.

#devops #data-science #synthetic-data
Stars: 91
Forks: 11
Last commit: 2 years ago
Urbansprawl
Language: Python

An open-source framework for calculating spatial urban sprawl indices and performing disaggregated population estimates using OpenStreetMap data.

#urban-planning #urban #urban-accessibility
Stars: 88
Forks: 20
Last commit: 7 years ago
Quarto tip a day
Language: JavaScript

A daily blog sharing practical Quarto tips for 30 days leading up to the rstudio::conf(2022) keynote.

#publishing #data-science #reproducible-research
Stars: 87
Forks: 26
Last commit: 1 year ago
DockerDL
Language: Dockerfile

A pre-configured Docker image with deep learning frameworks, data science tools, and GPU support for rapid environment setup.

#cuda #data-science #deep-learning
Stars: 85
Forks: 11
Last commit: 3 months ago
lightgbm
Language: Ruby

A Ruby gem providing high-performance gradient boosting with LightGBM for machine learning tasks.

#model-training #data-science #ffi
Stars: 84
Forks: 6
Last commit: 1 month ago
liblinear-ruby
Language: C++

Ruby interface to LIBLINEAR for machine learning classification and regression tasks using SWIG bindings.

#swig-bindings #data-science #classification
Stars: 82
Forks: 7
Last commit: 7 years ago
Qlik

A curated collection of extensions, guides, blogs, and resources for Qlik Sense and QlikView developers.

#qlik-sense-extension #data-science #qlik
Stars: 78
Forks: 8
Last commit: 6 years ago
lambda-ml
Language: Clojure

A small machine learning library written in Clojure providing simple, concise implementations of ML algorithms.

#functional-programming #lisp #ml-library
Stars: 78
Forks: 9
Last commit: 7 years ago
Big-fish
Language: Python

A Python toolbox for analyzing smFISH microscopy images, including spot detection and cell segmentation.

#scientific-computing #data-science #smfish
Stars: 76
Forks: 24
Last commit: 2 years ago
spacetime
Language: R

R package providing classes and methods for handling and analyzing spatio-temporal data.

#environmental-data #r-package #data-science
Stars: 76
Forks: 20
Last commit: 1 year ago
libsvm
Language: Go

A Go port of LIBSVM 3.14, providing support vector machine (SVM) algorithms for classification and regression.

#data-science #classification #go-library
Stars: 72
Forks: 11
Last commit: 10 years ago
py2neo
Language: Python

A comprehensive Python client library and toolkit for working with Neo4j graph databases.

#database-driver #python-library #data-science
Stars: 69
Forks: 20
Last commit: 9 years ago
Quarto Devcontainer Feature
Language: Shell

A collection of Dev Container Features for adding Rocker Project and R-related functionality to development containers.

#conda #data-science #r-language
Stars: 69
Forks: 18
Last commit: 29 days ago
Synthia
Language: Python

A Python package for generating multidimensional synthetic data using Copula and fPCA models to preserve statistical properties.

#data-anonymization #fpca #finance
Stars: 68
Forks: 10
Last commit: 2 years ago
goldfish
Language: R

An R package for statistical modeling of dynamic network data using actor-oriented and tie-based relational event models.

#rem #r-package #data-science
Stars: 66
Forks: 14
Last commit: 1 month ago
Common Crawl Jupyter notebooks
Language: Jupyter Notebook

A collection of Jupyter notebooks for analyzing Common Crawl web archive data using columnar indexes and webgraph datasets.

#warc-files #data-science #web-archive-analysis
Stars: 66
Forks: 11
Last commit: 6 months ago
kaggle_acquire-valued-shoppers-challenge
Language: Python

Feature generation code for the Kaggle Acquire Valued Shoppers Challenge, focusing on customer behavior prediction.

#data-science #kaggle-competition #customer-behavior
Stars: 66
Forks: 58
Last commit: 12 years ago
ipyaggrid
Language: Jupyter Notebook

A Jupyter widget that integrates the powerful ag-Grid data grid into Jupyter notebooks for interactive data exploration.

#data-grid #notebook-tools #data-science
Stars: 65
Forks: 16
Last commit: 2 years ago
open-solution-ship-detection
Language: Python

An open-source solution for the Airbus Ship Detection Challenge, providing a benchmark and base for ship detection in satellite imagery.

#unet-image-segmentation #data-science #deep-learning
Stars: 65
Forks: 23
Last commit: 4 years ago
Quantium Research
Language: Jupyter Notebook

A collection of quantitative trading research experiments exploring uncommon strategies and techniques through Jupyter notebooks.

#market-analysis #trading #algorithmic-trading
Stars: 65
Forks: 13
Last commit: 5 months ago
Learning.js
Language: JavaScript

A JavaScript library implementing logistic regression and C4.5 decision tree algorithms for machine learning in the browser and Node.js.

#browser-ml #data-science #logistic-regression
Stars: 65
Forks: 16
Last commit: 7 years ago
weka
Language: Ruby

A JRuby gem providing Ruby interfaces for Weka's machine learning and data mining algorithms.

#jruby #weka-wrapper #data-science
Stars: 65
Forks: 8
Last commit: 5 months ago
dataframe
Language: Elixir

An Elixir library providing a DataFrame API similar to Python's Pandas and R's data.frame for data manipulation.

#functional-programming #elixir #data-science
Stars: 63
Forks: 7
Last commit: 7 years ago
Page 14 of 15

Related Tags

#Machine Learning (288)
#Python (245)
#Deep Learning (84)
#Data Analysis (79)
#Data Visualization (79)
#Statistics (61)
#Python Library (55)
#Jupyter Notebook (53)
#R (52)
#Jupyter (49)
#Scikit Learn (48)
#Pandas (43)