Data Science

624 projects

A curated guide to learning machine learning with Python and Jupyter Notebook, featuring courses, notebooks, and practical resources.

#beginner-friendly #data-science #mlops

Stars: 11.4k

Forks: 1.9k

Last commit: 4 years ago

Dive into Machine Learning

A curated guide to learning machine learning with Python and Jupyter Notebook, featuring hands-on tutorials, courses, and ethical considerations.

#data-science #deep-learning #mlops

Stars: 11.4k

Forks: 1.9k

Last commit: 4 years ago

Kedro (Python)

A Python framework for creating reproducible, maintainable, and modular data engineering and data science pipelines.

#agentic-workflow #hacktoberfest #agentic-ai

An automated machine learning library that trains and deploys high-accuracy models for tabular, text, image, and time series data with minimal code.

#ensemble-learning #python-library #data-science

Stars: 10.6k

Forks: 1.2k

Last commit: 1 day ago

Machine Learning Interviews (HTML)

A practical booklet covering the four main steps of designing machine learning systems with 27 interview questions.

#data-science #machine-learning-production #production-ml

A declarative statistical visualization library for Python built on Vega-Lite.

#declarative #vega-lite #statistical graphics

Stars: 10.4k

Forks: 863

Last commit: 4 days ago

modin (Python)

A drop-in replacement for pandas that scales data analysis workflows to use all CPU cores and handle out-of-memory datasets.

#parallel-computing #distributed #data-science

Stars: 10.4k

Forks: 676

Last commit: 5 months ago

Data Science Interviews Questions (HTML)

A community-driven collection of data science interview questions and answers covering theory, technical skills, and probability.

#community-driven #data-science #python

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

#hyperparameter-optimization #parameter-tuning #data-science

Stars: 10.0k

Forks: 1.6k

Last commit: 10 months ago

TPOT (Jupyter Notebook)

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

#hyperparameter-optimization #parameter-tuning #feature-selection

Stars: 10.0k

Forks: 1.6k

Last commit: 10 months ago

Template folder structure for organizing Data Science projects (Python)

A standardized, flexible project template for data science work using Cookiecutter to structure reproducible projects.

#ai #data-science #project-template

A Python library for anomaly detection across tabular, time series, graph, text, image, and audio data. It includes 60+ detectors, benchmark-backed ADEngine orchestration, and an agentic workflow for AI agents.

#novelty detection #autoencoder #anomaly

Stars: 9.9k

Forks: 1.5k

Last commit: 4 days ago

Forecasting with sktime (Python)

A unified Python framework for machine learning with time series, offering scikit-learn compatible tools for forecasting, classification, clustering, and more.

#hacktoberfest #data-science #classification

An open-source, low-code Python library that automates end-to-end machine learning workflows.

#data-science #low-code #automl

A GPU-accelerated DataFrame library for tabular data processing, part of the RAPIDS data science suite.

#cudf #cuda #apache-arrow

Stars: 9.7k

Forks: 1.1k

Last commit: 18 hours ago

tflearn (Python)

A modular deep learning library providing a higher-level API for TensorFlow to speed up experimentation.

#neural-network #data-science #deep-learning

Stars: 9.6k

Forks: 2.4k

Last commit: 2 years ago

Darts (Python)

A Python library for user-friendly forecasting and anomaly detection on time series, from ARIMA to deep neural networks.

#python-library #backtesting #data-science

Stars: 9.5k

Forks: 1.0k

Last commit: 3 days ago

GoLearn (Go)

A batteries-included machine learning library for Go with a scikit-learn inspired interface.

#data-science #model-evaluation #classification

Stars: 9.4k

Forks: 1.2k

Last commit: 2 years ago

tsfresh (Jupyter Notebook)

Automatically extracts and selects relevant features from time series data for machine learning tasks.

#data-science #signal-processing #python

Stars: 9.3k

Forks: 1.3k

Last commit: 18 days ago

Financial Machine Learning (Python)

A curated list of practical financial machine learning tools, applications, and research repositories.

#algorithmic-trading #finance #data-science

Stars: 8.7k

Forks: 1.4k

Last commit: 1 year ago

vaex (Python)

A high-performance Python DataFrame library for lazy out-of-core processing and visualization of billion-row datasets at interactive speeds.

#out-of-core #python-dataframe #apache-arrow

Stars: 8.5k

Forks: 603

Last commit: 3 months ago

docker-stacks (Python)

A collection of ready-to-run Docker images containing Jupyter applications and interactive computing tools.

#scientific-computing #containerization #jupyterhub

Stars: 8.4k

Forks: 3.0k

Last commit: 4 days ago

jupyter/docker-stacks/pyspark-notebook (Python)

A collection of ready-to-run Docker images containing Jupyter applications and interactive computing tools.

#scientific-computing #containerization #jupyterhub

A multi-user server that spawns, manages, and proxies multiple instances of single-user Jupyter notebook servers.

#jupyterhub #authentication #data-science

Stars: 8.3k

Forks: 2.1k

Last commit: 22 hours ago

TidyTuesday (HTML)

A weekly social data project providing real-world datasets for practicing data tidying, visualization, and analysis.

#julia #data-science #quarto

Stars: 8.3k

Forks: 2.6k

Last commit: 3 days ago

Introduction to Machine Learning with Python (Jupyter Notebook)

Code and Jupyter notebooks for the book 'Introduction to Machine Learning with Python' by Andreas Mueller and Sarah Guido.

#data-science #python #hands-on-tutorials

Stars: 8.1k

Forks: 4.7k

Last commit

Machine Learning Cheat Sheet (TeX)

A comprehensive cheat sheet with classical equations and diagrams for machine learning knowledge recall and interview preparation.

#mathematics #data-science #education

Stars: 8.0k

Forks: 1.3k

Last commit

evidently (Jupyter Notebook)

An open-source Python framework to evaluate, test, and monitor ML and LLM systems with 100+ built-in metrics.

#html-report #hacktoberfest #python-library

Stars: 7.7k

Forks: 886

Last commit: 2 months ago

Featuretools (Python)

An open-source Python library for automated feature engineering using Deep Feature Synthesis.

#data-science #multi-table-data #automl

Stars: 7.7k

Forks: 916

Last commit: 7 days ago

Featuretools (Python)

An open-source Python library for automated feature engineering using Deep Feature Synthesis.

#python-library #time-series-features #data-science

Stars: 7.7k

Forks: 916

Last commit: 7 days ago

h2o (Jupyter Notebook)

An open-source, in-memory platform for distributed and scalable machine learning with support for a wide range of algorithms and big data technologies.

#h2o #ensemble-learning #random-forest

Stars: 7.5k

Forks: 2.0k

Last commit: 1 day ago