Disco — Distributed Map-Reduce Framework | Open Awesome

Disco

BSD-3-Clause · Erlang

A distributed map-reduce framework for parallel computations over large datasets on unreliable computer clusters.

1.6k stars · 242 forks · 0 contributors

What is Disco?

Disco is a distributed map-reduce framework for parallel computations over large datasets on unreliable computer clusters. It abstracts away technical complexities like communication protocols, load balancing, and fault tolerance, allowing developers to process massive data with minimal code.
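
To make the map-reduce model concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use Disco's API: the function names `map_words`, `shuffle`, `reduce_counts`, and `word_count` are illustrative assumptions, and a real framework would run the map and reduce phases on many machines rather than in one loop.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce phase: collapse all counts for one word into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_words(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())
```

Calling `word_count(["the quick fox", "the lazy dog"])` returns a dictionary mapping each word to its number of occurrences. Because each line is mapped independently and each word is reduced independently, both phases can in principle be spread across workers, which is the property a distributed framework exploits.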

Target Audience

Data engineers and scientists who need to analyze large datasets across distributed systems without managing low-level cluster infrastructure.

Value Proposition

Developers choose Disco for its simplicity in writing distributed jobs, strong fault tolerance, and seamless integration with Python data science tools, making it easier to focus on data analysis rather than distributed systems engineering.
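
One common way to tolerate unreliable workers is to re-run a failed task until it succeeds or a retry limit is reached. The sketch below simulates that idea in plain Python. It is an assumption-laden illustration, not Disco's scheduler: `flaky_worker`, `run_with_retries`, `WorkerFailure`, and the `failure_rate` parameter are all hypothetical names introduced here.

```python
import random


class WorkerFailure(Exception):
    """Raised by the simulated worker to stand in for a node crash."""


def flaky_worker(task, rng, failure_rate):
    """Square the task value, failing at random with probability failure_rate."""
    if rng.random() < failure_rate:
        raise WorkerFailure(f"worker died while processing {task!r}")
    return task * task


def run_with_retries(tasks, max_attempts=5, failure_rate=0.3, seed=0):
    """Run every task, re-executing it after a simulated failure.

    Raises RuntimeError if a task fails max_attempts times in a row.
    """
    rng = random.Random(seed)  # seeded so a given run is reproducible
    results = {}
    for task in tasks:
        for _attempt in range(max_attempts):
            try:
                results[task] = flaky_worker(task, rng, failure_rate)
                break  # task succeeded, move on to the next one
            except WorkerFailure:
                continue  # simulate rescheduling the task on another worker
        else:
            raise RuntimeError(f"task {task!r} failed {max_attempts} times")
    return results
```

Re-execution is safe here because each task is a pure function of its input, which is also why map and reduce functions are usually written without side effects.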

Overview

A map-reduce framework for distributed computing.

Use Cases

Best For

Processing and analyzing terabyte-scale datasets across distributed clusters
Implementing fault-tolerant map-reduce jobs without managing infrastructure

Integrating distributed computing with Python data science workflows
Running parallel computations on unreliable hardware networks
Educational projects teaching distributed systems and map-reduce concepts
Batch processing jobs where Hadoop or Spark might be overkill

Open Source Alternative To

Disco is an open-source alternative to the following products:

Apache Spark

Apache Spark is an open-source unified analytics engine for large-scale data processing, providing high-level APIs in Java, Scala, Python, and R.

Hadoop

Hadoop is an open-source framework for distributed storage and processing of large data sets across clusters of computers using simple programming models.

Quick Stats

Stars: 1,631

Forks: 242

Contributors: 0

Open Issues: 129

Last commit: 8 years ago

Created: 2008

Built With

Links & Resources

Website

Included in


Related Projects

PyTorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Stars: 101,845

Forks: 28,449

Last commit: 15 hours ago

keras

Deep Learning for humans

Stars: 64,175

Forks: 19,743

Last commit: 19 hours ago

streamlit

Streamlit — A faster way to build and share data apps.

Stars: 45,308

Forks: 4,329

Last commit: 18 hours ago

gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

#distributed-computing

#map-reduce

Python

Machine Learning · 72.2k