A distributed map-reduce framework for parallel computations over large datasets on unreliable computer clusters.
Disco is a distributed map-reduce framework for parallel computations over large datasets on unreliable computer clusters. It abstracts away technical complexities like communication protocols, load balancing, and fault tolerance, allowing developers to process massive data with minimal code.
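To make the programming model concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases that a map-reduce framework coordinates across a cluster. It does not use Disco's API: the names `map_words`, `shuffle`, `reduce_counts`, and `word_count` are hypothetical and chosen only for illustration. In a real deployment the framework would run the map and reduce functions on many nodes and handle grouping, scheduling, and retries itself.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce phase: collapse all counts for one word into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce locally (illustration only, no cluster)."""
    mapped = chain.from_iterable(map_words(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The fox"]))
```

The developer supplies only the per-record map function and the per-key reduce function. Everything in `shuffle` and `word_count` stands in for work the framework would otherwise do.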
Disco is aimed at data engineers and scientists who need to analyze large datasets across distributed systems without managing low-level cluster infrastructure.
Developers choose Disco for the simplicity of writing distributed jobs, its fault tolerance, and its integration with Python data science tools, which let them focus on data analysis rather than distributed systems engineering.
A Map/Reduce framework for distributed computing.
Disco is an open-source alternative to the following products:
Apache Spark is an open-source unified analytics engine for large-scale data processing, providing high-level APIs in Java, Scala, Python, and R.
Hadoop is an open-source framework for distributed storage and processing of large data sets across clusters of computers using simple programming models.