A distributed computation system written in Go for parallel and cluster processing, similar to Hadoop MapReduce and Spark.
Glow is a distributed computation system written in pure Go that enables parallel and distributed data processing. It provides a library for building data pipelines that can run across multiple threads or scale to clusters of machines, solving the problem of efficiently processing large datasets without complex infrastructure setup.
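Running "across multiple threads" depends on partitioning the input and processing each partition concurrently. The sketch below uses only the Go standard library. It splits lines into shards with round-robin assignment, counts non-empty lines in each shard on its own goroutine, and sums the partial results. It is not Glow's implementation. `parallelCount` is a hypothetical helper, and round-robin partitioning is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// parallelCount is a hypothetical helper (not part of Glow). It splits lines
// into numShards partitions, counts non-empty lines in each partition on its
// own goroutine, and sums the partial counts.
func parallelCount(lines []string, numShards int) int {
	if numShards < 1 {
		numShards = 1 // fall back to a single shard
	}
	partials := make([]int, numShards) // one slot per shard, so no locking is needed
	var wg sync.WaitGroup
	for s := 0; s < numShards; s++ {
		wg.Add(1)
		go func(shard int) {
			defer wg.Done()
			// Round-robin partitioning: shard k takes lines k, k+numShards, ...
			for i := shard; i < len(lines); i += numShards {
				if strings.TrimSpace(lines[i]) != "" {
					partials[shard]++
				}
			}
		}(s)
	}
	wg.Wait()
	total := 0
	for _, p := range partials {
		total += p
	}
	return total
}

func main() {
	lines := []string{"a", "", "b", "c", "  ", "d"}
	fmt.Println(parallelCount(lines, 3)) // 4
}
```

Each goroutine writes only to its own slot in `partials`, so the merge step after `wg.Wait()` needs no mutex.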
It targets Go developers and data engineers who need to process large datasets in parallel or across distributed systems, particularly those looking for Hadoop or Spark alternatives in the Go ecosystem.
Developers choose Glow for its simplicity, small resource footprint, and pure Go implementation that avoids external infrastructure dependencies, along with familiar MapReduce-style operations and easy cluster deployment.
Glow is an easy-to-use distributed computation system written in Go, similar to Hadoop MapReduce, Spark, Flink, Storm, etc. I am also working on another similar pure Go system, https://github.com/chrislusf/gleam, which is more flexible and more performant.
Agents use only about 5.5 MB of memory, so they can be deployed on many servers with little overhead, as noted in the cluster setup section.
Does not require complex infrastructure such as ZooKeeper or HDFS; it can be set up with a single binary and a script, making it easy to scale from a local run to a distributed cluster.
Provides a chainable interface with operations like Filter, Map, and Reduce, allowing developers to build data processing flows with concise and readable code, as shown in the tutorial.
Can generate Graphviz DOT files of the flow diagram via the -glow.flow.plot flag, helping users understand how tasks are distributed and where to optimize performance; examples are provided in the README.
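The chainable Filter/Map/Reduce style mentioned above can be illustrated without any cluster. The sketch below is a hypothetical in-memory `Dataset` type whose methods each return a new dataset, so calls chain left to right. It does not reproduce Glow's actual types or signatures. In particular, the `init` argument to `Reduce` is an assumption of this sketch.

```go
package main

import "fmt"

// Dataset is a hypothetical, in-memory stand-in for a distributed dataset.
// It only illustrates the chaining style; it is not Glow's type or API.
type Dataset struct {
	items []int
}

// From builds a Dataset from the given values.
func From(items ...int) *Dataset {
	return &Dataset{items: items}
}

// Filter returns a new Dataset holding only the items for which keep is true.
func (d *Dataset) Filter(keep func(int) bool) *Dataset {
	out := make([]int, 0, len(d.items))
	for _, x := range d.items {
		if keep(x) {
			out = append(out, x)
		}
	}
	return &Dataset{items: out}
}

// Map returns a new Dataset with fn applied to every item.
func (d *Dataset) Map(fn func(int) int) *Dataset {
	out := make([]int, len(d.items))
	for i, x := range d.items {
		out[i] = fn(x)
	}
	return &Dataset{items: out}
}

// Reduce folds all items into a single value, starting from init.
func (d *Dataset) Reduce(init int, fn func(acc, x int) int) int {
	acc := init
	for _, x := range d.items {
		acc = fn(acc, x)
	}
	return acc
}

func main() {
	sum := From(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).
		Filter(func(x int) bool { return x%2 == 0 }). // keep even numbers
		Map(func(x int) int { return x * x }).        // square each one
		Reduce(0, func(acc, x int) int { return acc + x })
	fmt.Println(sum) // 220
}
```

Returning a fresh `*Dataset` from each step keeps earlier stages unchanged. A distributed engine would additionally decide where each stage runs.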
Focuses on basic MapReduce operations and lacks built-in support for complex data processing like streaming windows or machine learning algorithms, which might require workarounds.
As a niche project in Go, it has fewer integrations, libraries, and community support compared to established frameworks like Apache Spark, potentially limiting resources for troubleshooting.
Relies on a wiki and mailing list with basic examples, which may not cover advanced use cases or provide comprehensive guidance for production deployments.
Glow is an open-source alternative to the following products:
Apache Spark is an open-source unified analytics engine for large-scale data processing, providing high-level APIs in Java, Scala, Python, and R.
Apache Storm is a distributed real-time computation system for processing large volumes of high-velocity data, commonly used for stream processing.
Hadoop MapReduce is a programming model and software framework for processing vast amounts of data in parallel on large clusters of commodity hardware.
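The MapReduce programming model that Glow, Hadoop, and Spark share splits a job into a map phase that emits key/value pairs, a shuffle that groups values by key, and a reduce phase that combines each group. The single-process word count below walks through those three phases in plain Go. It is a minimal sketch of the model, not Hadoop's (Java) API or Glow's API. The names `KV`, `mapPhase`, `shuffle`, `reducePhase`, and `wordCount` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// KV is one intermediate key/value pair emitted by the map phase.
type KV struct {
	Key   string
	Value int
}

// mapPhase emits (word, 1) for every whitespace-separated, lowercased word.
func mapPhase(doc string) []KV {
	var out []KV
	for _, w := range strings.Fields(strings.ToLower(doc)) {
		out = append(out, KV{Key: w, Value: 1})
	}
	return out
}

// shuffle groups intermediate values by key, the step a framework performs
// between the map and reduce phases (across machines, in a real cluster).
func shuffle(pairs []KV) map[string][]int {
	groups := make(map[string][]int)
	for _, kv := range pairs {
		groups[kv.Key] = append(groups[kv.Key], kv.Value)
	}
	return groups
}

// reducePhase sums all counts collected for one key.
func reducePhase(values []int) int {
	sum := 0
	for _, v := range values {
		sum += v
	}
	return sum
}

// wordCount runs map, shuffle, and reduce sequentially in one process.
func wordCount(docs []string) map[string]int {
	var pairs []KV
	for _, d := range docs {
		pairs = append(pairs, mapPhase(d)...)
	}
	result := make(map[string]int)
	for key, values := range shuffle(pairs) {
		result[key] = reducePhase(values)
	}
	return result
}

func main() {
	counts := wordCount([]string{"the quick fox", "The lazy dog", "the end"})
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	for _, k := range keys {
		fmt.Printf("%s %d\n", k, counts[k])
	}
}
```

In a distributed setting, the map calls run on different machines and the shuffle moves data over the network. The per-key reduce logic stays the same.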