A high-performance distributed map/reduce system with DAG execution, written in Go, supporting standalone or distributed modes.
Gleam is a distributed execution system designed for high performance and efficiency, enabling users to define data processing flows as directed acyclic graphs (DAGs). It is built in Go and supports computations written in Go, Unix pipe tools, or any streaming programs, making it flexible and easy to customize. It solves the problem of building scalable data processing pipelines without the complexity of traditional big data frameworks.
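To make the "Unix pipe tools or any streaming programs" idea concrete, here is a minimal sketch using only the Go standard library. It is not Gleam's API: the helper name `pipeThrough` is hypothetical, and the example assumes a Unix-like system where `tr` is on the PATH. It shows the general technique of streaming records through an external program that runs in its own OS process.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// pipeThrough (hypothetical helper, not part of Gleam) feeds input to the
// standard input of an external command running in a separate OS process
// and returns everything the command writes to standard output.
func pipeThrough(input, name string, args ...string) (string, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdin = strings.NewReader(input)

	var out bytes.Buffer
	cmd.Stdout = &out

	// Run starts the child process and waits for it to exit.
	if err := cmd.Run(); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// Assumes the Unix tool `tr` is available; it upper-cases each line.
	out, err := pipeThrough("hello\ngleam\n", "tr", "a-z", "A-Z")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Because the external tool lives in its own process, its memory is managed separately from the Go program that drives it, which is the same general reason the description gives for running executors out of process.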
Gleam suits developers and data engineers who need to build distributed data processing pipelines, especially those working in Go or preferring a lightweight, efficient system over heavier frameworks such as Apache Spark or Hadoop.
Developers choose Gleam for its simplicity, high performance, and memory efficiency: it uses Go's concurrency and runs executors in separate OS processes to avoid garbage collection issues. Flows can run standalone or distributed, with support for various data sources and easy customization through a simple Go codebase.
Fast, efficient, and scalable distributed map/reduce system with DAG execution, in memory or on disk, written in pure Go; runs standalone or distributed.
Uses pure Go mappers and reducers with concurrency, and merges multiple map-reduce steps for better performance, as stated in the README.
Executors run in separate OS processes to avoid GC issues, with automatic memory adjustment based on data size hints, keeping memory usage low (about 10 MB per agent).
Supports standalone or distributed runs, with adjustable in-memory or on-disk modes for streaming and back pressure, enabling fault tolerance via OnDisk persistence.
Go code is simpler to read than Scala, Java, or C++, so developers can easily write custom functions and plugins, as highlighted in the README's philosophy section.
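The points above about concurrent Go mappers and reducers, streaming, and back pressure can be sketched with plain goroutines and channels. This is an illustrative sketch only and does not use Gleam's API; the names `mapWords` and `wordCount` and the buffer size of 64 are assumptions chosen for the example. The bounded channel between the map and reduce steps is what provides back pressure: mappers block once the buffer is full, until the reducer catches up.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// mapWords is the map step: it splits one line into lower-cased words
// and emits each word on out.
func mapWords(line string, out chan<- string) {
	for _, w := range strings.Fields(line) {
		out <- strings.ToLower(w)
	}
}

// wordCount runs `workers` concurrent mappers over lines and reduces the
// emitted words into per-word counts.
func wordCount(lines []string, workers int) map[string]int {
	in := make(chan string)
	// Bounded buffer: when it is full, mappers block until the reducer
	// drains it, which is a simple form of back pressure.
	words := make(chan string, 64)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for line := range in {
				mapWords(line, words)
			}
		}()
	}

	// Feed the input lines to the mappers, then signal end of input.
	go func() {
		for _, l := range lines {
			in <- l
		}
		close(in)
	}()

	// Close the words channel once every mapper has finished.
	go func() {
		wg.Wait()
		close(words)
	}()

	// Reduce step: a single goroutine owns the map, so no lock is needed.
	counts := make(map[string]int)
	for w := range words {
		counts[w]++
	}
	return counts
}

func main() {
	counts := wordCount([]string{"a b a", "b c"}, 2)
	fmt.Println(counts["a"], counts["b"], counts["c"]) // prints: 2 2 1
}
```

Keeping the reduce step in a single goroutine avoids locking around the shared map; a distributed system would instead partition keys across several reducers.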
The README admits that windowing functions (similar to Apache Beam/Flink) are in progress and SQL support is a todo, limiting use for complex streaming or query-based workflows.
Setting up a distributed cluster requires manual configuration of master and agent processes, which can be cumbersome compared to cloud-native or managed services.
As a smaller project with a 'just beginning' status, Gleam has fewer plugins and community support than established frameworks, which may impact integration and troubleshooting.