A unified resource scheduler for co-scheduling batch, stateless, and stateful workloads in a single cluster to maximize resource utilization.
Peloton is a unified resource scheduler that co-schedules mixed workloads (batch, stateless, and stateful jobs) in a single compute cluster. It addresses low resource utilization in large-scale environments through elastic resource sharing, overcommitment, and workload preemption. It is designed for web-scale operations, with a stated target of millions of containers and tens of thousands of nodes.
Peloton is aimed at platform engineers, SREs, and infrastructure teams at web-scale companies that manage large, heterogeneous clusters with diverse workload types. It is particularly relevant for organizations running big data and machine learning workloads alongside traditional services.
Developers choose Peloton to raise cluster utilization through hierarchical resource pools and GPU and gang scheduling, and because it can be deployed on-premise or in public clouds. Its scalability and support for mixed workloads make it an alternative to simpler schedulers in complex, high-demand environments.
Uses hierarchical resource pools to dynamically allocate resources among teams, enabling better utilization in large clusters as highlighted in the README's feature list.
Supports GPU and gang scheduling for TensorFlow and dynamic resource allocation for Spark, making it ideal for big data and machine learning workloads, per the README's optimization features.
Designed to scale to millions of containers and tens of thousands of nodes, proven in web-scale environments like Uber, as stated in the README's scalability claims.
Can run in on-premise datacenters or public clouds, providing deployment versatility without vendor lock-in, emphasized in the project philosophy.
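The hierarchical resource pools mentioned above can be illustrated with a minimal sketch. It is not Peloton's implementation or API: the `Pool` class, the `entitlements` function, and the field names `reservation`, `limit`, `share`, and `demand` are hypothetical names chosen for illustration, and the allocation rule (guarantee each pool its reservation, then hand spare capacity to pools with unmet demand in proportion to their share) is an assumed, simplified form of elastic sharing. The sketch also assumes that sibling reservations fit within the parent's capacity and that shares are positive.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical resource pool; all quantities are CPU cores."""
    name: str
    reservation: float  # guaranteed minimum
    limit: float        # hard cap, even when the cluster is idle
    share: float        # weight used when dividing spare capacity
    demand: float = 0.0  # only meaningful for leaf pools
    children: list[Pool] = field(default_factory=list)

    def total_demand(self) -> float:
        # A parent's demand is the sum of its children's demand.
        if self.children:
            return sum(c.total_demand() for c in self.children)
        return self.demand


def entitlements(pool: Pool, capacity: float, out: dict | None = None) -> dict[str, float]:
    """Recursively split `capacity` among the children of `pool`."""
    if out is None:
        out = {}
    out[pool.name] = capacity
    if not pool.children:
        return out

    # Step 1: each child receives its reservation, but never more than it wants.
    want = {c.name: min(c.total_demand(), c.limit) for c in pool.children}
    grant = {c.name: min(want[c.name], c.reservation) for c in pool.children}
    spare = capacity - sum(grant.values())

    # Step 2: lend spare capacity to children with unmet demand, weighted by share.
    hungry = [c for c in pool.children if want[c.name] - grant[c.name] > 1e-9]
    while spare > 1e-9 and hungry:
        total_share = sum(c.share for c in hungry)
        handed_out = 0.0
        for c in hungry:
            offer = spare * c.share / total_share
            take = min(offer, want[c.name] - grant[c.name])
            grant[c.name] += take
            handed_out += take
        spare -= handed_out
        hungry = [c for c in hungry if want[c.name] - grant[c.name] > 1e-9]

    for c in pool.children:
        entitlements(c, grant[c.name], out)
    return out


# Example tree: three team pools under one root, with "batch" split further.
root = Pool("root", reservation=100, limit=100, share=1, children=[
    Pool("batch", reservation=30, limit=100, share=1, children=[
        Pool("etl", reservation=20, limit=100, share=1, demand=50),
        Pool("adhoc", reservation=10, limit=100, share=1, demand=30),
    ]),
    Pool("services", reservation=50, limit=60, share=1, demand=20),
    Pool("ml", reservation=20, limit=50, share=2, demand=60),
])

result = entitlements(root, 100.0)
for name, cores in result.items():
    print(f"{name}: {cores:.1f}")
```

In this example the services pool uses less than its reservation, so the unused cores are lent to the batch and ml pools according to their shares. A real scheduler would pair this with preemption so that borrowed capacity can be reclaimed when the lending pool's demand returns.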
Requires running four separate daemons and depends on ZooKeeper for service discovery and Mesos for resource abstraction, which increases operational overhead and setup complexity.
Demands expertise in distributed systems and specific components like Cassandra for storage, which may deter teams without dedicated infrastructure resources.
Built on Mesos with a niche focus, Peloton has a smaller community and less tooling than more popular schedulers such as Kubernetes, which may limit available support and integrations.