An aggregating proxy that provides a single API endpoint and high availability for multiple Prometheus shards.
Promxy is an aggregating proxy for Prometheus that makes multiple shards appear as a single API endpoint. It solves the problem of operating Prometheus at scale by providing high availability through data merging and simplifying query federation across hosts. It requires no changes to existing Prometheus infrastructure.
Promxy is aimed at DevOps engineers, SREs, and platform teams that run multi-host Prometheus deployments and need simpler management, high availability, and unified querying.
Developers choose Promxy because it delivers HA without sidecars or infrastructure modifications, offers a single query endpoint for Grafana, and aims for query performance comparable to that of the downstream Prometheus servers.
An aggregating proxy to enable HA Prometheus.
Merges data from duplicate Prometheus hosts to fill the gaps left by upgrades or outages, providing HA even though Prometheus itself has no clustering support, as described in the project's motivation section.
Provides a single Prometheus API endpoint for Grafana, eliminating multiple datasources and allowing globally aggregated PromQL queries across all configured instances.
Works as a proxy with no sidecars, custom builds, or modifications to Prometheus servers, requiring only configuration changes on Promxy itself.
Supports aggregation across mixed Prometheus-compatible endpoints, including other Promxy instances and VictoriaMetrics, as confirmed in the FAQ on downstreams.
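The gap-filling idea behind the HA merge can be illustrated with a minimal sketch. This is not Promxy's actual implementation: the function name `merge_replicas` and the sample data are hypothetical, and the sketch assumes both replicas report samples at identical timestamps, which real Prometheus hosts generally do not.

```python
from typing import Dict, List, Tuple

# One sample of a time series: (unix timestamp in seconds, value).
Sample = Tuple[int, float]


def merge_replicas(replicas: List[List[Sample]]) -> List[Sample]:
    """Union the samples of one series as seen by several replicas.

    Hypothetical helper for illustration only. When more than one
    replica has a sample for the same timestamp, the replica listed
    first wins.
    """
    merged: Dict[int, float] = {}
    for series in replicas:
        for ts, value in series:
            # setdefault keeps the value from the earliest-listed replica.
            merged.setdefault(ts, value)
    # Return samples in timestamp order.
    return sorted(merged.items())


# Replica A was restarted and missed the samples at t=30 and t=45.
replica_a = [(0, 1.0), (15, 2.0), (60, 5.0)]
# Replica B started late and missed the samples at t=0 and t=15.
replica_b = [(30, 3.0), (45, 4.0), (60, 5.0)]

merged = merge_replicas([replica_a, replica_b])
print(merged)
```

Neither replica holds the full series on its own, but the merged result has a sample at every scrape interval, which is the effect the HA claim above describes.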
Performs a complete scatter-gather to all server groups for every query, which can introduce latency and inefficiency, though optimization is planned per issue #2.
Recording rules require a remote_write endpoint to store metrics, as Promxy lacks a local time-series database, adding complexity for alerting and rule management.
By default, a query returns an error if any server group is unavailable, which can reduce fault tolerance in some setups, though this behavior is configurable via the ignore_error option.
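The scatter-gather and error-handling trade-offs listed above can be sketched as follows. This is a simplified model under stated assumptions, not Promxy's code: the `scatter_gather` function, the callable-based server groups, and the per-group flag mapping are all hypothetical. Only the name `ignore_error` comes from the description above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical stand-in for a server group: a callable that takes a
# PromQL string and returns result rows, or raises on failure.
Fetch = Callable[[str], List[dict]]


def scatter_gather(
    query: str,
    groups: Dict[str, Fetch],
    ignore_error: Dict[str, bool],
) -> List[dict]:
    """Send the query to every group and combine whatever comes back."""
    results: List[dict] = []
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        # Scatter: every group is queried, whether or not it holds
        # data relevant to the query.
        futures = {name: pool.submit(fetch, query) for name, fetch in groups.items()}
        # Gather: collect results in the order the groups were listed.
        for name, future in futures.items():
            try:
                results.extend(future.result())
            except Exception:
                # Strict by default: one failed group fails the query
                # unless that group is flagged as ignorable.
                if not ignore_error.get(name, False):
                    raise
    return results


def healthy(query: str) -> List[dict]:
    return [{"metric": {"sg": "a"}, "value": 1.0}]


def broken(query: str) -> List[dict]:
    raise ConnectionError("server group unreachable")


groups = {"a": healthy, "b": broken}

# With the flag set for group "b", the query succeeds with partial data.
tolerant = scatter_gather("up", groups, {"b": True})

# Without it, the failure of group "b" fails the whole query.
try:
    scatter_gather("up", groups, {})
    strict_failed = False
except ConnectionError:
    strict_failed = True

print(tolerant, strict_failed)
```

The strict default favors correctness, since a silently missing group would return incomplete results. Ignoring errors trades that guarantee for availability.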