A curated list of awesome Site Reliability Engineering (SRE) and Production Engineering resources.
eBPF-powered network observability for Kubernetes, indexing L4/L7 traffic with full context and TLS decryption.
A cloud-native search engine optimized for observability data like logs and traces, offering sub-second search on cloud storage.
A collection of performance analysis tools for Linux using ftrace and perf_events to trace system activity with minimal dependencies.
A high-level tracing language for Linux that leverages eBPF for efficient system and application observability.
Kubernetes-native operator for deploying and managing Prometheus monitoring stacks using custom resources.
An open-source platform for building product integrations with AI, handling auth, proxy, and functions for 700+ APIs.
A Java-native, high-performance API gateway for microservices, offering service proxy, protocol conversion, and comprehensive API governance.
eBPF-based platform for Kubernetes monitoring and performance testing with automatic service mapping.
A high-performance PHP application server and process manager written in Go, designed to replace traditional setups like Nginx+FPM.
A curated collection of ready-to-use Prometheus alerting rules for monitoring infrastructure, databases, and cloud services.
A simple framework for alerting on anomalies, spikes, or other patterns in Elasticsearch data.
A diagnostic logging library for .NET applications with first-class support for structured event data.
A Java library for capturing JVM and application-level metrics to monitor system performance.
Query APIs, cloud services, and code directly with SQL using a zero-ETL approach—no database required.
A pure Go library for loading, compiling, debugging, and attaching eBPF programs to Linux kernel hooks.
A modern application delivery platform that simplifies deploying and operating applications across hybrid, multi-cloud environments.
An open-source observability and APM tool with AI-powered root cause analysis, combining metrics, logs, traces, profiling, and SLO-based alerting.
A durable background task queue and workflow orchestration platform built on Postgres with observability and flow control.
An AI-powered open-source observability platform unifying metrics, logs, and alerting with agentless collection and custom monitoring.
A fluent Go job scheduling library for running functions at fixed intervals, cron times, or random durations.
A smart model router for OpenClaw that cuts LLM costs by up to 70% by routing requests to the cheapest capable model.
A framework for instrumenting Rust programs to collect structured, event-based diagnostic information.
An AI-native proxy and data plane for agentic applications, providing built-in orchestration, safety, observability, and smart LLM routing.
A flexible and structured logging platform for .NET applications, supporting both traditional and structured logging across various platforms.
A flexible and structured logging platform for .NET applications, supporting both traditional and modern logging patterns.
A Docker monitoring stack with Prometheus, Grafana, cAdvisor, Node Exporter, and Alertmanager for hosts and containers.
An open-source observability database that unifies metrics, logs, and traces into a single engine, replacing Prometheus, Loki, and Elasticsearch.
An easy-to-use, powerful, and reliable system to process and distribute data across cybersecurity, observability, and AI pipelines.
An asynchronous thread pool framework for Java applications that supports dynamic configuration changes, monitoring, and alerting without code modifications.
A Go microservice template for Kubernetes that demonstrates best practices for building and deploying cloud-native applications.
A cloud-native traffic orchestration system for high availability, extensibility, and observability in API management and service mesh.
A horizontally scalable, highly available, multi-tenant, long-term storage solution for Prometheus and OpenTelemetry Metrics.
A Prometheus exporter that probes endpoints over HTTP, HTTPS, DNS, TCP, ICMP, and gRPC to collect blackbox monitoring metrics.
An open-source chaos engineering platform for SREs and developers to test cloud-native system resilience.
Open source CNAPP that hunts for threats in cloud native platforms, ranks them by risk, and visualizes attack paths.