A curated list of awesome Site Reliability Engineering (SRE) and Production Engineering resources.
eBPF-powered network observability for Kubernetes, indexing L4/L7 traffic with full context and TLS decryption.
A cloud-native search engine optimized for observability data like logs and traces, offering sub-second search on cloud storage.
A collection of performance analysis tools for Linux using ftrace and perf_events to trace system activity with minimal dependencies.
A high-level tracing language for Linux that leverages eBPF for efficient system and application observability.
Kubernetes-native operator for deploying and managing Prometheus monitoring stacks using custom resources.
A Java-native, high-performance API gateway for microservices, offering service proxy, protocol conversion, and comprehensive API governance.
eBPF-based platform for Kubernetes monitoring and performance testing with automatic service mapping.
A high-performance PHP application server and process manager written in Go, designed to replace traditional setups like Nginx+FPM.
A simple framework for alerting on anomalies, spikes, or other patterns in Elasticsearch data.
A curated collection of ready-to-use Prometheus alerting rules for monitoring infrastructure, databases, and cloud services.
A diagnostic logging library for .NET applications with first-class support for structured event data.
A Java library for capturing JVM and application-level metrics to monitor system performance.
Query APIs, cloud services, and code directly with SQL using a zero-ETL approach, with no database required.
A modern application delivery platform that simplifies deploying and operating applications across hybrid, multi-cloud environments.
A pure Go library for loading, compiling, debugging, and attaching eBPF programs to Linux kernel hooks.
An open-source observability and APM tool with AI-powered root cause analysis, combining metrics, logs, traces, profiling, and SLO-based alerting.
An AI-powered open-source observability platform unifying metrics, logs, and alerting with agentless collection and custom monitoring.
An open-source platform for building product integrations with AI, handling auth, proxy, and functions for 700+ APIs.
A fluent Go job scheduling library for running functions at fixed intervals, cron times, or random durations.
A durable background task queue and workflow orchestration platform built on Postgres with observability and flow control.
A framework for instrumenting Rust programs to collect structured, event-based diagnostic information.
A flexible and structured logging platform for .NET applications, supporting both traditional and structured logging across various platforms.
A flexible and structured logging platform for .NET applications, supporting both traditional and modern logging patterns.
A Docker monitoring stack with Prometheus, Grafana, cAdvisor, Node Exporter, and Alertmanager for hosts and containers.
An AI-native proxy and data plane for agentic applications, providing built-in orchestration, safety, observability, and smart LLM routing.
An open-source observability database that unifies metrics, logs, and traces into a single engine, replacing Prometheus, Loki, and Elasticsearch.
An easy-to-use, powerful, and reliable system to process and distribute data across cybersecurity, observability, and AI pipelines.
An asynchronous thread pool framework for Java applications that supports dynamic configuration changes, monitoring, and alerting without code modifications.
A Go microservice template for Kubernetes that demonstrates best practices for building and deploying cloud-native applications.
A cloud-native traffic orchestration system for high availability, extensibility, and observability in API management and service mesh.
A horizontally scalable, highly available, multi-tenant, long-term storage solution for Prometheus and OpenTelemetry Metrics.
A Prometheus exporter that probes endpoints over HTTP, HTTPS, DNS, TCP, ICMP, and gRPC to collect blackbox monitoring metrics.
A smart model router for OpenClaw that cuts LLM costs by up to 70% by routing requests to the cheapest capable model.
An open-source chaos engineering platform for SREs and developers to test cloud-native system resilience.
An open-source CNAPP that hunts for threats in cloud-native platforms, ranks them by risk, and visualizes attack paths.