A CLI-driven toolkit for writing and running Chaos Engineering experiments across any platform via extensions.
Chaos Toolkit is an open-source Chaos Engineering framework that provides a command-line interface for designing, running, and automating resilience experiments. It helps developers and SREs proactively discover weaknesses in their systems by simulating failures in a controlled manner. The toolkit is extensible, allowing integration with virtually any platform through community or custom-built extensions.
Site Reliability Engineers (SREs), DevOps practitioners, and software developers who are responsible for building and maintaining resilient cloud-native or on-premise systems and want to implement Chaos Engineering practices.
Developers choose Chaos Toolkit for its simplicity, extensibility, and platform-agnostic approach, enabling chaos experiments in environments where other tools may not fit. Its open API and strong community focus encourage collaboration and sharing of experiments.
Chaos Engineering Toolkit & Orchestration for Developers
Runs experiments with a single command, such as `chaos run experiment.json`, making them easy to automate and integrate into scripts or CI/CD pipelines, as shown in the project README.
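To make the single-command workflow concrete, here is a minimal experiment file in Chaos Toolkit's declarative JSON format. The health-check URL and the disturbance command are placeholders; a real experiment would target your own system:

```json
{
  "version": "1.0.0",
  "title": "Service stays available while a dependency is disturbed",
  "description": "Illustrative sketch; the probe URL and action command are placeholders.",
  "steady-state-hypothesis": {
    "title": "The service responds",
    "probes": [
      {
        "type": "probe",
        "name": "service-is-up",
        "tolerance": 200,
        "provider": {
          "type": "http",
          "url": "http://localhost:8080/health"
        }
      }
    ]
  },
  "method": [
    {
      "type": "action",
      "name": "introduce-disturbance",
      "provider": {
        "type": "process",
        "path": "echo",
        "arguments": "simulating a failure here"
      }
    }
  ]
}
```

Saved as `experiment.json`, this runs end to end with `chaos run experiment.json`: the steady-state hypothesis is verified, the method is applied, then the hypothesis is verified again.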
Supports custom extensions for any target platform, enabling integration with proprietary or niche environments, which the README highlights as a core feature.
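Extensions expose actions and probes as plain Python functions that experiments reference by module and function name. A minimal sketch, with a hypothetical module and stubbed platform call:

```python
# Hypothetical module "acme_chaos/actions.py": a custom Chaos Toolkit action.
# Any importable function can back an experiment's "python" provider; the
# names and payload here are illustrative only.

def terminate_instance(instance_id: str) -> dict:
    """Pretend to terminate a compute instance on a proprietary platform.

    A real extension would call the platform's API here; this stub just
    returns a payload that Chaos Toolkit records in the experiment journal.
    """
    return {"instance": instance_id, "status": "terminated"}
```

An experiment would then invoke it through a provider of type `"python"`, e.g. `{"type": "python", "module": "acme_chaos.actions", "func": "terminate_instance", "arguments": {"instance_id": "i-0abc"}}`.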
Works across cloud, datacenter, and CI/CD environments through existing extensions, offering versatility where other tools may not fit.
Encourages sharing via Slack, Stack Overflow, and GitHub, fostering a rich ecosystem of experiments and extensions, as emphasized in the project's mission statement.
Core functionality is minimal; most advanced features require installing and managing separate extensions, adding complexity and potential maintenance overhead.
Requires Python 3.8+ and uses PDM for development, which can be a barrier in environments with strict tooling constraints or limited Python support.
Lacks automatic rollback mechanisms; safety relies on rollback steps being written explicitly into each experiment or supplied by extensions, increasing risk for inexperienced users.
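To mitigate this, an experiment can declare an explicit `rollbacks` section of remediating actions that run after the method; these must still be authored by hand. A hedged sketch, where the service name and restart command are placeholders:

```json
{
  "rollbacks": [
    {
      "type": "action",
      "name": "restart-service",
      "provider": {
        "type": "process",
        "path": "systemctl",
        "arguments": ["restart", "my-service"]
      }
    }
  ]
}
```

Nothing here fires automatically on failure detection; the burden of designing a safe recovery path remains on the experiment author.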