A deep reinforcement learning library offering high-quality, single-file implementations of algorithms like PPO, DQN, and SAC for research and education.
CleanRL is a Deep Reinforcement Learning library that provides high-quality, single-file implementations of popular algorithms like PPO, DQN, and SAC. It is designed to be a clear, readable reference for researchers and learners who want to understand the complete implementation details of DRL algorithms without the abstraction layers of larger modular libraries. The project solves the problem of opaque or overly complex codebases by offering minimal, standalone scripts that are easy to study, modify, and extend for advanced research prototypes.
Reinforcement learning researchers, students, and practitioners who need to deeply understand algorithm implementations, prototype new features not supported by modular libraries, or seek a transparent and benchmarked codebase for experiments.
Developers choose CleanRL for its unparalleled code clarity and simplicity, which facilitates easier debugging, learning, and prototyping compared to more abstract modular libraries. Its single-file approach, comprehensive benchmarking, and research-oriented features like integrated experiment tracking provide a unique blend of educational value and practical utility for advanced DRL projects.
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
Each algorithm variant is packaged as a single standalone file; ppo_atari.py, for example, is only about 340 lines, offering a complete, readable reference without the need to navigate complex modular hierarchies.
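To illustrate the single-file layout CleanRL follows (hyperparameters, environment, and training loop all in one readable script), here is a toy sketch using a k-armed bandit; the names and structure are illustrative only, not CleanRL's actual code:

```python
# Toy single-file RL script in the spirit of CleanRL's layout (illustrative
# only, not CleanRL code): args, environment, and training loop in one file.
import argparse
import random


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--n-arms", type=int, default=5)
    parser.add_argument("--total-timesteps", type=int, default=2000)
    parser.add_argument("--epsilon", type=float, default=0.1)
    parser.add_argument("--seed", type=int, default=1)
    return parser.parse_args()


class BanditEnv:
    """A k-armed bandit: each arm pays 1 with a fixed hidden probability."""

    def __init__(self, n_arms, rng):
        self.probs = [rng.random() for _ in range(n_arms)]
        self.rng = rng

    def step(self, action):
        return 1.0 if self.rng.random() < self.probs[action] else 0.0


def train(args):
    rng = random.Random(args.seed)
    env = BanditEnv(args.n_arms, rng)
    q = [0.0] * args.n_arms  # action-value estimates
    counts = [0] * args.n_arms
    for _ in range(args.total_timesteps):
        # epsilon-greedy action selection
        if rng.random() < args.epsilon:
            action = rng.randrange(args.n_arms)
        else:
            action = max(range(args.n_arms), key=q.__getitem__)
        reward = env.step(action)
        counts[action] += 1
        q[action] += (reward - q[action]) / counts[action]  # incremental mean
    return q, env.probs


if __name__ == "__main__":
    q, probs = train(parse_args())
    print("learned action values:", [round(v, 2) for v in q])
```

Because everything lives in one file, every hyperparameter and update rule is visible at a glance, which is the property the single-file philosophy trades code reuse for.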
Includes extensive benchmarks across 7+ algorithms and 34+ environments via the Open RL Benchmark, providing transparent performance evaluation and comparison data.
Integrates TensorBoard logging, Weights & Biases experiment tracking, and gameplay video capture, supporting reproducible research workflows.
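The logging pattern amounts to emitting (step, tag, value) scalar records that TensorBoard-style dashboards can chart. As a dependency-free illustration of that pattern (this `ScalarLogger` class is hypothetical, not CleanRL's API, which uses TensorBoard's `SummaryWriter`):

```python
import csv
import io

# Hypothetical stand-in for the scalar-logging pattern used with
# TensorBoard's SummaryWriter (illustrative only, not CleanRL code).
class ScalarLogger:
    def __init__(self, stream):
        self.writer = csv.writer(stream)
        self.writer.writerow(["step", "tag", "value"])

    def add_scalar(self, tag, value, step):
        # Mirrors the shape of SummaryWriter.add_scalar(tag, value, global_step)
        self.writer.writerow([step, tag, value])


if __name__ == "__main__":
    buf = io.StringIO()
    logger = ScalarLogger(buf)
    for step in range(3):
        logger.add_scalar("charts/episodic_return", 10.0 * step, step)
    print(buf.getvalue())
```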
Supports scaling to thousands of experiments through Docker and AWS Batch integration, enabling large-scale distributed training in the cloud.
CleanRL is explicitly not a modular library and not meant to be imported, limiting its use in projects that require building upon or extending existing codebases in a reusable way.
The single-file approach duplicates code across algorithm variants, which increases maintenance overhead and makes updates more cumbersome, a trade-off the project acknowledges in its design philosophy.
Requires installing multiple optional dependencies for different environment families (e.g., Atari, MuJoCo), complicating initial setup with separate install commands.
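As a rough sketch of what that setup looks like (the exact requirements-file names here follow the repository layout at the time of writing and may differ by version, so check the project's README):

```shell
# Clone the repository and install core plus per-environment dependencies
git clone https://github.com/vwxyzjn/cleanrl.git && cd cleanrl
pip install -r requirements/requirements.txt          # core dependencies
pip install -r requirements/requirements-atari.txt    # Atari extras
pip install -r requirements/requirements-mujoco.txt   # MuJoCo extras
```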