A modular, high-throughput PyTorch framework for deep reinforcement learning research, supporting policy gradient, deep Q-learning, and Q-function policy gradient algorithms.
rlpyt is a deep reinforcement learning framework implemented in PyTorch, providing modular and optimized implementations of common RL algorithms such as A2C, PPO, DQN, DDPG, and SAC. It fills the need for a high-throughput, research-oriented codebase that unifies infrastructure across policy gradient, deep Q-learning, and Q-function policy gradient methods.
Researchers and developers working on small- to medium-scale deep reinforcement learning projects who need a flexible, efficient framework for experimenting with and modifying RL algorithms.
Developers choose rlpyt for its modular design, support for parallel and multi-GPU execution, and comprehensive algorithm implementations, which accelerate RL research without the overhead of large-scale systems like OpenAI's Dota setup.
Reinforcement Learning in PyTorch
Allows easy modification and reuse of components such as runners, samplers, and agents, facilitating custom research setups.
Supports parallel sampling and multi-GPU optimization using PyTorch's DistributedDataParallel, enabling efficient training on local hardware for faster experimentation.
Implements common algorithms across policy gradient, deep Q-learning, and Q-function policy gradient families, providing a unified infrastructure for diverse RL research.
Introduces namedarraytuple for handling multi-modal observations and actions, simplifying code for complex environment interfaces as described in the README.
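The namedarraytuple idea can be illustrated with a minimal pure-Python sketch. The `namedarraytuple_like` helper below is hypothetical and only conveys the concept; rlpyt's actual implementation lives in its utilities module and supports many more operations (setting, slicing over leading dimensions, nesting, etc.).

```python
from collections import namedtuple
import numpy as np

def namedarraytuple_like(name, fields):
    """Hypothetical sketch of the namedarraytuple concept: a namedtuple
    whose indexing applies to every field, so multi-modal observations
    or actions move through the code as a single object."""
    Base = namedtuple(name, fields)

    class NamedArrayTuple(Base):
        def __getitem__(self, index):
            # Apply the index to every field, returning a new instance.
            return type(self)(*(field[index] for field in self))

    NamedArrayTuple.__name__ = name
    return NamedArrayTuple

# Example: an observation with an image and a proprioceptive vector,
# batched over a leading dimension of 10 time steps.
Observation = namedarraytuple_like("Observation", ["image", "state"])
obs = Observation(image=np.zeros((10, 84, 84)), state=np.zeros((10, 4)))
first = obs[0]  # indexes both fields at once
print(first.image.shape, first.state.shape)  # (84, 84) (4,)
```

Because one index expression reaches every leaf array, sampler and algorithm code need not special-case environments with dict-like observation or action spaces.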
Some algorithms, such as Implicit Quantile Networks, are listed as 'coming soon', and the README notes that changes may occur, which can hinder stability for projects requiring cutting-edge methods.
Requires conda environment setup, PYTHONPATH adjustments, and additional installations for environments like MuJoCo, making initial configuration cumbersome compared to pip-installable alternatives.
Does not include visualization tools, relying on external packages like viskit for data analysis, adding extra steps and dependencies for monitoring training progress.