An end-to-end platform for applied reinforcement learning and contextual bandits, built with PyTorch for production decision-making systems.
ReAgent is an end-to-end platform for applied reinforcement learning and contextual bandits, originally developed at Facebook. It provides tools for training, evaluating, and serving decision-making models in production environments where simulators aren't available. The platform supports batch offline training and counterfactual policy evaluation to safely test new policies without deployment.
Machine learning engineers and researchers building production reinforcement learning systems for recommendation engines, optimization tasks, and decision-making applications. Particularly valuable for teams working with batch data in real-world environments.
Developers choose ReAgent for its comprehensive production-ready workflow, support for both classic RL algorithms and specialized recommender system methods, and robust counterfactual evaluation tools. Its focus on batch offline learning makes it uniquely suited for real-world applications where live experimentation is risky.
Offers a complete workflow from data preprocessing and training through model serving, designed for large-scale applications where no simulator is available.
Includes counterfactual policy evaluation estimators such as Doubly Robust and MAGIC for estimating a new policy's performance from logged data, enabling safe testing of policies in batch settings without deploying them.
Provides algorithms like Seq2Slate and SlateQ for slate-based recommendations, addressing real-world use cases in recommendation engines.
Supports a range of classic off-policy RL algorithms and contextual bandit methods, including DQN variants, TD3, and SAC.
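To make the counterfactual evaluation idea concrete, here is a minimal sketch of a doubly robust (DR) off-policy estimate for a contextual bandit, in the spirit of the estimators ReAgent ships. All names (`doubly_robust_estimate`, `q_hat`, the data layout) are illustrative assumptions, not ReAgent's actual API:

```python
def doubly_robust_estimate(logged_data, target_action_prob, q_hat, num_actions):
    """Estimate a target policy's value from logged bandit data.

    logged_data: list of (context, action, reward, logging_prob) tuples
    target_action_prob: fn(context, action) -> probability under the new policy
    q_hat: fn(context, action) -> learned reward-model estimate
    """
    total = 0.0
    for context, action, reward, logging_prob in logged_data:
        # Direct-method term: expected reward of the target policy
        # under the learned reward model.
        dm = sum(
            target_action_prob(context, a) * q_hat(context, a)
            for a in range(num_actions)
        )
        # Importance-weighted correction on the logged action; this term
        # cancels the reward model's bias where logged data is available.
        rho = target_action_prob(context, action) / logging_prob
        total += dm + rho * (reward - q_hat(context, action))
    return total / len(logged_data)
```

With an accurate reward model the correction term vanishes; with an inaccurate one, the importance weights still keep the estimate unbiased, which is the "doubly robust" property.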
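The core trick behind SlateQ-style slate recommendation can be sketched in a few lines: the expected value of showing a whole slate decomposes into item-level Q-values weighted by a user-choice model. The numbers and function below are a toy illustration under that assumption, not ReAgent's implementation:

```python
def slate_value(item_q_values, click_probs):
    """SlateQ-style decomposition: expected slate value is the sum of
    per-item Q-values weighted by the probability the user clicks each
    item (the residual no-click probability is valued at 0 here)."""
    assert sum(click_probs) <= 1.0 + 1e-9
    return sum(p * q for p, q in zip(click_probs, item_q_values))

# Toy slate of three items.
q = [2.0, 1.0, 0.5]   # hypothetical item-level Q-values
p = [0.5, 0.3, 0.1]   # hypothetical click probabilities (0.1 left for no click)
print(slate_value(q, p))  # 0.5*2.0 + 0.3*1.0 + 0.1*0.5 = 1.35
```

This decomposition is what makes slate optimization tractable: instead of learning a Q-value per exponential slate combination, only per-item values and a choice model are needed.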
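As a quick orientation on what the value-based methods in that list share: each bootstraps its updates from a one-step temporal-difference target. A generic sketch (not ReAgent code) of the DQN-family target:

```python
def dqn_target(reward, next_q_values, gamma=0.99, done=False):
    """One-step TD target for DQN-family learners:
    y = r + gamma * max_a Q(s', a), with bootstrapping cut off
    at episode termination."""
    return reward + (0.0 if done else gamma * max(next_q_values))

# Example: reward 1.0, best next-state Q-value 2.0, discount 0.5.
print(dqn_target(1.0, [0.5, 2.0], gamma=0.5))  # 1.0 + 0.5 * 2.0 = 2.0
```

TD3 and SAC replace the `max` with target-network and entropy-regularized variants, but the bootstrapped structure of the target is the same.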
The project is officially archived with no further updates, and users are redirected to Pearl for current support, making it unsuitable for new projects.
Designed for distributed training and large-scale systems, so it carries significant setup and infrastructure overhead and is not ideal for quick prototyping.
Primarily geared towards batch offline learning where simulators aren't available, limiting its applicability to online or simulation-based RL scenarios.