A sequence-to-sequence transformer model for predicting chemical reaction outcomes and retrosynthetic pathways, with uncertainty calibration.
Molecular Transformer is a sequence-to-sequence neural network model that predicts chemical reaction outcomes and retrosynthetic pathways. It treats molecules as SMILES strings and uses a transformer architecture to translate between reactants and products, helping chemists design synthesis routes faster. The model includes uncertainty estimation to indicate prediction confidence.
Computational chemists, researchers in cheminformatics, and organic chemists who need AI tools for reaction prediction and retrosynthesis planning.
Unlike proprietary tools, it provides an open-source, uncertainty-calibrated model trained on public reaction datasets. Its integration with RDKit for data preprocessing and the availability of pre-trained models lower the barrier for academic and industrial adoption.
Molecular Transformer is a neural machine translation model adapted for chemistry that predicts chemical reaction outcomes and retrosynthetic pathways. It translates between molecular representations (SMILES strings) to forecast how molecules react or how target molecules can be synthesized, accelerating discovery in organic chemistry and drug development.
Molecular Transformer aims to make AI-assisted chemical reaction prediction accessible to organic chemists, with the goal of integrating these models into daily laboratory workflows to accelerate molecular discovery.
Provides confidence estimates for predictions, a feature the README explicitly mentions and one that is rare in open-source models, helping chemists assess how reliable each prediction is.
Includes models trained on public datasets like USPTO_MIT and USPTO_STEREO, available for download, allowing immediate use without training from scratch.
Doubles training data by generating random equivalent SMILES via RDKit, as described in the README, improving model robustness and accuracy.
Utilizes RDKit for SMILES canonicalization and tokenization, ensuring accurate molecular representation and preprocessing, which is critical for chemistry applications.
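The tokenization step mentioned above can be illustrated with a minimal, standard-library-only sketch. It assumes a regular expression of the kind commonly used to split SMILES strings into model tokens; the pattern and the function name `tokenize_smiles` are illustrative assumptions, not confirmed to match the project's own preprocessing code, and RDKit canonicalization is deliberately left out.

```python
import re
from typing import List

# Assumed pattern: a regex of the kind commonly used for SMILES tokenization.
# It is not confirmed to be identical to the project's own pattern.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens (hypothetical helper).

    Bracketed atoms such as [N+] and two-letter halogens (Cl, Br)
    are kept as single tokens.
    """
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Reject inputs containing characters the pattern does not cover,
    # instead of silently dropping them.
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize SMILES completely: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Sequence-to-sequence models typically consume space-separated tokens.
    print(" ".join(tokenize_smiles("CC(=O)Cl")))
```

Keeping multi-character atoms as single tokens gives the model a vocabulary of chemically meaningful symbols rather than raw characters.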
Requires Python 3.5 and PyTorch 0.4.1 according to the installation steps; both are obsolete and may cause compatibility issues with modern systems and libraries.
Involves a multi-step conda environment setup, data preprocessing, and model averaging (over the last 20 checkpoints), making it hard to use without solid ML or chemistry experience.
Trained primarily on USPTO patent data, so predictions may falter for reactions outside this domain; the README acknowledges the need for more diverse data in connection with IBM RXN.
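The model-averaging step listed among the drawbacks can be sketched in a few lines. This is a simplified illustration that assumes each checkpoint is a mapping from parameter names to flat lists of floats; real checkpoints hold framework tensors, and the function name `average_last_checkpoints` is hypothetical rather than part of the project.

```python
from typing import Dict, List

Checkpoint = Dict[str, List[float]]


def average_last_checkpoints(checkpoints: List[Checkpoint], last_n: int = 20) -> Checkpoint:
    """Average parameters element-wise over the last `last_n` checkpoints.

    Hypothetical helper: checkpoints are assumed to be ordered oldest to
    newest and to share identical parameter names and shapes.
    """
    if last_n <= 0:
        raise ValueError("last_n must be positive")
    selected = checkpoints[-last_n:]
    if not selected:
        raise ValueError("no checkpoints to average")

    averaged: Checkpoint = {}
    for name in selected[0]:
        # zip(*...) groups the i-th value of this parameter across checkpoints.
        columns = zip(*(ckpt[name] for ckpt in selected))
        averaged[name] = [sum(values) / len(selected) for values in columns]
    return averaged


if __name__ == "__main__":
    history = [
        {"w": [0.0, 0.0]},  # older checkpoint, excluded when last_n=2
        {"w": [1.0, 3.0]},
        {"w": [3.0, 5.0]},
    ]
    print(average_last_checkpoints(history, last_n=2))
```

Averaging the weights of several late checkpoints is a common way to smooth out step-to-step noise in training, at the cost of keeping those checkpoints on disk.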
Implementation of DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
Junction Tree Variational Autoencoder for Molecular Graph Generation (ICML 2018)
REINVENT is a reinforcement learning framework specifically designed for de novo drug design, enabling the generation of novel molecular structures with optimized properties. It addresses the challenge of discovering new chemical entities by combining generative models with property prediction to explore chemical space efficiently.

## Key Features

- **Reinforcement Learning Pipeline**: Uses RL to optimize molecular structures toward desired chemical properties and biological activities
- **De Novo Molecular Generation**: Creates entirely new molecular entities rather than modifying existing compounds
- **Property Optimization**: Incorporates scoring functions to guide generation toward molecules with specific target properties
- **Template-Based Execution**: Provides configurable JSON templates for different running modes and experiments
- **TensorBoard Integration**: Enables real-time monitoring and visualization of training logs and progress

## Philosophy

REINVENT applies reinforcement learning principles to drug discovery, treating molecular generation as an optimization problem where the agent learns to propose molecules that maximize desired chemical and biological properties.
The official implementation of 3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction (ICLR 2023)