A sequence-to-sequence transformer model for predicting chemical reaction pathways (retrosynthesis) with uncertainty calibration.
Molecular Transformer is a sequence-to-sequence neural network model that predicts chemical reaction outcomes and retrosynthetic pathways. It represents molecules as SMILES strings and uses the Transformer architecture to translate between reactants and products, helping chemists design synthesis routes faster. The model also estimates uncertainty to indicate how confident each prediction is.
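To make the translation framing concrete, here is a minimal sketch that turns a reaction SMILES in the standard `reactants>agents>products` form into source/target string pairs, one for forward prediction and one for retrosynthesis. The helper names `split_reaction` and `make_pairs` are hypothetical, and merging agents into the forward source with `.` is an assumption, not necessarily how the project formats its data.

```python
from typing import Dict, Tuple


def split_reaction(rxn_smiles: str) -> Tuple[str, str, str]:
    """Split a reaction SMILES 'reactants>agents>products' into its three parts."""
    parts = rxn_smiles.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected 'reactants>agents>products', got {rxn_smiles!r}")
    reactants, agents, products = parts
    return reactants, agents, products


def make_pairs(rxn_smiles: str) -> Dict[str, Tuple[str, str]]:
    """Build hypothetical (source, target) pairs for both translation directions."""
    reactants, agents, products = split_reaction(rxn_smiles)
    # Assumption: agents are merged into the forward source with '.',
    # i.e. no explicit separation between reactants and reagents.
    precursors = ".".join(part for part in (reactants, agents) if part)
    return {
        "forward": (precursors, products),  # predict the product(s) from precursors
        "retro": (products, reactants),     # predict the reactants from the product(s)
    }
```

For example, `make_pairs("CCO>>CC=O")["forward"]` yields `("CCO", "CC=O")`; the model itself only ever sees these strings, which is what makes a machine-translation architecture applicable.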
Computational chemists, researchers in cheminformatics, and organic chemists who need AI tools for reaction prediction and retrosynthesis planning.
Unlike proprietary tools, it provides an open-source, uncertainty-calibrated model trained on public reaction datasets. Its RDKit-based data preprocessing and downloadable pre-trained models lower the barrier to academic and industrial adoption.
Molecular Transformer is a neural machine translation model adapted for chemistry that predicts chemical reaction outcomes and retrosynthetic pathways. It translates between molecular representations (SMILES strings) to forecast how molecules react or how target molecules can be synthesized, accelerating discovery in organic chemistry and drug development.
Molecular Transformer aims to make AI-assisted chemical reaction prediction accessible to organic chemists, with the goal of integrating these models into daily laboratory workflows to accelerate molecular discovery.
Provides confidence estimates for its predictions, explicitly mentioned in the README, to help chemists judge how reliable a prediction is; this is rare among open-source models.
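One common way to turn a sequence model's output into a confidence score is to take the product of the probabilities of every token in the predicted SMILES. The sketch below assumes that scoring rule and that per-token log-probabilities are available from the decoder; the function names are hypothetical and this is not presented as the project's exact calibration method.

```python
import math
from typing import List, Sequence, Tuple


def sequence_confidence(token_log_probs: Sequence[float]) -> float:
    """Confidence of one predicted SMILES as the product of its token probabilities.

    Summing log-probabilities and exponentiating once avoids the underflow
    that multiplying many small probabilities directly could cause.
    """
    return math.exp(math.fsum(token_log_probs))


def rank_candidates(
    candidates: Sequence[Tuple[str, Sequence[float]]],
) -> List[Tuple[str, float]]:
    """Score (smiles, token_log_probs) candidates and return them best first."""
    scored = [(smiles, sequence_confidence(lps)) for smiles, lps in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Because every extra token multiplies in a probability of at most 1, this score tends to favor shorter outputs; that is one reason such raw scores usually need calibration before being read as probabilities of correctness.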
Offers downloadable pre-trained models for public datasets such as USPTO_MIT and USPTO_STEREO, so they can be used immediately without training from scratch.
Doubles training data by generating random equivalent SMILES via RDKit, as described in the README, improving model robustness and accuracy.
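The augmentation idea can be sketched as follows: keep every original pair and append one copy whose source SMILES is rewritten as a different but chemically equivalent string. The randomizer is injected as a parameter so the doubling logic runs without RDKit; `rdkit_random_smiles` uses RDKit's `Chem.MolToSmiles(..., doRandom=True)`. Randomizing only the source side, and the function names, are assumptions for illustration.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source SMILES, target SMILES)


def rdkit_random_smiles(smiles: str) -> str:
    """Return a random, chemically equivalent SMILES string (requires RDKit)."""
    from rdkit import Chem  # imported lazily so the rest of the sketch works without RDKit

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse {smiles!r}")
    # doRandom=True randomizes the atom traversal order, giving a non-canonical SMILES.
    return Chem.MolToSmiles(mol, canonical=False, doRandom=True)


def double_dataset(pairs: List[Pair], randomize: Callable[[str], str]) -> List[Pair]:
    """Keep the original pairs and append one randomized-source copy of each."""
    augmented = list(pairs)
    for source, target in pairs:
        # Assumption: only the source is randomized; the target stays canonical.
        augmented.append((randomize(source), target))
    return augmented
```

With RDKit installed, `double_dataset(pairs, rdkit_random_smiles)` produces the doubled set; because the same molecule now appears under several spellings, the model is pushed to learn chemistry rather than one particular string layout.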
Uses RDKit to canonicalize SMILES before tokenization, ensuring each molecule has a consistent string representation during preprocessing, which is critical for chemistry applications.
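After canonicalization, each SMILES string has to be split into tokens the model can translate. The sketch below assumes a regex-based tokenizer of the kind commonly used with SMILES Transformer models (bracket atoms, two-letter halogens, ring-closure digits and bond symbols each become one token); the pattern and function names are illustrative, not quoted from the project.

```python
import re
from typing import List

# Assumption: one token per bracket atom, Br/Cl, organic-subset atom,
# aromatic atom, bond/branch symbol, ring-closure digit or %NN label.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens, failing loudly on unknown characters."""
    tokens = SMILES_REGEX.findall(smiles)
    # If the tokens do not reassemble the input, some characters were skipped.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def to_model_input(smiles: str) -> str:
    """Space-separated tokens, the usual input format for NMT toolkits."""
    return " ".join(tokenize_smiles(smiles))
```

For example, `tokenize_smiles("c1ccccc1Br")` keeps `Br` as a single token instead of splitting it into `B` and `r`.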
Requires Python 3.5 and PyTorch 0.4.1, as specified in the installation steps; both versions are obsolete and may cause compatibility issues with modern systems and libraries.
Requires a multi-step workflow (conda environment setup, data preprocessing, and averaging of the last 20 checkpoints), which makes it difficult for non-experts without solid ML or chemistry knowledge to use.
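Checkpoint averaging replaces each model parameter with its mean over the last N saved checkpoints, which typically gives a smoother, slightly better model than the final checkpoint alone. The sketch below assumes checkpoints are ordered oldest to newest and uses plain lists of floats in place of PyTorch tensors so it runs without PyTorch; the function name is hypothetical.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]  # parameter name -> flattened parameter values


def average_checkpoints(checkpoints: Sequence[Params], last_n: int = 20) -> Params:
    """Element-wise mean of each parameter over the last `last_n` checkpoints."""
    if not checkpoints:
        raise ValueError("no checkpoints to average")
    selected = checkpoints[-last_n:]  # assumes oldest-to-newest ordering
    averaged: Params = {}
    for name in selected[0]:
        # zip(*...) groups the i-th value of this parameter across checkpoints.
        columns = zip(*(ckpt[name] for ckpt in selected))
        averaged[name] = [sum(values) / len(selected) for values in columns]
    return averaged
```

With real PyTorch state dicts the same loop would sum and divide tensors instead of lists; the averaged parameters are then loaded into a single model for inference.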
Trained primarily on USPTO patent data, so predictions may be less reliable for reactions outside this domain; the README acknowledges this when it points to the need for more diverse training data in connection with IBM RXN.