Official implementation of a 3D equivariant diffusion model for generating drug-like molecules that bind to specific protein targets and predicting their binding affinity.
TargetDiff is an open-source machine learning framework for structure-based drug discovery. Using an equivariant diffusion model, it generates novel 3D molecular structures that are likely to bind a specific protein target, and it predicts their binding affinity. It addresses the challenge of designing drug-like molecules with desired binding properties from scratch.
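To make the diffusion idea concrete: a diffusion model learns to reverse a fixed noising process. TargetDiff's diffusion acts on both atom positions and atom types. The sketch below covers only the Gaussian noising of ligand positions, using the standard DDPM closed form x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps. The linear schedule, the 1000 steps, and centring the frame on the pocket are illustrative assumptions, not values taken from the TargetDiff code.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule beta_1..beta_T."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def center_on(coords, ref):
    """Translate coords so that the centroid of ref moves to the origin."""
    n = len(ref)
    c = [sum(p[k] for p in ref) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in coords]


def q_sample(x0, t, abar, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I), independently per coordinate."""
    a = abar[t]
    return [[math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in atom]
            for atom in x0]


# Toy coordinates in angstroms (made up for illustration).
pocket = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [2.0, 0.0, 4.0]]
ligand = [[2.0, 2.0, 4.0], [2.5, 1.5, 4.5]]
abar = alpha_bars(linear_betas(1000))
x0 = center_on(ligand, pocket)  # ligand expressed in a pocket-centred frame (assumption)
x_t = q_sample(x0, 500, abar, random.Random(0))
```

A trained denoiser would be run backwards from pure noise at t = T - 1 to recover a ligand; as t grows, abar_t shrinks and x_t approaches an isotropic Gaussian around the pocket centre.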
Computational chemists, bioinformaticians, and machine learning researchers working on AI-driven drug discovery, molecular generation, and protein-ligand interaction prediction.
It provides a unified, geometry-aware pipeline for both target-conditioned molecule generation and affinity prediction, leveraging SE(3)-equivariant networks for accurate 3D modeling and integrating with established docking tools for validation.
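"SE(3)-equivariant" means that rotating or translating the input complex rotates or translates the model's coordinate outputs in the same way. The toy layer below is an EGNN-style coordinate update (Satorras et al., 2021), used here only as a generic example of an equivariant layer, not as TargetDiff's architecture. The `weight` function is a hypothetical stand-in for a learned MLP. The check compares "transform, then update" against "update, then transform".

```python
import math


def egnn_coord_update(x, weight_fn):
    """EGNN-style update:
    x_i' = x_i + 1/(N-1) * sum_{j != i} (x_i - x_j) * weight_fn(||x_i - x_j||^2).
    Weights depend only on squared distances, so the update commutes with
    rotations and translations of the whole point cloud."""
    n = len(x)
    out = []
    for i in range(n):
        delta = [0.0, 0.0, 0.0]
        for j in range(n):
            if i == j:
                continue
            diff = [x[i][k] - x[j][k] for k in range(3)]
            w = weight_fn(sum(d * d for d in diff))
            for k in range(3):
                delta[k] += diff[k] * w
        out.append([x[i][k] + delta[k] / (n - 1) for k in range(3)])
    return out


def rigid_transform(x, theta, shift):
    """Rotate points by theta about the z-axis, then translate by shift."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * p[0] - s * p[1] + shift[0],
             s * p[0] + c * p[1] + shift[1],
             p[2] + shift[2]] for p in x]


def weight(d2):
    # Hypothetical stand-in for a learned MLP; any function of distance preserves equivariance.
    return math.exp(-d2)


coords = [[0.0, 0.0, 0.0], [1.0, 0.5, -0.2], [-0.3, 1.2, 0.7]]
theta, shift = 0.8, (1.0, -2.0, 0.5)
# The two orders of operations should agree up to floating-point error.
a = egnn_coord_update(rigid_transform(coords, theta, shift), weight)
b = rigid_transform(egnn_coord_update(coords, weight), theta, shift)
max_err = max(abs(p - q) for u, v in zip(a, b) for p, q in zip(u, v))
print(f"max equivariance error: {max_err:.2e}")
```

Because the layer only ever sees relative vectors and distances, the model does not need to learn each pocket orientation separately, which is the practical benefit of building the symmetry in.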
The official implementation of 3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction (ICLR 2023)
Generates 3D molecules conditioned on a protein binding pocket, trained on the CrossDocked2020 dataset, with scripts for extracting pockets from PDB files and sampling ligands for them, enabling pocket-specific drug design.
Uses SE(3)-equivariant diffusion to respect 3D rotational and translational symmetries, improving geometric accuracy, as validated in the ICLR 2023 paper through benchmark comparisons against baselines such as Pocket2Mol.
Predicts binding affinity via supervised learning on PDBBind data, with inference scripts for real complexes, achieving an RMSE of 1.316 on the test set and using features learned by the generative model to improve performance.
Integrates with AutoDock Vina for in silico evaluation, supporting multiple docking modes and providing metafiles of sampling results for benchmarking and reproducibility.
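Pocket extraction and docking both start from the same geometric question: which region around the ligand matters? The helpers below are hypothetical illustrations, not TargetDiff's scripts. `extract_pocket` keeps every residue with at least one atom within a radius of the ligand; `docking_box` computes the centre and size of a search box of the kind AutoDock Vina expects (its `center_x`/`size_x`-style options). The 10 Å radius and 5 Å padding are assumptions chosen for illustration.

```python
def extract_pocket(protein_atoms, ligand_coords, radius=10.0):
    """Keep whole residues that have any atom within `radius` angstroms of any ligand atom.
    protein_atoms: list of (residue_id, (x, y, z)) tuples (hypothetical input format)."""
    r2 = radius * radius
    keep = set()
    for res_id, p in protein_atoms:
        for q in ligand_coords:
            if sum((p[k] - q[k]) ** 2 for k in range(3)) <= r2:
                keep.add(res_id)
                break
    # Return every atom of a kept residue so side chains are not truncated.
    return [(res_id, p) for res_id, p in protein_atoms if res_id in keep]


def docking_box(ligand_coords, padding=5.0):
    """Centre a search box on the ligand's bounding box and pad it on every axis."""
    lo = [min(c[k] for c in ligand_coords) for k in range(3)]
    hi = [max(c[k] for c in ligand_coords) for k in range(3)]
    center = [(lo[k] + hi[k]) / 2 for k in range(3)]
    size = [hi[k] - lo[k] + 2 * padding for k in range(3)]
    return center, size


protein = [("A1", (0.0, 0.0, 0.0)), ("A1", (50.0, 0.0, 0.0)),
           ("A2", (20.0, 0.0, 0.0)), ("A3", (3.0, 4.0, 0.0))]
ligand = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
pocket = extract_pocket(protein, ligand)
center, size = docking_box(ligand)
```

Keeping whole residues rather than individual atoms is a design choice: it yields chemically complete side chains, at the cost of occasionally including atoms far from the ligand (such as the second `A1` atom here).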
Requires pinned versions of PyTorch, CUDA, and other packages installed via Conda and pip, plus additional tools such as AutoDockTools_py3, which makes installation error-prone and time-consuming.
The README notes that the supervised-learning checkpoints for PDBBind v2020 were lost and offers only v2016 models, which limits accuracy on newer data and reflects maintenance gaps.
Relies heavily on GPUs for training and sampling, and docking-based evaluation is time-consuming, as the evaluation scripts note, which restricts use in resource-constrained settings.
Training requires downloading large datasets from Google Drive and running multiple preprocessing scripts, which can be daunting and prone to failure without careful manual intervention.
Implementation of DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
Junction Tree Variational Autoencoder for Molecular Graph Generation (ICML 2018)
Molecular Transformer is a neural machine translation model adapted for chemistry that predicts chemical reaction outcomes and retrosynthetic pathways. It translates between molecular representations (SMILES strings) to forecast how molecules react or how target molecules can be synthesized, accelerating discovery in organic chemistry and drug development.

## Key Features

- **Retrosynthesis Prediction**: Predicts reactant molecules needed to synthesize a target product molecule.
- **Uncertainty Calibration**: Provides confidence estimates for predictions, helping chemists assess reliability.
- **SMILES Tokenization**: Uses custom tokenization of SMILES strings to treat molecules as sequences for transformer models.
- **Data Augmentation**: Doubles training data by generating random equivalent SMILES representations via RDKit.
- **Pre-trained Models**: Includes models trained on public datasets (USPTO_MIT, USPTO_STEREO) with mixed or separated reactant/reagent formats.

## Philosophy

Molecular Transformer aims to make AI-assisted chemical reaction prediction accessible to organic chemists, with the goal of integrating these models into daily laboratory workflows to accelerate molecular discovery.
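SMILES tokenization can be sketched with a single regular expression that treats bracket atoms, two-letter halogens, ring-closure digits, bonds, and branches as atomic tokens. The pattern below follows the style of regex tokenizers commonly used with Molecular Transformer, but treat it as an assumption rather than the project's exact code. The round-trip check guards against characters the pattern silently skips.

```python
import re

# Bracket atoms, Br/Cl, organic-subset atoms, aromatic atoms, bonds, branches,
# ring closures (including %NN), and reaction arrows.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def smi_tokenize(smi):
    """Split a SMILES (or reaction SMILES) string into space-separated tokens."""
    tokens = SMILES_TOKEN.findall(smi)
    # If the tokens do not reassemble into the input, some character was not matched.
    if "".join(tokens) != smi:
        raise ValueError(f"untokenizable SMILES: {smi!r}")
    return " ".join(tokens)


print(smi_tokenize("Brc1cc[nH]c1"))
```

Keeping `Br`, `Cl`, and `[nH]` as single tokens matters: splitting `Cl` into `C` and `l` would make the sequence model learn a spurious carbon.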
REINVENT is a reinforcement learning framework specifically designed for de novo drug design, enabling the generation of novel molecular structures with optimized properties. It addresses the challenge of discovering new chemical entities by combining generative models with property prediction to explore chemical space efficiently.

## Key Features

- **Reinforcement Learning Pipeline**: Uses RL to optimize molecular structures toward desired chemical properties and biological activities
- **De Novo Molecular Generation**: Creates entirely new molecular entities rather than modifying existing compounds
- **Property Optimization**: Incorporates scoring functions to guide generation toward molecules with specific target properties
- **Template-Based Execution**: Provides configurable JSON templates for different running modes and experiments
- **TensorBoard Integration**: Enables real-time monitoring and visualization of training logs and progress

## Philosophy

REINVENT applies reinforcement learning principles to drug discovery, treating molecular generation as an optimization problem where the agent learns to propose molecules that maximize desired chemical and biological properties.
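The original REINVENT paper (Olivecrona et al., 2017) frames this optimization as an "augmented likelihood" objective: the agent's log-likelihood of a sampled SMILES is pulled toward the frozen prior's log-likelihood plus sigma times the scoring-function value. The sketch below computes that loss on plain floats for illustration only. The function name and inputs are hypothetical, later REINVENT versions may differ, and in a real training loop only `agent_logp` would carry gradients.

```python
def augmented_likelihood_loss(prior_logp, agent_logp, scores, sigma):
    """Mean of (log P_prior + sigma * S - log P_agent)^2 over a batch.
    prior_logp / agent_logp: log-likelihoods of each sampled SMILES under the
    frozen prior and the trainable agent; scores: scoring-function outputs in [0, 1]."""
    if not (len(prior_logp) == len(agent_logp) == len(scores)) or not scores:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp, la, s in zip(prior_logp, agent_logp, scores):
        augmented = lp + sigma * s  # high-scoring molecules get a higher target likelihood
        total += (augmented - la) ** 2
    return total / len(scores)


# sigma = 20.0 is an arbitrary illustrative value, not a REINVENT default.
loss = augmented_likelihood_loss([-10.0, -20.0], [-12.0, -20.0], [0.5, 0.0], sigma=20.0)
```

Anchoring the target to the prior's likelihood is what keeps the agent producing valid, drug-like SMILES while the score term steers it toward the desired properties; sigma trades off the two.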