Official implementation of a 3D equivariant diffusion model for generating drug-like molecules that bind to specific protein targets and predicting their binding affinity.
TargetDiff is an open-source machine learning framework for structure-based drug discovery. It generates novel 3D molecular structures that are likely to bind to a specific protein target using an equivariant diffusion model and predicts their binding affinity. It addresses the challenge of designing drug-like molecules with desired binding properties from scratch.
Intended for computational chemists, bioinformaticians, and machine learning researchers working on AI-driven drug discovery, molecular generation, and protein-ligand interaction prediction.
It provides a unified, geometry-aware pipeline for both target-conditioned molecule generation and affinity prediction, leveraging SE(3)-equivariant networks for accurate 3D modeling and integrating with established docking tools for validation.
The official implementation of 3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction (ICLR 2023)
Generates 3D molecular structures specifically for protein binding pockets using data from CrossDocked2020, with scripts for pocket extraction and sampling from PDB files, enabling precise drug design.
Utilizes SE(3)-equivariant diffusion models to respect 3D symmetries, improving geometric accuracy as validated in the ICLR 2023 paper and benchmark comparisons against baselines like Pocket2Mol.
Predicts binding affinity via supervised learning on PDBBind data, with inference scripts for real complexes, achieving an RMSE of 1.316 on test sets and leveraging generative features for enhanced performance.
Integrates with AutoDock Vina for in silico evaluation, providing metafiles for benchmarking and reproducibility, as detailed in the evaluation section with multiple docking modes.
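The SE(3)-equivariance property mentioned above can be made concrete with a small check. The sketch below is not TargetDiff's network: `toy_update` is a hypothetical stand-in layer (each atom moves a fraction of the way toward the centroid), and the coordinates, angles, and shift are made-up values. It only illustrates what equivariance means: rotating and translating the input and then applying the layer gives the same result as applying the layer first and then rotating and translating.

```python
import math

def rotate(points, angle_z, angle_x):
    """Rotate (x, y, z) points about the z-axis, then about the x-axis."""
    cz, sz = math.cos(angle_z), math.sin(angle_z)
    cx, sx = math.cos(angle_x), math.sin(angle_x)
    out = []
    for x, y, z in points:
        x1, y1, z1 = cz * x - sz * y, sz * x + cz * y, z
        out.append((x1, cx * y1 - sx * z1, sx * y1 + cx * z1))
    return out

def translate(points, shift):
    """Shift every point by the same 3D vector."""
    return [(x + shift[0], y + shift[1], z + shift[2]) for x, y, z in points]

def toy_update(points, step=0.1):
    """Hypothetical equivariant layer: move each atom toward the centroid."""
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    return [
        tuple(p[i] + step * (centroid[i] - p[i]) for i in range(3))
        for p in points
    ]

def max_abs_diff(a, b):
    """Largest coordinate-wise difference between two point lists."""
    return max(abs(u - v) for p, q in zip(a, b) for u, v in zip(p, q))

# Made-up atom coordinates and an arbitrary rigid transform (assumptions).
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.2, 0.3), (-0.7, 0.4, 1.1)]
shift = (2.0, -1.0, 0.5)

def transform(points):
    return translate(rotate(points, 0.7, -0.4), shift)

update_then_transform = transform(toy_update(atoms))
transform_then_update = toy_update(transform(atoms))

# For an equivariant layer the two orders agree up to floating-point error.
equivariance_error = max_abs_diff(update_then_transform, transform_then_update)
print(equivariance_error < 1e-9)
```

A layer that, for example, added a fixed offset along the x-axis would fail this check under rotation, which is the kind of orientation dependence an equivariant design avoids.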
Requires specific versions of PyTorch, CUDA, and other packages via Conda and Pip, with additional tools like AutoDockTools_py3, making installation error-prone and time-consuming.
The README admits that the supervised learning checkpoints for PDBBind v2020 are lost, so only v2016 models are offered, which limits accuracy on newer data and reflects maintenance gaps.
Relies heavily on GPU for training and sampling, and docking evaluation is time-consuming, as noted in the evaluation scripts, restricting use in resource-constrained settings.
Training requires downloading large datasets from Google Drive and running multiple preprocessing scripts, which can be daunting and prone to failure without careful manual intervention.