How to generate molecules for my own protein using TargetDiff?

Run the sample_for_pocket.py script on a PDB file containing the protein pocket, cropped to a 10 Å region as shown in the README. If you are starting from a full protein structure, you may first need to preprocess it with the provided scripts, such as extract_pockets.py.
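The 10 Å cropping step can be illustrated with a minimal sketch. This is not the repository's extract_pockets.py: the function names (pdb_line, parse_atoms, extract_pocket), the rule of keeping any residue with at least one atom inside the radius, and the demo coordinates are all assumptions made for illustration.

```python
import math

def pdb_line(serial, name, resname, chain, resseq, x, y, z):
    """Build one fixed-column PDB ATOM record (hypothetical helper for the demo)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

def parse_atoms(pdb_text):
    """Return (residue_key, (x, y, z), line) for each ATOM/HETATM record."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            key = (line[21], line[22:27])  # chain ID, residue number + insertion code
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((key, xyz, line))
    return atoms

def extract_pocket(protein_pdb, ligand_coords, radius=10.0):
    """Keep whole residues that have any atom within `radius` of any ligand atom."""
    atoms = parse_atoms(protein_pdb)
    near = {key for key, xyz, _ in atoms
            if any(math.dist(xyz, lig) <= radius for lig in ligand_coords)}
    return "\n".join(line for key, _, line in atoms if key in near)

# Toy protein: three residues placed along the x axis (made-up coordinates).
demo = "\n".join([
    pdb_line(1, "N", "ALA", "A", 1, 0.0, 0.0, 0.0),
    pdb_line(2, "CA", "ALA", "A", 1, 1.0, 0.0, 0.0),
    pdb_line(3, "N", "GLY", "A", 2, 9.0, 0.0, 0.0),
    pdb_line(4, "CA", "GLY", "A", 2, 20.0, 0.0, 0.0),
    pdb_line(5, "N", "SER", "A", 3, 30.0, 0.0, 0.0),
])
pocket = extract_pocket(demo, ligand_coords=[(0.0, 0.0, 0.0)], radius=10.0)
print(pocket)
```

Residue 2 is kept whole even though one of its atoms lies outside the radius, which avoids writing chemically incomplete residues into the pocket file. The actual script may use a different selection rule, so check its source before relying on this behavior.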

TargetDiff vs Pocket2Mol: which is better for binding affinity?

TargetDiff combines generation with affinity prediction, training its predictor on PDBBind data with equivariant models, while Pocket2Mol focuses more on generation. The provided benchmark metafiles show competitive docking scores for TargetDiff, so the choice mainly depends on whether you need generation and affinity prediction in a single framework.

Can I run TargetDiff without AutoDock Vina?

Yes. Evaluation can run with docking_mode set to 'none' to skip docking. For validating binding affinity, however, Vina is recommended; the README notes that preparing the input files takes time on first use.

What hardware do I need to train TargetDiff from scratch?

You need a CUDA-capable GPU (the listed dependencies target CUDA 11.6 or a compatible version) plus enough storage for datasets such as CrossDocked2020 and PDBBind, which are large and require preprocessing.

How accurate is the binding affinity prediction?

On the PDBBind v2016 test set, it achieves an RMSE of 1.316 and an MAE of 1.031, as reported in the evaluation section. Only the v2016 checkpoints are available because the PDBBind v2020 checkpoints were lost, so accuracy on newer data may differ.
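For readers unfamiliar with the two metrics, here is a minimal sketch of how RMSE and MAE are computed from predicted and experimental affinities. The numbers below are made up for illustration and are not TargetDiff outputs; the function names are also just illustrative.

```python
import math

def rmse(pred, true):
    """Root mean squared error: penalizes large errors more heavily."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def mae(pred, true):
    """Mean absolute error: the average size of the error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

# Illustrative affinity values (assumed to be on a pK-like scale).
predicted = [6.0, 7.5, 5.0]
experimental = [5.0, 7.5, 7.0]

print(f"RMSE = {rmse(predicted, experimental):.3f}")
print(f"MAE  = {mae(predicted, experimental):.3f}")
```

RMSE is never smaller than MAE on the same data, and the gap between them grows when a few predictions are far off, which is why both are usually reported together.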

Is TargetDiff good for virtual screening?

It can generate and score molecules, but the sampling and docking steps are slow. Without further optimization, it is better suited to focused drug design than to high-throughput screening.

TargetDiff

Python

Official implementation of a 3D equivariant diffusion model for generating drug-like molecules that bind to specific protein targets and predicting their binding affinity.

GitHub

What is TargetDiff?

TargetDiff is an open-source machine learning framework for structure-based drug discovery. It generates novel 3D molecular structures that are likely to bind to a specific protein target using an equivariant diffusion model and predicts their binding affinity. It addresses the challenge of designing drug-like molecules with desired binding properties from scratch.

Target Audience

Computational chemists, bioinformaticians, and machine learning researchers working on AI-driven drug discovery, molecular generation, and protein-ligand interaction prediction.

Value Proposition

It provides a unified, geometry-aware pipeline for both target-conditioned molecule generation and affinity prediction, leveraging SE(3)-equivariant networks for accurate 3D modeling and integrating with established docking tools for validation.
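What SE(3)-equivariance means in practice can be shown with a toy check: rotating and translating the input and then applying the function gives the same result as applying the function first and transforming afterwards. The update rule below (nudging each atom toward the centroid) is a stand-in chosen for simplicity and is not TargetDiff's network; all names and values are hypothetical, and only a single rotation about the z axis is tested.

```python
import math

def rotate_z(p, angle):
    """Rotate a 3D point about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = p
    return (c * x - s * y, s * x + c * y, z)

def transform(points, angle, shift):
    """Apply a rigid motion: rotation about z followed by a translation."""
    return [tuple(a + b for a, b in zip(rotate_z(p, angle), shift)) for p in points]

def toy_update(points, step=0.1):
    """Move each point a fraction of the way toward the centroid."""
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    return [tuple(p[i] + step * (centroid[i] - p[i]) for i in range(3)) for p in points]

points = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 1.0)]
angle, shift = 0.7, (3.0, -1.0, 2.0)

update_after_motion = toy_update(transform(points, angle, shift))
motion_after_update = transform(toy_update(points), angle, shift)

# Largest coordinate difference between the two orders of operation.
max_gap = max(abs(u - v)
              for p, q in zip(update_after_motion, motion_after_update)
              for u, v in zip(p, q))
print(f"max difference: {max_gap:.2e}")
```

A model with this property does not need to learn separately how a pocket looks in every orientation, which is the motivation for equivariant architectures in 3D molecular modeling.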

Overview

The official implementation of 3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction (ICLR 2023)

Use Cases

Best For

Generating novel drug candidates for a specific protein target

Related Projects


343 stars · 53 forks · 0 contributors

Predicting binding affinity of protein-ligand complexes

Academic research in AI for drug discovery

Benchmarking molecular generation models

Exploring structure-activity relationships in silico

Teaching concepts of equivariant neural networks in chemistry

Not Ideal For

Projects requiring quick, out-of-the-box deployment without extensive environment setup and dependency management
Teams without access to GPU resources for training or sampling with the diffusion models
Applications focused solely on 2D molecular properties or ligand-based design without protein structure data
Production environments needing robust, commercial-grade support and frequent model updates

Pros & Cons

Pros

Target-Conditioned Generation

Generates 3D molecular structures specifically for protein binding pockets using data from CrossDocked2020, with scripts for pocket extraction and sampling from PDB files, enabling precise drug design.

Equivariant Neural Networks

Utilizes SE(3)-equivariant diffusion models to respect 3D symmetries, improving geometric accuracy as validated in the ICLR 2023 paper and benchmark comparisons against baselines like Pocket2Mol.

Integrated Affinity Prediction

Predicts binding affinity via supervised learning on PDBBind data, with inference scripts for real complexes, achieving an RMSE of 1.316 on test sets and leveraging generative features for enhanced performance.

Docking Validation

Integrates with AutoDock Vina for in silico evaluation, providing metafiles for benchmarking and reproducibility, as detailed in the evaluation section with multiple docking modes.

Cons

Complex Setup and Dependencies

Requires specific versions of PyTorch, CUDA, and other packages via Conda and Pip, with additional tools like AutoDockTools_py3, making installation error-prone and time-consuming.

Incomplete or Outdated Models

The README states that the supervised-learning checkpoints for PDBBind v2020 were lost, so only the v2016 models are provided. This may limit accuracy on newer data and suggests gaps in maintenance.

High Computational Demands

Relies heavily on GPU for training and sampling, and docking evaluation is time-consuming, as noted in the evaluation scripts, restricting use in resource-constrained settings.

Cumbersome Data Handling

Training requires downloading large datasets from Google Drive and running multiple preprocessing scripts, which can be daunting and prone to failure without careful manual intervention.

Frequently Asked Questions


DiffDock

Implementation of DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking

Stars: 1,534

Forks: 355

Last commit: 1 year ago

JTVAE

Junction Tree Variational Autoencoder for Molecular Graph Generation (ICML 2018)

Stars: 561

Forks: 196

Last commit: 3 years ago

Molecular Transformer

Molecular Transformer is a neural machine translation model adapted for chemistry that predicts chemical reaction outcomes and retrosynthetic pathways. It translates between molecular representations (SMILES strings) to forecast how molecules react or how target molecules can be synthesized, accelerating discovery in organic chemistry and drug development.

## Key Features

- **Retrosynthesis Prediction**: Predicts reactant molecules needed to synthesize a target product molecule.
- **Uncertainty Calibration**: Provides confidence estimates for predictions, helping chemists assess reliability.
- **SMILES Tokenization**: Uses custom tokenization of SMILES strings to treat molecules as sequences for transformer models.
- **Data Augmentation**: Doubles training data by generating random equivalent SMILES representations via RDKit.
- **Pre-trained Models**: Includes models trained on public datasets (USPTO_MIT, USPTO_STEREO) with mixed or separated reactant/reagent formats.

## Philosophy

Molecular Transformer aims to make AI-assisted chemical reaction prediction accessible to organic chemists, with the goal of integrating these models into daily laboratory workflows to accelerate molecular discovery.

Stars: 426

Forks: 83

Last commit: 4 years ago

REINVENT

REINVENT is a reinforcement learning framework specifically designed for de novo drug design, enabling the generation of novel molecular structures with optimized properties. It addresses the challenge of discovering new chemical entities by combining generative models with property prediction to explore chemical space efficiently.

## Key Features

- **Reinforcement Learning Pipeline**: Uses RL to optimize molecular structures toward desired chemical properties and biological activities
- **De Novo Molecular Generation**: Creates entirely new molecular entities rather than modifying existing compounds
- **Property Optimization**: Incorporates scoring functions to guide generation toward molecules with specific target properties
- **Template-Based Execution**: Provides configurable JSON templates for different running modes and experiments
- **TensorBoard Integration**: Enables real-time monitoring and visualization of training logs and progress

## Philosophy

REINVENT applies reinforcement learning principles to drug discovery, treating molecular generation as an optimization problem where the agent learns to propose molecules that maximize desired chemical and biological properties.

Stars: 375

Forks: 114

Last commit: 1 year ago
