A state-of-the-art diffusion model for predicting how small molecules (ligands) bind to proteins.
DiffDock is an open-source implementation of a state-of-the-art diffusion model for molecular docking. It predicts the 3D binding pose of a small molecule (ligand) within a protein's active site, which is a critical step in computational drug discovery and structural biology. The method outputs both the predicted structure and a confidence score to help researchers assess the prediction's reliability.
DiffDock is aimed at computational chemists, structural biologists, and drug discovery researchers who need to predict or analyze how potential drug molecules interact with protein targets. It is also relevant for machine learning practitioners interested in AI applications for science.
Developers choose DiffDock for its state-of-the-art accuracy on molecular docking benchmarks and for its diffusion-based approach, which provides a confidence estimate alongside each prediction. The project is actively maintained, offers multiple easy-to-use interfaces, and is built on a transparent, open-source codebase.
Implementation of DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
DiffDock-L achieves high performance on benchmarks like PDBBind and DockGen, as reported in the updated paper, and the repository provides evaluation scripts for replicating those results.
Outputs a confidence score for each predicted pose, with guidelines in the FAQ to help assess reliability, aiding in decision-making without external tools.
Accepts proteins as PDB files or sequences (folded with ESMFold) and ligands as SMILES or various file formats, supporting diverse data sources from experiments or databases.
Offers a web interface via Hugging Face Spaces, local CLI, Docker container, and a graphical UI, making it accessible for different user preferences and setups.
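For batch runs through the local CLI, the mixed input types described above (a PDB file or a raw sequence for the protein, a SMILES string or a structure file for the ligand) are typically collected in a single CSV. The sketch below builds such a file with the Python standard library only. The column names mirror the example CSV in the DiffDock repository as best recalled, so treat them as an assumption and check them against your checkout. The job names, file paths, and the rule that exactly one protein source must be given are hypothetical choices made for this example.

```python
import csv
import io

# Assumed column names (verify against the example CSV in your DiffDock checkout).
FIELDS = ["complex_name", "protein_path", "ligand_description", "protein_sequence"]


def make_rows(jobs):
    """Turn simple job dicts into CSV rows, one row per protein-ligand pair."""
    rows = []
    for job in jobs:
        has_path = bool(job.get("protein_path"))
        has_seq = bool(job.get("protein_sequence"))
        # Design choice of this sketch: require exactly one protein source.
        if has_path == has_seq:
            raise ValueError(
                f"{job['name']}: give exactly one of protein_path or protein_sequence"
            )
        rows.append(
            {
                "complex_name": job["name"],
                "protein_path": job.get("protein_path", ""),
                # Either a SMILES string or a path to a ligand file.
                "ligand_description": job["ligand"],
                "protein_sequence": job.get("protein_sequence", ""),
            }
        )
    return rows


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical jobs: one protein from a PDB file, one from a sequence.
jobs = [
    {"name": "example_pdb", "protein_path": "proteins/target.pdb", "ligand": "CCO"},
    {"name": "example_seq", "protein_sequence": "MKTAYIAKQR", "ligand": "ligands/candidate.sdf"},
]
csv_text = to_csv(make_rows(jobs))
print(csv_text)
```

Writing `csv_text` to disk gives a file that can then be passed to the inference entry point; the exact command-line flag for that is documented in the README and is deliberately not reproduced here.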
Explicitly does not predict binding affinity; users must integrate with other tools like GNINA or free energy calculations, adding complexity to workflows.
Designed only for docking small-molecule ligands to proteins; it is not suitable when the binding partner is itself a large biomolecule, such as another protein or a nucleic acid, so those interactions require alternative methods.
While CPU is supported, inference is significantly slower without a GPU, as noted in the README, which can limit throughput for large batches.
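Because the confidence score ranks poses and is not a binding affinity, a common first step is to triage predictions by confidence before handing the survivors to an affinity tool. The sketch below shows one way to do that. Both the output file naming (`rank<k>_confidence<score>.sdf`) and the band thresholds (above 0 as high, between -1.5 and 0 as moderate, below -1.5 as low) are assumptions recalled from the project's documentation; confirm them against the FAQ before relying on them. The file names in the example are made up.

```python
import re

# Assumed output naming: rank<k>_confidence<score>.sdf (verify against real output).
POSE_NAME = re.compile(r"rank(\d+)_confidence(-?\d+(?:\.\d+)?)\.sdf$")


def parse_pose(filename):
    """Return (rank, confidence) from a pose file name, or None if it does not match."""
    match = POSE_NAME.search(filename)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def confidence_band(score, high=0.0, low=-1.5):
    """Bucket a confidence score; the default thresholds are assumptions."""
    if score > high:
        return "high"
    if score > low:
        return "moderate"
    return "low"


# Hypothetical file names standing in for one complex's output directory.
files = [
    "rank2_confidence-0.87.sdf",
    "rank1_confidence0.42.sdf",
    "rank3_confidence-2.10.sdf",
    "rank1.sdf",  # no confidence in the name, so it is skipped
]

poses = [p for p in map(parse_pose, files) if p is not None]
report = [(rank, score, confidence_band(score)) for rank, score in sorted(poses)]

for rank, score, band in report:
    print(f"rank {rank}: confidence {score:+.2f} ({band})")
```

Keeping the thresholds as keyword arguments makes it easy to tighten or loosen the bands for a particular target class.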
Junction Tree Variational Autoencoder for Molecular Graph Generation (ICML 2018)
Molecular Transformer is a neural machine translation model adapted for chemistry that predicts chemical reaction outcomes and retrosynthetic pathways. It translates between molecular representations (SMILES strings) to forecast how molecules react or how target molecules can be synthesized, accelerating discovery in organic chemistry and drug development.

## Key Features

- **Retrosynthesis Prediction**: Predicts reactant molecules needed to synthesize a target product molecule.
- **Uncertainty Calibration**: Provides confidence estimates for predictions, helping chemists assess reliability.
- **SMILES Tokenization**: Uses custom tokenization of SMILES strings to treat molecules as sequences for transformer models.
- **Data Augmentation**: Doubles training data by generating random equivalent SMILES representations via RDKit.
- **Pre-trained Models**: Includes models trained on public datasets (USPTO_MIT, USPTO_STEREO) with mixed or separated reactant/reagent formats.

## Philosophy

Molecular Transformer aims to make AI-assisted chemical reaction prediction accessible to organic chemists, with the goal of integrating these models into daily laboratory workflows to accelerate molecular discovery.
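To make the SMILES tokenization idea concrete, here is a minimal sketch of a regex-based tokenizer that splits a SMILES string into atom, bond, ring-closure, and bracket tokens so a sequence model can consume it. The pattern is written in the style commonly used for this task and is not claimed to be the project's exact expression; the function name `tokenize_smiles` is hypothetical.

```python
import re

# A SMILES token pattern in the commonly used style (an assumption, not the
# project's verbatim regex). Bracket atoms come first so they stay intact.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; fail loudly if any character is dropped."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize {smiles!r} losslessly")
    return tokens


# Two-letter halogens are kept as single tokens.
print(tokenize_smiles("BrCCl"))  # ['Br', 'C', 'Cl']

# Bracket atoms with charges stay together; '.' separates fragments.
print(tokenize_smiles("[Na+].[Cl-]"))  # ['[Na+]', '.', '[Cl-]']

# Models of this kind usually see tokens joined by spaces.
print(" ".join(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")))
```

The round-trip check is a design choice of this sketch: it guards against silently dropping characters the pattern does not cover, such as unsupported element symbols outside brackets.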
REINVENT is a reinforcement learning framework specifically designed for de novo drug design, enabling the generation of novel molecular structures with optimized properties. It addresses the challenge of discovering new chemical entities by combining generative models with property prediction to explore chemical space efficiently.

## Key Features

- **Reinforcement Learning Pipeline**: Uses RL to optimize molecular structures toward desired chemical properties and biological activities.
- **De Novo Molecular Generation**: Creates entirely new molecular entities rather than modifying existing compounds.
- **Property Optimization**: Incorporates scoring functions to guide generation toward molecules with specific target properties.
- **Template-Based Execution**: Provides configurable JSON templates for different running modes and experiments.
- **TensorBoard Integration**: Enables real-time monitoring and visualization of training logs and progress.

## Philosophy

REINVENT applies reinforcement learning principles to drug discovery, treating molecular generation as an optimization problem where the agent learns to propose molecules that maximize desired chemical and biological properties.
The official implementation of 3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction (ICLR 2023)