A deep learning model for protein sequence design that generates amino acid sequences for given protein backbones.
ProteinMPNN is a deep learning model for protein sequence design that generates amino acid sequences compatible with given protein backbone structures. It solves the inverse protein folding problem, enabling researchers to design novel proteins with desired structural properties. The model is trained to predict sequences that fold into specified backbones, supporting both full-atom and CA-only representations.
Computational biologists, protein engineers, and researchers working on de novo protein design or protein optimization who need to generate sequences for specified backbone scaffolds.
ProteinMPNN offers a robust, fast, and user-friendly open-source alternative to proprietary protein design tools, with flexible controls for fixing residues, adding biases, and incorporating evolutionary information via PSSM profiles.
Code for the ProteinMPNN paper
Offers full-backbone and CA-only models, plus weights trained exclusively on soluble proteins, so inputs with different levels of structural detail and different design scenarios (e.g., avoiding membrane-protein-like sequences) are supported.
Supports fixing residues, tying positions for symmetry, adding amino acid biases, and incorporating PSSM profiles via JSONL files, allowing precise sequence customization as shown in the helper scripts.
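As a hedged illustration of the JSONL controls mentioned above, the sketch below writes a fixed-positions file in the general shape produced by the repository's helper scripts (e.g., make_fixed_positions_dict.py): a mapping from PDB name to per-chain residue lists. The exact schema and the PDB name used here are assumptions and should be checked against the repository's examples.

```python
import json

# Hypothetical fixed-positions control file for ProteinMPNN.
# Maps a PDB name (assumed here to be the filename stem) to, per chain,
# the residue positions that should NOT be redesigned.
fixed_positions = {
    "my_backbone": {      # placeholder PDB name
        "A": [1, 2, 10],  # keep these chain-A residues fixed
        "B": [],          # chain B is fully designable
    }
}

# Helper scripts emit one JSON object per line (JSONL).
with open("fixed_positions.jsonl", "w") as f:
    f.write(json.dumps(fixed_positions) + "\n")
```

The same one-object-per-line pattern applies to the other control files (tied positions, amino acid biases, PSSM profiles), each consumed via its own input flag.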
Provides scores, probabilities, and uncertainty metrics like global_score and seq_recovery in outputs, enabling researchers to assess design quality and reliability.
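ProteinMPNN reports these metrics as comma-separated key=value fields in the headers of its output FASTA records. A minimal, hedged parser for that header style is sketched below; the example header is illustrative, not real output.

```python
def parse_header(header: str) -> dict:
    """Extract key=value metric fields from a ProteinMPNN-style FASTA header."""
    fields = {}
    for part in header.lstrip(">").split(","):
        part = part.strip()
        if "=" not in part:
            continue
        key, value = part.split("=", 1)
        try:
            fields[key.strip()] = float(value)  # score, global_score, seq_recovery, ...
        except ValueError:
            fields[key.strip()] = value.strip()  # non-numeric fields pass through
    return fields

# Illustrative header, not actual tool output:
example = ">T=0.1, sample=1, score=0.9, global_score=1.1, seq_recovery=0.45"
metrics = parse_header(example)
```

Lower scores (negative log-probabilities) generally indicate sequences the model considers more compatible with the backbone, which makes these fields convenient for ranking designs.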
Can generate multiple sequences per target with configurable sampling temperatures, facilitating exploration of diverse sequence variants, as indicated by the --num_seq_per_target flag.
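A hedged sketch of such an invocation follows, assembling the command for the repository's main script. The flag names match the README (--num_seq_per_target, --sampling_temp); all paths are placeholders.

```python
# Hypothetical invocation of ProteinMPNN drawing several sequences per
# backbone. Paths are placeholders; flag names follow the repo README.
cmd = [
    "python", "protein_mpnn_run.py",
    "--pdb_path", "inputs/backbone.pdb",  # placeholder input structure
    "--out_folder", "outputs/",           # placeholder output folder
    "--num_seq_per_target", "8",          # draw 8 candidate sequences
    "--sampling_temp", "0.1",             # low temperature -> conservative designs
    "--seed", "37",
]
# import subprocess; subprocess.run(cmd, check=True)  # uncomment to run
```

Higher sampling temperatures yield more diverse (and typically lower-scoring) sequences, so sweeping the temperature is a common way to trade reliability for diversity.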
With over 30 input flags and JSONL control files for most customizations, setup and usage can be daunting for users new to the tool.
While it runs on CPU, inference with the deep learning models is computationally intensive and markedly slower without a GPU; the documented batch-size adjustments for fitting GPU memory reflect that the tool is designed with GPU use in mind.
Requires pre-defined backbone structures in PDB format, so it cannot perform de novo structure prediction or sequence-only design; its scope is limited to backbone-conditioned sequence design.