A transformer-based model for unconditional and conditional molecular generation, built on a GPT architecture and trained on chemical datasets.
MolGPT is a transformer-based deep learning model specifically designed for molecular generation tasks. It uses a custom GPT architecture trained on chemical datasets to generate novel molecules, both unconditionally and with specific target properties. The model addresses the challenge of discovering new chemical compounds for drug development and materials science through machine learning.
MolGPT is aimed at computational chemists, drug discovery researchers, and materials scientists working on molecular design and generation. It is also relevant to machine learning practitioners interested in applying transformers to scientific domains.
MolGPT provides a specialized transformer model optimized for chemical structures rather than general text, with built-in interpretability features and benchmarking against established molecular datasets. It offers an accessible implementation for researchers wanting to apply modern deep learning to molecular generation without building from scratch.
MolGPT is a custom GPT model trained specifically for molecular generation, using a transformer-decoder architecture. It supports both unconditional generation of novel molecules and conditional generation based on specific chemical properties, giving researchers a practical tool for drug discovery and materials science.
MolGPT applies modern transformer architectures to molecular science, treating chemical structures as sequences that can be generated and optimized using deep learning techniques while maintaining interpretability through visualization tools.
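To make "chemical structures as sequences" concrete, here is a minimal sketch that assumes molecules are written as SMILES strings; the listing itself does not name the representation. The regular expression is a commonly used SMILES tokenization pattern and is not necessarily the one MolGPT uses. The helper names `tokenize_smiles`, `build_vocab`, and `encode` are hypothetical and exist only for illustration.

```python
import re

# Assumption: a widely used SMILES tokenization regex; MolGPT's own
# tokenizer may differ. Bracket atoms and two-letter halogens (Cl, Br)
# are kept as single tokens.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+"
    r"|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens a sequence model can consume."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Reject strings containing characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize SMILES: {smiles!r}")
    return tokens


def build_vocab(smiles_list):
    """Map every distinct token to an integer id (sorted for determinism)."""
    tokens = sorted({t for s in smiles_list for t in tokenize_smiles(s)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def encode(smiles, vocab):
    """Turn a SMILES string into the id sequence a GPT-style model trains on."""
    return [vocab[t] for t in tokenize_smiles(smiles)]


# Hypothetical three-molecule corpus, for illustration only.
corpus = ["CCO", "ClCBr", "c1cc[nH]c1"]
vocab = build_vocab(corpus)
print(tokenize_smiles("ClCBr"))
print(encode("CCO", vocab))
```

A decoder-only model trained on such id sequences learns to predict the next token, so generating a molecule amounts to sampling tokens one at a time until an end marker is produced.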
Trained on both the MOSES and Guacamol datasets, giving it broad coverage of chemical space across diverse generation tasks.
Supports targeted molecular generation conditioned on specific properties or scaffolds, which suits property-driven drug discovery applications.
Integrates the Ecco library to produce saliency maps, providing model explainability that helps researchers understand generation decisions.
Compares favorably against previous approaches on the standard MOSES and Guacamol benchmarks, supporting its effectiveness for molecular generation.
Setup requires downloading datasets from external Google Drive links and running shell scripts, a process that can be error-prone and comes with little guidance on customization.
The README is brief, with no examples or troubleshooting advice, making it difficult for users to adapt the model beyond the provided scripts.
Training and generation are resource-intensive, relying on GPUs and substantial memory, which may limit accessibility for smaller research teams.