A collection of BERT-like transformer models pre-trained on chemical SMILES data for drug design and property prediction.
ChemBERTa is a collection of BERT-like transformer models specifically pre-trained on chemical SMILES data for applications in chemistry and drug discovery. It applies masked language modeling techniques to molecular representations, enabling transfer learning for various chemical prediction tasks. The project provides pre-trained models that can be fine-tuned for specific applications like property prediction and chemical modeling.
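The masked language modeling objective mentioned above can be sketched in plain Python: hide a fraction of the tokens in a SMILES string and keep the originals as prediction targets. This is only an illustration. The regex tokenizer, the 15% mask rate, the `<mask>` token string, and the helper names `tokenize_smiles` and `mask_smiles` are all assumptions made for the example, not details taken from the ChemBERTa project.

```python
import random
import re

# Simplified SMILES token pattern (an assumption for illustration only):
# bracket atoms, the two-letter halogens Br/Cl, two-digit ring labels,
# then any other single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles):
    """Split a SMILES string into a list of coarse tokens."""
    return SMILES_TOKEN.findall(smiles)


def mask_smiles(smiles, mask_rate=0.15, mask_token="<mask>", seed=0):
    """Replace a random subset of tokens with mask_token.

    Returns the masked token list and a dict mapping each masked
    position to the original token the model would have to predict.
    """
    tokens = tokenize_smiles(smiles)
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Always mask at least one token so short strings still yield a target.
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = rng.sample(range(len(tokens)), n_mask)
    labels = {i: tokens[i] for i in sorted(positions)}
    masked = [mask_token if i in labels else tok for i, tok in enumerate(tokens)]
    return masked, labels


masked, labels = mask_smiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
print("".join(masked))
print(labels)
```

A model trained this way learns to recover the hidden tokens from their context, which is what makes the pre-trained weights reusable for downstream tasks. Standard BERT-style training also sometimes swaps a selected token for a random one or leaves it unchanged; that refinement is omitted here for brevity.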
Researchers, developers, and students working at the intersection of machine learning and chemistry, particularly those interested in applying transformer models to chemical data for drug design and molecular property prediction.
ChemBERTa offers specialized transformer models pre-trained on chemical data, providing a foundation for chemistry-specific machine learning tasks without requiring extensive computational resources for pre-training. The models are easily accessible through HuggingFace and integrate with existing deep learning workflows.
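As a rough sketch of what loading one of the hosted models might look like, the snippet below builds a fill-mask pipeline with the HuggingFace `transformers` library. The checkpoint name `seyonec/ChemBERTa-zinc-base-v1` is an assumption not stated in this description and should be checked against the model hub; the helpers `build_fill_mask`, `mask_position`, and `demo` are hypothetical names introduced for illustration.

```python
# Assumed checkpoint name; verify it on the HuggingFace model hub before use.
MODEL_ID = "seyonec/ChemBERTa-zinc-base-v1"


def build_fill_mask(model_id=MODEL_ID):
    """Create a fill-mask pipeline (downloads weights on first use)."""
    # Imported lazily so defining this helper does not require transformers.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id, tokenizer=model_id)


def mask_position(smiles, index, mask_token):
    """Replace the single character at `index` with the mask token."""
    return smiles[:index] + mask_token + smiles[index + 1:]


def demo():
    """Print the model's top candidates for one masked position in benzene."""
    fill_mask = build_fill_mask()
    # Use the tokenizer's own mask token rather than hard-coding it.
    query = mask_position("C1=CC=CC=C1", 1, fill_mask.tokenizer.mask_token)
    for candidate in fill_mask(query):
        print(candidate["sequence"], round(candidate["score"], 3))
```

Calling `demo()` requires `transformers`, a backend such as PyTorch, and network access for the first download. Masking by character position is a simplification, since the model's tokenizer may group several characters into one token.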
bert-loves-chemistry: a repository of HuggingFace models applied on chemical SMILES data for drug design, chemical modelling, etc.
Models are pre-trained on chemical SMILES datasets such as ZINC and PubChem, providing a tailored foundation for tasks like drug design and property prediction, in line with the README's focus on masked language modeling (MLM) for chemistry.
All model weights are hosted on HuggingFace, so they can be loaded and used for inference with the standard transformers library, as the README's example code demonstrates.
Pre-trained models can be fine-tuned for specific chemical prediction tasks, supported by a tutorial and examples for benchmarks such as BBBP (blood-brain barrier penetration), enabling quick adaptation to new datasets.
Attention visualization tools for chemical contexts are planned, according to the features and Todo list; once available, they should aid model interpretability and research.
The README notes that the repository currently consists mainly of notebooks, with model implementation and visualization code still pending, which makes it less suitable for immediate production use.
Beyond notebooks and tutorials, there is minimal formal documentation or community support, which could hinder adoption and troubleshooting for complex workflows.
Designed exclusively for SMILES representations, which limits its applicability to other chemical data formats unless they are converted or the models are significantly modified.