A deep learning model using transformer architecture to predict compound-protein interactions from molecular and protein sequences.
TransformerCPI is a deep learning model that predicts interactions between chemical compounds and proteins from sequence data using a transformer architecture. It supports the search for potential drug candidates by taking compound structures and protein sequences as input. The model relies on a self-attention mechanism, and its authors use label reversal experiments to check that predictions reflect learned interaction features rather than bias in the data.
Bioinformaticians, computational chemists, and drug discovery researchers working on compound-protein interaction prediction and virtual screening.
Developers choose TransformerCPI because it provides state-of-the-art sequence-based interaction prediction without extensive feature engineering, offers pre-trained models for immediate use, and includes robust validation through label reversal experiments.
TransformerCPI: Improving compound–protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments (Bioinformatics, 2020). https://doi.org/10.1093/bioinformatics/btaa524
Uses transformer architecture to model long-range dependencies in sequences, as demonstrated in the model diagram and paper, enabling capture of complex biochemical patterns without manual feature engineering.
Evaluates on a label reversal test set, an approach highlighted in the README: a compound that appears with only one interaction label during training appears only with the opposite label at test time, which checks that the model is not simply memorizing compound-level bias.
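The label reversal idea can be illustrated as a data split in which a compound seen with one label during training is tested only with the opposite label. The following is a minimal sketch under stated assumptions, not the authors' procedure: the function name `label_reversal_split`, the `(compound, protein, label)` tuple layout, and the choice to keep single-label compounds in the training set are all hypothetical.

```python
import random
from collections import defaultdict


def label_reversal_split(pairs, seed=0):
    """Split (compound, protein, label) records so that each compound's
    test records carry the opposite label to its training records.

    Hypothetical helper for illustration only; labels are assumed to be 0 or 1.
    """
    by_compound = defaultdict(list)
    for record in pairs:
        by_compound[record[0]].append(record)

    rng = random.Random(seed)
    train, test = [], []
    for compound in sorted(by_compound):
        records = by_compound[compound]
        labels = {label for _, _, label in records}
        if len(labels) < 2:
            # Assumption: a compound with a single label cannot be reversed,
            # so all of its records stay in the training set.
            train.extend(records)
            continue
        # Pick which label this compound shows during training;
        # records with the other label go to the test set.
        train_label = rng.choice([0, 1])
        for record in records:
            (train if record[2] == train_label else test).append(record)
    return train, test


if __name__ == "__main__":
    toy_pairs = [
        ("CCO", "MKTAYIA", 1),
        ("CCO", "GAVLIPF", 0),
        ("c1ccccc1", "MKTAYIA", 0),
        ("c1ccccc1", "GAVLIPF", 1),
    ]
    train_set, test_set = label_reversal_split(toy_pairs)
    print(len(train_set), "training records,", len(test_set), "test records")
```

A model that scores well on such a split cannot be relying on the compound alone, because the compound's training label is the opposite of what it must predict at test time.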
Provides trained models and data sets with train/test splits in the 'data' directory, allowing researchers to start predictions immediately without collecting or preprocessing data from scratch.
Works directly with molecular SMILES strings and protein amino acid sequences, reducing dependency on extensive feature engineering and aligning with modern deep learning trends in bioinformatics.
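To make the sequence-only input concrete, the sketch below turns a SMILES string and an amino acid sequence into token lists that a sequence model could embed. It is an illustration under assumptions, not the repository's preprocessing (which depends on RDKit): the helper names `tokenize_smiles` and `protein_ngrams`, the simplified regular expression, and the choice of overlapping 3-grams are all hypothetical.

```python
import re

# Simplified SMILES token pattern (assumption: covers bracket atoms, Br/Cl,
# two-digit ring closures, single-letter atoms, digits, and common bond or
# branch symbols; it does not validate chemistry).
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\.@:~*$]")


def tokenize_smiles(smiles):
    """Split a SMILES string into atom-level and symbol-level tokens."""
    return SMILES_TOKEN.findall(smiles)


def protein_ngrams(sequence, n=3):
    """Split an amino acid sequence into overlapping n-grams."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def build_vocab(token_lists):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


if __name__ == "__main__":
    compound_tokens = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
    protein_tokens = protein_ngrams("MKTAYIAKQR")
    vocab = build_vocab([compound_tokens, protein_tokens])
    print(compound_tokens)
    print(protein_tokens)
    print(len(vocab), "distinct tokens")
```

No hand-crafted descriptors are computed here; the only inputs are the two raw strings, which is the property the listing highlights.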
Requires Python 3.6 and RDKit 2019.03.3.0, which are outdated and may conflict with modern environments or other libraries, complicating setup and maintenance.
README lacks detailed tutorials, API references, or troubleshooting guides, offering only basic setup and usage notes, which hinders adoption for users unfamiliar with the codebase.
Data sets are provided as .7z files, requiring additional tools for extraction and lacking clear instructions for integrating custom datasets, adding unnecessary complexity.
Focused on research with no built-in deployment tools, web interfaces, or APIs, making it unsuitable for integration into production drug discovery pipelines without significant extra work.