Open-Awesome


TransformerCPI

Apache-2.0 · Python

A deep learning model using transformer architecture to predict compound-protein interactions from molecular and protein sequences.

GitHub · 156 stars · 39 forks · 0 contributors

What is TransformerCPI?

TransformerCPI is a deep learning model that predicts interactions between chemical compounds and proteins from their sequences, using a transformer architecture. It targets a core step in drug discovery, identifying candidate compounds, by analyzing compound structures together with protein sequences. The model relies on self-attention, and the accompanying paper introduces label reversal experiments, a stricter evaluation that checks whether a model learns genuine interaction features rather than memorizing compound-specific label patterns.

Target Audience

Bioinformaticians, computational chemists, and drug discovery researchers working on compound-protein interaction prediction and virtual screening.

Value Proposition

Developers choose TransformerCPI because it predicts interactions directly from sequences without extensive feature engineering, ships trained models for immediate use, and is validated with label reversal experiments, a more rigorous test than a standard random split.

Overview

TransformerCPI: Improving compound–protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments (Bioinformatics, 2020). https://doi.org/10.1093/bioinformatics/btaa524

Use Cases

Best For

  • Predicting potential drug-target interactions for virtual screening
  • Analyzing molecular mechanisms of compound-protein binding
  • Building deep learning pipelines for cheminformatics applications
  • Research on transformer architectures for biological sequence data
  • Developing computational tools for drug discovery workflows
  • Studying protein-ligand interaction patterns using sequence information

Not Ideal For

  • Teams needing real-time predictions for high-throughput virtual screening
  • Researchers without GPU access or computational resources for training
  • Projects requiring highly interpretable models for regulatory or mechanistic insights
  • Developers seeking plug-and-play solutions with comprehensive documentation and APIs

Pros & Cons

Pros

Advanced Self-Attention Mechanism

Uses a transformer architecture to model long-range dependencies in sequences, as illustrated by the project's model diagram and described in the paper, capturing complex biochemical patterns without manual feature engineering.
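For readers new to the mechanism, here is textbook scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, in plain Python. It is a generic illustration, not code from the TransformerCPI repository (which builds its layers in PyTorch); the helper names and toy matrices are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Return softmax(Q K^T / sqrt(d_k)) V for row-major list-of-lists matrices.

    Generic textbook attention (hypothetical helper), not TransformerCPI's code.
    """
    d_k = len(k[0])
    output = []
    for q_row in q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k)
                  for k_row in k]
        weights = softmax(scores)
        # Each output row is a weighted average of the value rows.
        output.append([sum(w * v_row[j] for w, v_row in zip(weights, v))
                       for j in range(len(v[0]))])
    return output


# Two identical keys receive equal weight, so the output is the mean of the values.
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]],
                                   [[2.0, 0.0], [4.0, 2.0]]))  # prints [[3.0, 1.0]]
```

In self-attention, Q, K, and V are learned linear projections of the same input sequence, which is what lets every position attend to every other position regardless of how far apart they are.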

Robust Validation via Label Reversal

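Evaluates the model with label reversal experiments, highlighted in the README: each compound in the test set appears in the training set only with the opposite interaction label, so a model that merely memorizes compound-level patterns fails. This makes the reported performance a more reliable indicator of genuine interaction learning.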
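The idea behind a label reversal split can be sketched in a few lines. The helper below is a hypothetical illustration, not the authors' dataset-construction code: assuming interaction records of the form (compound, protein, label) with 0/1 labels, it sends each compound that has both positive and negative records to the test set with one label and keeps only its opposite-label records for training. Compounds seen with a single label stay in training.

```python
import random
from collections import defaultdict
from typing import List, Tuple

Pair = Tuple[str, str, int]  # (compound, protein, label); hypothetical record layout


def label_reversal_split(pairs: List[Pair], seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Split records so every test compound appears in training only with the opposite label.

    Hypothetical helper for illustration; not the authors' exact procedure.
    """
    rng = random.Random(seed)
    by_compound = defaultdict(list)
    for record in pairs:
        by_compound[record[0]].append(record)

    train, test = [], []
    for compound in sorted(by_compound):
        rows = by_compound[compound]
        labels = {label for _, _, label in rows}
        if labels == {0, 1}:
            # Hold out one class for testing and train only on the other,
            # so the test label is "reversed" relative to training.
            test_label = rng.choice([0, 1])
            test.extend(r for r in rows if r[2] == test_label)
            train.extend(r for r in rows if r[2] != test_label)
        else:
            # Single-class compounds cannot be reversed; keep them for training.
            train.extend(rows)
    return train, test


pairs = [("CCO", "P1", 1), ("CCO", "P2", 0), ("c1ccccc1", "P1", 1)]
train, test = label_reversal_split(pairs)
print(len(train), len(test))  # 2 1
```

A model that simply learns "this compound is active" from the training set will score poorly on such a test set, which is exactly the failure mode label reversal is designed to expose.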

Pre-trained Models and Curated Datasets

Provides trained models and datasets with train/test splits in the 'data' directory, so researchers can start predicting immediately without collecting or preprocessing data from scratch.

Sequence-Based Prediction Focus

Works directly with molecular SMILES strings and protein amino acid sequences, reducing dependency on extensive feature engineering and aligning with modern deep learning trends in bioinformatics.
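On the protein side, sequence-based models commonly split the amino acid string into overlapping k-mers and embed them like words; the Gensim dependency listed for this project is consistent with that pattern. The function below is a minimal sketch assuming 3-mers; the name `protein_to_words` and the default k are illustrative, not taken from the repository. Parsing the compound's SMILES string (for example with RDKit) is a separate step not shown here.

```python
from typing import List


def protein_to_words(sequence: str, k: int = 3) -> List[str]:
    """Split an amino acid sequence into overlapping k-mers ("words").

    Hypothetical helper: k=3 is an assumed default, not confirmed from the repo.
    """
    sequence = sequence.strip().upper()
    if len(sequence) < k:
        # Too short for a full k-mer: keep the whole sequence as a single token.
        return [sequence] if sequence else []
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(protein_to_words("MKTAYIA"))  # ['MKT', 'KTA', 'TAY', 'AYI', 'YIA']
```

The resulting token lists can then be passed to an embedding model such as Gensim's `Word2Vec`, which learns one vector per k-mer.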

Cons

Outdated and Specific Dependencies

Requires Python 3.6 and RDKit 2019.03.3.0, which are outdated and may conflict with modern environments or other libraries, complicating setup and maintenance.
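Because these versions are old, a quick pre-flight check can save debugging time. The script below is a hypothetical helper, not part of TransformerCPI: it compares the running interpreter and the installed RDKit (read from `rdkit.__version__`) against Python 3.6 and RDKit 2019.03.3.0, and reports any mismatch. It avoids syntax newer than Python 3.6 so it runs in the target environment itself.

```python
import sys
from typing import List, Optional, Tuple

# Versions listed as requirements for TransformerCPI.
EXPECTED_PYTHON = (3, 6)
EXPECTED_RDKIT = "2019.03.3"  # conda build string: 2019.03.3.0


def installed_rdkit_version() -> Optional[str]:
    """Return the installed RDKit version, or None if RDKit cannot be imported."""
    try:
        import rdkit
    except ImportError:
        return None
    return rdkit.__version__


def environment_warnings(python_version: Tuple[int, ...],
                         rdkit_version: Optional[str]) -> List[str]:
    """List mismatches between an environment and the expected versions."""
    issues = []
    if tuple(python_version[:2]) != EXPECTED_PYTHON:
        issues.append("Python {}.{} found; expected {}.{}".format(
            python_version[0], python_version[1], *EXPECTED_PYTHON))
    if rdkit_version is None:
        issues.append("RDKit is not installed")
    elif not rdkit_version.startswith(EXPECTED_RDKIT):
        issues.append("RDKit {} found; expected {}".format(
            rdkit_version, EXPECTED_RDKIT))
    return issues


if __name__ == "__main__":
    problems = environment_warnings(sys.version_info[:2], installed_rdkit_version())
    for problem in problems:
        print("warning:", problem)
    if not problems:
        print("environment matches the expected versions")
```

Running the project inside an isolated environment, such as a dedicated conda environment created with Python 3.6, keeps these old pins from conflicting with other projects.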

Sparse and Minimal Documentation

The README lacks detailed tutorials, API references, and troubleshooting guides, offering only basic setup and usage notes, which hinders adoption by users unfamiliar with the codebase.

Cumbersome Data Handling

Datasets ship as .7z archives, which need an extra tool such as 7-Zip to extract, and the repository gives no clear instructions for integrating custom datasets. Both add avoidable setup friction.
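When bringing your own data, it helps to validate records before they reach the model. The parser below assumes a plain-text format with one interaction per line: a SMILES string, a protein sequence, and a 0/1 label separated by whitespace. That layout, the function name `parse_cpi_lines`, and the file name in the commented usage line are assumptions for illustration, so compare them with the extracted files in the 'data' directory before relying on this.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str, int]  # (smiles, protein_sequence, label)


def parse_cpi_lines(lines: Iterable[str]) -> List[Record]:
    """Parse assumed 'SMILES SEQUENCE LABEL' lines into validated records.

    Hypothetical format: three whitespace-separated fields with a 0/1 label last.
    Blank lines are skipped; malformed lines raise ValueError with a line number.
    """
    records = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError("line {}: expected 3 fields, got {}".format(line_no, len(fields)))
        smiles, sequence, label = fields
        if label not in ("0", "1"):
            raise ValueError("line {}: label must be 0 or 1, got {!r}".format(line_no, label))
        # Keep SMILES case as-is (lowercase marks aromatic atoms);
        # normalize the protein to upper-case one-letter residue codes.
        records.append((smiles, sequence.upper(), int(label)))
    return records


sample = ["CCO MKTAYIA 1\n", "\n", "c1ccccc1 mktay 0\n"]
print(parse_cpi_lines(sample))  # [('CCO', 'MKTAYIA', 1), ('c1ccccc1', 'MKTAY', 0)]
# Hypothetical file name:
# with open("my_custom_dataset.txt") as handle:
#     records = parse_cpi_lines(handle)
```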

No Production-Ready Features

Focused on research with no built-in deployment tools, web interfaces, or APIs, making it unsuitable for integration into production drug discovery pipelines without significant extra work.


Quick Stats

Stars: 156
Forks: 39
Contributors: 0
Open issues: 0
Last commit: 3 years ago
Created: 2020

Tags

#transformer #cheminformatics #deep-learning #computational-biology #drug-discovery #sequence-modeling #bioinformatics #pytorch

Built With

  • RDKit
  • pandas
  • Gensim
  • Python
  • NumPy
  • PyTorch

Included in

Computational Biology (122)