Frequently Asked Questions

How do I reduce the computational time for protein editing in ChatDrug?

Pass the --fast_protein flag when running the command, as described in the usage section. It enables an optimized mode that accelerates retrieval and evaluation for protein tasks.
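As a minimal sketch of how the flag might be passed, the snippet below assembles a command line for main_ChatDrug.py with --fast_protein appended. Only the script name and the flag come from this page; the `build_command` helper and its `extra_args` parameter are hypothetical, and any other arguments the script requires are not shown here and should be taken from the project's usage instructions.

```python
import sys
from typing import List, Optional, Sequence


def build_command(
    script: str = "main_ChatDrug.py",
    fast_protein: bool = False,
    extra_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Assemble an argv list for a ChatDrug entry-point script.

    `extra_args` stands in for whatever task-specific arguments the
    script needs; they are not documented here and are left to the caller.
    """
    command = [sys.executable, script]
    command.extend(extra_args or [])
    if fast_protein:
        # Flag named in the FAQ answer; enables the faster protein mode.
        command.append("--fast_protein")
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_command(fast_protein=True)))
```

The resulting list can be handed to `subprocess.run` once the remaining arguments have been filled in.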

What's the difference between ChatDrug and in-context learning for drug editing?

ChatDrug uses retrieval-augmented generation and domain feedback for more accurate edits, while in-context learning relies solely on few-shot prompting without external data. The framework includes both, via the separate scripts main_ChatDrug.py and main_InContext.py, so the two approaches can be compared.

Can I use ChatDrug without an OpenAI API key?

No. ChatDrug requires an OpenAI API key for its conversational LLMs, and the usage instructions have you provide it in ChatDrug/task_and_evaluation/Conversational_LLMs_utils.py. This rules out fully offline or cost-free use.
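One way to avoid committing a key to a repository is to read it from an environment variable and assign it wherever the utility file expects it. This is a sketch under assumptions: the variable name OPENAI_API_KEY and the `load_api_key` helper are illustrative choices, not something the ChatDrug usage instructions specify.

```python
import os
from typing import Mapping, Optional


def load_api_key(
    env: Optional[Mapping[str, str]] = None,
    var_name: str = "OPENAI_API_KEY",  # assumed variable name, not from ChatDrug
) -> str:
    """Return the API key from the environment, or fail with a clear message."""
    source = os.environ if env is None else env
    key = source.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; ChatDrug needs an OpenAI API key to run."
        )
    return key


if __name__ == "__main__":
    try:
        load_api_key()
        # Report success without echoing the secret itself.
        print("API key loaded.")
    except RuntimeError as err:
        print(err)
```

Failing early with a clear message is a design choice here: it surfaces a missing key before any retrieval or evaluation work starts.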

How accurate is ChatDrug compared to traditional computational chemistry methods?

Accuracy is task-dependent. The ICLR 2024 paper reports improvements over baselines, but ChatDrug is designed for iterative, human-in-the-loop design rather than fully automated high-throughput screening, so real-world performance may vary.

Is there a way to extend ChatDrug to support other molecular types like RNA?

ChatDrug currently supports only small molecules, peptides, and proteins. Extending it to other types would require modifying the retrieval database, the evaluation modules, and potentially the LLM prompts, and this process is not documented.

ChatDrug

Python

A conversational AI framework for editing small molecules, peptides, and proteins using retrieval-augmented generation and domain feedback.

What is ChatDrug?

ChatDrug is a research framework that uses large language models (LLMs) for conversational editing of drug molecules, including small molecules, peptides, and proteins. It combines retrieval-augmented generation with domain-specific feedback to iteratively refine molecular structures based on natural language instructions and biochemical property evaluations.
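The iterative refinement described above can be pictured as a propose, evaluate, retrieve loop. The sketch below is a toy illustration under stated assumptions, not the authors' implementation: `conversational_edit` and its callable parameters are hypothetical names, the "LLM" is a stub function, and strings with a character count stand in for molecules and a biochemical property.

```python
from typing import Callable, List, Optional, Tuple


def conversational_edit(
    start: str,
    propose: Callable[[str, Optional[str]], str],
    satisfies: Callable[[str, str], bool],
    retrieve: Callable[[str], Optional[str]],
    max_rounds: int = 2,
) -> Tuple[Optional[str], List[str]]:
    """Run propose -> evaluate -> retrieve-hint rounds.

    Returns the first candidate that passes evaluation (or None),
    plus every candidate tried, in order.
    """
    history: List[str] = []
    hint: Optional[str] = None
    for _ in range(max_rounds + 1):  # one initial attempt plus feedback rounds
        candidate = propose(start, hint)
        history.append(candidate)
        if satisfies(start, candidate):
            return candidate, history
        # Domain feedback: look up a known example to steer the next attempt.
        hint = retrieve(candidate)
    return None, history


if __name__ == "__main__":
    database = ["CCN", "CCOC", "CCOCO"]
    start = "CCO"

    def toy_propose(seq: str, hint: Optional[str]) -> str:
        # Stand-in for an LLM call: the first guess misses, later guesses copy the hint.
        return hint if hint is not None else seq + "C"

    def toy_satisfies(original: str, candidate: str) -> bool:
        # Stand-in for a property evaluator: more "O" characters counts as success.
        return candidate.count("O") > original.count("O")

    def toy_retrieve(failed: str) -> Optional[str]:
        # Return the first database entry that passes the evaluator.
        return next((d for d in database if toy_satisfies(start, d)), None)

    print(conversational_edit(start, toy_propose, toy_satisfies, toy_retrieve))
```

Passing the proposer, evaluator, and retriever in as callables keeps the loop independent of any particular LLM or property tool.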

Target Audience

Computational chemists, bioinformaticians, and AI researchers working on drug discovery and protein engineering who need interactive tools for molecular design and optimization.

Value Proposition

ChatDrug integrates conversational AI with domain-aware feedback mechanisms, which makes drug editing more intuitive and guided than traditional computational methods or standalone LLMs that lack biochemical grounding.

Overview

LLM for Drug Editing, ICLR 2024

Use Cases

Best For

Iteratively optimizing small molecule structures for improved binding affinity
Designing therapeutic peptides with specific MHC binding properties
Engineering protein sequences for enhanced stability or function
Prototyping drug candidates through natural language interaction
Comparing retrieval-augmented generation against in-context learning for molecular editing
Academic research on AI-assisted drug discovery pipelines

Not Ideal For

Teams needing high-throughput, automated molecular screening without iterative human input
Organizations without access to biochemical domain experts to interpret and act on feedback mechanisms
Budget-constrained projects that cannot afford ongoing costs from OpenAI API usage
Applications requiring real-time molecular editing due to computational delays in retrieval and evaluation

Pros & Cons

Pros

Retrieval-Enhanced Accuracy

A retrieval-augmented generation module pulls examples from curated datasets to augment LLM responses, which improves the relevance of molecular edits.
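To make the retrieval idea concrete, here is a minimal sketch that picks, from a small knowledge base, the entry most similar to a query among those that pass a property check. Everything in it is an assumption for illustration: the function names are hypothetical, and character-bigram Jaccard similarity is only a stand-in, since the similarity measure ChatDrug actually uses is not described here.

```python
from typing import Callable, Iterable, Optional, Set


def bigrams(text: str) -> Set[str]:
    """Set of adjacent character pairs in `text`."""
    return {text[i : i + 2] for i in range(len(text) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two strings' bigram sets (1.0 if both are empty)."""
    set_a, set_b = bigrams(a), bigrams(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def retrieve_similar(
    query: str,
    knowledge_base: Iterable[str],
    passes: Callable[[str], bool],
) -> Optional[str]:
    """Return the passing entry most similar to `query`, or None if none pass."""
    candidates = [entry for entry in knowledge_base if passes(entry)]
    if not candidates:
        return None
    return max(candidates, key=lambda entry: jaccard(query, entry))


if __name__ == "__main__":
    knowledge_base = ["CCOCO", "NNOO", "CCC"]
    # Toy property check: at least two "O" characters.
    print(retrieve_similar("CCOC", knowledge_base, lambda s: s.count("O") >= 2))
```

Filtering by the property first and ranking by similarity second means the retrieved example is always a valid target, which is what makes it useful as a hint.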

Biochemical Property Guidance

Incorporates domain feedback such as binding affinity and solubility evaluations to direct edits toward desired therapeutic profiles; it integrates MHCFlurry for peptides and ProteinDT for proteins.

Unified Multi-Type Support

Handles small molecules, peptides, and proteins in a single framework, so users can move between drug editing tasks without switching tools; each molecule type has its own supported tasks and evaluation metrics.

Optimized Protein Processing

Offers a fast mode for protein editing: the --fast_protein flag accelerates the retrieval and evaluation steps to reduce computational overhead.

Cons

Complex Initial Setup

Requires extensive environment configuration with conda, multiple pip installs, and manual downloads of datasets and models from Hugging Face, making deployment non-trivial for new users.

External API Dependency

Relies on the OpenAI API for conversational LLMs, which introduces costs and potential downtime risks; the usage instructions require setting an API key in the utility file.

Limited Evaluation Scope

Evaluation is tied to specific tools like RDKit and MHCFlurry, which may not cover all biochemical properties needed for comprehensive drug design, restricting flexibility in assessment.

Related Projects

BioGPT

BioGPT is a generative pre-trained transformer model specifically designed for biomedical text generation and mining. It leverages large-scale biomedical literature to understand and generate domain-specific text, enabling advanced natural language processing applications in healthcare and life sciences.

## Key Features

- **Biomedical Pre-training**: Trained on PubMed abstracts and articles for domain-specific language understanding.
- **Text Generation**: Generates coherent biomedical text, such as research summaries or hypothesis descriptions.
- **Relation Extraction**: Identifies relationships between biomedical entities like drug-target interactions.
- **Question Answering**: Answers biomedical questions based on contextual knowledge from literature.
- **Document Classification**: Classifies biomedical documents into relevant categories.
- **Hugging Face Integration**: Available through the transformers library for easy deployment and experimentation.

## Philosophy

BioGPT focuses on bridging the gap between general-purpose language models and domain-specific needs by providing a model that understands the nuances and terminology of biomedical literature.

Stars: 4,489 · Forks: 481 · Last commit: 2 years ago

ClawBio

🦖 ClawBio - The first bioinformatics-native AI agent skill library. Local-first. Reproducible. Open. Free.

Code and data for GeneGPT.

Stars: 428 · Forks: 34 · Last commit: 1 year ago

GenePT

GenePT is a foundation model for single-cell biology that leverages ChatGPT embeddings of NCBI gene descriptions to perform gene-level and cell-level tasks. It offers an efficient alternative to traditional models that require extensive data curation and resource-intensive training from gene expression profiles.

## Key Features

- **Gene Embeddings**: Uses GPT-3.5 embeddings of NCBI gene summary texts to represent genes.
- **Cell Embeddings**: Generates single-cell embeddings by averaging gene embeddings weighted by expression or creating sentence embeddings from ordered gene names.
- **Efficient Approach**: Eliminates the need for dataset curation and additional pre-training, making it user-friendly.
- **Competitive Performance**: Achieves comparable or superior performance to existing single-cell foundation models in tasks like gene property classification and cell type annotation.
- **Pre-computed Data**: Provides readily available datasets including extracted NCBI gene summaries and pre-computed OpenAI embeddings.

## Philosophy

GenePT demonstrates that using large language model embeddings of scientific literature is a straightforward and effective approach for developing biological foundation models, complementing traditional expression-based methods.

Stars: 321 · Forks: 47 · Last commit: 2 years ago

#large-language-models

#drug-discovery

#chatgpt

#retrieval-augmented-generation

#ai-research

#conversation

#bioinformatics

#computational-chemistry
