How do I use GenePT for cell type annotation?

GenePT provides example notebooks for cell-level tasks, such as using pre-computed embeddings to annotate cells in datasets like hPancreas or Myeloid, with steps detailed in the repository's tutorial section.
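As a rough illustration of what annotation from pre-computed embeddings can look like, the sketch below labels query cells by majority vote among their most cosine-similar reference cells. It is not the repository's notebook code: the function name `knn_annotate`, the value of `k`, and the toy data are assumptions made for this example.

```python
import numpy as np
from collections import Counter

def knn_annotate(ref_emb, ref_labels, query_emb, k=10):
    """Label each query cell by majority vote among its k most
    cosine-similar reference cells (hypothetical helper)."""
    # Normalise rows so that a dot product equals cosine similarity.
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    qry = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    sims = qry @ ref.T                         # shape: (n_query, n_ref)
    top_k = np.argsort(-sims, axis=1)[:, :k]   # indices of the k nearest references
    return [Counter(ref_labels[i] for i in row).most_common(1)[0][0]
            for row in top_k]

# Toy data standing in for real cell embeddings (assumption, not GenePT output).
rng = np.random.default_rng(0)
centers = {"alpha": np.array([1.0, 0.0, 0.0]), "beta": np.array([0.0, 1.0, 0.0])}
ref_labels = ["alpha"] * 20 + ["beta"] * 20
ref_emb = np.vstack([centers[label] + 0.05 * rng.normal(size=3) for label in ref_labels])
query_emb = np.vstack([centers["alpha"], centers["beta"]])

predicted = knn_annotate(ref_emb, ref_labels, query_emb, k=5)
print(predicted)
```

In practice the reference and query matrices would be cell embeddings for an annotated and an unannotated dataset; the classifier itself needs no training.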

GenePT vs scGPT: which is better for single-cell analysis?

Neither is better across the board. GenePT is cheaper to set up because it reuses text embeddings and needs no pre-training on expression data. scGPT is trained on gene expression data and may do better on expression-specific tasks. The choice depends on whether you prioritize speed or expression-based accuracy.

Does GenePT require internet access to work?

Only when generating new embeddings. GenePT relies on the OpenAI API to embed text, so internet access is essential for live computations, as noted in the gene_embeddings_examples.ipynb tutorial. Working from the pre-computed embeddings does not require calling the API.

How do I compute GenePT embeddings for custom genes?

Use the request_ncbi_text_for_genes.ipynb notebook to extract NCBI gene summaries, then the gene_embeddings_examples.ipynb notebook to generate embeddings through the OpenAI API. This requires an API key and may incur costs.
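The embedding step can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: `openai_embed`, `embed_gene_summaries`, `fake_embed`, the batch size, and the placeholder summary strings are all assumptions for this example. The only real library call is `client.embeddings.create` from the `openai` Python package (version 1.x), which needs an `OPENAI_API_KEY` and network access. The demo at the bottom swaps in a stand-in embedder so it runs offline.

```python
def openai_embed(texts, model="text-embedding-ada-002"):
    """Embed a list of strings via the OpenAI embeddings endpoint.
    Requires the `openai` package, an API key, and network access."""
    from openai import OpenAI  # imported lazily so the offline demo below still runs
    client = OpenAI()          # reads OPENAI_API_KEY from the environment
    response = client.embeddings.create(model=model, input=texts)
    # Sort by index so the vectors line up with the input order.
    return [item.embedding for item in sorted(response.data, key=lambda d: d.index)]

def embed_gene_summaries(summaries, embed_fn, batch_size=100):
    """Map {gene_symbol: summary_text} to {gene_symbol: embedding},
    calling embed_fn on batches of summary texts (hypothetical helper)."""
    genes = list(summaries)
    embeddings = {}
    for start in range(0, len(genes), batch_size):
        batch = genes[start:start + batch_size]
        vectors = embed_fn([summaries[g] for g in batch])
        embeddings.update(zip(batch, vectors))
    return embeddings

def fake_embed(texts):
    """Stand-in embedder for offline testing: [length, number of spaces]."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]

# Placeholder texts, not real NCBI summaries.
demo = embed_gene_summaries(
    {"TP53": "tumor protein p53", "INS": "insulin"}, fake_embed, batch_size=1
)
print(demo)
```

To embed for real, pass `openai_embed` in place of `fake_embed`. Keeping the embedder as a parameter makes it easy to test the pipeline before spending API credits.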

What are the costs of using GenePT with OpenAI API?

Costs vary based on the embedding model (e.g., text-embedding-ada-002 or text-embedding-3-large) and the number of genes, with pricing details available on OpenAI's website, as referenced in the README.
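A back-of-the-envelope estimate can be sketched as below. The figures are assumptions: the price passed in is a placeholder rather than an actual OpenAI rate, and the four-characters-per-token ratio is only a rough heuristic for English text. Check the current pricing page and, for exact counts, a real tokenizer.

```python
def estimate_embedding_cost(summaries, usd_per_million_tokens, chars_per_token=4.0):
    """Return (estimated_tokens, estimated_usd) for embedding the given texts.
    chars_per_token is a rough heuristic, not an exact tokenizer count."""
    total_chars = sum(len(text) for text in summaries)
    est_tokens = total_chars / chars_per_token
    est_usd = est_tokens * usd_per_million_tokens / 1_000_000
    return est_tokens, est_usd

# 1,000 hypothetical gene summaries of 400 characters each.
summaries = ["x" * 400] * 1000
# 0.10 USD per million tokens is a placeholder, not a quoted price.
tokens, cost = estimate_embedding_cost(summaries, usd_per_million_tokens=0.10)
print(f"{tokens:.0f} tokens, {cost:.4f} USD")
```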

Can GenePT handle batch effect removal in large datasets?

Yes. The aorta_data_analysis.ipynb notebook shows how to use GenePT embeddings for batch effect removal, though performance may depend on dataset size and the quality of the gene summaries.

GenePT — Single-Cell Foundation Model

What is GenePT?

GenePT is a foundation model for single-cell biology that uses ChatGPT embeddings of NCBI gene descriptions to address gene-level and cell-level tasks. It provides an efficient alternative to traditional models by leveraging existing literature knowledge without requiring extensive data curation or additional pre-training. The model generates embeddings for genes and cells that can be used for classification, annotation, and analysis tasks.

Target Audience

GenePT is aimed at bioinformaticians, computational biologists, and researchers working with single-cell RNA-seq data who need efficient tools for gene and cell analysis. It is particularly useful for those looking to apply large language model embeddings in biological contexts.

Value Proposition

GenePT offers a simple yet effective approach that avoids the resource-intensive training and data curation required by other single-cell foundation models. It achieves competitive performance using pre-existing knowledge from scientific literature, making it accessible and efficient for a wide range of downstream biological tasks.

Overview

GenePT is a foundation model for single-cell biology that leverages ChatGPT embeddings of NCBI gene descriptions to perform gene-level and cell-level tasks. It offers an efficient alternative to traditional models that require extensive data curation and resource-intensive training from gene expression profiles.

Key Features

Gene Embeddings — Uses GPT-3.5 embeddings of NCBI gene summary texts to represent genes.
Cell Embeddings — Generates single-cell embeddings by averaging gene embeddings weighted by expression or creating sentence embeddings from ordered gene names.
Efficient Approach — Eliminates the need for dataset curation and additional pre-training, making it user-friendly.
Competitive Performance — Achieves comparable or superior performance to existing single-cell foundation models in tasks like gene property classification and cell type annotation.
Pre-computed Data — Provides readily available datasets including extracted NCBI gene summaries and pre-computed OpenAI embeddings.
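The two cell-embedding strategies listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the unit-norm scaling of the averaged embedding, the choice to drop zero-count genes from the sentence, and the toy matrices are all made up for the example.

```python
import numpy as np

def weighted_cell_embeddings(expr, gene_emb):
    """Expression-weighted average of gene embeddings per cell.
    expr: (n_cells, n_genes) non-negative expression values.
    gene_emb: (n_genes, dim) gene embedding matrix."""
    # Turn each cell's expression into weights that sum to 1.
    weights = expr / np.clip(expr.sum(axis=1, keepdims=True), 1e-12, None)
    cell_emb = weights @ gene_emb
    # Scale to unit length (an assumption made here for cosine comparisons).
    norms = np.linalg.norm(cell_emb, axis=1, keepdims=True)
    return cell_emb / np.clip(norms, 1e-12, None)

def cell_sentence(expr_row, gene_names):
    """Gene names ordered by descending expression, skipping zero counts.
    The resulting string would then be passed to a text embedding model."""
    order = np.argsort(-expr_row, kind="stable")
    return " ".join(gene_names[i] for i in order if expr_row[i] > 0)

# Toy inputs (hypothetical values, not real embeddings or counts).
gene_names = ["INS", "GCG", "SST"]
gene_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
expr = np.array([[3.0, 0.0, 1.0], [0.0, 2.0, 0.0]])

cell_emb = weighted_cell_embeddings(expr, gene_emb)
sentences = [cell_sentence(row, gene_names) for row in expr]
print(cell_emb)
print(sentences)
```

The weighted average needs only the gene embedding matrix and runs offline, whereas the sentence variant requires one embedding call per cell.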

Philosophy

GenePT demonstrates that using large language model embeddings of scientific literature is a straightforward and effective approach for developing biological foundation models, complementing traditional expression-based methods.
