Open-Awesome

CONCH

License: NOASSERTION · Language: Python

A vision-language foundation model for computational pathology, pretrained on 1.17M histopathology image-caption pairs for diverse AI tasks.

512 stars · 52 forks · 0 contributors

What is CONCH?

CONCH is a vision-language foundation model for computational pathology that learns from histopathology images paired with biomedical text captions. It addresses label scarcity in medical AI by supporting zero-shot and few-shot transfer to tasks such as classification, segmentation, and retrieval, with little or no task-specific training. The model is pretrained on 1.17 million image-caption pairs, described as the largest dataset of its kind in histopathology.

Target Audience

Researchers and developers in computational pathology, medical AI, and digital pathology who need a versatile foundation model for building and evaluating diagnostic tools, slide analysis systems, or multimodal pathology workflows.

Value Proposition

CONCH reports state-of-the-art performance across a wider range of pathology tasks than vision-only models, handles non-H&E stains effectively, and reduces the risk of benchmark data contamination through the curation of its pretraining data.

Overview

Vision-Language Pathology Foundation Model - Nature Medicine

Use Cases

Best For

  • Zero-shot classification of histopathology image tiles or whole slide images
  • Cross-modal retrieval between pathology images and biomedical text
  • Building multimodal AI pipelines for computational pathology research
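  • Researching pathology image captioning or report generation (the public weights omit the multimodal decoder, so full captioning is not available from the release alone)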
  • Feature extraction for weakly-supervised learning on histopathology data
  • Benchmarking new pathology AI models against a strong foundation model baseline
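
For the whole-slide use case in the list above, tile-level zero-shot scores have to be combined into one slide-level prediction. The sketch below shows one common option, top-K pooling, where each class score is the mean of its K highest tile scores. This is an illustrative assumption, not a method confirmed by this page; the function name `topk_pool_slide_scores` and the toy scores are hypothetical.

```python
import numpy as np


def topk_pool_slide_scores(tile_scores: np.ndarray, k: int = 5) -> np.ndarray:
    """Aggregate per-tile class scores (n_tiles, n_classes) into slide-level scores.

    For each class, average the k highest tile scores. This is a hypothetical
    helper for illustration, not part of any CONCH API.
    """
    k = min(k, tile_scores.shape[0])          # guard against slides with few tiles
    top = np.sort(tile_scores, axis=0)[-k:]   # k largest scores per class column
    return top.mean(axis=0)


# Toy tile-level similarity scores for 4 tiles and 2 classes (made-up values).
tile_scores = np.array([
    [0.10, 0.20],
    [0.90, 0.10],
    [0.80, 0.30],
    [0.20, 0.25],
])

slide_scores = topk_pool_slide_scores(tile_scores, k=2)
pred = int(slide_scores.argmax())
print(slide_scores, pred)
```

A small K makes the slide prediction depend on the most confident tiles, while a large K approaches plain mean pooling; the right value is dataset-dependent.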

Not Ideal For

  • Commercial applications requiring model monetization or integration into paid products
  • Real-time clinical diagnosis systems needing ultra-low latency inference on edge devices
  • Projects exclusively using non-histopathology medical imaging (e.g., radiology, dermatology)
  • Tasks demanding full captioning or report generation, as decoder weights are excluded from the public release

Pros & Cons

Pros

Multimodal Vision-Language Integration

Processes both histopathology images and biomedical text, enabling cross-modal tasks such as retrieval and captioning. This capability comes from pretraining on 1.17M image-caption pairs.
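
Cross-modal retrieval with a model of this kind usually reduces to a nearest-neighbor search in a shared embedding space. The following is a minimal sketch under that assumption: random vectors stand in for real image and text embeddings, and `top_k_matches` is a hypothetical helper, not a CONCH function.

```python
import numpy as np


def top_k_matches(query_emb: np.ndarray, gallery_emb: np.ndarray, k: int = 3):
    """Return the k gallery items most similar to the query by cosine similarity.

    query_emb has shape (d,), gallery_emb has shape (n, d).
    Hypothetical helper for illustration only.
    """
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    scores = g @ q                       # cosine similarity per gallery item
    order = np.argsort(-scores)[:k]      # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]


# Stand-in embeddings: in practice the gallery would hold image embeddings
# and the query would be a text embedding (or the other way around).
rng = np.random.default_rng(1)
gallery = rng.normal(size=(5, 64))
query = gallery[3] + 0.05 * rng.normal(size=64)   # query close to item 3

results = top_k_matches(query, gallery, k=2)
print(results)
```

The same function covers text-to-image and image-to-text retrieval, since only the roles of query and gallery change.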

Broad Zero-Shot Transfer

Achieves state-of-the-art performance on 14 diverse benchmarks including classification and segmentation, reducing reliance on extensive labeled data.
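
Zero-shot classification in contrastive vision-language models is typically done by comparing an image embedding with the embeddings of one text prompt per class. The sketch below illustrates that idea with stand-in vectors. The prompts, the temperature of 0.07, and the function names are assumptions for illustration, not values taken from CONCH.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def zero_shot_classify(image_emb: np.ndarray, text_emb: np.ndarray,
                       temperature: float = 0.07):
    """Return (predicted class index, class probabilities) for each image.

    image_emb: (n_images, d); text_emb: (n_classes, d), one prompt per class.
    The temperature is an assumed value, not one confirmed for CONCH.
    """
    logits = l2_normalize(image_emb) @ l2_normalize(text_emb).T / temperature
    logits -= logits.max(axis=1, keepdims=True)   # stabilize the softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs


# Hypothetical class prompts; real text embeddings would come from a text encoder.
class_prompts = ["an H&E image of normal tissue", "an H&E image of tumor tissue"]

rng = np.random.default_rng(0)
text_emb = rng.normal(size=(len(class_prompts), 512))
# Stand-in image embeddings placed near the prompts for classes 1, 0, 1.
image_emb = text_emb[[1, 0, 1]] + 0.1 * rng.normal(size=(3, 512))

pred, probs = zero_shot_classify(image_emb, text_emb)
labels = [class_prompts[i] for i in pred]
print(pred, labels)
```

No labeled training images are involved: changing the task only requires changing the list of prompts.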

Non-H&E Stain Compatibility

Produces strong representations for IHC and special stains, unlike models trained only on H&E images, according to the README.

Minimal Data Contamination Risk

Pretrained without using large public slide collections like TCGA, making it safer for benchmarking on public or private datasets without leakage concerns.

Cons

Restricted Commercial Use

Licensed under CC-BY-NC-ND, prohibiting commercial use without prior approval, which limits industry adoption and practical deployment.

Incomplete Model Weights

The publicly released weights exclude the multimodal decoder, which limits captioning as noted in the README. The vision and text encoders are included intact.

Complex Setup Process

Requires a Hugging Face access token, a manual weight download, and environment setup, adding overhead compared to plug-and-play models.
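
The token and download steps can be scripted. The sketch below assumes the token is stored in the `HF_TOKEN` environment variable and uses `hf_hub_download` from the `huggingface_hub` package. The helper names are hypothetical, and the repository id and file name must be taken from the project's README, since this page does not state them.

```python
import os


def resolve_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read a Hugging Face access token from the environment, failing loudly if absent."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {env_var} to a Hugging Face access token with access to the model."
        )
    return token


def download_weights(repo_id: str, filename: str, local_dir: str = "checkpoints") -> str:
    """Download one file from a gated Hugging Face repository and return its local path.

    repo_id and filename are supplied by the caller; this sketch does not
    assume what the CONCH repository or checkpoint file is called.
    """
    # Imported here so the token helper works even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        token=resolve_hf_token(),
        local_dir=local_dir,
    )
```

Calling `download_weights(...)` with the repository id and checkpoint name from the README would place the file under `checkpoints/`; access to the gated repository has to be requested on Hugging Face first.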

Benchmark Performance Variability

While SOTA on many tasks, it underperforms UNI on some benchmarks such as EBRAINS-C, indicating task-specific strengths and weaknesses.

Quick Stats

Stars: 512
Forks: 52
Contributors: 0
Open Issues: 14
Last commit: 1 year ago
Created: 2023

Tags

#biomedical-nlp #bioimage-analysis #zero-shot-learning #medical-ai #image-retrieval #computational-pathology #nlp-machine-learning #histopathology #medical-imaging #multimodal-ai #bioimage-informatics #digital-pathology #health-informatics #foundation-model #pathology

Built With

  • timm
  • Hugging Face Transformers
  • PyTorch

Included in

Computational Biology (122)

Related Projects

totalVI

Deep probabilistic analysis of single-cell and spatial omics data

Stars: 1,652
Forks: 466
Last commit: 1 day ago

scGPT

scGPT is a foundation model designed for single-cell multi-omics data analysis using generative AI. It leverages transformer architecture pretrained on millions of single-cell profiles to enable a wide range of downstream biological tasks, advancing computational biology by providing a powerful, unified model for cellular data.

Key Features

  • Pretrained Model Zoo: Offers multiple organ-specific and whole-human models trained on millions of cells for various applications.
  • Zero-Shot Applications: Supports tasks like cell embedding and reference mapping without task-specific training.
  • Reference Mapping: Enables fast similarity search across millions of cells using efficient indexing with faiss.
  • Multi-Task Fine-Tuning: Can be adapted for scRNA-seq integration, cell type annotation, perturbation prediction, and GRN inference.
  • Online Tools: Provides accessible web applications for reference mapping, cell annotation, and GRN inference via cloud GPUs.

Philosophy

scGPT aims to build a foundational AI model for single-cell biology, democratizing access to advanced computational methods and accelerating discoveries in multi-omics research through open-source collaboration.

Stars: 1,592
Forks: 335
Last commit: 2 months ago

UNI

Pathology Foundation Model - Nature Medicine

Stars: 752
Forks: 87
Last commit: 1 year ago

GigaPath

Prov-GigaPath: A whole-slide foundation model for digital pathology from real-world data

Stars: 621
Forks: 104
Last commit: 1 year ago