How do I run ProteinGym benchmarks on my own model?

Download the datasets from the Resources section, update the paths in the config script, and run the provided performance scripts, such as performance_substitutions.sh. The 'Usage and reproducibility' section gives the detailed steps, including how to merge scores and calculate metrics.
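
To make the "merge scores, then calculate metrics" step concrete, here is a minimal sketch on toy data. It is not the repository's own script: the column names (mutant, DMS_score, model_score) and the in-memory CSV text are assumptions chosen for illustration, and only the Python standard library is used.

```python
import csv
import io

def rank(values):
    """Return 1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5  # undefined if either input is constant

# Toy stand-ins for an assay file and a model score file (hypothetical columns).
ASSAY_CSV = """mutant,DMS_score
A1C,0.10
A1D,0.80
G2S,0.45
L3P,-1.20
"""
MODEL_CSV = """mutant,model_score
A1C,-2.0
A1D,-0.5
G2S,-1.0
L3P,-6.0
"""

def merge_scores(assay_text, model_text):
    """Inner-join the two tables on the mutant identifier."""
    model = {
        row["mutant"]: float(row["model_score"])
        for row in csv.DictReader(io.StringIO(model_text))
    }
    merged = []
    for row in csv.DictReader(io.StringIO(assay_text)):
        if row["mutant"] in model:
            merged.append((row["mutant"], float(row["DMS_score"]), model[row["mutant"]]))
    return merged

merged = merge_scores(ASSAY_CSV, MODEL_CSV)
rho = spearman([m[1] for m in merged], [m[2] for m in merged])
print(f"{len(merged)} mutants, Spearman = {rho:.3f}")
```

In this toy example the model ranks the four mutants in the same order as the assay, so the correlation is 1. For real runs, the scripts shipped with the benchmark remain the reference.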

What protein fitness models are included in the ProteinGym leaderboard?

The leaderboard includes over 30 state-of-the-art models, such as ESM, EVE, Tranception, and ProteinMPNN, covering MSA-based, single-sequence, and structure-based approaches. The Results section gives the full list, with references and input modalities for each model.

ProteinGym vs ProteinNet: which benchmark is better for mutation effect prediction?

ProteinGym focuses on fitness prediction from deep mutational scanning and clinical variants, while ProteinNet targets protein structure prediction. The two benchmarks address different tasks, so for mutation effect prediction ProteinGym is the more direct fit.

How can I add a new assay to ProteinGym?

Raise a GitHub issue with the 'new_assay' label. Per the criteria in the 'How to contribute?' section, the dataset must be public, protein-related, and relevant to fitness prediction, with a sufficient number of measurements and a high dynamic range.

Does ProteinGym support evaluation of indel mutations?

Yes. It includes a dedicated indel benchmark with ~300k mutants across 74 DMS assays, evaluated with metrics similar to those used for substitutions. The data and scores are available for download, as described in the Overview and Resources sections.

What are the main performance metrics used in ProteinGym?

For zero-shot DMS benchmarks, it uses Spearman, NDCG, AUC, MCC, and Top-K recall; for supervised settings, Spearman and MSE; and for clinical benchmarks, AUC. Aggregation methods are detailed in the Results section to avoid biases.
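
As a rough illustration of two of these metrics and of bias-aware aggregation, the sketch below implements MCC, Top-K recall, and a hierarchical mean in plain Python. The 20% cutoff, the toy values, the category and protein labels, and the choice to average first within each protein and then within each functional category are all assumptions for illustration; the Results section of the repository defines the actual procedure.

```python
from collections import defaultdict

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for two lists of 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return (tp * tn - fp * fn) / denom if denom else 0.0

def top_k_recall(measured, predicted, fraction=0.1):
    """Share of the truly top-`fraction` mutants that the model also ranks in its top-`fraction`."""
    k = max(1, int(len(measured) * fraction))
    idx = range(len(measured))
    top_true = set(sorted(idx, key=lambda i: measured[i], reverse=True)[:k])
    top_pred = set(sorted(idx, key=lambda i: predicted[i], reverse=True)[:k])
    return len(top_true & top_pred) / k

def hierarchical_mean(per_assay):
    """Average assay scores within each protein, then within each category, then overall."""
    by_protein = defaultdict(list)
    for category, protein, score in per_assay:
        by_protein[(category, protein)].append(score)
    by_category = defaultdict(list)
    for (category, _protein), scores in by_protein.items():
        by_category[category].append(sum(scores) / len(scores))
    category_means = [sum(v) / len(v) for v in by_category.values()]
    return sum(category_means) / len(category_means)

# Toy measured fitness values and model scores for ten mutants.
measured = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4, 0.5, 0.0]
predicted = [5.0, 1.0, 4.0, 0.5, 4.5, 2.0, 3.0, 1.5, 2.5, 0.1]

# Hypothetical per-assay results: (functional category, protein, metric value).
per_assay = [
    ("Activity", "P1", 0.4),
    ("Activity", "P1", 0.6),
    ("Activity", "P2", 0.2),
    ("Stability", "P3", 0.8),
]

print("MCC:", round(mcc([1, 1, 0, 0], [1, 0, 0, 0]), 3))
print("Top-20% recall:", top_k_recall(measured, predicted, fraction=0.2))
print("Hierarchical mean:", round(hierarchical_mean(per_assay), 3))
```

A flat mean over the four toy assays would be 0.5, while the hierarchical mean is 0.575: the protein with two assays no longer counts twice, and the category with a single assay is not drowned out by the larger one.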

Open-Awesome

ProteinGym

MIT · HTML · PG_v1.3

A comprehensive benchmark suite for evaluating protein fitness prediction models using deep mutational scanning and clinical variant data.


442 stars · 58 forks · 0 contributors

What is ProteinGym?

ProteinGym is a large-scale benchmark suite for evaluating protein fitness prediction models. It provides curated datasets from deep mutational scanning experiments and annotated human clinical variants to enable standardized comparisons of computational methods that predict how mutations affect protein function. The project addresses the need for rigorous, reproducible evaluation in protein engineering and variant interpretation.

Target Audience

Computational biologists, bioinformaticians, and machine learning researchers developing or applying models for protein fitness prediction, variant effect analysis, and protein design.

Value Proposition

Developers choose ProteinGym because it offers a comprehensive, community-maintained benchmark with diverse datasets, standardized metrics, and an extensive leaderboard of state-of-the-art baselines, enabling fair model comparisons and accelerating research in protein fitness prediction.

Overview

Official repository for the ProteinGym benchmarks

Use Cases

Best For

Benchmarking new protein fitness prediction models against established baselines
Evaluating model performance on specific mutation types like substitutions or indels
Assessing predictive accuracy across different protein families and functional categories
Reproducing published results in protein variant effect prediction
Curating standardized datasets for training supervised protein fitness models
Comparing MSA-based, single-sequence, and structure-based prediction approaches

Not Ideal For

Teams developing proprietary protein design tools that cannot share model scores publicly
Researchers with limited computational resources or storage for multi-gigabyte datasets
Projects requiring real-time mutation effect prediction in production environments

Pros & Cons

Pros

Extensive Benchmark Coverage

Includes ~2.7M missense variants across 217 DMS assays for substitutions and ~300k mutants for indels, providing a diverse and large-scale dataset for evaluation as detailed in the Overview.

Comprehensive Performance Metrics

Uses Spearman, NDCG, AUC, MCC, and Top-K recall for zero-shot DMS benchmarks, Spearman and MSE for supervised settings, and AUC for clinical benchmarks, giving a thorough assessment across regimes, as specified in the Results section.

Public Leaderboard Transparency

Hosts an interactive website with performance rankings and detailed files, enabling easy comparison and reproducibility, highlighted in the Key Features and Results.

Community-Driven Contributions

Openly accepts new assays and baselines through GitHub issues and PRs with clear criteria, fostering collaborative development as described in the 'How to contribute?' section.

Cons

Complex Initial Setup

Requires downloading multiple large files (e.g., 17.8GB for clinical MSAs), configuring paths in scripts, and running command-line tools, which can be daunting for new users, as outlined in the Usage and reproducibility section.

Limited Model Inclusion

Only supports open-source models that can score all mutants in benchmarks, excluding proprietary methods and potentially narrowing the benchmark's scope, as stated in the 'New baselines' criteria.

Incomplete Code Integration

Supervised model training code is housed in a separate repository (ProteinNPT) and not fully integrated into this project, noted in the 'Notes' section under contributions.

Related Projects

MOSES

Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models

Stars: 979 · Forks: 280 · Last commit: 2 years ago

TAPE (Tasks Assessing Protein Embeddings)

Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology.

Stars: 740 · Forks: 135 · Last commit: 3 years ago

GuacaMol

Benchmarks for generative chemistry

Stars: 525 · Forks: 99 · Last commit: 2 years ago

scIB (Single-cell Integration Benchmarks)

Benchmarking analysis of data integration tools

Stars: 423 · Forks: 76 · Last commit: 2 months ago

