A comprehensive benchmark suite for evaluating protein fitness prediction models using deep mutational scanning and clinical variant data.
ProteinGym is a large-scale benchmark suite for evaluating protein fitness prediction models. It provides curated datasets from deep mutational scanning experiments and annotated human clinical variants to enable standardized comparisons of computational methods that predict how mutations affect protein function. The project addresses the need for rigorous, reproducible evaluation in protein engineering and variant interpretation.
Computational biologists, bioinformaticians, and machine learning researchers developing or applying models for protein fitness prediction, variant effect analysis, and protein design.
Developers choose ProteinGym because it combines diverse, community-maintained datasets with standardized metrics and a leaderboard of state-of-the-art baselines, so that models for protein fitness prediction can be compared on equal terms.
Official repository for the ProteinGym benchmarks
Includes ~2.7M missense variants across 217 DMS substitution assays, plus ~300k indel mutants, providing a large and diverse dataset for evaluation, as detailed in the Overview.
Reports Spearman correlation, AUC, MCC, NDCG, and Top-K recall in both zero-shot and supervised settings, so that models are assessed across different regimes, as specified in the Results section.
Hosts an interactive website with performance rankings and detailed files, enabling easy comparison and reproducibility, highlighted in the Key Features and Results.
Openly accepts new assays and baselines through GitHub issues and PRs with clear criteria, fostering collaborative development as described in the 'How to contribute?' section.
Requires downloading multiple large files (e.g., 17.8 GB for clinical MSAs), configuring paths in scripts, and running command-line tools, which can be daunting for new users, as outlined in the Usage and reproducibility section.
Only supports open-source models that can score all mutants in benchmarks, excluding proprietary methods and potentially narrowing the benchmark's scope, as stated in the 'New baselines' criteria.
Supervised model training code is housed in a separate repository (ProteinNPT) and not fully integrated into this project, noted in the 'Notes' section under contributions.
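To make the metrics listed above more concrete, here is a minimal sketch of two of them, Spearman correlation and Top-K recall, computed between measured DMS fitness values and model scores. It is not ProteinGym's own scoring code: the function names, the toy values, and the choice to define "top K" as a fraction of the assay are assumptions made for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(measured, predicted):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(measured), average_ranks(predicted)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def top_k_recall(measured, predicted, fraction=0.1):
    """Share of the truly fittest mutants that the model also ranks on top.

    Assumption: K is taken as a fraction of the assay size (at least 1).
    """
    k = max(1, int(len(measured) * fraction))
    idx = range(len(measured))
    top_true = set(sorted(idx, key=lambda i: measured[i], reverse=True)[:k])
    top_pred = set(sorted(idx, key=lambda i: predicted[i], reverse=True)[:k])
    return len(top_true & top_pred) / k


# Toy data (hypothetical): five mutants with measured fitness and model scores.
measured = [0.1, 0.9, 0.4, 0.7, 0.2]
predicted = [-3.0, -0.5, -1.0, -2.0, -2.5]

print(spearman(measured, predicted))
print(top_k_recall(measured, predicted, fraction=0.4))
```

Spearman rewards getting the overall ordering of mutants right, while Top-K recall only asks whether the best variants are found, which is often the more relevant question in protein design.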