A collection of transformer protein language models for predicting protein structure and function and for designing proteins from sequence.
ESM (Evolutionary Scale Modeling) is a collection of pretrained transformer language models specifically designed for protein sequences. It enables researchers to predict protein 3D structures, analyze variant effects, and design novel proteins using AI, directly from amino acid sequences without requiring multiple sequence alignments or experimental data.
Computational biologists, bioinformaticians, and protein engineers who need state-of-the-art AI tools for protein structure prediction, function analysis, and de novo protein design.
ESM provides open-source protein language models that outperform the other single-sequence language models tested, offer end-to-end structure prediction comparable to AlphaFold2, and enable novel protein design, all while being freely available and self-hostable.
Evolutionary Scale Modeling (esm): Pretrained language models for proteins
ESM-2 models outperform all tested single-sequence protein language models on structure prediction tasks, according to the repository's comparison table, which reports top-L precision.
Includes ESMFold for end-to-end structure prediction, ESM-1v for zero-shot variant effects, and ESM-IF1 for inverse folding, covering diverse protein engineering needs in one repository.
Provides pretrained models free of charge, with integrations such as HuggingFace and ColabFold and a command-line interface for bulk processing, democratizing access to advanced tools.
Offers an open repository of hundreds of millions of predicted protein structures and embeddings, enabling exploratory research at scale without recomputation.
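The top-L precision metric mentioned above can be illustrated with a short sketch. It assumes the usual definition: rank residue pairs by predicted contact score, keep the L highest-scoring pairs (L being the sequence length), and report the fraction that are true contacts. The function name `top_l_precision`, its arguments, and the default separation cutoff of 24 are illustrative assumptions, not part of the ESM codebase.

```python
def top_l_precision(contact_probs, true_contacts, min_separation=24):
    """Fraction of the L highest-scoring residue pairs that are true contacts.

    contact_probs: L x L nested list of predicted contact scores.
    true_contacts: set of (i, j) index pairs, with i < j, that are in contact.
    min_separation: only pairs with j - i >= this value are ranked.
        (24 is assumed here as a "long-range" cutoff; adjust as needed.)
    """
    length = len(contact_probs)
    # Collect every candidate pair (i, j) that is far enough apart in sequence
    candidates = [
        (contact_probs[i][j], i, j)
        for i in range(length)
        for j in range(i + min_separation, length)
    ]
    # Highest predicted score first, then keep at most L pairs
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = candidates[:length]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if (i, j) in true_contacts)
    # Divides by the number of pairs actually ranked, which is L unless
    # the sequence is too short to supply L candidate pairs
    return hits / len(top)
```

With a real model, `contact_probs` would come from the model's predicted contact map and `true_contacts` from an experimental structure; both are supplied by the caller in this sketch.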
Running large models like ESM-2 15B requires substantial GPU memory; the README notes that CPU offloading and chunking may be needed to handle long sequences, both of which can slow inference.
Setting up ESMFold involves installing OpenFold, which has specific requirements such as nvcc and Python <= 3.9; this can lead to setup errors and compatibility issues.
The models are not optimized for low-latency predictions; command-line tools and API calls may be slow for high-throughput applications without dedicated infrastructure.
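One common way to soften the memory and throughput limits above when processing many sequences is to batch by a token budget rather than by a fixed number of sequences. The sketch below illustrates that general idea and is not taken from the ESM codebase; the function name `batch_by_token_budget`, the 4096-token default, and the two-token allowance for special tokens are all assumptions.

```python
def batch_by_token_budget(sequences, max_tokens=4096, extra_tokens=2):
    """Group sequence indices so each padded batch stays within max_tokens.

    sequences: list of amino-acid strings.
    max_tokens: budget for (batch size x longest sequence in the batch).
    extra_tokens: per-sequence allowance for special tokens
        (assumed: one start token and one end token).
    Returns a list of batches, each a list of indices into `sequences`.
    """
    # Sort by length so similar lengths share a batch and padding stays small
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    batches, current, longest = [], [], 0
    for idx in order:
        size = len(sequences[idx]) + extra_tokens
        new_longest = max(longest, size)
        # Padded cost if this sequence joined the current batch
        if current and new_longest * (len(current) + 1) > max_tokens:
            batches.append(current)
            current, longest = [], 0
            new_longest = size
        current.append(idx)
        longest = new_longest
    if current:
        batches.append(current)
    return batches
```

A sequence that exceeds the budget on its own still gets a batch to itself, so very long inputs would need separate handling, such as the chunking or CPU offloading mentioned earlier.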