A performant JAX reimplementation of the UniRep model for generating protein sequence representations.
jax-unirep is a Python library that reimplements UniRep, a deep learning model that produces numerical representations (embeddings) of protein sequences. It computes these representations efficiently for downstream tasks in protein engineering and analysis, using JAX for accelerated performance.
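To make "numerical representation" concrete, here is a minimal sketch of what featurization means: protein sequences of different lengths are mapped to vectors of one fixed length, which a downstream model can then consume. The `toy_embed` function and the example sequences are hypothetical stand-ins written for illustration. They are not the UniRep model and not part of jax-unirep's API, and the averaged one-hot encoding only loosely mirrors how a sequence model's per-position states can be pooled into a single vector.

```python
# Toy stand-in for a learned featurizer.
# Assumption: this is NOT UniRep and NOT jax-unirep's API, only an illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_embed(sequence: str) -> list[float]:
    """Map a protein sequence of any length to a fixed-length vector.

    Each residue is one-hot encoded and the encodings are averaged over
    all positions, so the output length never depends on sequence length.
    """
    if not sequence:
        raise ValueError("sequence must be non-empty")
    counts = [0] * len(AMINO_ACIDS)
    for residue in sequence:
        # str.index raises ValueError for residues outside the alphabet
        counts[AMINO_ACIDS.index(residue)] += 1
    return [count / len(sequence) for count in counts]


# Made-up example sequences of different lengths
sequences = ["MKTAYIAKQR", "GSHMASDFGHKLPQ"]
features = [toy_embed(seq) for seq in sequences]

for seq, vec in zip(sequences, features):
    print(len(seq), "residues ->", len(vec), "features")
```

A learned model such as UniRep replaces the hand-written encoding with representations trained on large sequence collections, but the shape of the workflow is the same: sequences in, one fixed-length vector per sequence out.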
Computational biologists, bioinformaticians, and researchers in protein engineering who need high-performance tools for protein sequence featurization and machine learning workflows.
Developers choose jax-unirep for its performance gains from JAX, its self-contained and easily customizable design, and its utility APIs tailored for protein engineering, offering a modern alternative to the original implementation.
Reimplementation of the UniRep protein featurization model.
Uses JAX for GPU-accelerated computation, giving faster inference and training than the original implementation, according to the project description.
Includes all model components with no dependency on the original codebase, making it easy to integrate into custom research workflows, as stated in the README.
Offers utility APIs tailored to protein engineering pipelines, supporting streamlined and extensible workflows, according to the documentation.
Encourages contributions with clear guidelines and documents how to extend the library, as noted in the contributing section.
Model weights are under a non-commercial CC BY-NC 4.0 license, limiting use in commercial projects without additional permissions, which could hinder adoption.
Requires a specific compute environment with modern Linux or macOS and GLIBC>=2.23 for JAX, as mentioned in the installation notes, creating barriers for some users.
Documentation is available but relies on external links and preprints, potentially lacking comprehensive tutorials for those new to protein representation learning.