A JAX-based framework for training large language models with a focus on legibility, scalability, and reproducibility.
Levanter is a framework for training large language models and other foundation models. It tackles the problem of complex, hard-to-follow training code by building on JAX and named tensors (via Haliax), while aiming for deterministic results and efficient scaling across hardware.
Levanter is aimed at researchers and engineers who train large language models and need reproducible, scalable, and maintainable training pipelines, especially those working with JAX and distributed systems.
Developers choose Levanter for its combination of human-readable code via Haliax, performance competitive with commercial frameworks, and strong reproducibility guarantees, which suits it to both rigorous research and production training.
Legible, Scalable, Reproducible Foundation Models with Named Tensors and JAX
Uses the Haliax named-tensor library for composable, easy-to-follow model code without giving up performance, which makes complex training code easier to read and debug.
Produces reproducible results on TPUs even after preemption and resumption, which matters for rigorous scientific research and reliable experiments.
Supports TPUs and GPUs with FSDP and tensor parallelism, rivaling commercial frameworks like MosaicML Composer in performance for large models.
Imports and exports models to and from the Hugging Face ecosystem via SafeTensors and works with Hugging Face tokenizers and datasets, so existing Hugging Face assets can be used in JAX-based pipelines.
Includes the Sophia optimizer, which can train up to 2x faster than Adam.
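To make the reproducibility-after-resumption claim above more concrete, here is a minimal sketch of one general technique that makes it possible: deriving each step's randomness from the seed and the step number alone, rather than from a generator whose state depends on how many steps have already run. This is not Levanter's implementation. The function names (`batch_indices`, `run`) are hypothetical, and the sketch uses only the Python standard library; in JAX the analogous idea is to derive a per-step key with `jax.random.fold_in`.

```python
import random


def batch_indices(seed, step, dataset_size, batch_size):
    """Return the example indices used at one training step.

    The generator is rebuilt from (seed, step) on every call, so the result
    depends only on those two integers, never on which steps ran earlier.
    """
    # String seeds are hashed deterministically by random.Random,
    # independent of PYTHONHASHSEED.
    rng = random.Random(f"{seed}:{step}")
    return rng.sample(range(dataset_size), batch_size)


def run(seed, start_step, end_step, dataset_size=1000, batch_size=4):
    """Collect the batches drawn for steps start_step .. end_step - 1."""
    return [
        batch_indices(seed, step, dataset_size, batch_size)
        for step in range(start_step, end_step)
    ]


# One uninterrupted run of 10 steps.
uninterrupted = run(seed=0, start_step=0, end_step=10)

# The same run, "preempted" after step 5 and resumed from step 6.
resumed = run(seed=0, start_step=0, end_step=6) + run(seed=0, start_step=6, end_step=10)

# Both runs saw exactly the same batches in the same order.
assert resumed == uninterrupted
```

Because nothing is carried over between steps except the step counter, a checkpoint only needs to record that counter (alongside model and optimizer state) for the data order to match after a restart.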
GPU support is documented as 'in-progress,' which suggests it is less stable and less optimized than TPU support and may make setup harder.
Only supports a fixed list of models (e.g., GPT-2, Llama, Gemma), limiting flexibility for custom or novel architectures outside the predefined set.
Requires a JAX installation matched to the target hardware, plus hardware-specific configuration, which can be daunting for new users and slow down initial deployment.
Relies heavily on JAX and Haliax, which makes it less accessible to teams used to PyTorch or other frameworks and can tie a codebase to that stack.
Levanter is an open-source alternative to the following products:
MosaicML Composer is an open-source library for efficient deep learning training, providing methods to speed up model training and reduce computational costs.
Google MaxText is a JAX-based language model implementation designed for training and inference at scale, optimized for TPUs and GPUs.