A high-performance, scalable LLM library and reference implementation written in pure Python/JAX for training on TPUs and GPUs.
MaxText is a high-performance, scalable library and reference implementation for training large language models (LLMs) written in pure Python/JAX. It provides a collection of state-of-the-art models and efficient training pipelines for both pre-training from scratch and post-training techniques like supervised fine-tuning and reinforcement learning. The library is designed to achieve maximum hardware utilization on TPUs and GPUs while maintaining a simple, optimization-free codebase.
AI researchers and engineers who need to train or fine-tune large language models at scale, particularly those using Google Cloud TPUs or high-performance GPU clusters. It's also suitable for teams building custom LLMs for production or research who want a performant, open-source foundation.
Developers choose MaxText for its exceptional out-of-the-box performance and scalability, driven by JAX's XLA compiler optimizations. Unlike many LLM frameworks, it achieves high Model FLOPs Utilization without manual low-level tuning, while providing a comprehensive model library and support for both pre-training and advanced post-training techniques.
A simple, performant, and scalable JAX LLM library!
Leverages JAX and XLA to achieve high Model FLOPs Utilization without manual low-level optimizations, as emphasized in the README's 'optimization-free' design philosophy.
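As a rough illustration of this design philosophy (a toy sketch, not MaxText code), a whole training step can be written in plain JAX and handed to `jax.jit`, letting XLA fuse and optimize the computation with no hand-written kernels:

```python
import jax
import jax.numpy as jnp

# Toy linear model: XLA compiles the entire step (forward, backward,
# and update) from this plain Python function.
def loss_fn(w, x, y):
    pred = x @ w
    return jnp.mean((pred - y) ** 2)

@jax.jit
def train_step(w, x, y, lr=0.1):
    grads = jax.grad(loss_fn)(w, x, y)  # autodiff through the loss
    return w - lr * grads               # plain SGD update

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (8, 3))
true_w = jnp.array([1.0, -2.0, 0.5])
y = x @ true_w

w = jnp.zeros((3,))
for _ in range(500):
    w = train_step(w, x, y)  # recovers true_w on this noiseless data
```

The same pattern scales up: because the step is a pure function, XLA is free to fuse operations and overlap communication, which is where the "optimization-free" performance comes from.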
Includes a comprehensive collection of models like Gemma, Llama, DeepSeek, and Qwen, supporting both dense and MoE architectures for versatile training options.
Efficiently scales pre-training and post-training across thousands of TPU/GPU chips, with documented support for SFT, GRPO, and GSPO on multi-host setups.
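The scaling model behind this is JAX's sharding API: declare a device mesh once, annotate how arrays are partitioned, and the same model code runs on one chip or thousands. A minimal data-parallel sketch (hypothetical, not MaxText's actual sharding config):

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D mesh over whatever accelerators are available; on a large
# TPU slice this would span thousands of chips with no code changes.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

# Shard the batch dimension across the "data" mesh axis.
batch = jnp.arange(16.0).reshape(8, 2)
sharded = jax.device_put(batch, NamedSharding(mesh, P("data", None)))

@jax.jit
def forward(x):
    # jit compiles a partitioned program; each device computes its shard.
    return jnp.tanh(x).sum(axis=1)

out = forward(sharded)  # shape (8,), computed across all devices
```

MaxText layers tensor- and expert-parallel axes on top of the same mechanism, but the principle is identical: parallelism is expressed as sharding annotations rather than rewritten model code.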
Extends beyond text to support vision-language models such as Gemma 3/4 and Llama 4 VLMs, enabling advanced AI applications as noted in the latest news.
Primarily optimized for Google Cloud TPUs, so performance and setup are less straightforward on other hardware, despite the decoupled-mode efforts mentioned in the README.
Requires familiarity with JAX, Flax, Orbax, and Tunix, which can be a barrier for teams accustomed to PyTorch or TensorFlow ecosystems.
The codebase is under rapid development, with recent restructuring highlighted in the README and news archive, so users may face instability and need to adapt frequently to breaking changes.