A JAX-based machine learning framework for configuring and training large-scale models with high efficiency on TPUs and GPUs.
Paxml (also known as Pax) is a JAX-based machine learning framework specifically designed for configuring and training large-scale models like transformer-based language models. It solves the problem of efficiently running complex ML experiments with advanced parallelization and high hardware utilization, particularly on TPU and GPU clusters.
ML researchers and engineers working on training large language models or other massive neural networks who need a flexible, high-performance framework for distributed training on TPU/GPU hardware.
Developers choose Paxml for its industry-leading Model FLOPs Utilization rates on TPU hardware, fully configurable experiment setup using Python dataclasses, and robust support for multi-slice distributed training—making it particularly suitable for cutting-edge large-scale model research and production.
Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimentation and parallelization, and has demonstrated industry leading model flop utilization rates.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.
Achieves state-of-the-art Model FLOPs Utilization on TPU v4 Pods for large language models, as shown in benchmark graphs scaling to trillions of parameters with weak scaling patterns.
Uses Python dataclass-based HParams for fully configurable experiments, enabling clean separation between setup and implementation with Fiddle integration for shared parameters and eager error checking.
Supports multi-slice distributed training across TPU pods with pjit and pmap parallelism, detailed in multislice setup guides and example runs for scaling to massive models.
Provides BaseInput abstractions compatible with SeqIO and Lingvo, automatically handling sharding and padding for eval data, as explained in the inputs documentation.
Requires extensive gcloud commands for TPU VM creation, multi-step installation with dependency management issues, and manual configuration for multislice environments, making onboarding tedious.
Heavily optimized for Google Cloud TPUs, with GPU support fragmented through a separate NVIDIA repository, limiting portability and increasing reliance on specific cloud infrastructure.
Assumes deep familiarity with JAX, distributed systems, and advanced ML concepts; documentation is spread across multiple resources and notebooks, lacking beginner-friendly guides.