A JAX library for rapid prototyping of large-scale attention-based vision models across images, video, audio, and multimodal data.
Scenic is a JAX library focused on research and development of large-scale, attention-based models for computer vision and beyond. It streamlines rapid prototyping of vision models by providing shared, optimized libraries for common tasks and by hosting complete project implementations, and it is widely used for building models that handle images, video, audio, and multimodal data.
Researchers and engineers in computer vision and multimodal AI who need a flexible, high-performance codebase for developing and experimenting with large-scale attention-based models using JAX.
Developers choose Scenic for its clean design philosophy favoring simplicity, its comprehensive set of optimized libraries for large-scale training, and its extensive collection of state-of-the-art model implementations and baselines, all built on the efficient JAX/Flax stack.
Scenic: A Jax Library for Computer Vision Research and Beyond
Provides scalable input pipelines and training loops designed for multi-host setups, efficiently handling data division, caching, and prefetching as outlined in the dataset_lib and train_lib modules.
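The data-division and prefetching pattern described above can be sketched in plain Python. This is a minimal illustration of the general technique (per-host sharding plus background prefetching), not Scenic's actual `dataset_lib` API; `prefetch` is a hypothetical helper:

```python
import queue
import threading

def prefetch(iterator, size=2):
    """Fetch up to `size` elements ahead of time on a background thread,
    so the training step never waits on input I/O.
    (Hypothetical helper illustrating the pattern, not Scenic's API.)"""
    q = queue.Queue(maxsize=size)
    sentinel = object()

    def producer():
        for item in iterator:
            q.put(item)
        q.put(sentinel)  # signal end of stream

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item

# Example: divide a toy dataset across 2 hosts, then prefetch one shard.
data = list(range(8))
host_id, host_count = 0, 2
shard = data[host_id::host_count]   # host 0 sees [0, 2, 4, 6]
print(list(prefetch(iter(shard))))  # [0, 2, 4, 6]
```

In a real multi-host JAX setup the shard indices would come from `jax.process_index()` and `jax.process_count()`, and prefetching would also move batches onto device ahead of the current step.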
Hosts fully fleshed-out projects for SOTA models like ViT, DETR, and CLIP, offering reproducible baselines for easy experimentation and benchmarking, as detailed in the projects directory.
Supports model development across images, video, audio, and their combinations, enabling advanced multimodal research with shared libraries, evidenced by projects like PolyViT and AVATAR.
Emphasizes forking and copy-pasting over unnecessary abstraction, making code straightforward to understand and modify for rapid prototyping, as per the project's stated philosophy.
The forking-centric approach can lead to maintenance challenges and duplicated efforts across projects, since functionality is only upstreamed to shared libraries after proving widely useful.
Built entirely on JAX and Flax, locking users into this ecosystem, which has a steeper learning curve and less mature tooling compared to PyTorch for some developers.
Lacks polish for production deployment, with documentation and features centered on experimental use rather than enterprise integration, such as model serving or detailed deployment guides.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models spanning text, vision, audio, and multimodal tasks, for both inference and training.
Trax — Deep Learning with Clear Code and Speed
Flax is a neural network library for JAX that is designed for flexibility.