A blazing-fast, lightweight deep learning inference engine from Alibaba, optimized for on-device LLMs and Edge AI.
MNN is a deep learning inference engine developed by Alibaba, optimized for high-performance on-device AI. It enables efficient execution of neural networks on mobile phones, embedded devices, and PCs, supporting popular model formats and architectures. The project also includes MNN-LLM and MNN-Diffusion for local deployment of large language models and stable diffusion models.
Mobile and embedded developers, AI engineers, and researchers who need to deploy and run machine learning models efficiently on edge devices with limited resources.
Developers choose MNN for its battle-tested performance within Alibaba's ecosystem, its lightweight footprint, and comprehensive support for modern AI workloads—including LLMs and diffusion models—directly on-device without cloud dependency.
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI.
The core library is only ~800 KB on Android and adds ~2 MB to a linked iOS executable, enabling deployment in size-constrained mobile and embedded environments.
Supports TensorFlow, Caffe, ONNX, and TorchScript model formats, with coverage for 178 TensorFlow ops and 163 TorchScript ops, so most common models can be converted without modification.
Provides cross-platform GPU inference via Metal on iOS, OpenCL/Vulkan on Android, and CUDA on NVIDIA GPUs, backed by hand-optimized assembly kernels for ARM and x86-64 CPUs.
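As a sketch of how a backend is chosen, MNN's C++ API selects the compute device at session creation via `ScheduleConfig`; the snippet below assumes the MNN SDK headers are available and uses `model.mnn` as a placeholder path (it is not a standalone program):

```cpp
#include <memory>
#include <MNN/Interpreter.hpp>  // requires the MNN SDK (assumption: installed locally)

int main() {
    // Load a converted model (placeholder path).
    std::shared_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("model.mnn"));

    // Request the OpenCL backend; Metal, Vulkan, and CUDA use the same
    // mechanism via their MNN_FORWARD_* enum values. MNN falls back to
    // backupType when the requested backend is unavailable on the device.
    MNN::ScheduleConfig config;
    config.type       = MNN_FORWARD_OPENCL;
    config.backupType = MNN_FORWARD_CPU;

    auto* session = net->createSession(config);
    net->runSession(session);
    return 0;
}
```

The CPU fallback in `backupType` is what lets one binary run across devices with and without a usable GPU driver.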
Includes MNN-LLM and MNN-Diffusion for local deployment of large language models and stable diffusion models, as shown in recent updates like Qwen3.5 support.
Discussion groups are predominantly in Chinese, limiting accessible support and resources for English-speaking developers, as noted in the README.
NPU acceleration via NNAPI is rated B (buggy or not optimized) in the architecture support table, and the CoreML/HIAI backends are rated A rather than S, meaning supported but not deeply optimized.
Requires MNN-Converter to transform models from other frameworks into the .mnn format, an extra step that adds complexity and potential for errors compared with drop-in inference engines.
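A minimal conversion sketch, assuming `MNNConvert` has been built from the MNN repository (with `-DMNN_BUILD_CONVERTER=ON`) and that `mobilenet.onnx` is a placeholder model file; the existence check makes the snippet safe to run even where the converter is not built:

```shell
#!/bin/sh
# Path to the converter binary built from the MNN repo (assumption).
MNNCONVERT=./MNNConvert

if [ -x "$MNNCONVERT" ]; then
    # Convert an ONNX model into MNN's own .mnn format.
    "$MNNCONVERT" -f ONNX \
        --modelFile mobilenet.onnx \
        --MNNModel mobilenet.mnn \
        --bizCode demo
else
    echo "MNNConvert not found; build MNN with -DMNN_BUILD_CONVERTER=ON"
fi
```

The `-f` flag names the source framework (ONNX here; TF, CAFFE, TFLITE, and TORCH follow the same pattern), which is where conversion mismatches typically surface.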