A visual workflow-based AI deployment framework for multi-platform and multi-backend inference, supporting large models and edge devices.
nndeploy is an open-source AI deployment framework designed to simplify and accelerate the deployment of AI models across diverse platforms, including desktop, mobile, edge devices, and servers. It provides a visual workflow editor for drag-and-drop pipeline construction and supports multiple inference backends for high-performance execution.
AI engineers and developers who need to deploy and productionize AI models across heterogeneous environments, including desktop (Windows, macOS), mobile (Android, iOS), edge devices (NVIDIA Jetson, Ascend310B, RK), and servers (RTX series, T4, Ascend310P). It also serves as a visual workflow tool for teams deploying large models (10B+ parameters) such as LLMs and AIGC generation models.
Developers choose nndeploy for its combination of a visual, drag-and-drop workflow editor that simplifies pipeline construction and support for over 13 inference backends (such as ONNXRuntime, TensorRT, and OpenVINO) for flexible, high-performance execution across platforms. Its unique selling point is reducing the complexity of productionizing models by pairing visual automation with deep performance optimizations such as parallel execution and memory management.
An Easy-to-Use and High-Performance AI Deployment Framework
The drag-and-drop workflow editor allows constructing, debugging, and deploying multi-node AI pipelines visually with real-time parameter adjustments, as shown in the GIFs and quick start guide.
Integrates over 13 inference frameworks like ONNXRuntime, TensorRT, and OpenVINO, enabling flexible deployment across diverse hardware from NVIDIA GPUs to Huawei Ascend chips.
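The multi-backend design can be illustrated with a minimal sketch of the common pattern: a shared inference interface plus a registry that resolves the backend named in a deployment config at runtime. The class and function names below are hypothetical illustrations of the pattern, not nndeploy's actual API.

```python
from abc import ABC, abstractmethod


class InferenceBackend(ABC):
    """Common interface a multi-backend framework exposes to graph nodes."""

    @abstractmethod
    def run(self, tensor):
        ...


class OnnxRuntimeBackend(InferenceBackend):
    def run(self, tensor):
        return f"onnxruntime({tensor})"


class TensorRTBackend(InferenceBackend):
    def run(self, tensor):
        return f"tensorrt({tensor})"


# Hypothetical registry: the deployment config names a backend, and the
# framework resolves it at runtime, so the graph definition stays unchanged
# when moving between, say, an NVIDIA GPU and an Ascend chip.
BACKENDS = {"onnxruntime": OnnxRuntimeBackend, "tensorrt": TensorRTBackend}


def create_backend(name: str) -> InferenceBackend:
    """Look up and instantiate the backend named in the config."""
    return BACKENDS[name]()


print(create_backend("tensorrt").run("input"))  # tensorrt(input)
```

Because nodes only depend on the abstract interface, switching hardware targets becomes a one-line config change rather than a pipeline rewrite.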
Supports pipeline and task parallelism along with memory optimizations such as zero-copy; in the project's own tests, these improved YOLOv11 inference speed by up to 57% with TensorRT.
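Pipeline parallelism here means overlapping stages (e.g. preprocess, inference, postprocess) so each works on a different frame at once. The following is a generic, self-contained sketch of that execution mode using threads and queues; the stage functions are placeholder arithmetic standing in for real model stages, and none of it is nndeploy's actual API.

```python
import queue
import threading


def run_stage(fn, in_q, out_q):
    """Run one pipeline stage: pull items, process, push downstream."""
    while True:
        item = in_q.get()
        if item is None:          # poison pill: propagate shutdown downstream
            out_q.put(None)
            break
        out_q.put(fn(item))


# Hypothetical stages standing in for preprocess -> infer -> postprocess.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

# One queue between each pair of stages, plus input and output ends.
queues = [queue.Queue() for _ in range(len(stages) + 1)]
threads = [
    threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]))
    for i, fn in enumerate(stages)
]
for t in threads:
    t.start()

# Feed frames; each stage overlaps with the next, like a hardware pipeline.
for frame in range(4):
    queues[0].put(frame)
queues[0].put(None)

results = []
while (out := queues[-1].get()) is not None:
    results.append(out)
for t in threads:
    t.join()

print(results)  # -> [-1, 1, 3, 5]
```

With three stages running concurrently, throughput approaches the cost of the slowest stage rather than the sum of all stages, which is where pipeline-parallel speedups like the reported 57% come from.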
Includes 100+ nodes for popular models like YOLO series, Stable Diffusion, and Segment Anything, reducing development time for common AI tasks.
Workflows export to JSON for integration via Python or C++ APIs, supporting deployment on Linux, Windows, macOS, Android, and iOS from a single configuration.
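To make the JSON-export idea concrete, here is a toy interpreter that loads a node-graph description and executes it by wiring each node's inputs to upstream outputs. The JSON schema, node types, and registry below are invented for illustration; nndeploy's actual exported format and loading API will differ.

```python
import json

# Hypothetical workflow JSON; nndeploy's real exported schema may differ.
workflow_json = """
{
  "nodes": [
    {"name": "preprocess", "type": "resize", "inputs": [], "params": {"w": 640}},
    {"name": "infer", "type": "onnx", "inputs": ["preprocess"], "params": {}},
    {"name": "postprocess", "type": "nms", "inputs": ["infer"], "params": {}}
  ]
}
"""

# Toy node registry mapping node types to callables (strings stand in
# for tensors so the example stays self-contained).
REGISTRY = {
    "resize": lambda inputs, params: f"resized({params['w']})",
    "onnx": lambda inputs, params: f"logits({inputs[0]})",
    "nms": lambda inputs, params: f"boxes({inputs[0]})",
}


def run_workflow(spec):
    """Execute nodes in declaration order, wiring outputs to named inputs."""
    outputs = {}
    for node in spec["nodes"]:
        ins = [outputs[name] for name in node["inputs"]]
        outputs[node["name"]] = REGISTRY[node["type"]](ins, node["params"])
    return outputs


result = run_workflow(json.loads(workflow_json))
print(result["postprocess"])  # boxes(logits(resized(640)))
```

The key property this models is that the same JSON artifact drives execution everywhere: a workflow built visually on a desktop can be shipped unchanged to a server or mobile runtime that interprets the same graph.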
The default installation includes only ONNXRuntime and MNN; using other backends such as TensorRT requires additional compilation in developer mode, which can be complex and time-consuming.
Adding custom models or nodes requires Python or C++ development, and the documentation for advanced features like memory optimization may be sparse for newcomers.
The visual editor and framework layers introduce overhead that might be unnecessary for straightforward, single-model deployments, making lighter alternatives more efficient.
Future plans such as on-device large-model inference are listed on the roadmap, but reliance on community contributions could lead to instability or slow updates for niche requirements.