A library for running LLMs locally and efficiently on any device, with bindings for Python, Flutter, and Godot.
NobodyWho is a library that lets developers run large language models (LLMs) locally and efficiently on any device. It removes the dependency on cloud-based AI services by providing offline, private, cost-free inference. The library supports multiple programming environments, including Python, Flutter, and Godot, so the same engine can power desktop tools, mobile apps, and games.
Developers building applications that require local AI inference, such as mobile app creators using Flutter, game developers using Godot, and Python developers needing embedded LLM capabilities.
Developers choose NobodyWho for its grammar-constrained tool calling (generated calls are always syntactically valid), its context shifting that supports arbitrarily long conversations, and its ability to ship optimized native code across multiple platforms without licensing fees. Because it builds on llama.cpp, it is compatible with the wide range of models published in GGUF format.
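The llama.cpp foundation is easiest to see with the separate llama-cpp-python bindings. The sketch below is not NobodyWho's own API; it only shows the underlying local-inference pattern the library builds on, and the model path is a placeholder for any GGUF file you have on disk.

```python
# Sketch of local GGUF inference with llama-cpp-python (not NobodyWho's API);
# the model path below is a placeholder for any downloaded GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="models/qwen2.5-1.5b-instruct-q4_k_m.gguf",  # placeholder path
    n_ctx=4096,          # context window size in tokens
    n_gpu_layers=-1,     # offload all layers to the GPU if one is available
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Why run an LLM locally?"},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```

Everything runs in-process on the local machine: no network calls, no API keys, and no per-request billing.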
Runs LLMs entirely locally with no API calls, keeping data private and eliminating per-request costs.
Derives sampling grammars from function signatures automatically, so tool calls are always syntactically valid without any manual schema configuration (see the schema-derivation sketch after this list).
Applies conversation-aware context shifting before the context window fills, so long dialogues continue without abrupt memory loss (a simplified sketch follows this list).
Ships optimized native builds for Windows, Linux, macOS, and Android, so the same codebase can be deployed across diverse devices.
Uses Vulkan or Metal for GPU-accelerated inference, which substantially speeds up resource-intensive models.
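The grammar-derivation idea can be illustrated without NobodyWho's internals: a tool's signature already contains everything needed to constrain the model's output. The sketch below derives a JSON-schema-style description from a Python signature using only the standard library; it is an illustration of the technique, not NobodyWho's actual code.

```python
# Illustrative sketch (not NobodyWho's code): derive a JSON-schema-style
# tool description from a plain Python function signature. A constrained
# sampler built from such a schema can only emit syntactically valid calls.
import inspect
from typing import get_type_hints

PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}

def tool_schema(fn):
    """Build a tool-call schema from fn's signature and type hints."""
    hints = get_type_hints(fn)
    params = {
        name: {"type": PY_TO_JSON.get(hints.get(name), "string")}
        for name in inspect.signature(fn).parameters
    }
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": params,
                       "required": list(params)},
    }

def set_weather(city: str, temperature: float) -> None:
    """Set the weather for a city in the game world."""

print(tool_schema(set_weather))
# {'name': 'set_weather', 'description': 'Set the weather for a city...',
#  'parameters': {'type': 'object',
#                 'properties': {'city': {'type': 'string'},
#                                'temperature': {'type': 'number'}},
#                 'required': ['city', 'temperature']}}
```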
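Context shifting itself is also simple to sketch: when the token budget is about to overflow, the oldest turns are evicted while the system prompt and the most recent exchanges are kept. The version below is a simplified illustration of the idea, not the library's implementation.

```python
# Simplified illustration of preemptive context shifting (not the library's
# actual implementation): evict the oldest turns before the window overflows,
# always keeping the system prompt and the most recent exchanges.
def shift_context(messages, token_counts, budget):
    """messages/token_counts are parallel lists; messages[0] is the system prompt."""
    total = sum(token_counts)
    keep = list(range(len(messages)))
    i = 1  # never evict the system prompt at index 0
    while total > budget and i < len(messages) - 2:  # keep the last two turns
        total -= token_counts[i]
        keep.remove(i)
        i += 1
    return [messages[j] for j in keep]

history = ["<system prompt>", "turn 1", "turn 2", "turn 3", "turn 4"]
tokens  = [50, 400, 400, 400, 100]
print(shift_context(history, tokens, budget=1000))
# ['<system prompt>', 'turn 2', 'turn 3', 'turn 4']
```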
Does not yet support iOS or web exports; issues #114 and #111 track both as future work, which restricts its use in those environments.
Loads only models in GGUF format via llama.cpp; weights in other formats must first be converted (for example with llama.cpp's convert_hf_to_gguf.py script), which adds overhead and limits model selection.
Requires downloading and managing model files locally, unlike cloud services with instant access, which adds complexity to deployment and updates (a download sketch follows this list).
Inference speed is constrained by local GPU availability and hardware specs, which may not match the scalability of cloud-based solutions.
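The manual model management can at least be scripted. For example, the huggingface_hub package can fetch and cache a GGUF file; the repo id and filename below are placeholders, not choices endorsed by the NobodyWho project.

```python
# Scripting the manual model download with huggingface_hub; the repo id and
# filename are placeholders, not recommendations from NobodyWho.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="Qwen/Qwen2.5-1.5B-Instruct-GGUF",      # placeholder repo
    filename="qwen2.5-1.5b-instruct-q4_k_m.gguf",   # placeholder quantized file
)
print(model_path)  # local cache path to hand to the inference engine
```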