A C/C++ library for efficient, cross-platform LLM inference with broad hardware support and built-in quantization.
A high-throughput, memory-efficient inference and serving engine for large language models (LLMs).
Run large language models (LLMs) privately on everyday desktops and laptops without requiring API calls or GPUs.
Official inference framework for 1-bit LLMs, enabling lossless CPU/GPU inference with significant gains in speed and energy efficiency.
A Python library for building production-ready model inference APIs, job queues, and multi-model serving systems for AI applications.
A fast, flexible, and hardware-aware LLM inference engine with zero-config support for any Hugging Face model.
An AI-native proxy and data plane for agentic applications, providing built-in orchestration, safety, observability, and smart LLM routing.
A fast and comprehensive machine learning framework for Java, Scala, and Kotlin with state-of-the-art algorithms and data visualization.
A lightweight, single-binary Rust inference server providing fully OpenAI-compatible API endpoints for local GGUF models.
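Because the server above exposes the standard OpenAI wire format, any generic HTTP client can talk to it. A minimal sketch using only the Python standard library, assuming the server listens on localhost:8080 and accepts a placeholder model id (both are hypothetical, not taken from any project's documentation):

```python
import json
from urllib import request

# Hypothetical local endpoint; adjust host/port to match your server.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt: str, model: str = "local-gguf-model") -> dict:
    """Build an OpenAI-style /chat/completions request body."""
    return {
        "model": model,  # placeholder id for a locally loaded GGUF model
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }


def send_chat(prompt: str) -> dict:
    """POST the request to the local server and return the parsed JSON reply."""
    req = request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Since the request and response shapes follow the OpenAI API, existing OpenAI SDK clients can also target such a server by pointing their base URL at the local endpoint.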
A self-learning vector database with graph intelligence, local AI, and PostgreSQL integration, built for real-time adaptation.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.