A C/C++ library for efficient, cross-platform LLM inference with extensive hardware support and quantization.
A high-throughput, memory-efficient inference and serving engine for large language models (LLMs).
Run large language models (LLMs) privately on everyday desktops and laptops without requiring API calls or GPUs.
Official inference framework for 1-bit LLMs, enabling fast and lossless CPU/GPU inference with significant speed and energy efficiency gains.
A Python library for building production-ready model inference APIs, job queues, and multi-model serving systems for AI applications.
A fast, flexible, and hardware-aware LLM inference engine with zero-config support for any Hugging Face model.
An AI-native proxy and data plane for agentic applications, providing built-in orchestration, safety, observability, and smart LLM routing.
A fast and comprehensive machine learning framework for Java, Scala, and Kotlin with state-of-the-art algorithms and data visualization.
A lightweight, single-binary Rust inference server providing 100% OpenAI API-compatible endpoints for local GGUF models.
A self-learning vector database with graph intelligence, local AI, and PostgreSQL integration, built for real-time adaptation.
A library for running LLMs locally and efficiently on any device with support for Python, Flutter, and Godot.
A production-ready deep learning framework for Go that enables training and deploying neural networks as single binaries with a PyTorch-like API.