A high-throughput, memory-efficient inference and serving engine for large language models (LLMs).
A high-performance serving framework for large language models and multimodal models, delivering low-latency, high-throughput inference.
A Python library for building production-ready model inference APIs, job queues, and multi-model serving systems for AI applications.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.