GPU Optimization

5 projects

A high-throughput, memory-efficient inference and serving engine for large language models (LLMs).

A unified deep learning system for efficient large-scale model training and inference with advanced parallelism strategies.

An open-source inference serving platform for deploying AI models from multiple frameworks across cloud, data center, and edge devices.

An LLM acceleration library for Intel XPUs (GPU, NPU, CPU) that speeds up local inference and fine-tuning of popular models.

A Python library for building production-ready model inference APIs, job queues, and multi-model serving systems for AI applications.

Community-curated · Updated weekly · 100% open source

Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.