An open platform for training, serving, and evaluating large language model-based chatbots.
FastChat is an open-source platform for training, serving, and evaluating large language model (LLM)-based chatbots. It provides the full pipeline to develop models like Vicuna, deploy them behind a web interface and APIs, and benchmark their performance against other LLMs, giving teams a unified, scalable toolkit for the end-to-end lifecycle of conversational AI systems.
AI researchers, machine learning engineers, and developers who are building, fine-tuning, or deploying open-source large language models for chatbot applications. It's particularly useful for teams needing production-ready serving and evaluation tools.
Developers choose FastChat because it integrates training, serving, and evaluation into a single, battle-tested platform. Its unique value lies in powering the community-standard Chatbot Arena for LLM benchmarking and providing OpenAI-compatible APIs, making it easy to replace proprietary services with self-hosted, open-source models.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Integrates training (e.g., Vicuna), serving with web UI and APIs, and evaluation (MT-Bench, Chatbot Arena) in one platform, as outlined in the core features.
Powers Chatbot Arena with over 10 million chat requests and 1.5 million human votes, providing a trusted community leaderboard for LLM performance.
Runs on diverse backends including GPU, CPU, Apple Silicon, Intel XPU, and Ascend NPU, with optimizations like 8-bit compression for memory efficiency, detailed in the inference section.
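A sketch of how backend selection and 8-bit compression are exposed, using FastChat's documented CLI entry point; the model path is illustrative and the commands assume the `fschat` package is installed:

```shell
# Interactive chat on a single GPU (default device)
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5

# Same model with 8-bit compression to roughly halve GPU memory use
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --load-8bit

# CPU-only inference, or Apple Silicon via Metal (mps)
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --device cpu
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --device mps
```

The same `--device` and `--load-8bit` flags apply to the model worker used for serving, so a deployment can mix backends across workers.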
Offers RESTful APIs that mimic OpenAI's, allowing seamless integration with existing tools and libraries, as described in the API documentation.
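To illustrate the OpenAI-compatible surface, the request below targets a locally running FastChat API server; the host, port, and model name are assumptions that depend on how the server was launched:

```shell
# Assumes: python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
# is running with a vicuna-7b-v1.5 worker registered.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vicuna-7b-v1.5",
    "messages": [{"role": "user", "content": "Hello, who are you?"}]
  }'
```

Because the request shape matches OpenAI's Chat Completions API, existing OpenAI client libraries can typically be pointed at this endpoint by overriding their base URL.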
Requires launching separate processes for controller, model workers, and web server, which adds operational complexity and potential points of failure, as seen in the serving instructions.
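The three-process topology looks roughly like the following, using FastChat's documented module entry points; the model path is illustrative, and in practice each command runs in its own terminal or under a process supervisor:

```shell
# 1. Controller: tracks registered workers and routes requests
python3 -m fastchat.serve.controller

# 2. Model worker: loads the model and registers with the controller
python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5

# 3. Web server: Gradio UI that talks to the controller
python3 -m fastchat.serve.gradio_web_server
```

The upside of this split is horizontal scaling (multiple workers, even for different models, behind one controller); the downside is that each process is a separate thing to monitor and restart.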
The default serving backend uses huggingface/transformers and is acknowledged to be slow, so high-throughput deployments require an additional setup such as the vLLM integration.
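Swapping in the high-throughput backend is a matter of launching a different worker; the command below uses FastChat's documented vLLM worker entry point, with an illustrative model path, and assumes vLLM is installed alongside FastChat:

```shell
# Drop-in replacement for fastchat.serve.model_worker, backed by vLLM's
# paged-attention engine for much higher request throughput
python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-7b-v1.5
```

The controller and web server commands stay the same, since the worker registers with the controller over the same protocol.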
Fine-tuning relies on external data (the ShareGPT dataset is not released), and the provided dummy data limits reproducibility unless teams source their own datasets.