An open platform for training, serving, and evaluating large language model-based chatbots.
FastChat is an open-source platform for training, serving, and evaluating large language model (LLM)-based chatbots. It provides the full pipeline to develop models like Vicuna, deploy them behind a web interface and APIs, and benchmark their performance against other LLMs, giving teams a unified, scalable toolkit for the end-to-end lifecycle of conversational AI systems.
AI researchers, machine learning engineers, and developers who are building, fine-tuning, or deploying open-source large language models for chatbot applications. It's particularly useful for teams needing production-ready serving and evaluation tools.
Developers choose FastChat because it integrates training, serving, and evaluation into a single, battle-tested platform. Its unique value lies in powering the community-standard Chatbot Arena for LLM benchmarking and providing OpenAI-compatible APIs, making it easy to replace proprietary services with self-hosted, open-source models.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Integrates training (e.g., Vicuna), serving with web UI and APIs, and evaluation (MT-Bench, Chatbot Arena) in one platform, as outlined in the core features.
Powers Chatbot Arena with over 10 million chat requests and 1.5 million human votes, providing a trusted community leaderboard for LLM performance.
Runs on diverse backends including GPU, CPU, Apple Silicon, Intel XPU, and Ascend NPU, with optimizations like 8-bit compression for memory efficiency, detailed in the inference section.
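A sketch of how backend selection and 8-bit compression are exposed, using FastChat's documented CLI entry point; the model path is illustrative and the commands assume the `fschat` package is installed:

```shell
# Interactive chat on a single GPU (default device)
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5

# Same model with 8-bit compression to roughly halve GPU memory use
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --load-8bit

# CPU-only inference, or Apple Silicon via Metal (mps)
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --device cpu
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --device mps
```

The same `--device` and `--load-8bit` flags apply to the model worker used for serving, so a deployment can mix backends across workers.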
Offers RESTful APIs that mimic OpenAI's, allowing seamless integration with existing tools and libraries, as described in the API documentation.
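To illustrate the OpenAI-compatible surface, the request below targets a locally running FastChat API server; the host, port, and model name are assumptions that depend on how the server was launched:

```shell
# Assumes: python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
# is running with a vicuna-7b-v1.5 worker registered.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vicuna-7b-v1.5",
    "messages": [{"role": "user", "content": "Hello, who are you?"}]
  }'
```

Because the request shape matches OpenAI's Chat Completions API, existing OpenAI client libraries can typically be pointed at this endpoint by overriding their base URL.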
Requires launching separate processes for controller, model workers, and web server, which adds operational complexity and potential points of failure, as seen in the serving instructions.
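The three-process topology looks roughly like the following, using FastChat's documented module entry points; the model path is illustrative, and in practice each command runs in its own terminal or under a process supervisor:

```shell
# 1. Controller: tracks registered workers and routes requests
python3 -m fastchat.serve.controller

# 2. Model worker: loads the model and registers with the controller
python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5

# 3. Web server: Gradio UI that talks to the controller
python3 -m fastchat.serve.gradio_web_server
```

The upside of this split is horizontal scaling (multiple workers, even for different models, behind one controller); the downside is that each process is a separate thing to monitor and restart.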
The default serving backend uses huggingface/transformers and is acknowledged to be slow, so high-throughput deployments require an additional setup such as the vLLM integration.
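Swapping in the high-throughput backend is a matter of launching a different worker; the command below uses FastChat's documented vLLM worker entry point, with an illustrative model path, and assumes vLLM is installed alongside FastChat:

```shell
# Drop-in replacement for fastchat.serve.model_worker, backed by vLLM's
# paged-attention engine for much higher request throughput
python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-7b-v1.5
```

The controller and web server commands stay the same, since the worker registers with the controller over the same protocol.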
Fine-tuning relies on external data (the ShareGPT dataset is not released), and the provided dummy data limits reproducibility unless teams source their own datasets.