A semantic cache library for LLM queries that can cut API costs by up to 10x and boost response speed by up to 100x.
GPTCache is a semantic caching library for large language model (LLM) queries that stores and retrieves responses to reduce API costs and improve latency. It integrates seamlessly with services like OpenAI's ChatGPT, LangChain, and llama_index, allowing developers to cache similar queries and avoid redundant API calls. The library uses embedding algorithms and vector stores to enable semantic matching, significantly cutting down on expenses and speeding up responses.
Developers building applications with LLM APIs (e.g., ChatGPT) who face high costs and slow response times under heavy traffic. It's also suitable for teams needing a scalable caching solution for AI-powered services.
GPTCache stands out by offering semantic caching that goes beyond exact matches, dramatically reducing LLM API costs and improving performance. Its modular design allows extensive customization, and it integrates easily with popular LLM frameworks without requiring major code changes.
Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
Uses embedding algorithms and vector stores to cache semantically similar queries, not just exact matches, which dramatically increases cache hit rates and reduces API costs, as shown in the similar search cache example.
Acts as a drop-in replacement for OpenAI's API and integrates seamlessly with LangChain and llama_index, requiring only a few lines of code to activate, per the quick start examples.
Offers interchangeable components for embeddings, vector storage, cache management, and similarity evaluation, allowing developers to tailor the system to specific needs, highlighted in the modules section.
Provides hit ratio, latency, and recall metrics to optimize cache performance, with sample benchmarks available for tuning, as mentioned in the features.
The README warns that the project is under rapid development and its API is subject to change, which can break existing integrations and force frequent updates.
The project explicitly states it is no longer adding support for new LLM APIs, pushing developers toward the generic get/set API, which may not cover model-specific features without custom work.
Enabling semantic caching requires configuring multiple components like embedding models and vector databases, adding initial overhead compared to simpler caching solutions.