A learned index structure enabling fast lookups, range searches, and updates on billions of items with minimal space usage.
PGM-index is a learned data structure that supports fast lookup, predecessor, range search, and update operations on sorted arrays of billions of items. It addresses the memory overhead of traditional indexes: it uses orders of magnitude less space while keeping the same worst-case query time guarantees. The index approximates the mapping from keys to array positions with a piecewise geometric model, so a query first predicts an approximate position and then corrects it with a search over a small, bounded range.
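To make the "predict, then correct within a bounded range" idea concrete, here is a minimal, self-contained sketch. It does not use the PGM-index library or its API: `Segment`, `build_segments`, and `lookup` are hypothetical names, and the segmentation is a simple greedy "shrinking cone" stand-in, not necessarily the construction the library uses. It assumes strictly increasing integer keys.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <vector>

// One linear piece: position ~= first_pos + slope * (key - first_key).
struct Segment {
    std::int64_t first_key;
    std::size_t first_pos;
    double slope;
};

// Hypothetical builder (greedy "shrinking cone"): keep extending the current
// segment while some slope through its first point predicts every covered key
// within eps positions. Assumes keys are strictly increasing.
std::vector<Segment> build_segments(const std::vector<std::int64_t>& keys, double eps) {
    assert(std::adjacent_find(keys.begin(), keys.end(),
                              std::greater_equal<>()) == keys.end());
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<Segment> segs;
    for (std::size_t i = 0; i < keys.size();) {
        double lo = 0.0, hi = inf;  // feasible slope range (the cone)
        std::size_t j = i + 1;
        for (; j < keys.size(); ++j) {
            const double dx = static_cast<double>(keys[j] - keys[i]);
            const double dy = static_cast<double>(j - i);
            const double new_lo = std::max(lo, (dy - eps) / dx);
            const double new_hi = std::min(hi, (dy + eps) / dx);
            if (new_lo > new_hi) break;  // cone is empty: key j starts a new segment
            lo = new_lo;
            hi = new_hi;
        }
        // A segment holding a single key never narrows the cone; use slope 0.
        const double slope = (hi == inf) ? 0.0 : (lo + hi) / 2.0;
        segs.push_back({keys[i], i, slope});
        i = j;
    }
    return segs;
}

// Returns the index of the first key >= q (what std::lower_bound over the
// whole array would return), but searches only a window of about 2 * eps keys.
std::size_t lookup(const std::vector<std::int64_t>& keys,
                   const std::vector<Segment>& segs, double eps, std::int64_t q) {
    // Find the last segment whose first key is <= q.
    auto it = std::upper_bound(segs.begin(), segs.end(), q,
        [](std::int64_t k, const Segment& s) { return k < s.first_key; });
    if (it == segs.begin()) return 0;  // q precedes every key (or there are no keys)
    const double end =
        static_cast<double>(it == segs.end() ? keys.size() : it->first_pos);
    --it;
    const double begin = static_cast<double>(it->first_pos);
    // Predict, clamping to the segment so a gap before the next segment
    // cannot make the prediction overshoot.
    const double pred =
        std::min(end, begin + it->slope * static_cast<double>(q - it->first_key));
    // One extra slot of slack on each side absorbs floating-point rounding.
    const auto lo = static_cast<std::size_t>(std::max(begin, std::floor(pred - eps - 1)));
    const auto hi = static_cast<std::size_t>(std::min(end, std::ceil(pred + eps + 2)));
    return static_cast<std::size_t>(
        std::lower_bound(keys.begin() + lo, keys.begin() + hi, q) - keys.begin());
}
```

Calling `build_segments(keys, 8.0)` once and then `lookup(keys, segs, 8.0, q)` per query should return the same index as `std::lower_bound` over the full array, while the final search touches only a window of roughly `2 * eps` keys regardless of the array size. Only the segments need to be kept as the index, which is where the space saving comes from.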
PGM-index is aimed at developers and researchers working with large-scale datasets who need memory-efficient indexing for databases, search systems, or data-intensive applications. It is particularly valuable for those implementing custom database engines or optimizing query performance on sorted data.
Developers choose PGM-index because it dramatically reduces memory usage compared to traditional indexes like B-trees while providing provable worst-case performance guarantees. Its header-only C++ implementation makes it easy to integrate, and it offers specialized variants for different use cases including dynamic updates, multidimensional data, and compressed storage.
🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
Uses orders of magnitude less memory than traditional indexes like B-trees, as highlighted in the README for handling arrays with billions of items.
Guarantees the same worst-case query time as conventional structures, which makes it dependable for critical big-data applications, as stated in the description.
Offers dynamic, multidimensional, compressed, and disk-based indexes to cater to different use cases, detailed in the classes overview.
Header-only library with a simple copy-and-include setup, requiring no installation, as shown in the Quickstart section.
Requires data to be sorted upfront, which can be a limitation for dynamic datasets; DynamicPGMIndex supports insertions and deletions but adds complexity.
The epsilon parameter controls the space-time trade-off (a larger epsilon gives a smaller index but a wider final search range) and must be tuned for good performance, which can be non-trivial without the provided tuner.
As a research-oriented project, it has fewer production deployments and less community support than established indexing libraries.
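The epsilon trade-off mentioned above can be explored with a small experiment. The sketch below is an illustration only, not the library's tuner: `Tradeoff` and `measure` are hypothetical names, the segment count comes from a simple greedy segmentation (assumed here as a rough proxy for index size), and the window of `2 * eps + 1` positions is an approximation of the range the final search has to cover. Keys are assumed to be strictly increasing.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct Tradeoff {
    std::size_t segments;      // rough proxy for index size
    std::size_t window;        // positions covered by the final bounded search
    std::size_t search_steps;  // approximate binary-search steps over that window
};

// Hypothetical helper: counts the linear segments a greedy segmentation needs
// so that every key is predicted within eps positions, and reports the search
// cost that the same eps implies.
Tradeoff measure(const std::vector<std::int64_t>& keys, std::size_t eps) {
    const double e = static_cast<double>(eps);
    std::size_t segments = 0;
    std::size_t i = 0;
    while (i < keys.size()) {
        // Feasible slopes for a line anchored at key i.
        double lo = 0.0, hi = std::numeric_limits<double>::infinity();
        std::size_t j = i + 1;
        for (; j < keys.size(); ++j) {
            assert(keys[j] > keys[i]);  // keys must be strictly increasing
            const double dx = static_cast<double>(keys[j] - keys[i]);
            const double dy = static_cast<double>(j - i);
            lo = std::max(lo, (dy - e) / dx);
            hi = std::min(hi, (dy + e) / dx);
            if (lo > hi) break;  // no single slope fits: start a new segment at j
        }
        ++segments;
        i = j;
    }
    const std::size_t window = 2 * eps + 1;
    const auto steps = static_cast<std::size_t>(
        std::ceil(std::log2(static_cast<double>(window))));
    return {segments, window, steps};
}
```

Running `measure` over a representative sample of the real keys for a few candidate values (for example 16, 64, and 256) gives a quick feel for how fast the segment count falls as the search window grows. Actual space and latency depend on the library's own construction and on the hardware, so figures from a sketch like this are only a starting point for proper tuning.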