A high-performance Java library for compressing arrays of integers, optimized for databases and information retrieval.
JavaFastPFOR is a high-performance integer compression library written in Java, designed to compress arrays of integers where most values use fewer than 32 bits or have small gaps between them. It targets efficient storage and retrieval of integer data in databases, inverted indexes, and column stores, and decompresses at over 1.2 billion integers per second.
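To illustrate why values that use fewer than 32 bits compress well, here is a minimal bit-packing sketch. It is not JavaFastPFOR's code or API: the class BitPackingSketch and its methods are hypothetical names chosen for this example, and the library's actual codecs are more elaborate.

```java
import java.util.Arrays;

/** Hypothetical illustration of fixed-width bit packing; not part of JavaFastPFOR. */
final class BitPackingSketch {

    /** Bits needed for the largest value, treating values as unsigned. */
    static int bitsNeeded(int[] values) {
        int or = 0;
        for (int v : values) {
            or |= v;
        }
        return 32 - Integer.numberOfLeadingZeros(or);
    }

    /** Packs each value into exactly `bits` bits; values must fit in that width. */
    static int[] pack(int[] values, int bits) {
        if (bits == 0) {
            return new int[0]; // every value is zero, nothing to store
        }
        int[] out = new int[(int) (((long) values.length * bits + 31) / 32)];
        long bitPos = 0;
        for (int v : values) {
            int word = (int) (bitPos >>> 5);
            int shift = (int) (bitPos & 31);
            out[word] |= v << shift;
            if (shift + bits > 32) {
                // the value straddles two 32-bit words
                out[word + 1] |= v >>> (32 - shift);
            }
            bitPos += bits;
        }
        return out;
    }

    /** Reverses pack(); `count` is the number of values originally packed. */
    static int[] unpack(int[] packed, int bits, int count) {
        int[] out = new int[count];
        if (bits == 0) {
            return out;
        }
        int mask = bits == 32 ? -1 : (1 << bits) - 1;
        long bitPos = 0;
        for (int i = 0; i < count; i++) {
            int word = (int) (bitPos >>> 5);
            int shift = (int) (bitPos & 31);
            int v = packed[word] >>> shift;
            if (shift + bits > 32) {
                v |= packed[word + 1] << (32 - shift);
            }
            out[i] = v & mask;
            bitPos += bits;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] data = {3, 7, 1, 12, 5, 0, 9, 15};
        int bits = bitsNeeded(data);
        int[] packed = pack(data, bits);
        System.out.println(bits + " bits per value, " + packed.length + " word(s)");
        System.out.println(Arrays.toString(unpack(packed, bits, data.length)));
    }
}
```

In this sketch, eight values that each fit in 4 bits occupy a single 32-bit word instead of eight.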
Aimed at developers and engineers working on databases, search engines, information retrieval systems, and other data-intensive applications where fast integer compression matters for performance and storage efficiency.
Developers choose JavaFastPFOR for its speed on integer arrays, where it outperforms generic compression codecs, and for its track record in major projects such as LinkedIn Pinot, Apache Parquet, and Terrier.
A low-level integer compression library in Java
Achieves over 1.2 billion integers per second (4.5 GB/s), significantly outperforming generic codecs like Snappy and LZ4 for integer arrays, as benchmarked in the project.
Includes CODECs specifically for sorted integers using delta compression, making it ideal for database and search engine applications, as highlighted in the usage section.
Offers SIMD-based implementations using the Java Vector API (JDK 19+), providing significant speed improvements for advanced users, though it requires newer Java versions.
Used in major projects like LinkedIn Pinot, Apache Parquet, and Terrier, as listed in the README, which suggests it holds up in production environments.
Supports compressing and uncompressing data in chunks, allowing seamless integration into various data pipelines, demonstrated in the advancedExample.
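The delta compression mentioned for sorted integers can be sketched as follows. This is a minimal illustration of the general technique, not the library's implementation: DeltaSketch and its methods are hypothetical names, and the sketch assumes a non-decreasing input so that every gap is non-negative.

```java
import java.util.Arrays;

/** Hypothetical illustration of delta coding for sorted integers; not part of JavaFastPFOR. */
final class DeltaSketch {

    /** Replaces each value with its gap from the previous one (the first gap is from 0). */
    static int[] deltaEncode(int[] sorted) {
        int[] out = new int[sorted.length];
        int prev = 0;
        for (int i = 0; i < sorted.length; i++) {
            out[i] = sorted[i] - prev;
            prev = sorted[i];
        }
        return out;
    }

    /** Restores the original values with a running (prefix) sum over the gaps. */
    static int[] deltaDecode(int[] deltas) {
        int[] out = new int[deltas.length];
        int sum = 0;
        for (int i = 0; i < deltas.length; i++) {
            sum += deltas[i];
            out[i] = sum;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] docIds = {100, 105, 106, 120, 121};
        int[] gaps = deltaEncode(docIds);
        System.out.println(Arrays.toString(gaps));              // [100, 5, 1, 14, 1]
        System.out.println(Arrays.toString(deltaDecode(gaps))); // [100, 105, 106, 120, 121]
    }
}
```

The gaps are much smaller than the original values, so a codec that stores small integers in few bits has less to write.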
While most codecs are thread-safe, some are not, requiring users to carefully check documentation and manage codec instances per thread to avoid issues, as noted in the thread safety section.
The vectorized implementation requires JDK 19 or later, and current development assumes JDK 21+, limiting adoption in environments with older Java versions, as stated in the requirements.
Designed exclusively for integer arrays, so other data types must be converted to integers before they can be compressed, a restriction that follows from the project description.
The API involves manual management of offsets and buffers (e.g., using IntWrapper), which can be error-prone and less intuitive compared to higher-level compression libraries, as seen in the usage example.
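The offset-management style described above can be pictured with a small sketch. It does not use the library: Cursor is a hypothetical stand-in for the role a mutable position holder such as IntWrapper plays, and copyChunk is a toy "codec" that only copies data. The point is the calling pattern, where each call advances the input and output positions and the caller loops over chunks.

```java
import java.util.Arrays;

/** Hypothetical illustration of cursor-based chunk processing; not part of JavaFastPFOR. */
final class CursorSketch {

    /** Mutable position holder shared between caller and callee (assumed design). */
    static final class Cursor {
        int pos;

        Cursor(int pos) {
            this.pos = pos;
        }
    }

    /** Toy "codec": copies `length` ints and advances both cursors past them. */
    static void copyChunk(int[] in, Cursor inPos, int length, int[] out, Cursor outPos) {
        System.arraycopy(in, inPos.pos, out, outPos.pos, length);
        inPos.pos += length;
        outPos.pos += length;
    }

    /** Processes the input chunk by chunk, then trims the output to what was written. */
    static int[] processInChunks(int[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int[] out = new int[data.length];
        Cursor inPos = new Cursor(0);
        Cursor outPos = new Cursor(0);
        while (inPos.pos < data.length) {
            int len = Math.min(chunkSize, data.length - inPos.pos);
            copyChunk(data, inPos, len, out, outPos);
        }
        // The output cursor tells the caller how many ints are valid.
        return Arrays.copyOf(out, outPos.pos);
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7};
        System.out.println(Arrays.toString(processInChunks(data, 3)));
    }
}
```

Forgetting to trim the output to the final cursor position, or reusing a cursor across unrelated calls, is the kind of mistake this style makes easy.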