A compressed bitmap data structure for Java that outperforms alternatives like WAH, EWAH, and Concise in speed and compression.
RoaringBitmap is a Java library implementing compressed bitmap data structures that provide fast set operations while using significantly less memory than traditional bitsets. It solves the problem of efficiently representing and manipulating large sets of integers in applications like database indexing and data analytics.
It is aimed at developers and engineers working on data-intensive systems such as databases, search engines, and analytics platforms (e.g., Apache Spark, Druid, Pinot) who need high-performance set operations on large integer datasets.
RoaringBitmap offers superior compression and faster operations compared to alternatives like WAH, EWAH, and Concise, with added benefits like random access and memory-mapped support, making it a preferred choice in production systems.
A better compressed bitset in Java: used by Apache Spark, Netflix Atlas, Apache Pinot, Tablesaw, and many others
Divides the integer space into chunks of 65,536 (2^16) values and stores each chunk in whichever representation (sorted array, bitmap, or runs) is most compact, so compression and speed hold up across varied data distributions, as highlighted in the README's scientific documentation.
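The per-chunk selection described above can be sketched in plain Java. This is a simplified illustration, not the library's code: the class name `ChunkSketch`, its method names, and the byte estimates are assumptions made for this example, based on the published Roaring design (arrays for up to 4,096 values per chunk, an 8 kB bitmap otherwise, and runs whenever they are smaller than both).

```java
/** Simplified illustration of per-chunk container selection; not the library's implementation. */
final class ChunkSketch {
    // Assumption: beyond 4,096 values a 2-byte-per-value array exceeds the fixed 8 kB bitmap.
    static final int ARRAY_MAX = 4096;
    static final int BITMAP_BYTES = 8192; // 65,536 bits

    /** The high 16 bits of a 32-bit value select its chunk (as an unsigned key). */
    static int chunkKey(int value) {
        return value >>> 16;
    }

    /** The low 16 bits are the value's position inside the chunk. */
    static int lowBits(int value) {
        return value & 0xFFFF;
    }

    /** Counts maximal runs of consecutive values in a sorted array of distinct low-bit values. */
    static int countRuns(int[] sortedLow) {
        if (sortedLow.length == 0) {
            return 0;
        }
        int runs = 1;
        for (int i = 1; i < sortedLow.length; i++) {
            if (sortedLow[i] != sortedLow[i - 1] + 1) {
                runs++;
            }
        }
        return runs;
    }

    /** Picks "run", "array", or "bitmap" by comparing estimated sizes in bytes. */
    static String chooseContainer(int[] sortedLow) {
        int cardinality = sortedLow.length;
        int arrayBytes = 2 * cardinality;            // one 16-bit entry per value
        int runBytes = 2 + 4 * countRuns(sortedLow); // run count plus (start, length) pairs
        if (runBytes < Math.min(arrayBytes, BITMAP_BYTES)) {
            return "run";
        }
        return cardinality <= ARRAY_MAX ? "array" : "bitmap";
    }
}
```

Under these assumptions, a handful of scattered values stays an array, a long contiguous range collapses to a single run, and a dense but non-contiguous chunk (for example every second value) becomes a bitmap.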
Enables quick membership checks without decompressing the entire bitmap: a lookup locates the chunk, then uses a binary search in array chunks or a direct bit test in bitmap chunks. Run-length-encoded alternatives like WAH or EWAH lack this random access, which is a key advantage for intersection operations.
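A minimal sketch of such a lookup, using only the standard library, is shown here. The class `MembershipSketch` and its layout (a sorted `char[]` of chunk keys plus one sorted `char[]` per chunk) are assumptions chosen for illustration; the real library uses its own container classes.

```java
import java.util.Arrays;

/** Illustrative random-access lookup; the data layout here is hypothetical. */
final class MembershipSketch {

    /**
     * Looks up a 32-bit value: binary search for the chunk key (high 16 bits),
     * then binary search for the low 16 bits inside that chunk only.
     * Java's char is used as an unsigned 16-bit integer.
     */
    static boolean contains(char[] sortedKeys, char[][] sortedChunks, int value) {
        int k = Arrays.binarySearch(sortedKeys, (char) (value >>> 16));
        if (k < 0) {
            return false; // no chunk holds values with these high bits
        }
        // The cast keeps only the low 16 bits of the value.
        return Arrays.binarySearch(sortedChunks[k], (char) value) >= 0;
    }

    /** For a bitmap-style chunk (1,024 longs), membership is a single bit test. */
    static boolean bitmapContains(long[] words, int low) {
        return (words[low >>> 6] & (1L << (low & 63))) != 0;
    }
}
```

Either path touches one chunk and never scans the rest of the structure, which is what lets an intersection probe a large bitmap with the members of a small one.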
Provides ImmutableRoaringBitmap backed by ByteBuffer for off-heap storage, allowing large datasets to reside outside the Java heap, as used in systems like Apache Druid for performance benefits.
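To illustrate what "backed by ByteBuffer" means in practice, the sketch below searches sorted 16-bit values directly inside a direct (off-heap) buffer without copying them onto the heap. It uses only `java.nio`; the class `BufferBackedSketch` and its methods are hypothetical and do not reproduce the library's serialization format or API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;

/** Illustrative off-heap lookup; not ImmutableRoaringBitmap's actual layout. */
final class BufferBackedSketch {

    /** Writes sorted 16-bit values into a direct buffer allocated outside the Java heap. */
    static ByteBuffer store(char[] sortedValues) {
        ByteBuffer buf = ByteBuffer.allocateDirect(2 * sortedValues.length);
        // The view shares the buffer's memory; writing through it leaves buf's position at 0.
        buf.asCharBuffer().put(sortedValues);
        return buf;
    }

    /** Binary search that reads each probe from the buffer with an absolute get. */
    static boolean contains(ByteBuffer buf, char target) {
        CharBuffer view = buf.asCharBuffer();
        int lo = 0;
        int hi = view.limit() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            char v = view.get(mid);
            if (v < target) {
                lo = mid + 1;
            } else if (v > target) {
                hi = mid - 1;
            } else {
                return true;
            }
        }
        return false;
    }
}
```

The same search would work on a buffer obtained from a memory-mapped file. Every probe goes through a buffer accessor rather than a plain array read, which is one reason buffer-backed structures tend to be somewhat slower than their on-heap counterparts.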
Widely adopted in major systems like Apache Spark and Netflix Atlas for years, with a mature codebase and serialization format specification ensuring interoperability and stability.
Offers two different 64-bit implementations (Roaring64NavigableMap and Roaring64Bitmap) with distinct underlying data structures, forcing developers to make nuanced performance trade-offs without clear guidance from the README.
Requires manual calls to validate() after deserialization from untrusted sources to ensure data integrity, adding boilerplate code and potential for oversight in security-critical applications.
Using the buffer package for memory-mapped bitmaps incurs some performance cost compared to standard in-memory bitmaps due to ByteBuffer access, as noted in the README's warning about mixing packages.