An improved HyperLogLog implementation with LogLog-Beta bias correction, sparse representation, and flexible precision for cardinality estimation.
axiomhq/hyperloglog is a Go implementation of the HyperLogLog algorithm for approximating the number of distinct elements in large datasets or data streams. It solves the count-distinct problem where exact counting would require prohibitive memory, providing memory-efficient cardinality estimation with controlled accuracy. The implementation includes several improvements over basic HyperLogLog, including bias correction and sparse representation.
Developers and data engineers working with large-scale data systems who need to estimate unique counts in streaming data, network traffic analysis, or database query optimization. Particularly useful for Go applications requiring probabilistic data structures.
This implementation offers enhanced accuracy through LogLog-Beta bias correction, flexible precision configuration, and memory-efficient sparse representation while maintaining simplicity and order-independent operations. It's a production-ready library that balances performance, accuracy, and ease of use better than many basic HyperLogLog implementations.
HyperLogLog with lots of sugar (Sparse, LogLog-Beta bias correction and TailCut space reduction) brought to you by Axiom
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.
Uses the LogLog-Beta algorithm for improved accuracy across all cardinalities, as described in the README's current implementation details, reducing estimation errors compared to basic HyperLogLog.
Optimizes memory for lower cardinalities similar to HyperLogLog++, as highlighted in key features, ensuring efficient usage without sacrificing performance in streaming applications.
Supports configurable register counts from 2^4 to 2^18, allowing users to balance accuracy and memory based on specific use cases, with memory scaling from 16 bytes to 256 KB.
Provides consistent results regardless of data insertion or merging order, ensuring reliability in distributed or parallel processing environments, as noted in the key features.
Employs Metro hash function for better performance and distribution, as mentioned in the implementation, enhancing speed in large-scale data processing.
Inherently provides approximate counts with a margin of error, making it unsuitable for applications requiring exact cardinality, such as precise inventory management or auditing.
As a Go-specific library, it cannot be directly used in projects with other programming languages, limiting its portability and requiring additional integration efforts for polyglot systems.
While efficient, memory usage can reach up to 256 KB for maximum precision (2^18 registers), which may be prohibitive for embedded systems or scenarios with many concurrent sketches.
Does not include native serialization or persistence features, requiring users to implement custom solutions for saving and loading sketches, adding complexity for long-term storage.