A Clojure/Java library for streaming, one-pass histograms that approximate data distributions for learning, visualization, and analysis.
Histogram is a Clojure/Java library that implements streaming, one-pass histograms for approximating data distributions. It allows developers to build, merge, and query histograms efficiently in memory-constrained environments, supporting both numeric and categorical target tracking. The library is designed for real-time data analysis and scalable machine learning pipelines.
Histogram is aimed at data engineers and scientists who work with large-scale streaming data, particularly those implementing distributed machine learning algorithms or needing efficient data summarization for visualization and analysis in the Clojure or Java ecosystems.
Developers choose Histogram for its streaming capabilities, merge-friendly design enabling parallel processing, and configurable performance optimizations, offering a practical balance between accuracy and resource usage for real-time data approximation.
Streaming Histograms for Clojure/Java
Processes data in a single pass with constant memory by capping the histogram at a fixed number of bins, which suits large-scale streaming data; the README demonstrates this with 200K samples drawn from a normal distribution.
Histograms can be built independently and combined via `merge!`, which enables distributed algorithms; the README shows this by merging 300 histograms to improve a density estimate.
Supports numeric and categorical targets to capture correlations, with examples like tracking sine functions and category counts for enhanced analysis.
Offers tunable parameters like `:freeze` for stationary data and `:reservoir` choices (`:tree` or `:array`), with README benchmarks showing up to 2x speed improvements.
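To make the single-pass and merge behavior described above concrete, here is a minimal, self-contained Java sketch of a fixed-size streaming histogram. It is an illustration under stated assumptions, not the library's implementation: the class name `StreamingHistogramSketch` and its methods are hypothetical, and the rule of collapsing the two closest bins into their weighted mean is a common approach that may differ from what the library does (for example, it ignores gap weighting, `:freeze`, and reservoirs).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of a fixed-size streaming histogram; not the library's code. */
final class StreamingHistogramSketch {
    // Each bin is {mean, count}; the list is kept sorted by mean.
    private final List<double[]> bins = new ArrayList<>();
    private final int maxBins;

    StreamingHistogramSketch(int maxBins) {
        if (maxBins < 1) throw new IllegalArgumentException("maxBins must be >= 1");
        this.maxBins = maxBins;
    }

    /** Adds one point as a single-count bin, then shrinks back to maxBins. */
    void insert(double value) {
        addBin(value, 1.0);
        shrink();
    }

    /** Folds another histogram's bins into this one, then shrinks back to maxBins. */
    void merge(StreamingHistogramSketch other) {
        // Copy first so that merging a histogram with itself is safe.
        for (double[] b : new ArrayList<>(other.bins)) addBin(b[0], b[1]);
        shrink();
    }

    int binCount() { return bins.size(); }

    double totalCount() {
        double total = 0.0;
        for (double[] b : bins) total += b[1];
        return total;
    }

    // Inserts a bin in sorted position, or adds to an existing bin with the same mean.
    private void addBin(double mean, double count) {
        int i = 0;
        while (i < bins.size() && bins.get(i)[0] < mean) i++;
        if (i < bins.size() && bins.get(i)[0] == mean) {
            bins.get(i)[1] += count;
        } else {
            bins.add(i, new double[] {mean, count});
        }
    }

    // Repeatedly collapses the two adjacent bins with the smallest gap between means.
    private void shrink() {
        while (bins.size() > maxBins) {
            int best = 0;
            double bestGap = Double.POSITIVE_INFINITY;
            for (int i = 0; i + 1 < bins.size(); i++) {
                double gap = bins.get(i + 1)[0] - bins.get(i)[0];
                if (gap < bestGap) { bestGap = gap; best = i; }
            }
            double[] a = bins.get(best);
            double[] b = bins.remove(best + 1);
            double total = a[1] + b[1];
            a[0] = (a[0] * a[1] + b[0] * b[1]) / total; // count-weighted mean
            a[1] = total;
        }
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        StreamingHistogramSketch left = new StreamingHistogramSketch(32);
        StreamingHistogramSketch right = new StreamingHistogramSketch(32);
        // Two partitions are summarized independently, then combined.
        for (int i = 0; i < 100_000; i++) left.insert(rng.nextGaussian());
        for (int i = 0; i < 100_000; i++) right.insert(rng.nextGaussian());
        left.merge(right);
        System.out.println(left.binCount() + " bins, " + left.totalCount() + " points");
    }
}
```

Because memory is bounded by `maxBins` rather than by the number of points, each partition of a data set can be summarized separately and the summaries combined afterwards, which is the property the merge-friendly design relies on.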
Relies on bin merging and simplifying assumptions (e.g., that points are distributed evenly around each bin mean), so sums and densities are approximate; the error grows with low bin limits, as the fractional sums in the examples show.
Core implementation is in Java with a Clojure-focused wrapper, limiting use in non-JVM environments and requiring familiarity with Clojure for full feature access.
Requires careful configuration of bins, gap weighting, freeze points, and reservoirs for optimal performance, which can be overwhelming without deep expertise, as noted in performance trade-offs.
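The approximation error mentioned above can be seen in how a sum (the estimated number of points at or below a value) is read off the bins. The following Java sketch is an assumption-laden illustration, not the library's code: `BinSumSketch` and `sum` are hypothetical names, the method takes plain arrays of bin means and counts, and it uses a trapezoid-style interpolation that treats half of each bin's points as lying on either side of its mean. Whether the library computes sums exactly this way is an assumption.

```java
/** Hypothetical illustration of estimating a cumulative count from histogram bins. */
final class BinSumSketch {

    /**
     * Estimates how many points are <= p, given bin means (sorted ascending) and counts.
     * Simplification: returns 0 below the first mean and the full total at or above the
     * last mean; a fuller implementation would also track the observed min and max.
     */
    static double sum(double[] means, double[] counts, double p) {
        int n = means.length;
        if (n == 0 || p < means[0]) return 0.0;

        double total = 0.0;
        for (double c : counts) total += c;
        if (p >= means[n - 1]) return total;

        // Find i such that means[i] <= p < means[i + 1].
        int i = 0;
        while (p >= means[i + 1]) i++;

        // All bins strictly left of bin i are counted in full.
        double s = 0.0;
        for (int j = 0; j < i; j++) s += counts[j];

        // Half of bin i is assumed to lie left of its mean.
        s += counts[i] / 2.0;

        // Interpolate the bin "height" at p, then add the trapezoid area between
        // means[i] and p.
        double frac = (p - means[i]) / (means[i + 1] - means[i]);
        double heightAtP = counts[i] + (counts[i + 1] - counts[i]) * frac;
        s += (counts[i] + heightAtP) / 2.0 * frac;
        return s;
    }

    public static void main(String[] args) {
        double[] means = {1.0, 3.0};
        double[] counts = {2.0, 2.0};
        // Four points summarized by two bins: the estimate is fractional.
        System.out.println(sum(means, counts, 1.5)); // prints 1.5
    }
}
```

With only two bins standing in for four points, the estimate at 1.5 is 1.5 points rather than a whole number, which is the kind of fractional sum that appears when the bin limit is low relative to the data.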