A Julia package providing efficient, type-safe implementations of numerous distance metrics and divergences between vectors and matrices.
Distances.jl is a Julia package for evaluating a broad collection of distance metrics and divergences between numerical vectors and matrices. It offers a single, performant interface for computing the mathematical distances used throughout statistics, machine learning, and data analysis, and it speeds up batch computations over the columns of matrices by using specialized routines instead of element-by-element loops.
Julia developers and researchers in data science, machine learning, statistics, or any other field that needs quantitative similarity or dissimilarity measurements between data points, for tasks such as clustering, classification, or spatial analysis.
Developers choose Distances.jl because it combines an extensive, mathematically rigorous metric library with highly optimized computational routines. Its specialized column-wise and pairwise functions are often much faster than hand-written loops, by an order of magnitude or more for some metrics, which has made it a common choice for distance computations in the Julia ecosystem.
A Julia package for evaluating distances (metrics) between vectors.
Implements over 30 distances, from common metrics like Euclidean and Cosine to specialized divergences like Kullback-Leibler, covering a wide range of statistical and ML needs.
Provides colwise and pairwise functions that use BLAS and specialized algorithms, with benchmarks showing up to 90x speedups for metrics like CorrDist compared to naive loops.
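As a rough illustration of the batch interface described above, the sketch below evaluates one distance three ways: for a single pair of vectors, for corresponding columns, and for all column pairs. The matrices `X` and `Y` are made-up sample data, and the calls assume a reasonably recent release of the package.

```julia
using Distances

# Made-up sample data: each column is one observation in 2-D.
X = [0.0 3.0 1.0;
     0.0 4.0 1.0]
Y = [0.0 0.0 1.0;
     0.0 0.0 2.0]

# One pair of vectors: distance between (3, 4) and (0, 0).
d = evaluate(Euclidean(), X[:, 2], Y[:, 2])

# Corresponding columns: r[i] is the distance between X[:, i] and Y[:, i].
r = colwise(Euclidean(), X, Y)

# All column pairs: R[i, j] is the distance between X[:, i] and Y[:, j].
R = pairwise(Euclidean(), X, Y, dims=2)
```

Passing `dims=2` states explicitly that observations are stored as columns.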
Organizes distances into PreMetric, SemiMetric, and Metric types, enabling optimizations such as symmetry exploitation in pairwise computations for efficiency.
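The type organization described above can be inspected directly. The following is a small sketch, assuming the abstract types are nested as PreMetric, then SemiMetric, then Metric; the sample matrix `X` is arbitrary.

```julia
using Distances

# Each abstract type adds guarantees on top of the previous one.
nested = (SemiMetric <: PreMetric) && (Metric <: SemiMetric)

# Concrete distances declare how much they guarantee.
is_metric     = Euclidean() isa Metric        # satisfies the triangle inequality
is_semimetric = SqEuclidean() isa SemiMetric  # symmetric, but not a full metric
is_premetric  = KLDivergence() isa PreMetric  # not symmetric in general

# With a single matrix, pairwise compares the columns of X with each other;
# for symmetric distances the result is a symmetric matrix.
X = [0.0 1.0 4.0;
     0.0 1.0 2.0]
R = pairwise(SqEuclidean(), X, dims=2)
```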
Offers colwise! and pairwise! functions for pre-allocated result storage, reducing memory allocations in performance-sensitive applications.
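A minimal sketch of the in-place variants follows. It assumes the argument order used by recent releases, where the distance comes first and the output array second; older releases took the output array first, so check the installed version before copying this. The array sizes are arbitrary.

```julia
using Distances

X = rand(4, 100)   # 100 observations in 4 dimensions, stored as columns
Y = rand(4, 50)    # 50 observations
Xsub = X[:, 1:50]  # colwise needs the same number of columns on both sides

# Allocate the outputs once and reuse them across repeated calls.
r = Vector{Float64}(undef, 50)
R = Matrix{Float64}(undef, 100, 50)

# Assumed order: distance first, then the output array.
colwise!(Cityblock(), r, Xsub, Y)
pairwise!(Cityblock(), R, X, Y, dims=2)
```

Reusing `r` and `R` in a loop avoids allocating a fresh result on every iteration.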
Includes weighted versions for key metrics like Euclidean and Minkowski, allowing for customized distance measures in applications like weighted clustering.
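To make the effect of weighting concrete, here is a short sketch comparing plain and weighted distances. The weight vector `w` is a hypothetical choice that ignores the second dimension and emphasizes the third.

```julia
using Distances

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
w = [1.0, 0.0, 4.0]   # hypothetical per-dimension weights

# Plain Euclidean: sqrt(1 + 4 + 9)
d_plain = evaluate(Euclidean(), x, y)

# Weighted Euclidean: sqrt(1*1 + 0*4 + 4*9)
d_weighted = evaluate(WeightedEuclidean(w), x, y)

# Weighted Minkowski with p = 1: 1*1 + 0*2 + 4*3
d_wmink = evaluate(WeightedMinkowski(w, 1), x, y)
```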
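The pairwise Euclidean and SqEuclidean computations use BLAS matrix multiplication, which can introduce roundoff error, most visibly as small nonzero distances between identical points. As noted in the README, accuracy can be restored by passing a threshold to the distance constructor, for example Euclidean(1e-12), so that suspiciously small entries are recomputed directly.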
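The roundoff caveat can be sketched as follows, comparing a data set against itself so that the diagonal entries should be zero. The threshold value 1e-12 follows the README's example; the random matrix `X` is arbitrary, and the size of any error in the default result depends on the data and the BLAS in use.

```julia
using Distances

X = rand(3, 5)   # 5 random points in 3 dimensions

# Default: fast matrix-multiplication path; diagonal entries may come out
# as tiny nonzero values instead of exactly zero.
R_fast = pairwise(Euclidean(), X, X, dims=2)

# With a threshold, entries judged unreliable are recomputed from the
# coordinate differences directly.
R_safe = pairwise(Euclidean(1e-12), X, X, dims=2)

diag_safe = [R_safe[i, i] for i in 1:5]
```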
The package has deprecated older argument orders for the in-place functions colwise! and pairwise!. These orders will be removed in a future release, so code that still relies on them will break after upgrading.
Focused on numeric vectors and matrices, it cannot directly handle tables with mixed data types, necessitating additional packages like TableDistances.jl.
Peak performance relies on an optimized BLAS installation, which may not be consistent across all systems and could complicate setup.