A C++ toolbox providing multiple locality-sensitive hashing algorithms for large-scale approximate nearest neighbor search, with Python and MATLAB bindings.
LSHBOX is a C++ toolbox that implements multiple locality-sensitive hashing (LSH) algorithms for approximate nearest neighbor search. It targets efficient similarity retrieval over high-dimensional data, such as images, through fast and scalable indexing and querying. The toolbox includes several popular LSH variants and provides Python and MATLAB bindings.
LSHBOX is aimed at researchers and engineers working on large-scale image retrieval, similarity search, or machine learning applications that require efficient nearest neighbor queries in high-dimensional spaces. It is particularly useful for those who need to benchmark or deploy LSH algorithms in C++, Python, or MATLAB environments.
Developers choose LSHBOX because it consolidates multiple LSH algorithms into a single, easy-to-use toolbox with cross-language support, eliminating the need to implement these complex algorithms from scratch. Its focus on performance, binary data handling, and reusable indexes makes it a practical choice for production-grade similarity search systems.
A C++ toolbox for locality-sensitive hashing (LSH) that provides several popular LSH algorithms and also supports Python and MATLAB.
Implements eight LSH variants including Random Hyperplane, p-Stable, and Spectral Hashing, providing a wide range of techniques for different use cases.
Offers native C++ interfaces with Python and MATLAB bindings, enabling integration into diverse development environments without rewriting core logic.
Supports saving and loading hash indexes to disk, allowing faster subsequent queries by avoiding re-indexing, as demonstrated in the Python and MATLAB examples.
Uses a compact binary format for datasets, improving I/O performance and reducing memory footprint, though it requires specific preprocessing.
Each algorithm exposes parameters like hash table size and binary code length for fine-grained control over accuracy-speed trade-offs, as detailed in the algorithm chapters.
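To make the accuracy-speed trade-off behind parameters such as code length and table size concrete, here is a minimal sketch of random hyperplane hashing in plain Python. It is not LSHBOX's API or implementation: the function names (`make_hyperplanes`, `hash_code`, `build_table`, `query`) and the way `table_size` folds codes into buckets are assumptions made for illustration only.

```python
import random


def make_hyperplanes(dim, code_length, seed=0):
    """Draw `code_length` random Gaussian hyperplanes in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(code_length)]


def hash_code(vector, hyperplanes):
    """Build an integer code with one bit per hyperplane: the sign of the dot product."""
    code = 0
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def build_table(vectors, hyperplanes, table_size):
    """Index vectors by bucket; `table_size` caps the number of buckets (illustrative choice)."""
    table = {}
    for idx, vec in enumerate(vectors):
        bucket = hash_code(vec, hyperplanes) % table_size
        table.setdefault(bucket, []).append(idx)
    return table


def query(vector, table, hyperplanes, table_size):
    """Return the indices of candidates that share the query's bucket."""
    return table.get(hash_code(vector, hyperplanes) % table_size, [])
```

In this sketch a longer code makes buckets more selective (fewer candidates, more missed neighbors), while a smaller table merges buckets (more candidates, slower verification). A real deployment would typically use several tables and re-rank candidates by true distance.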
The Python bindings require the Boost library, and compilation can be complex, especially when building multi-language support, as noted in the CMake instructions.
Datasets must be converted to a specific binary format, and zero-centered for some algorithms, which adds a significant preprocessing step that is not user-friendly.
The last updates were in 2015, testing on Mac is incomplete, and there is no mention of recent maintenance, which raises compatibility concerns on modern systems.
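The preprocessing burden described above (zero-centering and conversion to a binary file) can be sketched in a few lines of standard-library Python. The byte layout used here is a hypothetical one chosen for illustration: three little-endian unsigned 32-bit header fields (bytes per element, vector count, dimension) followed by float32 values in row order. Check the LSHBOX documentation for the layout it actually expects. The helper names are also assumptions.

```python
import struct


def zero_center(vectors):
    """Subtract the per-dimension mean so every dimension averages to zero."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(dim)]
    return [[v[j] - means[j] for j in range(dim)] for v in vectors]


def pack_dataset(vectors):
    """Serialize vectors as: uint32 element size, uint32 count, uint32 dim, then float32 data.

    This layout is an assumption for illustration, not a confirmed LSHBOX format.
    """
    n, dim = len(vectors), len(vectors[0])
    header = struct.pack("<III", 4, n, dim)
    body = b"".join(struct.pack(f"<{dim}f", *v) for v in vectors)
    return header + body


def unpack_dataset(blob):
    """Inverse of pack_dataset, for checking a round trip."""
    elem_size, n, dim = struct.unpack_from("<III", blob, 0)
    if elem_size != 4:
        raise ValueError("this sketch only handles 4-byte float elements")
    values = struct.unpack_from(f"<{n * dim}f", blob, 12)
    return [list(values[i * dim:(i + 1) * dim]) for i in range(n)]
```

Writing `pack_dataset(zero_center(data))` to a file opened in binary mode would produce the dataset file. Because values are stored as float32, a round trip loses any precision beyond single precision.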