A server-side secondary index implementation for Apache HBase 0.94.8 using co-processors to enable efficient indexed queries.
hindex is a secondary index implementation for Apache HBase that enables efficient querying on columns other than the primary rowkey. Without an index, filtering on a non-rowkey column forces HBase to scan the table; hindex provides indexed access paths instead, allowing faster equality and range queries. The solution is built entirely with server-side co-processors, requiring minimal changes to existing client applications.
hindex is aimed at big data engineers and developers working with Apache HBase who need to perform efficient queries on non-primary key columns. It is particularly useful for teams managing large-scale HBase deployments where query performance is critical.
Developers choose hindex because it provides a seamless, server-side indexing solution that integrates directly with HBase's architecture. Unlike client-side indexing approaches, it requires no application code changes for basic operations and automatically selects the optimal index for queries, reducing complexity while improving performance.
Secondary Index for HBase
Requires no changes to existing client code for puts, deletes, and scans, as server-side co-processors handle all index operations internally, per the README's usage section.
Intelligently selects the best index for scans by analyzing query filters, eliminating the need for manual index hints or client-side modifications.
Supports indexes that span multiple columns, enabling efficient queries that filter on several columns at once, as listed in the README's key features.
Supports indexing during bulk data loading operations, integrating seamlessly with HBase's bulk load workflow to pre-build indexes.
Only compatible with Apache HBase 0.94.8, an old release, which limits its use in modern clusters and may require porting the code to newer HBase versions.
Requires manual editing of hbase-site.xml with multiple co-processor classes, increasing setup complexity and risk of misconfiguration.
Indexes cannot be added to or dropped from an existing table; the table must be recreated. The README lists dynamic index changes as future work.
Every put operation triggers index updates via co-processors, potentially slowing down write throughput and increasing latency.
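The trade-off described above, faster equality and range lookups in exchange for extra work on every put, can be illustrated with a toy in-memory model. This is only a sketch of the general secondary-index idea, not hindex's code: the class `ToyIndexedTable` and its methods are hypothetical names, the table has a single indexed column, and no HBase API is used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of a table with one secondary index (hypothetical, not hindex's implementation). */
public class ToyIndexedTable {
    // Primary store: rowkey -> value of the single indexed column.
    private final Map<String, String> rows = new HashMap<>();
    // Secondary index: column value -> sorted rowkeys currently holding that value.
    private final TreeMap<String, TreeSet<String>> index = new TreeMap<>();
    // Counts index mutations so the extra write work per put is visible.
    private int indexWrites = 0;

    /** Writes a row and keeps the index in sync, as a server-side hook would. */
    public void put(String rowKey, String value) {
        String old = rows.put(rowKey, value);
        if (old != null) {
            // Overwrite: drop the stale index entry for the previous value.
            TreeSet<String> keys = index.get(old);
            keys.remove(rowKey);
            if (keys.isEmpty()) {
                index.remove(old);
            }
            indexWrites++;
        }
        index.computeIfAbsent(value, v -> new TreeSet<>()).add(rowKey);
        indexWrites++;
    }

    /** Equality query answered from the index, without touching other rows. */
    public List<String> findEqual(String value) {
        TreeSet<String> keys = index.get(value);
        return keys == null ? new ArrayList<>() : new ArrayList<>(keys);
    }

    /** Range query over column values in [from, to), answered from the sorted index. */
    public List<String> findRange(String from, String to) {
        List<String> result = new ArrayList<>();
        for (TreeSet<String> keys : index.subMap(from, true, to, false).values()) {
            result.addAll(keys);
        }
        return result;
    }

    public int indexWrites() {
        return indexWrites;
    }

    public static void main(String[] args) {
        ToyIndexedTable table = new ToyIndexedTable();
        table.put("row1", "berlin");
        table.put("row2", "paris");
        table.put("row3", "berlin");
        table.put("row2", "cairo"); // overwrite: one index delete plus one index insert
        System.out.println("city = berlin: " + table.findEqual("berlin"));
        System.out.println("city in [b, d): " + table.findRange("b", "d"));
        System.out.println("index writes for 4 puts: " + table.indexWrites());
    }
}
```

In this model each put costs at least one additional index write, and an overwrite costs two, which is the kind of overhead the last point refers to. How hindex actually lays out and updates its index data is not covered by this sketch.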