A high-performance Presto connector for querying HBase that its authors report is 10-100x faster than other open-source alternatives.
Analysys Presto-HBase-Connector is a high-performance connector that lets Presto query HBase databases using SQL. It speeds up analytical queries on HBase with optimizations such as predicate pushdown, salted tables, and client-side scanning, which the project credits for queries 10-100x faster than other open-source connectors.
Data engineers and analysts who need to run fast SQL queries on HBase data within a Presto ecosystem, particularly those dealing with large-scale event or log data stored in HBase.
Developers choose this connector for its large performance gains, optimizations such as salt-based sharding and batch gets, and support for write operations (INSERT/DELETE) through Presto's standard SQL interface.
The presto hbase connector component implements the Presto Connector interface specification to add HBase query support to Presto. Compared with other open-source HBase connectors, ours is 10 to 100 times faster or more.
Runs queries 10-100x faster than other open-source HBase connectors, according to the project's performance tests on 5 million records.
Supports salted tables for sharding, predicate pushdown to HBase, and batch gets, reducing data transfer and enabling parallel execution.
Uses HBase's ClientSideRegionScanner to scan HDFS files directly, reducing RegionServer load and improving full-table scan performance by more than 30%.
Provides INSERT and DELETE support, with the RowKey specified in the statement, so data can be modified directly through SQL, as detailed in the README.
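The salting and batch-get ideas above can be illustrated with a short Java sketch. This is a generic example of the technique, not the connector's actual row-key format: the class `SaltedRowKeys`, the two-digit `NN_` prefix, and the hash-modulo bucket choice are all hypothetical, and Java Strings stand in for HBase's `byte[]` row keys. Because the salt is deterministic, a point lookup can recompute each physical key for one batch get, while a range predicate on the logical key fans out into one scan per bucket that can run in parallel.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical salted row-key scheme (illustration only, not the connector's format).
 * Strings stand in for HBase byte[] keys; for ASCII keys the ordering is the same.
 */
public final class SaltedRowKeys {
    private final int buckets;

    public SaltedRowKeys(int buckets) {
        // A two-digit prefix ("00".."99") can encode at most 100 buckets.
        if (buckets <= 0 || buckets > 100) {
            throw new IllegalArgumentException("buckets must be in 1..100");
        }
        this.buckets = buckets;
    }

    /** Deterministic bucket: the same logical key always maps to the same salt. */
    public int bucketOf(String logicalKey) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(logicalKey.hashCode(), buckets);
    }

    /** Physical row key = two-digit salt + "_" + logical key, e.g. "07_user42". */
    public String salt(String logicalKey) {
        return String.format("%02d_%s", bucketOf(logicalKey), logicalKey);
    }

    /** Point lookups (e.g. rowkey IN (...)): recompute every salted key for a single batch get. */
    public List<String> batchGetKeys(List<String> logicalKeys) {
        List<String> keys = new ArrayList<>();
        for (String k : logicalKeys) {
            keys.add(salt(k));
        }
        return keys;
    }

    /** Range predicate [startKey, stopKey) on the logical key: one scan range per bucket. */
    public List<String[]> scanRanges(String startKey, String stopKey) {
        List<String[]> ranges = new ArrayList<>();
        for (int b = 0; b < buckets; b++) {
            String prefix = String.format("%02d_", b);
            ranges.add(new String[] {prefix + startKey, prefix + stopKey});
        }
        return ranges;
    }
}
```

With 10 buckets, for example, the logical range `a` to `m` becomes ten scans such as `03_a` to `03_m`. Prefixing preserves the logical ordering inside each bucket because every key there shares the same prefix.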
Requires manual creation of JSON files for table metadata, including RowKey formats and ranges, which adds overhead and risk of errors.
CREATE TABLE functionality is listed as 'SUPPORT LATER', meaning users cannot dynamically create tables via SQL, limiting flexibility.
Requires PrestoSQL 315 or later and can conflict with libraries such as Guava, which may require custom fixes and recompilation, as noted in the troubleshooting section.
Enabling client-side scanning involves managing HBase snapshots, dealing with compression codec issues, and cleaning up snapshots periodically to avoid hitting the 65,536 snapshot limit.
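The periodic snapshot cleanup mentioned above could follow a simple retention policy. The Java sketch below is hypothetical: `SnapshotRetention`, its in-memory `Snapshot` type, and the age and count thresholds are assumptions, not part of the connector. It only decides which snapshots to drop; the actual deletion would go through the HBase admin API (for example `Admin.deleteSnapshot`).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical retention policy for snapshots created for client-side scans. */
public final class SnapshotRetention {
    /** Minimal stand-in for snapshot metadata (assumed: a name and a creation time). */
    public static final class Snapshot {
        public final String name;
        public final Instant createdAt;

        public Snapshot(String name, Instant createdAt) {
            this.name = name;
            this.createdAt = createdAt;
        }
    }

    private SnapshotRetention() {}

    /**
     * Returns the snapshots to delete, oldest first: every snapshot older than
     * maxAge, plus the oldest remaining ones if more than maxCount would survive.
     */
    public static List<Snapshot> toDelete(List<Snapshot> snapshots, Instant now,
                                          Duration maxAge, int maxCount) {
        if (maxCount < 0) {
            throw new IllegalArgumentException("maxCount must be >= 0");
        }
        List<Snapshot> sorted = new ArrayList<>(snapshots);
        sorted.sort(Comparator.comparing((Snapshot s) -> s.createdAt)); // oldest first

        Instant cutoff = now.minus(maxAge);
        List<Snapshot> doomed = new ArrayList<>();
        List<Snapshot> kept = new ArrayList<>();
        for (Snapshot s : sorted) {
            if (s.createdAt.isBefore(cutoff)) {
                doomed.add(s); // expired by age
            } else {
                kept.add(s);
            }
        }

        // kept is still oldest-first, so the excess entries at the front are the oldest survivors.
        int excess = kept.size() - maxCount;
        for (int i = 0; i < excess; i++) {
            doomed.add(kept.get(i));
        }
        return doomed;
    }
}
```

Keeping `maxCount` well below the 65,536 limit leaves headroom for snapshots created between cleanup runs.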