At the HBase special session of the 2017 Cloud Computing Conference, Alibaba technical expert Tian Mu gave a talk on the practice and improvement of Ali-HBase SQL. The talk first explains why SQL is needed, then introduces SQL on HBase, then focuses on the optimizations and improvements made in Ali-HBase SQL, and finally looks ahead to future work. The highlights follow.

Why is SQL needed?




Hashing




Bucketing
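
With the native API, the application has to hash and bucket row keys itself. The sketch below shows the general idea (often called salting) under stated assumptions: a one-byte bucket prefix and an MD5-based hash chosen purely for illustration. `NUM_BUCKETS`, `bucket_of`, `salted_key` and `scan_start_keys` are hypothetical names, not Ali-HBase or Phoenix APIs.

```python
import hashlib
from collections import Counter

# Hypothetical bucket count (Phoenix exposes a comparable knob as SALT_BUCKETS).
NUM_BUCKETS = 8


def bucket_of(row_key: bytes, num_buckets: int = NUM_BUCKETS) -> int:
    """Hash the row key and map the hash onto one of num_buckets buckets.

    MD5 is an illustrative assumption, not the hash Ali-HBase uses.
    """
    if not 1 <= num_buckets <= 256:
        raise ValueError("bucket id must fit in a single byte")
    digest = hashlib.md5(row_key).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def salted_key(row_key: bytes, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the row key with a one-byte bucket id so that monotonically
    increasing keys (timestamps, order ids) spread across regions."""
    return bytes([bucket_of(row_key, num_buckets)]) + row_key


def scan_start_keys(prefix: bytes, num_buckets: int = NUM_BUCKETS) -> list[bytes]:
    """A scan on the logical key must fan out to every bucket:
    one sub-scan per bucket prefix."""
    return [bytes([b]) + prefix for b in range(num_buckets)]


if __name__ == "__main__":
    keys = [f"order-{i:06d}".encode() for i in range(1000)]
    # Show how sequential keys are distributed over the buckets.
    print(sorted(Counter(salted_key(k)[0] for k in keys).items()))
```

The trade-off is visible in `scan_start_keys`: writes spread evenly, but every range scan becomes `NUM_BUCKETS` sub-scans that the client must merge, which is the kind of bookkeeping a SQL layer can hide.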

HBase Native API implementation




SQL on HBase

Product reports




IoT device information storage

Ali-HBase SQL




Performance optimization

The goal is to optimize simple requests as far as possible, so that their performance is within 5% of the HBase Native API. The overhead of SQL relative to the HBase API is most noticeable in single-row read and write scenarios. The client therefore caches metadata: column names, data types, table attributes, index information, and so on. Metadata update policy: rather than refreshing metadata on every request, the client refreshes it periodically, identifies the latest version by its version number, and reloads the metadata only if the cached copy is not the latest. This optimized cache update strategy is used for UPSERT.
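
A minimal sketch of the version-checked metadata cache described above. The class, its parameters and the two fetch callbacks (`fetch_version`, `fetch_meta`) are hypothetical stand-ins for calls to the metadata server; the refresh interval is an assumed value, and the real client's protocol is not described in the talk.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TableMeta:
    version: int
    columns: dict[str, str]                           # column name -> data type
    indexes: list[str] = field(default_factory=list)  # index names


class MetadataCache:
    """Client-side metadata cache (hypothetical sketch).

    Within refresh_interval_s, cached metadata is returned with no server
    round trip. After the interval, only the cheap version number is
    fetched; the full metadata is reloaded only if that version changed.
    """

    def __init__(self,
                 fetch_version: Callable[[str], int],
                 fetch_meta: Callable[[str], TableMeta],
                 refresh_interval_s: float = 60.0,  # assumed value
                 clock: Callable[[], float] = time.monotonic):
        self._fetch_version = fetch_version
        self._fetch_meta = fetch_meta
        self._interval = refresh_interval_s
        self._clock = clock
        # table name -> (cached metadata, time of last version check)
        self._entries: dict[str, tuple[TableMeta, float]] = {}

    def get(self, table: str) -> TableMeta:
        now = self._clock()
        entry = self._entries.get(table)
        if entry is None:
            meta = self._fetch_meta(table)        # first access: full load
        else:
            meta, last_check = entry
            if now - last_check < self._interval:
                return meta                       # fast path: no RPC at all
            if self._fetch_version(table) != meta.version:
                meta = self._fetch_meta(table)    # stale: reload everything
        self._entries[table] = (meta, now)
        return meta

    def invalidate(self, table: str) -> None:
        """Force a full reload on next access, e.g. after a schema error."""
        self._entries.pop(table, None)
```

The design keeps the common path of a single-row UPSERT free of metadata round trips, and pays only a version-number check once per interval, which is what lets simple requests approach native-API latency.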

Future work

Future work includes column name mapping and ImmutableDataEncoding support, which we are currently investigating; for large wide tables they can save 1/3 to 1/2 of storage space. The limitation is that such data can only be written, not modified. We also need to simplify how users apply for the SQL client, which is currently a frustrating process. We plan to support a Query Server mode and a thin client so that product iteration no longer depends on client upgrades: users can benefit from our improvements without upgrading. We will support distributed Sequence, and eventually distributed SQL capabilities as well. Finally, we plan to make index consistency optional by offering an asynchronous global secondary index. In some scenarios, such as logging, users do not need strong consistency; it is acceptable for the index to become consistent within 1 minute. Updating the global index asynchronously further reduces the cost of each write.
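
To illustrate the trade-off behind an asynchronous global secondary index, here is an in-memory model, not the Ali-HBase design: the main table is written synchronously, while index maintenance is queued and applied by a single background worker, so the index is only eventually consistent. `AsyncIndexedTable`, `upsert`, `lookup` and `wait_for_index` are hypothetical names; a real system would also need a durable queue and failure recovery, which this sketch omits.

```python
import queue
import threading


class AsyncIndexedTable:
    """Main table updated synchronously; secondary index updated asynchronously."""

    def __init__(self, index_column: str):
        self.rows: dict[str, dict] = {}        # row key -> row
        self.index: dict[str, set[str]] = {}   # indexed value -> row keys
        self._col = index_column
        self._pending: queue.Queue = queue.Queue()
        self._lock = threading.Lock()
        # One worker applies index updates in the order the writes were queued.
        threading.Thread(target=self._apply_loop, daemon=True).start()

    def upsert(self, key: str, row: dict) -> None:
        # The caller only waits for the main-table write; index work is deferred.
        with self._lock:
            old = self.rows.get(key)
            self.rows[key] = row
            self._pending.put((key, old, row))  # enqueue under lock to keep order

    def _apply_loop(self) -> None:
        while True:
            key, old, new = self._pending.get()
            with self._lock:
                if old is not None:             # remove the stale index entry
                    self.index.get(old[self._col], set()).discard(key)
                self.index.setdefault(new[self._col], set()).add(key)
            self._pending.task_done()

    def wait_for_index(self) -> None:
        """Block until all queued index updates have been applied."""
        self._pending.join()

    def lookup(self, value: str) -> set[str]:
        """Query the index; may lag behind the main table until it catches up."""
        with self._lock:
            return set(self.index.get(value, set()))
```

For a logging workload, a `lookup` issued immediately after `upsert` may miss the newest rows, which is exactly the relaxation that removes index maintenance from the write path.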