Hello, I’m Koha

Recently I used the HBase database in my work, so I wrote up my HBase notes to share with everyone. HBase actually covers a lot of ground; below are the technical points that Koha believes are most useful in day-to-day work. I hope they help you.

It can be said that the Internet is built on all kinds of databases. Today there are several mainstream kinds: relational databases represented by MySQL and their distributed solutions, cache databases represented by Redis, retrieval databases represented by ES, and distributed persistent KV databases. In the open-source field, especially in China, HBase is almost the default choice for a distributed persistent KV database. HBase is used in a wide range of business scenarios, such as user profiling, real-time and offline recommendation, real-time risk control, social feed streams, product order history, social chat, monitoring systems, and user behavior logs.

Preface

No matter what technology products we use, each of us produces a large amount of data, and storing and querying that data is difficult for a small single-machine database; this is why distributed big-data stores such as HBase emerged. HBase is a column-oriented database management system built on top of the Hadoop file system, with a data model similar to Google's Bigtable. As part of the Hadoop ecosystem, HBase stores its data on HDFS, and clients can randomly access that data through HBase. It has the following features:

Complex transactions are not supported, only row-level transactions are supported, that is, single-row data reads and writes are atomic;

Because HDFS is used as the underlying storage, HBase, like HDFS, can store structured, semi-structured, and unstructured data;

Horizontal scaling by adding machines;

Support data sharding;

Support automatic failover between RegionServers;

Easy to use Java client API;

Support for BlockCache and Bloom filters;

Filters support predicate pushdown.

How HBase works

Concepts

HBase is a distributed, column-oriented (or more precisely, column-family-oriented), open-source database. HDFS provides reliable underlying data storage for HBase, MapReduce provides high-performance computing capabilities, and ZooKeeper provides stable coordination services and a failover mechanism. That is why we say HBase is a distributed database solution for high-speed storage and access of massive amounts of data on large numbers of cheap machines.

Column-oriented storage

Let's first look at row-oriented storage in a traditional relational database, shown in the diagram below:

As you can see, only the first row (ID: 1, Koha's data) is fully filled in, while parts of Kohna's and Kohashi's rows are empty. In a row-oriented structure the schema is fixed: every row has the same columns, and a column you do not fill in is still stored as a blank. It cannot simply be left out.

Now look at a rendering of column-oriented storage in a non-relational database:

You can see that each column from the row-oriented layout has become a row in the column-oriented layout: Koha's original seven columns of data are now seven rows. Where those seven values previously shared a single primary key (ID: 1) in one row, in column storage there are seven rows, each carrying its own copy of the key, which is why Koha's ID: 1 appears seven times. The biggest advantage of this arrangement is that we do not need to store empty cells, which greatly saves space. And because selection rules in queries are defined by columns, the entire database is effectively self-indexing.

NoSQL and relational database comparison


RDBMS vs. Hbase

HBase stores data by column family. A column family can contain a large number of columns, and the column families themselves must be specified at table-creation time. To better understand HBase column families, compare a simple relational database table with an HBase table:

Major differences: an RDBMS has a fixed schema, SQL access, and multi-row transactions, and typically scales up on a single machine; HBase has sparse, schemaless columns grouped into column families, is accessed through its API (or SQL layers such as Phoenix), guarantees atomicity only per row, and scales out horizontally across cheap machines.
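To make the column-family model concrete, here is a minimal sketch using the HBase 2.x Java client (the table name user and family name info are invented for the example). Note that only the family is declared; individual columns such as info:name appear implicitly on write:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CreateTableExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Declare the table and its single column family "info";
            // columns like info:name or info:age are created implicitly on write.
            admin.createTable(TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("user"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("info"))
                    .build());
        }
    }
}
```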

HBase architecture

HBase consists of the Client, ZooKeeper, HMaster, HRegionServer, and HDFS.

Client

The Client uses the HBase RPC mechanism to communicate with the HMaster and HRegionServers: it talks to the HMaster for management operations and to HRegionServers for data reads and writes.

Zookeeper

HBase uses ZooKeeper to ensure Master high availability, monitor RegionServers, store the entry address of metadata, and maintain cluster configuration. Specifically:

  1. ZooKeeper ensures that only one Master runs in the cluster at a time. If the Master fails, a new Master is elected through the competition mechanism to provide services.

  2. ZooKeeper monitors the status of RegionServers. When a RegionServer goes online or offline, or becomes abnormal, the Master is notified via callback.

  3. ZooKeeper stores the entry address of the metadata.

When using HBase from a client, add the ZooKeeper IP addresses and node path to the configuration to set up a connection with ZooKeeper. The following code sketches how to set up a connection:
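(A minimal sketch; the ZooKeeper host names, client port, and znode path are placeholders for your own environment.)

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class HBaseConnectionExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // ZooKeeper quorum and client port of the target cluster
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        // znode under which HBase registers itself in ZooKeeper
        conf.set("zookeeper.znode.parent", "/hbase");

        // Connections are heavyweight and thread-safe: create one and reuse it.
        try (Connection connection = ConnectionFactory.createConnection(conf)) {
            System.out.println("connected: " + !connection.isClosed());
        }
    }
}
```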

HMaster

The main responsibilities of the Master node are as follows:

  1. Assign Regions to RegionServers

  2. Maintain load balancing for the entire cluster

  3. Maintain cluster metadata, discover failed Regions, and reassign them to healthy RegionServers; when a RegionServer fails, coordinate the splitting of its HLog

HRegionServer

An HRegionServer internally manages a series of HRegion objects, each corresponding to a contiguous range of rows in a Table. Within an HRegion, each Store manages one ColumnFamily (CF), and each Store contains one MemStore and zero or more StoreFiles. The Store is the core of HBase storage and consists of the MemStore and StoreFiles.

HLog

Data is first written to the Write-Ahead Log. The write logs of all Regions served by an HRegionServer are stored in the same HLog file. Data is not written directly to HDFS; it is cached and written in batches after a certain amount accumulates. After data has been persisted, a mark is written to the log.

MemStore

MemStore is an ordered in-memory cache. Newly written data goes into the MemStore first, and when the MemStore is full it is flushed into a StoreFile. When the number of StoreFiles reaches a threshold, a Compact is triggered, merging multiple StoreFiles into one. As compactions merge StoreFiles, the resulting files grow larger and larger; when the total size of all StoreFiles (HFiles) in the Region exceeds the threshold hbase.hregion.max.filesize, a Split is triggered, splitting the current Region into two. The parent Region goes offline, and the HMaster assigns the two newly split child Regions to appropriate HRegionServers, so that the load of the original Region is spread across two Regions.

Region addressing mode

Region addressing goes through ZooKeeper and the .META. table. The main steps are as follows:

  1. The Client asks ZooKeeper for the address of the RegionServer that hosts the .META. table.

  2. The Client queries the .META. table on that RegionServer for the address of the RegionServer where the target data resides, and caches the .META. information for faster access next time.

  3. The Client requests the data from the RegionServer where it resides.

HDFS

HDFS provides the final underlying data storage service for HBase, as well as high availability for HBase data (the HLog is stored on HDFS).

HBase components

Column Family

A Column Family is, as the name says, a family of columns. HBase stores data by column family, and a column family can contain any number of columns, giving flexible data access. Column families must be specified when an HBase table is created, just as a relational database requires specific columns at creation time. It is recommended to keep the number of column families at three or fewer; the typical case is a single column family.

Rowkey

A Rowkey is the same concept as a primary key in MySQL: HBase uses the Rowkey to uniquely identify a row of data. HBase supports only three query modes: single-row query by RowKey, range scan by RowKey, and full table scan.
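A sketch of all three modes with the HBase 2.x client (withStartRow/withStopRow replace the older setStartRow/setStopRow; the table and row keys are invented for the example):

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class QueryModes {
    static void queries(Connection conn) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("user"))) {
            // 1. Single-row query by RowKey
            Result row = table.get(new Get(Bytes.toBytes("user_0001")));

            // 2. Range scan over the RowKey interval [startRow, stopRow)
            Scan range = new Scan()
                    .withStartRow(Bytes.toBytes("user_0001"))
                    .withStopRow(Bytes.toBytes("user_0100"));
            try (ResultScanner scanner = table.getScanner(range)) {
                for (Result r : scanner) { /* process r */ }
            }

            // 3. Full table scan -- expensive; avoid on large tables
            try (ResultScanner scanner = table.getScanner(new Scan())) {
                for (Result r : scanner) { /* process r */ }
            }
        }
    }
}
```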

Region partition

Region: the concept of a Region is similar to that of a partition or shard in a relational database. HBase distributes the data of a large table across Regions by Rowkey, with each Region responsible for storage and access of a contiguous range. In this way, even a very large table has low latency, because it is cut into many Regions spread across machines.

TimeStamp versions

TimeStamp is the key to HBase's multi-version support. In HBase, different timestamp values identify different versions of the data for the same Rowkey. If the user does not specify a timestamp when writing, HBase automatically uses the server time. Data for the same Rowkey is sorted by timestamp in descending order, so the latest version is returned by default; users can read older versions by specifying a timestamp.
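A sketch of writing with an explicit timestamp and reading several versions back (HBase 2.x client, where readVersions replaces the older setMaxVersions; the table and column names are examples, and the family must be created with VERSIONS > 1 to retain history):

```java
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionExample {
    static void versions(Connection conn) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("user"))) {
            // Write with an explicit timestamp (otherwise the server time is used).
            Put put = new Put(Bytes.toBytes("user_0001"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("city"),
                    1700000000000L, Bytes.toBytes("Beijing"));
            table.put(put);

            // Read up to 3 versions of the cell, newest first.
            Get get = new Get(Bytes.toBytes("user_0001"));
            get.readVersions(3);
            Result result = table.get(get);
            for (Cell cell : result.getColumnCells(Bytes.toBytes("info"), Bytes.toBytes("city"))) {
                System.out.println(cell.getTimestamp());
            }
        }
    }
}
```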

Hbase write logic

Hbase Write Process

There are three main steps:

  1. The Client obtains the RegionServer of the Region to which the data will be written.

  2. The write request goes to the HLog, which is stored on HDFS and is used to restore data if the RegionServer fails.

  3. The write then goes to the MemStore; only when both the HLog and the MemStore have been written successfully is the write considered complete. The MemStore is later flushed to HDFS gradually.

MemStore flush

To improve write performance, HBase does not flush each write in the MemStore to disk immediately; it waits and flushes later. So what scenarios trigger a flush? They can be summarized as follows (a configuration sketch follows the list):

  1. A global parameter controls overall memory usage: a flush is triggered when all MemStores together occupy the maximum allowed share of the heap. The parameter is hbase.regionserver.global.memstore.upperLimit, by default 40% of the heap. Hitting this limit does not mean every MemStore is flushed completely; a second parameter, hbase.regionserver.global.memstore.lowerLimit (default 35% of the heap), controls the stopping point: flushing continues until total MemStore usage drops to 35% of the heap, then stops. This reduces the impact of flushing on live traffic and smooths the system load.

  2. When the size of a MemStore reaches hbase.hregion.memstore.flush.size, a flush is triggered; the default size is 128 MB.

  3. HLogs guarantee HBase data consistency; if there are too many HLog files, recovery takes too long, so HBase caps their number. When the number of HLogs reaches the maximum, a flush is forced. The parameter is hbase.regionserver.maxlogs; the default is 32.

  4. A flush can be triggered manually through the HBase shell or the Java API.

  5. A flush is triggered before a RegionServer shuts down normally; once all data has been flushed, the HLog is not needed for recovery.

  6. When a RegionServer fails, its Regions are migrated to healthy RegionServers; after a Region's data is restored, a flush is triggered, and the Region serves traffic only after the flush completes.
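These parameters live in hbase-site.xml. A sketch with the values mentioned above (the two global memstore limits use their pre-1.0 names as in the text; HBase 1.0+ renamed them to hbase.regionserver.global.memstore.size and hbase.regionserver.global.memstore.size.lower.limit):

```xml
<!-- hbase-site.xml: flush-related thresholds -->
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value>   <!-- flush when all MemStores reach 40% of the heap -->
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.35</value>  <!-- keep flushing until usage falls back to 35% -->
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value>  <!-- 128 MB per-MemStore flush threshold -->
</property>
<property>
  <name>hbase.regionserver.maxlogs</name>
  <value>32</value>  <!-- force flushes once this many HLogs accumulate -->
</property>
```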

The HBase middle tier

Phoenix is an open-source SQL middle layer for HBase that lets you operate on HBase data using standard JDBC. Before Phoenix, if you wanted to access HBase you could only call its Java API, which is cumbersome compared with a single line of SQL. The idea of Phoenix is to "put the SQL back in NoSQL": you can use standard SQL to work with HBase data, which also means you can operate on HBase through common persistence-layer frameworks such as Spring Data JPA or MyBatis.

The Phoenix query engine converts a SQL query into one or more HBase scans, executes them in parallel, and produces a standard JDBC result set. By working directly against the HBase API, with coprocessors and custom filters, it delivers millisecond latency for small queries and second-level latency for queries over tens of millions of rows. Phoenix also offers features that HBase itself lacks, such as secondary indexes. These advantages make Phoenix arguably the best SQL middle layer for HBase.
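A minimal sketch of querying through Phoenix's JDBC driver (the ZooKeeper quorum and the table user_table are placeholders; it assumes the Phoenix client jar is on the classpath):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PhoenixExample {
    public static void main(String[] args) throws Exception {
        // The JDBC URL points at the ZooKeeper quorum of the HBase cluster.
        String url = "jdbc:phoenix:zk1.example.com,zk2.example.com,zk3.example.com:2181";
        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT id, name FROM user_table WHERE id = ?")) {
            ps.setLong(1, 1L);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1) + " " + rs.getString(2));
                }
            }
        }
    }
}
```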

HBase Installation and Usage

Download the HBase tarball and decompress it.

Edit the hbase-env.sh file to configure JAVA_HOME:
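For example (the JDK path below is a placeholder for your own installation):

```sh
# conf/hbase-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```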

Configure hbase-site.xml:
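A minimal sketch for trying HBase out on a single node (the rootdir and ZooKeeper host are placeholders; a production cluster would point hbase.rootdir at HDFS and list a real ZooKeeper quorum):

```xml
<configuration>
  <!-- where HBase stores data; use an hdfs:// URI for a real cluster -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <!-- ZooKeeper quorum used by clients and servers -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
</configuration>
```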

Change the host name to your own, then start HBase. The web UI looks like this:

HBase command

Here are some HBase shell examples that Koha has compiled.

Command:
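A sketch of the most common shell operations (the table and column names are examples):

```sh
hbase shell                                 # start the interactive shell

create 'user', 'info'                       # table "user" with column family "info"
put 'user', 'row1', 'info:name', 'Koha'     # write one cell
get 'user', 'row1'                          # read one row
scan 'user'                                 # scan the whole table (use sparingly)
delete 'user', 'row1', 'info:name'          # delete one cell
disable 'user'                              # a table must be disabled before dropping
drop 'user'
```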

Using the HBase API

The commonly used client API classes include Connection, Table, Put, Get, Scan, and Delete.

The following is an example:
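(A minimal CRUD sketch with the HBase client API; it assumes the user table with column family info from the shell examples above.)

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseCrudExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("user"))) {
            // write one cell
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Koha"));
            table.put(put);

            // read it back
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));

            // delete the row
            table.delete(new Delete(Bytes.toBytes("row1")));
        }
    }
}
```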

HBase Application Scenarios

Object storage system

Medium Object Storage (HBase MOB) is a feature introduced in HBase 2.0.0 to address the poor performance of storing medium-sized files (roughly 100 KB to 10 MB) in HBase. It is suitable for storing images, documents, PDFs, and short videos in HBase.
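For example, a column family can be marked as MOB-enabled when the table is created (the table name and the 100 KB threshold below are illustrative):

```sh
# values larger than MOB_THRESHOLD bytes in this family are stored as MOBs
create 'docs', {NAME => 'f', IS_MOB => true, MOB_THRESHOLD => 102400}
```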

OLAP storage

Kylin stores its data in HBase at the bottom level because of HBase's high concurrency and massive storage capacity. Kylin's cube construction generates a large amount of pre-aggregated intermediate data with a high inflation rate, which places high demands on the storage capacity of the underlying database.

Phoenix is an SQL engine built on HBase that can be driven directly through the JDBC interface. Although it supports upsert operations, it is mostly used in OLAP scenarios.

Time series data

OpenTSDB is an application built on top of HBase that records and displays metric values at each point in time. It is generally used in monitoring scenarios.

User portrait system

This scenario exploits HBase's dynamic and sparse column characteristics: the number of dimensions describing a user is variable and may grow dynamically (hobbies, gender, address, and so on), and not every dimension has data for every user.

Message/order system

These systems require strong consistency and good read performance, both of which HBase can provide.

Feed stream storage

Feed stream systems read far more than they write and have a simple data model, high concurrency, pronounced traffic peaks and troughs, a need for persistent reliable storage, and a need for message ordering; features such as HBase's lexicographic ordering of Rowkeys make it a good fit for this scenario.

Hbase optimization

Partition in advance

By default, an HBase table is created with a single Region. When data is imported, all clients write to that one Region until it is large enough to split. One way to speed up batch writing is to create several empty Regions in advance, so that writes are load-balanced across the cluster according to the Region split points, as sketched below.
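A minimal sketch of pre-splitting at creation time (HBase 2.x Java client; the table name, family, and split points are invented for the example):

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitExample {
    static void createPreSplit(Connection conn) throws Exception {
        try (Admin admin = conn.getAdmin()) {
            // Four split points produce five empty Regions:
            // (-inf,2000), [2000,4000), [4000,6000), [6000,8000), [8000,+inf)
            byte[][] splitKeys = {
                Bytes.toBytes("2000"), Bytes.toBytes("4000"),
                Bytes.toBytes("6000"), Bytes.toBytes("8000")
            };
            admin.createTable(TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("orders"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("f"))
                    .build(), splitKeys);
        }
    }
}
```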

Rowkey optimization

Rowkeys in HBase are stored in lexicographic order. When designing Rowkeys, make full use of this sorting: store data that is frequently read together, or likely to be accessed at the same time, close together.

In addition, if Rowkeys are generated in ascending order, it is recommended to write them reversed rather than in their natural order, so that Rowkeys are evenly distributed. This balances the load across RegionServers; otherwise all new data piles up on a single RegionServer. The technique can be combined with the pre-splitting of tables described above; a small sketch follows.
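A sketch of the reversal idea (pure illustration; the id format is invented):

```java
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyUtil {
    // Reverse a monotonically increasing id so consecutive writes
    // land on different Regions instead of piling onto the last one.
    static byte[] reversedRowKey(long id) {
        String reversed = new StringBuilder(Long.toString(id)).reverse().toString();
        return Bytes.toBytes(reversed);
    }
    // e.g. 10000123 -> "32100001" and 10000124 -> "42100001":
    // adjacent ids now differ in their first byte.
}
```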

Reduce the number of column families

Do not define too many column families in a single table; currently HBase does not handle tables with more than two or three column families well. When one column family is flushed, its neighboring column families are also flushed due to the correlation effect, causing unnecessary I/O.

Caching strategies

When creating a table, you can call HColumnDescriptor.setInMemory(true) to keep the table's data in the RegionServer's cache, ensuring cache hits on reads (a combined example follows the next subsection).

Set the storage lifetime

When creating a table, you can call HColumnDescriptor.setTimeToLive(int timeToLive) to set a time-to-live for data in the table; expired data is deleted automatically.
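Both settings together, using the HColumnDescriptor API referenced above (this is the pre-2.0 descriptor API; the table and family names and the 7-day TTL are examples):

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;

public class CachingAndTtlExample {
    static void createCachedTable(Connection conn) throws Exception {
        try (Admin admin = conn.getAdmin()) {
            HColumnDescriptor cf = new HColumnDescriptor("info");
            cf.setInMemory(true);             // prefer keeping this family's blocks cached
            cf.setTimeToLive(7 * 24 * 3600);  // expire cells after 7 days (seconds)
            HTableDescriptor table = new HTableDescriptor(TableName.valueOf("user"));
            table.addFamily(cf);
            admin.createTable(table);
        }
    }
}
```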

Hard disk configuration

If each RegionServer manages 10 to 1,000 Regions of 1-2 GB each, then each server stores at minimum 10 GB (10 x 1 GB) and at most roughly 2 TB (1,000 x 2 GB), which becomes 6 TB with 3 replicas. Two options: three 2 TB disks, or twelve 500 GB disks. When bandwidth is sufficient, the latter provides greater throughput, finer-grained redundancy, and faster recovery from a single-disk failure.

Allocate proper memory to RegionServer service

As long as it does not affect other services, the bigger the better. For example, add export HBASE_REGIONSERVER_OPTS="-Xmx16000m $HBASE_REGIONSERVER_OPTS" to the end of hbase-env.sh in the HBase conf directory, where 16000m is the heap size allocated to the RegionServer.

Number of data replicas

The number of replicas is proportional to read performance, inversely proportional to write performance, and affects availability. There are two ways to configure it. One is to copy hdfs-site.xml into the HBase conf directory and set dfs.replication to the desired replica count; this change applies to all HBase user tables. The other is to set the replica count per column family at table-creation time; the default is 3, and it applies only to the column families it is set on.

WAL (write ahead log)

The WAL is enabled by default. Disabling it improves write performance, but data may be lost if the RegionServer that accepted the write goes down before flushing. When writing through the Java API, you can control the WAL per Put via Put.setWriteToWAL(boolean) (newer clients use Put.setDurability; a combined sketch follows the next section).

Batch write

Put in HBase supports both single and batch inserts. In general, batch writes are faster and save round-trip network overhead. When calling the Java API from the client, collect the Puts into a list and call Table.put(List&lt;Put&gt;) to write them in one batch, as in the sketch below.
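A sketch combining the two points above: batching Puts into one call, and optionally skipping the WAL when durability can be traded for speed (the table and column names are examples; older clients expressed the WAL switch as put.setWriteToWAL(false)):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchWriteExample {
    static void batchWrite(Connection conn) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("user"))) {
            List<Put> puts = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                Put put = new Put(Bytes.toBytes("row" + i));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("v"), Bytes.toBytes(i));
                // Only skip the WAL if losing these rows on a crash is acceptable.
                put.setDurability(Durability.SKIP_WAL);
                puts.add(put);
            }
            table.put(puts); // one batched round trip instead of 1000
        }
    }
}
```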

Final thoughts

Once you understand HBase, you will find its design very similar to that of Elasticsearch: for example, HBase's flush-and-compact mechanism closely mirrors Elasticsearch's segment flush and merge, which makes it easy to understand.

In essence, HBase is positioned as a distributed storage system, while Elasticsearch is a distributed search engine. The two are not the same thing, but they are complementary. HBase's search capability is limited: it supports only Rowkey-based indexing, and advanced features such as secondary indexes must be built separately. That is why HBase and Elasticsearch are often combined to provide storage plus search: HBase compensates for Elasticsearch's weaker storage capability, and Elasticsearch compensates for HBase's weaker search capability.

And it's not just HBase and Elasticsearch: distributed frameworks and systems of any kind share certain commonalities, differing mainly in their concerns. Koha's feeling is that when learning a piece of distributed middleware, you should first understand its core concern, then compare it with other middleware, extracting the commonalities and the distinctive features to further deepen your understanding.