Facing Massive Data Storage: How to Ensure the Efficiency and Stability of HBase Clusters

On September 15, 2018, Deng Jie, a senior big data engineer in the Data Platform Department of Ping An Technology, gave a talk titled “HBase Application and Practice” at the fifth MeetUp of the China HBase Technology Community, “HBase Application and Development”. IT Dakashuo, the exclusive video partner, publishes this article with the review and authorization of the organizers and the speaker.

Word count: 3315 | Reading time: 9 minutes

To watch the full video of the talk and view the slides, visit t.cn/E23igdc.

Abstract

This talk first describes how Ping An Technology currently uses HBase and which problems it has solved for users, and then explains how to ensure the efficiency and stability of HBase clusters.

HBase usage at Ping An Technology

The current state of HBase usage at Ping An Technology can be described from two aspects: the size of the clusters and the volume of data they hold, and the application scenarios. At present, our HBase clusters run on more than 300 physical machines and hold about 2 PB of data.

What problems have been solved for users

Users adopting HBase are first concerned with storing massive amounts of data, then with performance and reliability, and finally with data migration.

From the users' perspective, with a traditional database they cannot predict how their business scenarios will evolve, so they cannot judge how much data they will have to handle. We therefore advise them to store their data in HBase clusters. HBase supports online capacity expansion: even if data grows rapidly over a short period, the cluster can be scaled out horizontally to meet demand.

With a traditional database, users run into many problems with maintenance and expansion; after migrating to HBase, both become much easier.

Client optimization

Performance and high availability are also important to users. Performance depends mainly on how applications call the HBase cluster.

The figure in the slides lists several common optimizations. The first concerns scan operations issued from the application layer. When a client scans HBase, the results come back over multiple RPC requests, so when a large amount of data is requested, you can tune the scanner parameters to reduce the number of RPC round trips and thus the total time.
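To make the effect of scanner batching concrete, here is a toy Python model (not HBase client code). In the Java client the number of rows returned per scanner RPC is set with `Scan.setCaching(int)` or the `hbase.client.scanner.caching` setting; the model assumes each RPC returns exactly that many rows and ignores region boundaries and byte-size limits such as `Scan.setMaxResultSize`.

```python
import math

def scan_rpc_count(total_rows: int, caching: int) -> int:
    """Toy estimate of scanner round trips: each next() RPC returns up to
    `caching` rows. Ignores region boundaries and result-size limits."""
    if caching < 1:
        raise ValueError("caching must be >= 1")
    # One RPC per full or partial batch of rows.
    return math.ceil(total_rows / caching)

for caching in (1, 100, 1000):
    print(f"caching={caching}: {scan_rpc_count(100_000, caching)} RPCs")
# caching=1: 100000 RPCs
# caching=100: 1000 RPCs
# caching=1000: 100 RPCs
```

Larger values trade fewer round trips for more client memory and a longer wait per call, so the value is usually tuned rather than simply maximized.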

Another optimization concerns GET. HBase can fetch rows one at a time or with a batch GET. Batch GETs are generally recommended, again because they reduce the number of RPC round trips.
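The saving from batch GETs can be modeled the same way. In the Java client a batch is issued with `Table.get(List<Get>)`, and the client groups the requests by the RegionServer that hosts each row. The sketch below simplifies this to one multi-request per server; the row-to-server layout in `locate` is hypothetical.

```python
from collections import defaultdict

def single_get_rpcs(row_keys):
    # One RPC per Get when rows are fetched one at a time.
    return len(row_keys)

def batch_get_rpcs(row_keys, locate):
    """Toy model of a batch GET: group row keys by hosting server and
    count one multi-request per server."""
    by_server = defaultdict(list)
    for key in row_keys:
        by_server[locate(key)].append(key)
    return len(by_server)

def locate(key):
    # Hypothetical layout: keys before "n" live on rs1, the rest on rs2.
    return "rs1" if key < "n" else "rs2"

keys = ["apple", "banana", "kiwi", "pear", "zebra"]
print(single_get_rpcs(keys), batch_get_rpcs(keys, locate))  # 5 2
```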

Next comes column family optimization. In HBase, the data of one column family is stored in its own directory, separately from the other column families. If a read specifies only the row key and no column family, every column family is searched independently, which is less efficient than reading a specified column family; the more column families a table has, the greater the impact.
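Because each column family is kept as its own store with its own files, a read that names its families touches fewer of them. In the Java client this is done with `Get.addFamily` or `Scan.addFamily`. The toy function below only models which stores a read has to search; the family names are hypothetical.

```python
def stores_to_search(table_families, requested=None):
    """Return the column-family stores a read must search.
    With no family requested, every store of the table is searched."""
    if not requested:
        return sorted(table_families)
    unknown = set(requested) - set(table_families)
    if unknown:
        raise KeyError(f"unknown column families: {sorted(unknown)}")
    return sorted(set(requested))

families = {"info", "stats", "logs"}
print(stores_to_search(families))            # ['info', 'logs', 'stats']
print(stores_to_search(families, ["info"]))  # ['info']
```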

The fourth is to disable caching. If a client suddenly loads a large amount of data without disabling block caching, hot data may be evicted from the cache. Other services that then query HBase have to load that data again, which adds latency.
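The eviction effect can be reproduced with a small LRU cache standing in for the RegionServer block cache. In the Java client, caching is disabled per request with `Scan.setCacheBlocks(false)`; the cache size and block names below are made up, and the real block cache is more elaborate than a plain LRU (for example, it separates single-access and multi-access blocks).

```python
from collections import OrderedDict

class LRUBlockCache:
    """Tiny LRU cache standing in for a RegionServer block cache."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def read(self, block_id: str, cache_blocks: bool = True) -> bool:
        """Return True on a cache hit. On a miss, cache the block only
        when cache_blocks is True (like Scan.setCacheBlocks)."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return True
        if cache_blocks:
            self.blocks[block_id] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used
        return False

def hot_blocks_left(cache_blocks: bool) -> int:
    cache = LRUBlockCache(capacity=3)
    hot = ["hot1", "hot2", "hot3"]
    for block in hot:
        cache.read(block)  # other services warm the cache with hot data
    for i in range(10):  # one client's bulk read of cold blocks
        cache.read(f"cold{i}", cache_blocks=cache_blocks)
    return sum(block in cache.blocks for block in hot)

print(hot_blocks_left(True), hot_blocks_left(False))  # 0 3
```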

Server-level optimization

Several common optimizations apply at the server level. HBase offers two ways to balance regions. The first is the balance_switch command, which takes a parameter: true enables automatic balancing, and false disables it.

The second is the balancer command, which triggers a balancing pass manually. This is needed, for example, when a RegionServer is restarted after crashing and its regions are not rebalanced within the balancing interval, or when a new RegionServer is added and no regions are moved onto it. If automatic balancing through balance_switch does not take effect, run balancer to force a rebalance.
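The two commands are run from the HBase shell as `balance_switch true` and `balancer`. To show what a rebalance does, here is a greedy toy balancer that only equalizes region counts after a new server joins. HBase's default balancer (the StochasticLoadBalancer) weighs many more factors, so treat this as an illustration, not its algorithm; the server and region names are hypothetical.

```python
def balance(assignment):
    """Greedy toy balancer: move one region at a time from the most-loaded
    server to the least-loaded until counts differ by at most one.
    Mutates `assignment` (server -> list of regions) and returns the moves."""
    moves = []
    while True:
        most = max(assignment, key=lambda s: len(assignment[s]))
        least = min(assignment, key=lambda s: len(assignment[s]))
        if len(assignment[most]) - len(assignment[least]) <= 1:
            return moves
        region = assignment[most].pop()
        assignment[least].append(region)
        moves.append((region, most, least))

# A newly added server "rs3" starts with no regions.
cluster = {"rs1": ["r1", "r2", "r3", "r4"],
           "rs2": ["r5", "r6", "r7", "r8"],
           "rs3": []}
moves = balance(cluster)
print(moves)  # [('r4', 'rs1', 'rs3'), ('r8', 'rs2', 'rs3')]
```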

The second optimization concerns the BlockCache. When the cache hit ratio is low, you can enable off-heap memory for the cache to raise the hit ratio, which also reduces GC pressure.
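Deciding when to enlarge the cache or move it off-heap starts from the hit ratio. A minimal helper, assuming the hit and miss counters are read from RegionServer metrics (for example `blockCacheHitCount` and `blockCacheMissCount`); the 0.8 threshold is a hypothetical alert level, not an HBase default. The off-heap cache itself is configured in `hbase-site.xml`, for example with `hbase.bucketcache.ioengine` set to `offheap` plus `hbase.bucketcache.size`.

```python
def block_cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of block reads served from cache (0.0 if no reads yet)."""
    total = hits + misses
    return hits / total if total else 0.0

def low_hit_ratio(hits: int, misses: int, threshold: float = 0.8) -> bool:
    # threshold is a hypothetical alerting level chosen for illustration.
    return block_cache_hit_ratio(hits, misses) < threshold

print(block_cache_hit_ratio(90, 10), low_hit_ratio(50, 50))  # 0.9 True
```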

The third optimization concerns compaction, which merges data files and helps maintain data locality. In production, it is advisable to avoid automatic compactions, because they consume cluster I/O and slow down application reads and writes. Instead, trigger compactions manually, for example over the weekend or during low-traffic periods.
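Time-based major compactions can be switched off by setting `hbase.hregion.majorcompaction` to 0, after which a scheduler triggers them, for example with the shell command `major_compact 'table_name'`. The helper below is a hypothetical sketch of the scheduling check: it decides whether a given time falls into the low-traffic window, including windows that wrap past midnight and whole weekends.

```python
from datetime import datetime

def in_low_traffic_window(now, start_hour, end_hour, weekends=True):
    """Return True if `now` falls inside the maintenance window
    [start_hour, end_hour), which may wrap past midnight (e.g. 22 -> 6).
    With weekends=True, all of Saturday and Sunday also qualify."""
    if weekends and now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    hour = now.hour
    if start_hour <= end_hour:
        return start_hour <= hour < end_hour
    return hour >= start_hour or hour < end_hour

# Monday 2018-09-17 at 23:00 and at noon, with a 22:00-06:00 window.
print(in_low_traffic_window(datetime(2018, 9, 17, 23), 22, 6))  # True
print(in_low_traffic_window(datetime(2018, 9, 17, 12), 22, 6))  # False
```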

Two parameters can be tuned for compaction. Because the number of compaction threads defaults to 1, compaction takes a long time when there is a large amount of data. Adjust these parameters to speed up compaction according to the size of the cluster and how much impact the applications on it can tolerate.
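In the HBase versions we are familiar with, the thread pools in question are set by `hbase.regionserver.thread.compaction.large` and `hbase.regionserver.thread.compaction.small`, each defaulting to 1. The toy model below (job durations are invented) estimates how total compaction time shrinks as threads are added, assigning the longest remaining job to whichever thread frees up first.

```python
import heapq

def compaction_makespan(durations, threads):
    """Estimate wall-clock time to finish all compaction jobs on a pool
    of `threads` workers, using longest-job-first greedy assignment."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    workers = [0.0] * threads  # time at which each worker becomes free
    heapq.heapify(workers)
    for d in sorted(durations, reverse=True):
        # Give the next-longest job to the worker that frees up first.
        heapq.heappush(workers, heapq.heappop(workers) + d)
    return max(workers)

jobs = [30, 20, 20, 10, 10, 10]  # minutes, hypothetical
print(compaction_makespan(jobs, 1), compaction_makespan(jobs, 3))  # 100.0 40.0
```

More threads shorten the window but raise the instantaneous I/O load, which is why the thread count is tied to cluster size and application tolerance.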

Reliability is another major concern for users. HBase supports active/standby master switchover, so the master is not a single point of failure: when the active master fails, a backup master takes over immediately, switches its status to active, and starts serving requests.

Data migration

Data migration falls into two scenarios: migrating HBase data between HBase clusters, and migrating Hive data into HBase.

In the first case, the data format is the same on both clusters, so you can copy the data directly with distcp. Because distcp runs as a MapReduce job, specify the queue name for the job.
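A sketch of the distcp invocation, built as an argument list in Python so it can be passed to `subprocess.run`. The queue is set with the standard `mapreduce.job.queuename` property and `-m` caps the number of map tasks; the NameNode addresses, table path and queue name are hypothetical. Generic `-D` options must come before distcp's own options.

```python
def distcp_command(src, dst, queue, max_maps=None):
    """Build a `hadoop distcp` argument list that submits the copy job
    to the given YARN queue."""
    cmd = ["hadoop", "distcp", f"-Dmapreduce.job.queuename={queue}"]
    if max_maps is not None:
        cmd += ["-m", str(max_maps)]  # limit simultaneous map tasks (copies)
    return cmd + [src, dst]

# Hypothetical source/target NameNodes, table directory and queue name.
cmd = distcp_command(
    "hdfs://nn-a.example:8020/hbase/data/default/orders",
    "hdfs://nn-b.example:8020/hbase/data/default/orders",
    queue="migration",
    max_maps=20,
)
print(" ".join(cmd))
```

On a node with the Hadoop client installed, the job would then be started with `subprocess.run(cmd, check=True)`.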

Note the following four items during migration.

Resources: distcp transfers data with MapReduce on YARN, so make sure the cluster has enough free resources before migrating.
Firewall: the two HBase clusters must be able to reach each other's ports (check with telnet), such as the NameNode (NN) and DataNode (DN) ports.
Metadata: after copying, use HBase hbck to restore the metadata.
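The firewall check can be scripted instead of running telnet by hand. This sketch simply attempts a TCP connection to each endpoint; the ports mentioned below are common defaults (8020 for NameNode RPC, 9866 for DataNode data transfer in Hadoop 3, 50010 in Hadoop 2), so substitute the values from your own cluster configuration.

```python
import socket

def port_reachable(host, port, timeout=3.0):
    """Python equivalent of a quick `telnet host port` check."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False

def check_ports(endpoints):
    """Map each (host, port) endpoint to whether it accepted a connection."""
    return {ep: port_reachable(*ep) for ep in endpoints}
```

For example, `check_ports([("nn.cluster-b.example", 8020), ("dn1.cluster-b.example", 9866)])`, with hypothetical host names, returns a dict mapping each endpoint to True or False.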

The slides show an exception from a cross-cluster migration. It occurs when an HDFS file has not been closed and is still being written: distcp checks the file length on each copy, and if the file is not closed, this exception is thrown.

In this case, first check the state of the file, then close it and migrate the data again. Closing the file may itself fail with an exception; repeat the close until it succeeds.
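The close-and-retry step can be wrapped in a small retry loop. Files still open for writing can be listed with `hdfs fsck <path> -openforwrite`, and one way to force such a file closed is HDFS lease recovery, `hdfs debug recoverLease -path <file>` (Hadoop 2.7 and later). The wrapper below assumes that command exits with status 0 on success, which you should verify for your version; the retry helper takes any callable so it can be tested without a cluster.

```python
import subprocess
import time

def recover_lease(path: str) -> bool:
    """Ask the NameNode to recover the lease on (close) an HDFS file.
    Assumes the `hdfs` CLI is on PATH and exits 0 on success."""
    result = subprocess.run(
        ["hdfs", "debug", "recoverLease", "-path", path],
        capture_output=True, text=True,
    )
    return result.returncode == 0

def retry_until_closed(close_file, max_attempts=10, delay_s=2.0):
    """Call close_file() until it returns True; return the attempt count.
    Raises RuntimeError if the file is still open after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        if close_file():
            return attempt
        if attempt < max_attempts:
            time.sleep(delay_s)  # give the NameNode time before retrying
    raise RuntimeError(f"file still open after {max_attempts} attempts")
```

For the file named in the distcp error, the call would be `retry_until_closed(lambda: recover_lease(path))`, followed by re-running the copy.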

There are two ways to migrate Hive data to HBase. The first requires no coding: generate HFiles in cluster A, copy them to cluster B with distcp, and load them into the HBase table with HBase BulkLoad.
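Bulk loading works because HFiles are written already sorted and each one falls within a single region; when the files are generated with MapReduce, `HFileOutputFormat2.configureIncrementalLoad` sets up a total-order partitioner for this. The pure-Python sketch below imitates only that partitioning step, grouping sorted row keys by hypothetical region split keys.

```python
from bisect import bisect_right

def partition_for_bulkload(rows, split_keys):
    """Sort (row_key, value) pairs and group them by target region.
    split_keys are the sorted start keys of every region after the first,
    so region i covers [split_keys[i-1], split_keys[i])."""
    regions = [[] for _ in range(len(split_keys) + 1)]
    for key, value in sorted(rows):
        # A key equal to a split key belongs to the region that starts there.
        regions[bisect_right(split_keys, key)].append((key, value))
    return regions

rows = [("m", 1), ("a", 2), ("z", 3), ("g", 4)]
print(partition_for_bulkload(rows, ["g", "p"]))
# [[('a', 2)], [('g', 4), ('m', 1)], [('z', 3)]]
```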

A more advanced method is to write an application that uses the HBase API to migrate the data directly through BulkLoad.

How do I ensure the efficiency and stability of an HBase cluster

Ensuring the efficiency and stability of an HBase cluster requires a monitoring system and a repair mechanism, and some special cases need dedicated handling.

Let’s start with the monitoring system. By collecting all HBase metrics, you can see the health of the entire cluster. The metrics can be collected from each node through the interface that the RegionServer exposes, and the core metrics plotted.
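As an illustration of metric collection, assume the interface is the JSON `/jmx` endpoint served by the RegionServer web UI (port 16030 by default in HBase 1.x and 2.x). The sketch extracts a few core metrics from such a response; the bean and metric names are those we have seen in RegionServer output, so check them against your version. Fetching the document, for example with `urllib.request.urlopen`, is left out so the parser can be tested offline.

```python
import json

SERVER_BEAN = "Hadoop:service=HBase,name=RegionServer,sub=Server"
CORE_METRICS = ("regionCount", "readRequestCount", "writeRequestCount")

def core_metrics(jmx_json: str) -> dict:
    """Extract core metrics from a RegionServer /jmx JSON response.
    Returns an empty dict if the server bean is not present."""
    for bean in json.loads(jmx_json).get("beans", []):
        if bean.get("name") == SERVER_BEAN:
            return {name: bean.get(name) for name in CORE_METRICS}
    return {}

# Trimmed, made-up sample of a /jmx response.
sample = json.dumps({"beans": [
    {"name": "java.lang:type=Memory"},
    {"name": SERVER_BEAN, "regionCount": 120,
     "readRequestCount": 5000, "writeRequestCount": 700},
]})
print(core_metrics(sample))
```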

For the repair mechanism, the monitoring system and the repair system must work together: the monitoring system detects and reports problems, and the repair system then fixes them automatically, for example by restoring a process that has died or by correcting region load imbalance.
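The division of labor between monitoring and repair can be sketched as a loop that runs health checks, invokes a matching repair action for each failure, and re-checks to confirm the fix. The check and repair names are hypothetical placeholders for real probes (such as process liveness or region balance) and actions (such as restarting a RegionServer or running the balancer).

```python
def monitor_and_repair(checks, repairs):
    """Run every health check; for each failure, call the matching repair
    and re-run the check. Returns {check_name: fixed?} for failed checks."""
    results = {}
    for name, check in checks.items():
        if check():
            continue  # healthy, nothing to do
        repair = repairs.get(name)
        if repair is None:
            results[name] = False  # no automatic fix: needs a human
            continue
        repair()
        results[name] = check()  # confirm the repair actually worked
    return results

# Hypothetical demo: a RegionServer process is down and can be restarted.
state = {"regionserver_up": False, "regions_balanced": True}
checks = {name: (lambda name=name: state[name]) for name in state}
repairs = {"regionserver_up": lambda: state.update(regionserver_up=True)}
print(monitor_and_repair(checks, repairs))  # {'regionserver_up': True}
```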

Regions stuck in transition (RIT) are a common problem in HBase. Usually a region stays in transition only briefly, but in some cases it gets stuck in the RIT state permanently.

To illustrate, let’s look at an example. After a region merge, RIT kept showing the MERGING_NEW state. By checking the HBase JIRA, we found that this was a bug, HBASE-17682, which required a patch to fix.

For a merge, the client sends a merge request, and the master has the RegionServer hosting the two regions merge them. Before the merge starts, the master creates an initial state for the resulting region, MERGING_NEW, which is kept in the master's memory.

Figure: the MERGING_NEW state held in the master's memory.
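Monitoring can catch stuck regions like this before users notice them. HBase itself flags regions that stay in transition longer than `hbase.metrics.rit.stuck.warning.threshold` (60,000 ms by default in the versions we know). The sketch below applies the same idea to a set of regions in transition, assuming you already have each region's state and the time it entered that state; the region names and timestamps are made up.

```python
def stuck_regions(rits, now_ms, threshold_ms=60_000):
    """Return (region, state, age_ms) for regions in transition longer
    than threshold_ms, oldest first. `rits` maps region -> (state, since_ms)."""
    stuck = [(region, state, now_ms - since)
             for region, (state, since) in rits.items()
             if now_ms - since > threshold_ms]
    return sorted(stuck, key=lambda item: item[2], reverse=True)

rits = {
    "region-a": ("OPENING", 990_000),
    "region-b": ("MERGING_NEW", 100_000),
    "region-c": ("CLOSING", 900_000),
}
print(stuck_regions(rits, now_ms=1_000_000))
# [('region-b', 'MERGING_NEW', 900000), ('region-c', 'CLOSING', 100000)]
```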

That’s all for today’s sharing. Thank you!

Editor: IT Dakashuo. When reprinting, please credit the copyright holder and source.
