In HBase, data logically looks like this:

The differences between HBase and MySQL are as follows:

  1. Columns are grouped into column families
  2. Multiple versions of the same data can be kept

That does not look very different. So what problems does HBase solve that MySQL does not? Every new technology appears in order to solve existing problems.

  1. Write-friendly: supports massive concurrent asynchronous writes
  2. Columns can be added dynamically
  3. Data is stored by column. Columns that do not exist for a row are not written to disk, which saves space, whereas MySQL still fills missing values with NULL
  4. Supports distributed storage of massive data (BigTable was originally proposed by Google to solve the problem of storing massive data)
  5. And so on

So how does it solve these problems? How is its data stored?

Physical structure of HBase data

Before introducing the physical structure, a brief introduction to the LSM tree is in order.

LSM tree

Like the B+ tree used by MySQL, the LSM tree is an index structure for data on disk. The B+ tree is a read-friendly storage structure, but it does not hold up under write-heavy workloads such as logging, because its writes involve random disk I/O.

LSM trees were proposed for exactly this kind of write-heavy scenario. The full name is Log-Structured Merge-Tree. Files store the changes made to data: new records are appended and existing data is never modified in place, so every write is a sequential write.

However, if every operation is simply appended, reads have nothing to go on and must scan all operations. So an LSM tree first maintains a small, ordered set of data in memory (random reads and writes are not a problem in memory), and when that set exceeds a certain size, it is written to disk as a whole.

This leaves many ordered files on disk, and a read has to search these small files one after another, which hurts read performance. Multiple small files therefore need to be merged into one larger file to optimize reads.

At this point, that is basically the whole idea of the LSM tree:

  1. Maintain an ordered set of data in memory
  2. Flush the in-memory data to disk
  3. Merge multiple ordered files on disk into a larger ordered file
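
The three steps can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how HBase implements it: the class name `TinyLSM`, the `memtable_limit` threshold, and the use of in-memory lists as a stand-in for disk files are all made up for illustration.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: a sorted in-memory buffer plus sorted immutable runs."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.mem_keys = []   # sorted keys currently held in memory
        self.mem_vals = {}   # key -> value for the in-memory part
        self.runs = []       # flushed runs, oldest first (stand-in for disk files)

    def put(self, key, value):
        # Step 1: keep the in-memory data ordered.
        if key not in self.mem_vals:
            bisect.insort(self.mem_keys, key)
        self.mem_vals[key] = value
        if len(self.mem_keys) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Step 2: write the whole ordered buffer out as one immutable run.
        if self.mem_keys:
            self.runs.append([(k, self.mem_vals[k]) for k in self.mem_keys])
            self.mem_keys, self.mem_vals = [], {}

    def compact(self):
        # Step 3: merge all runs into one larger ordered run.
        # Later runs are newer, so their values overwrite older ones.
        merged = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []

    def get(self, key):
        # Newest data first: memory, then runs from newest to oldest.
        if key in self.mem_vals:
            return self.mem_vals[key]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

Note how `get` has to look at every run until it finds the key, which is exactly the read cost that merging is meant to reduce.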

HBase storage

HBase uses an LSM tree to store data. Each record is an operation record. Parts of the HBase implementation are described below.

Implementation of the in-memory ordered structure

The ordered structure in memory is maintained with a skip list. When a skip list is full, it stops accepting new writes and is flushed to disk, while a new skip list is opened to receive new operation requests.

The contents of each record

Each record stores a key-value (KV) pair, where V is the value we write and the key consists of the following parts:

  • Row key
  • Column family
  • Column name
  • Timestamp
  • Operation type: Put, Delete, DeleteColumn, DeleteFamily, and so on

All records form a list ordered by key. The sorting rules are as follows:

  1. Smaller row keys come first
  2. For the same row key, compare column families
  3. For the same column family, compare column names
  4. For the same column name, compare timestamps; the larger timestamp comes first

With this ordering, reading a row key returns the latest version of the data first. If the latest record is a delete, the value reads as empty even though older records follow it.
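
The ordering rules above can be expressed as a sort key. The snippet below is a simplified sketch: the `KeyValue` tuple, `sort_key` and `read_latest` are hypothetical names, and it treats any `Delete` as hiding everything older, which is coarser than HBase's real delete handling.

```python
from collections import namedtuple

# Hypothetical record layout mirroring the key parts listed above.
KeyValue = namedtuple("KeyValue", "row family qualifier timestamp op value")


def sort_key(kv):
    # Row key, column family and column name ascend; timestamp descends,
    # so the newest version of a cell sorts first.
    return (kv.row, kv.family, kv.qualifier, -kv.timestamp)


def read_latest(records, row, family, qualifier):
    """Return the newest value of a cell, or None if its newest record is a delete."""
    for kv in sorted(records, key=sort_key):
        if (kv.row, kv.family, kv.qualifier) == (row, family, qualifier):
            return None if kv.op == "Delete" else kv.value
    return None


records = [
    KeyValue("r1", "cf", "name", 1, "Put", "old"),
    KeyValue("r1", "cf", "name", 3, "Delete", None),
    KeyValue("r1", "cf", "name", 2, "Put", "new"),
    KeyValue("r0", "cf", "age", 1, "Put", "30"),
]
```

Here `read_latest(records, "r1", "cf", "name")` hits the delete at timestamp 3 first and reports the cell as empty, even though two older Put records still exist.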

Structure of disk files

A file consists of three parts:

  1. Header information: stores the file size, number of data blocks, index location, index size, etc.

  2. Index data: used to index all data blocks in the file. Each data block has one index entry, which contains

    • The last record in the block, so that a binary search over the index quickly locates the block for a given key
    • Location of the data block in the file
    • Size of the data block
    • Bloom filter, used to quickly rule out data blocks that cannot contain the key during a scan
  3. Data blocks, which store the KV records themselves.

Given this structure, a read of a specified row_key performs the following steps for each file:

  1. Load the index data using the header information
  2. Use binary search to find which data block the row_key falls in
  3. Use the Bloom filter to rule out data blocks that cannot contain the row_key, which speeds up reading
  4. Read the data block using its location and size
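
The binary search over the index can be illustrated as follows. This is a minimal sketch with assumptions: the file is modeled as Python lists rather than bytes, the names `blocks`, `index_last_keys`, `find_block` and `get` are hypothetical, and the Bloom filter and the location/size fields are left out to keep the example short.

```python
import bisect

# Hypothetical in-memory stand-in for one file: sorted (row_key, value) blocks.
blocks = [
    [("a", 1), ("c", 2)],
    [("f", 3), ("k", 4)],
    [("m", 5), ("z", 6)],
]
# One index entry per block: the last row key stored in that block.
index_last_keys = [block[-1][0] for block in blocks]


def find_block(row_key):
    """Binary-search the index for the only block that could hold row_key."""
    i = bisect.bisect_left(index_last_keys, row_key)
    return i if i < len(blocks) else None


def get(row_key):
    i = find_block(row_key)
    if i is None:
        return None  # row_key is greater than every key in the file
    for key, value in blocks[i]:  # scan inside the one candidate block
        if key == row_key:
            return value
    return None
```

Because each index entry holds the last key of its block, the first entry that is not smaller than the requested key identifies the single block worth reading.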

HBase data is stored in column families

A quick review of row storage and column storage.

Row storage

Row storage stores all of a row's data together before the next row is written. MySQL is a typical example.

Row storage is faster when reading a row of data, but when reading a column of data, the entire row needs to be read into memory for filtering.

Column storage

The counterpart of row storage is column storage, which stores the data of one column together and stores different columns separately.

Column storage is friendly for reading only one column, but in contrast, if you want to read multiple columns, you need to read multiple times and merge them.

Column family storage

HBase uses a compromise solution that stores column families together and stores different column families separately.

In other words, if a table has multiple column families with only one column under each, it is equivalent to column storage.

If a table has only one column family with multiple columns, it is equivalent to row storage.
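
To make the compromise concrete, here is a small sketch that splits a logical table into one store per column family. The table contents and the function `split_by_family` are made up for illustration; real HBase stores hold the full KV records described earlier.

```python
# Hypothetical logical table: row key -> {"family:column": value}.
table = {
    "row1": {"info:name": "Ann", "info:age": 30, "stats:visits": 7},
    "row2": {"info:name": "Bob", "stats:visits": 2},
}


def split_by_family(rows):
    """Group cells so that each column family gets its own store."""
    stores = {}
    for row_key, cells in sorted(rows.items()):
        for column, value in sorted(cells.items()):
            family, qualifier = column.split(":", 1)
            stores.setdefault(family, []).append((row_key, qualifier, value))
    return stores


stores = split_by_family(table)
```

Reading only the `stats` family never touches the `info` data, and the missing `info:age` for `row2` simply has no entry instead of a NULL placeholder.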

HBase splits a table by row key range into Regions, and each Region is assigned to a RegionServer in the cluster; within a Region, each column family is stored separately. All Regions are recorded in the hbase:meta table. The table structure is as follows:

The meanings of different columns in the table are as follows:

  • The row_key is a comma-separated concatenation of the following fields

    • Table name
    • Starting row_key
    • Creation timestamp
    • MD5 of the above three fields
  • info:regioninfo stores the following data (JSON)

    • STARTKEY: starting row_key
    • ENDKEY: ending row_key
    • NAME: region name
    • ENCODED: the encoded region name, i.e. the MD5 hash at the end of the row_key
  • info:seqnumDuringOpen: the sequence number of the Region at the time it was opened

  • info:server: the RegionServer on which the Region resides

  • info:serverstartcode: the start time of that RegionServer

  • And so on

Conclusion

This gives me a basic understanding of HBase's on-disk data format and also clears up many doubts about HBase, such as:

  1. Why are only row key index queries supported

    • Because the entire file is sorted by row key
  2. Why is read efficiency lower than MySQL

    • Because the files have to be searched one after another to find the data
  3. Why are efficient writes supported

    • Because all disk writes are sequential
  4. How should HBase column families be set up

    • Put columns that are read together in the same scenario into the same column family, and columns read in different scenarios into different column families
  5. And so on