The HBase Doesn’t Sleep Book is an HBase technical book that aims to keep readers awake, and it is very good. To deepen my memory, I have organized the important parts of the book into reading notes for later reference, and I hope they can help students who are just starting to learn HBase.

Contents

  • Chapter 1 – Getting to Know HBase
  • Chapter 2 – Getting HBase Running
  • Chapter 3 – HBase Basic Operations
  • Chapter 4 – Getting Started with the Client API
  • Chapter 5 – Exploring HBase Internals
  • Chapter 6 – Advanced Usage of the Client API
  • Chapter 7 – Management Capabilities of the Client API
  • Chapter 8 – Going Faster
  • Chapter 9 – When HBase Meets MapReduce

This article is a bit long and will require some patience. It first reviews the HBase data model and the storage hierarchy, explaining the role and structure of each layer in detail. It then walks through the write path and the read path, and finally describes how Region lookup evolved from the old version to the new version.

I. Data model

1. Review of important concepts

  • Namespace: groups multiple tables together for unified management.
  • Table: a table consists of one or more column families. Data attributes such as time-to-live (TTL) and the compression algorithm (COMPRESSION) are defined on the column family. Once the column families are defined, the table is empty; it has no data until rows are added.
  • Row: a row contains multiple columns, which are sorted by column family. The column families used in a row can only be chosen from those defined on the table. Because HBase is a column-family-oriented database, the data of one row can be distributed across different servers.
  • Column Family: a collection of columns. HBase tries to keep the columns of one column family on the same server, which improves access performance and allows related columns to be managed in batches. All data attributes are defined on the column family; in HBase, a table defines column families rather than columns.
  • Column Qualifier: multiple columns make up a row. A column is usually written as Column Family:Column Qualifier. Columns can be created freely, and there is no limit to the number of columns in a row.
  • Cell: a column can store multiple versions of data, and each version is called a cell. The concept of a cell in HBase therefore differs from that of a traditional relational database: HBase data is more fine-grained, and the data at one coordinate is divided into multiple versions.
  • Timestamp (version number): this can be called either a timestamp or a version number, because it is used to distinguish the versions of multiple cells in the same column. If no version number is specified, the system automatically uses the current timestamp. When a number is set manually, the "timestamp" really is just a version number.
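The coordinates above — rowkey, column family, column qualifier, timestamp — together identify a cell. A minimal sketch of that idea (the class and field names are mine for illustration, not the real HBase Cell API) shows how two writes to the same coordinate become two versions, sorted so the newest comes first, which matches HBase's cell ordering:

```java
import java.util.Comparator;
import java.util.TreeSet;

public class CellSketch {
    // A cell coordinate: rowkey + column family + qualifier + timestamp.
    // Illustrative only; not the real org.apache.hadoop.hbase.Cell interface.
    static final class Cell {
        final String row, family, qualifier;
        final long timestamp;
        final String value;
        Cell(String row, String family, String qualifier, long ts, String value) {
            this.row = row; this.family = family; this.qualifier = qualifier;
            this.timestamp = ts; this.value = value;
        }
    }

    // HBase sorts cells by row, then family, then qualifier,
    // then by timestamp DESCENDING, so the newest version comes first.
    static final Comparator<Cell> ORDER = Comparator
            .comparing((Cell c) -> c.row)
            .thenComparing(c -> c.family)
            .thenComparing(c -> c.qualifier)
            .thenComparing(c -> c.timestamp, Comparator.reverseOrder());

    // The newest version at a coordinate is simply the first element.
    static Cell newest(TreeSet<Cell> cells) {
        return cells.first();
    }

    public static void main(String[] args) {
        TreeSet<Cell> cells = new TreeSet<>(ORDER);
        cells.add(new Cell("row1", "cf", "name", 100L, "old"));
        cells.add(new Cell("row1", "cf", "name", 200L, "new"));
        System.out.println(newest(cells).value); // the version with the larger timestamp wins
    }
}
```

Reading a cell without specifying a version therefore returns the one with the greatest timestamp, exactly as described above.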

2. A few minor issues

Does HBase support table association?

The official answer is a flat “no”. If you want associations between data, you have to implement them yourself; that is the price of choosing a NoSQL database.

Does HBase support ACID?

ACID is an acronym for Atomicity, Consistency, Isolation, and Durability. ACID is a guarantee that transactions execute correctly, and HBase only partially supports it: for example, operations within a single row are atomic.

What are table namespaces used for?

Table namespaces are mainly used to group tables. What is the use of grouping them? Namespaces make up for the fact that a single HBase instance cannot be divided into databases: they let you group tables the way a relational database groups them into databases, and configure different settings for different groups, such as quota management and security management.

Two namespaces are predefined and reserved in HBase:

  • hbase: the system namespace, which holds HBase's internal tables.
  • default: tables with no namespace specified are automatically placed in this namespace.
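The naming convention that goes with namespaces is "namespace:table". A tiny illustrative helper (my own, not HBase's real TableName class) captures the rule that a bare table name falls into the reserved default namespace:

```java
public class TableNameSketch {
    // Parses an HBase-style table name. "ns:table" is explicit;
    // a bare "table" falls into the reserved "default" namespace.
    // Illustrative only; not the real org.apache.hadoop.hbase.TableName.
    static String namespaceOf(String fullName) {
        int i = fullName.indexOf(':');
        return i < 0 ? "default" : fullName.substring(0, i);
    }

    static String qualifierOf(String fullName) {
        int i = fullName.indexOf(':');
        return i < 0 ? fullName : fullName.substring(i + 1);
    }

    public static void main(String[] args) {
        System.out.println(namespaceOf("hbase:meta")); // hbase
        System.out.println(namespaceOf("mytable"));    // default
        System.out.println(qualifierOf("hbase:meta")); // meta
    }
}
```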

II. How HBase stores data

1. Architecture review

An HBase cluster consists of one Master (or several, for high availability) and multiple RegionServers.

  • Master: assigns Regions to RegionServers as they start up and performs management operations such as Region splitting and Region merging. The Master role in HBase is much weaker than in other kinds of clusters: it is special in that it is not involved in reading and writing data at all, so the cluster keeps serving even while the Master is down. Of course, the Master should not stay down for long, because many necessary operations, such as creating tables, changing column family configurations, and, more importantly, splitting and merging, all require it.
  • RegionServer: RegionServer has one or more regions on which data is stored. If your HBase is based on the HDFS (a standalone HBase can be based on local disks), all data access operations of the Region are implemented by invoking the HDFS client interface.
  • Region: Part of a table data. HBase is a database that is automatically sharded. A Region is equivalent to a partition of a partitioned table in a relational database or a fragment of MongoDB.
  • HDFS: HBase interacts with HDFS rather than directly with the servers' hard disks, so HDFS is the actual carrier of the data.
  • ZooKeeper: ZooKeeper is more important to HBase than the Master. If the Master goes down, the cluster still serves data; but if ZooKeeper goes down, data cannot be read at all, because the location of the metadata table hbase:meta, which every read depends on, is stored on ZooKeeper.

2. RegionServer internal architecture

A RegionServer contains:

  • WAL: the Write-Ahead Log. When an operation reaches a Region, HBase writes it to the WAL first. HBase keeps recent writes in the memory-based MemStore, and only when enough data has accumulated is it flushed to an HFile; if the server crashed or lost power in the meantime, that data would be lost. The WAL is an insurance mechanism: data is written to the WAL before it is written to the MemStore, so that during failure recovery the data can be replayed from the WAL.
  • Multiple regions: A Region is a data fragment. Each Region has a start rowkey and an end Rowkey, representing the range of rows it stores.

3. Region internal architecture

Each Region contains multiple Store instances. A Store corresponds to the data of one column family: if a table has two column families, every Region of it contains two Stores. Each Store consists of one MemStore and several HFiles.

4. Write-ahead log (WAL)

The write-ahead log (WAL) exists to solve the problem of recovering operations after a crash. When data arrives at a Region, it is first written to the WAL and then loaded into the MemStore. Even if the Region's machine crashes, no data is lost, because the WAL is stored in HDFS.

WAL is enabled by default and can be turned off by using the following code.

Mutation.setDurability(Durability.SKIP_WAL);

Put, Append, Increment, and Delete are all subclasses of Mutation, so they all have the setDurability method. Skipping the WAL makes data operations a bit faster, but it is best not to do this, because the data will be lost if the server goes down.

If you really want to improve performance but are not willing to go as far as disabling the WAL entirely, you can choose to write the WAL asynchronously.

Mutation.setDurability(Durability.ASYNC_WAL);

With this setting, the Region no longer writes the WAL on every operation; instead it waits for a condition to be met, namely the interval defined by hbase.regionserver.optionallogflushinterval, which is how often HBase flushes operations from memory to the WAL. The default is 1 s.

If your system has high performance requirements and low data consistency requirements, and performance bottlenecks are on WAL, you can consider using asynchronous WAL writes. Otherwise, use the default configuration.

5. WAL rolling

The WAL is a ring-like, rolling log structure, because this design offers the highest write performance and guarantees that the space it occupies does not grow without bound.

The WAL check interval is defined by hbase.regionserver.logroll.period, with a default value of 1 h. The check compares the operations in the current WAL with those actually persisted to HDFS to determine which operations have been persisted; persisted operations are moved to the .oldlogs folder (which is also in HDFS).

One WAL instance contains multiple WAL files. The maximum number of WAL files is defined by the hbase.regionserver.maxlogs parameter (default: 32).

Other conditions that trigger rolling are:

  • when the WAL file's block is almost full;
  • when the space occupied by the WAL is greater than or equal to a threshold, calculated as hbase.regionserver.hlog.blocksize * hbase.regionserver.logroll.multiplier;
  • hbase.regionserver.hlog.blocksize: if you run on HDFS, you only need to set this value to the HDFS block size;
  • hbase.regionserver.logroll.multiplier: a percentage, defaulting to 0.95, i.e. 95%. A WAL file whose size is greater than or equal to 95% of the block size is archived into the .oldlogs folder.
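The threshold formula above is easy to check with a few lines of arithmetic. The 128 MB HDFS block size below is an assumption for the example, not a value from the book:

```java
public class WalRollThreshold {
    // Threshold = hbase.regionserver.hlog.blocksize
    //           * hbase.regionserver.logroll.multiplier
    static long threshold(long blockSizeBytes, double multiplier) {
        return (long) (blockSizeBytes * multiplier);
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assumed HDFS block size: 128 MB
        double multiplier = 0.95;            // default logroll.multiplier (95%)
        // A WAL file is rolled and archived once it reaches 95% of the block size.
        System.out.println(threshold(blockSize, multiplier)); // 127506841
    }
}
```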

After WAL files are created, they are stored under /hbase/.log (all paths mentioned here are in HDFS). Once a WAL file is determined to be archivable, it is moved to /hbase/.oldlogs. The Master periodically cleans up the .oldlogs folder, deleting a WAL file as soon as “there are no more references to it”. Currently two kinds of services may hold references to WAL files:

  • The TTL process: this keeps a WAL file alive until the timeout defined by hbase.master.logcleaner.ttl is reached (default: 10 minutes).
  • Replication: if the HBase replication mechanism is enabled, HBase deletes a WAL file only after making sure it is no longer needed by the backup cluster. Replication here does not mean the number of file copies; it is a feature added in version 0.90 for replicating data from one cluster to another in real time.

6. Internal structure of Store

There are two important components in a Store:

  • MemStore: each Store has one MemStore instance. Data is put into the MemStore after being written to the WAL. The MemStore is an in-memory storage object; its data is flushed into an HFile only when it is full.
  • HFile: a Store contains multiple HFiles. When the MemStore is full, HBase generates a new HFile in HDFS and writes the MemStore's contents into it. HFile is the storage entity that deals directly with HDFS.

The WAL is stored in HDFS, the MemStore in memory, and HFiles in HDFS: data is first written to the WAL, then placed in the MemStore, and finally persisted to an HFile. Notice that the data has already been stored in HDFS once (in the WAL) before it reaches an HFile, so why does it also need to pass through the MemStore?

This is because files in HDFS can only be created, appended to, and deleted, but not modified. For a database, storing data in sorted order is very important for performance, so we cannot write data to the hard disk simply in the order it arrives.

Memory is used to sort the data into rowkey order so that it can be written to the hard disk sequentially in one batch; that is the point of the MemStore. Although the MemStore lives in memory while HFiles and the WAL live in HDFS, data is written to the WAL before it enters the MemStore, so increasing the MemStore's size does not speed up writes. The MemStore exists to maintain rowkey order, not to cache data.
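The sorting role of the MemStore can be sketched with a sorted map. The real MemStore uses a concurrent skip list; a TreeMap is the simplest stand-in for the same sorted-map idea, and all names here are mine, not HBase's:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class MemStoreSketch {
    // Writes arrive in arbitrary rowkey order; the in-memory sorted map
    // keeps them ordered so the flush can write the "HFile" sequentially.
    private final TreeMap<String, String> memstore = new TreeMap<>();
    private final long flushThreshold;

    MemStoreSketch(long flushThreshold) { this.flushThreshold = flushThreshold; }

    // "Write" a cell: it simply lands in the sorted map. When the map is
    // full, the flush result (the new "HFile") is returned; otherwise null.
    List<String> put(String rowkey, String value) {
        memstore.put(rowkey, value);
        return memstore.size() >= flushThreshold ? flush() : null;
    }

    // Flush: emit all rowkeys in sorted order as one new "HFile".
    List<String> flush() {
        List<String> hfile = new ArrayList<>(memstore.keySet());
        memstore.clear();
        return hfile;
    }

    public static void main(String[] args) {
        MemStoreSketch m = new MemStoreSketch(3);
        m.put("row9", "v");
        m.put("row1", "v");
        List<String> hfile = m.put("row5", "v"); // the third put triggers the flush
        System.out.println(hfile);               // [row1, row5, row9]
    }
}
```

Although the rows arrived as row9, row1, row5, the flushed file is in rowkey order, which is exactly why the MemStore exists.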

7. MemStore

MemStore was designed for the following reasons:

  • Files in the HDFS cannot be modified. To sequentially store data to improve read efficiency, HBase uses the LSM tree structure to store data. Data is first sorted into LSM trees in the Memstore and then written to hfiles.
  • It optimizes writes: for example, if a value is deleted right after it is added, the delete can cancel the write while both are still in the MemStore, so the data never needs to be written to HDFS at all.

But do not assume that reads check the MemStore first and then go to disk! When reading data, there is a dedicated cache called the BlockCache. If the BlockCache is enabled, data is read from the BlockCache first; only if it is not found there (or the cache is disabled) is the data read from the HFiles plus the MemStore.

8. HFile (StoreFile)

HFile is the actual carrier of the data: all the tables and columns we create are ultimately stored in HFiles. An HFile is made up of blocks; the default block size is 64 KB and is set by the BLOCKSIZE attribute on the column family. The blocks play different roles:

  • Data: a data block. Each HFile has multiple Data blocks, which store the HBase table data. Data blocks are technically optional, but in practice you will almost never see an HFile without them.
  • Meta: a metadata block. Meta blocks are optional and are written only when the file is closed; they store metadata about the HFile. Before v2, Bloom filter information was stored directly in the Meta block; since v2 it is split out and stored separately.
  • FileInfo: the file-information block, which is also a data storage block. FileInfo is a required part of an HFile. It is written only when the file is closed and stores information about the file, such as the last key (LastKey) and the average key length (AvgKeyLen).
  • DataIndex: a block that stores the index of the Data blocks; the index entries are the offsets of the Data blocks. DataIndex is optional and exists only if there are Data blocks.
  • MetaIndex: a block that stores the index of the Meta blocks. MetaIndex is also optional.
  • Trailer: required; it stores the offsets of the FileInfo, DataIndex, and MetaIndex blocks.

In fact, it is not wrong to say either HFile or StoreFile: physically, the file written out by the MemStore is called an HFile, and StoreFile is just an abstraction over HFile.

9. Data Blocks

The first part of a Data block stores the block type, followed by multiple key-value pairs, i.e. serialized cells. Cell is an interface, and KeyValue is its implementation class.

10. The KeyValue class

The last part of a KeyValue stores the value, i.e. the data itself; everything before it is the metadata of the cell: rowkey, column family, column qualifier, and so on. If you store a small value, most of the space in the cell is taken up by this metadata, so if your column family and column names are long, most of the space ends up storing names rather than data.

However, a suitable compression algorithm can greatly reduce the space taken by the column family and column metadata, so in practice you can specify a compression algorithm to compress it. Compression and decompression inevitably cost some performance, so whether to compress depends on your situation: if your data is mostly archival and read/write performance is not critical, compression is a good fit.
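A rough back-of-the-envelope model makes the point about metadata overhead concrete. The byte counts below are simplified (the real KeyValue layout also carries length prefixes), and the example names are invented:

```java
public class KeyValueOverhead {
    // Very rough size model of one cell's key metadata.
    // The real KeyValue format also has length-prefix fields; this sketch
    // only counts the obvious parts to show metadata vs. value proportions.
    static long metadataBytes(String rowkey, String family, String qualifier) {
        return rowkey.length() + family.length() + qualifier.length()
                + 8   // timestamp (a long)
                + 1;  // key type (Put / Delete ...)
    }

    public static void main(String[] args) {
        // Hypothetical long names, as warned against in the text above.
        long meta = metadataBytes("user00001", "personal_information", "home_address");
        long value = "NY".length(); // a tiny two-byte value
        // With long family/column names, metadata dwarfs the actual data.
        System.out.println(meta + " bytes of metadata vs " + value + " bytes of value");
    }
}
```

Here 50 bytes of metadata accompany a 2-byte value, which is why short column family and column names (and metadata compression) matter.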

III. The truth behind create, read, update, and delete

HBase is a database that supports random reads and writes, yet HDFS, the persistence layer it is built on, does not allow modification. So how does HBase implement create, read, update, and delete? The truth is that HBase almost exclusively performs appends:

  • when you add a cell, HBase writes new data into HDFS;
  • when you modify a cell, HBase also writes new data into HDFS, just with a larger version number than before (or whatever version you specify);
  • when you delete a cell, HBase still writes new data! It writes a tombstone marker: a record with no value whose type is DELETE.

Because many such operations accumulate as the database is used, the continuity and ordering of the data degrade. To restore performance, HBase compacts HFiles from time to time.

During a major compaction, multiple HFiles are merged into one HFile. Any record marked by a tombstone is ignored during the merge, so it does not appear in the newly generated HFile; at that point the record is literally deleted.
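The tombstone-dropping step of a major compaction can be sketched in a few lines. The sketch below models each "HFile" as a map of rowkey to value, with a sentinel string standing in for a record of type DELETE; all names are mine and this simplifies away versions and multiple columns:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CompactionSketch {
    static final String TOMBSTONE = "\u0000DELETE"; // stand-in for a type=DELETE record

    // Major compaction: merge several "HFiles" (oldest first) into one.
    // Newer files overwrite older entries; rows whose newest record is a
    // tombstone are dropped entirely — the moment of real deletion.
    static TreeMap<String, String> majorCompact(List<Map<String, String>> hfilesOldToNew) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (Map<String, String> hfile : hfilesOldToNew) {
            merged.putAll(hfile); // later (newer) files win
        }
        merged.values().removeIf(v -> v.equals(TOMBSTONE)); // drop tombstoned rows
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> older = new LinkedHashMap<>();
        older.put("row1", "a");
        older.put("row2", "b");
        Map<String, String> newer = new LinkedHashMap<>();
        newer.put("row2", TOMBSTONE); // a delete of row2 arrived later
        TreeMap<String, String> result = majorCompact(List.of(older, newer));
        System.out.println(result.keySet()); // only row1 survives
    }
}
```

Before the compaction, row2's value and its tombstone both still exist on disk; after it, neither does.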

IV. HBase data structure summary

The internal structure of HBase data is as follows:

  • A RegionServer contains multiple Regions. The division rule: a contiguous segment of a table's rowkey range forms one Region on one RegionServer; however, if a row carries too much data, HBase can split the Region across machines by column family.
  • A Region contains multiple Stores. The division rule: one column family makes one Store. If a table has only one column family, then each Region of that table contains exactly one Store.
  • A Store has exactly one MemStore.
  • A Store has multiple HFiles; every MemStore flush produces a new HFile.

V. Writing and reading a KeyValue

1. Write

A KeyValue is persisted to HDFS as follows:

  • WAL: the data is written to the WAL as soon as it arrives. Since the WAL is based on HDFS, the cell can be said to be persisted already. However, the WAL is only a temporary log; it does not distinguish between Stores, and its data cannot be read or used directly.
  • MemStore: the data is then immediately placed into the MemStore for sorting. The MemStore arranges the data according to the LSM-tree structure, much like sorting the cards in your hand after you pick them up.
  • HFile: finally, when the MemStore reaches its size threshold or the flush-interval threshold, HBase writes the MemStore's contents to HDFS as an HFile stored on disk. At this point the data is truly persisted: even a crash or power failure will not lose it.

2. Read

Given the MemStore (memory-based) and HFiles (HDFS-based), you might immediately assume that reads check the MemStore first and fall back to HFiles if nothing is found. That would be an obvious mechanism, but it is not how HBase handles reads. The actual read order is: search the BlockCache first, and only if the data is not found there, search the MemStore and HFiles.

A tombstone marker does not live in the same place as the data it deletes, so how does a read know the data has been deleted? If the data is read before its tombstone marker, at that moment there is no way to know it will be deleted; only when the scanner reads further down and reaches the tombstone marker does it learn that the data is marked deleted and need not be returned to the user.

Therefore, even after a Scan has found the row key information it needs, it must keep scanning until the data read exceeds the given limits; only then can HBase know which data should be returned to the user and which should be discarded. This is why adding filter conditions cannot reduce the number of rows a Scan examines, whereas narrowing the rowkey range between STARTROW and ENDROW can significantly speed up a Scan.
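Why range narrowing helps while filters do not can be seen with a sorted map standing in for a Region's sorted data. This is an illustrative sketch, not HBase code; as in an HBase Scan, STARTROW is inclusive and ENDROW is exclusive:

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class ScanRangeSketch {
    // A Region's data modelled as a sorted map of rowkey -> value.
    // A scan over [startRow, endRow) only ever touches that slice,
    // whereas a value filter would still visit every row in the range.
    static SortedMap<String, String> scan(TreeMap<String, String> region,
                                          String startRow, String endRow) {
        return region.subMap(startRow, endRow); // only this slice is examined
    }

    public static void main(String[] args) {
        TreeMap<String, String> region = new TreeMap<>();
        for (int i = 0; i < 10; i++) region.put("row" + i, "v" + i);
        System.out.println(scan(region, "row3", "row6").keySet()); // [row3, row4, row5]
    }
}
```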

During a Scan, each Store creates a StoreScanner instance. The StoreScanner scans the MemStore and the HFiles together, so external callers never need to know whether a given piece of data came from the MemStore or from an HFile. When the StoreScanner is opened, it positions itself at the start row key (STARTROW) and scans downward.

(In the book's accompanying figure, the red blocks are all the data belonging to the specified row.) Only after all qualifying StoreScanners have finished scanning is the data returned to the user.

VI. Region lookup

Region lookup: the early design (before 0.96.0) was known as the three-tier query architecture:

  • Region: the Region where the data being looked up resides.
  • .META.: a metadata table that stores brief information about all Regions; each row in the .META. table is one Region. A row records the Region's start and end row keys and its connection information, which lets the client determine which Region holds the data.
  • -ROOT-: a table that stores the locations of the .META. table. There can be many .META. Regions, and -ROOT- records which Region each part of .META. is on (.META. is itself an ordinary table and also lives on Regions). With these two levels of indirection, up to 17.1 billion Regions can be supported.

The location of the -ROOT- table is recorded on ZooKeeper under /hbase/root-region-server. From a macro perspective, the client's process for finding data looks like this:

  • query ZooKeeper's /hbase/root-region-server node to learn which RegionServer the -ROOT- table is on;
  • access the -ROOT- table to find out which .META. Region describes the required data, and which RegionServer that .META. Region is on;
  • access the .META. table to find out which Region the row key falls in;
  • connect to the RegionServer holding the actual data; this is where the row is really scanned.

Since version 0.96, the three-tier query architecture has been reduced to two tiers. The -ROOT- table was removed, along with the /hbase/root-region-server node in ZooKeeper. The location of the meta table is now stored in the /hbase/meta-region-server node in ZooKeeper. Namespaces were introduced later, and the .META. table was renamed hbase:meta.

New Region lookup process:

  • The client first queries ZooKeeper's /hbase/meta-region-server node to learn which RegionServer the hbase:meta table is on.
  • The client connects to the RegionServer holding hbase:meta. The hbase:meta table stores the rowkey ranges of all Regions, so the client can look up which Region its rowkey belongs to and which RegionServer that Region is on.
  • With this information, the client connects directly to the RegionServer that holds the rowkey and operates on it.
  • The client caches the meta information, so subsequent operations do not need to go through hbase:meta step by step again.
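What the client does with the cached hbase:meta information amounts to a "floor" lookup over the sorted Region start keys: find the Region with the greatest start key that is less than or equal to the rowkey. A minimal sketch (class and server names are invented for illustration):

```java
import java.util.Map;
import java.util.TreeMap;

public class RegionLocatorSketch {
    // hbase:meta modelled as a sorted map from each Region's start rowkey
    // to that Region's location. Illustrative only, not the HBase client API.
    private final TreeMap<String, String> metaByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionServer) {
        metaByStartKey.put(startKey, regionServer);
    }

    // Floor lookup: the Region whose start key is the greatest one <= rowkey.
    String locate(String rowkey) {
        Map.Entry<String, String> e = metaByStartKey.floorEntry(rowkey);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        RegionLocatorSketch meta = new RegionLocatorSketch();
        meta.addRegion("", "rs1:16020");  // the first Region starts at the empty key
        meta.addRegion("m", "rs2:16020"); // the second Region covers [m, ...)
        System.out.println(meta.locate("apple")); // rs1:16020
        System.out.println(meta.locate("melon")); // rs2:16020
    }
}
```

Because the cached map answers this lookup locally, repeated operations skip both the ZooKeeper query and the hbase:meta round trip.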

