1. HBase definition

HBase is a highly reliable, high-performance, column-oriented, scalable distributed NoSQL database designed to store massive amounts of data.

2. HBase data model

Logically, the HBase data model resembles that of a relational database: data is stored in tables with rows and columns. In terms of the underlying physical storage (key-value pairs), however, HBase is closer to a multi-dimensional map.
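The "multi-dimensional map" view can be sketched with nested dictionaries. This is only a toy model, not the HBase client API; the table contents, the timestamps, and the helper name `get_cell` are made up for illustration.

```python
# Toy model only: plain nested dicts, not the HBase client API.
# Layout: row key -> column family -> column qualifier -> timestamp -> value
table = {
    "row1": {
        "info": {
            "name": {1001: "Alice"},
            "age": {1001: "30", 1005: "31"},  # two versions of the same cell
        },
    },
}


def get_cell(table, row_key, family, qualifier):
    """Return the newest value of a cell, or None if the cell does not exist."""
    versions = table.get(row_key, {}).get(family, {}).get(qualifier, {})
    if not versions:
        return None
    return versions[max(versions)]  # the highest timestamp wins


print(get_cell(table, "row1", "info", "age"))
```

Each level of nesting corresponds to one dimension of the map: row key, column family, column qualifier, and timestamp.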

1. HBase logical architecture

2. HBase physical storage structure

3. Data model

  1. NameSpace

    A namespace is comparable to a database in a relational database system; each namespace contains multiple tables. HBase ships with two built-in namespaces, hbase and default.

  2. Region

A Region plays a role similar to a table in a relational database (a table is split into one or more Regions by RowKey range). Unlike a relational table, an HBase table definition needs only its column families, so fields can be specified dynamically and on demand when data is written. HBase can therefore handle changing fields more easily than a relational database.

  3. Row

Each row in an HBase table consists of one RowKey and multiple columns. Rows are stored in lexicographic order of their RowKeys, and queries can retrieve data only by RowKey. RowKey design is therefore very important.
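The effect of lexicographic RowKey ordering can be illustrated with a small sketch. The keys below are hypothetical, and plain ASCII Python strings stand in for the byte arrays HBase actually compares; `scan` is an illustrative helper, not an HBase call.

```python
import bisect

# Hypothetical row keys; sorting ASCII strings mimics byte-wise ordering.
row_keys = sorted(["row2", "row10", "row1", "user_b", "user_a"])


def scan(sorted_keys, start, stop):
    """Return the keys in [start, stop), as a range scan over sorted keys would."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


print(row_keys)                     # note that "row10" sorts before "row2"
print(scan(row_keys, "row", "rox")) # every key that starts with "row"
```

Because "row10" sorts before "row2", numeric parts of a RowKey are often zero-padded so that lexicographic order matches the intended order.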

  4. Column

    Each column in HBase is identified by a Column Family and a Column Qualifier, for example info:name and info:age. When you create a table, you only need to specify the column families; column qualifiers do not have to be defined in advance.

  5. Time Stamp

The timestamp identifies different versions of the same data. If no timestamp is specified when data is written, the system adds one automatically; its value is the time at which the data was written to HBase.
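A minimal sketch of timestamp-based versioning, assuming a toy `Cell` class (hypothetical, not part of HBase) that falls back to the current wall-clock time in milliseconds when the caller supplies no timestamp:

```python
import time


class Cell:
    """Toy versioned cell: maps a timestamp in milliseconds to a value."""

    def __init__(self):
        self.versions = {}

    def put(self, value, timestamp=None):
        # Assumption: when no timestamp is given, use the current time in ms.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        self.versions[timestamp] = value
        return timestamp

    def get(self, max_versions=1):
        """Return up to max_versions (timestamp, value) pairs, newest first."""
        newest_first = sorted(self.versions, reverse=True)
        return [(ts, self.versions[ts]) for ts in newest_first[:max_versions]]


cell = Cell()
cell.put("30", timestamp=100)
cell.put("31", timestamp=200)
print(cell.get())                # newest version only
print(cell.get(max_versions=2))  # both versions, newest first
```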

3. HBase basic architecture

Architecture roles

  1. Region Server

The RegionServer manages Regions. Its implementation class is HRegionServer, and it provides the following functions:

  1. Operations on data: get, put, delete.
  2. Operations on Regions: splitRegion, compactRegion.
  2. Master

The Master manages all RegionServers. Its implementation class is HMaster, and it provides the following functions:

  1. Operations on tables: create, delete, alter.
  2. Operations on RegionServers:
  • Assigning Regions to each RegionServer,
  • Monitoring the status of each RegionServer,
  • Load balancing and failover.
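Region assignment and failover can be pictured with a deliberately simple sketch. The round-robin and least-loaded policies below are stand-ins chosen for illustration, not the balancing algorithm HBase actually uses, and all region and server names are hypothetical.

```python
def assign_round_robin(regions, servers):
    """Spread regions over servers in turn (illustrative policy only)."""
    assignment = {server: [] for server in servers}
    for i, region in enumerate(regions):
        assignment[servers[i % len(servers)]].append(region)
    return assignment


def fail_over(assignment, dead_server):
    """Move a dead server's regions to the least-loaded survivors.

    Mutates and returns `assignment`; assumes at least one server survives.
    """
    orphaned = assignment.pop(dead_server)
    for region in orphaned:
        target = min(assignment, key=lambda s: len(assignment[s]))
        assignment[target].append(region)
    return assignment


layout = assign_round_robin(["r1", "r2", "r3", "r4", "r5", "r6"],
                            ["rs1", "rs2", "rs3"])
print(layout)
print(fail_over(layout, "rs3"))
```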
  3. ZooKeeper

HBase uses ZooKeeper for:

  1. Master high availability,
  2. RegionServer monitoring,
  3. The metadata entry point and maintenance of the cluster configuration.

HBase information stored on the ZooKeeper node

  4. HDFS

HDFS provides basic data storage services for HBase and high availability support for HBase.

  5. Client

The client contains the interfaces for accessing HBase and maintains a cache to speed up access. (For example, during reads and writes the client caches metadata from the .META. table.)

4. Roles in HBase

  1. HMaster

Functions:

  1. Monitors the RegionServers
  2. Handles RegionServer failover
  3. Handles metadata changes
  4. Handles Region assignment and migration
  5. Balances data load during idle time
  6. Publishes its location to clients through ZooKeeper
  2. RegionServer

Functions:

  1. Stores the actual HBase data
  2. Serves the Regions assigned to it
  3. Flushes the cache to HDFS
  4. Maintains the HLog
  5. Performs compactions
  6. Handles Region splits
  3. Write-Ahead Log (WAL)

The WAL records modifications to HBase. When data is written to HBase, it is held in memory for a while (the time and data-volume thresholds are configurable) rather than being written straight to disk. Data held only in memory is at greater risk of loss, so each change is first written to a file called the write-ahead log and only then to memory. If the system fails, the data can be reconstructed from this log file.
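The log-first ordering and the recovery step can be sketched as follows. This is a toy model under stated assumptions: a Python list stands in for the durable log file, and the class `TinyStore` and its methods are hypothetical names, not HBase classes.

```python
class TinyStore:
    """Toy write path: append to the log first, then update memory."""

    def __init__(self, wal=None):
        # Assumption: a list stands in for a durable, append-only log file.
        self.wal = wal if wal is not None else []
        self.memstore = {}

    def put(self, key, value):
        self.wal.append((key, value))  # 1. record the change in the log
        self.memstore[key] = value     # 2. only then apply it in memory

    @classmethod
    def recover(cls, wal):
        """Rebuild the in-memory state by replaying the log in order."""
        store = cls(wal=list(wal))
        for key, value in wal:
            store.memstore[key] = value
        return store


store = TinyStore()
store.put("row1", "a")
store.put("row1", "b")
store.put("row2", "c")

surviving_log = list(store.wal)  # pretend the process crashed here
recovered = TinyStore.recover(surviving_log)
print(recovered.memstore)
```

Replaying the entries in their original order matters: a later write to the same key must overwrite an earlier one, exactly as it did before the failure.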

  4. Region

A Region is a shard of an HBase table. A table is divided into Regions by RowKey range, and the Regions are stored on RegionServers. One RegionServer can host multiple Regions.
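Finding the Region that holds a given RowKey amounts to a lookup over sorted Region start keys. The sketch below assumes four hypothetical Regions and server names; `locate_region` is an illustrative helper, not an HBase call.

```python
import bisect

# Hypothetical layout: each Region is identified by its start key.
# The empty string marks the first Region, which starts at the beginning of the table.
region_start_keys = ["", "g", "n", "t"]
region_servers = ["rs1", "rs2", "rs1", "rs3"]  # one RegionServer hosts two Regions


def locate_region(row_key):
    """Return (region index, server) for the Region whose range contains row_key."""
    idx = bisect.bisect_right(region_start_keys, row_key) - 1
    return idx, region_servers[idx]


print(locate_region("apple"))
print(locate_region("g"))
print(locate_region("zebra"))
```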

  5. Store

HFiles are stored in a Store. Each Store corresponds to one column family of an HBase table.

  6. MemStore

The MemStore is the in-memory store. It holds the data currently being operated on: once a change has been recorded in the WAL, the RegionServer stores its key-value pairs in the MemStore.

  7. HFile

The HFile is the actual physical file on disk that holds the raw data. A StoreFile is stored in HDFS in the HFile format.
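How a Store, its MemStore, and its HFiles relate can be sketched with a toy class. Several things here are assumptions made for illustration: the flush threshold counts entries rather than bytes, an "HFile" is just an immutable sorted tuple, and `ToyStore` is a hypothetical name.

```python
class ToyStore:
    """Toy Store for one column family: one MemStore plus a list of 'HFiles'."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # assumption: counted in entries
        self.memstore = {}
        self.hfiles = []  # each 'HFile' is an immutable, sorted tuple of (key, value)

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the MemStore out as a new sorted 'HFile' and clear it."""
        if self.memstore:
            self.hfiles.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        """Check the MemStore first, then the 'HFiles' from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:
                if k == key:
                    return v
        return None


s = ToyStore(flush_threshold=2)
s.put("b", "1")
s.put("a", "2")   # reaches the threshold and triggers a flush
s.put("a", "3")   # newer value stays in the MemStore
print(s.hfiles)
print(s.get("a"), s.get("b"))
```

Reading the MemStore before the files, and newer files before older ones, is what lets the most recent write win even though flushed files are never modified.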