1. HBase Overview

1.1 HBase Definition

HBase is a distributed and scalable NoSQL database that supports massive data storage.

1.2 HBase Data Model

Logically, the HBase data model looks much like a relational database: data is stored in tables with rows and columns. From the perspective of the underlying physical storage structure (key-value), however, HBase is more like a multi-dimensional map.
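
For intuition, a scan in the HBase shell prints exactly this map view: every output line is one cell, keyed by RowKey, column (family:qualifier), and timestamp. A hedged sketch of what such output might look like (table name and values are illustrative):

hbase(main):001:0> scan 'student'
ROW                COLUMN+CELL
 1001              column=info:name, timestamp=1638346163000, value=Nick
 1001              column=info:age, timestamp=1638346170000, value=18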

1.2.1 HBase Logical Structure

1.2.2 HBase Physical Storage Structure

1.2.3 Data Model

  1. Name Space

    A namespace is similar to the database concept in a relational database; each namespace holds multiple tables. HBase has two built-in namespaces, hbase and default: the hbase namespace stores HBase's internal tables, and default is the namespace user tables go into by default (see the shell sketch after this list).

  2. Region

    Similar to the table concept in a relational database. The difference is that HBase only requires the column families to be declared when a table is defined; columns (fields) can be specified dynamically, on demand, as data is written. HBase therefore handles field-change scenarios more easily than a relational database.

  3. Row

    Each row of data in an HBase table consists of one RowKey and multiple columns. Rows are stored in the lexicographical order of their RowKeys, and data can only be retrieved by RowKey when querying, so RowKey design is very important.

  4. Column

    Each column in HBase is qualified by a Column Family and a Column Qualifier, for example info:name and info:age. When creating a table you only need to specify the column families; column qualifiers do not need to be defined in advance.

  5. Time Stamp

    The timestamp identifies different versions of the data. If no timestamp is specified when data is written, HBase automatically assigns one whose value is the time of the write.

  6. Cell

    A cell is uniquely identified by {RowKey, Column Family:Column Qualifier, Timestamp}. The data in a cell has no type and is stored as a byte array.
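
A few shell commands tie these model concepts together. A hedged sketch (the bigdata namespace is a made-up example, and getting several versions back assumes the column family was altered to keep more than one):

hbase(main):001:0> create_namespace 'bigdata'
hbase(main):002:0> create 'bigdata:student', 'info'
hbase(main):003:0> alter 'bigdata:student', {NAME => 'info', VERSIONS => 3}
hbase(main):004:0> get 'bigdata:student', '1001', {COLUMN => 'info:name', VERSIONS => 3}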

1.3 HBase Basic Architecture

Architectural roles:

  1. Region Server

    The RegionServer manages Regions; its implementation class is HRegionServer. It provides the following functions:

    Operations on data: get, put, delete;

    Operations on Regions: splitRegion and compactRegion.

  2. Master

    The Master is the manager of all the Region Servers; its implementation class is HMaster. Its functions are as follows:

    Operations on tables: create, delete, alter;

    Operations on RegionServers: assign Regions to each RegionServer, monitor the state of each RegionServer, and handle load balancing and failover.

  3. Zookeeper

    HBase uses Zookeeper to provide high availability (HA) of the Master, RegionServer monitoring, the entry point for metadata (the location of hbase:meta), and maintenance of cluster configuration (see the zkCli sketch after this list).

  4. HDFS

    HDFS provides the underlying data storage service for HBase, along with high-availability support for that data.
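
The role ZooKeeper plays here can be seen directly in its znodes. A hedged sketch using the ZooKeeper CLI (znode names follow HBase 1.x and may differ across versions; the listing is abbreviated):

[moe@hadoop102 ~]$ zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /hbase
[backup-masters, master, meta-region-server, rs, table, ...]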

2. HBase Quick Start

2.1 HBase Installation and Deployment

2.1.1 Deploy Zookeeper

Ensure the Zookeeper cluster is deployed properly, then start it:

[moe@hadoop102 ~]$ zk.sh start

2.1.2 Deploy Hadoop

Ensure the Hadoop cluster is deployed properly, then start it:

[moe@hadoop102 ~]$ myhadoop.sh start

2.1.3 Decompress HBase

Decompress HBase to the specified directory:

[moe@hadoop102 ~]$ tar -zxvf /opt/software/hbase-1.3.1-bin.tar.gz -C /opt/module/
[moe@hadoop102 module]$ mv hbase-1.3.1/ hbase

2.1.4 HBase Configuration Files

Modify the following HBase configuration files:

  1. hbase-env.sh modified contents:

    export JAVA_HOME=/opt/module/jdk1.8.0_212
    export HBASE_MANAGES_ZK=false
  2. hbase-site.xml modified contents:

    <configuration>
    
            <property>
                    <name>hbase.rootdir</name>
                    <value>hdfs://hadoop102:8020/HBase</value>
            </property>
    
            <property>
                    <name>hbase.cluster.distributed</name>
                    <value>true</value>
            </property>
    
            <!-- Changed after version 0.98; this property did not exist in earlier versions -->
            <property>
                    <name>hbase.master.port</name>
                    <value>16000</value>
            </property>
    
            <property> 
                    <name>hbase.zookeeper.quorum</name>
                    <value>hadoop102,hadoop103,hadoop104</value>
            </property>
    
            <property> 
                    <name>hbase.zookeeper.property.dataDir</name>
                    <value>/opt/module/zookeeper-3.5.7/zkData</value>
            </property>
    
    </configuration>
  3. regionservers file contents:

    hadoop102
    hadoop103
    hadoop104
  4. Soft-link the Hadoop configuration files into HBase:

    [moe@hadoop102 ~]$ ln -s /opt/module/hadoop-3.1.3/etc/hadoop/core-site.xml /opt/module/hbase/conf/core-site.xml
    [moe@hadoop102 ~]$ ln -s /opt/module/hadoop-3.1.3/etc/hadoop/hdfs-site.xml /opt/module/hbase/conf/hdfs-site.xml

2.1.5 Distribute HBase to the Other Nodes

[moe@hadoop102 module]$ xsync hbase/

2.1.6 Starting the HBase service

  1. Startup Mode 1

    bin/hbase-daemon.sh start master
    bin/hbase-daemon.sh start regionserver

    Note: If the clocks of the nodes in the cluster are not synchronized, the RegionServer will fail to start and throw a ClockOutOfSyncException.

    Fixes:

    a. Run a time synchronization service across the nodes;

    b. Set the hbase.master.maxclockskew property to a larger value:

    <property>
     <name>hbase.master.maxclockskew</name>
     <value>180000</value>
     <description>Time difference of regionserver from master</description>
    </property>
  2. Startup Mode 2

    bin/start-hbase.sh

    The corresponding stop command:

    bin/stop-hbase.sh

2.1.7 View the HBase Web Page

After HBase starts successfully, you can access its management web page at host:port. For example:

http://hadoop102:16010

2.2 HBase Shell Operations

2.2.1 Basic Operations

  1. Enter the HBase client command line:

    bin/hbase shell

  2. View the help information:

    hbase(main):001:0> help
  3. View the tables in the current database:

    hbase(main):002:0> list

2.2.2 Table operations

  1. Create a table

    hbase(main):003:0> create 'student','info'
  2. Insert data into the table

    put 'student','1001','info:sex','male'
    put 'student','1001','info:age','18'
    put 'student','1002','info:name','Janna'
    put 'student','1002','info:sex','female'
    put 'student','1002','info:age','20'
  3. Scan to view table data

    hbase(main):009:0> scan 'student'

  4. View table structure

    hbase(main):010:0> describe 'student'

  5. Update the data in a specified field

    hbase(main):011:0> put 'student','1001','info:name','Nick'
    hbase(main):012:0> put 'student','1001','info:age','100'
  6. View the data of a specified row or a specified Column Family:Column

    hbase(main):001:0> get 'student','1001'

    hbase(main):002:0> get 'student','1001','info:name'

  7. Count the number of rows in the table

    hbase(main):003:0> count 'student'

  8. Delete the data

    Delete all data for a rowkey:

    hbase(main):004:0> deleteall 'student','1001'

    Delete a column from a rowkey:

    hbase(main):007:0> delete 'student','1002','info:sex'

  9. Clear table data

    hbase(main):010:0> truncate 'student'

    Tip: truncate first disables the table and then truncates it.

  10. Delete table

    • First, disable the table; dropping a table that is still enabled fails with "ERROR: Table student is enabled. Disable it first.":

      hbase(main):014:0> disable 'student'

    • Then you can drop the table:

      hbase(main):013:0> drop 'student'

3. HBase Advanced

3.1 Architecture Principles

  1. StoreFile

    The physical files that store the actual data. StoreFiles are stored on HDFS in the HFile format. Each Store has one or more StoreFiles (HFiles), and the data inside each StoreFile is ordered.

  2. MemStore

    The write cache. Because the data in an HFile must be ordered, data is first placed in the MemStore and sorted there; when the flush condition is reached it is written out to an HFile, and each flush produces a new HFile (see the shell sketch after this list).

  3. WAL

    Data can only be written to an HFile after it has been sorted in the MemStore, but keeping data only in memory carries a high risk of loss. To solve this problem, data is first written to a file called the Write-Ahead Log (WAL) before entering the MemStore, so that after a system failure the data can be reconstructed from this log file.
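
The flush behavior described above can be observed by hand. A hedged sketch (using the student table from earlier; the exact HDFS layout under the HBase root directory may differ by version):

hbase(main):001:0> flush 'student'

[moe@hadoop102 ~]$ hdfs dfs -ls /HBase/data/default/student

Each flush of a non-empty MemStore should leave one more HFile under the Region's Store directory.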

3.2 Write Process

Writing process:

  1. The client accesses ZooKeeper to find which Region Server hosts the hbase:meta table;

  2. Access that Region Server and query the hbase:meta table to find, from the namespace:table/rowkey of the write request, the Region where the target data resides; the table's Region information and the location of the meta table are cached in the client's meta cache for later access;

  3. Communicates with the target Region Server.

  4. Write (append) data sequentially to WAL;

  5. Write the data into the corresponding MemStore, where it is sorted;

  6. Send an ACK to the client;

  7. When the MemStore flush condition is reached, the data is flushed to an HFile.
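
One of the flush triggers behind step 7 is the per-Region MemStore size, configurable in hbase-site.xml. A sketch (134217728 bytes, i.e. 128 MB, is the commonly cited default; verify the value for your version):

<property>
        <name>hbase.hregion.memstore.flush.size</name>
        <value>134217728</value>
        <description>MemStore size at which a flush to HFile is triggered</description>
</property>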

3.3 Read Process

Reading process:

  1. The client accesses ZooKeeper to find which Region Server hosts the hbase:meta table;

  2. Access that Region Server and query the hbase:meta table to find, from the namespace:table/rowkey of the read request, the Region where the target data resides; the table's Region information and the location of the meta table are cached in the client's meta cache for later access;

  3. Communicates with the target Region Server.

  4. Query the target data in the Block Cache (read cache), the MemStore, and the StoreFiles, and merge everything found. "Everything" here means all versions (timestamps) and types (Put/Delete) of the same record;

  5. Cache the Data Blocks read from file (a Data Block is the HFile storage unit, 64 KB by default) in the Block Cache;

  6. The final result of the merge is returned to the client.
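
The merging of versions and types in step 4 can be seen from the client side: the shell can return raw cells, including older versions and delete markers. A hedged sketch (the output depends on what has been written, deleted, and flushed):

hbase(main):001:0> scan 'student', {RAW => true, VERSIONS => 10}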

3.4 StoreFile Compaction

Because each MemStore flush generates a new HFile, and different versions (timestamps) and different types (Put/Delete) of the same field may be spread across different HFiles, a query has to traverse all the HFiles. To reduce the number of HFiles and to clean up expired or deleted data, StoreFile Compaction is performed.

There are two types of Compaction: Minor Compaction and Major Compaction. A Minor Compaction merges several adjacent smaller HFiles into one larger HFile, but does not clean up expired or deleted data. A Major Compaction merges all the HFiles of a Store into a single HFile and removes expired and deleted data.
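
Both kinds of compaction can also be requested manually from the shell, which makes the drop in HFile count easy to observe. A sketch (using the student table from earlier; on a live cluster compactions normally run automatically):

hbase(main):001:0> compact 'student'
hbase(main):002:0> major_compact 'student'

compact requests a minor compaction of the table; major_compact requests a major compaction.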

3.5 Region Split