1. HBase Overview
1.1 HBase Definition
HBase is a distributed and scalable NoSQL database that supports massive data storage.
1.2 HBase data model
Logically, the HBase data model is similar to a relational database. Data is stored in a table with rows and columns. However, from the perspective of the underlying physical storage structure (K-V), HBase is more like a multi-dimensional map.
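This K-V view can be illustrated with a small sketch (Python used purely as illustration; the rowkeys, timestamps, and values below are hypothetical, and this is not HBase's actual on-disk format):

```python
# HBase's physical view: each cell is one key-value pair whose key is
# (rowkey, column family, column qualifier, timestamp) and whose value is bytes.
kv_store = {
    ("1001", "info", "name", 1635724800000): b"Zhang San",
    ("1001", "info", "age",  1635724800000): b"18",
    ("1002", "info", "name", 1635724900000): b"Li Si",
}

def get(rowkey, column):
    """The logical 'table' view is just this map grouped by row and column."""
    family, qualifier = column.split(":")
    # collect every version of the cell, then pick the newest timestamp
    versions = [(ts, v) for (r, f, q, ts), v in kv_store.items()
                if (r, f, q) == (rowkey, family, qualifier)]
    return max(versions)[1] if versions else None

print(get("1001", "info:name"))  # b'Zhang San'
```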
1.2.1 HBase logical structure
1.2.2 HBase Physical Storage Structure
1.2.3. Data Model
-
Name Space
Namespaces are similar to the database concept in relational databases; each namespace contains multiple tables. HBase has two built-in namespaces, hbase and default: hbase stores HBase's internal tables, while default is the namespace used by user tables by default.
-
Region
Similar to the table concept in a relational database. The difference is that HBase only requires column families to be declared when defining a table; columns (fields) can then be specified dynamically, on demand, as data is written. HBase therefore handles schema-change scenarios more easily than a relational database.
-
Row
Each row of data in an HBase table consists of one RowKey and multiple columns. Rows are stored in lexicographic order of their RowKeys, and data can only be retrieved by RowKey during queries, so RowKey design is very important.
-
Column
Each column in HBase is qualified by a Column Family and a Column Qualifier, such as info:name and info:age. When building a table, you only need to specify the column families; column qualifiers do not need to be defined in advance.
-
Time Stamp
The timestamp identifies different versions of the same data. If no timestamp is specified when data is written, the system automatically assigns one, namely the time at which the data was written to HBase.
-
Cell
A cell is uniquely identified by {RowKey, Column Family:Column Qualifier, Timestamp}. The data in a cell is untyped and stored as raw byte arrays.
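Because rows are kept in lexicographic (byte) order of their RowKeys, numeric keys sort as strings, which can surprise you unless keys are zero-padded to a fixed length. A quick sketch of the pitfall (the rowkeys are hypothetical):

```python
rowkeys = ["1001", "1002", "101", "2", "11"]

# Lexicographic order, as HBase stores rows: "2" sorts after "1002"
print(sorted(rowkeys))           # ['1001', '1002', '101', '11', '2']

# Zero-padding restores the numeric ordering one usually expects
padded = [k.zfill(4) for k in rowkeys]
print(sorted(padded))            # ['0002', '0011', '0101', '1001', '1002']
```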
1.3 HBase Basic architecture
Architectural roles:
-
Region Server
The RegionServer is the manager of Regions; its implementation class is HRegionServer. It provides the following functions:
Operations on data: get, put, delete;
Operations on Regions: splitRegion, compactRegion.
-
Master
The Master is the administrator of all RegionServers. Its implementation class is HMaster, and its functions are as follows:
Operations on tables: create, delete, alter;
Operations on RegionServers: allocating Regions to each RegionServer, monitoring the state of each RegionServer, load balancing, and failover.
-
Zookeeper
HBase uses Zookeeper to implement high availability (HA) of the Master, RegionServer monitoring, metadata entry, and cluster configuration maintenance.
-
HDFS
HDFS provides basic data storage services for HBase and high availability support for HBase.
2. HBase Quick Start
2.1 HBase Installation and deployment
2.1.1 Ensure Zookeeper is deployed
Ensure the normal deployment of the Zookeeper cluster and start it:
[moe@hadoop102 ~]$ zk.sh start
2.1.2 Ensure Hadoop is deployed
Hadoop cluster deployment and startup:
[moe@hadoop102 ~]$ myhadoop.sh start
2.1.3 HBase decompression
Decompress Hbase to a specified directory:
[moe@hadoop102 ~]$ tar -zxvf /opt/software/hbase-1.3.1-bin.tar.gz -C /opt/module/
[moe@hadoop102 module]$ mv hbase-1.3.1/ hbase
2.1.4 HBase configuration file
Modify HBase configuration files.
-
hbase-env.sh modified contents:
export JAVA_HOME=/opt/module/jdk1.8.0_212
export HBASE_MANAGES_ZK=false
-
hbase-site.xml modified contents:
<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://hadoop102:8020/HBase</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <!-- New change after 0.98; not present in previous versions. -->
    <property>
        <name>hbase.master.port</name>
        <value>16000</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hadoop102,hadoop103,hadoop104</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/opt/module/zookeeper-3.5.7/zkData</value>
    </property>
</configuration>
-
regionservers:
hadoop102
hadoop103
hadoop104
-
Soft-link the Hadoop configuration files into HBase:
[moe@hadoop102 ~]$ ln -s /opt/module/hadoop-3.1.3/etc/hadoop/core-site.xml /opt/module/hbase/conf/core-site.xml
[moe@hadoop102 ~]$ ln -s /opt/module/hadoop-3.1.3/etc/hadoop/hdfs-site.xml /opt/module/hbase/conf/hdfs-site.xml
2.1.5. Distribute HBase to the other cluster nodes
[moe@hadoop102 module]$ xsync hbase/
2.1.6 Starting the HBase service
-
Startup Mode 1
bin/hbase-daemon.sh start master
bin/hbase-daemon.sh start regionserver
Note: If the clocks of the cluster nodes are not synchronized, the RegionServer cannot start and throws a ClockOutOfSyncException.
Repair tips:
a. Run a time synchronization service across all nodes;
b. Set the hbase.master.maxclockskew property to a larger value:
<property>
    <name>hbase.master.maxclockskew</name>
    <value>180000</value>
    <description>Time difference of regionserver from master</description>
</property>
-
Startup Mode 2
bin/start-hbase.sh
Corresponding service stop:
bin/stop-hbase.sh
2.1.7. View the HBase page
After HBase starts successfully, you can access the HBase management page at host:port. For example:
http://hadoop102:16010
2.2 HBase Shell Operations
2.2.1 Basic Operations
-
Enter the HBase client CLI
bin/hbase shell
-
Viewing Help Commands
hbase(main):001:0> help
-
View which tables are present in the current database
hbase(main):002:0> list
2.2.2 Table operations
-
Create a table
hbase(main):003:0> create 'student','info'
-
Insert data into the table
put 'student','1001','info:sex','male'
put 'student','1001','info:age','18'
put 'student','1002','info:name','Janna'
put 'student','1002','info:sex','female'
put 'student','1002','info:age','20'
-
Scan to view table data
hbase(main):009:0> scan 'student'
-
View table structure
hbase(main):010:0> describe 'student'
-
Update the data of a specified field
hbase(main):011:0> put 'student','1001','info:name','Nick'
hbase(main):012:0> put 'student','1001','info:age','100'
-
View data for a specified row or a specified column family:column
hbase(main):001:0> get 'student','1001'
hbase(main):002:0> get 'student','1001','info:name'
-
Count the number of rows in the table
hbase(main):003:0> count 'student'
-
Delete the data
Delete all data for a rowkey:
hbase(main):004:0> deleteall 'student','1001'
Delete a column from a rowkey:
hbase(main):007:0> delete 'student','1002','info:sex'
-
Clear table data
hbase(main):010:0> truncate 'student'
Tip: truncate disables the table first and then truncates it.
-
Delete table
-
First, disable the table (if you try to drop a table that is still enabled, HBase reports "ERROR: Table student is enabled. Disable it first."):
hbase(main):014:0> disable 'student'
-
Then you can drop the table:
hbase(main):013:0> drop 'student'
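The semantics of the shell commands above (put, get, deleteall, count, and "update as a newer version") can be mimicked with a tiny in-memory model. This is an illustration of the behavior only, not the real HBase client API; the class and its methods are hypothetical:

```python
class MiniTable:
    """Toy model of an HBase table: {rowkey: {column: [(ts, value), ...]}}."""

    def __init__(self):
        self.rows = {}
        self.clock = 0  # stand-in for the server-assigned timestamp

    def put(self, rowkey, column, value):
        # Every put appends a new version; an "update" is just a newer version.
        self.clock += 1
        self.rows.setdefault(rowkey, {}).setdefault(column, []).append((self.clock, value))

    def get(self, rowkey, column):
        versions = self.rows.get(rowkey, {}).get(column, [])
        return max(versions)[1] if versions else None  # newest version wins

    def deleteall(self, rowkey):
        # Like the shell's deleteall: remove every column of one rowkey.
        self.rows.pop(rowkey, None)

    def count(self):
        return len(self.rows)

t = MiniTable()
t.put('1001', 'info:age', '18')
t.put('1002', 'info:name', 'Janna')
t.put('1001', 'info:age', '100')   # update = newer version of the same cell
print(t.get('1001', 'info:age'))   # '100'
print(t.count())                   # 2
t.deleteall('1001')
print(t.count())                   # 1
```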
3. HBase Advanced
3.1. Architecture principle
-
StoreFile
The physical files that store the actual data. StoreFiles are stored on HDFS as HFiles; each Store has one or more StoreFiles (HFiles), and the data within each StoreFile is ordered.
-
MemStore
Write cache. Because the data in an HFile must be ordered, data is first stored in the MemStore and sorted there; when the flush condition is reached, it is written out to an HFile, and each flush produces a new HFile.
-
WAL
Data can only be written to an HFile after being sorted in the MemStore, but keeping data solely in memory carries a high risk of loss. To solve this problem, data is first written to a file called the Write-Ahead Log (WAL) before being written to the MemStore, so that after a system failure the data can be reconstructed from this log file.
3.2. Write process
Writing process:
-
The client accesses ZooKeeper and obtains the Region Server that hosts the hbase:meta table.
-
Access that Region Server, read the hbase:meta table, and look up which Region of which Region Server the target data resides in, based on the namespace:table/RowKey of the request. The Region information of the table and the location of the meta table are cached in the client's meta cache for later access.
-
Communicates with the target Region Server.
-
Write (append) data sequentially to WAL;
-
Write the data into the corresponding MemStore, and the data will be sorted in the MemStore.
-
Send an ACK to the client;
-
When the MemStore flush condition is reached, the data is written out to an HFile.
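The write path above (append to WAL, sorted insert into MemStore, ack, flush to a new HFile) can be sketched as follows. The flush threshold here is a hypothetical count of entries; real HBase flushes based on MemStore size in bytes:

```python
import bisect

wal = []             # append-only write-ahead log
memstore = []        # in-memory buffer, kept sorted like the MemStore
hfiles = []          # each flush emits one new immutable, sorted "HFile"
FLUSH_THRESHOLD = 3  # hypothetical trigger; real HBase flushes by bytes

def write(rowkey, column, value):
    wal.append((rowkey, column, value))               # 1. persist to WAL first
    bisect.insort(memstore, (rowkey, column, value))  # 2. sorted insert into MemStore
    # (the client receives its ack at this point)
    if len(memstore) >= FLUSH_THRESHOLD:              # 3. flush forms a new HFile
        hfiles.append(list(memstore))
        memstore.clear()

# Writes arrive in arbitrary rowkey order...
for rk, v in [("1003", "a"), ("1001", "b"), ("1002", "c")]:
    write(rk, "info:x", v)

print(len(hfiles), memstore)   # 1 []  -- one flushed HFile, empty MemStore
print(hfiles[0][0][0])         # '1001' -- ...but the HFile comes out sorted
```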
3.3. Read process
Reading process:
-
The client accesses ZooKeeper and obtains the Region Server that hosts the hbase:meta table.
-
Access that Region Server, read the hbase:meta table, and look up which Region of which Region Server the target data resides in, based on the namespace:table/RowKey of the request. The Region information of the table and the location of the meta table are cached in the client's meta cache for later access.
-
Communicates with the target Region Server.
-
Query the target data in the Block Cache, the MemStore, and the StoreFiles respectively, and merge everything found. "All data" here means the different versions (by timestamp) or types (Put/Delete) of the same cell.
-
Cache the data blocks queried from files into the Block Cache (a data block is the HFile storage unit, 64 KB by default).
-
The final result of the merge is returned to the client.
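The merge in step 4 can be sketched as: all sources are consulted for a cell, and the version with the highest timestamp decides the outcome; a Delete marker with a newer timestamp hides older Puts. The data below is hypothetical:

```python
# Versions of the same cell scattered across sources.
# Each entry: (timestamp, type, value)
block_cache_or_hfiles = [(100, "Put", "v1"), (300, "Delete", None)]
memstore = [(200, "Put", "v2")]

def read_merged(sources):
    # Gather every version from every source, newest timestamp first.
    merged = sorted((cell for src in sources for cell in src), reverse=True)
    ts, kind, value = merged[0]        # the newest version wins
    return None if kind == "Delete" else value

# The Delete at ts=300 is newer than both Puts, so the cell reads as absent.
print(read_merged([block_cache_or_hfiles, memstore]))  # None
```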
3.4. StoreFile Compaction
Because the MemStore generates a new HFile on every flush, and different versions (timestamps) and types (Put/Delete) of the same field may be spread across different HFiles, a query has to traverse all HFiles. To reduce the number of HFiles and to clean up expired or deleted data, StoreFile Compaction takes place.
There are two types of Compaction: Minor Compaction and Major Compaction. A Minor Compaction merges several adjacent smaller HFiles into one larger HFile, but does not clean up expired or deleted data. A Major Compaction merges all the HFiles of a Store into a single HFile and removes expired and deleted data.
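A Major Compaction can be sketched as a k-way merge of the sorted HFiles that keeps only the newest version of each cell and drops cells whose newest version is a Delete marker. The three input "HFiles" below are hypothetical:

```python
import heapq
from itertools import groupby

# Three sorted HFiles; each cell is (rowkey, column, ts, type, value)
hfiles = [
    [("1001", "info:age", 1, "Put", "18")],
    [("1001", "info:age", 2, "Put", "100"), ("1002", "info:sex", 1, "Put", "female")],
    [("1002", "info:sex", 2, "Delete", None)],
]

def major_compact(hfiles):
    merged = heapq.merge(*hfiles)                 # k-way merge keeps global order
    result = []
    for (row, col), cells in groupby(merged, key=lambda c: (c[0], c[1])):
        newest = max(cells, key=lambda c: c[2])   # highest timestamp wins
        if newest[3] != "Delete":                 # drop deleted cells entirely
            result.append(newest)
    return result  # the single compacted HFile

print(major_compact(hfiles))
# [('1001', 'info:age', 2, 'Put', '100')]
```

The old version of 1001's age and the deleted 1002 cell are both gone from the output, which is exactly what distinguishes a Major Compaction from a Minor one.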