HBase explained: why HBase, what HBase is, and the HBase architecture.




1. Why HBase?


As data volumes grow, traditional relational databases can no longer meet storage and query requirements. Hive can meet the storage requirement, but it cannot store and query unstructured and semi-structured data.


2. What is HBase?


HBase is an open-source, distributed, multi-version, scalable non-relational database; it is the open-source Java implementation of Google's BigTable. Built on HDFS, HBase provides a NoSQL database system with high reliability, high performance, column-oriented storage, scalability, and real-time reads and writes. It is suited to the following scenarios: massive amounts of unstructured data need to be stored.


Random, near-real-time reads and writes to the data are required.


3. HBase architecture


Components: Client, ZooKeeper, HMaster, HRegionServer, HLog, HRegion, MemStore, StoreFile, HFile.


Client: the HBase client, including the interfaces for accessing HBase (the Linux shell and the Java API).

The client maintains caches, such as region location information, to speed up access to HBase.


ZooKeeper: monitors HMaster status to ensure there is only one active HMaster; stores the addressing entry for all regions (the location of the -ROOT- table); monitors HRegionServer status in real time and notifies the HMaster immediately when a RegionServer goes offline; stores information about all HBase tables (HBase metadata).


HMaster (the "boss" of HBase): assigns regions to RegionServers (for example, when a table is created); balances load across RegionServers; reassigns regions (when an HRegionServer fails or an HRegion splits); performs garbage collection on HDFS; processes schema update requests.


HRegionServer (the "worker" of HBase): maintains the regions assigned to it by the master (manages the regions on its machine); handles client I/O requests to those regions and interacts with HDFS.

The RegionServer is also responsible for splitting regions that grow too large during operation.


HLog: records HBase operations. It is a write-ahead log (WAL): data is written to the log first and then to the MemStore, so that lost data can be replayed and recovered.
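The log-then-memory ordering can be sketched with a toy model (illustrative only; `RegionServerModel` and its methods are invented for this example and are not HBase APIs):

```python
# Toy model of the write-ahead-log idea: every mutation is appended to a
# durable log before it touches the in-memory MemStore, so the MemStore
# can be rebuilt by replaying the log after a crash.

class RegionServerModel:
    def __init__(self):
        self.hlog = []      # durable write-ahead log (survives a "crash")
        self.memstore = {}  # volatile in-memory buffer

    def put(self, rowkey, column, value):
        self.hlog.append((rowkey, column, value))  # 1. log first (WAL)
        self.memstore[(rowkey, column)] = value    # 2. then MemStore

    def crash(self):
        self.memstore = {}  # simulate losing everything held in memory

    def recover(self):
        # replay the log in order to rebuild the MemStore
        for rowkey, column, value in self.hlog:
            self.memstore[(rowkey, column)] = value

rs = RegionServerModel()
rs.put("rk00001", "base_info:name", "zhangsan")
rs.crash()
rs.recover()
print(rs.memstore[("rk00001", "base_info:name")])  # zhangsan
```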


HRegion: the smallest unit of distributed storage and load balancing in HBase; a table, or part of a table.


Store: corresponds to one column family.


MemStore: an in-memory write buffer (128 MB by default) used to flush data to HDFS in batches.


HStoreFile (HFile): HBase data is stored on HDFS in the HFile format.
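The MemStore-to-StoreFile flush can be sketched with a toy model (`StoreModel` is invented for illustration; real HBase flushes when the MemStore reaches `hbase.hregion.memstore.flush.size`, 128 MB by default, while here the threshold is simply 3 edits):

```python
# Toy model of MemStore flushing: writes accumulate in memory and, once
# the buffer reaches a threshold, are written out as one immutable
# StoreFile, after which a fresh MemStore starts accepting writes.

class StoreModel:
    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.memstore = {}
        self.storefiles = []  # each flush produces one immutable "HFile"

    def put(self, rowkey, value):
        self.memstore[rowkey] = value
        if len(self.memstore) >= self.flush_threshold:
            self.storefiles.append(dict(self.memstore))  # flush to a file
            self.memstore = {}                           # fresh MemStore

store = StoreModel()
for i in range(7):
    store.put("rk%05d" % i, "v%d" % i)
print(len(store.storefiles), len(store.memstore))  # 2 storefiles, 1 pending edit
```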


Quantitative relationship among components:


hmaster:hregionserver=1:n


hregionserver:hregion=1:n


hregionserver:hlog=1:1


hregion:hstore=1:n


store:memstore=1:1


store:storefile=1:n


storefile:hfile=1:1


Hbase Keywords:


Rowkey: the row key, which plays the same role as a primary key in MySQL.


Columnfamily: a column family (a collection of columns).


Column: a single column within a column family.


Timestamp: the timestamp; by default, the value with the latest timestamp is returned.


Version: indicates the version number.


Cell: a cell, located by {rowkey, column family, column, timestamp}.
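The keywords above fit together as follows, sketched with a toy model (the `table`, `put`, and `get` names are invented for this example, not an HBase API): a cell is addressed by {rowkey, column family, column, timestamp}, each cell can keep several versions, and a read without an explicit timestamp returns the newest version.

```python
# Toy model of the HBase data model: multi-version cells addressed by
# (rowkey, "family:qualifier"), with the latest timestamp winning by default.

from collections import defaultdict

table = defaultdict(list)  # (rowkey, "family:qualifier") -> [(ts, value), ...]

def put(rowkey, column, value, ts):
    table[(rowkey, column)].append((ts, value))

def get(rowkey, column, ts=None):
    versions = sorted(table[(rowkey, column)], reverse=True)  # newest first
    if ts is None:
        return versions[0][1] if versions else None
    for vts, value in versions:
        if vts == ts:
            return value
    return None

put("rk00001", "base_info:name", "gaoyuanyuan", ts=100)
put("rk00001", "base_info:name", "zhouzhiruo", ts=200)   # a later "update"
print(get("rk00001", "base_info:name"))          # zhouzhiruo (latest version)
print(get("rk00001", "base_info:name", ts=100))  # gaoyuanyuan (older version)
```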


4. Relationship between HBase and Hadoop


HBase is built on Hadoop: HBase storage relies on HDFS. HBase features:

Schema: schema-free.

Data type: a single type, byte[].

Multiple versions: each value can keep multiple versions.

Column-oriented storage: each column family is stored in its own directory.

Sparse storage: a null cell occupies no storage space.
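The sparse-storage feature can be illustrated with a small sketch (invented structures, not HBase code): only cells that actually hold a value are stored, so "null" columns cost nothing, unlike a relational row, which reserves a slot for every column.

```python
# Sparse storage sketch: rows store only the cells that exist. A missing
# column is simply absent, not an empty slot taking up space.

rows = {
    "rk00001": {"base_info:name": "zhangsan", "base_info:age": "20"},
    "rk00002": {"base_info:name": "lisi"},  # no age cell: occupies no space
}

stored_cells = sum(len(cells) for cells in rows.values())
print(stored_cells)                          # 3 cells stored, not 2 rows x 2 columns
print(rows["rk00002"].get("base_info:age"))  # None: absent, not an empty slot
```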


Next, HBase installation:


1. Standalone mode


1) Decompress and configure environment variables


tar -zxvf hbase-1.2.1-bin.tar.gz -C /usr/local


cd /usr/local


vi /etc/profile


source /etc/profile


2) Test the hbase installation


hbase version


Configure the hbase configuration file


vi conf/hbase-env.sh


JAVA_HOME


Note:


# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+


export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"

export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"


vi hbase-site.xml


<property>
  <name>hbase.rootdir</name>
  <value>file:///usr/local/hbasedata</value>
</property>

<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/usr/local/zookeeperdata</value>
</property>


Starting the hbase service:


bin/start-hbase.sh


Start the client:


bin/hbase shell


2. Pseudo-distributed mode

3. Fully distributed mode


Decompress and configure environment variables


Configure the hbase configuration file


vi conf/hbase-env.sh


export HBASE_MANAGES_ZK=false


vi regionservers


vi backup-masters


vi hbase-site.xml


<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://qianfeng/hbase</value>
</property>

<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/usr/local/zookeeperdata</value>
</property>

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hadoop05:2181,hadoop06:2181,hadoop07:2181</value>
</property>


Note:


If HDFS is highly available, copy Hadoop's core-site.xml and hdfs-site.xml into the hbase/conf directory.


Distribution:


scp -r hbase-1.2.1 root@hadoop06:$PWD

scp -r hbase-1.2.1 root@hadoop07:$PWD


Startup:


1) Start ZooKeeper

2) Start HDFS

3) Start HBase


The clocks of all nodes in the HBase cluster must be synchronized.

Web UI ports:

HMaster: 16010

HRegionServer: 16030


Hbase shell operations


help


help "COMMAND"

help "COMMAND_GROUP"


Lists all tables under the current namespace


list


Create a table:


create 'test', 'f1', 'f2'


The namespace:


HBase has no concept of a database; instead it has namespaces (groups). A namespace is equivalent to a database.


HBase has two namespaces by default:

default: tables created without a namespace go here.

hbase: system tables.


List all namespaces:


list_namespace


list_namespace_tables 'hbase'

create_namespace 'ns1'

describe_namespace 'ns1'

alter_namespace 'ns1', {METHOD => 'set', 'NAME' => 'gjz1'}

alter_namespace 'ns1', {METHOD => 'unset', NAME => 'NAME'}


drop_namespace 'ns1'


DDL:


Group name: ddl


Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters


Create a table:


create 'test', 'f1', 'f2'

create 'ns1:t_userinfo', {NAME => 'base_info', BLOOMFILTER => 'ROWCOL', VERSIONS => '3'}

create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']  # pre-split: the split keys bound the rowkey range of each region
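How the SPLITS list partitions the rowkey space can be sketched as a lexicographic binary search over the split keys (an illustrative model with an invented `region_for` helper, not HBase's routing code):

```python
# Sketch of pre-split region routing: SPLITS => ['10','20','30','40']
# yields five regions, and a rowkey is routed to the region whose key
# range contains it (comparisons are lexicographic, as in HBase).

import bisect

splits = ["10", "20", "30", "40"]

def region_for(rowkey):
    # region i holds rowkeys in [splits[i-1], splits[i]), lexicographically
    return bisect.bisect_right(splits, rowkey)

print(region_for("05"))  # 0: before the first split key
print(region_for("10"))  # 1: a split key starts a new region
print(region_for("25"))  # 2
print(region_for("99"))  # 4: after the last split key
```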


Modify a table (if the column family exists it is updated; otherwise it is added):


alter 'ns1:t_userinfo', {NAME => 'extra_info', BLOOMFILTER => 'ROW', VERSIONS => '2'}

alter 'ns1:t_userinfo', {NAME => 'extra_info', BLOOMFILTER => 'ROWCOL', VERSIONS => '5'}


Delete a column family:


alter 'ns1:t_userinfo', NAME => 'extra_info', METHOD => 'delete'

alter 'ns1:t_userinfo', 'delete' => 'base_info'


Delete a table (disable it first):


disable 'ns1:t1'

drop 'ns1:t1'


DML:


Group name: dml


Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve


Insert data (a put writes one cell, i.e. one column of one row, at a time):


put 'ns1:test', 'u00001', 'cf1:name', 'zhangsan'

put 'ns1:t_userinfo', 'rk00001', 'base_info:name', 'gaoyuanyuan'

put 'ns1:t_userinfo', 'rk00001', 'extra_info:pic', 'picture'


Update data:


put 'ns1:t_userinfo', 'rk00001', 'base_info:name', 'zhouzhiruo'

put 'ns1:t_userinfo', 'rk00002', 'base_info:name', 'zhaoming'


Scan a table (scan):


scan 'ns1:t_userinfo'

scan 'ns1:t_userinfo', {COLUMNS => ['base_info:name', 'base_info:age']}


Set scan conditions (STARTROW is inclusive, ENDROW is exclusive — "header but not tail"):


scan 'ns1:t_userinfo', {COLUMNS => ['base_info:name', 'base_info:age'], STARTROW => 'rk000012', LIMIT => 2}

scan 'ns1:t_userinfo', {COLUMNS => ['base_info:name', 'base_info:age'], STARTROW => 'rk000012', ENDROW => 'rk00002', LIMIT => 2}
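The "header but not tail" rule can be sketched in a few lines (an illustrative model with an invented `scan` function, not HBase code): rows come back in lexicographic rowkey order, STARTROW is included, ENDROW is excluded, and LIMIT caps the row count.

```python
# Sketch of scan range semantics over lexicographically sorted rowkeys.

rows = {"rk00001": "a", "rk000012": "b", "rk00002": "c", "rk00003": "d"}

def scan(startrow=None, endrow=None, limit=None):
    keys = sorted(rows)  # rowkeys are stored in lexicographic order
    if startrow is not None:
        keys = [k for k in keys if k >= startrow]  # inclusive "header"
    if endrow is not None:
        keys = [k for k in keys if k < endrow]     # exclusive "tail"
    return keys[:limit]

print(scan(startrow="rk000012", limit=2))           # ['rk000012', 'rk00002']
print(scan(startrow="rk000012", endrow="rk00002"))  # ['rk000012']
```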


Query data (get):


get 'ns1:t_userinfo', 'rk00001'


get 'ns1:t_userinfo', 'rk00001', {TIMERANGE => [1534136591897, 1534136677747]}


get 'ns1:t_userinfo', 'rk00001', {COLUMN => ['base_info:name', 'base_info:age'], VERSIONS => 4}

get 'ns1:t_userinfo', 'rk00001', {TIMESTAMP => 1534136580800}


Delete data (delete):


delete 'ns1:t_userinfo', 'rk00002', 'base_info:age'


delete 'ns1:t_userinfo', 'rk00001', 'base_info:name', {TIMERANGE => [1534138686498, 1534138838862]}


Delete a specified version (this removes that version and all older versions):


delete 'ns1:t_userinfo', 'rk00001', 'base_info:name', TIMESTAMP => 1534138686498
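The "that version and older" rule can be sketched as follows (an illustrative model with an invented `delete_up_to` helper, not HBase code): deleting a column at a given timestamp masks every version whose timestamp is less than or equal to it, while newer versions stay visible.

```python
# Sketch of delete-with-timestamp semantics over a cell's versions.

versions = {300: "v3", 200: "v2", 100: "v1"}  # timestamp -> value

def delete_up_to(ts):
    # remove every version at or below the given timestamp
    for vts in list(versions):
        if vts <= ts:
            del versions[vts]

delete_up_to(200)
print(sorted(versions))  # [300]: only the newer version survives
```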


Table status:


exists 'ns1:t_userinfo'

disable 'ns1:t_userinfo'

enable 'ns1:t_userinfo'

desc 'ns1:t_userinfo'


Count table rows (not recommended; counting this way is inefficient):


count 'ns1:t_userinfo'


Truncate (empty) a table:


truncate 'ns1:test'

