CentOS 7: HBase cluster environment setup

What is HBase

HBase is a distributed, column-oriented database built on top of the Hadoop file system (HDFS). It is an open-source, horizontally scalable project whose data model resembles Google's Bigtable, and it provides fast, random access to massive amounts of structured data. HBase takes advantage of the fault tolerance provided by HDFS and is part of the Hadoop ecosystem, offering random, real-time read/write access to data. Applications can store data in HDFS directly or through HBase, and they can use HBase to read and randomly access data that lives in HDFS: HBase sits on top of the Hadoop file system and provides the read/write layer.

HBase and HDFS

A quick comparison of HDFS and HBase:

HDFS is a distributed file system suited to storing large files, while HBase is a database built on top of HDFS.
HDFS does not support fast lookup of individual records, while HBase provides fast lookups in large tables.
HDFS provides high-latency batch processing, while HBase has no notion of batch processing and instead provides low-latency access to a single row among billions of records (random access).
Data in HDFS can only be accessed sequentially, while HBase internally uses hash tables to provide random access and stores its data in indexed HDFS files for faster lookups.

HBase data model

Row Key: the primary key of a table. Records in a table are sorted by row key.

Timestamp: every data operation is stamped with a timestamp, which can be regarded as the version number of that piece of data.

Column Family: a table is composed of one or more column families in the horizontal direction. A column family can contain any number of columns, so column families support dynamic expansion; the number and type of columns do not need to be defined in advance. All column values are stored in binary form, and the application has to handle the conversion itself.
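To make these concepts concrete, here is a minimal HBase shell session you can run once the cluster built below is up; the table name 'user' and the column family 'info' are just made-up examples for illustration:

hbase shell                               # start the HBase shell from the command line
create 'user', 'info'                     # create table 'user' with one column family 'info'
put 'user', 'row1', 'info:name', 'Tom'    # a cell is addressed by row key + column family:qualifier
put 'user', 'row1', 'info:age', '20'      # values are stored as bytes; conversion is up to the application
get 'user', 'row1'                        # random access by row key
scan 'user'                               # rows come back sorted by row key
# every cell also carries a timestamp, which serves as its version number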

If these concepts still feel abstract, don't worry: next we will build the environment and run a few simple operations to get a feel for HBase.

Environment setup

You can download the HBase package yourself; this article uses the hbase-1.3.1 release.

Next, upload the downloaded package to a directory of your own on CentOS 7; mine is /home/mmcc. Then go to that directory and extract the archive:

cd /home/mmcc
tar -zxvf hbase-1.3.1-bin.tar.gz
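If you want to double-check the extraction, listing the directory should show the usual HBase layout (bin, conf, lib and so on):

ls /home/mmcc/hbase-1.3.1    # expect directories such as bin, conf, docs and lib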

Then go to the HBase root directory and configure the conf/hbase-env.sh file:

vi conf/hbase-env.sh    # edit the environment configuration script

export JAVA_HOME=/home/mmcc/jdk1.8                      # uncomment this line (remove the leading '#') and point it at your own JDK path
export HBASE_MANAGES_ZK=true                            # use the built-in ZooKeeper
export HBASE_PID_DIR=/home/mmcc/hbase-1.3.1/hbase_tmp   # change the path used for storing pid files

Then configure the conf/hbase-site.xml file:

<property>
  <!-- the HBase root directory on HDFS -->
  <name>hbase.rootdir</name>
  <value>hdfs://master:9000/hbase</value>
</property>
<property>
  <!-- whether to run in distributed (cluster) mode -->
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <!-- the ZooKeeper cluster nodes -->
  <name>hbase.zookeeper.quorum</name>
  <value>master,slave1,slave2</value>
</property>
<property>
  <!-- the temporary file directory -->
  <name>hbase.tmp.dir</name>
  <value>/home/mmcc/hbase-1.3.1/data/tmp</value>
</property>
<property>
  <!-- the web management UI port -->
  <name>hbase.master.info.port</name>
  <value>60010</value>
</property>

When configuring the ZooKeeper quorum, write master:port if ZooKeeper uses a separate port number. The default port is 2181, so master alone is equivalent to master:2181 and the port can be omitted.

Then configure the HBase cluster's RegionServer nodes:

vi conf/regionservers

slave1
slave2

This file lists all of the RegionServer nodes in the cluster.

The HBase environment on the master node is now configured, but so far it is only a single-machine setup. To turn it into a cluster, copy the HBase directory directly to the other nodes as follows:

scp -r /home/mmcc/hbase-1.3.1 slave1:/home/mmcc    # or use slave1's IP address instead of its hostname
scp -r /home/mmcc/hbase-1.3.1 slave2:/home/mmcc    # or use slave2's IP address instead of its hostname

This copies everything under hbase-1.3.1 to /home/mmcc on slave1 and slave2. Because my VMs were created by cloning, the users and directory layout are identical on every node.
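If you want to confirm that the copies landed correctly, a quick check from the master node is to list the directory remotely; this assumes passwordless SSH between the nodes, which a Hadoop cluster normally already has:

ssh slave1 "ls /home/mmcc/hbase-1.3.1"    # should show the same bin, conf, lib layout as on the master
ssh slave2 "ls /home/mmcc/hbase-1.3.1"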

Next, add HBase's bin directory to the PATH environment variable on each node:

vi /etc/profile

export HBASE_HOME=/home/mmcc/hbase-1.3.1
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$HBASE_HOME/bin:$PATH:.

source /etc/profile    # make the environment variables take effect
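To confirm that the new PATH is active, you can ask HBase for its version from any directory; this is just a quick sanity check, not one of the required steps:

echo $HBASE_HOME    # should print /home/mmcc/hbase-1.3.1
hbase version       # should report HBase 1.3.1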

Start the cluster

start-hbase.sh    # the Hadoop cluster services must already be running

This starts the built-in ZooKeeper instances, the master service, and the region server processes on the worker nodes. Once everything is up, check the running services with jps.
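Roughly speaking, you should see processes like the ones below; the exact set of Hadoop daemons depends on how your Hadoop cluster is laid out:

jps    # on the master node: HMaster and HQuorumPeer, plus Hadoop daemons such as NameNode and ResourceManager
jps    # on slave1/slave2: HRegionServer and HQuorumPeer, plus DataNode and NodeManager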

If these processes are present, the cluster has started successfully.
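As an extra check, the status command built into the HBase shell reports the active master and the number of live region servers:

hbase shell
status    # with this setup it should report one active master and two live region servers, none dead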

Since we are currently using the built-in ZooKeeper that HBase manages itself as the coordination service, the startup steps are as follows:

  1. Start the Hadoop cluster: start-all.sh
  2. Start the HBase cluster: start-hbase.sh

Shutdown steps

  1. Stop the HBase cluster: stop-hbase.sh
  2. Stop the Hadoop cluster: stop-all.sh

Custom ZooKeeper

To use your own ZooKeeper cluster instead, set export HBASE_MANAGES_ZK=false in hbase-env.sh. If your ZooKeeper cluster is configured with a port other than the default 2181, the ports must be specified explicitly in hbase-site.xml:

<property>
  <!-- the ZooKeeper cluster nodes, with explicit ports -->
  <name>hbase.zookeeper.quorum</name>
  <value>master:port,slave1:port,slave2:port</value>
</property>

Startup sequence when using a custom ZooKeeper cluster (a command sketch follows the list):

  1. Start the Hadoop cluster
  2. Start the ZooKeeper (ZK) cluster
  3. Start the HBase cluster
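A sketch of the corresponding commands, assuming a standard ZooKeeper installation with zkServer.sh on the PATH of each ZooKeeper node:

start-all.sh         # 1. on the master node: start the Hadoop cluster
zkServer.sh start    # 2. on every ZooKeeper node: start the ZooKeeper service
start-hbase.sh       # 3. on the master node: start the HBase cluster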

Shutdown steps

  1. Stop the HBase cluster
  2. Stop the ZooKeeper (ZK) cluster
  3. Stop the Hadoop cluster

The setup and use of a ZK (ZooKeeper) cluster will be covered in a later section.