Preface

As a Java engineer, I use HBase in my work, so I compiled this HBase usage manual after reading the book HBase Doesn’t Sleep.

Getting to Know HBase

Why HBase

HBase’s complex storage structure and distributed design come at a cost: it is not fast at handling small amounts of data. HBase is never especially fast, but it also does not slow down noticeably when the data volume is large. HBase is a good fit in the following situations:

  • A single table holds more than tens of millions of rows, and concurrency is high.
  • Data analysis requirements are light, or do not need to be very flexible or real-time.

HBase Basic Architecture



HBase has two kinds of servers: the Master and the RegionServer. The Master only maintains table structure information; the actual data is stored on the RegionServers. ZooKeeper keeps track of all RegionServer information for HBase. A client first contacts ZooKeeper to find out where the data lives and then connects to the RegionServer directly.

What is a Region

A Region is a collection of rows. A Region cannot span servers. One RegionServer hosts one or more Regions.
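
Each Region holds a contiguous range of row keys, so finding the Region for a row comes down to a range lookup. The following is a minimal plain-Java sketch of that idea, not the HBase client code. The class ToyRegionDirectory and the region names are made up for illustration, and it compares String keys where HBase compares raw bytes.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of range-based region lookup; not the HBase client.
class ToyRegionDirectory {
    // start row key (inclusive) -> region name; a region ends where the next one starts
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // Finds the region whose start key is the greatest one that is <= rowKey.
    String locate(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        ToyRegionDirectory directory = new ToyRegionDirectory();
        directory.addRegion("", "region-1");   // the first region starts at the empty key
        directory.addRegion("m", "region-2");  // covers "m" up to the end of the table
        System.out.println(directory.locate("apple")); // prints region-1
        System.out.println(directory.locate("m"));     // prints region-2
        System.out.println(directory.locate("zebra")); // prints region-2
    }
}
```

Keeping the start keys in a sorted map makes the lookup a single floor search, which is why a sorted row key space is convenient for sharding.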

What is a RegionServer

A RegionServer is a container that holds Regions.

What is the Master

The Master is only responsible for coordination tasks such as creating tables, deleting tables, moving Regions, and merging Regions.

Data Model

  • Namespace

Namespaces are optional and are only needed when several tables should be grouped and managed together. Beginners rarely use them, and they matter little when the database has only a few tables.

  • Table

A table consists of one or more column families. Data attributes such as time to live (TTL) and the compression algorithm are defined on the column family. Once the column families are defined, the table exists but is empty; it holds no data until rows are added.

  • Row

A row contains multiple columns, grouped by column family. A column can only belong to a column family that is defined on the table; writing to a column family that the table does not define raises a NoSuchColumnFamilyException. Because HBase stores each column family separately, the data of one row can be spread across different storage files (and, through HDFS, across different machines).

  • Column Family

A column family is a collection of columns. Strictly speaking, a column-oriented database only needs columns, so why have column families? HBase stores the columns of the same column family together, which improves access performance and lets related columns be managed as a batch. All data attributes are defined on the column family. In HBase, a table defines column families, not columns, which makes the column family one of the most important concepts in HBase.

  • Column Qualifier

Multiple columns make up a row. A column is usually written together with its column family as Column Family:Column Qualifier. Columns can be created freely at write time: a row has no fixed column names or number of columns, only a fixed set of column families.

  • Cell

A single column can store multiple versions of its data. Each version is called a Cell, so a Cell in HBase is not the same thing as a cell in a traditional relational database. HBase data is finer-grained than traditional data structures: the data at one location is divided into multiple versions.

  • Timestamp (version number)

It can be called either a timestamp or a version number, because it identifies the versions of the cells in the same column. If you do not specify a version number, the system uses the current timestamp as the version number. If you set the number yourself, the timestamp is purely a version number.
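
One way to picture the data model above is as nested sorted maps: row key, then column (family:qualifier), then timestamp, then value. The following is a minimal plain-Java sketch of that picture, not HBase code. The class ToyTable and its methods are made up for illustration, and it keeps strings where HBase keeps byte arrays.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical toy model of the HBase logical data model; not the HBase API.
class ToyTable {
    // row key -> "family:qualifier" -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Stores one cell: a value at (row, column, timestamp).
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a column, or null if the column is absent.
    String get(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null || !columns.containsKey(column)) {
            return null;
        }
        return columns.get(column).firstEntry().getValue();
    }

    // Returns how many versions (cells) the column currently holds.
    int versionCount(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null || !columns.containsKey(column)) {
            return 0;
        }
        return columns.get(column).size();
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.put("row1", "cf:name", 1L, "old");
        table.put("row1", "cf:name", 2L, "new");
        System.out.println(table.get("row1", "cf:name"));          // prints new
        System.out.println(table.versionCount("row1", "cf:name")); // prints 2
    }
}
```

Sorting timestamps in descending order means a read without an explicit version naturally picks the newest cell.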


How HBase Stores Data

RegionServer

  • WAL

WAL is short for Write-Ahead Log. When an operation reaches a Region, HBase first writes the operation to the WAL. HBase then puts the data into the in-memory MemStore and flushes it to an HFile only once it reaches a certain size. If the server crashed or lost power before the flush, the data held only in memory would be lost. The WAL is the insurance against this: because data is written to the WAL before it is written to the MemStore, it can be recovered from the WAL during failover.

  • Region

A Region is equivalent to a data shard. Each Region has a start row key and an end row key, which define the range of rows it stores.

  • Store

Each Region contains one or more Store instances. Each Store holds the data of one column family, so a table with two column families has two Stores in each Region. A Store has two components: a MemStore and HFiles.

  • MemStore

Each Store has one MemStore instance. After data is written to the WAL, it is put into the MemStore. The MemStore is an in-memory storage object; data is flushed into an HFile only when the MemStore is full.

  • HFile

A Store contains multiple HFiles. When the MemStore is full, HBase creates a new HFile on HDFS and writes the MemStore contents into it. The HFile is the storage entity that deals with HDFS directly.
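
The write path described above (WAL first, then MemStore, then a flush to a new HFile) can be sketched as a toy simulation. This is plain Java and heavily simplified, not HBase code: the class ToyStore is made up, the flush threshold counts entries instead of bytes, and clearing the log on flush is an assumption of the sketch rather than a description of how HBase manages its WAL files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy model of the write path (WAL -> MemStore -> HFile); not HBase code.
class ToyStore {
    private final int flushThreshold;                               // entry count, stands in for a size limit
    final List<String[]> wal = new ArrayList<>();                   // append-only log of {rowKey, value}
    TreeMap<String, String> memStore = new TreeMap<>();             // sorted in-memory buffer
    final List<TreeMap<String, String>> hFiles = new ArrayList<>(); // flushed, sorted files

    ToyStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String rowKey, String value) {
        wal.add(new String[] {rowKey, value});  // 1. log the edit first
        memStore.put(rowKey, value);            // 2. then apply it in memory
        if (memStore.size() >= flushThreshold) {
            flush();                            // 3. flush when the MemStore is full
        }
    }

    private void flush() {
        hFiles.add(memStore);        // the full MemStore becomes a new file
        memStore = new TreeMap<>();  // start a fresh, empty MemStore
        wal.clear();                 // flushed edits no longer need the log
    }

    // Simulates a crash: memory is lost, then unflushed edits are replayed from the WAL.
    void crashAndRecover() {
        memStore = new TreeMap<>();
        for (String[] edit : wal) {
            memStore.put(edit[0], edit[1]);
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore(2);
        store.put("row1", "a");
        store.put("row2", "b");  // reaches the threshold and triggers a flush
        store.put("row3", "c");  // stays in the MemStore
        store.crashAndRecover();
        System.out.println(store.hFiles.size()); // prints 1
        System.out.println(store.memStore);      // prints {row3=c}
    }
}
```

The point of the ordering is visible in crashAndRecover: only edits that were logged before being applied in memory can be rebuilt after the memory is lost.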


Common HBase Commands

list
create 'table', 'cf'
describe 'table'
put 'table', 'rowKey', 'cf:column', 'value'
scan 'table'
alter 'table', {NAME => 'cf', VERSIONS => 5}
scan 'table', {STARTROW => 'rowKey1', STOPROW => 'rowKey3', LIMIT => 2}
get 'table', 'rowKey', 'cf:name'
get 'table', 'rowKey', {COLUMN => 'cf:column', VERSIONS => 5}
delete 'table', 'rowKey', 'cf:name'
delete 'table', 'rowKey', 'cf:name', 2      # logically delete all data before this version number
scan 'table', {RAW => true, VERSIONS => 5}  # query including the logically deleted data
deleteall 'table', 'rowKey'
disable 'table'
drop 'table'
append 'table', 'rowKey', 'cf:name', 'valueAppend', ATTRIBUTES => {'kid' => 'yes'}
status

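HBase Java Client API

Create a table

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Load the HBase configuration (hbase-site.xml on the classpath)
Configuration config = HBaseConfiguration.create();
// Create a connection
Connection connection = ConnectionFactory.createConnection(config);
// Get the admin object
Admin admin = connection.getAdmin();
// Define the table name
TableName tableName = TableName.valueOf("table");
// Define the table (HTableDescriptor and HColumnDescriptor are deprecated in HBase 2.x)
HTableDescriptor table = new HTableDescriptor(tableName);
// Define a column family
HColumnDescriptor cf = new HColumnDescriptor("cf");
table.addFamily(cf);
// Perform the create table action
admin.createTable(table);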

Insert data

 Table table = connection.getTable(TableName.valueOf("table"));
 Put put = new Put(Bytes.toBytes("rowKey"));
 put.addColumn(Bytes.toBytes("cf"),Bytes.toBytes("column"), Bytes.toBytes("value"));
 table.put(put);

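checkAndPut

A compare-and-set (CAS) operation: the Put is applied only if the current value of the checked column equals the expected value, and the check and the write happen atomically.

Append

Appends data to the end of an existing value.

Increment

Atomically adds to a numeric value, so concurrent counters do not lose updates.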

Query data

// A Get fetches a single row and can only look it up by row key
Get get = new Get(Bytes.toBytes("rowKey"));
Result result = table.get(get);

Exists

// Check whether the data exists without transferring it to the client
boolean exists = table.exists(new Get(Bytes.toBytes("rowKey")));

Delete data

Delete delete = new Delete(Bytes.toBytes("rowKey")); // With nothing else specified, deletes all data under the row key
delete.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("column")); // Optionally narrow the delete to one column for finer granularity
table.delete(delete);

Batch operations

// Batch methods of the Table interface
void batch(List<? extends Row> actions, Object[] results);
void put(List<Put> puts);
Result[] get(List<Get> gets);
void delete(List<Delete> deletes);

HBase Java Client Advanced API

Comparators

CompareFilter.CompareOp (comparison operators)

  • LESS: less than
  • LESS_OR_EQUAL: less than or equal to
  • EQUAL: equal to
  • NOT_EQUAL: not equal to
  • GREATER_OR_EQUAL: greater than or equal to
  • GREATER: greater than
  • NO_OP: no operation

SubstringComparator

Checks whether the target string contains the specified substring, similar to MySQL’s LIKE '%value%'.

BinaryComparator

Compares complete byte arrays and is used together with a CompareOp, similar to MySQL’s =, > and <. When comparing numbers, make sure the values are stored as the numbers’ byte arrays rather than as strings.
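
The advice about numbers follows from how byte arrays are ordered: byte by byte, from the left. The sketch below shows the effect in plain Java without any HBase classes. The class ByteOrderDemo and its helpers are made up for illustration; the 8-byte big-endian encoding is meant to mirror what Bytes.toBytes(long) produces.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrates lexicographic byte comparison in plain Java; not the HBase comparator classes.
class ByteOrderDemo {
    // Encodes a long as 8 big-endian bytes.
    static byte[] longBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Encodes the decimal text of a number as UTF-8 bytes.
    static byte[] textBytes(long value) {
        return Long.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Compares two arrays byte by byte, treating each byte as unsigned.
    static int compareBytes(byte[] left, byte[] right) {
        int common = Math.min(left.length, right.length);
        for (int i = 0; i < common; i++) {
            int diff = (left[i] & 0xff) - (right[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return left.length - right.length;
    }

    public static void main(String[] args) {
        // As text, "10" sorts before "9" because the character '1' is smaller than '9'.
        System.out.println(compareBytes(textBytes(10), textBytes(9)) < 0); // prints true
        // As fixed-width big-endian longs, 10 sorts after 9 as expected.
        System.out.println(compareBytes(longBytes(10), longBytes(9)) > 0); // prints true
    }
}
```

Note that this ordering only matches numeric order for non-negative values: in unsigned byte order a negative long sorts after every positive one.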

LongComparator

A numeric comparator for long values.

Filters

Value filter (ValueFilter)

No column needs to be specified. It checks the values in every column and keeps the data whose value matches.

Filter filter = new ValueFilter(CompareFilter.CompareOp.EQUAL, new SubstringComparator("value"));

Single-column value filter (SingleColumnValueFilter)

Compares the value of one specified column.

// select * from tableName where `cf:column` like '%value%';
Filter filter = new SingleColumnValueFilter(Bytes.toBytes("cf"), Bytes.toBytes("column"), CompareFilter.CompareOp.EQUAL, new SubstringComparator("value"));

Drawback of the single-column value filter

When a row does not contain the column being compared, the single-column value filter puts the entire row into the result set anyway. To use the filter safely, make sure that every row contains the column to be compared. If you cannot guarantee that, use one of the following approaches:

  1. While iterating over the result set, check again whether each result contains the compared column, and skip the record if it does not.
  2. Use a filter list that combines a FamilyFilter, a QualifierFilter, and a ValueFilter.
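
The pitfall and the first workaround can be simulated in a few lines. This is a plain-Java sketch of the behavior described above, not the HBase filter classes: MissingColumnDemo, passesFilter, and safeMatches are hypothetical names, and rows are modeled as simple maps. In real code it is also worth checking SingleColumnValueFilter’s setFilterIfMissing(true), which tells the filter to drop rows that lack the column.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the missing-column pitfall; not the HBase filter classes.
class MissingColumnDemo {
    // Mimics the described behavior: a row passes if the column is absent,
    // or if the column is present and its value contains the substring.
    static boolean passesFilter(Map<String, String> row, String column, String substring) {
        String value = row.get(column);
        return value == null || value.contains(substring);
    }

    // Workaround 1: re-check on the client that the compared column is really there.
    static List<String> safeMatches(Map<String, Map<String, String>> table,
                                    String column, String substring) {
        List<String> rowKeys = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : table.entrySet()) {
            Map<String, String> row = entry.getValue();
            if (passesFilter(row, column, substring) && row.containsKey(column)) {
                rowKeys.add(entry.getKey());
            }
        }
        return rowKeys;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = new LinkedHashMap<>();
        table.put("row1", Collections.singletonMap("cf:column", "some value"));
        table.put("row2", Collections.singletonMap("cf:other", "unrelated"));
        table.put("row3", Collections.singletonMap("cf:column", "nothing"));

        // row2 lacks cf:column, yet the filter alone lets it through.
        System.out.println(passesFilter(table.get("row2"), "cf:column", "value")); // prints true
        // With the extra client-side check only row1 remains.
        System.out.println(safeMatches(table, "cf:column", "value"));              // prints [row1]
    }
}
```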

Filter list (FilterList)

// Create a filter list in which every filter must pass
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);

// Only data in the cf column family is added to the result set
Filter familyFilter = new FamilyFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("cf")));
filterList.addFilter(familyFilter);

// Only data in the column named column is added to the result set
Filter colFilter = new QualifierFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("column")));
filterList.addFilter(colFilter);

// Only data whose value contains "value" is added to the result set
Filter valueFilter = new ValueFilter(CompareFilter.CompareOp.EQUAL, new SubstringComparator("value"));
filterList.addFilter(valueFilter);

Implementing AND and OR relationships with filter lists

Use the FilterList.Operator enumeration:

  • MUST_PASS_ALL: every filter must pass (AND)
  • MUST_PASS_ONE: at least one filter must pass (OR)

// operator is a FilterList.Operator, rowFilters is a List<Filter>
FilterList filterList = new FilterList(operator, rowFilters);
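
As an analogy for the two operators, the following plain-Java sketch combines ordinary predicates the same way. It is not the HBase FilterList: the class ToyFilterList, its Operator enum, and the combine method are made up to show the AND and OR semantics only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical plain-Java analogy for the FilterList operators; not the HBase classes.
class ToyFilterList {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    // Combines filters: MUST_PASS_ALL behaves like AND, MUST_PASS_ONE like OR.
    static <T> Predicate<T> combine(Operator operator, List<Predicate<T>> filters) {
        return item -> operator == Operator.MUST_PASS_ALL
                ? filters.stream().allMatch(filter -> filter.test(item))
                : filters.stream().anyMatch(filter -> filter.test(item));
    }

    public static void main(String[] args) {
        Predicate<String> startsWithRow = s -> s.startsWith("row");
        Predicate<String> endsWithOne = s -> s.endsWith("1");
        List<Predicate<String>> filters = Arrays.asList(startsWithRow, endsWithOne);

        System.out.println(combine(Operator.MUST_PASS_ALL, filters).test("row2")); // prints false
        System.out.println(combine(Operator.MUST_PASS_ONE, filters).test("row2")); // prints true
    }
}
```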

Paging filter (PageFilter)

The paging filter only limits how many rows are returned. It does not turn pages by itself, so paging has to be implemented in business code.

// select * from tableName limit 5;
Filter filter = new PageFilter(5);
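
One common way to implement paging in business code is to remember the last row key of the current page and start the next scan just after it. The sketch below shows that idea over a sorted in-memory map. It is plain Java rather than the HBase API, and ToyPager and fetchPage are hypothetical names; in real code the map would be a scan with a page-size limit and the remembered row key as its exclusive start row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of client-side paging over sorted row keys; not the HBase API.
class ToyPager {
    // Returns up to pageSize row keys that come strictly after lastRowKey (null means first page).
    static List<String> fetchPage(NavigableMap<String, String> table, String lastRowKey, int pageSize) {
        NavigableMap<String, String> remaining =
                lastRowKey == null ? table : table.tailMap(lastRowKey, false);
        List<String> page = new ArrayList<>();
        for (String rowKey : remaining.keySet()) {
            if (page.size() == pageSize) {
                break;  // plays the role of the page-size limit
            }
            page.add(rowKey);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= 5; i++) {
            table.put("row" + i, "value" + i);
        }
        List<String> first = fetchPage(table, null, 2);
        List<String> second = fetchPage(table, first.get(first.size() - 1), 2);
        System.out.println(first);   // prints [row1, row2]
        System.out.println(second);  // prints [row3, row4]
    }
}
```

This style only supports moving forward page by page; jumping to an arbitrary page number would need the row key at which that page starts.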

Row filter (RowFilter)

// select * from tableName where rowKey > 'rowKey';
Filter filter = new RowFilter(CompareFilter.CompareOp.GREATER, new BinaryComparator(Bytes.toBytes("rowKey")));

Prefix filter (PrefixFilter)

// select * from tableName where rowKey like 'rowKey%';
PrefixFilter prefixFilter = new PrefixFilter(Bytes.toBytes("rowKey"));

Column family filter (FamilyFilter)

Keeps only the data in the specified column family.

Filter filter = new FamilyFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("cf")));