The HBase Doesn't Sleep Book is an HBase technical book written so that people do not fall asleep while reading it, and it is very good. To deepen my own memory, I decided to organize the important parts of the book into reading notes for later reference, and I hope they can also be of some help to readers who are just starting to learn HBase.

Table of contents

  • Chapter 1 – Getting to know HBase
  • Chapter 2 – Get HBase Running
  • Chapter 3 – HBase Basic Operations
  • Chapter 4 – Getting started with the Client API
  • Chapter 5 – HBase Internal Exploration
  • Chapter 6 – Advanced usage of the client API
  • Chapter 7 – Client API management capabilities
  • Chapter 8 – Faster
  • Chapter 9 – When HBase Meets MapReduce

1. Best practices

1. Create an HBase connection

Note the following when creating HBase connections:

  • Copy the HBase configuration file hbase-site.xml and the Hadoop configuration file core-site.xml from the server into the project's resources folder;
  • If creating the connection fails, check whether you forgot to add the servers' IP-to-hostname mappings to the hosts file on your local machine, which would prevent your machine from resolving those servers;
  • Close resources when the operation is done; the JDK 7 try-with-resources feature is recommended;
  • In general, it is recommended to use Configuration as a singleton, to create a Connection when needed, and to close the Connection when you are finished with it.

It is not recommended to make the Connection a singleton, because then a single stuck operation would block all subsequent operations and prevent them from running concurrently.

public static void main(String[] args) throws URISyntaxException, IOException {
    // Load the configuration files
    Configuration config = HBaseConfiguration.create();
    config.addResource(new Path(ClassLoader.getSystemResource("hbase-site.xml").toURI()));
    config.addResource(new Path(ClassLoader.getSystemResource("core-site.xml").toURI()));
    // Create a Connection and an Admin, closed automatically by try-with-resources
    try (Connection connection = ConnectionFactory.createConnection(config);
         Admin admin = connection.getAdmin()) {
        // Define the table name
        TableName tableName = TableName.valueOf("tb1");
        // Define the column family
        ColumnFamilyDescriptor myCf = ColumnFamilyDescriptorBuilder.of("cf1");
        // Define the table and create it
        TableDescriptor table = TableDescriptorBuilder.newBuilder(tableName).setColumnFamily(myCf).build();
        admin.createTable(table);
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}
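As a minimal sketch of the "Configuration as a singleton, one Connection per use" pattern described above (the HBaseHelper class and getRow method are illustrative names, not from the book):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseHelper {
    // The Configuration is built once and shared
    private static final Configuration CONFIG = HBaseConfiguration.create();

    // A Connection is created for the operation and closed when the operation is finished
    public static Result getRow(String tableName, String rowKey) throws IOException {
        try (Connection connection = ConnectionFactory.createConnection(CONFIG);
             Table table = connection.getTable(TableName.valueOf(tableName))) {
            return table.get(new Get(Bytes.toBytes(rowKey)));
        }
    }
}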

2. HTable class and Table interface

Early tutorials teach you to use the HTable class, which does not require you to obtain a Connection manually: you just pass a Configuration object to the HTable constructor, and it connects and performs the operation automatically. This approach looks simple, but it has many performance and safety problems, so the class has been deprecated.

It is recommended that you manually obtain Connection and then obtain the Table interface from Connection:

// Not recommended: the deprecated HTable constructor
HTable table = new HTable(config, "mytable");
// Recommended: obtain the Table interface from a Connection instead
try (Connection connection = ConnectionFactory.createConnection(config)) {
    connection.getTable(TableName.valueOf("tb1"));
}

2. Several important methods

1. checkAndPut (data consistency)

The checkAndPut method solves the data inconsistency problem that arises when someone else modifies the data between your read and your write.

The checkAndPut() method combines the check and the put into a single step: it compares the existing value with the one you pass in, and if they match it performs the put and returns true; if they do not match it returns false and writes nothing.
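A minimal sketch of this call, assuming a Table named table for tb1 with column family cf1 (the row key, qualifier, and values here are illustrative):

// Write the new name only if cf1:name still holds "old-name"
Put put = new Put(Bytes.toBytes("row1"));
put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("name"), Bytes.toBytes("new-name"));
boolean ok = table.checkAndPut(Bytes.toBytes("row1"), Bytes.toBytes("cf1"),
        Bytes.toBytes("name"), Bytes.toBytes("old-name"), put);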

checkAndPut is deprecated in the latest versions, and the official recommendation is to use checkAndMutate instead.

2. checkAndMutate

The recommended entry point is Table.checkAndMutate(byte[] row, byte[] family). checkAndMutate checks whether the value at the given row/family/qualifier meets the expected condition before performing a Put/Delete/RowMutations operation, and does not perform the operation if the condition is not met.

table.checkAndMutate(row, family).qualifier(qualifier).ifNotExists().thenPut(put);
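A slightly fuller sketch of the builder, again assuming a Table named table with column family cf1 (the row key, qualifier, and value are illustrative):

// Delete the row only if cf1:status currently equals "obsolete"
Delete delete = new Delete(Bytes.toBytes("row1"));
boolean deleted = table.checkAndMutate(Bytes.toBytes("row1"), Bytes.toBytes("cf1"))
        .qualifier(Bytes.toBytes("status"))
        .ifEquals(Bytes.toBytes("obsolete"))
        .thenDelete(delete);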

3. increment

Adds N (N can be positive or negative) to a column in the database, with atomicity guaranteed.

Table table = connection.getTable(TableName.valueOf("tb1"));
Increment inc = new Increment(Bytes.toBytes("row1"));
inc.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("age"), 1L);
table.increment(inc);

4. Batch operation

The batch method performs Put, Get, and Delete operations in a single call, which is convenient and improves performance. The actions list can contain Put, Get, and Delete objects. The second parameter, results, holds the results of the operations, in the same order as the actions list passed in.

void batch(List<Row> actions, Object[] results)

It is best not to place a Put and a Delete for the same cell in the same actions list, because HBase does not necessarily execute them in order and you may get unexpected results.
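A minimal sketch of a mixed batch call, assuming a Table named table with column family cf1 (row keys and values are illustrative):

List<Row> actions = new ArrayList<>();
actions.add(new Get(Bytes.toBytes("row1")));
Put put = new Put(Bytes.toBytes("row2"));
put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
actions.add(put);
actions.add(new Delete(Bytes.toBytes("row3")));
Object[] results = new Object[actions.size()];
try {
    table.batch(actions, results);
} catch (InterruptedException | IOException e) {
    e.printStackTrace();
}
// results[i] corresponds to actions.get(i): a Result for the Get,
// and an empty Result for a Put/Delete that succeeded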

5. Batch PUT operations

HBase provides a batch put operation: void put(List<Put> puts). Internally it is implemented on top of batch as well. Note that if part of the data is inserted successfully and another part fails (for example, because a RegionServer is faulty), an IOException is thrown and the operation aborts, but the data that was inserted successfully is not rolled back.
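A minimal sketch of a batch put, assuming a Table named table with column family cf1 (row keys and values are illustrative):

List<Put> puts = new ArrayList<>();
for (int i = 1; i <= 3; i++) {
    Put put = new Put(Bytes.toBytes("row" + i));
    put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("age"), Bytes.toBytes(String.valueOf(20 + i)));
    puts.add(put);
}
try {
    table.put(puts);
} catch (IOException e) {
    // An exception here does not roll back the puts that already succeeded
    e.printStackTrace();
}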

Retrying failed inserts

For data that failed to insert, the client retries the insert, possibly against a different RegionServer. When the number of retries exceeds the configured maximum, a RetriesExhaustedWithDetailsException is thrown. The exception carries a lot of failure information, including how many operations failed, why they failed, the server names involved, and the number of retries.
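A minimal sketch of inspecting that exception, assuming a puts list and table as in the batch put sketch above:

try {
    table.put(puts);
} catch (RetriesExhaustedWithDetailsException e) {
    // Walk through the per-operation failure details carried by the exception
    for (int i = 0; i < e.getNumExceptions(); i++) {
        System.out.println("Failed row:   " + e.getRow(i));
        System.out.println("Failed cause: " + e.getCause(i));
        System.out.println("Server:       " + e.getHostnamePort(i));
    }
} catch (IOException e) {
    e.printStackTrace();
}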

If the failure is caused by a nonexistent column family, the client retries only once, because if the column family is wrong there is no point in continuing to retry; HBase can return a NoSuchColumnFamilyException directly.

Write buffer

Data that failed to be inserted is kept in the local write buffer and will be retried on the next insert; you can even manipulate the buffer yourself, for example to clear it.

6. BufferedMutator

The client write buffer is a caching mechanism in the client JVM: it aggregates multiple Put operations and sends them to the server in a single RPC request, saving the I/O cost of repeated network round trips. This buffer used to be enabled by calling setAutoFlush(false) on the table.

setAutoFlush is deprecated in the latest versions of the API, as is the write buffer attached to each table, but the client-side write buffer itself still exists; it is now exposed through the BufferedMutator object instead.

BufferedMutator bm = connection.getBufferedMutator(TableName.valueOf("tb1"));
// Submit Put operations through the BufferedMutator
bm.mutate(put);
// Call flush (or close) to send the buffered requests to the server
bm.flush();
bm.close();
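If you do need the buffer, its size can be configured through BufferedMutatorParams; a minimal sketch, where the 4 MB value is just an example:

BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("tb1"))
        .writeBufferSize(4 * 1024 * 1024); // flush automatically once about 4 MB of mutations accumulate
try (BufferedMutator bm = connection.getBufferedMutator(params)) {
    Put put = new Put(Bytes.toBytes("row1"));
    put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("age"), Bytes.toBytes("30"));
    bm.mutate(put);
} // close() flushes anything still left in the buffer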

In most cases we don’t need to call BufferedMutator directly, nor is it recommended.

7. Scan cache

In early versions of HBase, scan caching was disabled by default. After much practical experience, it is enabled by default in modern HBase.

Specifically, every next() call issues a complete RPC request, and how many rows that RPC request fetches is configured by the hbase.client.scanner.caching parameter in hbase-site.xml. For example, if you set it to 1, traversing 10 results sends 10 requests, which is obviously a waste of performance, especially when each row is small.

You can change the number of cached rows at the table level or at the scan level. To change it at the table level, write the configuration into hbase-site.xml:

<property>
    <name>hbase.client.scanner.caching</name>
    <value>200</value>
</property>

This means that each next() operation fetches 200 rows. The default value is 100.

You can use Scan.setCaching(int caching) to modify caching at the scan level; this setting takes priority over the one in the configuration file. Caching is great, but it consumes a lot of memory, and in the worst case you get an OutOfMemoryError, so don't blindly increase the cache.
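A minimal sketch of setting caching at the scan level, assuming a Table named table (the value 500 is just an example):

Scan scan = new Scan();
scan.setCaching(500); // each next() RPC fetches up to 500 rows, overriding hbase.client.scanner.caching
try (ResultScanner scanner = table.getScanner(scan)) {
    for (Result result : scanner) {
        System.out.println(Bytes.toString(result.getRow()));
    }
}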

