Our HBase 2 cluster suffered a failure some time ago that took a whole weekend to repair, so I set out to understand and catalogue the common HBase failure modes. The incident itself is described in the previous article: "A thrilling experience of an HBase cluster failure".

1. HBCK consistency

Region consistency means that three sources of region information agree with one another: the region entries in the hbase:meta table, the regions actually online on the RegionServers, and the .regioninfo files in HDFS.
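As a rough illustration of that three-way comparison, here is a minimal Python sketch using plain sets. The function name, the category names, and the region names are made up for the example; it is not how hbck is implemented, only a way to picture what "consistent" means.

```python
from typing import Dict, List, Set


def compare_region_views(
    meta: Set[str], deployed: Set[str], hdfs: Set[str]
) -> Dict[str, List[str]]:
    """Compare the three views of region information and group the differences.

    meta     -- region names listed in hbase:meta
    deployed -- region names currently online on some RegionServer
    hdfs     -- region names that have a region directory in HDFS
    """
    return {
        # Present in all three views: nothing to repair.
        "consistent": sorted(meta & deployed & hdfs),
        # Data on disk but no row in hbase:meta.
        "in_hdfs_not_in_meta": sorted(hdfs - meta),
        # Row in hbase:meta but no directory on disk.
        "in_meta_not_in_hdfs": sorted(meta - hdfs),
        # Known to meta and HDFS but not served by any RegionServer.
        "in_meta_and_hdfs_not_deployed": sorted((meta & hdfs) - deployed),
        # Served by a RegionServer but unknown to hbase:meta.
        "deployed_not_in_meta": sorted(deployed - meta),
    }


if __name__ == "__main__":
    # Hypothetical region names, for illustration only.
    report = compare_region_views(
        meta={"r1", "r2", "r3"},
        deployed={"r1", "r4"},
        hdfs={"r1", "r2", "r5"},
    )
    for category, regions in report.items():
        print(category, regions)
```

Only the regions in the "consistent" group need no attention; every other group corresponds to one of the inconsistencies discussed later in this article.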

2. HBCK2 and HBCK1

HBCK2 is the successor to hbck, the repair tool that shipped with hbase-1.x (a.k.a. hbck1). Use HBCK2 in place of hbck1 when repairing hbase-2.x clusters. hbck1 should not be run against an hbase-2.x install; it may do damage. hbck1 is still bundled with hbase-2.x, to minimize surprise, but it is deprecated and will be removed in hbase-3.x. Its write facility (-fix) has been removed. It can still report on the state of an hbase-2.x cluster, but its assessments will be inaccurate because it does not understand the internal workings of hbase-2.x.

Our HBase version is 2.0.0-cdh6.0.1, and the output of hbase hbck -h begins with:

-----------------------------------------------------------------------
NOTE: As of HBase version 2.0, the hbck tool is significantly changed.
In general, all Read-Only options are supported and can be be used
safely. Most -fix/ -repair options are NOT supported. Please see usage
below for details on which options are not supported.
-----------------------------------------------------------------------

HBase 2.0+ no longer supports hbck1 for repairs: many of the read-only commands still run, but the repair commands cannot be executed at all. The replacement, HBCK2, has to be downloaded as source from the official project and compiled by users themselves; it would be better if the HBase team shipped it integrated into the hbase shell commands. This matters because more and more companies will upgrade from 1.x to 2.x.

NOTE: Following options are NOT supported as of HBase version 2.0+.


  UNSUPPORTED Metadata Repair options: (expert features, use with caution!)
   -fix              Try to fix region assignments.  This is for backwards compatiblity
   -fixAssignments   Try to fix region assignments.  Replaces the old -fix
   -fixMeta          Try to fix meta problems.  This assumes HDFS region info is good.
   -fixHdfsHoles     Try to fix region holes in hdfs.
   -fixHdfsOrphans   Try to fix region dirs with no .regioninfo file in hdfs
   -fixTableOrphans  Try to fix table dirs with no .tableinfo file in hdfs (online mode only)
   -fixHdfsOverlaps  Try to fix region overlaps in hdfs.
   -maxMerge <n>     When fixing region overlaps, allow at most <n> regions to merge. (n=5 by default)
   -sidelineBigOverlaps  When fixing region overlaps, allow to sideline big overlaps
   -maxOverlapsToSideline <n>  When fixing region overlaps, allow at most <n> regions to sideline per group. (n=2 by default)
   -fixSplitParents  Try to force offline split parents to be online.
   -removeParents    Try to offline and sideline lingering parents and keep daughter regions.
   -fixEmptyMetaCells  Try to fix hbase:meta entries not referencing any region (empty REGIONINFO_QUALIFIER rows)

  UNSUPPORTED Metadata Repair shortcuts
   -repair           Shortcut for -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans -fixHdfsOverlaps -fixVersionFile -sidelineBigOverlaps -fixReferenceFiles -fixHFileLinks
   -repairHoles      Shortcut for -fixAssignments -fixMeta -fixHdfsHoles

In HBase 2, the hbck command does not support repairs. Use the HBCK2 command instead, which is described later.

3. Hbck consistency check and repair commands

Consistency check command

hbase hbck <-details> <table name>

Consistency repair command

hbase hbck <-fixMeta> <-fixAssignments> <table name>

Where:

-fixMeta: Try to fix meta problems. This assumes HDFS region info is good.

If a region exists in HDFS but not in meta, it is added to meta; if a region is listed in meta but does not exist in HDFS, its meta row is deleted.

-fixAssignments: Try to fix region assignments. Replaces the old -fix.

The action taken depends on the situation and may include taking a region offline, closing it, or bringing it online again.

4. Hbck abnormal location and repair

What kinds of region inconsistency can arise among meta, the RegionServers, and HDFS, and how is each one fixed? The following list lets you locate an inconsistency by the error hbck reports and find the matching repair option:

Each entry gives the inconsistency, the exception information hbck reports, and the repair option.

Case 1: the region is not in hbase:meta

Inconsistency: The region is in neither meta nor HDFS, but is deployed on a RegionServer.
Exception information:
errors.reportError(ERROR_CODE.NOT_IN_META_HDFS, "Region "
    + descriptiveName + ", key=" + key + ", not on HDFS or in hbase:meta but " +
    "deployed on " + Joiner.on(", ").join(hbi.deployedOn));
Repair: -fixAssignments

Inconsistency: The region is not in meta and is not deployed on any RegionServer, but its data exists in HDFS.
Exception information:
errors.reportError(ERROR_CODE.NOT_IN_META_OR_DEPLOYED, "Region "
    + descriptiveName + " on HDFS, but not listed in hbase:meta " +
    "or deployed on any region server");
Repair: -fixMeta, then -fixAssignments

Inconsistency: The region is not in meta, but it is deployed on a RegionServer and its data exists in HDFS.
Exception information:
errors.reportError(ERROR_CODE.NOT_IN_META, "Region " + descriptiveName
    + " not in META, but deployed on " + Joiner.on(", ").join(hbi.deployedOn));
Repair: 1. -fixMeta 2. -fixAssignments

Case 2: the region is in hbase:meta

Inconsistency: The region exists only in meta; it is neither in HDFS nor deployed on any RegionServer.
Exception information:
errors.reportError(ERROR_CODE.NOT_IN_HDFS_OR_DEPLOYED, "Region "
    + descriptiveName + " found in META, but not in HDFS "
    + "or deployed on any region server.");
Repair: -fixMeta

Inconsistency: The region is in meta and deployed on a RegionServer, but does not exist in HDFS.
Exception information:
errors.reportError(ERROR_CODE.NOT_IN_HDFS, "Region " + descriptiveName
    + " found in META, but not in HDFS, " +
    "and deployed on " + Joiner.on(", ").join(hbi.deployedOn));
Repair: 1. -fixAssignments 2. -fixMeta

Inconsistency: The region is in meta and in HDFS, and its table is not disabled, but the region is not deployed.
Exception information:
errors.reportError(ERROR_CODE.NOT_DEPLOYED, "Region " + descriptiveName
    + " not deployed on any region server.");
Repair: -fixAssignments

Inconsistency: The region's table is disabling or disabled, so the region should not be deployed, but it is deployed on a RegionServer.
Exception information:
errors.reportError(ERROR_CODE.SHOULD_NOT_BE_DEPLOYED,
    "Region " + descriptiveName + " should not be deployed according " +
    "to META, but is deployed on " + Joiner.on(", ").join(hbi.deployedOn));
Repair: -fixAssignments

Inconsistency: The region is assigned to more than one RegionServer.
Exception information:
errors.reportError(ERROR_CODE.MULTI_DEPLOYED, "Region " + descriptiveName
    + " is listed in hbase:meta on region server " + hbi.metaEntry.regionServer
    + " but is multiply assigned to region servers " +
    Joiner.on(", ").join(hbi.deployedOn));
Repair: -fixAssignments

Inconsistency: The RegionServer recorded for the region in meta differs from the RegionServer the region is actually deployed on.
Exception information:
errors.reportError(ERROR_CODE.SERVER_DOES_NOT_MATCH_META, "Region "
    + descriptiveName + " listed in hbase:meta on region server " +
    hbi.metaEntry.regionServer + " but found on region server " +
    hbi.deployedOn.get(0));
Repair: -fixAssignments

Inconsistency: The region exists in meta and HDFS as a split parent, but the information about its daughter regions is missing from meta.
Exception information:
errors.reportError(ERROR_CODE.LINGERING_SPLIT_PARENT, "Region "
    + descriptiveName + " is a split parent in META, in HDFS, "
    + "and not deployed on any region server. This could be transient, "
    + "consider to run the catalog janitor first!");
Repair: -fixSplitParents
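The cases above can be condensed into a small decision function. The following Python sketch is only a restatement of this list under simplifying assumptions: the function name, parameter names, and the order of the checks are mine, not hbck's, and the real tool considers more state than these few flags.

```python
from typing import List, Optional, Sequence, Tuple


def classify_region(
    in_meta: bool,
    in_hdfs: bool,
    deployed_on: Sequence[str],
    meta_server: Optional[str] = None,
    should_be_deployed: bool = True,
    split_parent: bool = False,
) -> Optional[Tuple[str, List[str]]]:
    """Return (error code, hbck1 repair options) for one region.

    Returns None when none of the listed inconsistencies applies.
    deployed_on        -- RegionServers the region is currently online on
    meta_server        -- RegionServer recorded for the region in hbase:meta
    should_be_deployed -- False when the table is disabling or disabled
    split_parent       -- True when meta marks the region as a split parent
    """
    deployed = len(deployed_on) > 0

    # Case 1: the region is not in hbase:meta.
    if not in_meta:
        if not in_hdfs and deployed:
            return ("NOT_IN_META_HDFS", ["-fixAssignments"])
        if in_hdfs and not deployed:
            return ("NOT_IN_META_OR_DEPLOYED", ["-fixMeta", "-fixAssignments"])
        if in_hdfs and deployed:
            return ("NOT_IN_META", ["-fixMeta", "-fixAssignments"])
        return None  # unknown to all three views: nothing to classify

    # Case 2: the region is in hbase:meta but missing from HDFS.
    if not in_hdfs:
        if deployed:
            return ("NOT_IN_HDFS", ["-fixAssignments", "-fixMeta"])
        return ("NOT_IN_HDFS_OR_DEPLOYED", ["-fixMeta"])

    # Case 2, continued: the region is in both hbase:meta and HDFS.
    if not deployed:
        if split_parent:
            return ("LINGERING_SPLIT_PARENT", ["-fixSplitParents"])
        if should_be_deployed:
            return ("NOT_DEPLOYED", ["-fixAssignments"])
        return None  # disabled table, region correctly offline
    if not should_be_deployed:
        return ("SHOULD_NOT_BE_DEPLOYED", ["-fixAssignments"])
    if len(deployed_on) > 1:
        return ("MULTI_DEPLOYED", ["-fixAssignments"])
    if deployed_on[0] != meta_server:
        return ("SERVER_DOES_NOT_MATCH_META", ["-fixAssignments"])
    return None  # all three views agree


if __name__ == "__main__":
    # Hypothetical server names, for illustration only.
    print(classify_region(in_meta=False, in_hdfs=True, deployed_on=[]))
    print(classify_region(True, True, ["rs1", "rs2"], meta_server="rs1"))
```

Keep in mind that the repair options returned here are the hbck1 options, which apply to HBase 1.x only; on HBase 2.x the HBCK2 commands must be used instead.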

5. HBCK2 commands

In HBase 2.x the old hbck command is no longer suitable: its write functionality (-fix) has been removed, and although it can still report on the state of an HBase 2.x cluster, its assessment will be inaccurate because it does not understand the cluster's internal workings. So if you run HBase 2.x, you should know your way around HBCK2, even if you rarely need it.

1. Obtain HBCK2

HBCK2 has been split out of HBase into a separate project. To use it, you need to build it from source against your own HBase version.

Its GitHub address is: Github.com/apache/hbas…

Set the HBase version in the POM to the 2.x version you actually run, then run the build from the root directory of the project:

mvn clean install -DskipTests

The build produces several JAR packages; the one HBCK2 needs is hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar.

2. Use HBCK2

The simplest way to satisfy HBCK2's dependencies is to launch it through the $HBASE_HOME/bin/hbase script. The bin/hbase script supports hbck natively (an hbck option is listed in its help output). By default, bin/hbase hbck runs the built-in hbck1 tool. To run HBCK2, point the script at the HBCK2 JAR you built by using the -j option:

${HBASE_HOME}/bin/hbase --config /etc/hbase-conf hbck -j ~/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar

Here /etc/hbase-conf is the directory that holds the deployed configuration. Running the command with no options or arguments dumps the HBCK2 help:

usage: HBCK2 [OPTIONS] COMMAND <ARGS>
Options:
 -d,--debug                                       run with debug output
 -h,--help                                        output this help message
 -p,--hbase.zookeeper.property.clientPort <arg>   port of hbase ensemble
 -q,--hbase.zookeeper.quorum <arg>                hbase ensemble
 -s,--skip                                        skip hbase version check
                                                  (PleaseHoldException)
 -v,--version                                     this hbck2 version
 -z,--zookeeper.znode.parent <arg>                parent znode of hbase
                                                  ensemble
Command:
 addFsRegionsMissingInMeta <NAMESPACE|NAMESPACE:TABLENAME>...
   Options:
    -d,--force_disable aborts fix for table if disable fails.
   To be used when regions missing from hbase:meta but directories
   are present still in HDFS. Can happen if user has run _hbck1_
   'OfflineMetaRepair' against an hbase-2.x cluster. Needs hbase:meta
   to be online. For each table name passed as parameter, performs diff
   between regions available in hbase:meta and region dirs on HDFS.
   Then for dirs with no hbase:meta matches, it reads the 'regioninfo'
   metadata file and re-creates given region in hbase:meta. Regions are
   re-created in 'CLOSED' state in the hbase:meta table, but not in the
   Masters' cache, and they are not assigned either. To get these
   regions online, run the HBCK2 'assigns' command printed when this
   command-run completes.
   NOTE: If using hbase releases older than 2.3.0, a rolling restart of
   HMasters is needed prior to executing the set of 'assigns' output.
   An example adding missing regions for tables 'tbl_1' in the default
   namespace, 'tbl_2' in namespace 'n1' and for all tables from
   namespace 'n2':
     $ HBCK2 addFsRegionsMissingInMeta default:tbl_1 n1:tbl_2 n2
   Returns HBCK2 an 'assigns' command with all re-inserted regions.
   SEE ALSO: reportMissingRegionsInMeta
   SEE ALSO: fixMeta

 assigns [OPTIONS] <ENCODED_REGIONNAME/INPUTFILES_FOR_REGIONNAMES>...
   Options:
    -o,--override  override ownership by another procedure
    -i,--inputFiles  take one or more encoded region names
   A 'raw' assign that can be used even during Master initialization (if
   the -skip flag is specified). Skirts Coprocessors. Pass one or more
   encoded region names. 1588230740 is the hard-coded name for the
   hbase:meta region and de00010733901a05f5a2a3a382e27dd4 is an example of
   what a user-space encoded region name looks like. For example:
     $ HBCK2 assigns 1588230740 de00010733901a05f5a2a3a382e27dd4
   Returns the pid(s) of the created AssignProcedure(s) or -1 if none.
   If -i or --inputFiles is specified, pass one or more input file names.
   Each file contains encoded region names, one per line. For example:
     $ HBCK2 assigns -i fileName1 fileName2

 bypass [OPTIONS] <PID>...
   Options:
    -o,--override   override if procedure is running/stuck
    -r,--recursive  bypass parent and its children. SLOW! EXPENSIVE!
    -w,--lockWait   milliseconds to wait before giving up; default=1
   Pass one (or more) procedure 'pid's to skip to procedure finish. Parent
   of bypassed procedure will also be skipped to the finish. Entities will
   be left in an inconsistent state and will require manual fixup. May
   need Master restart to clear locks still held. Bypass fails if
   procedure has children. Add 'recursive' if all you have is a parent pid
   to finish parent and children. This is SLOW, and dangerous so use
   selectively. Does not always work.

 extraRegionsInMeta <NAMESPACE|NAMESPACE:TABLENAME>...
   Options:
    -f, --fix    fix meta by removing all extra regions found.
   Reports regions present on hbase:meta, but with no related
   directories on the file system. Needs hbase:meta to be online.
   For each table name passed as parameter, performs diff
   between regions available in hbase:meta and region dirs on the given
   file system. Extra regions would get deleted from Meta
   if passed the --fix option.
   NOTE: Before deciding on use the "--fix" option, it's worth check if
   reported extra regions are overlapping with existing valid regions.
   If so, then "extraRegionsInMeta --fix" is indeed the optimal solution.
   Otherwise, "assigns" command is the simpler solution, as it recreates
   regions dirs in the filesystem, if not existing.
   An example triggering extra regions report for tables 'table_1'
   and 'table_2', under default namespace:
     $ HBCK2 extraRegionsInMeta default:table_1 default:table_2
   An example triggering extra regions report for table 'table_1'
   under default namespace, and for all tables from namespace 'ns1':
     $ HBCK2 extraRegionsInMeta default:table_1 ns1
   Returns list of extra regions for each table passed as parameter, or
   for each table on namespaces specified as parameter.

 filesystem [OPTIONS] [<TABLENAME>...]
   Options:
    -f, --fix    sideline corrupt hfiles, bad links, and references.
   Report on corrupt hfiles, references, broken links, and integrity.
   Pass '--fix' to sideline corrupt files and links. '--fix' does NOT
   fix integrity issues; i.e. 'holes' or 'orphan' regions. Pass one or
   more tablenames to narrow checkup. Default checks all tables and
   restores 'hbase.version' if missing. Interacts with the filesystem
   only! Modified regions need to be reopened to pick-up changes.

 fixMeta
   Do a server-side fix of bad or inconsistent state in hbase:meta.
   Available in hbase 2.2.1/2.1.6 or newer versions. Master UI has
   matching, new 'HBCK Report' tab that dumps reports generated by
   most recent run of _catalogjanitor_ and a new 'HBCK Chore'. It
   is critical that hbase:meta first be made healthy before making
   any other repairs. Fixes 'holes', 'overlaps', etc., creating
   (empty) region directories in HDFS to match regions added to
   hbase:meta. Command is NOT the same as the old _hbck1_ command
   named similarily. Works against the reports generated by the last
   catalog_janitor and hbck chore runs. If nothing to fix, run is a
   noop. Otherwise, if 'HBCK Report' UI reports problems, a run of
   fixMeta will clear up hbase:meta issues. See 'HBase HBCK' UI
   for how to generate new report.
   SEE ALSO: reportMissingRegionsInMeta

 generateMissingTableDescriptorFile <TABLENAME>
   Trying to fix an orphan table by generating a missing table descriptor
   file. This command will have no effect if the table folder is missing
   or if the .tableinfo is present (we don't override existing table
   descriptors). This command will first check it the TableDescriptor is
   cached in HBase Master in which case it will recover the .tableinfo
   accordingly. If TableDescriptor is not cached in master then it will
   create a default .tableinfo file with the following items:
     - the table name
     - the column family list determined based on the file system
     - the default properties for both TableDescriptor and
       ColumnFamilyDescriptors
   If the .tableinfo file was generated using default parameters then
   make sure you check the table / column family properties later (and
   change them if needed).
   This method does not change anything in HBase, only writes the new
   .tableinfo file to the file system. Orphan tables can cause e.g.
   ServerCrashProcedures to stuck, you might need to fix these still
   after you generated the missing table info files.

 replication [OPTIONS] [<TABLENAME>...]
   Options:
    -f, --fix    fix any replication issues found.
   Looks for undeleted replication queues and deletes them if passed the
   '--fix' option. Pass a table name to check for replication barrier and
   purge if '--fix'.

 reportMissingRegionsInMeta <NAMESPACE|NAMESPACE:TABLENAME>...
   To be used when regions missing from hbase:meta but directories
   are present still in HDFS. Can happen if user has run _hbck1_
   'OfflineMetaRepair' against an hbase-2.x cluster. This is a CHECK only
   method, designed for reporting purposes and doesn't perform any
   fixes, providing a view of which regions (if any) would get re-added
   to hbase:meta, grouped by respective table/namespace. To effectively
   re-add regions in meta, run addFsRegionsMissingInMeta.
   This command needs hbase:meta to be online. For each namespace/table
   passed as parameter, it performs a diff between regions available in
   hbase:meta against existing regions dirs on HDFS. Region dirs with no
   matches are printed grouped under its related table name. Tables with
   no missing regions will show a 'no missing regions' message. If no
   namespace or table is specified, it will verify all existing regions.
   It accepts a combination of multiple namespace and tables. Table names
   should include the namespace portion, even for tables in the default
   namespace, otherwise it will assume as a namespace value.
   An example triggering missing regions report for tables 'table_1'
   and 'table_2', under default namespace:
     $ HBCK2 reportMissingRegionsInMeta default:table_1 default:table_2
   An example triggering missing regions report for table 'table_1'
   under default namespace, and for all tables from namespace 'ns1':
     $ HBCK2 reportMissingRegionsInMeta default:table_1 ns1
   Returns list of missing regions for each table passed as parameter, or
   for each table on namespaces specified as parameter.

 setRegionState <ENCODED_REGIONNAME> <STATE>
   Possible region states:
    OFFLINE, OPENING, OPEN, CLOSING, CLOSED, SPLITTING, SPLIT,
    FAILED_OPEN, FAILED_CLOSE, MERGING, MERGED, SPLITTING_NEW,
    MERGING_NEW, ABNORMALLY_CLOSED
   WARNING: This is a very risky option intended for use as last resort.
   Example scenarios include unassigns/assigns that can't move forward
   because region is in an inconsistent state in 'hbase:meta'. For
   example, the 'unassigns' command can only proceed if passed a region
   in one of the following states: SPLITTING|SPLIT|MERGING|OPEN|CLOSING
   Before manually setting a region state with this command, please
   certify that this region is not being handled by a running procedure,
   such as 'assign' or 'split'. You can get a view of running procedures
   in the hbase shell using the 'list_procedures' command. An example
   setting region 'de00010733901a05f5a2a3a382e27dd4' to CLOSING:
     $ HBCK2 setRegionState de00010733901a05f5a2a3a382e27dd4 CLOSING
   Returns "0" if region state changed and "1" otherwise.

 setTableState <TABLENAME> <STATE>
   Possible table states: ENABLED, DISABLED, DISABLING, ENABLING
   To read current table state, in the hbase shell run:
     hbase> get 'hbase:meta', '<TABLENAME>', 'table:state'
   A value of \x08\x00 == ENABLED, \x08\x01 == DISABLED, etc.
   Can also run a 'describe "<TABLENAME>"' at the shell prompt.
   An example making table name 'user' ENABLED:
     $ HBCK2 setTableState users ENABLED
   Returns whatever the previous table state was.

 scheduleRecoveries <SERVERNAME>...
   Schedule ServerCrashProcedure(SCP) for list of RegionServers. Format
   server name as '<HOSTNAME>,<PORT>,<STARTCODE>' (See HBase UI/logs).
   Example using RegionServer 'a.example.org,29100,1540348649479':
     $ HBCK2 scheduleRecoveries a.example.org,29100,1540348649479
   Returns the pid(s) of the created ServerCrashProcedure(s) or -1 if
   no procedure created (see master logs for why not).
   Command support added in hbase versions 2.0.3, 2.1.2, 2.2.0 or newer.

 unassigns <ENCODED_REGIONNAME>...
   Options:
    -o,--override  override ownership by another procedure
   A 'raw' unassign that can be used even during Master initialization
   (if the -skip flag is specified). Skirts Coprocessors. Pass one or
   more encoded region names. 1588230740 is the hard-coded name for the
   hbase:meta region and de00010733901a05f5a2a3a382e27dd4 is an example
   of what a userspace encoded region name looks like. For example:
     $ HBCK2 unassigns 1588230740 de00010733901a05f5a2a3a382e27dd4
   Returns the pid(s) of the created UnassignProcedure(s) or -1 if none.

   SEE ALSO, org.apache.hbase.hbck1.OfflineMetaRepair, the offline
   hbase:meta tool. See the HBCK2 README for how to use.

Here you will recognize the familiar commands: assigns, bypass, extraRegionsInMeta, fixMeta, and so on. All of this comes from the official documentation, which is clearly written and worth reading when you have time: https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
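If you run HBCK2 often, the long bin/hbase ... hbck -j ... prefix gets tedious. The following Python sketch only assembles the command line in the form shown earlier and prints it; it does not run anything. The helper name hbck2_command and the /opt/hbase fallback for HBASE_HOME are assumptions made for this example; the config directory and JAR path are the ones used earlier in this article.

```python
import os
import shlex
from typing import List, Optional, Sequence


def hbck2_command(
    jar_path: str,
    hbck2_args: Sequence[str],
    hbase_home: Optional[str] = None,
    conf_dir: str = "/etc/hbase-conf",
) -> List[str]:
    """Build the argument list for: bin/hbase --config <conf> hbck -j <jar> <args>.

    hbase_home falls back to $HBASE_HOME, then to /opt/hbase (an assumed
    default for this sketch; adjust it to your installation).
    """
    home = hbase_home or os.environ.get("HBASE_HOME", "/opt/hbase")
    return [
        f"{home}/bin/hbase",
        "--config", conf_dir,
        "hbck",
        "-j", os.path.expanduser(jar_path),  # expand a leading ~ ourselves
        *hbck2_args,
    ]


if __name__ == "__main__":
    cmd = hbck2_command(
        "~/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar",
        # A read-only report command taken from the HBCK2 help text.
        ["reportMissingRegionsInMeta", "default:table_1"],
    )
    # Print the command for review instead of executing it.
    print(shlex.join(cmd))
```

Printing the command rather than executing it is deliberate: several HBCK2 commands change cluster state, so it is safer to review the exact command line before running it by hand.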

