Why monitoring?
Monitoring keeps the system stable, reliable, and maintainable. In particular, it lets us:
- Track the cluster's core performance indicators and understand how the cluster is performing.
- Raise timely alarms when the cluster has a problem, so that the operations team can fix it promptly.
- Nip problems in the bud by warning when a critical cluster indicator becomes abnormal, rather than waiting until the cluster is actually unavailable before taking action.
- Locate and resolve problems more quickly when they do occur.
How to build an HBase cluster monitoring system?
The company has its own monitoring system, so all we need to do is send the HBase indicators we care about to it. The work on our side is therefore to develop the code that collects the HBase cluster monitoring indicators and reports them.
HBase cluster monitoring indicators
Monitoring data is collected in the following areas:
- Operating system (OS) data for the server, such as CPU, memory, disk, load, and network traffic.
- The state of the JVM on a RegionServer (or Master) machine, such as thread information, GC count and time, memory usage, and the number of ERROR, WARN, and FATAL events.
- Statistics from inside the RegionServer (or Master) process.
You can obtain the JMX information exposed by HBase from the following address:
http://your_master:60010/jmx // All beans
The JMX web page returns its data as JSON, and there is a lot of it.
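As a minimal sketch of how a collector might read this endpoint, the Python below downloads the JSON and indexes the beans by name. The function names `fetch_jmx` and `parse_beans` are hypothetical, and the code assumes the document has a top-level "beans" array, which is not shown in the extracts in this article.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def parse_beans(text):
    """Index the beans of a JMX JSON document by their "name" field.

    Assumes the document looks like {"beans": [{...}, {...}]}.
    """
    document = json.loads(text)
    return {bean["name"]: bean for bean in document.get("beans", [])}


def fetch_jmx(base_url, query=None, timeout=10):
    """Download the JMX JSON from base_url and return a dict of beans by name.

    base_url is something like "http://your_master:60010" (placeholder host).
    query, if given, is passed as the qry parameter to narrow the output.
    """
    url = base_url + "/jmx"
    if query:
        # Keep the characters that appear in bean names readable in the URL.
        url += "?qry=" + quote(query, safe=":=,*")
    with urlopen(url, timeout=timeout) as response:
        return parse_beans(response.read().decode("utf-8"))
```

A call such as `fetch_jmx("http://your_master:60010", "java.lang:type=OperatingSystem")` would then return only the bean discussed in the next section.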
OS Monitoring data
In HBase, operating system monitoring data comes mainly from the OperatingSystem object. The following is the extracted JSON.
{
"name" : "java.lang:type=OperatingSystem",
"modelerType" : "com.sun.management.UnixOperatingSystem",
"MaxFileDescriptorCount" : 1000000,
"OpenFileDescriptorCount" : 413,
"CommittedVirtualMemorySize" : 1892225024,
"FreePhysicalMemorySize" : 284946432,
"FreeSwapSpaceSize" : 535703552,
"ProcessCpuLoad" : 0.0016732901066722444,
"ProcessCpuTime" : 59306210000000,
"SystemCpuLoad" : 0.018197029910060655,
"TotalPhysicalMemorySize" : 16660848640,
"TotalSwapSpaceSize" : 536862720,
"AvailableProcessors" : 8,
"Arch" : "amd64",
"SystemLoadAverage" : 0.0,
"Name" : "Linux",
"Version" : "2.6.32-431.11.7.el6.ucloud.x86_64",
"ObjectName" : "java.lang:type=OperatingSystem"
}
The more important indicators are OpenFileDescriptorCount, FreePhysicalMemorySize, ProcessCpuLoad, SystemCpuLoad, AvailableProcessors, and SystemLoadAverage.
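One way to pick these indicators out of the bean is sketched below. The helper name `os_metrics` and the two derived ratios are illustrative assumptions, not part of HBase; they are simply computed from the fields shown in the JSON above.

```python
# Indicators singled out above as the important ones.
OS_KEYS = (
    "OpenFileDescriptorCount",
    "FreePhysicalMemorySize",
    "ProcessCpuLoad",
    "SystemCpuLoad",
    "AvailableProcessors",
    "SystemLoadAverage",
)


def os_metrics(bean):
    """Extract the key OS indicators from an OperatingSystem bean (a dict)."""
    metrics = {key: bean[key] for key in OS_KEYS if key in bean}

    # Derived value: share of the file descriptor limit currently in use.
    max_fds = bean.get("MaxFileDescriptorCount")
    if max_fds and "OpenFileDescriptorCount" in bean:
        metrics["fd_usage_ratio"] = bean["OpenFileDescriptorCount"] / max_fds

    # Derived value: share of physical memory that is still free.
    total_memory = bean.get("TotalPhysicalMemorySize")
    if total_memory and "FreePhysicalMemorySize" in bean:
        metrics["free_memory_ratio"] = bean["FreePhysicalMemorySize"] / total_memory

    return metrics
```

Ratios are often easier to alert on than raw counts, because a single threshold can be reused across machines of different sizes.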
JVM Monitoring data
JVM monitoring data in HBase comes mainly from JvmMetrics. The following is the JSON I extracted.
{
"name" : "Hadoop:service=HBase,name=JvmMetrics",
"modelerType" : "JvmMetrics",
"tag.Context" : "jvm",
"tag.ProcessName" : "Master",
"tag.SessionId" : "",
"tag.Hostname" : "uhadoop-qrljqo-master2",
"MemNonHeapUsedM" : 53.846107,
"MemNonHeapCommittedM" : 85.84375,
"MemNonHeapMaxM" : 130.0,
"MemHeapUsedM" : 79.05823,
"MemHeapCommittedM" : 240.125,
"MemHeapMaxM" : 989.875,
"MemMaxM" : 989.875,
"GcCountParNew" : 15190,
"GcTimeMillisParNew" : 72300,
"GcCountConcurrentMarkSweep" : 2,
"GcTimeMillisConcurrentMarkSweep" : 319,
"GcCount" : 15192,
"GcTimeMillis" : 72619,
"ThreadsNew" : 0,
"ThreadsRunnable" : 21,
"ThreadsBlocked" : 0,
"ThreadsWaiting" : 144,
"ThreadsTimedWaiting" : 18,
"ThreadsTerminated" : 0,
"LogFatal" : 0,
"LogError" : 0,
"LogWarn" : 0,
"LogInfo" : 0
}
JvmMetrics mainly reports four kinds of information: memory usage, GC statistics, thread statistics, and event statistics.
Memory statistics cover the non-heap memory the JVM is currently using and the non-heap memory configured, the heap memory currently in use and the heap memory configured, and the maximum memory the JVM can use at runtime.
GC statistics are fairly simple: the number of collections and the total time spent collecting, both overall and per collector (ParNew and ConcurrentMarkSweep in the sample above).
Thread statistics give the number of threads in each of the NEW, RUNNABLE, BLOCKED, WAITING, TIMED_WAITING, and TERMINATED states.
Event statistics count the Fatal, Error, Warn, and Info log events. (This one doesn't seem important.)
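The sketch below shows two values a collector might derive from the JvmMetrics bean: the heap usage ratio, and the GC activity between two samples. It assumes that GcCount and GcTimeMillis are cumulative since process start, which the large values in the sample suggest but the sample alone does not prove. The function names are hypothetical.

```python
def heap_usage_ratio(bean):
    """Return used heap as a fraction of the maximum heap (0.0 to 1.0)."""
    return bean["MemHeapUsedM"] / bean["MemHeapMaxM"]


def gc_delta(previous, current):
    """Return the GC activity between two JvmMetrics samples.

    Assumes the counters are cumulative. Returns None if a counter went
    backwards, which most likely means the process restarted in between.
    """
    count = current["GcCount"] - previous["GcCount"]
    millis = current["GcTimeMillis"] - previous["GcTimeMillis"]
    if count < 0 or millis < 0:
        return None
    return {
        "gc_count": count,
        "gc_millis": millis,
        # Average time per collection in this interval; 0.0 if none ran.
        "avg_gc_millis": millis / count if count else 0.0,
    }
```

Sampling at a fixed period and alerting on `gc_millis` as a share of that period is one common way to catch a process that spends too much of its time in garbage collection.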
Region Servers health
The health of the region servers can be queried from the following address:
http://your_master:60010/jmx?qry=Hadoop:service=HBase,name=Master,sub=Server
The region server health values returned:
{
"name" : "Hadoop:service=HBase,name=Master,sub=Server",
"modelerType" : "Master,sub=Server",
"tag.liveRegionServers" : "xxx",
"tag.deadRegionServers" : "",
"tag.zookeeperQuorum" : "xxx",
"tag.serverName" : "60000149683102 13 xxx2,",
"tag.clusterId" : "e5e044a3-ef9f-48f7-ba63-637376f5fa90",
"tag.isActiveMaster" : "true",
"tag.Context" : "master",
"tag.Hostname" : "xxx",
"masterActiveTime" : 1495683312239,
"masterStartTime" : 1495683310213,
"averageLoad" : 143.66666666666666,
"numRegionServers" : 3,
"numDeadRegionServers" : 0,
"clusterRequests" : 1297834323
}
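A simple health check over this bean could look like the sketch below. The function name `check_region_servers` and the `expected_servers` parameter are assumptions made for illustration; the expected count would come from your own cluster inventory.

```python
def check_region_servers(bean, expected_servers):
    """Return a list of human-readable problems found in the Master bean.

    An empty list means no problem was detected by these checks.
    """
    problems = []
    live = bean.get("numRegionServers", 0)
    dead = bean.get("numDeadRegionServers", 0)

    if dead > 0:
        # tag.deadRegionServers carries the names of the dead servers.
        names = bean.get("tag.deadRegionServers", "")
        problems.append(f"{dead} dead region server(s): {names}")

    if live < expected_servers:
        problems.append(
            f"only {live} of {expected_servers} region servers are live"
        )

    return problems
```

With the sample values above (3 live, 0 dead) and `expected_servers=3`, the function returns an empty list; each returned string can be forwarded to the alarm channel as-is.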
MemoryPool
The full JSON output also contains many MemoryPool beans, such as Par Eden Space, CMS Perm Gen, Par Survivor Space, CMS Old Gen, and Code Cache. Collect them on demand.
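The sketch below collects the usage of every memory pool from the full list of beans. It assumes that pool beans are named "java.lang:type=MemoryPool,name=<pool>" and carry a "Usage" object with "used" and "max" fields; neither detail is shown in this article, so verify both against your own /jmx output. The function name is hypothetical.

```python
POOL_PREFIX = "java.lang:type=MemoryPool,name="


def memory_pool_usage(beans):
    """Map each memory pool name to its used bytes, max bytes, and ratio.

    beans is the list of bean dicts from the JMX JSON document.
    """
    usage = {}
    for bean in beans:
        name = bean.get("name", "")
        if not name.startswith(POOL_PREFIX):
            continue  # not a memory pool bean
        stats = bean.get("Usage") or {}
        used = stats.get("used")
        maximum = stats.get("max")
        if used is None:
            continue  # pool reports no usage data
        # A missing or non-positive max means the limit is undefined.
        ratio = used / maximum if maximum and maximum > 0 else None
        usage[name[len(POOL_PREFIX):]] = {
            "used": used,
            "max": maximum,
            "ratio": ratio,
        }
    return usage
```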
Conclusion
Building a monitoring system for any service is a process of continuous iteration and optimization; it cannot be perfect from the start. Monitoring should always stay ahead of problems: each time a problem occurs, strengthen the monitoring in the corresponding area. Over time the monitoring system should move from alarming on problems that have already happened to warning about problems that may happen, until it becomes a powerful tool for keeping the system stable.
Finally
There are many monitoring indicators, so collect only the ones you need. When reprinting this article, please credit the original source: http://www.54tianzhisheng.cn/2017/10/21/HBase-metrics/