Elasticsearch tutorial live replay

1. Where the questions came from

Question 1: Is there an explanation of what the GET /_nodes/hot_threads API returns? It returns a pile of stack traces that are hard to read...

Question 2: The CPU on only one machine in the ES cluster is spiking, while I/O and heap memory are normal. What should I do? I looked at hot_threads and still need help.

Both questions came from the Elasticsearch Knowledge Planet WeChat group.

Hence this article.

2. What is hot_threads for?

In real production scenarios, when the cluster responds more slowly than usual and CPU usage is high, we need to troubleshoot, find the root cause, and get the cluster back to “silky smooth”.

Elasticsearch provides a way to monitor hot threads so that you can see where the problem lies.

In Java, hot threads are threads that consume a lot of CPU and take a long time to execute.

The API most commonly used to troubleshoot such problems is the hot threads API.

GET /_nodes/hot_threads

GET /_nodes/<node_id>/hot_threads

The hot threads API reports, from the CPU's point of view, which parts of the Elasticsearch code are hot, or where the cluster is currently stuck for some reason.

3. Parameters supported by hot_threads

  • ignore_idle_threads

(Optional, Boolean value)

If true, known idle threads are filtered out (for example, waiting in socket selection, or fetching tasks from an empty queue).

The default is true.

  • interval

(Optional, time unit) Interval between the first and the second sampling of threads.

The default is 500 milliseconds.

  • snapshots

(Optional, integer) Number of stack trace samples to take. A stack trace is the nested sequence of method calls at a particular point in time.

The default value is 10.

  • threads

(Optional, integer) Number of the “hottest” threads to report, ranked by the measure selected with the type parameter.

The “hottest” threads are often where our problems lie.

The default value is 3, that is, the top 3 hot threads are returned.

  • master_timeout

(Optional, time unit) Specifies how long to wait for a connection to the master node.

If no response is received before the timeout expires, the request fails with an error.

The default value is 30 seconds.

  • timeout

(Optional, time unit) Specifies the time period for waiting for a response.

If no response is received before the timeout expires, the request fails with an error.

The default value is 30 seconds.

  • type

(Optional, string) Type to sample.

The options available are:

1) block: the length of time a thread spends in the blocked state.

2) cpu: the CPU time a thread consumes.

3) wait: the length of time a thread spends in the waiting state.

If you want to learn more about thread status, see:


The default value is cpu.
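To make the parameter list concrete, here is a minimal Python sketch that assembles the request path from the parameters above. The helper name hot_threads_path and the node name in the usage example are made up for illustration; the sketch only builds a string and does not contact a cluster.

```python
from urllib.parse import urlencode

# The three sampling types listed above.
VALID_TYPES = {"block", "cpu", "wait"}


def hot_threads_path(node_id=None, **params):
    """Build the request path for the hot threads API (illustrative helper).

    node_id: optional node filter; None targets all nodes.
    params:  query parameters such as type, interval, snapshots, threads.
    """
    sample_type = params.get("type", "cpu")
    if sample_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")

    if node_id is None:
        path = "/_nodes/hot_threads"
    else:
        path = f"/_nodes/{node_id}/hot_threads"

    # Booleans are written as lowercase true/false in the query string.
    query = urlencode(
        {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    )
    return f"{path}?{query}" if query else path


print(hot_threads_path(type="wait", interval="1s"))
print(hot_threads_path("node-1", threads=5, ignore_idle_threads=False))
```

Leaving a parameter out means the server-side default applies, so calling the helper without arguments simply yields /_nodes/hot_threads.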

4. hot_threads in practice

Now that we know the parameters, let’s try them. The following command tells Elasticsearch to sample threads in the waiting state over a one-second interval.

GET /_nodes/hot_threads?type=wait&interval=1s

5. How the hot_threads API works

Unlike most other APIs, which return JSON, the hot threads API returns formatted text that can be divided into several sections. This is why the question at the start of the article complained about a pile of stack traces that is hard to read.

Before looking at the returned stack result information, let’s take a look at some of the logic behind the Hot Threads API.

Elasticsearch gets all running threads and collects information about each one: how much CPU time it has used, how many times it was blocked or waiting, how long it was blocked or waiting, and so on.

Elasticsearch then waits for the interval given by the interval parameter, collects the same information again, and sorts the threads by the time they accumulated, in descending order.

Note that this time is counted for the kind of activity selected by the type parameter.

Elasticsearch then analyzes the first N threads, where N is the value of the threads parameter.

For these threads, Elasticsearch takes a snapshot of the stack trace every few milliseconds; the number of snapshots is given by the snapshots parameter.

Finally, the stack traces are grouped to show how the thread state changes. This is the output we see when we call the API.
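The steps above can be sketched in a few lines of Python. This is only a toy model of the idea (sample twice, rank by the difference, then group stack snapshots), not Elasticsearch's actual implementation. All thread names, times, and frame names are invented, and the grouping here counts only exactly identical stacks, which is a simplification.

```python
from collections import Counter


def hottest_threads(first_sample, second_sample, threads=3):
    """Rank threads by the time they accumulated between two samples.

    Both samples map a thread name to a cumulative time in milliseconds
    for the chosen type (cpu, wait or block).
    """
    deltas = {
        name: second_sample[name] - first_sample.get(name, 0.0)
        for name in second_sample
    }
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    return ranked[:threads]


def group_snapshots(stacks):
    """Count identical stack traces, most frequent first (simplified grouping)."""
    return Counter(tuple(stack) for stack in stacks).most_common()


# Hypothetical cumulative CPU times (ms), sampled one interval apart.
first = {"search[T#38]": 1000.0, "write[T#2]": 400.0,
         "merge[T#1]": 50.0, "management[T#1]": 10.0}
second = {"search[T#38]": 1391.7, "write[T#2]": 520.0,
          "merge[T#1]": 60.0, "management[T#1]": 10.5}

top = hottest_threads(first, second, threads=3)
print([name for name, _ in top])

# Hypothetical stack snapshots of the hottest thread: 10 in total.
stack_a = ["frame_a1", "frame_a2", "frame_a3"]
stack_b = ["frame_b1", "frame_b2"]
stack_c = ["frame_c1"]
groups = group_snapshots([stack_a] * 5 + [stack_b] * 3 + [stack_c] * 2)
for stack, count in groups:
    print(f"{count}/10 snapshots sharing following {len(stack)} elements")
```

The threads argument plays the role of the threads parameter, and the length of the snapshot list plays the role of snapshots.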

Now that the working principle has been tied back to the API parameters, you should have a general understanding of hot_threads.

Still cannot read the output? Don’t worry, the interpretation follows.

6. Interpreting the hot_threads API response

Now, finally, we can look at what the hot_threads API returns.

It is recommended to enlarge the image.

6.1 The first part of the response

Contains basic node information.

As follows:

{Data-(110.188)-1}{67A1DwgCR_eM5eFS-6MR1Q}{qTPWEpF-Q4GTZIlWr3qUqA}{10.6110.188.}{10.6110.188.:9301}{dil}

This shows which node the hot threads belong to, which is handy when a hot threads API call covers multiple nodes.
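When a call covers several nodes, it can help to pull the node name out of this line programmatically. The following is a small sketch that splits the line into its brace-delimited fields. It uses the sample line without the stray spaces and assumes only that the first field is the node name, which matches the name inside the thread name in section 6.2.1.

```python
import re


def node_fields(line):
    """Return the contents of each {...} group in a node information line."""
    return re.findall(r"\{([^{}]*)\}", line)


line = (
    "{Data-(110.188)-1}{67A1DwgCR_eM5eFS-6MR1Q}{qTPWEpF-Q4GTZIlWr3qUqA}"
    "{10.6110.188.}{10.6110.188.:9301}{dil}"
)
fields = node_fields(line)
node_name = fields[0]  # assumption: the first field is the node name
print(node_name)
print(len(fields), "fields")
```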

6.2 The second part of the response

The next few lines can be divided into several subsections.

6.2.1 The first subsection

78.4% (391.7ms out of 500ms) cpu usage by thread 'elasticsearch[Data-(110.188)-1][search][T#38]'
  • [search]: the thread belongs to the search thread pool.
  • 78.4%: during the sampling interval, this search thread used 78.4% of the CPU time (391.7ms out of 500ms).
  • cpu usage: the sampling type in use, here thread CPU usage.
  • block usage: shown instead when sampling threads in the blocked state.
  • wait usage: shown instead when sampling threads in the waiting state.

Note: The thread name is very important here, because it lets us guess which Elasticsearch feature is causing the problem.

In the example above, we can initially conclude that the search thread is taking up a lot of CPU.

In practice, there are other threads besides search, for example:

  • recovery_stream: used for recovery module events
  • cache: used for cache events
  • merge: used for segment merging
  • index: used for indexing (writing) data, and so on.
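Since the thread name carries so much information, a short parser for this header line can be useful when scanning long responses. The regular expression below is a sketch written against the single sample line in this section (with its spacing normalized); other Elasticsearch versions may format the line differently, and the field names in the result are chosen here for illustration.

```python
import re

# Pattern derived from the sample header line only; treat it as an assumption.
HEADER_RE = re.compile(
    r"(?P<percent>[\d.]+)%\s+"
    r"\((?P<used>[\d.]+)(?P<used_unit>[a-z]+) out of "
    r"(?P<interval>[\d.]+)(?P<interval_unit>[a-z]+)\)\s+"
    r"(?P<type>cpu|block|wait) usage by thread '"
    r"elasticsearch\[(?P<node>[^\]]+)\]\[(?P<pool>[^\]]+)\]\[T#(?P<number>\d+)\]'"
)


def parse_header(line):
    """Return the fields of a hot thread header line, or None if it does not match."""
    match = HEADER_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["percent"] = float(fields["percent"])
    fields["number"] = int(fields["number"])
    return fields


sample = (
    "78.4% (391.7ms out of 500ms) cpu usage by thread "
    "'elasticsearch[Data-(110.188)-1][search][T#38]'"
)
info = parse_header(sample)
print(info["pool"], info["type"], info["percent"])
```

Grouping parsed headers by the pool field is one quick way to see whether search, merge, or indexing threads dominate.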

6.2.2 The second subsection

The next section of the Hot Threads API response begins with the following message:

5/10 snapshots sharing following 35 elements

As shown above, the thread information is followed by stack trace information.

In our example,

  • 5/10: five of the ten snapshots taken share the same stack trace.

In most cases this means that, for the current thread, half of the sampled time was spent in the same part of the Elasticsearch code.
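The "half of the time" reading is simply the ratio of the two numbers in that line. Here is a small sketch, assuming the line has exactly the wording shown in the example, that extracts the counts and computes the share.

```python
import re

SNAPSHOT_RE = re.compile(r"(\d+)/(\d+) snapshots sharing following (\d+) elements")


def parse_snapshot_line(line):
    """Return the shared/total snapshot counts, frame count, and shared fraction."""
    match = SNAPSHOT_RE.search(line)
    if match is None:
        return None
    shared, total, frames = (int(value) for value in match.groups())
    return {
        "shared": shared,
        "total": total,
        "frames": frames,
        "fraction": shared / total,
    }


summary = parse_snapshot_line("5/10 snapshots sharing following 35 elements")
print(f"{summary['fraction']:.0%} of snapshots share {summary['frames']} stack frames")
```

The higher the fraction, the more likely the shared frames point at the code that keeps the thread busy.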

7. Summary

To locate a problematic thread stack in Elasticsearch, you can use the hot_threads API, or top together with jstack.

This article explained in detail the hot_threads API's application scenarios, usage, and response.

Feel free to comment on your understanding of hot threads or your practical experience.

If you ran into a problem like the ones at the beginning and the official documentation did not make things clear, you are also welcome to leave a message. Depending on the number of likes, we will write a dedicated article on the topic.

Hit Elasticsearch with you!

