Author | Xu Jingfeng, senior development engineer at Alibaba Cloud

Preface

A full Dubbo thread pool is a problem that most Dubbo users have run into. Using Arthas 3.1.7 as an example, this article describes how to diagnose it with the dashboard and thread commands.

Recommended ways to use Arthas

  • Method 1: Use Arthas through Cloud Toolkit, an IDEA plug-in (recommended)

Cloud Toolkit is a free local IDE plug-in released by Alibaba Cloud to help developers develop, test, diagnose, and deploy applications more efficiently. The plug-in deploys local applications with one click to any server, including the cloud (ECS, EDAS, ACK, ACR, mini programs, etc.). It also has built-in Arthas diagnostics, Dubbo tools, a terminal, file upload, Function Compute, and a MySQL executor. It supports not only mainstream versions of IntelliJ IDEA but also Eclipse, PyCharm, Maven, and others.

  • Method 2: Direct download

Description of the Dubbo thread pool full exception

To understand the thread pool full exception, you first need to understand the Dubbo threading model.

To summarize the default Dubbo threading model: Every time a Dubbo server receives a Dubbo request, it sends it to a thread pool that has 200 threads by default. If none of the 200 threads are idle, the client will raise the following exception:

Caused by: java.util.concurrent.ExecutionException: org.apache.dubbo.remoting.RemotingException: Server side(192.168.1.101,20880) ThreadPool is Exhausted...

The server prints logs of WARN level:

[DUBBO] Thread pool is EXHAUSTED!
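The behavior summarized above can be imitated with plain java.util.concurrent classes. The following is a minimal sketch, not Dubbo's own code: it assumes that the default pool behaves like a fixed-size ThreadPoolExecutor without a waiting queue, so a request that finds no idle worker is rejected immediately. The class name PoolExhaustionSketch and the method countRejected are hypothetical names chosen for this illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustration only: a fixed pool without a queue, standing in for the server thread pool.
final class PoolExhaustionSketch {

    /** Submits `requests` blocking tasks to `threads` workers and returns how many were rejected. */
    static int countRejected(int threads, int requests) {
        // Fixed size, no queue: a task is handed to a free worker or rejected at once.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(), new ThreadPoolExecutor.AbortPolicy());
        CountDownLatch release = new CountDownLatch(1); // keeps every worker busy
        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // stands in for a slow business call
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // no idle worker left: the pool is exhausted
                }
            }
        } finally {
            release.countDown(); // let the workers finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 10 workers and 20 simultaneous slow requests, as in the demo later in this article.
        System.out.println("rejected = " + countRejected(10, 20));
    }
}
```

With 10 workers and 20 requests that never finish in time, the last 10 submissions are rejected. In Dubbo, such a rejection is what surfaces on the client as the exception shown above.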

The causes of this exception include:

  • The client or server timeout is set improperly, so requests wait indefinitely and the threads are exhausted.
  • The client sends too many requests for the server to process in time, and the threads are exhausted.
  • The server processes requests slowly because of full GC or similar problems, and the threads are exhausted.
  • The server threads are blocked on the database, Redis, or network I/O, and the threads are exhausted.

There can be many causes, but at the root it is almost always a business problem that makes the Dubbo thread pool run out of resources. Therefore, the first thing to do is to check the business logic for anomalies.

Next, tune Dubbo for your own business scenario:

  • Adjust the dubbo.provider.threads parameter on the Provider side; the default is 200. How large is appropriate? 700, at least, is not considered large. Setting it too small is not recommended, because that makes the problem above more likely.
  • Adjust the dubbo.consumer.actives parameter on the Consumer side to throttle Consumer calls. This is rarely used in practice and is mentioned only for completeness.
  • Limit traffic on the client.
  • Scale out the server.
  • Dubbo does not currently support configuring a separate, isolated thread pool for an individual service to protect it; this feature may be added in a future release.
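To make the idea of throttling on the calling side concrete, here is a minimal sketch of a concurrency limiter built on java.util.concurrent.Semaphore. It is a generic illustration, not how Dubbo implements actives or any other limit; the class ClientLimiterSketch and its call method are hypothetical names, and the remote call is represented by an ordinary Supplier.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Illustration only: caps the number of calls that may be in flight at the same time.
final class ClientLimiterSketch {

    private final Semaphore permits;

    ClientLimiterSketch(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Runs the call if a permit is free; otherwise fails fast instead of loading the server. */
    <T> T call(Supplier<T> remoteCall) {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("too many concurrent calls");
        }
        try {
            return remoteCall.get();
        } finally {
            permits.release(); // always hand the permit back
        }
    }

    public static void main(String[] args) {
        ClientLimiterSketch limiter = new ClientLimiterSketch(1);
        // The outer call holds the only permit, so the nested call is refused.
        String first = limiter.call(() -> {
            try {
                String inner = limiter.call(() -> "inner call accepted");
                return inner;
            } catch (IllegalStateException e) {
                return "inner call rejected";
            }
        });
        System.out.println(first);
        // The permit was released, so a later call goes through.
        String second = limiter.call(() -> "later call accepted");
        System.out.println(second);
    }
}
```

Failing fast on the client keeps excess requests from ever reaching the server thread pool; whether to reject, wait, or degrade is a design choice that depends on the business scenario.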

Dubbo is not alone in designing its threading model this way. Most service governance frameworks and HTTP servers have the concept of a business thread pool, so in theory they can all hit a thread pool full exception, and the remedies are similar.

With all that explained, what is left to investigate?

An online application usually runs many services, and they all share the same Dubbo server thread pool, so a problem in one service can drag down the whole application. It is therefore necessary to check whether the problem is concentrated in one particular service and then examine that service's business logic. To do so, you need to look at the thread stacks and work out what is filling up the thread pool.

My usual approach to this problem is to use Arthas's dashboard and thread commands. Before introducing these two commands, let's construct an example of the Dubbo thread pool full exception.

Constructing the Dubbo thread pool full exception

Configure the server thread pool size

dubbo.protocol.threads=10

The default size is 200, which makes the exception hard to reproduce, so it is lowered to 10 here.

Simulate server blocking

import org.apache.dubbo.config.annotation.Service;

@Service(version = "1.0.0")
public class DemoServiceImpl implements DemoService {

    @Override
    public String sayHello(String name) {
        sleep();
        return "Hello " + name;
    }

    // Holds the calling Dubbo thread for 5 seconds to simulate a time-consuming operation.
    private void sleep() {
        try {
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

The sleep method simulates a time-consuming operation, primarily to deplete the server thread pool.

Client multi-threaded access

for (int i = 0; i < 20; i++) {
    new Thread(() -> {
        while (true){
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            try {
                demoService.sayHello("Provider");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }).start();
}

Reproducing the problem

The client

The server

With the problem reproduced and the scene preserved, we can start troubleshooting with Arthas, pretending that we do not know about the time-consuming sleep logic.

The dashboard command

$ dashboard

Execution result:

You can see the panel shown above, which displays runtime information about the system. Here we focus on the THREAD panel; the columns mean the following:

  • ID: the Java-level thread ID. Note that this ID does not correspond one-to-one with the nativeID in jstack.
  • NAME: the thread name.
  • GROUP: the thread group name.
  • PRIORITY: the thread priority, a number from 1 to 10; a larger number means a higher priority.
  • STATE: the thread state.
  • CPU%: the percentage of CPU consumed by the thread. Arthas samples for 100 ms, sums the CPU usage of all threads within that window, and then calculates each thread's share.
  • TIME: the total running time of the thread, in minutes:seconds format.
  • INTERRUPTED: the current interrupt status of the thread.
  • DAEMON: whether the thread is a daemon thread.

When idle, the threads should be in the WAITING state. Because of the sleep, all of them are now in the TIMED_WAITING state, so subsequent requests are rejected with the thread pool full exception.

In a real investigation, you need to sample several Dubbo threads and note their thread IDs to see which service requests they are actually handling. The following command filters the Dubbo server threads by thread pool name:

dashboard | grep "DubboServerHandler"
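The same kind of filtering can be approximated from inside the JVM with the standard Thread API. The sketch below is an illustration under assumptions, not part of Arthas: it counts live threads by state for a given name prefix. The prefix DubboServerHandler comes from the grep above, while the class ThreadStateSketch, its helper methods, and the demo thread names are hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;

// Illustration only: groups live threads with a given name prefix by their state.
final class ThreadStateSketch {

    /** Counts live threads whose name starts with namePrefix, grouped by Thread.State. */
    static Map<Thread.State, Integer> countStates(String namePrefix) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith(namePrefix)) {
                counts.merge(t.getState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Starts n daemon threads that sleep, imitating the blocked provider threads of the demo. */
    static void startSleepers(String namePrefix, int n) {
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                try {
                    Thread.sleep(60_000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, namePrefix + "-thread-" + i);
            t.setDaemon(true); // do not keep the JVM alive
            t.start();
        }
    }

    /** Polls for up to about five seconds until `expected` matching threads are TIMED_WAITING. */
    static boolean awaitTimedWaiting(String namePrefix, int expected) {
        for (int attempt = 0; attempt < 100; attempt++) {
            if (countStates(namePrefix).getOrDefault(Thread.State.TIMED_WAITING, 0) == expected) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        startSleepers("DubboServerHandler-demo", 10);
        awaitTimedWaiting("DubboServerHandler-demo", 10);
        System.out.println(countStates("DubboServerHandler"));
    }
}
```

A pool in which every matching thread reports TIMED_WAITING or BLOCKED, with none idle in WAITING, is the in-process equivalent of what the THREAD panel shows in the demo.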

The thread command

Once dashboard has filtered out the individual thread IDs, its job is done and the thread command takes over. In fact, the thread panel of dashboard is built on the thread command, but dashboard also shows memory and GC status and so gives a more complete picture. I therefore suggest starting any troubleshooting session with dashboard to review the global information.

thread usage examples:

  • View the top N busiest threads
$ thread -n 3

  • Display information about all threads
$ thread

The output is consistent with what dashboard displays.

  • Display the thread that is currently blocking other threads
$ thread -b
No most blocking thread found!
Affect(row-cnt:0) cost in 22 ms.

This command is still incomplete: at present it can only find threads blocked by the synchronized keyword. Locks based on java.util.concurrent.locks.Lock are not yet supported.
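What "the thread that blocks another thread" means for synchronized can be shown with the standard java.lang.management API. This is a minimal sketch of the idea, not Arthas's implementation: one thread holds a monitor, a second thread blocks on it, and ThreadInfo#getLockOwnerId reveals the owner. The class BlockingThreadSketch and its method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Illustration only: finds the owner of the monitor that a blocked thread is waiting for.
final class BlockingThreadSketch {

    /** Returns the ID of the thread owning the lock that `blocked` waits for, or -1 if none. */
    static long ownerOfLockBlocking(Thread blocked) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo info = mx.getThreadInfo(blocked.getId());
        return info == null ? -1 : info.getLockOwnerId();
    }

    /** Builds a holder/waiter pair on one monitor and checks that the holder is reported as owner. */
    static boolean holderBlocksWaiter() {
        Object monitor = new Object();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (monitor) {
                held.countDown();
                try {
                    release.await(); // keep the monitor until the check is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder");
        Thread waiter = new Thread(() -> {
            synchronized (monitor) {
                // entered only after the holder has left its synchronized block
            }
        }, "waiter");
        try {
            holder.start();
            held.await();   // the holder now owns the monitor
            waiter.start();
            for (int i = 0; i < 500 && waiter.getState() != Thread.State.BLOCKED; i++) {
                Thread.sleep(10); // wait until the waiter is actually blocked
            }
            return ownerOfLockBlocking(waiter) == holder.getId();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown(); // let both threads finish
        }
    }

    public static void main(String[] args) {
        System.out.println("holder identified as blocking thread: " + holderBlocksWaiter());
    }
}
```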

  • Display threads in the specified state
$ thread --state TIMED_WAITING

The thread state can be [RUNNABLE, BLOCKED, WAITING, TIMED_WAITING, NEW, TERMINATED].

  • View the stack of the specified thread
$ thread 46

Those are the common usages. In a real investigation, the analysis has to be tailored to the situation at hand, and it also tests our understanding of thread states. Here are the common thread states:

  • New (NEW)

A thread object has been created, but its start() method has not yet been called.

  • Runnable (RUNNABLE)

Java groups the ready and running states together and calls both "running".

  • Blocked (BLOCKED)

The thread is blocked waiting to acquire a monitor lock.

  • Waiting (WAITING)

A thread in this state is waiting for another thread to perform a specific action (a notification or an interrupt). It is entered through:

  1. Object#wait() without a timeout argument
  2. Thread#join() without a timeout argument
  3. LockSupport#park()

  • Timed waiting (TIMED_WAITING)

Unlike WAITING, a thread in this state returns by itself after the specified time. It is entered through:

  1. Thread#sleep()
  2. Object#wait() with a timeout argument
  3. Thread#join() with a timeout argument
  4. LockSupport#parkNanos()
  5. LockSupport#parkUntil()

  • Terminated (TERMINATED)

The thread has finished executing.
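Several of these states can be observed directly with Thread#getState. The following minimal sketch walks one thread through NEW, TIMED_WAITING, and TERMINATED and parks a second thread to show WAITING. The class ThreadLifecycleSketch and its helper methods are hypothetical names used only for this illustration.

```java
import java.util.concurrent.locks.LockSupport;

// Illustration only: observes thread states with Thread#getState.
final class ThreadLifecycleSketch {

    /** A thread that sleeps until it is interrupted. */
    static Thread sleeper() {
        return new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // interrupted: fall through and finish
            }
        });
    }

    /** A thread that parks until it is interrupted. */
    static Thread parker() {
        return new Thread(() -> {
            // park() may return spuriously, so loop until the interrupt flag is set
            while (!Thread.currentThread().isInterrupted()) {
                LockSupport.park();
            }
        });
    }

    /** Polls for up to about five seconds until t reaches `wanted`; returns the state last seen. */
    static Thread.State awaitState(Thread t, Thread.State wanted) {
        Thread.State seen = t.getState();
        for (int i = 0; i < 500 && seen != wanted; i++) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            seen = t.getState();
        }
        return seen;
    }

    public static void main(String[] args) {
        Thread sleeper = sleeper();
        System.out.println("before start(): " + sleeper.getState());
        sleeper.start();
        System.out.println("inside sleep(): " + awaitState(sleeper, Thread.State.TIMED_WAITING));

        Thread parker = parker();
        parker.start();
        System.out.println("inside park():  " + awaitState(parker, Thread.State.WAITING));

        sleeper.interrupt();
        parker.interrupt();
        System.out.println("after exit:     " + awaitState(sleeper, Thread.State.TERMINATED));
    }
}
```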

State flow diagram

Problem analysis

There is no universal recipe for analyzing a thread pool full exception; it takes some flexibility. Let's go through the typical cases one by one:

  • Blocking problems. For example, when the database cannot be reached, the running threads should be BLOCKED or TIMED_WAITING, and thread --state can locate them.
  • Busy problems. For example, with CPU-intensive operations the running threads are mostly RUNNABLE, and thread -n can locate the busiest ones.
  • GC problems. Many external factors can cause this exception, GC being one of them; here the thread command alone is not enough.
  • Targeted inspection. Remember the Dubbo threads we filtered out with grep? Use thread ${thread_id} to view their stacks one by one. If the tally shows that most of the stacks belong to one service, you can basically conclude that this service is the problem. Whether its request volume suddenly surged, a downstream service it depends on failed, or the database it accesses went down has to be determined from the stacks.
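The tallying step of the last case can be sketched in plain Java. The code below is an illustration under assumptions, not an Arthas feature: for every thread with a given name prefix it finds the topmost stack frame that belongs to a business class and counts threads per method. StackHistogramSketch, countByBusinessFrame, and the stand-in business method sayHello are hypothetical; in a real application the class prefix would be your own service package.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CountDownLatch;

// Illustration only: tallies which business method each matching thread is executing.
final class StackHistogramSketch {

    /**
     * For each live thread whose name starts with threadPrefix, takes the topmost frame whose
     * class name starts with classPrefix and counts threads per "Class.method".
     */
    static Map<String, Integer> countByBusinessFrame(String threadPrefix, String classPrefix) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            if (!e.getKey().getName().startsWith(threadPrefix)) {
                continue;
            }
            for (StackTraceElement frame : e.getValue()) {
                if (frame.getClassName().startsWith(classPrefix)) {
                    counts.merge(frame.getClassName() + "." + frame.getMethodName(), 1, Integer::sum);
                    break; // only the topmost business frame counts
                }
            }
        }
        return counts;
    }

    /** Stand-in for a slow service method: reports that it was entered, then sleeps. */
    static void sayHello(CountDownLatch entered) {
        entered.countDown();
        try {
            Thread.sleep(60_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Starts n daemon workers and returns once all of them are inside sayHello. */
    static void startWorkers(String threadPrefix, int n) {
        CountDownLatch entered = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> sayHello(entered), threadPrefix + "-thread-" + i);
            t.setDaemon(true); // do not keep the JVM alive
            t.start();
        }
        try {
            entered.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        startWorkers("DubboServerHandler-demo", 10);
        System.out.println(countByBusinessFrame(
                "DubboServerHandler", StackHistogramSketch.class.getName()));
    }
}
```

If one method accounts for nearly all matching threads, that service is the place to start reading stacks in detail.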

Conclusion

This article used the Dubbo thread pool full exception as a starting point to show how to analyze thread problems and how Arthas helps diagnose them quickly. Arthas largely removes the need to go back and forth converting thread IDs to hexadecimal for jstack, which greatly speeds up diagnosis.
