• I. Theoretical analysis
• II. Practical application
    To speed up processing, we split the problem into several concurrent tasks, then create a thread pool and delegate the tasks to its threads so they can execute concurrently. Under high concurrency, a thread pool effectively reduces the time and resource cost of repeatedly creating and destroying threads. Without a pool, the system may create a huge number of threads, exhausting system memory and causing excessive switching (the JVM uses time-slice round-robin scheduling precisely to reduce switching between threads).
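
    As a minimal sketch (the pool size of 4 and the trivial print task are illustrative assumptions, not from the text), delegating tasks to a pool might look like this:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ThreadPoolDemo {
        public static void main(String[] args) throws InterruptedException {
            // Reuse a fixed number of threads instead of creating one per task.
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 0; i < 100; i++) {
                final int taskId = i;
                pool.submit(() -> System.out.println("task " + taskId
                        + " ran on " + Thread.currentThread().getName()));
            }
            pool.shutdown();                            // stop accepting new tasks
            pool.awaitTermination(1, TimeUnit.MINUTES); // wait for queued tasks to finish
        }
    }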

    The dilemma is that we want to split the work into as many tasks as possible, but resource constraints keep us from creating too many threads. So how do we choose the optimal number of threads under high concurrency, and what is the selection principle?

    I. Theoretical analysis

    There are two common ways to estimate the number of concurrent threads.

    The first approach

    For computationally intensive tasks, a system with Ncpu processors usually achieves optimal utilization with a thread pool of Ncpu + 1 threads. (Even a compute-bound thread is occasionally suspended by a page fault or some other cause; having one “extra” thread ensures that CPU cycles are not wasted during such pauses.)

    For tasks that involve I/O or other blocking operations, not every thread is schedulable at all times, so you need a larger pool. To size the thread pool correctly, you must estimate the ratio of the time a task spends waiting to the time it spends computing. The estimate does not have to be exact and can be obtained with monitoring tools. Alternatively, you can tune the pool size by running the application under a baseline load with thread pools of several different sizes and observing the level of CPU utilization.

    Given the following definitions:

    Ncpu = number of CPUs
    Ucpu = target CPU utilization, 0 <= Ucpu <= 1
    W/C = ratio of wait time to compute time

    To keep CPU utilization at the desired level, the optimal pool size is:

    Nthreads = Ncpu x Ucpu x (1 + W/C)

    You can use Runtime to get the number of CPUs:

    int N_CPUS = Runtime.getRuntime().availableProcessors();
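
    Translating the formula directly into code, a minimal sketch might look like this (the target utilization of 0.8 and the W/C ratio of 2.0 are illustrative assumptions; in practice you would measure both):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PoolSizer {
        public static void main(String[] args) {
            int nCpus = Runtime.getRuntime().availableProcessors();
            double targetUtilization = 0.8; // assumed Ucpu, between 0 and 1
            double waitToCompute = 2.0;     // assumed W/C ratio, measured by profiling
            int nThreads = (int) Math.ceil(nCpus * targetUtilization * (1 + waitToCompute));
            System.out.println("Suggested pool size: " + nThreads);
            ExecutorService pool = Executors.newFixedThreadPool(nThreads);
            pool.shutdown();
        }
    }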

    Of course, CPU cycles are not the only resource you can manage with thread pools. Other resources that can constrain pool size include memory, file handles, socket handles, and database connections. Calculating the limit imposed by such a resource is simple: add up how much of the resource each task requires, then divide the total amount available by that figure. The result is an upper bound on the pool size.
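
    A worked example (the figures are illustrative, not from the text): if each task holds one database connection for its whole duration and the connection pool provides 30 connections, then a thread pool larger than 30 threads cannot be used effectively.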

    When tasks use a pooled resource such as database connections, the thread pool size and the resource pool size interact. If each task requires a database connection, the size of the connection pool limits the effective size of the thread pool; likewise, when the tasks in a thread pool are the only consumers of a resource pool, the thread pool size limits the effective size of the resource pool.

    The above is from Java Concurrency in Practice, which gives this formula for estimating thread pool size:

    Nthreads = Ncpu x Ucpu x (1 + W/C), where Ncpu = number of CPU cores, Ucpu = target CPU utilization (between 0 and 1), and W/C = ratio of wait time to compute time.

    The second approach

    From Programming Concurrency on the JVM, Section 2.1, page 12:

    To solve this problem, we want to create at least as many threads as there are processor cores, ensuring that as many cores as possible are put to work on it. We can easily obtain the number of cores available to the system with the following code:

    Runtime.getRuntime().availableProcessors();

    Therefore, the minimum number of threads for an application should equal the number of available processor cores. If all tasks are computationally intensive, as many threads as there are available cores will suffice; creating more actually hurts performance, because when more threads are ready to run than there are cores, the cores must frequently context-switch between threads, and each switch costs performance. But if the tasks are IO-intensive, we need more threads to improve performance.

    When a task performs an IO operation, its thread blocks and the processor immediately context-switches to another ready thread. If we only have as many threads as there are available cores, even tasks that are ready to run cannot proceed, because there are no spare threads for the processor to schedule.

    If tasks are blocked 50% of the time, the program needs twice as many threads as there are available cores. If tasks are blocked less than 50% of the time, i.e. they are computationally intensive, fewer threads are required, but never fewer than the number of cores. If tasks are blocked for longer than they execute, i.e. they are IO-intensive, we need several times as many threads as there are cores. The total number of threads a program needs can be summarized as follows (a code sketch follows the list):

    • Threads = number of available cores / (1 – blocking coefficient), where the blocking coefficient is between 0 and 1.
    • The blocking coefficient of a computationally intensive task is 0, while that of an IO-intensive task approaches 1. A task blocked 100% of the time makes no progress at all, so we need not worry about a blocking coefficient of exactly 1.
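
    A minimal sketch of this rule (the blocking coefficient of 0.9 is an assumed, measured value, not a universal constant):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class BlockingCoefficientSizer {
        public static void main(String[] args) {
            int cores = Runtime.getRuntime().availableProcessors();
            double blockingCoefficient = 0.9; // assumed: tasks block 90% of the time
            int nThreads = (int) (cores / (1 - blockingCoefficient));
            System.out.println("Suggested pool size: " + nThreads); // 20 on a 2-core machine
            ExecutorService pool = Executors.newFixedThreadPool(nThreads);
            pool.shutdown();
        }
    }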

    To determine how many threads a program needs, we must know two key parameters:

    • the number of available processor cores;
    • the blocking coefficient of the tasks.

    The first parameter is easy to determine and can even be looked up at run time with the method shown earlier. Determining the blocking coefficient is harder: we can estimate it, or measure the ratio of time threads spend on IO versus CPU work with a profiling tool or the java.lang.management API. As noted above, Programming Concurrency on the JVM gives this formula for estimating thread pool size:

    Number of threads = Ncpu / (1 – blocking coefficient)

    For approach 1, assume the CPU runs at 100% utilization, i.e. Ucpu = 1; the formula then reduces to Nthreads = Ncpu x (1 + W/C), with CPU utilization dropping out.

    Now set the formula of approach 2 equal to the formula of approach 1: Ncpu / (1 – blocking coefficient) = Ncpu x (1 + W/C). Writing B for the blocking coefficient, 1 / (1 – B) = (C + W) / C, so 1 – B = C / (C + W), and therefore B = W / (W + C). That is, blocking coefficient = blocking time / (blocking time + compute time). This conclusion is confirmed by the follow-up discussion of approach 2, quoted below:

    Because requests to a web service spend most of their time waiting for the server to respond, the blocking coefficient can be quite high, and the program may need several times as many threads as there are cores. Assuming a blocking coefficient of 0.9, i.e. each task is blocked 90% of the time and active only 10%, on a dual-core processor we would need 2 / (1 – 0.9) = 20 threads (using the formula from Section 2.1). With many stocks to process, we could run up to 80 threads on an 8-core processor.

    So approach 1 and approach 2 are in fact the same formula.

    II. Practical application

    So how should we set the number of concurrent threads in practice? Let's start with a problem:

    Suppose a system is required to sustain a TPS (Transactions Per Second, or Tasks Per Second) of at least 20. Assume each transaction is handled by one thread, and that the average processing time per transaction is 4 s. The question then becomes:

    How to size the thread pool so that 20 transactions can be processed in 1s?

    The calculation is simple: each thread has a processing capacity of 0.25 TPS, so to reach 20 TPS we obviously need 20 / 0.25 = 80 threads.
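
    The same arithmetic as a sketch (the target TPS and per-transaction time are the assumed figures from the problem above):

    public class TpsSizer {
        public static void main(String[] args) {
            double targetTps = 20.0;            // required transactions per second
            double secondsPerTransaction = 4.0; // average time one thread spends per transaction
            double tpsPerThread = 1.0 / secondsPerTransaction; // 0.25 TPS per thread
            int nThreads = (int) Math.ceil(targetTps / tpsPerThread);
            System.out.println("Required threads: " + nThreads); // 80
        }
    }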

    This calculation works in theory. In practice, however, the fastest component of a system is the CPU, so it is the CPU that determines the upper limit of system throughput; strengthening the CPU's processing capacity raises that limit. The CPU therefore has to be factored into the calculation.

    The analysis is as follows (taking approach 1 as an example):

    Nthreads = Ncpu x (1 + W/C)

    That is, the higher the proportion of time a thread spends waiting, the more threads are needed; the higher the proportion of time a thread spends on the CPU, the fewer threads are needed. Tasks can accordingly be divided into two types:

    IO-intensive: in general, if there is IO then W/C > 1 (blocking time is usually many times the compute time). But system memory is limited (each thread needs memory space), so the exact thread count has to be determined by testing on the target server (watching CPU utilization, thread count, total elapsed time, and memory consumption). Nthreads = Ncpu x (1 + 1) = 2Ncpu is usually a reasonable setting.

    Computationally intensive: assuming no waiting, W = 0, so W/C = 0 and Nthreads = Ncpu.
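
    A minimal sketch of these two rules of thumb (whether a given workload is IO-bound or CPU-bound is an assumption you must verify by testing):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class TaskTypeSizer {
        static ExecutorService newIoBoundPool() {
            int nCpus = Runtime.getRuntime().availableProcessors();
            return Executors.newFixedThreadPool(2 * nCpus); // Nthreads = 2Ncpu
        }

        static ExecutorService newCpuBoundPool() {
            int nCpus = Runtime.getRuntime().availableProcessors();
            return Executors.newFixedThreadPool(nCpus);     // Nthreads = Ncpu
        }
    }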

    Because of the “short board” (weakest link) effect, real system throughput cannot be computed from the CPU alone. To improve system throughput, we need to start from the system's weak points (such as network latency and IO):

    • Try to raise the degree of parallelism of the slow operations, e.g. multi-threaded download;
    • Strengthen the weak links themselves, e.g. replace blocking IO with NIO.

    The first point relates to Amdahl's law, which defines the speedup obtained by parallelizing a serial system:

    Speedup = system time before optimization / system time after optimization

    The larger the speedup, the better the parallelization has worked. Amdahl's law also relates the speedup to the number of CPUs N and the serial fraction F of the system (the proportion of the code that executes serially):

    Speedup <= 1 / (F + (1 – F) / N)

    When N is large enough, the smaller the serial fraction F, the greater the speedup.
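
    A minimal sketch of Amdahl's law (the 10% serial fraction is an illustrative assumption):

    public class Amdahl {
        // Upper bound on speedup for serial fraction f on n processors.
        static double speedup(double f, int n) {
            return 1.0 / (f + (1.0 - f) / n);
        }

        public static void main(String[] args) {
            double f = 0.1; // assumed: 10% of the work is inherently serial
            System.out.println(speedup(f, 8));    // about 4.7x on 8 CPUs
            System.out.println(speedup(f, 1000)); // approaches 1/F = 10x
        }
    }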

    This raises the question: is a thread pool necessarily more efficient than a single thread?

    The answer is no. Redis, for example, is single-threaded yet very efficient, with basic operations reaching on the order of 100,000 per second. From a threading perspective, this is partly because:

    • multithreading incurs thread context-switching overhead, while a single thread does not;
    • a single thread needs no locks.

    Of course, the more fundamental reason Redis is fast is that:

    Redis works entirely in memory, and for pure in-memory operations a single thread can use the CPU very efficiently. Multithreading generally suits scenarios with a considerable proportion of IO and network operations.

    In general, the choice between multi-threaded and single-threaded strategies varies with the application; in the case of thread pools, the different estimation methods share the same purpose and starting point.

    The conclusion is as follows:

    IO-intensive: 2Ncpu (tune the size through testing; 2Ncpu is generally fine). Typical in threads doing database interaction, file upload and download, network data transfer, and so on.

    Computationally intensive: Ncpu. Typical in threads running complex algorithms.

    Of course, there is also another view:

    For computationally intensive tasks, a system with Ncpu processors usually achieves optimal utilization with a thread pool of Ncpu + 1 threads. (Even a compute-bound thread is occasionally suspended by a page fault or some other cause; having one “extra” thread ensures that CPU cycles are not wasted during such pauses.)

    That is, computationally intensive = Ncpu + 1, though whether the extra context switching this may introduce is worthwhile is not considered here; readers can judge for themselves.
