The main feature of the ForkJoinPool thread pool is fork/join: a large task is forked into several smaller tasks that execute in parallel and are then joined, combined with work stealing to improve overall execution efficiency and make full use of CPU resources.

I. Application scenarios

ForkJoinPool uses a divide-and-conquer approach to handle a large number of tasks with relatively few threads: a big task is split in two, each subtask is split in half again, and so on, until the finest granularity (a configured threshold) is reached and splitting stops; the results are then merged back up layer by layer. A simplified view of the process is shown below:

ForkJoinPool can use a limited number of threads to complete parent-child style tasks such as quicksort, binary search, matrix multiplication, linear-time selection, and operations over arrays and collections.

Here is a simple code example that calculates the sum of all numbers from 1 to 100 million:

package com.javakk;

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

public class ForkJoinPoolTest {

    private static ForkJoinPool forkJoinPool;

    /**
     * The summation task extends RecursiveTask. ForkJoinTask has three
     * standard subclasses:
     * RecursiveTask: has a return value
     * RecursiveAction: has no return value
     * CountedCompleter: completion of subtasks triggers a callback
     */
    private static class SumTask extends RecursiveTask<Long> {
        private long[] numbers;
        private int from;
        private int to;

        public SumTask(long[] numbers, int from, int to) {
            this.numbers = numbers;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            // Threshold reached: the partition is small enough to sum directly
            if (to - from < 10) {
                long total = 0;
                for (int i = from; i <= to; i++) {
                    total += numbers[i];
                }
                return total;
            }
            // Otherwise split the range in half and fork both subtasks
            int middle = (from + to) / 2;
            SumTask taskLeft = new SumTask(numbers, from, middle);
            SumTask taskRight = new SumTask(numbers, middle + 1, to);
            taskLeft.fork();
            taskRight.fork();
            return taskLeft.join() + taskRight.join();
        }
    }

    public static void main(String[] args) {
        // We can use ForkJoinPool.commonPool(), or specify the parallelism in the constructor
        forkJoinPool = new ForkJoinPool();
        long[] numbers = LongStream.rangeClosed(1, 100000000).toArray();
        // submit() would return a Future; invoke() blocks until the task completes
        Long result = forkJoinPool.invoke(new SumTask(numbers, 0, numbers.length - 1));
        forkJoinPool.shutdown();
        System.out.println("result: " + result);
        System.out.println("active threads: " + forkJoinPool.getActiveThreadCount());
        System.out.println("stolen tasks: " + forkJoinPool.getStealCount());
    }
}

Output (the number of active threads and stolen tasks varies depending on the local environment and task execution):

result: 5000000050000000
active threads: 4
stolen tasks: 12

In this example, the splitting threshold is 10 elements: a partition smaller than that is summed directly. If you change the threshold, you will see that execution performance varies considerably.

The internal queues of a ForkJoinPool determine the order in which tasks are executed; why it can complete a very large number of tasks with a limited number of threads is explained later.

II. Differences from the native ThreadPoolExecutor thread pool

ForkJoinPool and ThreadPoolExecutor both implement the Executor and ExecutorService interfaces, and both let you set the number of threads and the thread factory through their constructors. You can read the source of the ForkJoinPool.makeCommonPool() method to see the construction details of the common pool.
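As a rough comparison (a minimal sketch; the parameter values are arbitrary), both constructors expose the thread count and the thread factory:

import java.util.concurrent.*;

public class PoolConstruction {
    public static void main(String[] args) {
        // ForkJoinPool: parallelism, worker thread factory, exception handler, asyncMode
        ForkJoinPool fjp = new ForkJoinPool(
                4, ForkJoinPool.defaultForkJoinWorkerThreadFactory, null, false);

        // ThreadPoolExecutor: core/max pool size, keep-alive, shared work queue, thread factory
        ThreadPoolExecutor tpe = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(), Executors.defaultThreadFactory());

        fjp.shutdown();
        tpe.shutdown();
    }
}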

Internally, I think the biggest difference between the two thread pools lies in the design of the work queue, as shown below:

ThreadPoolExecutor:

ForkJoinPool:

The diagrams gloss over the details, but the difference is clear:

  • In a ForkJoinPool, each thread has its own queue
  • In a ThreadPoolExecutor, all threads share a single queue

As the code example above shows, ForkJoinPool can complete a very large number of parent-child tasks with a limited number of threads, for example more than 20 million tasks with four threads (splitting 100 million elements down to partitions of roughly 10 elements yields about 10 million leaf tasks, and roughly 20 million tasks in total once the intermediate parent tasks are counted). The same is not practical with ThreadPoolExecutor, because its threads have no ability to prioritize subtasks: a parent task blocks its thread while waiting for its children, so completing 20 million parent-child tasks would require on the order of 20 million threads. The ThreadPoolExecutor task queue would fill up, or creating that many threads would exhaust memory.
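A minimal sketch of why this works (the class names here are illustrative, not from the article): a ForkJoinPool worker that calls join() can help execute other queued tasks instead of merely blocking, so even a single worker thread can finish a deeply nested task tree, whereas the same parent-child pattern built on Future.get() against a bounded ThreadPoolExecutor blocks the worker outright and can deadlock once every thread is waiting on a child.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SingleThreadForkJoin {

    // Counts the leaves of a binary task tree of the given depth
    static class CountTask extends RecursiveTask<Integer> {
        private final int depth;
        CountTask(int depth) { this.depth = depth; }

        @Override
        protected Integer compute() {
            if (depth == 0) {
                return 1;
            }
            CountTask left = new CountTask(depth - 1);
            CountTask right = new CountTask(depth - 1);
            left.fork();
            right.fork();
            // join() lets this worker execute other pending tasks while "waiting"
            return left.join() + right.join();
        }
    }

    public static void main(String[] args) {
        // Even with a single worker thread, the whole task tree completes
        ForkJoinPool pool = new ForkJoinPool(1);
        System.out.println("leaf tasks: " + pool.invoke(new CountTask(10))); // 1024
        pool.shutdown();
    }
}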

ForkJoinPool works best for computation-intensive, non-blocking tasks. Different scenarios and considerations for choosing thread pools were discussed in an earlier article on thread pools in the Java Pit Log series.

ForkJoinPool is therefore a complement to the ThreadPoolExecutor thread pool, strengthening support for computation-intensive scenarios.

III. The implementation principle of work stealing

The code example in the first section shows 4 active threads completing roughly 20 million subtasks, with 12 tasks stolen (the steal count depends on how the work splits and on computational complexity). This is the effect of work stealing.

ForkJoinPool's WorkQueue is the queue that implements work stealing. Its Javadoc notes the following:

Most operations occur within work-stealing queues (in the nested class WorkQueue). These are special forms of Deques that support push, pop, and poll (also known as steal) operations.

A Deque is a double-ended queue: elements can be inserted, removed, or retrieved at either the head or the tail, so it supports both FIFO (queue) and LIFO (stack) order.

The most common implementation of the Deque interface is LinkedList; others include ArrayDeque and ConcurrentLinkedDeque.
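As a small illustration (ArrayDeque here only stands in for the idea; it is not ForkJoinPool's internal WorkQueue), a deque supports LIFO access at the head and removal at the tail, exactly the combination work stealing relies on:

import java.util.ArrayDeque;
import java.util.Deque;

public class DequeDemo {
    public static void main(String[] args) {
        Deque<String> deque = new ArrayDeque<>();

        // Owner thread: push/pop at the head -> LIFO (stack) order
        deque.push("task-1");
        deque.push("task-2");
        deque.push("task-3");
        System.out.println(deque.pop());      // task-3 (newest task first)

        // Stealing thread: take from the tail -> the oldest task is stolen
        System.out.println(deque.pollLast()); // task-1
    }
}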

Work stealing proceeds in the following steps:

  1. Each thread has its own double-ended queue (deque)
  2. When the fork method is called, the task is placed at the head of the thread's own queue, and the thread processes the tasks in its queue in LIFO order using push/pop
  3. When its own queue is empty, a thread uses the poll method to steal a task from the tail of a queue maintained by another thread, making full use of CPU resources
  4. Stealing from the tail reduces contention with the queue's owner thread
  5. When only one task is left in a queue, CAS resolves the contention between the owner thread and the stealing thread

The process is as follows:
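Below is a deliberately simplified sketch of the pattern (ConcurrentLinkedDeque and plain threads are stand-ins; the real WorkQueue is an array-based deque that resolves contention with CAS): the owner works from the head of its own queue while an idle thread steals from the tail.

import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicInteger;

public class WorkStealingSketch {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentLinkedDeque<Integer> ownerQueue = new ConcurrentLinkedDeque<>();
        for (int i = 0; i < 100_000; i++) {
            ownerQueue.addFirst(i);                  // fork: push to the head
        }

        AtomicInteger ownerCount = new AtomicInteger();
        AtomicInteger stolenCount = new AtomicInteger();

        Thread owner = new Thread(() -> {
            while (ownerQueue.pollFirst() != null) { // LIFO: pop from the head
                ownerCount.incrementAndGet();
            }
        });
        Thread thief = new Thread(() -> {
            // The thief's own queue is empty, so it steals from the owner's tail
            while (ownerQueue.pollLast() != null) {
                stolenCount.incrementAndGet();
            }
        });

        owner.start();
        thief.start();
        owner.join();
        thief.join();
        System.out.println("owner executed: " + ownerCount.get()
                + ", stolen: " + stolenCount.get());
    }
}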

Work stealing is one of the main advantages of the ForkJoinPool thread pool. In ordinary thread pools such as ThreadPoolExecutor (for example FixedThreadPool or CachedThreadPool), if the task a thread is executing cannot continue for some reason, the thread simply waits.

In a ForkJoinPool, threads actively look for tasks of other threads that have not yet been executed and steal them, reducing the time threads spend waiting.

In JDK 8, parallel streams (parallelStream) are implemented on top of ForkJoinPool, and java.util.concurrent.CompletableFuture, which provides asynchronous callbacks on futures, also uses ForkJoinPool (the common pool) as its default internal thread pool.
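For example (a small demo; the class name is illustrative), both parallel streams and CompletableFuture's async methods run on ForkJoinPool.commonPool() by default when no executor is supplied:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class CommonPoolDemo {
    public static void main(String[] args) {
        // Parallel streams execute their work on ForkJoinPool.commonPool() by default
        long sum = LongStream.rangeClosed(1, 1_000_000).parallel().sum();
        System.out.println("parallel stream sum: " + sum);

        // CompletableFuture async methods without an explicit Executor also use the
        // common ForkJoinPool (when its parallelism is greater than one)
        CompletableFuture<String> future = CompletableFuture.supplyAsync(
                () -> "running in " + Thread.currentThread().getName());
        System.out.println(future.join());

        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
    }
}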

Source: javakk.com/215.html