1 Introduction

Visit Pumpkin Slow Talk at www.pkslow.com for more articles!


I previously wrote an article introducing Spring Batch by example. To use Spring Batch well, though, you also need to understand its parallel processing.

2 Four Modes

In most cases, a single-threaded, single-process Spring Batch job is already enough. Before reaching for more complex features, check whether the simple approach works: Keep It Simple, Stupid.

When you do need parallel processing, however, Spring Batch offers several options. Broadly, they fall into two categories:

  • (1) Single process, multithreading
  • (2) Multi-process

The breakdown is as follows:

  • (1) Multi-threaded Step (single process)
  • (2) Parallel Steps (single process)
  • (3) Remote Chunking (multi-process)
  • (4) Remote Partitioning (multi-process)

It’s hard to understand the differences between them just from their names, so let’s go through them.

2.1 Multi-threaded Step

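A step can be given a TaskExecutor, which it then uses to execute its chunks. Because the TaskExecutor is pluggable, supplying one backed by multiple threads makes the step multi-threaded. Each chunk is then read, processed, and written on its own thread, so the reader and writer must be thread-safe.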

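@Bean
public TaskExecutor taskExecutor() {
  // SimpleAsyncTaskExecutor starts a new thread per task;
  // a ThreadPoolTaskExecutor can be substituted to reuse pooled threads.
  return new SimpleAsyncTaskExecutor("spring_batch");
}

@Bean
public Step sampleStep(TaskExecutor taskExecutor) {
  return this.stepBuilderFactory.get("sampleStep")
    .<String, String>chunk(10)     // commit interval: 10 items per chunk
    .reader(itemReader())
    .writer(itemWriter())
    .taskExecutor(taskExecutor)    // chunks are executed on this executor's threads
    .build();
}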
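To make the behaviour of a multi-threaded step more concrete, here is a minimal plain-Java sketch. It is only an analogy, not Spring Batch API: the class name `MultiThreadedChunkSketch`, the upper-casing "processor", and the pre-sliced chunks are all assumptions made for illustration. It shows chunks being handled by a thread pool and why the shared "writer" has to be thread-safe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Plain-Java analogy of a multi-threaded step (hypothetical, not Spring Batch API). */
final class MultiThreadedChunkSketch {

    /** Splits items into chunks of chunkSize and handles each chunk on a pool thread. */
    static List<String> run(List<String> items, int chunkSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // Thread-safe stand-in for the writer: several chunks write concurrently.
        ConcurrentLinkedQueue<String> written = new ConcurrentLinkedQueue<>();
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (int start = 0; start < items.size(); start += chunkSize) {
                List<String> chunk =
                    items.subList(start, Math.min(start + chunkSize, items.size()));
                futures.add(CompletableFuture.runAsync(() -> {
                    for (String item : chunk) {
                        written.add(item.toUpperCase()); // hypothetical processing step
                    }
                }, pool));
            }
            // Wait for every chunk; join() rethrows any failure unchecked.
            futures.forEach(CompletableFuture::join);
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(written);
    }

    public static void main(String[] args) {
        List<String> out = run(Arrays.asList("a", "b", "c", "d", "e"), 2, 3);
        // All five items are written, but their order depends on thread scheduling.
        System.out.println(out.size() + " items written: " + out);
    }
}
```

Note that the output order is not guaranteed, which is the main trade-off of this mode: it only fits jobs where items can be processed independently of one another.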

2.2 Parallel Steps

This mode sounds similar to the first one, but it is completely different. Here the parallelism is between steps, not inside a step. If a job can be decomposed into independent steps, those steps can run in parallel instead of one after another. In the following example, step1 and step2 form flow1, and step3 forms flow2. flow1 and flow2 run in parallel, and step4 runs after both have finished.

The code is as follows:

@Bean
public Job job() {
  return jobBuilderFactory.get("job")
    .start(splitFlow())
    .next(step4())
    .build()        //builds FlowJobBuilder instance
    .build();       //builds Job instance
}

@Bean
public Flow splitFlow() {
  return new FlowBuilder<SimpleFlow>("splitFlow")
    .split(taskExecutor())     // run the added flows in parallel on this executor
    .add(flow1(), flow2())
    .build();
}

@Bean
public Flow flow1() {
  return new FlowBuilder<SimpleFlow>("flow1")
    .start(step1())
    .next(step2())
    .build();
}

@Bean
public Flow flow2() {
  return new FlowBuilder<SimpleFlow>("flow2")
    .start(step3())
    .build();
}

@Bean
public TaskExecutor taskExecutor() {
  return new SimpleAsyncTaskExecutor("spring_batch");
}
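The ordering guarantees of the split flow above can be illustrated with a small plain-Java sketch. This is an analogy built on `CompletableFuture`, not Spring Batch API; the class name `ParallelFlowsSketch` and the string log are assumptions for illustration. step1 always precedes step2, step3 may interleave with either, and step4 only starts once both flows are done.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Plain-Java analogy of two flows running in parallel (hypothetical, not Spring Batch API). */
final class ParallelFlowsSketch {

    /** Runs flow1 (step1 then step2) and flow2 (step3) concurrently, then step4. */
    static List<String> run() {
        List<String> log = new CopyOnWriteArrayList<>(); // thread-safe execution log
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // flow1: step2 starts only after step1 has completed.
            CompletableFuture<Void> flow1 = CompletableFuture
                .runAsync(() -> log.add("step1"), pool)
                .thenRun(() -> log.add("step2"));
            // flow2: independent of flow1, so it may run at the same time.
            CompletableFuture<Void> flow2 =
                CompletableFuture.runAsync(() -> log.add("step3"), pool);
            // The split completes when both flows have completed.
            CompletableFuture.allOf(flow1, flow2).join();
            log.add("step4");
        } finally {
            pool.shutdown();
        }
        return log;
    }

    public static void main(String[] args) {
        // step3 can appear anywhere before step4; step4 is always last.
        System.out.println(run());
    }
}
```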

2.3 Remote Chunking

In this mode, the processing of a single step is spread across several Java processes. The manager program reads the data and sends it in chunks to the worker programs, and the two sides communicate through middleware such as a message queue. As shown below:

As the figure shows, only the single Manager process reads the data, while there can be many Worker processes doing the processing. This mode therefore suits scenarios where reading is cheap but processing is expensive.
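The division of labour in remote chunking can be sketched in plain Java. In this in-process analogy, threads stand in for worker processes and a `BlockingQueue` stands in for the middleware; the class name `RemoteChunkingSketch`, the squaring "processor", and the stop marker are assumptions for illustration, not Spring Batch API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** In-process analogy of remote chunking (hypothetical, not Spring Batch API). */
final class RemoteChunkingSketch {

    // Stop marker sent to each worker; recognised by identity, not by content.
    private static final List<Integer> STOP = new ArrayList<>();

    static List<Integer> run(List<Integer> source, int chunkSize, int workers) {
        BlockingQueue<List<Integer>> channel = new LinkedBlockingQueue<>(); // "middleware"
        ConcurrentLinkedQueue<Integer> written = new ConcurrentLinkedQueue<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread worker = new Thread(() -> {
                try {
                    // Workers never touch the source; they only receive chunks.
                    for (List<Integer> c = channel.take(); c != STOP; c = channel.take()) {
                        for (int item : c) {
                            written.add(item * item); // hypothetical process + write
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            threads.add(worker);
        }
        try {
            // The manager is the only reader: it cuts the source into chunks and sends them.
            for (int start = 0; start < source.size(); start += chunkSize) {
                int end = Math.min(start + chunkSize, source.size());
                channel.put(new ArrayList<>(source.subList(start, end)));
            }
            for (int i = 0; i < workers; i++) {
                channel.put(STOP); // one stop marker per worker
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(written);
    }

    public static void main(String[] args) {
        // Squares of 1..5 in an order that depends on scheduling.
        System.out.println(run(Arrays.asList(1, 2, 3, 4, 5), 2, 3));
    }
}
```

Because every item passes through the single reader and the channel, this design only pays off when processing an item costs noticeably more than reading and shipping it.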

2.4 Remote Partitioning

Remote chunking is often confused with remote partitioning, but they are not the same thing. In the remote chunking mode described in 2.3, one process reads and multiple processes do the processing. In remote partitioning, multiple processes read, multiple processes process, and multiple processes write.

Remote partitioning is therefore suitable for systems prone to I/O bottlenecks, because it spreads both reading and writing across multiple worker processes. It may or may not use middleware such as a message queue. The Partitioner defines how the data is split into partitions, and the PartitionHandler hands those partitions to the workers for execution.
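The two roles can be sketched in plain Java as follows. This is an in-process analogy in which threads stand in for worker processes; the class `PartitioningSketch`, its `Range` type, and the ID-range splitting strategy are assumptions for illustration, and the methods only mirror the responsibilities of the Partitioner and PartitionHandler rather than their actual interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** In-process analogy of partitioning (hypothetical, not Spring Batch API). */
final class PartitioningSketch {

    /** A partition describes a slice of work (an ID range), not the data itself. */
    static final class Range {
        final int from; // inclusive
        final int to;   // exclusive
        Range(int from, int to) { this.from = from; this.to = to; }
    }

    /** Partitioner role: decide how the ID space [0, total) is split (gridSize > 0). */
    static List<Range> partition(int total, int gridSize) {
        List<Range> ranges = new ArrayList<>();
        int size = (total + gridSize - 1) / gridSize; // ceiling division
        for (int from = 0; from < total; from += size) {
            ranges.add(new Range(from, Math.min(from + size, total)));
        }
        return ranges;
    }

    /** PartitionHandler role: give each partition to a worker and gather the results. */
    static long handle(List<Range> ranges) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, ranges.size()));
        try {
            List<CompletableFuture<Long>> results = new ArrayList<>();
            for (Range r : ranges) {
                results.add(CompletableFuture.supplyAsync(() -> work(r), pool));
            }
            return results.stream().mapToLong(CompletableFuture::join).sum();
        } finally {
            pool.shutdown();
        }
    }

    /** Worker: "reads" and processes only the IDs in its own range. */
    static long work(Range r) {
        long sum = 0;
        for (int id = r.from; id < r.to; id++) {
            sum += id; // hypothetical processing: sum the IDs
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Range> ranges = partition(10, 3); // [0,4), [4,8), [8,10)
        System.out.println(ranges.size() + " partitions, total = " + handle(ranges));
    }
}
```

Unlike the chunking analogy, no data flows through a central reader here: the manager only ships small range descriptions, and each worker does its own I/O.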

3 Summary

This article introduced the four modes of parallel processing in Spring Batch: multi-threaded step, parallel steps, remote chunking, and remote partitioning. The first two are relatively simple and come with code examples. The latter two are much more complex, especially remote partitioning, which can greatly improve the throughput of the whole job by spreading both the I/O and the business-processing load across processes. We will cover remote partitioning in more detail later.

