This blog post is based on Spring Batch’s 3.0.8 release.

Preface

Large enterprise applications almost always have batch tasks to handle, such as sending bulk email notifications to all members whose subscriptions are about to expire. Batch processing involves many details that need attention, such as handling task exceptions and avoiding performance bottlenecks, so it’s better to use a good framework than to reinvent the wheel ourselves.

The IoT cloud platform department I work in has exactly this kind of requirement: delivering commands in batches to millions of devices. To keep things from getting dry, let’s first implement a simple version of this feature with the Spring Batch framework, and then walk through the framework in detail!


A quick taste

Demo code: github.com/wudashan/sp…

Add the dependency

First we need to add a dependency on Spring Batch by putting the following into the pom.xml file:

<dependency>
    <groupId>org.springframework.batch</groupId>
    <artifactId>spring-batch-core</artifactId>
    <version>3.0.8.RELEASE</version>
</dependency>

Load the beans

Second, we need to create an applicationContext.xml file in the resources directory to declare the beans the framework needs:

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">

    <!-- Transaction manager -->
    <bean id="transactionManager" class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/>

    <!-- Job repository -->
    <bean id="jobRepository" class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
        <property name="transactionManager" ref="transactionManager"/>
    </bean>

    <!-- Job launcher -->
    <bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
        <property name="jobRepository" ref="jobRepository"/>
    </bean>

</beans>

With the transactionManager, jobRepository, and jobLauncher declared above, we can now run batch jobs! However, we still need to create a job. In the Spring Batch framework, a Job consists of one or more Steps, and each Step in turn consists of a Reader, a Processor, and a Writer.
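To make the Reader, Processor, and Writer relationship concrete, here is a minimal plain-Java sketch of a single chunk-oriented step. It is a simplified model, not Spring Batch’s actual implementation: the class name ChunkStepSketch and the method runStep are hypothetical, and transactions, restart, and error handling are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Simplified model of one chunk-oriented step (hypothetical names, not framework API).
class ChunkStepSketch {

    /**
     * Reads items one at a time, processes each one, and hands the results
     * to the writer in groups of at most chunkSize items.
     * Returns the number of chunks that were written.
     */
    static <I, O> int runStep(Iterator<I> reader,
                              Function<I, O> processor,
                              Consumer<List<O>> writer,
                              int chunkSize) {
        int chunksWritten = 0;
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == chunkSize) {
                writer.accept(new ArrayList<>(chunk)); // write a full chunk
                chunk.clear();
                chunksWritten++;
            }
        }
        if (!chunk.isEmpty()) {
            writer.accept(new ArrayList<>(chunk)); // write the final partial chunk
            chunksWritten++;
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1,PENDING", "2,PENDING", "3,PENDING");
        List<List<String>> written = new ArrayList<>();

        Function<String, String> markSent = line -> line.replace("PENDING", "SENT");
        Consumer<List<String>> collect = written::add;

        int chunks = runStep(input.iterator(), markSent, collect, 2);
        System.out.println(chunks + " chunks: " + written);
        // prints: 2 chunks: [[1,SENT, 2,SENT], [3,SENT]]
    }
}
```

The point of the sketch is that the writer receives a list per chunk rather than a single item, which matches the shape of the ItemWriter interface described later.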

Create the Reader

Create a batch-data.csv file in the resources directory with the following contents:

1,PENDING
2,PENDING
3,PENDING
4,PENDING
5,PENDING
6,PENDING
7,PENDING
8,PENDING
9,PENDING
10,PENDING

Quite simply, the first column represents the id of the command and the second column represents the current status of the command. That is, there are now 10 cached commands that need to be issued to the device.

Reading requires implementing the ItemReader<T> interface, and the framework provides a ready-made implementation class, FlatFileItemReader. To use this class, you need to set a Resource and a LineMapper. The Resource is the data source, here our batch-data.csv file; the LineMapper defines how each line of the file is converted into the corresponding DTO object.

Create a DTO object

Since our data source is command data, we need to create a DeviceCommand.java file with the following code:

public class DeviceCommand {

    private String id;

    private String status;

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getStatus() {
        return status;
    }

    public void setStatus(String status) {
        this.status = status;
    }

}

Custom LineMapper

We need to write our own LineMapper implementation that converts each line of batch-data.csv into a DeviceCommand object the program can work with.

import org.springframework.batch.item.file.LineMapper;

public class HelloLineMapper implements LineMapper<DeviceCommand> {

    @Override
    public DeviceCommand mapLine(String line, int lineNumber) throws Exception {
        // Split each line on commas
        String[] args = line.split(",");
        // Create the DeviceCommand object
        DeviceCommand deviceCommand = new DeviceCommand();
        // Set the id value on the object
        deviceCommand.setId(args[0]);
        // Set the status value on the object
        deviceCommand.setStatus(args[1]);
        // Return the object
        return deviceCommand;
    }

}

Create a Processor

After reading the data, we need to process it. Since we have just read the pending commands from the file, this is the right moment to deliver each command to its device. Processing requires implementing the ItemProcessor<I, O> interface. We implement it in HelloItemProcessor.java as follows:

import org.springframework.batch.item.ItemProcessor;

public class HelloItemProcessor implements ItemProcessor<DeviceCommand, DeviceCommand> {

    @Override
    public DeviceCommand process(DeviceCommand deviceCommand) throws Exception {
        // Simulate sending the command to a device
        System.out.println("send command to device, id=" + deviceCommand.getId());
        // Update the command status
        deviceCommand.setStatus("SENT");
        // Return the object
        return deviceCommand;
    }

}

Create the Writer

After processing the data, we need to write the updated command status back to the file to record that the command has been issued. As with reading, we need to implement the ItemWriter<T> interface, and the framework provides a ready-made implementation class, FlatFileItemWriter. Using this class requires setting a Resource and a LineAggregator. The Resource is the data source, here our batch-data.csv file; the LineAggregator defines how a DTO object is converted into the string that is saved as one line of the file.

Custom LineAggregator

We need to write our own LineAggregator implementation that converts a DeviceCommand object into a string to be saved in batch-data.csv.

import org.springframework.batch.item.file.transform.LineAggregator;

public class HelloLineAggregator implements LineAggregator<DeviceCommand> {

    @Override
    public String aggregate(DeviceCommand deviceCommand) {
        // Join id and status with a comma, mirroring the input format
        StringBuilder sb = new StringBuilder();
        sb.append(deviceCommand.getId());
        sb.append(",");
        sb.append(deviceCommand.getStatus());
        return sb.toString();
    }

}
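HelloLineMapper and HelloLineAggregator are meant to be inverses of each other: whatever one parses, the other should write back in the same format. The following plain-Java sketch checks that round trip without needing Spring Batch on the classpath. The class LineRoundTripSketch and its methods are hypothetical, and the field-count check is an added assumption that the original mapper does not make.

```java
// Plain-Java check that parsing and formatting use the same "id,status" layout.
class LineRoundTripSketch {

    // Same parsing rule as HelloLineMapper: split on commas, id first, status second.
    static String[] parse(String line) {
        String[] fields = line.split(",");
        if (fields.length != 2) {
            // Added assumption: reject malformed lines instead of failing later.
            throw new IllegalArgumentException("expected 'id,status' but got: " + line);
        }
        return fields;
    }

    // Same formatting rule as HelloLineAggregator: id, a comma, then status.
    static String format(String id, String status) {
        return id + "," + status;
    }

    public static void main(String[] args) {
        String[] fields = parse("1,PENDING");
        System.out.println(format(fields[0], "SENT")); // prints 1,SENT
    }
}
```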

The main program

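Now everything is ready and all we lack is the east wind! Next, we implement the batch command delivery in the main program, Main.java. The code is as follows:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;
import org.springframework.core.io.FileSystemResource;
import org.springframework.transaction.PlatformTransactionManager;

public class Main {

    public static void main(String[] args) throws Exception {

        // Load the application context
        String[] configLocations = {"applicationContext.xml"};
        ApplicationContext applicationContext = new ClassPathXmlApplicationContext(configLocations);

        // Get the job launcher, job repository, and transaction manager
        JobLauncher jobLauncher = applicationContext.getBean(JobLauncher.class);
        JobRepository jobRepository = applicationContext.getBean(JobRepository.class);
        PlatformTransactionManager transactionManager = applicationContext.getBean(PlatformTransactionManager.class);

        // Create the reader
        FlatFileItemReader<DeviceCommand> flatFileItemReader = new FlatFileItemReader<>();
        flatFileItemReader.setResource(new FileSystemResource("src/main/resources/batch-data.csv"));
        flatFileItemReader.setLineMapper(new HelloLineMapper());

        // Create the processor
        HelloItemProcessor helloItemProcessor = new HelloItemProcessor();

        // Create the writer
        FlatFileItemWriter<DeviceCommand> flatFileItemWriter = new FlatFileItemWriter<>();
        flatFileItemWriter.setResource(new FileSystemResource("src/main/resources/batch-data.csv"));
        flatFileItemWriter.setLineAggregator(new HelloLineAggregator());

        // Create the Step
        StepBuilderFactory stepBuilderFactory = new StepBuilderFactory(jobRepository, transactionManager);
        Step step = stepBuilderFactory.get("step")
                .<DeviceCommand, DeviceCommand>chunk(1)
                .reader(flatFileItemReader)       // read operation
                .processor(helloItemProcessor)    // process operation
                .writer(flatFileItemWriter)       // write operation
                .build();

        // Create the Job
        JobBuilderFactory jobBuilderFactory = new JobBuilderFactory(jobRepository);
        Job job = jobBuilderFactory.get("job")
                .start(step)
                .build();

        // Launch the job
        jobLauncher.run(job, new JobParameters());

    }

}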

After executing the main method, the screen will output the following:

send command to device, id=1
send command to device, id=2
send command to device, id=3
send command to device, id=4
send command to device, id=5
send command to device, id=6
send command to device, id=7
send command to device, id=8
send command to device, id=9
send command to device, id=10

If you look at the batch-data.csv file, you will find that the status of every command has been updated to SENT:

1,SENT
2,SENT
3,SENT
4,SENT
5,SENT
6,SENT
7,SENT
8,SENT
9,SENT
10,SENT

At this point, all of our batch commands have been issued successfully! As you can see, implementing batch processing with the Spring Batch framework takes very little code, but this is only the tip of the iceberg.


A formal introduction

On its website, Spring Batch describes itself as: “A lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems.”

The framework provides the following features:

  • Transaction Management
  • Chunk based processing
  • Declarative I/O
  • Start/Stop/Restart
  • Retry/Skip

If your batch application needs any of the features above, go ahead and use the framework.

Framework overview

The framework has four main roles. JobLauncher is the job launcher: it starts jobs and can be seen as the entry point of the program. Job represents a concrete job. Step represents a concrete step, and a Job can contain multiple Steps (just imagine how many steps it takes to put an elephant in the refrigerator). JobRepository is where data is stored; it can be regarded as an interface to a database and records job status and other information while a job runs.

JobLauncher

JobLauncher is the job launcher. The interface has a single method, run:

public interface JobLauncher {

    public JobExecution run(Job job, JobParameters jobParameters) throws JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException, JobParametersInvalidException;

}

In addition to the Job object, you also need to pass in a JobParameters object, which is explained in the Job section below. Through JobLauncher, batch jobs can be invoked from a Java program, from the command line, or from other frameworks such as the scheduling framework Quartz or the web framework Spring MVC. The Spring Batch framework provides one JobLauncher implementation class, SimpleJobLauncher.

Job

A Job is associated with one or more JobInstances, and each JobInstance is in turn associated with one or more JobExecutions:

Jobs are often not run once and never again; more commonly they are scheduled, for example once a day or once a week. The framework therefore uses JobInstance to distinguish each scheduled run. As shown in the figure above, the Job is EndOfDay (a job run at the last moment of every day), and one JobInstance is the EndOfDay job for May 5, 2007. The framework identifies which day’s run it is from the JobParameters passed in when jobLauncher.run(job, jobParameters) is called.

Because the job for May 5, 2007 may not finish in a single run (for example, it may be stopped midway, or an exception may interrupt it so that it has to be run again), the framework uses JobExecution to represent each individual run.
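The relationship between the three concepts can be modeled in a few lines of plain Java. This is a toy illustration only: JobInstanceSketch, recordExecution, and the schedule.date parameter name are hypothetical and are not Spring Batch API. The idea it shows is that the job name plus its identifying parameters select an instance, and every attempt at that instance adds one execution.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the Job / JobInstance / JobExecution relationship (hypothetical names).
class JobInstanceSketch {

    // Key: job name + identifying parameter. Value: status of each execution attempt.
    private final Map<String, List<String>> executionsByInstance = new HashMap<>();

    /** Records one execution attempt and returns how many attempts this instance has had. */
    int recordExecution(String jobName, String scheduleDate, String status) {
        String instanceKey = jobName + "|schedule.date=" + scheduleDate;
        List<String> executions =
                executionsByInstance.computeIfAbsent(instanceKey, k -> new ArrayList<>());
        executions.add(status);
        return executions.size();
    }

    /** Number of distinct instances seen so far. */
    int instanceCount() {
        return executionsByInstance.size();
    }

    public static void main(String[] args) {
        JobInstanceSketch repository = new JobInstanceSketch();
        // The May 5 run fails once and is then re-run: one instance, two executions.
        repository.recordExecution("EndOfDay", "2007-05-05", "FAILED");
        int attempts = repository.recordExecution("EndOfDay", "2007-05-05", "COMPLETED");
        // A different date is a different instance of the same job.
        repository.recordExecution("EndOfDay", "2007-05-06", "COMPLETED");
        System.out.println("instances=" + repository.instanceCount()
                + ", attempts on May 5=" + attempts);
        // prints: instances=2, attempts on May 5=2
    }
}
```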

Step

A Job can be divided into several Steps. Similar to JobExecution, StepExecution is used to represent each execution of a Step. Each Step also contains an ItemReader, an ItemProcessor, and an ItemWriter, which are described below.

ItemReader

ItemReader represents a read operation and has the following interface:

public interface ItemReader<T> {

    T read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException;

}

The framework already provides a variety of ItemReader interface implementation classes, including processing of text files, XML files, databases, JMS messages, etc. We can also implement the interface ourselves.

ItemProcessor

ItemProcessor represents a processing operation and has the following interface:

public interface ItemProcessor<I, O> {

    O process(I item) throws Exception;

}

The process method takes parameters of type I and returns an object of type O after processing. Developers can implement their own business code to process the data.
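The input and output types do not have to match, as they do in HelloItemProcessor. As a hedged illustration, the sketch below declares a local copy of the interface so that it compiles without Spring Batch on the classpath, and implements a hypothetical processor, CommandIdExtractor, that turns a raw CSV line into a numeric command id. In a real project you would implement org.springframework.batch.item.ItemProcessor instead of the local copy.

```java
// Sketch only: uses a local interface that mirrors the shape of ItemProcessor<I, O>.
class ProcessorSketch {

    // Local stand-in with the same single method as the framework interface.
    interface ItemProcessor<I, O> {
        O process(I item) throws Exception;
    }

    // Hypothetical processor: input is a raw "id,status" line, output is the numeric id.
    static class CommandIdExtractor implements ItemProcessor<String, Long> {
        @Override
        public Long process(String line) {
            return Long.parseLong(line.split(",")[0].trim());
        }
    }

    public static void main(String[] args) throws Exception {
        ItemProcessor<String, Long> processor = new CommandIdExtractor();
        System.out.println(processor.process("7,PENDING")); // prints 7
    }
}
```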

ItemWriter

ItemWriter represents a write operation and has the following interface:

public interface ItemWriter<T> {

    void write(List<? extends T> items) throws Exception;

}

The framework already provides a variety of ItemWriter implementation classes, covering text files, XML files, databases, JMS messages, and more. Of course, we can also implement the interface ourselves.

JobRepository

JobRepository stores state information about job execution, such as which job ran at what time and what the result of each run was. The framework provides two implementations. One keeps the data in memory in a Map; when the Java program restarts, the job information is lost, and in a distributed deployment one node cannot see the job executions of other nodes. The other stores the data in a database, in the following six tables:

  • BATCH_JOB_INSTANCE
  • BATCH_JOB_EXECUTION_PARAMS
  • BATCH_JOB_EXECUTION
  • BATCH_STEP_EXECUTION
  • BATCH_JOB_EXECUTION_CONTEXT
  • BATCH_STEP_EXECUTION_CONTEXT

JobRepository of the Spring Batch framework supports mainstream databases: DB2, Derby, H2, HSQLDB, MySQL, Oracle, PostgreSQL, SQLServer, and Sybase. The lovely thing is that our Gauss database is also supported, but it requires a bit of configuration.


Conclusion

This post first showed how to get started quickly through a batch command demo, and then introduced the framework from the whole to its parts, to give you a basic understanding. Given limited space and ability, it cannot cover all the internal implementation details and advanced features of the Spring Batch framework. If you are interested, read the source code or consult the official documentation and the books listed in the “Reference reading” section.

Newton once said that if he had seen further than others, it was by standing on the shoulders of giants. Indeed, learning from a good open-source framework, keeping its strengths and discarding its weaknesses, gets you much further than reinventing the wheel behind closed doors.


Reference reading

[1] Spring Batch – Projects

[2] Spring Batch – Reference Documentation

[3] Spring Batch Batch Framework

[4] Spring Batch Reference Document Chinese version

[5] A comprehensive analysis of Spring Batch, a big data Batch framework