The background of distributed task scheduling

Whether it is Internet application or enterprise application, there are a lot of batch processing tasks. We often need some task scheduling system to help solve problems. With the gradual evolution of microservitization architecture, single architecture gradually evolves into distributed and microservice architecture. In this context, many original task scheduling platforms can no longer meet the needs of business systems, so some distributed task scheduling platforms appear.

1.1 Evolution of Distributed Task scheduling

In the real business development process, there are many times when we inevitably need to use some scheduled tasks to solve problems. Usually we have several solutions: use Crontab or SpringCron (although this may be the case with few machines and simple tasks but not many). However, when the application complexity increases, the number of scheduled tasks increases, and dependencies between tasks occur, the Crontab management and configuration of scheduled tasks becomes chaotic, which seriously affects the working efficiency. This leads to a series of problems:

  • The task management is chaotic and the life cycle cannot be coordinated and managed uniformly;
  • If there are dependencies between tasks, it is difficult to choreograph.

With the development of the Internet, distributed service architecture is becoming more and more popular. Accordingly, a distributed task scheduling system is needed to manage scheduled tasks in distributed architecture.

1.2 Distributed Task scheduling Architecture

With more and more vertical applications, interactions between applications will become more and more complex. Usually, we adopt a distributed or microservice architecture to extract core businesses and form separate services. An independent microservice group gradually forms a stable service center, enabling business applications to respond more quickly to changing market demands.

At this point, a distributed service framework for improving business reuse and integration becomes key. At the same time, because the service is independent, the scheduled task is generally independent, and the change of the task has little impact on the overall system. Usually, tasks and schedules are separated (as shown in the figure above), so that the execution logic of tasks does not need to pay attention to scheduling and scheduling, and at the same time, it can ensure the high availability of actuators and schedules and is easy to develop and maintain.

1.3 Advantages of Distributed Task Scheduling

Based on the distributed service architecture, the number of independent services may be large. If scheduled tasks are implemented separately in the service, it may be difficult to manage, and service restart caused by scheduled task changes cannot be avoided. Therefore, an independent distributed task scheduling system is necessary, which can be used to manage all scheduled tasks globally. At the same time, as the function of the distributed task scheduling system, the change of scheduled task does not affect any business or the whole system:

  • By separating scheduling and task management, the development and maintenance costs are greatly reduced.
  • Distributed deployment ensures high availability, scalability and load balancing of the system and improves fault tolerance.
  • You can deploy and manage scheduled tasks through the console, which is convenient, flexible and efficient.
  • Tasks can be persisted to the database to avoid potential risks caused by downtime and data loss. In addition, there is a complete task failure rework mechanism and detailed task tracking and alarm policies.

2. Selection of distributed task scheduling technology

2.1 Distributed task scheduling considerations

  • Task choreography: There is a process sequence for scheduled tasks between multiple services.
  • Task sharding: For a large task, sharding is required for parallel execution.
  • Cross-platform: In addition to projects using the Java technology stack (SpringBoot, Spring, etc.), there are applications using other languages.
  • Non-intrusive: The business does not want to be highly coupled to scheduling and only focuses on the execution logic of the business.
  • Failover: Compensating for problems encountered during task execution to reduce manual intervention.
  • High availability: The scheduling system must ensure high availability.
  • Real-time monitoring: Displays the task execution status in real time.
  • Visualization: The operation of task scheduling provides a visual page for easy use.
  • Dynamic editing: The task clock parameters of the service may change and deployment downtime is not desired.

2.2 Comparison between SIA-Task and other distributed TASK scheduling technologies

SIA is short for Simple is Awesome, the basic development platform of CreditEase, and siA-Task (micro-service TASK scheduling platform) is one of the important products. Sia-task fits the current micro-service architecture mode. It has the characteristics of cross-platform, choreographer, high availability, non-intrusion, consistency, asynchronous parallel, dynamic expansion, real-time monitoring and so on.

Open source: github.com/siaorg/sia-…

We first compare the mainstream open source distributed task scheduling frameworks in the market, analyze their advantages and disadvantages, and then introduce our technology selection.

  • Quartz: Quartz is an open source project in the area of task scheduling by the Open source organization OpenSymphony, which is implemented entirely in Java. The project was acquired by Terracotta in 2009 and is currently a project of Terracotta. Compared to timed tasks provided by the JDK or Spring, Quartz’s control of individual tasks is basically extreme, and it plays a huge role in enterprise applications due to its power and application flexibility. However, Quartz does not support task orchestration (there are dependencies between tasks), and it does not support task sharding.
  • TBSchedule: TBSchedule is a distributed scheduling framework that allows a batch or constantly changing task to be dynamically distributed across multiple host JVMS and executed in parallel across different thread groups. Pure Java implementation based on ZooKeeper, open source by Alibaba. TBSchedule focuses on task distribution and supports task sharding, but there is no task orchestration and it is not cross-platform.
  • Elastice-job: Elastice-Job is a distributed scheduling solution provided by Dangdang open source. It consists of two independent subprojects, elastice-Job-Lite and Elastice-Job-Cloud. Elastical-job supports Job sharding (Job sharding consistency), but there is no task scheduling and it is not cross-platform.
  • Saturn: Saturn is VipSHOP’s open source distributed, highly available scheduling service. Saturn did secondary development at Elastik-Job, with support for monitoring, task sharding, and cross-platform, but no task choreography.
  • Antares: Antares is a Quartz based distributed scheduling that supports sharding and tree task dependence, but is not cross-platform.
  • Uncode-schedule: Uncode-schedule is a distributed task scheduling component based on Zookeeper. All tasks can be executed in a cluster without duplication or omission. Dynamically add and delete tasks. But there is no task sharding, no task orchestration, and it is not cross-platform.
  • Xxl-job: XXL-Job is a lightweight distributed task scheduling platform. Its core design goal is rapid development, simple learning, lightweight and easy to expand. Xxl-job supports sharding, simple task dependence, and subtask dependence, not cross-platform.

Here’s a quick comparison between SIA-Task and these TASK scheduling frameworks:

Task scheduling Task fragmentation cross-platform High availability failover Real-time monitoring
SIA-TASK Square root Square root Square root Square root Square root Square root
Quartz x x .NET Square root x API monitoring
TBSchedule x Square root x Square root Square root Square root
Elastic-Job x Square root x Square root Square root Square root
Saturn x Square root Square root Square root Square root Square root
Antares Square root Square root x Square root Square root Square root
Uncode-Schedule x x x Square root Square root Square root
XXL-JOB Subtask dependency Square root x Square root Square root Square root

It can be found that these scheduling frameworks basically support functions such as high availability, failover and real-time monitoring, but support functions such as task orchestration, task fragmentation and cross-platform has different emphasis. Sia-task will fully support these capabilities.

Iii. Introduction to SIA-Task

3.1 Selection of SIA-Task technology

  • REST: A software architecture style. The executor is required to expose the Http call interface for cross-platform purposes.
  • AOP: Faceted programming technology. Used in Hunter, the Spring project extension package, to ensure that tasks are called serially (singleton, single thread).
  • Quartz: Powerful, flexible application, the control of a single task is basically done to the extreme, used as a scheduling center clock component.
  • MySQL: Used for metadata storage and (temporary) log access.
  • Elastic: Lucene-based search server that provides a distributed multi-user full-text search engine for log storage and query.
  • SpringCloud: A community active development framework and a unified development framework for companies. For rapid development, rapid iteration.
  • MyBatis: an excellent persistence layer framework that supports customized SQL, stored procedures, and advanced mapping. Used to simplify persistence layer development.
  • Zookeeper: Time-tested registry. Used to solve the scheduling center high availability, distributed consistency and other problems.

3.2 Design idea of SIA-Task

Sia-task uses microservice design ideas to capture TASK metadata distributed on each actuator node, report it, and upload it to the registry. Support online task scheduling and dynamic task clock modification by online editable mode; The Http protocol is used as the interactive transport protocol. The data interaction format is unified using Json. The user performs operations through the choreographer (described below), triggers events, the scheduler receives events, and the scheduling center analyzes the clock, executes the task flow and notifes the task.

3.3 Basic Concepts of SIA-Task

Sia-task separates tasks and scheduling. The TASK execution logic and scheduling logic of a service are completely separated. The system consists of the following core concepts:

  • Task: Basic execution unit, an HTTP call interface exposed by an executor.
  • A Job is composed of one or more tasks that have a logical relationship with each other (serial/parallel). The smallest unit of a Job is assigned by the task scheduling center.
  • Plan: Consists of several sequentially executed jobs, each of which has its own execution cycle. A Plan has no execution cycle.
  • Task Scheduling center (Scheduler) : Schedulers tasks according to the execution cycle of each job, that is, HTTP requests are made according to the logic of schedule, job, and task.
  • Task Orchestration Center (Config) : The orchestration center uses tasks to create plans and jobs.
  • Executer: Receives HTTP requests for the execution of business logic.
  • Hunter: Spring project extension package, which is responsible for capturing tasks in the executor and uploading them to the registry. Services can rely on this component to write tasks.

3.4 SiA-Task system architecture

Sia-task can be divided into three modules (scheduling center, choreographer center, and executor) and two components (persistent storage and registry). The three modules and two components serve as follows:

  • TASK scheduling center: responsible for Job preemption, TASK scheduling, and TASK migration. It is the core function module of SIA-Task.
  • Task Orchestration center: logically orchestrates online tasks and provides log viewing and real-time monitoring functions.
  • Task executor: Responsible for receiving scheduling requests and executing task logic.
  • Task registry (ZK) : Coordinates the workflow of jobs and tasks, schedulers, and so on.
  • Persistent storage (DB) : Records Job and Task data of a project and provides log storage.

Sia-task uses SpringBoot system as architecture selection and secondary development based on Quartz and Zookeeper to support corresponding features and functions. The logical architecture diagram of SIA-Task is as follows:

3.5 SiA-Task Module Description

3.5.1 Task Scheduling Center

The task scheduling center is responsible for task scheduling, managing scheduling information, issuing scheduling requests according to scheduling configuration, and does not undertake business codes. The decoupling of scheduling system and task improves system availability and stability, and the performance of scheduling system is no longer limited by task modules. Supports visual, simple and dynamic management of scheduling information, including task creation, update, deletion, and task alarm. All the above operations take effect in real time. Meanwhile, supports monitoring of scheduling results and execution logs, and supports actuator fault recovery.

3.5.2 Task Scheduling Center

Task scheduling center is a component of distributed scheduling center that supports online task model scheduling. Web side task choreography can be performed based on UI.

We can use the above basic model to orchestrate some complex scheduling models, such as:

UI choreography for SIA-Task:

View the task layout information as shown in the following figure:

In addition, the orchestration center provides functions such as viewing home page statistics, scheduling monitoring, Job management, Task management, and log management.

3.5.3 Task Actuator

Responsible for receiving scheduling requests and executing task logic. Task module focuses on the execution of tasks and other operations, development and maintenance more simple and efficient;

Two types of actuators are supported:

(1) If sia-task-Hunter is used, support SpringBoot project and Spring project, introduce sia-task-Hunter, task (task) fetching client. Compliant HTTP interface tasks (called tasks) are automatically captured and uploaded to the registry;

(2) If siA-task-Hunter is not used, only the HTTP interface that can be invoked by the task is provided. In this case, services need to be manually recorded and concurrent invocation control of the task is controlled.

3.5.4 Task Registry (Zookeeper)

The distributed framework uses Zookeeper as its registry.

(1) Task registration

The scheduling center and execution cluster use Zookeeper as the registry center. All data is registered in the form of nodes and node content, and the host status is periodically reported to Zookeeper.

(2) Metadata storage

The registry not only provides registration services, but also stores information about each executor (including executor instance information, Task metadata uploaded by the executor, and some temporary state data while the Task is running).

(3) Event release

Based on the Zookeeper event push mechanism, the task is published, and the balance algorithm is used to ensure balanced distribution of scheduler task preemption.

(4) Load balancing

Ensure that the scheduler obtains a balanced number of jobs to avoid pressure from a single node.

3.5.5 Persisting Storage (DB)

MySQL is used as the data persistence solution.

Except the dynamic Task metadata stored in the registry, other related metadata is stored in MySQL, including but not limited to manually entered tasks, configured jobs, scheduled Task dependency information, scheduling logs, service personnel operation logs, and Task execution logs.

3.6 SIa-task Key running process

3.6.1 Task Publishing process

(1) You can create a Job on the UI. You can select the Job type, set the alarm email, and set the Job description. Then perform Task scheduling for the created Job.

(2) After the Job is created and the Task scheduling relationship is set, the Task can be published and the corresponding Job can be activated, executed once, stopped and deleted through the UI.

(3) The user’s Task can be captured by the crawler or manually created by using the UI.

3.6.2 Process Execution

(1) After a Job is created, you can activate scheduled tasks.

(2) When a Job reaches the scheduled time, the scheduling center triggers the Job, notifies the Task executor to execute it through HTTP according to the preset Task scheduling logic, and asynchronously monitors the Task execution result.

(3) If the execution result is successful, determine whether there is a post-task. If there is, continue the next scheduling. If there is no, it indicates that the Job is executed and the call ends. If the execution fails, a fault recovery policy is triggered: Stop immediately, ignore the failure, retry multiple times, and switch to another actuator.

3.6.3 Status Flow

During the Job life cycle, there are four states: NULL stopped, READY, RUNNING, and STOP. The following figure shows the state flow and flow conditions.

3.7 SiA-Task module design

The physical network topology of SIA-Task is as follows:

Module interaction design of SIA-Task:

(1) Create Task tasks through the choreography center or automatically capture tasks through Hunter, and asynchronously save Task information to DB; Create and activate a Job, and create JobKey in ZooKeeper.

(2) The scheduling center will monitor the creation event of JobKey in ZooKeeper, and then preempt the created Job. After the preemption succeeds, the Scheduled Quartz task is added, and the Job is triggered when the time reaches. The scheduling center asynchronously invokes the executor service to execute tasks in the Job (there may be multiple tasks, following the Task failure policy) and returns the results to the scheduling center.

(3) The Job execution status can be changed on ZooKeeper at any time, and can be queried through the query interface in the choreographer center.

(4) After the Job is executed, wait for the next Job execution.

3.7.1 Central design of task scheduling

The choreographer center can interact with DB and ZooKeeper. Its main functions can be divided into three aspects:

  • Data Persistence interface service;
  • The metadata on ZooKeeper is changed.
  • Data visualization: view various statistical data of the system.

The monitoring display on the homepage of the choreography center is as follows:

3.7.2 Task Scheduling Center Design

The scheduling center mainly interacts with DB, ZK and actuators. Its main functions can be divided into the following aspects:

  • Job Logs are recorded
  • Description The Job status in ZK changed
  • Call the executor service to execute the Job
  • The scheduling center is highly available
  • Job Schedules the thread pool

3.7.3 Task actuator Design

The actuator can interact with ZK and the scheduling center, and its main functions can be divided into two aspects:

  • Accept the scheduling of the dispatching center, perform scheduled tasks, and return the results to the dispatching center;
  • Automatically captures tasks on actuators and submits them to ZK.

Example execution Task:

@OnlineTask(description = "Sample Online Task",enableSerial=true)
@RequestMapping(value = "/example", method = { RequestMethod.POST }, produces = "application/json; charset=UTF-8")
@CrossOrigin(methods = { RequestMethod.POST }, origins = "*")
@ResponseBody
public String example(@RequestBody String json) {   
    /** * TODO: client business logic processing */
    Map<String, String> info = new HashMap<String, String>();
    info.put("status"."success");
    info.put("result"."as you need");
    return JSONHelper.toString(info);
}
Copy the code

As you can see, Task writing is very simple.

3.8 SiA-Task highly available design

Distributed services generally need to consider high availability solutions. Sia-task also enhances different dimensions for different service components to ensure high availability.

3.8.1 High availability of the Task Orchestration center

Sia-task achieves high availability of the choreography center through the separation of front and back ends and service separation. Failure of an instance in the cluster does not affect other instances in the cluster, so other available orchestration centers in the cluster can be used without special operations.

3.8.2 High Availability of the Task Scheduling center

3.8.2.1 Exception Transfer

If the services of an instance node in the scheduling center cluster are down, all jobs on the instance node are smoothly migrated to available instances in the cluster without missing scheduled task execution. In addition, when the crashed instance is successfully recovered and connected to the cluster again, the instance continues to preempt jobs to provide services.

3.8.2.2 Configuring a thread Pool

Scheduling is implemented in thread pool mode to avoid scheduling delay caused by blocking of single thread. The number of threads in the program pool. The default value is 10. If multiple time-consuming tasks are concurrently executed, the size of the thread pool should be selected based on service characteristics.

org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool org.quartz.threadPool.threadCount = 60
org.quartz.threadPool.threadPriority = 5
org.quartz.threadPool.threadsInheritContextClassLoaderOfInitializingThread = true
Copy the code

Sia-task takes advantage of the threadPool provided by quartz itself. The thread pool is redefined to allocate a unique thread pool for each Job. The size of the thread pool can be dynamically scaled based on the number of tasks choreographed by jobs. This ensures that the threads scheduled for each Job are completely independent and prevent thread resources from being exhausted due to the sudden increase in the number of tasks choreographed. In addition, the thread pool resource recycling logic is provided to reclaim the allocated thread pool resources when the Job terminates permanently.

public static ExecutorService getExecutorService(String JobKey) {

    ExecutorService exec = executorPool.get(JobKey);
     if (exec == null) {
        LOGGER.info(Constants.LOG_PREFIX + "Initialize thread pool for running Jobs,Job is {}",JobKey);
      exec = Executors.newCachedThreadPool();
      executorPool.putIfAbsent(JobKey, exec);
      exec = executorPool.get(JobKey);
  }
    return exec;
}
Copy the code
3.8.2.3 Full Log Tracing

Sia-task comprehensively tracks the whole scheduling life cycle of a Job. AOP is used to enhance the log, and the scheduling center logs every time a Job is triggered. Task logs are also recorded when tasks are executed based on jobs.

Logs are classified into Job logs and Task logs:

  • Job log: Contains scheduler information, scheduling time, scheduling status, and other additional attributes.
  • Task log: Contains executor information, execution time, execution status, returned information, and other additional attributes.
3.8.2.4 Asynchronous Encapsulation
  • Sia-task is designed from the very beginning to take into account the loss of concurrent thread resources in the dispatch center when tasks are called remotely. Remote scheduling of tasks encapsulated in Jobs adopts asynchronous invocation, and the time of each Task request logic is very lightweight. An HTTP request seen only once.
  • User-defined timeout Settings are supported for tasks. Two timeout modes are supported: ConnectTimeout and readTimeout. Users can set the timeout based on the service execution period.
public interface RestTemplate {

/** * Asynchronous Post method *@param request
 * @param responseType
 * @param uriVariables
 * @param <T>
 * @return* /
 <T> ListenableFuture<ResponseEntity<T>> postAsyncForEntity(Request request, Class<T> responseType, Object... uriVariables); }
Copy the code
3.8.2.5 Customizing a Scheduler Resource Pool

Sia-task designs the scheduling resource pool from the perspective of physical resources. We pool the scheduler for the consideration of some special cases. The scheduler can carry on the transformation of state through different operations, thus carrying on the transformation of capability.

  • Work scheduler resource pool: Manages scheduler resources that have the ability to get tasks and can actually get them.
  • Offline scheduler resource pool: Manages scheduler resources that can obtain tasks but are not allowed to obtain them.
  • Offline scheduler resource pool: Manages the offline scheduler resources in the offline scheduler resource pool.

3.8.3 High Availability of a Task actuator

  • Considering the instability of the network, SIA-Task has also made a very important design for the instability of the network. It supports the connectivity testing of nodes and anticipates the health of TASK instance nodes to ensure the health of TASK instance nodes in advance and ensure the high availability of scheduling tasks.

  • In addition, siA-Task redesigns the reconnection mechanism of ZooKeeper to ensure that the TASK instance node can recover and retry after the connection is lost due to network problems. After the TASK recovers, it is scheduled to receive normal tasks in the execution pool.

  • Typically, actuators are also clustered. As the execution unit of a Task, if the execution fails on one machine in the execution cluster, the scheduling center will perform failover based on the failure policy. Two failover strategies are provided: polling failover and maximum compensation failover. If only one of the available actuators succeeds, the Task succeeds. If all the actuators fail, the Task fails. The maximum compensation transfer is that the execution of the actuator is repeated for several times. If the execution succeeds, no transfer will be performed. If the execution still fails, the polling transfer policy will be executed.

Four,

This paper gives a brief introduction to the microservice TASK scheduling platform SIA-Task, including the design background, architecture design and product component functions and features. The micro-service TASK scheduling platform SIA-Task basically solves the current business requirements and provides simple and efficient scheduling services. Sia-task continues to iterate to provide better services. Technical documentation and usage documentation will also be provided later.

Links to guide

Open source: github.com/siaorg/sia-…

Authors: MAO Zhengwei/LI Pengfei/Liang Xin

The SpringCloud community