Summary of Quartz, LTS, Elastic-Job, xxl-job, Saturn, Opencron, and Antares

1. What is a cluster or distributed scheduled task

A scheduled task management mode that consolidates scattered, unreliable scheduled tasks into a unified platform with clustered management, scheduling, and distributed deployment. This is called a distributed scheduled task.

2. Common open source solutions

Elastic-job, xxl-job, Quartz, Saturn, Opencron, Antares, LTS

2.1 elastic-job

Elastic-Job is a distributed scheduling solution developed by Dangdang on top of Quartz. It consists of two relatively independent sub-projects, Elastic-Job-Lite and Elastic-Job-Cloud.

Elastic-Job-Lite is positioned as a lightweight, decentralized solution that provides coordination services for distributed tasks in the form of a JAR package.

Elastic-Job-Cloud uses a Mesos + Docker (TBD) solution to additionally provide resource governance, application distribution, and process isolation:

  • It is based on the Quartz scheduling framework and therefore has most of Quartz's features
  • Uses ZooKeeper for coordination and as the registry for the scheduling center, making it more lightweight
  • Supports sharding of tasks
  • Supports elastic scaling and horizontal expansion: when a task is triggered again, the system checks the current number of servers and re-shards the task; once re-sharding completes, execution continues
  • Failover and fault tolerance: when a scheduling server goes down or disconnects from ZooKeeper, it immediately stops its jobs, and idle scheduling servers are then found to run the remaining tasks
  • Provides an operations (O&M) interface to manage jobs and registries.

Official highlights:

  • Distributed

Replaces Quartz's database-based distribution with a registry implemented on ZooKeeper.

  • Parallel scheduling

Task sharding is adopted: a task is divided into N independent shard items, which are executed in parallel by distributed servers.
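The sharding idea is language-agnostic. As a rough illustration (Python is used here for brevity, although the frameworks themselves are Java, and `assign_shards` is an invented helper, not Elastic-Job's actual strategy class), one simple scheme deals out contiguous blocks of shard items:

```python
def assign_shards(servers, shard_count):
    """Evenly spread shard items 0..shard_count-1 over the servers.

    Hypothetical sketch of the average-allocation idea: each server
    receives a contiguous block of items, and the first
    (shard_count % len(servers)) servers get one extra item.
    """
    base, extra = divmod(shard_count, len(servers))
    assignment, start = {}, 0
    for i, server in enumerate(sorted(servers)):
        size = base + (1 if i < extra else 0)
        assignment[server] = list(range(start, start + size))
        start += size
    return assignment
```

Each server then executes only the shard items assigned to it, so N servers process the task roughly N times faster than one.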

  • Elastic capacity expansion and reduction

After a task is divided into N shard items, each server executes the items assigned to it. If a new server joins the cluster or an existing server goes offline, Elastic-Job triggers re-sharding before the next run starts, without disturbing the run currently in progress.

  • Centralized management

The ZooKeeper-based registry centrally manages and coordinates the status, allocation, and monitoring of distributed jobs. External systems can directly manage and monitor Elastic-Job based on the ZooKeeper data.

  • Diversified job processing modes

Jobs come in two kinds: simple and dataflow. Dataflow jobs are further divided into a high-throughput mode and a sequential mode. High-throughput mode can open enough threads to process data quickly, while sequential mode assigns each shard item to its own thread to guarantee ordering within a shard.
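The two dataflow modes can be sketched generically (again in Python for brevity; `process_ordered` and its arguments are invented for illustration, not Elastic-Job API). Sequential mode dedicates a single-worker executor to each shard item, so records within a shard keep their order while different shards still run in parallel:

```python
from concurrent.futures import ThreadPoolExecutor

def process_ordered(shard_records, handler):
    """Sequential-mode sketch: one single-worker executor per shard item.

    Records belonging to the same shard are handled strictly in order.
    High-throughput mode would instead submit everything to one large
    shared pool, gaining speed but giving up per-shard ordering.
    """
    executors = {shard: ThreadPoolExecutor(max_workers=1)
                 for shard in shard_records}
    futures = [executors[shard].submit(handler, shard, record)
               for shard, records in shard_records.items()
               for record in records]
    for f in futures:
        f.result()                      # propagate handler exceptions
    for ex in executors.values():
        ex.shutdown()
```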

  • Failover

Elastic scaling only re-shards before the next run, so during a run, shard items assigned to servers that have gone offline are not re-assigned. Failover lets an idle server pick up these orphaned shard items within the current run. Failover also comes at some cost to performance.

  • Runtime state collection

Monitors job running status, counts the successfully and unsuccessfully processed data items over the most recent period, and records each job's start time, end time, and next run time.

  • Job stop, resume, and disable

Used to start and stop jobs, and to disable jobs from running (commonly used during rollout).

  • Spring namespace support

Elastic-Job can run independently of Spring, but it also provides a custom namespace for easy integration with Spring.

  • Operational platform

Provides a Web console for managing jobs and registries.

  • Stability

When there is no server churn, no re-sharding occurs; even when servers do come and go, the next sharding result is computed from a stable ordering of servers based on the hash of server IP and job name, keeping changes as small as possible.
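A minimal sketch of that stability property (illustrative Python, not framework code; `stable_server_order` is an invented name): if servers are ordered by a hash of IP plus job name before shard items are dealt out, the ordering, and hence the sharding result, is fully deterministic whenever cluster membership is unchanged.

```python
import hashlib

def stable_server_order(server_ips, job_name):
    """Order servers deterministically by md5(ip + job_name).

    Sharding deals items out along this order, so identical cluster
    membership always yields an identical sharding result, while
    different job names produce different orderings (spreading load).
    """
    return sorted(server_ips,
                  key=lambda ip: hashlib.md5((ip + job_name).encode()).hexdigest())
```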

  • High performance

Batch data processing on the same server uses automatic splitting and multi-threaded parallel processing.

  • Flexibility

All trade-offs between functionality and performance can be turned on or off by configuration. For example, Elastic-Job writes necessary information about a job's running status to the registry. If jobs execute very frequently, this generates a large number of ZooKeeper writes, and ZooKeeper's distributed data synchronization may cause a network storm. Some functionality can therefore be sacrificed in exchange for performance.

  • Idempotence

Elastic-Job can sacrifice some performance to guarantee that the same shard item is never running on two servers at the same time.

  • Fault tolerance

If a job server loses communication with the ZooKeeper cluster, it stops its jobs immediately. This prevents the registry from re-assigning shard items to other servers while the disconnected server is still executing them, which would cause duplicate execution.

Shortcomings:

  • Heterogeneous languages are not supported
  • The current decentralized design makes it difficult to support multiple languages, so a dispatch-center design may need to be considered later.
  • The monitoring system needs improvement. At present, only simple liveness and data-backlog monitoring is possible through the registry. Future monitoring should cover:

More monitoring dimensions, such as job uptime; JMX-based internal state monitoring; and full historical data monitoring, shipping all monitoring data to an external monitoring center (e.g. via Flume) for real-time analysis.

  • Multiple registries are not supported.
  • Need to add task workflow, such as task dependency, initialization task, cleaning task, etc.
  • The real-time performance of failover function needs to be improved.
  • Lack of support for more job types, such as file, MQ, etc.
  • Lack of more sharding policy support.

Elastic-Job combines Quartz's excellent time-scheduling capability with ZooKeeper to implement flexible sharding strategies. It also adds many practical monitoring and management features, and with its active open-source community, complete documentation, and elegant code, it is the recommended distributed task scheduling framework. (For details, see the NetEase Lede technical team's Elastic-Job study notes.)

System architecture

Console interface

2.2 xxl-job

xxl-job is a lightweight distributed task scheduling framework open-sourced by an individual developer. Setting it up really is very simple: it runs after only basic configuration changes. It consists of two parts, the scheduling center and the executors. On startup, the scheduling center initializes and generates RPC proxy objects (implemented over HTTP by default). After an executor project starts, whenever a timer fires, the scheduling center invokes the code in the executor project through its JobHandler. Its core features are similar to Elastic-Job, and its technical documentation is relatively complete.

Advantages:

  • 1. Simple: supports CRUD operations on tasks through web pages; easy to operate, up and running in a minute;
  • 2. Dynamic: supports dynamically modifying task status, pausing/resuming tasks, and terminating running tasks, taking effect immediately;
  • 3. Scheduling center HA (centralized): the scheduling center adopts a centralized design, implemented on clustered Quartz, and supports cluster deployment to ensure its own HA;
  • 4. Executor HA (distributed) : Tasks are executed in distributed mode. Task executors support cluster deployment to ensure task HA.
  • 5. Registry: executors automatically register themselves periodically, and the scheduling center automatically discovers registered executors and triggers task execution; executor addresses can also be entered manually;
  • 6. Elastic scaling: once a new executor goes online or offline, tasks are reassigned at the next scheduling;
  • 7. Routing policies: when executors are deployed as a cluster, rich routing policies are provided: first, last, round robin, random, consistent hash, least frequently used, least recently used, failover, busy-over, etc.;
  • 8. Failover: if the task's routing policy is "failover" and a machine in the executor cluster fails, the system automatically switches to a healthy executor and sends the scheduling request there;
  • 9. Blocking handling strategies: strategies for when scheduling is too dense for the executor to keep up: serial execution on a single machine (default), discard subsequent scheduling, or override earlier scheduling;
  • 10. Task timeout control: supports custom task timeouts; a task that runs past its timeout is actively interrupted;
  • 11. Retry when a task fails: Supports user-defined task retry times. When a task fails, the system automatically retries the task based on the preset retry times.
  • 12. Failure handling strategies: strategies for handling scheduling failures; failure alarm (default) and failure retry are provided;
  • 13. Shard broadcast tasks: when "shard broadcast" is chosen as the routing policy, one scheduling event broadcasts to all executors in the cluster, triggering each to run the task once; shard tasks can be developed using the shard parameters;
  • 14. Dynamic sharding: shard broadcast tasks are sharded by executor, and dynamically expanding the executor cluster dynamically increases the number of shards participating in the work; this can significantly improve throughput and speed for big-data jobs;
  • 15. Event triggering: besides "cron mode" and "task-dependency mode", event-based task triggering is supported; the scheduling center provides an API to trigger a single task execution, which can be invoked flexibly from business events;
  • 16. Task progress monitoring: Support real-time monitoring of task progress;
  • 17. Rolling real-time logs: supports viewing scheduling results online, and rolling real-time viewing of the complete execution log output by the executor;
  • 18. GLUE: provides a Web IDE that supports developing task logic online, with dynamic release and real-time compilation, skipping the deploy-and-release cycle; the last 30 historical versions are kept for rollback;
  • 19. Script tasks: Support the development and running of script tasks in GLUE mode, including Shell, Python, NodeJS and other scripts;
  • 20. Task dependency: Support to configure subtask dependency. When the parent task is completed and successfully executed, it will actively trigger the execution of a subtask, and multiple subtasks are separated by commas.
  • 21. Consistency: The Scheduling center ensures the consistency of distributed scheduling in the cluster through DB locks. A task scheduling will trigger only one execution.
  • 22. User-defined task parameters: support online configuration of scheduling task input parameters, effective immediately;
  • 23, scheduling thread pool: scheduling system multi-thread trigger scheduling operation, to ensure the accurate execution of scheduling, not blocked;
  • 24. Data encryption: communication between the scheduling center and executors is encrypted to improve the security of scheduling information;
  • 25. Email alarm: support email alarm when the task fails, support configuring multiple email addresses to send alarm emails;
  • 26. Push Maven central repository: The latest stable version will be pushed to Maven central repository for convenient user access and use;
  • 27, running report: support real-time view of running data, such as the number of tasks, scheduling times, number of actuators, etc.; And scheduling report, such as scheduling date distribution map, scheduling success distribution map, etc.
  • 28. Fully asynchronous: the lower layers of the system are fully asynchronous, applying peak shaving to dense bursts of scheduling;
  • 29. Internationalization: The scheduling center supports internationalization Settings and provides two optional languages, Chinese and English. The default is Chinese.
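Several of the routing policies in item 7 are classic algorithms. As an illustrative sketch (Python; the class and method names are made up, not xxl-job's API), consistent hashing keeps a given job pinned to the same executor even as executors join and leave:

```python
import bisect
import hashlib

def _hash(s):
    """Map a string onto the hash ring."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class ConsistentHashRouter:
    """Sketch of a 'consistent hash' routing policy.

    Each executor is placed on the ring at several virtual-node
    positions; a job is routed to the first executor at or after the
    job's own hash, so the same job keeps hitting the same executor
    while membership is stable, and only a small fraction of jobs move
    when an executor is added or removed.
    """

    def __init__(self, executors, replicas=100):
        self._ring = sorted((_hash(f"{addr}#{i}"), addr)
                            for addr in executors
                            for i in range(replicas))
        self._keys = [h for h, _ in self._ring]

    def route(self, job_id):
        idx = bisect.bisect(self._keys, _hash(job_id)) % len(self._ring)
        return self._ring[idx][1]
```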

Other notes:

    1. You can write Java code online via GLUE and then add scheduled tasks.
    2. You can customize cron expressions.
    3. You can execute, pause, and edit scheduled tasks, view task logs, and more, directly from the console.

Console interface and system architecture diagram:

2.3 LTS

LTS (light-task-scheduler) is a distributed task scheduling framework that supports real-time, scheduled, and cron tasks. It has good scalability and extensibility, provides Spring support (both XML and annotations), and supplies a business logger. It supports node monitoring, task-execution monitoring, JVM monitoring, and dynamic submission, modification, and stopping of tasks.

Dependent environment:

JDK 1.6+, Maven 3.0.5+, ZooKeeper/Redis, MySQL 5.6+/MongoDB

Management interface, architecture diagram, and flow chart

2.4 quartz

A common Quartz cluster solution stores timer information in a database and uses pessimistic database locks to ensure that, for a given task, only one node is running it.
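The effect of that pessimistic row lock can be sketched in miniature (Python and SQLite stand in for Java and MySQL; the `locks` table and `fired` flag are invented for this sketch, whereas real clustered Quartz issues `SELECT ... FOR UPDATE` against its `QRTZ_LOCKS` table): whichever node takes the lock first fires the trigger, and the others see it has already fired and skip it.

```python
import sqlite3

def run_exclusively(db_path, node_id, work, fired_log):
    """One scheduler node's trigger pass, guarded by a pessimistic lock.

    BEGIN IMMEDIATE takes SQLite's write lock, playing the role of
    Quartz's SELECT ... FOR UPDATE on the QRTZ_LOCKS row. Other nodes
    block on the same statement until the first node commits.
    """
    conn = sqlite3.connect(db_path, timeout=10)
    conn.isolation_level = None          # manage transactions manually
    try:
        conn.execute("BEGIN IMMEDIATE")  # blocks until the lock is free
        fired = conn.execute(
            "SELECT fired FROM locks WHERE name = 'TRIGGER_ACCESS'"
        ).fetchone()[0]
        if not fired:                    # first node in fires the trigger
            work(node_id)
            fired_log.append(node_id)
            conn.execute("UPDATE locks SET fired = 1 WHERE name = 'TRIGGER_ACCESS'")
        conn.execute("COMMIT")
    finally:
        conn.close()
```

This also illustrates the drawback noted below: every node serializes on one lock, so with many nodes and many short tasks the lock contention itself becomes the bottleneck.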

Advantages:

  • Ensures high availability (HA) of nodes: if one node fails, another node can take over

Disadvantages:

  • Only one node runs a given task; the other nodes sit idle, resulting in low performance and wasted resources
  • When handling a large number of short tasks, the nodes frequently compete for database locks; the more nodes there are, the worse this becomes, and performance suffers
  • Quartz's clustering only solves high availability; it does not solve task sharding, so it cannot scale horizontally

Quartz, as the leader in open-source job scheduling, is the first choice for job scheduling. In a clustered environment, Quartz manages tasks through API calls, which avoids the single-node problems above but introduces issues of its own:

  • Problem 1: operating tasks by calling an API is not user-friendly;
  • Problem 2: the QuartzJobBean must be persisted into the underlying data tables, which is quite invasive.
  • Problem 3: the scheduling logic and the QuartzJobBean are coupled in the same project; as the number of scheduled tasks and the amount of task logic grow, the performance of the scheduling system becomes heavily constrained by the business.
  • Problem 4: under the hood, Quartz acquires the DB lock by preemption, and the winning node runs the task, which can produce a very uneven load across nodes. By contrast, xxl-job coordinates task assignment through its executors, making full use of the cluster and balancing load across nodes.

Service architecture

2.5 Saturn :

Saturn is a distributed task scheduling platform open-sourced by Vipshop on GitHub. It was developed on the basis of Dangdang's Elastic-Job 1.0, improving some features and adding new ones.

Operating environment: Linux(Shell jobs only support Linux, Java/MSG jobs support Linux and Windows)

Java 1.7 +

Maven 3.0.4 +

Features:

  • Supports event- and time-based triggering
  • Combines manually specified resource allocation policies with an automatic averaging policy
  • Job development language is not restricted: shell jobs (PHP, Bash, Python, …) and Java jobs are supported
  • Supports localized task scheduling: one machine, one shard
  • Tasks are executed in parallel by shard
  • Framework and business code are isolated, with zero dependencies
  • Shards are scheduled and distributed evenly according to load
  • Visual management
  • Supports second-level task triggering
  • Visual monitoring and alarm
  • Support for containerized (Docker) deployment

Highlights:

  • Supports multi-language development: Python, Go, Shell, Java, PHP.
  • Improved administrative console and statistical analysis of data

Disadvantages:

  • Less documentation, and fewer companies use it

2.6 opencron

A fully functional, truly general-purpose Linux scheduled-task scheduling system that can handle a variety of complex scheduling scenarios. It also integrates Linux real-time monitoring and WebSSH, providing a convenient platform for managing scheduled tasks.

Disadvantages:

  • Only supports killing tasks, running them immediately on demand, and querying task status; the main functions are modifying and querying tasks.
  • Tasks and task shards cannot be added dynamically.

2.7 antares

Advantages:

  • A task is scheduled by only one node in the server cluster; built on mature Quartz
  • In parallel execution, users can pre-shard tasks to improve execution efficiency
  • Failover
  • Elastic scaling: machines can be added dynamically while tasks are running
  • Friendly admin console

Disadvantages:

  • Tasks cannot be added dynamically; they can only be triggered, paused, or deleted from the console
  • There is not much documentation and the open source community is not active enough

The system architecture diagram is as follows

3. Comparison

Several representative open source products are listed here

| Framework | quartz | elastic-job-lite | xxl-job | LTS | Saturn | antares | opencron |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Dependencies | MySQL, JDK 1.6+ | JDK 1.7+, ZooKeeper 3.4.6+, Maven 3.0.4+ | MySQL 5.6+, Servlet 3.1, JDK 1.7+, Maven 3.0+, Tomcat 8+ | JDK 1.6+, Maven 3.0+, ZooKeeper, MySQL 5.6+ | JDK 1.7+, ZooKeeper 3.4.6+, MySQL 5.6+ | JDK 1.7+, Redis, ZooKeeper | JDK 1.7+, Tomcat 8.0+ |
| HA | Multi-node deployment; database locks ensure only one node runs a task | Servers can be added dynamically via ZooKeeper registration and discovery; horizontal scaling supported | Cluster deployment with dynamic server addition; tasks can be manually added, started, or suspended; monitoring provided | Cluster deployment; tasks can be manually added, started, or suspended; monitoring provided | Cluster deployment; tasks can be manually started and suspended; monitoring and alarms | Cluster deployment | Cluster deployment |
| Task sharding | Not supported | Supported | Supported | Supported | Supported | Supported | Not supported |
| Documentation | Complete | Complete | Complete | Complete | Somewhat sparse | Somewhat sparse | Somewhat sparse |
| Management interface | None | Supported | Supported | Supported | Supported | Supported | Supported |
| Ease of use | Simple | Simple | Simple | More complicated | More complicated | Average | Average |
| Advanced features | — | Elastic scaling, multiple job modes, failover, runtime state collection, multi-threaded data processing, idempotence, fault tolerance, Spring namespace support | Elastic scaling, shard broadcast, failover, rolling real-time logs, GLUE (online code editing without release), task progress monitoring, task dependency, data encryption, email alarms, running reports, internationalization | Spring/Spring Boot support, business logger, SPI extension points, failover, node monitoring, diversified task execution results, FailStore fault tolerance, dynamic scaling | Event- and time-based triggering, manual + automatic resource allocation, no language restriction, task sharding, framework/business code separation, zero dependencies, visual management, second-level triggering, monitoring and alarms, containerized (Docker) deployment | Task sharding, failover, elastic scaling | Quartz and crontab expressions; kill tasks, run on demand, query task status |
| Adopting companies | Popular product, widely used by companies without demanding distributed-scheduling needs | Krypton, Dangdang, Gome, Jinpomelo, Lenovo, Vipshop, AsiaInfo, Ping An, Zhubajie | Dianping, Yunmanman, Youxin Used Cars, PPDai | — | Vipshop | — | — |

Reprinted from: wsk1103.github.io/2018/09/10/…