Choosing a distributed timed-task framework

Source: author: mailto:[email protected] http://www.expectfly.com/2017… Technical selection of distributed timing task scheme

Why do we need timed tasks

Let's start with the following business scenarios:

The payment system runs clearing at 1 a.m. every day for the previous day, and on the first day of each month runs clearing for the previous month
E-commerce flash sales start on the hour; for example, the discounted price takes effect at 8 o'clock
In the 12306 ticket purchasing system, an order that has not been paid within 30 minutes is reclaimed
Once a product has shipped successfully, an SMS reminder needs to be sent to the customer

There are many similar business scenarios. How can we solve them?

Timed tasks address the many business scenarios that require us to do something at a particular moment. In general, a system can use messaging to replace some of its timed tasks; the two have much in common and are interchangeable in some scenarios.

For example, in the scenario above where the customer is notified by SMS after a successful shipment, we can publish an MQ message to a queue once the shipment succeeds, then consume that message and send the SMS.

But in some scenarios they are not interchangeable:

A) Time-driven/event-driven: Internal systems can generally be event-driven, but where external systems are involved, only a time-driven approach works. For example, to scrape prices from an external website, crawl it once every hour

B) Batch processing/item-by-item processing: Processing accumulated data in batches is more efficient and, when real-time results are not needed, has the advantage over message-oriented middleware. Some business logic can only be processed in batches, such as a mobile carrier settling phone bills once a month

C) Real-time/non-real-time: Message-oriented middleware can process data in real time, but some cases do not need real-time processing, such as VIP upgrades

D) System internal/system decoupling: Timed task scheduling is generally within the system, while message-oriented middleware can be used between two systems

What frameworks are there for timed tasks

Stand-alone

Timer: a timer class on which a given timed task can be scheduled. TimerTask is the timed-task class and implements the Runnable interface. Drawback: an unchecked exception thrown by a task terminates the timer thread
ScheduledExecutorService: schedules tasks with a relative delay or period. Drawback: it has no notion of an absolute date or time
Spring scheduling: simple configuration and more features. If the system runs on a single machine, Spring's scheduling support is the first choice
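The two drawbacks listed for Timer and ScheduledExecutorService can be seen with nothing but the JDK. The sketch below is only an illustration: the class name `SchedulerPitfalls` and its method names are made up for this example. It computes the relative delay needed to hit an absolute time of day (the workaround for ScheduledExecutorService), and checks how each scheduler behaves after a task throws an unchecked exception.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; not part of any framework discussed in the article.
final class SchedulerPitfalls {

    /** Relative delay from 'now' to the next occurrence of 'target' (later today, or tomorrow). */
    static Duration delayUntilNext(LocalDateTime now, LocalTime target) {
        LocalDateTime next = now.toLocalDate().atTime(target);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    /** Returns true once the Timer rejects new tasks because an earlier task threw. */
    static boolean timerDiesAfterUncheckedException() {
        Timer timer = new Timer("demo-timer", true); // daemon thread, so the JVM can still exit
        timer.schedule(new TimerTask() {
            @Override public void run() { throw new RuntimeException("boom"); }
        }, 0);
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (System.nanoTime() < deadline) {
            try {
                // Keep probing until the timer thread has died and scheduling is refused.
                timer.schedule(new TimerTask() { @Override public void run() { } }, 60_000);
                Thread.sleep(20);
            } catch (IllegalStateException timerIsDead) {
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    /** Returns true if the pool still runs a later task after an earlier task threw. */
    static boolean executorSurvivesUncheckedException() {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        try {
            Runnable failing = () -> { throw new RuntimeException("boom"); };
            pool.schedule(failing, 0, TimeUnit.MILLISECONDS);
            ScheduledFuture<String> later =
                    pool.schedule(() -> "still running", 50, TimeUnit.MILLISECONDS);
            return "still running".equals(later.get(5, TimeUnit.SECONDS));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException | TimeoutException e) {
            return false;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

For the daily 1 a.m. clearing job, `delayUntilNext(LocalDateTime.now(), LocalTime.of(1, 0))` would give the initial delay to pass to `scheduleAtFixedRate`. The calculation uses local date-times and ignores daylight-saving shifts, and the Timer check prints the task's stack trace to stderr when the timer thread dies.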

Distributed

Quartz: Java’s de facto standard for timed tasks. But Quartz focuses on timed tasks, not data, and doesn’t have a process tailored to data processing. Although Quartz can implement high availability of jobs based on the database, it lacks the capability of distributed parallel scheduling
TBSchedule: Alibaba's early open source distributed task scheduling system. The code is somewhat dated and uses Timer instead of a thread pool for task scheduling; Timer is notoriously flawed in how it handles exceptions. In addition, TBSchedule's job types are limited: a job can only follow the fetch-data/process-data pattern. The lack of documentation is also a serious problem
Elastic-job: This is an elastic distributed task scheduling system developed by Dangdang. It uses ZooKeeper to achieve distributed coordination, high availability and sharding of tasks. It is currently in version 2.15 and supports cloud development
Saturn: Vipshop's self-developed distributed scheduled-task platform, built on version 1 of Dangdang's Elastic-Job. It deploys well on Docker containers.
XXL-Job: a distributed task scheduling platform released in 2015 by Xu Xueli, an employee of Dianping. It is a lightweight distributed task scheduling framework whose core design goals are rapid development, ease of learning, light weight and easy extension.

Comparison of distributed task scheduling systems

Elastic-Job (E-job) and XXL-Job (X-job)

Project background and community strength

X-job: developed by Xu Xueli, an employee of Dianping; 3 contributors; 2470 stars, 1015 forks; 6 QQ discussion groups; more than 40 companies registered as users; documentation is complete

E-job: open sourced by Dangdang; 17 contributors; 2524 stars, 1015 forks; 1 QQ discussion group and 1 source-code discussion group; more than 50 companies registered as users; documentation is complete; has a clear development plan

Support for cluster deployment

X-job: The only requirement for deploying the scheduling center as a cluster is that every cluster node uses the same configuration (DB, login account, etc.). The scheduling center distinguishes between clusters by their DB configuration.

Executors also support cluster deployment, which improves the availability of the scheduling system and increases task processing capacity. The only requirement is that the configuration item “xxl.job.admin.addresses” (the scheduling center addresses) is the same on every executor in the cluster; executors register themselves automatically according to this setting.

E-job: Rewrites Quartz's database-based distributed functionality and uses ZooKeeper instead to implement the registry

Job Registry: A global job registry control center based on the implementation of ZooKeeper and its client Curator. Used to register, control, and coordinate distributed job execution.

Preventing duplicate execution of tasks in a multi-node deployment

X-job: Uses Quartz's database-based distributed functionality

E-job: The task is split into N task items, and each server executes the items assigned to it. If a new server joins the cluster, or an existing server goes offline, Elastic-Job triggers a resharding before the next run of the task, without affecting the run currently in progress.

Log traceability

X-job: Yes, there is a log query interface

E-job: Important events of the scheduling process can be handled by event subscription for query, statistics, and monitoring. Elastic-Job currently provides two event subscriptions for logging events based on a relational database.

Monitoring and alarms

X-job: If scheduling fails, a failure alarm is triggered, for example by sending an alarm email.

The alarm email is the address notified when task scheduling fails. Multiple addresses can be configured, separated by commas

E-job: You can implement this yourself via event subscription

Monitoring covers the job's running state, whether the job servers are alive, recent data-processing successes, and recent data-processing failures. For data-flow jobs, the number of recent successes indicates whether job traffic is normal: if it falls below the threshold for normal processing, an alarm can be raised. The number of recent failures indicates the processing result: if it is greater than 0, an alarm can be raised.

Elastic expansion and contraction

X-job: Relies on Quartz's database-based distributed functionality; once the number of servers exceeds a certain level, this puts pressure on the database

E-job: Registration, control and coordination of the services are handled through ZooKeeper (ZK)

Support for parallel scheduling

X-job: Multiple threads in the scheduling system (default 10 threads) trigger the scheduling operation to ensure that the scheduling is executed accurately and does not get blocked.

E-job: Implemented through task sharding: a task is divided into n independent task items, and the distributed servers execute their assigned shard items in parallel.

High availability strategy

X-job: The “Scheduling Center” uses a DB lock to keep distributed scheduling consistent across the cluster, so that each scheduled firing of a task triggers exactly one execution.
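The effect of the DB lock can be imitated in memory to see why only one trigger gets through. The sketch below is not XXL-Job's implementation: a `ConcurrentHashMap` stands in for a database table with a unique key per job firing, and the class name `SingleTriggerDemo`, the job name and the fire time are all invented for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: several scheduler nodes race to trigger the same firing.
final class SingleTriggerDemo {

    // Stand-in for a DB table with a unique key on (jobId, fireTime).
    private final ConcurrentHashMap<String, String> triggerLog = new ConcurrentHashMap<>();

    /** Returns true only for the first node that claims this firing. */
    boolean tryClaim(String jobId, long fireTimeMillis, String nodeId) {
        String key = jobId + "@" + fireTimeMillis;
        return triggerLog.putIfAbsent(key, nodeId) == null;
    }

    /** Lets 'nodes' scheduler nodes race for one firing and counts how many triggered it. */
    static int raceForOneFiring(int nodes) {
        SingleTriggerDemo db = new SingleTriggerDemo();
        AtomicInteger executions = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        try {
            for (int i = 0; i < nodes; i++) {
                String nodeId = "scheduler-" + i;
                pool.execute(() -> {
                    try {
                        start.await(); // release all nodes at the same moment
                        if (db.tryClaim("dailyClearing", 1_700_000_000_000L, nodeId)) {
                            executions.incrementAndGet(); // only the winner triggers the job
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            start.countDown();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return executions.get();
    }
}
```

However many nodes race, `raceForOneFiring` counts a single execution, because the atomic `putIfAbsent` plays the role the database lock plays in the real system.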

E-job: High availability of the Scheduler is achieved by running several Elastic-job-cloud-scheduler instances pointing to the same ZooKeeper cluster. ZooKeeper is used to perform leader election in the event that the current main Elastic-job-cloud-scheduler instance fails. Clusters are formed by at least two scheduler instances, with only one scheduler instance in the cluster serving and the others in a “standby” state. When this instance fails, the cluster elects one of the remaining instances to continue serving.

Failure handling strategy

X-job: The handling policy for a schedule failure. Policies include: failure alert (default), failure retry;
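The difference between the two policies can be sketched in a few lines. This is only an illustration of the idea, not XXL-Job code: the enum values `ALERT` and `RETRY`, the `maxRetries` parameter and the in-memory `alerts` list are assumptions made for the example, and the sketch also assumes that a job which still fails after its retries raises an alert.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical demo of "failure alert" versus "failure retry".
final class FailurePolicyDemo {

    enum FailStrategy { ALERT, RETRY }

    /** Alerts raised so far; a real system would send an email instead. */
    final List<String> alerts = new ArrayList<>();

    /** Runs the job and applies the policy on failure. Returns the number of attempts made. */
    int trigger(String jobName, BooleanSupplier job, FailStrategy strategy, int maxRetries) {
        int attempts = 1;
        boolean ok = runSafely(job);
        while (!ok && strategy == FailStrategy.RETRY && attempts <= maxRetries) {
            attempts++;
            ok = runSafely(job);
        }
        if (!ok) {
            alerts.add("job failed: " + jobName);
        }
        return attempts;
    }

    /** Treats an unchecked exception the same as a reported failure. */
    private static boolean runSafely(BooleanSupplier job) {
        try {
            return job.getAsBoolean();
        } catch (RuntimeException e) {
            return false;
        }
    }
}
```

With `ALERT`, a failing job is attempted once and an alert is recorded; with `RETRY`, it is attempted up to `maxRetries` more times before the alert is raised.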

E-job: Resharding takes place before the next run of the job. During the current run, the shards assigned to a server that has gone offline are not reassigned. With failover enabled, idle servers can pick up those orphaned shards and execute them during the current run. Failover does, however, cost some performance.

Dynamic sharding strategy

X-job: A sharding broadcast task is sharded along the executor dimension. The executor cluster can be expanded dynamically to increase the number of shards in step with business processing, which can significantly improve processing capacity and speed for operations on large volumes of data.

When executors are deployed as a cluster and the task routing policy is “sharding broadcast”, one scheduling run broadcasts to all executors in the cluster, triggers each of them to execute the task once, and passes the sharding parameters along. Sharded tasks can then be written against those parameters.
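How a task might use the sharding parameters can be sketched without any framework API. The example below assumes the parameters are a zero-based shard index and a shard total, and that records are split by taking the record id modulo the total; the names `ShardedHandlerDemo` and `recordsForShard` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: each executor keeps only the records that fall in its shard.
final class ShardedHandlerDemo {

    /** Selects the record ids this executor should process, given its shard parameters. */
    static List<Long> recordsForShard(List<Long> recordIds, int shardIndex, int shardTotal) {
        if (shardTotal <= 0 || shardIndex < 0 || shardIndex >= shardTotal) {
            throw new IllegalArgumentException("invalid shard " + shardIndex + "/" + shardTotal);
        }
        List<Long> mine = new ArrayList<>();
        for (long id : recordIds) {
            // floorMod keeps the result non-negative even for negative ids
            if (Math.floorMod(id, (long) shardTotal) == shardIndex) {
                mine.add(id);
            }
        }
        return mine;
    }
}
```

With ids 1 to 10 and three executors, shard 0 handles 3, 6, 9; shard 1 handles 1, 4, 7, 10; and shard 2 handles 2, 5, 8. Adding a fourth executor changes the shard total and redistributes the records without any change to the task code.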

E-job: Supports multiple sharding strategies, and you can define your own

By default there are three sharding strategies: one based on the average allocation algorithm; one that sorts the server IPs in ascending or descending order depending on whether the hash of the job name is odd or even; and one that rotates the job instance list according to the hash of the job name. Custom sharding strategies are also supported
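To make the idea of average allocation concrete, here is one simple way to spread shard items evenly over servers. It is a sketch under stated assumptions, not Elastic-Job's source code: each server first receives an equal contiguous block of items, and any leftover items are handed out one at a time starting from the first server. The class and method names are invented, and server names are assumed to be unique.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo of an even ("average") allocation of shard items to servers.
final class AverageAllocationDemo {

    /** Maps each server to the shard items it should run, keeping the server order. */
    static Map<String, List<Integer>> allocate(List<String> servers, int shardingTotalCount) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        if (servers.isEmpty()) {
            return result;
        }
        int perServer = shardingTotalCount / servers.size();
        int item = 0;
        // First pass: every server gets an equal contiguous block.
        for (String server : servers) {
            List<Integer> items = new ArrayList<>();
            for (int i = 0; i < perServer; i++) {
                items.add(item++);
            }
            result.put(server, items);
        }
        // Second pass: leftover items go to the first servers, one each.
        for (String server : servers) {
            if (item >= shardingTotalCount) {
                break;
            }
            result.get(server).add(item++);
        }
        return result;
    }
}
```

With servers a, b, c and 8 shard items this yields a=[0, 1, 6], b=[2, 3, 7], c=[4, 5]. Calling it again after a fourth server d joins yields two items per server, which is in effect what a resharding does.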

Elastic-job sharding is implemented through ZooKeeper. Shard allocation is performed by the primary node. The following situations trigger the sharding algorithm on the primary node: a. A new Job instance joins the cluster; b. An existing Job instance goes offline (if the instance that goes offline is the leader node, a new leader is elected first, and the sharding algorithm is then triggered)

Comparison with the Quartz framework

Operating tasks by calling the API is not user-friendly;
The need to persist the business QuartzJobBean into the underlying data table makes the system quite invasive.
The scheduling logic and QuartzJobBean are coupled in the same project, which will lead to a problem. When the number of scheduling tasks increases gradually, and the scheduling task logic becomes heavier and heavier, the performance of the scheduling system will be greatly limited by the business.
Quartz focuses on timed tasks rather than data, and does not have a process tailored to data processing. Although Quartz can implement high availability of jobs based on the database, it lacks the capability of distributed parallel scheduling.

Comprehensive comparison

Summary and Conclusions

In common:

Both e-job and X-job have a wide user base and complete technical documentation, and both can meet the basic functional requirements of timed tasks.

Differences:

X-job focuses on simple business implementation and convenient management, with a low learning cost and a rich set of failure strategies and routing strategies. It is recommended when the user base is relatively small and the number of servers stays within a certain range.

E-job focuses on data and adds the ideas of elastic scaling and data sharding, so that the resources of distributed servers are used more fully. The learning cost is relatively high, however, and it is recommended when the amount of data is large and many servers are deployed.

Appendix: other approaches to timed tasks

There are several ways to have the system confirm receipt automatically when the goods have not been confirmed as received more than 10 days after shipment:

Every day at midnight, screen for the orders that become eligible for automatic confirmation the next day; then, during that day, run the confirmation every 10 minutes over just those orders. This is not too expensive, and the timing is fairly accurate

If the auto-confirmed state only needs to be shown to the customer, it can simply be computed the next time the user comes online.
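Both approaches come down to comparing the shipping time plus 10 days against a clock. The sketch below shows the two checks side by side; the class `LazyReceiptDemo`, the `Status` enum and the method names are hypothetical, and a real system would read the stored status and shipping time from its order table.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical demo of the nightly-screening check and the compute-on-read check.
final class LazyReceiptDemo {

    enum Status { SHIPPED, RECEIVED }

    /** Receipt is confirmed automatically this long after shipment (10 days, as in the text). */
    static final Duration AUTO_CONFIRM_AFTER = Duration.ofDays(10);

    /** Nightly screening: does this order become due within [windowStart, windowEnd)? */
    static boolean becomesDueWithin(Instant shippedAt, Instant windowStart, Instant windowEnd) {
        Instant due = shippedAt.plus(AUTO_CONFIRM_AFTER);
        return !due.isBefore(windowStart) && due.isBefore(windowEnd);
    }

    /** Compute-on-read: the status to show the customer, without any background job. */
    static Status effectiveStatus(Status stored, Instant shippedAt, Instant now) {
        boolean due = !now.isBefore(shippedAt.plus(AUTO_CONFIRM_AFTER));
        return (stored == Status.SHIPPED && due) ? Status.RECEIVED : stored;
    }
}
```

The compute-on-read variant needs no scheduler at all, but the stored status stays `SHIPPED` until something writes it back, so it only fits the case where the state is purely for display.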

Delayed and timed message delivery

ActiveMQ provides a broker-side message scheduling mechanism. It fits cases where you do not want the broker to deliver a message immediately: for example, you want the message to reach the consumer 60 seconds later, or you want it delivered repeatedly at a fixed interval for a specified number of times

RabbitMQ can set a time-to-live on a queue (the x-message-ttl argument) or on an individual message to control the lifetime of a message. If a message times out, it becomes a dead letter. With a dead-letter exchange (DLX), a message that becomes a dead letter in one queue can be republished to another exchange, where it can then be consumed again.
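The idea behind delayed delivery can be tried out in a single process with the JDK's `DelayQueue`, where a message only becomes visible to the consumer once its delay has expired. This is a stand-in for illustration, not the ActiveMQ or RabbitMQ API; the class names and message bodies are invented, and the delays are shortened to milliseconds so the example runs quickly.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical in-process demo of delayed message delivery.
final class DelayedDeliveryDemo {

    static final class DelayedMessage implements Delayed {
        final String body;
        private final long deliverAtNanos;

        DelayedMessage(String body, long delay, TimeUnit unit) {
            this.body = body;
            this.deliverAtNanos = System.nanoTime() + unit.toNanos(delay);
        }

        /** Remaining delay; zero or negative means the message may be consumed. */
        @Override public long getDelay(TimeUnit unit) {
            return unit.convert(deliverAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        /** Messages that are due sooner sort first. */
        @Override public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    /** Publishes two messages out of order and returns their bodies in delivery order. */
    static String deliveryOrder() {
        DelayQueue<DelayedMessage> queue = new DelayQueue<>();
        queue.put(new DelayedMessage("cancel order 1002", 200, TimeUnit.MILLISECONDS));
        queue.put(new DelayedMessage("cancel order 1001", 50, TimeUnit.MILLISECONDS));
        try {
            String first = queue.take().body;  // blocks until the earliest delay expires
            String second = queue.take().body;
            return first + " -> " + second;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "";
        }
    }

    /** Returns true if a message with a long delay is not handed out immediately. */
    static boolean nothingDeliveredEarly() {
        DelayQueue<DelayedMessage> queue = new DelayQueue<>();
        queue.put(new DelayedMessage("too early", 1, TimeUnit.MINUTES));
        return queue.poll() == null; // poll() only returns messages whose delay has expired
    }
}
```

For the unpaid-order scenario at the start of the article, the message would be published with a 30-minute delay when the order is created, and the consumer would reclaim the order if it is still unpaid when the message arrives. Unlike a broker, an in-memory queue loses its messages on restart.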
