Why a distributed scheduled task?

I am 3y, a markdown programmer who has stretched one year of CRUD experience over ten years 👨🏻‍💻, known all year round as a top-tier interview-cramming player

Introducing a distributed scheduled task framework has been planned for a long time. It has been on the Austin roadmap since about a year ago, but the code was never written and this article was put off until now. Today I would like to talk about scheduled tasks.

After reading this article you will understand what scheduled tasks are and why the Austin project introduced a distributed scheduled task framework. You can download the code to see how I use XXL-JOB.

01. How to simply implement timing?

I started learning Java by watching videos. Back when I was learning the basic Java API, the videos also covered timing, which the JDK supports natively. I remember the lecturer using Timer to explain scheduled tasks.

At the time I did not know what scheduled tasks were actually for, so as a beginner I never used Timer to implement timing.

Later, I learned concurrency. The lecturer mentioned ScheduledExecutorService, which is more powerful than Timer and can also be used to implement timing with the JDK alone.

Its strength is that ScheduledExecutorService is backed by a thread pool, whereas Timer runs every task on a single thread, so ScheduledExecutorService can use resources more sensibly.
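To make the difference concrete, here is a minimal sketch of both JDK approaches. The class name JdkTimingDemo, the method names, and the periods are my own illustrative choices, not code from the Austin project.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: compares Timer with ScheduledExecutorService.
class JdkTimingDemo {

    // Waits up to 5 seconds for the latch; returns true if it reached zero.
    private static boolean await(CountDownLatch latch) {
        try {
            return latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Timer: every task shares one background thread, so a slow task delays the rest.
    static boolean runWithTimer(int times, long periodMillis) {
        CountDownLatch latch = new CountDownLatch(times);
        Timer timer = new Timer("demo-timer", true); // true = daemon thread
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                latch.countDown();
            }
        }, 0, periodMillis);
        boolean done = await(latch);
        timer.cancel();
        return done;
    }

    // ScheduledExecutorService: tasks are spread over a pool of worker threads.
    static boolean runWithPool(int times, long periodMillis, int poolSize) {
        CountDownLatch latch = new CountDownLatch(times);
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize);
        pool.scheduleAtFixedRate(latch::countDown, 0, periodMillis, TimeUnit.MILLISECONDS);
        boolean done = await(latch);
        pool.shutdownNow();
        return done;
    }

    public static void main(String[] args) {
        System.out.println("Timer ran 3 times: " + runWithTimer(3, 100));
        System.out.println("Pool ran 3 times: " + runWithPool(3, 100, 2));
    }
}
```

Both versions run the same periodic task. The only structural difference is who owns the threads, which is exactly where ScheduledExecutorService has the advantage.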

When I was studying concurrency I didn’t pay much attention to it (it wasn’t the focus of the course), so I never used ScheduledExecutorService for timing either.

Later, when I was learning to build projects, there was a course on Quartz. I remember it took me a long time to understand, and it finally dawned on me that all this code had been written just to implement timing.

Its most obvious advantage over ScheduledExecutorService and Timer is that Quartz supports cron expressions.

Why did it take me so long to understand? The Quartz API is complex, with its own terminology and concepts. But since it was a follow-along project, I just followed the code step by step.

I have forgotten most of the Quartz API, but I did understand one thing at the time: we can write code that relies on a “component package” to do what we want, and so this is what cron expressions are.

When I was a junior in college, I wanted to write a small project to consolidate what I had learned. Then I remembered Quartz.

By then I had learned Spring and Spring Boot. So when I searched online for how to integrate Quartz with Spring, I came across Spring Task and then found the @Scheduled annotation.

With one simple annotation you can implement scheduled tasks, and cron expressions are supported too.

So who needs Quartz anymore!

02. Scheduled tasks during my internship and at work

When I started working, I learned a new term: distributed scheduled task framework. It wasn’t until I entered the workplace that I realized how useful scheduled tasks are!

Here are the most common ways I use scheduled tasks at work:

1. Dynamically creating scheduled tasks that push operational messages (pushing messages on a schedule)

2. Scanning the advertising settlement task table to find records that are ready to be settled (scanning a table on a schedule to update status)

3. Updating data records every day (updating data on a schedule)

A lot of people ask me whether I’ve ever used distributed transactions, and I tend to say, “No, we just scan the table to make sure the data is consistent.” Of course, if you’re in an interview, you can talk about distributed transactions. So how do you actually scan the table? With a scheduled scan.

In addition, I briefly looked at the distributed scheduled task framework developed in-house at my company. I remember it was built on top of Quartz and added failover, sharding, and other mechanisms.

In general, scheduled tasks are set up when the application starts or are configured in advance on a web page (scheduled task frameworks support cron expressions, so the tasks are periodic or timed). This is the most common scenario.

03. Why distributed scheduled tasks

The Timer, ScheduledExecutorService, and Spring Task (@Scheduled) approaches mentioned above are all single-machine solutions, but in production, applications are usually deployed as a cluster.

In a cluster, we usually expect a scheduled task to run on only one machine. In that case, scheduled tasks implemented for a single machine are hard to work with.

Quartz does have a cluster deployment solution, and some people use database row locks or Redis distributed locks to make sure their own scheduled tasks run on only one application machine. It can certainly be done; some of the better-known distributed scheduled task frameworks solve the problem the same way.
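The lock idea can be sketched in a few lines. This is only a simulation under stated assumptions: a ConcurrentHashMap stands in for the shared store (a database row or a Redis key), threads stand in for application machines, and the class and task names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation: several "instances" race to run the same trigger.
class RunOnceAcrossInstances {

    // Assumption: stands in for shared storage such as a database row or a Redis key.
    private final ConcurrentHashMap<String, String> lockStore = new ConcurrentHashMap<>();
    private final AtomicInteger executions = new AtomicInteger();

    // Returns true only for the instance that wins the lock for this fire time.
    boolean tryRun(String instanceId, String taskName, long fireTime) {
        String lockKey = taskName + "@" + fireTime;
        // putIfAbsent is atomic: exactly one caller sees null and becomes the owner.
        if (lockStore.putIfAbsent(lockKey, instanceId) != null) {
            return false;
        }
        executions.incrementAndGet(); // the real task body would run here
        return true;
    }

    // Starts the given number of competing instances and reports how many ran the task.
    static int simulate(int instances) {
        RunOnceAcrossInstances cluster = new RunOnceAcrossInstances();
        Thread[] threads = new Thread[instances];
        for (int i = 0; i < instances; i++) {
            String id = "instance-" + i;
            threads[i] = new Thread(() -> cluster.tryRun(id, "dailySettlement", 1000L));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return cluster.executions.get();
    }

    public static void main(String[] args) {
        System.out.println("Executions across 5 instances: " + simulate(5));
    }
}
```

A real deployment would also need lock expiry and cleanup, which this sketch leaves out.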

But that is not the only problem we run into. We also want fault tolerance (retry on failure), sharding, manually triggering a task, a decent admin console for managing scheduled tasks, routing and load balancing, and so on. These are the features that a “distributed scheduled task framework” provides.

Since so many wheels already exist, we as users have no need to reinvent our own. We can simply use an existing one and learn from its design ideas.

04. Distributed scheduled task foundation

Quartz is an excellent open source component. It abstracts scheduled tasks into three roles: scheduler, executor, and task. Its influence is such that the distributed scheduled task frameworks on the market all have similar roles.

For us as users, the usual approach is to import a client package and then write our own scheduled task logic according to the framework’s rules (perhaps marking it with an annotation, or implementing an interface).

With the role abstraction and typical usage shown in the execution diagram above, this process should be easy to understand. We can think about two more questions:

1. Task information and scheduling information have to be stored somewhere. Where? And when the scheduler needs to “inform” an executor to run a task, how does it do so?

2. How does the scheduler find the tasks that need to be executed?

On the first question, distributed scheduled task frameworks can be divided into two schools: centralized and decentralized.

“Centralized” means that the scheduler and the executor are separated, and the scheduler uniformly schedules tasks and informs the executors to run them.
“Decentralized” means that the scheduler and the executor are coupled, and each node schedules and executes on its own.

In the “centralized” school, the relevant information is most likely stored in a database, and the client package we import is really just executor-related code. The scheduler implements the scheduling logic and triggers the corresponding logic by calling the executor remotely.

When the scheduler “informs” the executor to run a task, it can either make an RPC call or write the task information to a message queue for the executor to consume.
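The message-queue variant of this “inform” step can be sketched as follows. It is an in-process illustration only: a BlockingQueue stands in for a real message queue, and the class names and the handler name pushMessage are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the centralized school: scheduler and executor are separate roles.
class CentralizedDispatchSketch {

    // Executor side: the "client package" where users register their task logic by name.
    static class TaskExecutor {
        private final Map<String, Runnable> handlers = new ConcurrentHashMap<>();

        void register(String handlerName, Runnable handler) {
            handlers.put(handlerName, handler);
        }

        // Consumes one trigger message; returns true if a matching handler ran.
        boolean consumeOne(BlockingQueue<String> triggers) {
            try {
                String handlerName = triggers.poll(1, TimeUnit.SECONDS);
                Runnable handler = handlerName == null ? null : handlers.get(handlerName);
                if (handler == null) {
                    return false;
                }
                handler.run();
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    // Scheduler side: decides when a task is due and only sends a message.
    static class Scheduler {
        private final BlockingQueue<String> triggers;

        Scheduler(BlockingQueue<String> triggers) {
            this.triggers = triggers;
        }

        void trigger(String handlerName) {
            triggers.offer(handlerName);
        }
    }

    // Wires both roles together and returns what the executor logged.
    static String demo() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(); // stand-in for a message queue
        StringBuilder log = new StringBuilder();
        TaskExecutor executor = new TaskExecutor();
        executor.register("pushMessage", () -> log.append("pushMessage executed"));
        new Scheduler(queue).trigger("pushMessage");
        executor.consumeOne(queue);
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The scheduler never touches the task logic. It only knows the handler name, which is why the client package can contain executor code alone.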

In the “decentralized” school, the relevant information is most likely stored in a registry (ZooKeeper), and the client package we import is essentially both executor and scheduler code.

The decentralized school relies on the registry to allocate tasks. When scheduling, the “centralized” school has to ensure that a task is consumed by only one machine, which requires distributed-lock logic in the code, whereas the “decentralized” school, by relying on the registry, avoids this step.

As for the second question, how does the scheduler find the tasks that need to be executed? The newer distributed scheduled task frameworks generally use a “time wheel”.

1. If we want to find the tasks that are due to run, we might put them in a List and check each one. The query then takes O(n) time.

2. With a slight improvement, we might put the tasks in a min-heap ordered by time, where adding, deleting, and updating take O(log n) and querying takes O(1).

3. Improving further, we put the tasks in a ring array, where adding, deleting, updating, and querying all take O(1). However, the size of the ring array limits how far ahead we can store tasks, and tasks beyond its range have to be stored in another array structure.

4. As a final improvement, we can use multiple layers of ring arrays with a different precision at each layer, which greatly improves the precision we can achieve.
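The ring-array idea from step 3 can be sketched as a single-layer time wheel. Note one substitution: instead of an overflow structure or multiple layers, this sketch gives each task a “remaining rounds” counter so that delays longer than the ring still fit. The class name, method names, and task names are hypothetical, and the wheel is advanced by hand rather than by a clock.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical single-layer time wheel; one slot per tick.
class SimpleTimeWheel {

    private static final class Entry {
        final String name;
        int remainingRounds; // full turns of the wheel left before the task is due

        Entry(String name, int remainingRounds) {
            this.name = name;
            this.remainingRounds = remainingRounds;
        }
    }

    private final List<List<Entry>> slots = new ArrayList<>();
    private int current = 0; // slot index of the current tick

    SimpleTimeWheel(int size) {
        for (int i = 0; i < size; i++) {
            slots.add(new ArrayList<>());
        }
    }

    // Adds a task that becomes due after delayTicks ticks; O(1) regardless of task count.
    void add(String name, int delayTicks) {
        if (delayTicks < 1) {
            throw new IllegalArgumentException("delayTicks must be >= 1");
        }
        int size = slots.size();
        int slot = (current + delayTicks) % size;
        int rounds = (delayTicks - 1) / size;
        slots.get(slot).add(new Entry(name, rounds));
    }

    // Advances the wheel by one tick and returns the tasks due at that tick.
    List<String> tick() {
        current = (current + 1) % slots.size();
        List<String> due = new ArrayList<>();
        Iterator<Entry> it = slots.get(current).iterator();
        while (it.hasNext()) {
            Entry entry = it.next();
            if (entry.remainingRounds == 0) {
                due.add(entry.name);
                it.remove();
            } else {
                entry.remainingRounds--;
            }
        }
        return due;
    }

    // Helper for the demo: ticks until the named task is due, or returns -1.
    static int ticksUntilFired(SimpleTimeWheel wheel, String name, int maxTicks) {
        for (int t = 1; t <= maxTicks; t++) {
            if (wheel.tick().contains(name)) {
                return t;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        SimpleTimeWheel wheel = new SimpleTimeWheel(8);
        wheel.add("push-template-1", 3);
        wheel.add("settlement-scan", 10); // longer than the ring, so it waits one extra round
        for (int t = 1; t <= 10; t++) {
            List<String> due = wheel.tick();
            if (!due.isEmpty()) {
                System.out.println("tick " + t + ": " + due);
            }
        }
    }
}
```

Each tick only inspects one slot, so finding due tasks does not require scanning every task as the List approach does.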

05. Choosing a distributed scheduled task framework

There are plenty of distributed scheduled task frameworks to choose from. The well-known ones today include XXL-JOB, Elastic-Job, LTS, SchedulerX, Saturn, PowerJob, and so on. Companies with the capacity may extend Quartz and build an in-house distributed scheduled task framework.

This is not my area of expertise, so for my Austin project the technology selection comes down to two main things (the same reasons I chose Apollo as the distributed configuration center): whether the framework is mature and stable, and whether its community is active.

This time I chose XXL-JOB as Austin’s distributed task scheduling framework. XXL-JOB is already running in production at many companies. However, its latest version was released in February 2021, and there has been no major update for nearly a year.

06. Why does Austin need a distributed scheduled task framework

Back to the Austin architecture: I have built the austin-admin page, which provides management of “message templates”.

Messages are sent not only by the technical side calling the interface, but also by the operations side scheduling a push.

To support this feature I need a distributed scheduled task framework as middleware, and it is essential that the framework supports creating scheduled tasks dynamically.

When “Start” is clicked on the page, a scheduled task has to be created. When “Pause” is clicked, the scheduled task has to be stopped. When a template is deleted, any scheduled task it has must be deleted along with it. When “Edit” is clicked and the changes are saved, the scheduled task also has to be stopped.
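The four page actions map onto a small lifecycle, sketched here with a plain ScheduledExecutorService as an in-process stand-in. This is not how Austin talks to XXL-JOB: the class TemplateTaskRegistry and its methods are hypothetical, a fixed period replaces the cron expression, and pause and delete collapse into the same cancel operation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical registry: one scheduled task per message template id.
class TemplateTaskRegistry {

    // Daemon thread so an idle scheduler never keeps the JVM alive.
    private final ScheduledExecutorService pool =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "template-task");
                t.setDaemon(true);
                return t;
            });

    private final Map<Long, ScheduledFuture<?>> running = new ConcurrentHashMap<>();

    // "Start": create the scheduled task for a template.
    void start(long templateId, Runnable push, long periodSeconds) {
        stop(templateId); // never keep two tasks for the same template
        running.put(templateId,
                pool.scheduleAtFixedRate(push, periodSeconds, periodSeconds, TimeUnit.SECONDS));
    }

    // "Pause", and also "Edit" followed by save: stop the scheduled task.
    void stop(long templateId) {
        ScheduledFuture<?> future = running.remove(templateId);
        if (future != null) {
            future.cancel(false);
        }
    }

    // "Delete": remove the task together with the template.
    void delete(long templateId) {
        stop(templateId);
    }

    boolean isScheduled(long templateId) {
        return running.containsKey(templateId);
    }

    void shutdown() {
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        TemplateTaskRegistry registry = new TemplateTaskRegistry();
        registry.start(1L, () -> System.out.println("push template 1"), 3600);
        System.out.println("after start: " + registry.isScheduled(1L));
        registry.stop(1L);
        System.out.println("after pause: " + registry.isScheduled(1L));
        registry.shutdown();
    }
}
```

With a real framework, each of these methods would become a call to the scheduler instead of a local cancel.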

Well, that’s all that needs to be done.

07. Connect Austin to XXL-JOB

The steps for integrating the XXL-JOB distributed scheduled task framework are quite simple (see the official documentation), so I will only outline them. For the specific code you can pull Austin down and take a look; here I will focus on what I learned during the integration.

Official website: www.xuxueli.com/xxl-job/#%E…

1. Add the Maven dependency for xxl-job-core to your own project

2. In MySQL, run the SQL script /xxl-job/doc/db/tables_xxl_job

3. Download the XXL-JOB source code from Gitee or GitHub, modify the database configuration of xxl-job-admin, and start the xxl-job-admin project

4. Add the XXL-JOB configuration to your own project

5. Annotate a method with @XxlJob and write the scheduled task logic in it

XXL-JOB is a distributed scheduled task framework of the “centralized” school: the scheduler and the executor are separated.

I mentioned earlier that Austin needs to add, delete, and modify scheduled tasks dynamically. XXL-JOB supports this, but I don’t think it is packaged well enough: all you get is an HTTP interface on the scheduler. Calling the HTTP interface is relatively cumbersome, and many of the related JavaBeans are not defined in the core package, so I had to write them again myself.

So it took me a long time and a lot of code to implement dynamically adding, deleting, and modifying scheduled tasks.

The scheduler and the executor are deployed separately, which means the network between them must be reachable. Originally I had no local environment installed at all, not even MySQL, and connected to everything on a cloud server. Now, to debug in an environment where the two can reach each other, I have to start the xxl-job-admin scheduling center locally.

When the executor starts, it opens a new port for the xxl-job-admin scheduling center to call, rather than reusing the default Spring Boot port. Doesn’t that feel a bit odd?

08. Summary

This article covered what scheduled tasks are, why we use them, what you can use in the Java world when you need scheduled tasks, the fundamentals of distributed scheduled tasks, and how to integrate XXL-JOB.

I believe you now have a basic understanding of distributed scheduled task frameworks. If you are interested, pick an open source framework and study it. If you want to see the integration code, pull down my Austin project and have a look.

The main code is in the xxl package of austin-cron, and the code that uses the distributed scheduled tasks is in MessageTemplateController in austin-web, coupled with the create, update, and delete logic for templates.

The next post will cover how I designed the flow that runs once a scheduled task is triggered: fetching the audience file and then calling the message-sending interface to push the messages.

Since you’ve read this far, a “like” isn’t too much to ask, is it? I’m 3y. See you next time.

Follow my WeChat official account [Java3y]. Besides technology I also chat about everyday things there, and some things can only be said quietly ~ [Sparring with the interviewer + writing a Java project from scratch] is being updated continuously and intensively! Stars welcome!! Original content takes effort!! Likes, favorites, and shares appreciated!!

Austin project source code Gitee link: gitee.com/austin

Austin project source code on GitHub: github.com/austin
