This article is intended for those with basic Java knowledge

Author: HelloGitHub – Salieri

This is part of HelloGitHub's series that introduces open source projects.


High availability is hardly a novel term these days, and most of us already know how to achieve it: combine multi-instance deployment with service registration and service discovery, and a highly available setup is not far away. So when people read on PowerJob's introduction page that every component supports cluster deployment for high availability, many assume it follows the same recipe. Then they look at the list of system dependencies and find… emmm… Where is ZooKeeper? Not there. Nacos, then? emmm… Can't find it either. Not only is there no registry of any kind, the documentation clearly states that the minimum dependency is just a relational database. Readers who get this far are usually a little puzzled, and two questions typically come up.

First, why not use a registry?

To achieve high availability in a distributed environment, you inevitably need something like service registration and service discovery. With no external registry, PowerJob essentially has to implement a similar mechanism on its own. So why do it this way?

The answer is simple: cost. Cost here means the user's cost of adoption. For a heavyweight open source project that has to be deployed and operated, every external dependency you remove is one more potential user. Every additional system dependency means an additional technology stack and additional maintenance burden. If an enterprise does not already run that technology (say, it does not use ZooKeeper) and PowerJob depended heavily on ZooKeeper, the likely outcome would simply be goodbye.

With the first question answered, let's move on to the second: without a registry, how is high availability achieved at all?

Simple "high availability"

The basic components of the PowerJob system are the scheduling server and the worker. The server is responsible for scheduling timed tasks and dispatching them to workers for execution: a typical C/S architecture.

Under a C/S architecture, if the goal is merely the kind of "high availability" where servers and clients can keep talking to each other, it is actually very easy to achieve.

First, deploy multiple server instances as a cluster. Then list all of the servers' IP addresses in the worker's configuration file. When a worker starts, it connects to a randomly chosen IP and simply retries another one on failure. Once it successfully connects to a server, it starts reporting its own address information, and with that information the server can talk back to the worker. Just like that, the simplest version of a "high availability" cluster is up and running. But… does it really work?
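To make this naive scheme concrete, here is a minimal worker-side sketch; the addresses and the `tryConnect` method are hypothetical placeholders for illustration, not PowerJob's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class NaiveWorkerBootstrap {

    // All server addresses copied straight from the worker's configuration file.
    private static final List<String> SERVER_ADDRESSES =
            Arrays.asList("192.168.1.10:7700", "192.168.1.11:7700", "192.168.1.12:7700");

    public static void main(String[] args) {
        // Shuffle so each worker starts from a random server.
        List<String> candidates = new ArrayList<>(SERVER_ADDRESSES);
        Collections.shuffle(candidates);

        for (String address : candidates) {
            if (tryConnect(address)) {
                // Once connected, the worker keeps reporting its own address to this
                // server so that the server can talk back to it.
                System.out.println("connected to " + address + ", start reporting worker info");
                return;
            }
        }
        throw new IllegalStateException("no server reachable");
    }

    // Hypothetical connection attempt; a real worker would open a network channel here.
    private static boolean tryConnect(String address) {
        return Math.random() > 0.5; // placeholder for a real connectivity check
    }
}
```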

The answer is obviously no (otherwise this article would not exist). The scheme above has two major problems:

  1. Task scheduling must be unique, that is, a task may be scheduled by only one machine at a time, otherwise it will be executed repeatedly. In the scheme above every server is completely equivalent, so uniqueness can only be guaranteed by a distributed lock: the server that grabs the lock does the scheduling, while the other servers can only stand on the sidelines as spectators. Under this scheme, no matter how many servers are deployed, the system's overall scheduling throughput is fixed; multi-instance deployment buys high availability, but not high performance (a minimal sketch of this lock-based approach follows the list).
  2. The server cannot hold complete information about a worker cluster. PowerJob is task scheduling middleware designed to provide precise scheduling and distributed computing capabilities to every department and line of business in an enterprise, so there has to be a notion of cluster grouping. Just as RocketMQ has ProducerGroup and ConsumerGroup, PowerJob has the concept of AppName: an AppName logically corresponds to the tasks of one application and physically corresponds to the cluster that application is deployed on. To allow unified management by the server and to support extra features such as distributed computing, the server genuinely needs to hold the complete cluster information for a given AppName, and the "blind cat stumbles on a dead mouse" scheme above obviously cannot do that.
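
As a rough illustration of point 1, a minimal sketch of the lock-based scheduling described there is shown below; `DistributedLock` is a hypothetical abstraction (for example a row lock in the database), and this is the rejected alternative rather than PowerJob's design.

```java
// Sketch of the "every server is equal, only the lock holder schedules" approach.
public class LockBasedScheduler {

    // Hypothetical distributed lock, e.g. backed by a unique row in the database.
    interface DistributedLock {
        boolean tryAcquire(String key);
        void release(String key);
    }

    private final DistributedLock lock;

    public LockBasedScheduler(DistributedLock lock) {
        this.lock = lock;
    }

    public void scheduleLoop() throws InterruptedException {
        while (true) {
            if (lock.tryAcquire("schedule-lock")) {
                try {
                    // Only the server holding the lock scans and dispatches due tasks,
                    // so overall scheduling throughput is capped at a single machine.
                    scanAndDispatchDueTasks();
                } finally {
                    lock.release("schedule-lock");
                }
            }
            // Servers that failed to get the lock are pure standbys: they wait and retry.
            Thread.sleep(5_000);
        }
    }

    private void scanAndDispatchDueTasks() {
        // query the tasks whose next trigger time has passed and push them to workers
    }
}
```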

With these two points in mind, PowerJob set out on the journey of exploring a more reasonable and more capable high availability architecture.

Grouping isolation

In fact, given the problems above, the outline of the mechanism has almost emerged by itself.

Since the server needs to hold the complete cluster information for a group, it is natural to ask whether all the workers in a group can be made to connect to one and the same server. Once every machine in a group connects to a single server, they form a small subsystem. The overall PowerJob deployment may contain multiple servers and multiple worker clusters, but as far as this one group is concerned, its own worker cluster plus the server they are all connected to is enough. And within this small "subsystem" there is only one server, so there is no duplicate scheduling problem (each server schedules only the tasks under the AppNames bound to it).
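A minimal sketch of that idea, assuming a hypothetical schema with `app_info` (which records the server currently bound to each group) and `job_info` tables; it only illustrates the control flow of "each server schedules only its own groups", not PowerJob's real implementation.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class GroupAwareScheduler {

    private final Connection connection;
    private final String selfAddress; // this server instance's own address

    public GroupAwareScheduler(Connection connection, String selfAddress) {
        this.connection = connection;
        this.selfAddress = selfAddress;
    }

    // Schedule only the jobs of the groups (appNames) currently bound to this server,
    // so two servers never schedule the same job even without a global lock.
    public void scheduleOwnedGroups() throws SQLException {
        String sql = "SELECT j.id FROM job_info j "
                   + "JOIN app_info a ON j.app_id = a.id "
                   + "WHERE a.current_server = ? AND j.next_trigger_time <= ?";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setString(1, selfAddress);
            ps.setLong(2, System.currentTimeMillis());
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    dispatch(rs.getLong("id"));
                }
            }
        }
    }

    private void dispatch(long jobId) {
        // push the job to a worker in the corresponding group
    }
}
```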

So, after peeling away layer after layer, the question becomes: how do you get all the machines in a group to connect to the same server?

I am sure that, reading this question, many of you thought exactly what I thought at the time: that's it?

“It would be far too easy to get all the machines to connect to the same server. You only need to configure a single IP.”

“If only one IP is configured, where does the high availability come from? And how would the resources of multiple servers ever be used?”

“🤔 That seems to make some sense. Then use hash(appName) mod the number of servers as the index, so that all machines in the same group get the same initial IP while different groups can still connect to different servers.”

“Well, what if the server the group is connected to dies?”

“That's easy. Borrow the open addressing technique used to resolve hash conflicts: start from the failed server's index and keep probing the next slots until one works.”

“🤔 Seems perfectly reasonable. Hmm, that settles worker-side server selection, the scheme is done. Time for a round of League of Legends!”
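
For the record, this is roughly what the (ultimately rejected) worker-side selection from the dialogue above would look like; a minimal sketch assuming the worker holds the full server list from its configuration, with the fatal flaw noted in the comments.

```java
import java.util.List;

public class HashBasedServerSelector {

    // The rejected scheme: pick the initial server by hash(appName) mod the server count,
    // and on failure probe the next indices, like open addressing in a hash table.
    public static String selectServer(String appName, List<String> servers) {
        int start = Math.floorMod(appName.hashCode(), servers.size());
        for (int i = 0; i < servers.size(); i++) {
            String candidate = servers.get((start + i) % servers.size());
            if (isReachable(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no server reachable");
    }

    // The fatal flaw: "reachable" is judged from the worker's point of view, so a worker
    // with a flaky network may wrongly fail over and split its group across two servers.
    private static boolean isReachable(String address) {
        return true; // placeholder for a real connectivity check
    }
}
```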

Just as I was in the thick of a bloody teamfight, about to claim the enemy carry's head, the picture froze forever at the moment before first blood. The words "attempting to reconnect" filled my eyes and sent me into deep contemplation.

Every time the game does this I curse Tencent's potato servers, but cursing aside, deep down I know that in most cases the disconnect is caused by fluctuations in my own network (that is what I get for cheaping out on China Mobile broadband; forget Yang Yongxin and his electroshock "cure", if you really want to quit gaming, Mobile broadband will do the trick).

Huh? My own fault? Network fluctuation? Disconnect? Reconnect? This string of words dragged me back to the scheme I had just designed and hit me over the head. Once a worker decides, because of its own network fluctuation, that the server is unavailable and reconnects to another one, the constraint that all workers connect to the same server is broken… So this scheme was, by nature, unworkable and full of holes.

The week that followed was a rough one. To solve this problem I designed countless monstrous schemes, then rejected and shot them down one by one.

Looking back now, the whole episode seems rather funny, and I was being rather dense. Every one of those schemes failed for the same reason: the starting point was wrong. I kept trying to let the worker decide which server to connect to, while repeatedly ignoring the fact that a worker can never know a server's true liveness (a heartbeat that fails to get through may simply mean the worker's own network is broken). Therefore, the worker should not decide which server to connect to; the server should decide. All the worker needs to do is service discovery. Once that point is understood, the concrete scheme falls into place.

PS: This scheme probably cost me a pound of brain cells (I have to admit it is a decent weight-loss method)… Those brain cells did not die in vain. None of those bizarre schemes made it into the official version, but without them there would have been no way to reach the final answer. To commemorate and mourn them, I named the final design V4: For Whom the Bell Tolls.

V4: For Whom the Bell Tolls

Once you understand that the worker cannot be the one to trigger a server re-election, the problem is basically solved… For reasons of space, and because community members have already published source-code analyses of this part (linked at the end of this article), I will not reinvent that wheel; what follows is mainly the design idea.

As mentioned above, the worker cannot decide which server to connect to because it cannot know the server's exact state. So all the worker needs to do is service discovery: it periodically sends an HTTP request to any server it can reach, asking which server currently serves its group (appName).
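On the worker side this boils down to a periodic task like the minimal sketch below, typically run on a ScheduledExecutorService every few seconds; the endpoint path and response format are assumptions for illustration, not PowerJob's exact API.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;

public class ServerDiscoveryTask implements Runnable {

    private final HttpClient client = HttpClient.newHttpClient();
    private final List<String> configuredServers; // all server addresses from the worker config
    private final String appName;
    private volatile String currentServer;        // the server this group is currently bound to

    public ServerDiscoveryTask(List<String> configuredServers, String appName) {
        this.configuredServers = configuredServers;
        this.appName = appName;
    }

    @Override
    public void run() {
        // Ask ANY reachable server which server is currently responsible for our appName.
        for (String address : configuredServers) {
            try {
                HttpRequest request = HttpRequest.newBuilder()
                        .uri(URI.create("http://" + address + "/server/acquire?appName=" + appName))
                        .timeout(Duration.ofSeconds(2))
                        .GET()
                        .build();
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() == 200) {
                    currentServer = response.body().trim(); // elected server for this group
                    return;
                }
            } catch (Exception ignore) {
                // this server is unreachable from our side; simply try the next one
            }
        }
    }
}
```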

When a server receives a worker's service discovery request, it actually performs a small distributed leader election. The database that the server depends on contains a server_info table, which records the server responsible for each group (appName). If the server finds a record for the group, it means some other worker in that cluster has already triggered an election; it then only needs to send a PING request to check whether the recorded server is alive. If that server is alive, its information is returned as the group's server. Otherwise the current server "usurps the throne": it writes its own information into the table and becomes the server for that group.
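The server-side election can be sketched as follows; `ServerInfoRepository` and the `ping` call are hypothetical stand-ins for the real server_info table access and the inter-server PING request, shown only to capture the control flow.

```java
import java.util.Optional;

public class ServerElectionService {

    // Hypothetical wrapper around the server_info table in the relational database.
    interface ServerInfoRepository {
        Optional<String> findServerByAppName(String appName);
        void saveServerForAppName(String appName, String serverAddress);
    }

    private final ServerInfoRepository repository;
    private final String selfAddress; // this server's own address

    public ServerElectionService(ServerInfoRepository repository, String selfAddress) {
        this.repository = repository;
        this.selfAddress = selfAddress;
    }

    // Handle a worker's service discovery request: return the server responsible for appName.
    public String acquireServer(String appName) {
        Optional<String> recorded = repository.findServerByAppName(appName);

        // A worker has already triggered an election before: just check the recorded server.
        if (recorded.isPresent() && ping(recorded.get())) {
            return recorded.get();
        }

        // No record, or the recorded server looks dead from here: "usurp the throne"
        // by writing ourselves into the table as the server for this group.
        repository.saveServerForAppName(appName, selfAddress);
        return selfAddress;
    }

    private boolean ping(String serverAddress) {
        return true; // placeholder for a real inter-server PING request
    }
}
```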

Careful readers may ask again: you send a PING request to check whether the recorded server is alive, but a failed request can be caused by problems on either the sender's or the receiver's side, so why conclude that the original server is down?

Indeed, under this scheme there is still no way to definitively settle the "real or fake Monkey King" question of whether the server is truly down. But does it matter? Our goal is that all the workers in a group connect to the same server. Even if an accidental usurpation happens, the service discovery mechanism will eventually pull the whole cluster over to the same server, which satisfies our requirement perfectly.

All in all it took six days, from initially doubting my life choices to landing a scheme I am happy with. Quite a winding road ~

Finally

To wrap up, here are two source-code analysis articles contributed by community members. I have reviewed them personally and there are no quality issues (saying that myself does feel a little smug, haha), so read with confidence ~

  • PowerJob source code analysis – group isolation design

  • PowerJob source code interpretation 1: Server and Worker communication interpretation

That's it for this article. I believe that, between this article and the previous one, you now have a reasonable understanding of PowerJob's scheduling layer and its high availability, high performance architecture. Up next is the preview of the following article ~

To preserve a sense of mystery, I have chosen not to announce the topic this time.

Surprises await; see you next time!

Project Address:

Github.com/KFCFans/Pow…


Follow the official account to join the discussion group (the author is in the Java group).