Background

Recently, a business requirement came up. The background: in a pet-raising game, pets level up by being fed, and after each feeding the pet needs 4 hours to finish eating. A new requirement introduces item cards to enrich the gameplay. There are two kinds of item cards: an acceleration card and an automatic feeding card. The acceleration card shortens the eating time by up to two hours, and the automatic feeding card makes the system feed the pet again automatically once it has finished the current food.

The auto-feeding in this requirement is a typical delayed task. A delayed task is one that needs to be triggered automatically at a specified time in the future. Similar scenarios include:

  • Pushing a message to users 2 hours before an activity ends;
  • Pushing a message to users 2 hours before their coupons expire;
  • In a flash sale, automatically cancelling an order if payment is not made within 10 minutes of placing it.

Industry Solutions

Table scanning

For delayed tasks, a common scheme is to scan the table: a background process scans the entire database table at fixed intervals and checks whether each task has reached its trigger condition; if it has, the corresponding business logic is executed. Scanning the full table puts heavy pressure on the database, so the scan is usually run against a slave database. The biggest advantages of table scanning are that it is simple to implement and that the data itself lives in the DB, so there is no need to worry about task data being lost; failed tasks are simply picked up again in the next scan. However, table scanning has the following problems:

  • Scanning a full table takes time, which delays task triggering; sometimes one process has to scan several tables.
  • The table cannot be scanned too frequently, because frequent scans put too much pressure on the database; scans usually run at intervals of at least one minute, which also delays tasks;
  • The scan runs against the slave library, and master/slave replication has lag; when large transactions occur, the lag can reach minutes or even hours.
  • Table scanning is also wasteful: each scan reads the whole table even though only a few tasks may need to be triggered, so resource utilization is very low.

The biggest problem with table scanning is the delay: tasks cannot be triggered at the exact specified time, so the scheme cannot satisfy scenarios with strict timeliness requirements.
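For concreteness, the scanning loop is roughly the following shape (a minimal sketch using SQLite for illustration; the table name, columns, scan interval and handle() function are assumptions):

    import sqlite3
    import time

    # In production this would point at a read-only slave database.
    conn = sqlite3.connect("tasks.db")
    conn.execute("CREATE TABLE IF NOT EXISTS delay_task ("
                 "id INTEGER PRIMARY KEY, payload TEXT, "
                 "trigger_at INTEGER, status TEXT)")

    def handle(task_id, payload):
        print("executing task", task_id, payload)   # stand-in for the business logic

    def scan_once():
        now = int(time.time())
        # Find every pending task whose trigger time has already passed.
        rows = conn.execute(
            "SELECT id, payload FROM delay_task "
            "WHERE status = 'pending' AND trigger_at <= ?", (now,)).fetchall()
        for task_id, payload in rows:
            handle(task_id, payload)
            conn.execute("UPDATE delay_task SET status = 'done' WHERE id = ?",
                         (task_id,))
        conn.commit()

    # while True:
    #     scan_once()
    #     time.sleep(60)   # the scan interval itself is already a source of delay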

Delayed message queue

Currently, some MQ middleware supports delayed messages out of the box, RocketMQ for example. When a message is sent, you can specify how long to wait before it is delivered to consumers. This solution also has a low development cost, but it requires middleware that supports delayed messages. One bottleneck is that a delayed task cannot be updated once it has been created, because the message has already been sent and cannot be retrieved.

Time wheel polling

A time wheel is a ring queue in which every slot holds a linked list of tasks. A pointer marks the current slot; each time the timer ticks, the pointer advances to the next slot and the tasks stored in that slot's list are processed. With only this structure, supporting second-level granularity over a whole day would require a huge ring. The improvement is to record, for each task, a round count: the task's delay divided by the ring length. When the pointer visits a slot, every task's round count is decremented; a task is executed only when its count reaches 0, otherwise it waits for another lap.
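A minimal sketch of such a wheel with round counts (the ring size, the one-second tick and the task representation are illustrative assumptions):

    WHEEL_SIZE = 60          # 60 slots, one tick per second (assumption)

    class TimeWheel:
        def __init__(self):
            self.slots = [[] for _ in range(WHEEL_SIZE)]
            self.current = 0

        def add(self, delay_seconds, callback):
            ticks = max(int(delay_seconds), 1)
            # Which slot the task lands in, and how many full laps ("rounds")
            # of the wheel must still pass before it is due.
            slot = (self.current + ticks) % WHEEL_SIZE
            rounds = (ticks - 1) // WHEEL_SIZE
            self.slots[slot].append([rounds, callback])

        def tick(self):
            # Advance the pointer one slot and process that slot's list.
            self.current = (self.current + 1) % WHEEL_SIZE
            remaining = []
            for entry in self.slots[self.current]:
                if entry[0] == 0:
                    entry[1]()           # due on this lap: run it
                else:
                    entry[0] -= 1        # one more lap consumed
                    remaining.append(entry)
            self.slots[self.current] = remaining

    # Driving the wheel: call tick() once per second, e.g.
    # wheel = TimeWheel()
    # wheel.add(65, lambda: print("fires after ~65 s"))
    # then tick() in a loop with time.sleep(1).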

Kafka implements its internal delayed operations with this kind of time wheel.

For scenarios with a very large time span, this method leaves a large number of elements on each slot's linked list, and traversing those lists within a single tick is not cheap. A further improvement is therefore to split time into wheels of different granularity: a wheel with hour granularity, one with minute granularity, and one with second granularity. When a task in the hour wheel reaches its trigger slot it is moved down into the minute wheel, and when it reaches its slot in the minute wheel it is moved into the second wheel. (Diagram from the Internet omitted.)
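A minimal sketch of how a task could be routed, and re-routed when a coarser slot fires, between such wheels (the granularity thresholds and return format are assumptions):

    def route(task_due_ts: float, now_ts: float):
        # Decide which wheel a task belongs in based on how far away it is:
        # >= 1 hour away -> hour wheel, >= 1 minute -> minute wheel,
        # otherwise the second wheel. Returns the wheel name and slot offset.
        remaining = task_due_ts - now_ts
        if remaining >= 3600:
            return "hour-wheel", int(remaining // 3600)
        if remaining >= 60:
            return "minute-wheel", int(remaining // 60)
        return "second-wheel", max(int(remaining), 0)

    # When a slot in a coarser wheel fires, any task that is not yet due is
    # simply passed through route() again, which now places it in a finer wheel.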

Because the time wheel lives in memory, polling is very efficient, and the wheel granularities can be adjusted flexibly. However, this scheme has to implement its own persistence, high availability and storage management; without a ready-made implementation it takes a long time to develop.

Redis ZSET implementation

Delayed tasks can also be implemented with Redis's ZSET (sorted set). Each member of a ZSET carries a score, and members are ordered by score (unlike a plain SET, which is unordered).

The implementation of a delayed task is divided into the following steps:

(1) Use the task's execution time as the score and the task data to be executed as the value, and store them in the ZSET;
(2) Periodically query the task with the smallest score:

    ZRANGEBYSCORE key -inf +inf LIMIT 0 1 WITHSCORES

(3) If the smallest score is less than or equal to the current timestamp, take the task out and execute it; otherwise sleep for a while before querying again.
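A minimal sketch of this loop with the redis-py client (the key name, the JSON task payload and the execute() stub are assumptions for illustration):

    import json
    import time

    import redis

    r = redis.Redis()
    DELAY_KEY = "delay:tasks"          # hypothetical key name

    def execute(task):
        print("executing", task)       # stand-in for the real business logic

    def add_task(task: dict, execute_at: float) -> None:
        # Step (1): execution timestamp as score, task payload as member.
        r.zadd(DELAY_KEY, {json.dumps(task): execute_at})

    def poll_forever() -> None:
        while True:
            # Step (2): fetch the member with the smallest score.
            items = r.zrangebyscore(DELAY_KEY, "-inf", "+inf",
                                    start=0, num=1, withscores=True)
            if items:
                member, score = items[0]
                # Step (3): due if the score is <= now; ZREM first so that
                # only the caller that removes it actually executes it.
                if score <= time.time() and r.zrem(DELAY_KEY, member):
                    execute(json.loads(member))
                    continue
            time.sleep(1)              # nothing due yet: back off briefly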

Redis's ZSET is implemented with a skip list, so insertion and lookup are O(log N), where N is the number of elements stored in the ZSET. Redis's own persistence provides durability, and a Redis cluster provides high concurrency and high availability, so the development cost is very low and tasks can be triggered close to real time.

The specific implementation

The table-scanning approach has too much delay to meet the real-time requirement, the message queue currently used by the team does not support delayed messages, and the time-wheel approach would take too long to develop, so Redis was finally chosen.

The principle of implementing delayed tasks with Redis was introduced above. To achieve higher concurrency, further design is needed on top of that principle; the implementation is detailed next. The architectural design is as follows (diagram omitted):

Description:

  • If all tasks were stored under a single key, two problems would appear as the data volume grows: first, queries slow down because the complexity is O(log N); second, when many tasks fall on the same point in time, a single key cannot be processed in parallel, which causes congestion. We therefore store tasks across multiple keys and hash-route each task to a key by its UUID. If the number of tasks grows, we can quickly increase the number of Redis keys to absorb the load (see the sketch after this list).
  • We run the same number of Event processes (or threads) as there are keys; each process is numbered to match one key and polls that key continuously.
  • The process that polls a key is called the Event process. The Event process only looks up due tasks; it does not execute business logic. Instead, it writes each due task into a message queue, and Work processes consume from that queue and execute the business. In this way the Work side can be deployed in a distributed fashion while the Event side stays lightweight, so overall concurrency can be very high: even a large number of tasks arriving at the same moment can be completed within a small delay.
  • If the Event process were deployed on a single node, a machine failure would mean no messages could be fetched and data would pile up in Redis. We therefore deploy the Event process on multiple machines and use ZooKeeper for leader election: only the process on the leader host fetches messages from Redis, and if the leader host goes down, ZooKeeper automatically elects a new leader.
  • In real business the DB is also involved: when a delayed task is created, the DB is updated first and then the data is written to Redis. The DB update may succeed while the Redis write fails; in that case we retry to reduce the probability of the Redis write ultimately failing.
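A minimal sketch of the key routing and the Event-side loop under the assumptions above (the number of shards, key names, hash function and queue_push() stub are all illustrative; leader election and the Work consumers are omitted):

    import json
    import time
    import zlib

    import redis

    r = redis.Redis()
    KEY_COUNT = 8                                  # number of ZSET shards (assumption)

    def key_for(task_uuid: str) -> str:
        # Hash-route a task to one of the shard keys by its UUID.
        return "delay:tasks:%d" % (zlib.crc32(task_uuid.encode()) % KEY_COUNT)

    def add_task(task_uuid: str, payload: dict, execute_at: float) -> None:
        member = json.dumps({"uuid": task_uuid, "payload": payload})
        r.zadd(key_for(task_uuid), {member: execute_at})

    def queue_push(member: bytes) -> None:
        print("enqueue:", member)                  # stand-in for the MQ producer

    def event_loop(shard: int) -> None:
        # One Event process/thread per shard: it only finds due tasks and
        # hands them to the message queue; Work processes do the business.
        key = "delay:tasks:%d" % shard
        while True:
            items = r.zrangebyscore(key, "-inf", time.time(),
                                    start=0, num=1, withscores=True)
            if items and r.zrem(key, items[0][0]):
                queue_push(items[0][0])
            else:
                time.sleep(0.1)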

Besides triggering delayed tasks, the business also allows changing a task's execution time as long as the task has not fired yet. To support this, we maintain a ZSET per user, which stores all of that user's delayed tasks; for convenience we call it zset-user. When a user wants to modify one of their delayed tasks, there is no efficient way to find it in the global delayed-task ZSET; even if it can be found, doing so requires traversing the ZSET, which is obviously too slow and wastes resources. Our approach is to take the user's task from zset-user, modify its score, and ZADD it back into both the global delayed-task ZSET and zset-user. ZADD on an existing member overwrites it, so the score is simply updated. In this way the requirement can be met entirely with Redis.
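A minimal sketch of this rescheduling path, reusing the Redis client and key_for() routing from the sketch above (the per-user key naming is also an assumption):

    def user_key(user_id: str) -> str:
        return "delay:user:%s" % user_id           # the "zset-user" key

    def add_task_for_user(user_id: str, task_uuid: str,
                          payload: dict, execute_at: float) -> None:
        member = json.dumps({"uuid": task_uuid, "payload": payload})
        # Write into both the sharded delayed-task ZSET and the user's ZSET.
        r.zadd(key_for(task_uuid), {member: execute_at})
        r.zadd(user_key(user_id), {member: execute_at})

    def reschedule(user_id: str, task_uuid: str, new_execute_at: float) -> None:
        # Look the task up in the small per-user ZSET instead of scanning the
        # global one, then ZADD with the new score: for an existing member,
        # ZADD just overwrites the score.
        for member in r.zrange(user_key(user_id), 0, -1):
            if json.loads(member)["uuid"] == task_uuid:
                r.zadd(key_for(task_uuid), {member: new_execute_at})
                r.zadd(user_key(user_id), {member: new_execute_at})
                return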

Starting from the business requirement, this article first surveyed the industry's common ways of implementing delayed tasks, then elaborated on how to implement delayed tasks with Redis, and analyzed the design choices made for high concurrency and high availability.