Abstract: Practices and reflections on using Redis to manage room business logic.

Author | ZEGO business backend development team

Interactive scenarios such as voice chat rooms and e-commerce live streaming usually need features such as member management, co-hosting (mic linking), sending flowers, and bullet comments, all of which require storing and managing data about the room and its members.

How to organize, store, and operate on this data to implement the intended business logic, while keeping the server and the clients consistent, is one of the questions the business backend must answer when building such audio and video interactive scenarios.

RoomKit is a new low-code engagement platform (LCEP) product launched by ZEGO. It abstracts common capabilities such as audio and video calls, whiteboard drawing, file presentation, and real-time messaging into modules that can be combined freely, so users can build a variety of business scenarios with little or no code. In the RoomKit backend, room data management is therefore a core part of the business and runs through the whole development process.

As a high-performance key-value database, Redis is widely used in backend development, and the RoomKit backend also uses it to manage room data.

This article walks through the technical difficulties the ZEGO backend development team encountered when implementing this business with Redis, and the solutions they adopted. Readers can also use it as a reference when they face similar problems while building their own business on top of the ZEGO aPaaS layer.

I. Overview of the RoomKit backend

1. Division of functional modules

Based on the business logic, RoomKit divides the code into two modules: a room control module and a function plug-in module. The functions of these modules are described in detail below.

  • Room control module

The room control module is mainly used to manage the room list, the room state and the state interaction of the members in the room.

RoomKit supports a variety of scenarios, such as large classes, live broadcasts, small classes, video conferences, and 1-on-1 sessions. In practice these fall into two groups: conference-like scenarios and live-broadcast-like scenarios.

The two groups have different logical focuses. In a conference-like scenario there are relatively few participants, but members interact frequently. In a live-broadcast-like scenario there are usually many participants, but relatively little interaction between the anchor and the audience. On this basis, the RoomKit backend splits the room control module into two sub-modules. They are not the topic of this article, so they are not described further here.

  • Function plug-in module

Function plug-in modules implement the plug-in features supported by RoomKit, such as sharing, teaching plug-ins, and IM. Their logic is independent of the scenario, so they can be developed as independent modules. They interact with the room control module through handlers that each side provides to the other.

2. Background service architecture

The RoomKit backend manages room data in Redis and uses the real-time push capability of the instant signaling backend to notify clients of room state changes in real time. The following diagram shows the architectural relationship between the backend and other services and clients.

II. Key techniques for managing room data with Redis

Taking the room control module as an example, this section introduces some key points in how the RoomKit backend implements room business with Redis.

1. Use Redis to store room data

To support in-room interaction, the RoomKit backend needs to record the state of rooms and members. To share this data between business server processes, we chose to store it in Redis.

The Redis hash structure is a natural fit for recording room state and similar data. Changing a room setting, such as starting a class, only requires updating the corresponding field.

To keep track of the currently open rooms, the RoomKit backend records each room ID and its creation time in a global ZSET. The backend periodically traverses these rooms to collect statistics, check for offline members, and so on.

Member state is also recorded in a hash structure, and each member ID is recorded in the ZSET that corresponds to the room ID. The score is the member's login time or last heartbeat time, and every heartbeat updates the score to the current time.

Because a ZSET is sorted by score, the backend can easily pick out offline members and remove them from the room.
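The storage layout described above can be sketched with a small in-memory model. This is only an illustration and does not talk to Redis: the class `RoomStore`, its field names, and the `timeout` parameter are assumptions made for the example, and the comments name the Redis commands that each step would roughly correspond to.

```python
class RoomStore:
    """In-memory stand-in for the Redis structures described above (hypothetical)."""

    def __init__(self):
        self.room_hash = {}     # room_id -> {field: value}, one HASH per room
        self.open_rooms = {}    # room_id -> creation time, the global ZSET
        self.member_hash = {}   # (room_id, uid) -> {field: value}, one HASH per member
        self.room_members = {}  # room_id -> {uid: last heartbeat}, one ZSET per room

    def create_room(self, room_id, now):
        self.room_hash[room_id] = {"state": "created"}
        self.open_rooms[room_id] = now
        self.room_members[room_id] = {}

    def heartbeat(self, room_id, uid, now):
        # Login and heartbeat both refresh the score (roughly ZADD with the current time).
        self.member_hash.setdefault((room_id, uid), {"state": "online"})
        self.room_members[room_id][uid] = now

    def evict_offline(self, room_id, now, timeout):
        # Roughly ZRANGEBYSCORE with an upper bound of now - timeout, then ZREM and DEL.
        deadline = now - timeout
        members = self.room_members[room_id]
        offline = sorted(uid for uid, score in members.items() if score <= deadline)
        for uid in offline:
            del members[uid]
            self.member_hash.pop((room_id, uid), None)
        return offline


store = RoomStore()
store.create_room("room1", now=100)
store.heartbeat("room1", "alice", now=100)
store.heartbeat("room1", "bob", now=100)
store.heartbeat("room1", "alice", now=130)   # only alice keeps sending heartbeats
evicted = store.evict_offline("room1", now=140, timeout=30)
print(evicted)
```

Because the score doubles as the last-seen time, finding offline members is a single range query by score rather than a scan over every member's hash.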

2. Redis keys designed without an expiration time

Memory leaks are a common problem when writing code, and keys can leak in the same way when Redis is used to store data. When Redis serves as cache middleware, an expiration time is usually set on each key to avoid key leaks. In RoomKit, however, room data is destroyed when the room ends, and that moment is controlled by the members. A TTL that is too short would lose data, while a TTL that is too long would in practice still leak keys.

To address this, the RoomKit backend sets no expiration time (no TTL) on its keys.

To prevent key leaks, every dynamically created key is recorded under a set of fixed keys, so that all keys can be found through this index when the data is destroyed.

For example, the sharing module creates a SET with the key personal_share:{uid} to record the shared content created by a member, so that the content can be closed when the member leaves the room. The key itself is recorded in another SET whose key is share_recycle_bin.

When the room ends, the data is cleaned up by reading the keys recorded in share_recycle_bin and deleting them.

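-- Fetch every dynamically created key registered in the recycle bin
local keys = redis.call("SMEMBERS", "share_recycle_bin")
-- DEL rejects an empty argument list, so skip it when nothing is registered
if #keys > 0 then redis.call("DEL", unpack(keys)) end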

In addition, RoomKit deletes keys in batches, so that recycling a large number of keys does not block Redis for a long time.
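The article does not show how the batch deletion is implemented. The following is a minimal sketch of one possible approach, under stated assumptions: `FakeRedis` is a tiny in-memory stand-in written for this example (not a real client library), and `batch_size` is a hypothetical tuning parameter. It shows both halves of the pattern: registering each dynamic key under the fixed key, and later deleting the registered keys a few at a time.

```python
class FakeRedis:
    """Tiny in-memory stand-in for a Redis client, SET-type keys only (hypothetical)."""

    def __init__(self):
        self.data = {}
        self.del_calls = []  # number of keys passed to each delete call

    def sadd(self, key, *members):
        self.data.setdefault(key, set()).update(members)

    def smembers(self, key):
        return set(self.data.get(key, set()))

    def srem(self, key, *members):
        remaining = self.data.get(key, set())
        remaining.difference_update(members)
        if not remaining:
            self.data.pop(key, None)  # like Redis, an empty set ceases to exist

    def delete(self, *keys):
        self.del_calls.append(len(keys))
        for key in keys:
            self.data.pop(key, None)


def create_personal_share(r, uid, share_id):
    key = f"personal_share:{uid}"
    r.sadd(key, share_id)
    # Register the dynamic key under the fixed key so it can be found later.
    r.sadd("share_recycle_bin", key)


def recycle(r, batch_size=2):
    """Delete registered keys in small batches; batch_size is an assumed knob."""
    keys = sorted(r.smembers("share_recycle_bin"))
    for i in range(0, len(keys), batch_size):
        batch = keys[i:i + batch_size]
        r.delete(*batch)                     # one short command per batch
        r.srem("share_recycle_bin", *batch)  # forget keys only after deleting them
    return len(keys)


r = FakeRedis()
for uid in ("u1", "u2", "u3"):
    create_personal_share(r, uid, "doc-" + uid)
removed = recycle(r)
print(removed, r.del_calls, r.data)
```

Against a real server, each batch would be a separate command or short script, which gives other clients a chance to run between batches instead of waiting for one large deletion.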

3. Multiple machines cooperatively traverse the task list

To check and collect statistics for each room, the room list has to be traversed periodically so that each room can be checked against the business rules. If the list is long, completing the check on a single machine takes a long time. In a single-machine program, this kind of problem is usually solved with techniques such as thread pools.

In a multi-machine environment, we want these tasks to be distributed evenly across the servers, which requires the servers to cooperate.

The RoomKit backend relies on the single-threaded execution of Redis and on the ZSCAN cursor mechanism to implement distributed cooperative traversal.

A simplified implementation is as follows:

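-- ARGV[1]: seconds between two full traversals (how it is passed is an assumption)
local scan_interval = tonumber(ARGV[1])
local now = tonumber(redis.call("TIME")[1])
-- GET yields false when the key is missing; treat that as a fresh traversal
local cursor = redis.call("GET", "cursor_key") or "0"
if cursor == "0" then
    -- A new traversal may only start once the scheduled time has arrived
    local next_check_time = tonumber(redis.call("GET", "next_check_time_key") or 0)
    if next_check_time > now then
        return next_check_time - now  -- seconds left to wait
    end
end
local scan_result = redis.call("ZSCAN", "room_zset_key", cursor)
redis.call("SET", "cursor_key", scan_result[1])
-- Cursor "0" means this traversal is complete: schedule the next one
if scan_result[1] == "0" then
    redis.call("SET", "next_check_time_key", now + scan_interval)
end
return scan_result[2]  -- flat list: member1, score1, member2, score2, ...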

Each time a node runs the script, it uses the ZSCAN command to fetch a batch of room IDs to check and writes the returned cursor back to the shared global cursor. A returned cursor of 0 means the traversal is complete, and the time of the next traversal is set at that point. Whenever a new traversal is about to start, this time is checked, and if it has not been reached yet the script returns the remaining wait time instead of a batch.

The room traversal logic does not interact with clients directly and scales horizontally. In an actual deployment it can run as an independent component, with the number of workers adjusted dynamically according to the business load.
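To make the cooperation concrete, here is a small simulation of the traversal logic with two workers sharing one cursor. It is a sketch under assumptions: `SharedScanState` is a hypothetical in-memory stand-in for the three Redis keys, and the cursor is modeled as a plain list index, whereas a real ZSCAN cursor is an opaque value that should only be passed back to Redis.

```python
class SharedScanState:
    """Stand-in for cursor_key, next_check_time_key and room_zset_key (hypothetical)."""

    def __init__(self, rooms, batch_size, scan_interval):
        self.rooms = list(rooms)
        self.batch_size = batch_size
        self.scan_interval = scan_interval
        self.cursor = 0
        self.next_check_time = 0

    def next_batch(self, now):
        """Mirror of the script: return a batch of rooms, or the seconds to wait."""
        if self.cursor == 0 and self.next_check_time > now:
            return self.next_check_time - now
        start = self.cursor
        end = min(start + self.batch_size, len(self.rooms))
        batch = self.rooms[start:end]
        # Cursor 0 marks the end of a full traversal, as with ZSCAN.
        self.cursor = 0 if end >= len(self.rooms) else end
        if self.cursor == 0:
            self.next_check_time = now + self.scan_interval
        return batch


state = SharedScanState(["r1", "r2", "r3", "r4", "r5"], batch_size=2, scan_interval=60)
workers = ["worker-a", "worker-b"]
assigned = {w: [] for w in workers}
turn = 0
while True:
    result = state.next_batch(now=1000)
    if isinstance(result, int):  # traversal finished: told to wait
        wait_seconds = result
        break
    assigned[workers[turn % 2]].extend(result)
    turn += 1
print(assigned, wait_seconds)
```

Because every call reads and advances the shared cursor atomically (in Redis, inside one Lua script), no two workers receive the same batch, and adding workers simply means the batches are handed out faster.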

4. SEQ and eventual consistency

In highly interactive scenarios it is very important to keep client and server data consistent; otherwise the state becomes confused and the interaction suffers. We therefore use a SEQ (sequence number) mechanism to guarantee the logical order of data changes.

The room data held locally by a client can be regarded as a replica of the room data held by the backend, so the clients (as followers) and the backend (as leader) together can be viewed as a distributed data storage system.

After changing data, the RoomKit backend pushes a change notification to all clients through the in-room broadcast capability of the signaling backend. The notification contains the change event and the detailed changed data, so that clients stay consistent with the server.

However, client networks are complex. On a weak network, notifications may be lost or arrive out of order, which can leave members and rooms in an abnormal state.

Since weak networks are unavoidable, RoomKit attaches a SEQ number to every notification so that clients can verify that their local data matches the backend data. SEQ is incremented whenever the backend data changes, so it effectively acts as the version number of the backend data.

When a client enters a room for the first time, it pulls the full data together with the corresponding SEQ. It then applies the data carried in notifications as incremental updates to its local data and advances its local SEQ. The backend also returns the latest SEQ in every heartbeat response. If the client finds that its local SEQ differs from the SEQ returned by the heartbeat, it pulls the full data again to get back in sync with the backend.

SEQ also helps avoid the oversized notifications that operations applied to all members at once would otherwise produce.

For example, in a small class, if every resulting change were packed into a single notification, the message would be too long to send. The backend therefore optimizes notifications for such all-member operations by sending only the event itself, without the changed data. After receiving the notification, the client checks whether its SEQ is exactly the local SEQ plus one. If so, the client knows that its data was consistent with the backend before the operation, and it can replay the operation locally to stay consistent. Otherwise, the client pulls the full data and overwrites its local copy.
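The client-side SEQ handling described in this section can be sketched as follows. This is a simplified model, not RoomKit's client code: the `Backend` and `Client` classes and the notification fields are hypothetical, and dropping notifications whose SEQ is not newer than the local one is an assumption added here to handle late arrivals.

```python
class Backend:
    """Stand-in for the backend: authoritative room data plus its SEQ (hypothetical)."""

    def __init__(self):
        self.seq = 0
        self.data = {}

    def apply(self, field, value):
        self.data[field] = value
        self.seq += 1  # every change bumps the version
        return {"seq": self.seq, "field": field, "value": value}  # the notification

    def snapshot(self):
        return self.seq, dict(self.data)


class Client:
    def __init__(self, backend):
        self.backend = backend
        self.full_pulls = 0
        self.pull_full()  # first entry into the room: full data plus SEQ

    def pull_full(self):
        self.seq, self.data = self.backend.snapshot()
        self.full_pulls += 1

    def on_notification(self, note):
        if note["seq"] <= self.seq:
            return  # assumption: stale or duplicate, already reflected locally
        if note["seq"] == self.seq + 1:
            self.data[note["field"]] = note["value"]  # in order: incremental update
            self.seq = note["seq"]
        else:
            self.pull_full()  # gap detected: an earlier notification was lost

    def on_heartbeat(self, backend_seq):
        if backend_seq != self.seq:
            self.pull_full()


backend = Backend()
client = Client(backend)
n1 = backend.apply("class_state", "started")
n2 = backend.apply("speaker", "alice")  # delayed on a weak network
n3 = backend.apply("speaker", "bob")
client.on_notification(n1)  # local SEQ 0 -> 1
client.on_notification(n3)  # SEQ 3 while local is 1: full pull
client.on_notification(n2)  # arrives late and is ignored
client.on_heartbeat(backend.seq)
print(client.seq, client.data, client.full_pulls)
```

For an event-only notification, the in-order branch would replay the operation locally instead of copying a value out of the message; the gap check and the fallback to a full pull stay the same.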

5. CAS operations

Race conditions are among the most common problems in concurrent programming, and they occur in multi-machine environments as well. They are usually avoided with critical-section techniques, such as Redis Lua scripts, but what if a business flow cannot be completed inside a single critical section? This is where compare-and-swap (CAS) comes in.

The RoomKit backend implements some complex logic with Redis Lua scripts. However, Redis executes commands on a single thread, so a complicated Lua script that runs for too long blocks all other commands. The RoomKit backend therefore splits some overly long Lua scripts into several scripts and uses CAS to avoid race conditions between them.

For example, in speech mode, the speaker field in the room-state hash holds the ID of the current presenter. If the presenter leaves the room, the backend must select an online member as the new presenter in addition to running the general member-exit logic. Selecting the next presenter is complicated, however, and running it together with the member-exit logic in one script could block Redis for too long. If the work is split into two Lua scripts, the following can happen: after the member-exit script finishes but before the set-speaker script starts, another request to set the speaker reaches the backend and succeeds. If the set-speaker script then runs, it overwrites the result of that request, and the client behaves abnormally.

Solution 1: Use CAS. At the start of the set-speaker script, check whether the value of the speaker field has changed. If it has, abort the script.

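-- ARGV[1]: ID of the presenter who just left (how it is passed is an assumption)
local left_speaker_id = ARGV[1]
local speaker = redis.call("HGET", "room_stat_key", "speaker")
-- CAS check: abort if someone else has already become the speaker
if speaker ~= left_speaker_id then return end
-- select next speaker...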

This solution still has a flaw. If the process crashes before the set-speaker script starts, the speaker field is never changed: the member has left the room but is still recorded as the presenter, which leaves the system in a logically inconsistent state.

Solution 2: In the member-exit script, reset the speaker field to its initial value, that is, the empty value, which means there is currently no speaker. This way, even if the process crashes and the set-speaker script never runs, the system state remains logically consistent.

The set-speaker script changes accordingly: it checks whether the speaker field is empty and aborts if it is not.

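-- user left script
--...
-- Reset the speaker to the empty initial value: no speaker at present
redis.call("HSET", "room_stat_key", "speaker", "")
--...

-- set speaker script
--...
local speaker = redis.call("HGET", "room_stat_key", "speaker")
-- CAS check: abort unless the field is still empty (HGET yields false if unset)
if speaker and speaker ~= "" then return end
-- select next speaker...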
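The interleaving that Solution 2 guards against can be replayed step by step in a small simulation. This is an illustrative sketch only: `room_stat` stands in for the room-state hash, the function names are hypothetical, and clearing the field only when the leaving member is the current speaker is an assumption made for the example.

```python
room_stat = {"speaker": "alice"}  # stand-in for the room-state HASH


def user_left(uid):
    """Exit script: clear the speaker field if the leaving member held it (assumed guard)."""
    if room_stat["speaker"] == uid:
        room_stat["speaker"] = ""


def set_speaker_request(uid):
    """An explicit request from a client to make uid the speaker."""
    room_stat["speaker"] = uid


def auto_select_speaker(candidate):
    """Follow-up script: act only if nobody has taken the speaker role meanwhile."""
    if room_stat["speaker"] != "":
        return False  # CAS check failed: abort without overwriting
    room_stat["speaker"] = candidate
    return True


user_left("alice")                      # step 1: the exit script runs
state_after_exit = room_stat["speaker"]
set_speaker_request("bob")              # a concurrent request lands in between
changed = auto_select_speaker("carol")  # step 2: aborted by the CAS check
print(state_after_exit, changed, room_stat["speaker"])
```

If the process crashed right after step 1, the field would simply stay empty, which is a valid "no speaker" state rather than a reference to a member who has already left.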

III. Conclusion

This article has summarized how the backend development team uses Redis to implement the RoomKit backend business. Ensuring business correctness and data consistency in a distributed environment is a goal that backend developers pursue continuously, and we hope the thinking and practice described here are helpful to readers.