How to effectively shorten Xianyu message processing time

Author: Xianyu Technology, book leisure

Background

With the rapid growth of its user base, Xianyu IM is facing unprecedented challenges. After years of business iteration, the layering of the client-side IM code has become unclear, and the data synchronization problems hidden in messaging have been magnified as the number of users increases.

Analysis of the current situation

Data packets that the backend needs to synchronize to the client are divided into different data domains according to their business type. Within its domain, each packet has a sequence number (version) that is unique and continuous. After a packet has been delivered to the client and successfully consumed, the client records the version up to which each data domain has been synchronized. The next synchronization of a domain starts from that locally recorded version and continues from there.

Of course, users do not stay online waiting for messages all the time, so a combination of push and pull is used to ensure data synchronization:

When the user is online, ACCS pushes the latest data to the client in real time. ACCS is a full-duplex, low-latency, high-security channel service that Taobao Wireless provides to developers.
When the app starts after being offline, the data missed while offline is pulled based on the locally stored data domain information.
When a black hole (a discontinuity in packet versions) appears in the received data, a data synchronization pull is triggered.
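
The black-hole check in the last item can be sketched as follows. This is only an illustration, assuming versions are integers that increase by exactly 1 within a domain; the names `Action` and `classify` are hypothetical and are not taken from the Xianyu code.

```python
from enum import Enum


class Action(Enum):
    CONSUME = "consume"  # packet is the next one expected in the domain
    DROP = "drop"        # packet was already consumed (duplicate)
    PULL = "pull"        # versions are discontinuous: a "black hole"


def classify(local_version: int, packet_version: int) -> Action:
    """Decide what to do with a packet, given the last consumed version."""
    if packet_version <= local_version:
        return Action.DROP
    if packet_version == local_version + 1:
        return Action.CONSUME
    return Action.PULL


# The client has consumed up to version 7 in some domain.
print(classify(7, 8))   # Action.CONSUME
print(classify(7, 6))   # Action.DROP
print(classify(7, 10))  # Action.PULL
```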

Such a synchronization strategy can basically guarantee IM data synchronization, but it also brings some hidden problems:

When data is pushed intensively within a short time, multiple data domain synchronizations are triggered in quick succession. If the data returned by a domain synchronization still has gaps, a new round of synchronization is triggered, wasting network resources. Redundant packets and invalid data also take processing resources away from valid content, wasting CPU and memory.
The server does not know whether the client has consumed the packets in a data domain; it can only passively return data based on the current data domain information.
The logic for data acquisition, message body parsing, and persistence is not clearly separated, so a specific layer cannot be isolated and replaced for A/B testing.

What to do

In view of these problems, the layering of Xianyu IM was reworked and the data synchronization layer was extracted. Besides using this data synchronization layer in IM, we also hope that, as its stability improves, it can support other business scenarios.

This article focuses on some of the solutions for client-side IM data synchronization in Xianyu.

Data synchronization optimization

Splitting & layering

On the server, after the business side produces a data packet, the current data domain information is attached to it, and the packet is then pushed to the client through the data synchronization layer.

After receiving a packet, the client uses the data domain information to determine which business consumer should handle it. Once the packet is confirmed to be complete and continuous within its data domain, the client unwraps the data body, hands it to the business side for consumption, and acknowledges the consumption.

Extracting the data synchronization layer packages wrapping, unwrapping, verification, and retry into one place. Upper-layer business code only needs to care about the data domains it listens to and consume the data when a domain is updated; it no longer needs to care whether packets are complete.

The business side only needs to care about the protocol it shares with the business server, and the data side only needs to care about the protocol used to wrap packets. The network layer is responsible for the actual data transmission.

Align the data layer's transfer protocol so that it describes the data domain information of the current packet body
Move the processing, merging, and persistence of messages into the data consumers
Make the upper and lower layers depend on abstractions, removing dependencies on concrete implementations
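
From the point of view of a business consumer, this layering might look like the sketch below. It is an assumption-laden illustration, not the actual Xianyu interface: `SyncLayer`, `register`, and `deliver` are hypothetical names, and the rule that each domain has exactly one consumer is our reading of the text.

```python
from typing import Callable, Dict, Iterable

# A consumer is just a callback: (domain_id, unwrapped body) -> None.
Handler = Callable[[str, bytes], None]


class SyncLayer:
    """Hypothetical facade of the data synchronization layer."""

    def __init__(self) -> None:
        self._handler_by_domain: Dict[str, Handler] = {}

    def register(self, handler: Handler, domain_ids: Iterable[str]) -> None:
        # One consumer may listen to many domains, but each domain is
        # assumed to have exactly one consumer.
        for domain_id in domain_ids:
            if domain_id in self._handler_by_domain:
                raise ValueError(f"domain {domain_id!r} already has a consumer")
            self._handler_by_domain[domain_id] = handler

    def deliver(self, domain_id: str, body: bytes) -> bool:
        """Hand a verified, unwrapped body to the consumer of its domain."""
        handler = self._handler_by_domain.get(domain_id)
        if handler is None:
            return False  # nobody listens to this domain
        handler(domain_id, body)
        return True


received = []
layer = SyncLayer()
layer.register(lambda d, b: received.append((d, b)), ["chat", "chat_status"])
layer.deliver("chat", b"hello")
print(received)  # [('chat', b'hello')]
```

The consumer depends only on the callback signature, so the layer underneath can be swapped (for example in an A/B test) without touching business code.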

Data layer structure model

Based on the extracted data model and the solutions to the problems above, the data synchronization layer is split into the following architecture:

Step 1: When the app starts, the ACCS long-lived connection service is established to keep the push channel available, and a data pull is triggered based on the current local data domain information.

Step 2: Data consumers register their consumer information and the data domains they need to listen to. This is a one-to-many relationship.

Step 3: After new data arrives at the client, each packet is put into the buffer pool of its data domain. Once a batch of data has been collected, data reading is triggered again.

Step 4: The packet with the highest priority is popped according to the current data domain priorities, and its domain version is checked against what the consumer expects. If it matches, the packet is unwrapped and handed to the consumer. If not, an incremental data domain synchronization pull is triggered based on the domain information of the last correct packet.

Step 5: When a data domain synchronization pull is triggered, data reading is blocked. Packets that arrive through ACCS in the meantime continue to be collected into the queue of their data domain while waiting for the pull result. The packets are then sorted, deduplicated, and merged into the corresponding data domain queue, and data reading is resumed.

Step 6: After the packet body has been correctly consumed by the consumer, the domain information is updated and the server is informed of the correct data domain information through the uplink channel.
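
Steps 3, 4, and 6 for a single data domain can be sketched roughly as below. This is a simplified illustration under stated assumptions: versions are consecutive integers, `DomainReader` and its callbacks are hypothetical names, and the blocking and merging of Step 5 are deliberately left out.

```python
from typing import Callable, Dict


class DomainReader:
    """Hypothetical reader for one data domain (Steps 3, 4 and 6)."""

    def __init__(self, local_version: int,
                 consume: Callable[[bytes], None],
                 ack: Callable[[int], None],
                 request_pull: Callable[[int], None]) -> None:
        self.local_version = local_version  # last version consumed
        self.buffer: Dict[int, bytes] = {}  # buffer pool for this domain
        self._consume = consume
        self._ack = ack
        self._request_pull = request_pull

    def on_batch(self, packets: Dict[int, bytes]) -> None:
        # Step 3: buffer the batch, ignoring already-consumed versions.
        for version, body in packets.items():
            if version > self.local_version:
                self.buffer[version] = body
        self.read()

    def read(self) -> None:
        # Step 4: consume while the next expected version is buffered.
        start = self.local_version
        while self.local_version + 1 in self.buffer:
            self._consume(self.buffer.pop(self.local_version + 1))
            self.local_version += 1
        if self.local_version > start:
            # Step 6: tell the server how far this domain was consumed.
            self._ack(self.local_version)
        if self.buffer:
            # What is left sits beyond a gap: pull incrementally from
            # the last correct version.
            self._request_pull(self.local_version)


log = []
reader = DomainReader(
    local_version=0,
    consume=lambda body: log.append(("consume", body)),
    ack=lambda version: log.append(("ack", version)),
    request_pull=lambda version: log.append(("pull_from", version)),
)
reader.on_batch({1: b"a", 2: b"b", 4: b"d"})  # version 3 is missing
print(log)
# [('consume', b'a'), ('consume', b'b'), ('ack', 2), ('pull_from', 2)]
```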

Data domain synchronization protocol

A Region does not need to carry much data, but it must clearly describe the packet:

ID of the target user, used to determine whether the packet reached the correct user
Data domain ID and priority information
Domain version of the current packet
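
As a rough picture of such a header, the three items above could be modeled as below. The field names and types are assumptions made for illustration; the article does not give the actual wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    target_user_id: str  # who the packet is meant for
    domain_id: str       # which data domain the packet belongs to
    priority: int        # current priority of that domain
    version: int         # version of this packet within the domain


@dataclass(frozen=True)
class Packet:
    region: Region
    body: bytes          # opaque business payload, unwrapped later


def is_for_user(packet: Packet, current_user_id: str) -> bool:
    """Reject packets addressed to a different user."""
    return packet.region.target_user_id == current_user_id


packet = Packet(Region("user-1", "chat", priority=5, version=42), b"hi")
print(is_for_user(packet, "user-1"))  # True
print(is_for_user(packet, "user-2"))  # False
```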

Sorting strategy

Domain data needs to be kept in order both when it is written and when it is read and searched, and with a sort-based structure the best achievable time complexity for these operations is O(log n). During implementation we found that, within a data domain, packet versions are unique and continuous with no gaps. We therefore use a Map keyed by Version, which not only reduces the time complexity but also lets a later-arriving packet with the same unique identifier overwrite the content of the earlier one.
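
A minimal sketch of this idea, using a Python `dict` as the Map (average O(1) insert and lookup). `VersionedBuffer` is a hypothetical name, and versions are assumed to be integers.

```python
from typing import Dict, Optional


class VersionedBuffer:
    """Packets of one domain, keyed by version instead of kept sorted."""

    def __init__(self) -> None:
        self._by_version: Dict[int, bytes] = {}

    def put(self, version: int, body: bytes) -> None:
        # Average O(1). A later packet with the same version overwrites
        # the earlier one, which also deduplicates.
        self._by_version[version] = body

    def take(self, version: int) -> Optional[bytes]:
        # Average O(1). Because versions are continuous, the reader
        # always knows which key it needs next; no sorting is required.
        return self._by_version.pop(version, None)


buf = VersionedBuffer()
for version, body in [(3, b"c"), (1, b"a"), (2, b"old"), (2, b"new")]:
    buf.put(version, body)  # arrival order does not matter

in_order = [buf.take(v) for v in (1, 2, 3)]
print(in_order)     # [b'a', b'new', b'c']
print(buf.take(4))  # None, version 4 has not arrived yet
```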

Some problems & solutions

Balancing multiple data sources and unique data consumption

Whenever a packet is generated for the current user, it is pushed to the client through ACCS if the ACCS long-lived connection exists. If the app has been in the background for a while or has been killed, the ACCS connection is broken and the packet can only reach the user's notification panel as an offline push. Therefore, every time the app becomes active, it needs to trigger a data synchronization from the backend based on the locally stored data domain information.

Packets reach the client from two main sources: pushes over the ACCS long-lived connection and pulls from domain synchronization. Consumption, however, is assigned to a single consumer per listened data domain, so only one packet can be consumed at a time.

In stress tests, when the backend pushes a burst of packets through ACCS in a short time, the client does not receive them in order. The resulting discontinuous domain versions trigger a new domain synchronization, so the same packet reaches the client repeatedly through two different channels, wasting traffic.

While a data domain is being synchronized, new packets generated during that time are still pushed to the client. Their data bodies are valid and need to be consumed correctly.

Solutions to these problems:

A data intermediate layer is added between data acquisition and data consumption. When a data domain synchronization is triggered, data reading is blocked and packets pushed by ACCS are held in a data staging area. When the synchronization pull returns, the data is merged and data reading is restarted.
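
The staging behavior could look roughly like this. It is a sketch under assumptions, not the real implementation: `StagingArea` and its methods are hypothetical names, and packets are reduced to a version plus a body.

```python
from typing import Dict, List, Tuple


class StagingArea:
    """Hypothetical layer between data acquisition and consumption."""

    def __init__(self) -> None:
        self.pulling = False  # while True, the reader must not consume
        self._staged: Dict[int, bytes] = {}

    def begin_pull(self) -> None:
        self.pulling = True

    def on_push(self, version: int, body: bytes) -> None:
        # Pushes that arrive during a pull are parked, not consumed.
        self._staged[version] = body

    def finish_pull(self, pulled: Dict[int, bytes]) -> List[Tuple[int, bytes]]:
        # Merge both sources. A version seen through both channels is
        # kept once. The result is sorted by version for the reader.
        merged = {**self._staged, **pulled}
        self._staged.clear()
        self.pulling = False  # reading may restart
        return sorted(merged.items())


area = StagingArea()
area.begin_pull()
area.on_push(6, b"f")  # new packet generated while the pull is running
area.on_push(4, b"d")  # also returned by the pull below
ready = area.finish_pull({3: b"c", 4: b"d", 5: b"e"})
print(ready)  # [(3, b'c'), (4, b'd'), (5, b'e'), (6, b'f')]
```

Because out-of-order pushes are parked instead of each triggering its own synchronization, one pull can cover a whole burst.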

Data domain priority

Packets pushed to the client are classified by business priority. For example, chat packets between users have a higher priority than operational messages. To get high-priority packets to the client as quickly as possible, high-priority data domains need to be consumed first. The priority of a data domain is itself a policy that needs to be adjusted dynamically and changes over time.

Solutions to this problem:

Different data domains have different data queues, and packets in high-priority queues are read and consumed first.

The data domain information carried by each packet body can indicate the current priority of the data domain. When the priority of a data domain changes, the packet consumption priority policy is adjusted accordingly.
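
One way to picture per-domain queues with dynamically updated priorities is sketched below. The class name `DomainScheduler` is hypothetical, and the convention that a larger number means a higher priority is an assumption.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class DomainScheduler:
    """One FIFO queue per data domain, read in priority order."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[bytes]] = {}
        self._priority: Dict[str, int] = {}

    def enqueue(self, domain_id: str, priority: int, body: bytes) -> None:
        # Every packet carries the current priority of its domain, so
        # the policy is refreshed as packets arrive.
        self._priority[domain_id] = priority
        self._queues.setdefault(domain_id, deque()).append(body)

    def pop_next(self) -> Optional[Tuple[str, bytes]]:
        candidates = [d for d, q in self._queues.items() if q]
        if not candidates:
            return None
        # Assumption: a larger number means a higher priority.
        domain_id = max(candidates, key=lambda d: self._priority[d])
        return domain_id, self._queues[domain_id].popleft()


scheduler = DomainScheduler()
scheduler.enqueue("ops", 1, b"promo")
scheduler.enqueue("chat", 5, b"hi")
first = scheduler.pop_next()
print(first)   # ('chat', b'hi')

scheduler.enqueue("ops", 9, b"notice")  # ops priority raised to 9
scheduler.enqueue("chat", 5, b"there")
second = scheduler.pop_next()
print(second)  # ('ops', b'promo')
```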

Optimization results

Beyond the clearer layering, the data synchronization layer is now conveniently decoupled from the business content it depends on, and each link is pluggable. For data synchronization itself, both message consumption time and traffic are reduced, and the effect is more obvious in stress test scenarios.

Stress test scenario: 100 out-of-order packets pushed within 500 ms

Message processing time (from receipt to display on screen) is reduced by 31%

Traffic consumption (the cumulative size of the packets eventually pulled to the client) is reduced by 35%

Follow-up plan

Improving the capability of the data synchronization layer

The goal of data synchronization is not only to ensure that packets arrive at the client completely, but also to minimize data pulls while ensuring stability, so that every data acquisition is effective. Going forward, the data synchronization layer will further optimize the effective data rate and the arrival rate.

Dynamically and intelligently adjust the data synchronization priority based on different scenarios.
Make long-lived connection push blocking, so that only push mode or pull mode is active at any given time, further reducing redundant packet pushes.

Overall architecture upgrade of Xianyu client-side IM

Upgrading the data synchronization layer strategy is mainly intended to improve Xianyu IM's capabilities. With data synchronization now layered, the next step is to organize message processing into a pipeline, monitor and trace each stage, and improve the rate at which IM packets are correctly parsed and persisted.

After the data source side has been extracted, subsequent IM refactoring will gradually separate out the message processing layer
Report the key nodes of message processing as a pipeline and build a complete monitoring system, so that problems are found before users complain
Dynamic self-check of message integrity to minimize data compensation and backfilling.

Stay tuned for the follow-up series of articles on upgrading the Xianyu client-side IM experience

Note: the update will be released in mid-November, please look forward to it