Author: Jiang Wenhao (Four)

Complexity of message client calculations

In client-side design there are generally at least two layers: a lower data service layer and an upper UI layer. The lower data model is determined mainly by the domain and is relatively independent and stable, while the UI is more changeable and may combine many kinds of data. Because the UI is relatively volatile and the model relatively stable, data has to be processed several times between the data layer and the UI before it can be presented. Simple cases, such as converting a raw timestamp into the string format the PD requires, are easy; but once the processing involves associating different data, paged loading and change calculation, the logic becomes complicated.

The messaging module is a rich client, so this part of the logic is very complex, and with state added on top it is arguably some of the most complex logic in the message client. The complexity shows up along the following dimensions:

  1. Local data: the client stores only part of the message data, so local data may be incomplete when it is fetched and the server must be requested asynchronously; the upper layer must also be able to specify the request strategy. As a result the interface cannot take a simple request-response form: it must be a streaming interface, with the data callbacks separated from the result callback and with multiple data callbacks, which increases the complexity of the processing logic;
  2. Change synchronization: besides active pulls, session and message data must support pushed changes, and every change must keep all data (caches and what the UI presents) consistent;
  3. Multiple data sources: for historical reasons the same kind of data (a session, for example) can come from several sources, so it must be requested several times, the multiple data callbacks merged, errors handled, and loading kept as fast as possible. Thanks to the efforts of many colleagues, the OpenIM and DT data sources have been removed from Taobao and Qianniu; today sessions and messages in Taobao and Qianniu still come from the BC, CC and IMBA sources.
  4. Aggregation of many kinds of data: the UI needs to aggregate sessions, messages, profiles (avatar and nickname), groups, group members and other business data, combining the relevant data according to different rules.
  5. Paged requests: the total amount of data is large and must be loaded through paging. Besides standard paged loading, we also need to support loading that starts from a given message in the middle, which introduces bidirectional paged loading as well as the state transitions and exception handling for entering and leaving this mid-list loading mode;
  6. Data merging: for business reasons messages have update and replace relationships between them (status updates for the logistics of the same order, for example), so newly pulled data may modify the state of existing messages rather than simply being appended to the head or tail; a new message can cause existing messages to be updated and to change position in the data structure;
  7. Complex data structures: messages have two UI forms, list and tree, and the state likewise has two forms. The change-calculation logic for these data structures is complicated, and the tree also has to support virtual-node calculation and dynamic structural change.

This logic touches all the core service data models in the message client: session, message, profile, group, group member and relationship. It amounts to roughly 25,000 lines of code, about 8% of the total message code, and is the core of data processing. Because these pieces of logic easily couple into higher-dimensional logic, expressed as large numbers of conditional branches and recursive nesting, the result is hard to write, hard to maintain, and adds a lot to the package size, so it needs to be abstracted and simplified.

Goals

  1. Build a further layer of abstraction over the different models, different interfaces and similar logic, to unify client data processing;
  2. Simplify the high-dimensional data-processing logic into a clearer processing model and reduce the code volume by 60%;
  3. Make data processing consistent across both ends.

Analysis of message data processing

The client is usually divided into three layers: a data service layer, a logic layer and a UI layer, and data acquisition and calculation are assigned to the logic layer. The problem is that the data service layer corresponds to the domain definitions and the UI layer corresponds to rendering, animation and interaction-event handling, so the logic layer easily becomes a glue layer: data requests, data conversion, context maintenance, asynchronous processing, change synchronization, recursive logic, state management, and everything else that belongs to neither of the other two layers gets thrown into it, and the logic layer becomes bloated.

The left side of the figure below shows the work involved in this process and its upstream and downstream, and the right side shows the data flow and calculation process for data pulling and change handling:

This shows that defining this part of data processing merely as "logic" is too broad and does not help targeted optimization, so a deeper analysis is needed.

After summarizing, decomposing, analyzing and synthesizing the six core data-processing flows (session, message, profile, group, group member and relationship), we can simplify data processing into the following steps:

  1. Request session/message data from each channel, merge the multiple result callbacks into one, process the multiple data callbacks, and request the profile, group, group member, relationship and business data corresponding to the sessions/messages;
  2. Establish associations between sessions/messages and profiles, groups, group members, relationship data and business data, generate aggregated data, and handle the dependencies, priorities and cache consistency among the aggregated data;
  3. Organize the data into an array/tree structure; support replacing, updating and merging existing data in the structure with newly requested data; support dynamic calculation of the tree structure and virtual nodes; and support partial UI updates;
  4. Respond to change events such as additions, deletions and modifications of the various data, compute the change and the result for each event, and keep the data consistent;
  5. When entering and leaving mid-list loading, keep the various data caches, associations and loading information correct;
  6. Support special logic, such as new data that is not sorted by time but is added directly to the head or tail;
  7. Around each piece of logic above, handle exceptions, timeouts, thread synchronization, time-to-screen optimization, logging, monitoring and so on.

We can classify this logic into two categories:

The logic as computation

Corresponding to processes 2, 3, 4, 5, 6 above.

If we treat this logic as a black box and look only at its input, output and function, we find that its core job is to convert various input data into specific output data, which corresponds exactly to the concept of computation:

Based on this, the calculation process can be formally abstracted as a function f, which gives us an abstraction of stateful calculation. As the figure above shows intuitively, the input parameters are the incoming data and the current state, and the output is the new state and the result:

f :: (Input, State1) -> (State2, Result)

Let us analyze the parameters and form of function f:

First, the Input can be pulled data, data changes such as additions and deletions, or explicit events such as "message read". We can unify all of these by defining insert, update and delete events, since logically every event must map uniquely onto these three (although in practice RemoveAll and Reload are also supported, because some services cannot compute change details).

Second, for a data-insert event the corresponding f must take the form \state -> insert someData into state; that is, the Input is already captured inside the implementation of f. Function f can therefore be simplified further to:

f :: (State1) -> (State2, Result)

The concrete form of f is determined by the event, which yields a very simple functional abstraction.

Third, from the analysis above we can infer that events and functions are equivalent and can be converted into each other. This means event handling can be implemented as function handling, and the data-processing pipeline can then be optimized for computational performance; breaking the boundary between data and procedure gives us greater power.
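To make this concrete, here is a minimal Swift sketch of the event-to-function conversion. The Message, Event, ChangeResult and transition names are illustrative assumptions rather than the actual client API: each unified event (insert / update / delete, plus a reload fallback) is turned into a closure of the simplified form f :: State -> (State, Result), with the input data captured inside the closure.

struct Message { let id: String; var content: String; var timestamp: Int64 }

// Unified change events; reload is the fallback for sources that cannot
// compute change details (the RemoveAll/Reload case mentioned above).
enum Event {
    case insert([Message])
    case update([Message])
    case delete([String])          // message ids
    case reload([Message])
}

struct ChangeResult {
    var inserted: [Message] = []
    var updated: [Message] = []
    var deletedIDs: [String] = []
}

typealias State = [String: Message]                       // keyed by message id
typealias Transition = (State) -> (State, ChangeResult)   // f :: State -> (State, Result)

// The event is converted into a function: events and functions are interchangeable.
func transition(for event: Event) -> Transition {
    switch event {
    case .insert(let items):
        return { state in
            var next = state
            items.forEach { next[$0.id] = $0 }
            return (next, ChangeResult(inserted: items))
        }
    case .update(let items):
        return { state in
            var next = state
            items.forEach { next[$0.id] = $0 }
            return (next, ChangeResult(updated: items))
        }
    case .delete(let ids):
        return { state in
            var next = state
            ids.forEach { next.removeValue(forKey: $0) }
            return (next, ChangeResult(deletedIDs: ids))
        }
    case .reload(let items):
        return { _ in
            let next = Dictionary(uniqueKeysWithValues: items.map { ($0.id, $0) })
            return (next, ChangeResult(inserted: items))
        }
    }
}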

Fourthly, the State parameter needs to contain the aggregated data, so data association must be handled. In general, a data-association scene can be abstracted as one piece of master data corresponding to several pieces of sub data, and the association can be decided by defining a pair function:

pair :: (mainData, subData) -> Bool

By injecting a pair function we can implement the association between master and sub data, and then aggregate the associated data together.
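As an illustration, the sketch below shows how a pair predicate might be injected to associate master data with sub data and build aggregated data. Conversation, Profile, AggregatedConversation and aggregate are hypothetical names used only for this example.

struct Conversation { let id: String; let peerUserID: String }
struct Profile { let userID: String; let nickname: String; let avatarURL: String }
struct AggregatedConversation { let conversation: Conversation; let profile: Profile? }

// pair :: (mainData, subData) -> Bool
typealias Pair<Main, Sub> = (Main, Sub) -> Bool

// Associate each piece of master data with its first matching sub data and combine them.
func aggregate<Main, Sub, Out>(mains: [Main],
                               subs: [Sub],
                               pair: Pair<Main, Sub>,
                               combine: (Main, Sub?) -> Out) -> [Out] {
    mains.map { main in combine(main, subs.first { pair(main, $0) }) }
}

// Usage: inject the pair predicate for the conversation/profile scene.
let conversations = [Conversation(id: "c1", peerUserID: "u1")]
let profiles = [Profile(userID: "u1", nickname: "Alice", avatarURL: "https://example.com/a.png")]
let aggregated = aggregate(mains: conversations,
                           subs: profiles,
                           pair: { $0.peerUserID == $1.userID },
                           combine: AggregatedConversation.init)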

Fifth, State also involves array/tree structure calculation, which differs from scene to scene. This can be abstracted as a DataStructure with add, delete, update and query interfaces, so that different DataStructure implementations can be used in different scenes.
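A DataStructure abstraction along these lines might look like the following sketch (the protocol shape and the SortedList type are assumptions for illustration): the same add / delete / update / query surface is implemented once by a flat sorted list, and in the tree scene it would be implemented by a tree with virtual nodes, so the calculation code never depends on the concrete shape.

struct Node { let id: String; var sortKey: Int64 }

// Unified add / delete / update / query surface over the master data.
protocol DataStructure {
    mutating func add(_ items: [Node]) -> [String]      // returns ids of changed entries
    mutating func update(_ items: [Node]) -> [String]
    mutating func delete(_ ids: [String]) -> [String]
    func query(_ id: String) -> Node?
    func snapshot() -> [Node]
}

// List form: keep nodes sorted by sortKey; a tree form with virtual nodes
// would conform to the same protocol and be chosen per scene.
struct SortedList: DataStructure {
    private var nodes: [Node] = []

    mutating func add(_ items: [Node]) -> [String] {
        nodes.append(contentsOf: items)
        nodes.sort { $0.sortKey < $1.sortKey }
        return items.map(\.id)
    }
    mutating func update(_ items: [Node]) -> [String] {
        for item in items {
            if let i = nodes.firstIndex(where: { $0.id == item.id }) { nodes[i] = item }
        }
        nodes.sort { $0.sortKey < $1.sortKey }
        return items.map(\.id)
    }
    mutating func delete(_ ids: [String]) -> [String] {
        nodes.removeAll { ids.contains($0.id) }
        return ids
    }
    func query(_ id: String) -> Node? { nodes.first { $0.id == id } }
    func snapshot() -> [Node] { nodes }
}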

The logic as structured data acquisition

Corresponding to procedures 1 and 6 above.

The job of this part of the logic is to obtain data such as sessions, messages and profiles and to listen for change events. Since the interfaces of the six core services differ, the previous implementation wired each of them up one by one. With this abstraction, we define an Inject that exposes a pull interface and a change-event interface; this is standard practice and will not be described further.

The second characteristic of this data acquisition is the parallel dispatch and vertical (sequential) composition of requests. For example, when there are multiple channels, the data request must be dispatched to each channel in parallel; within a channel, requests are issued step by step according to the request strategy, with each step's callback data feeding into the next request. (This differs from a standard Future/Promise: there, the tasks of the individual steps differ and the next step needs the previous step's data; here, the logic of every step is the same and only the source differs, for example the previous step is local and the next one remote, so it can be simpler than Future/Promise.)

Without abstraction this is at least three-dimensional logic: fetch the master data in multiple steps over multiple channels, and then fetch the sub data for that master data, which is very complex to write out by hand. The key observation is that each channel request and each step request is very similar; what differs is mainly the structure formed by the multiple requests, and that structure is determined by the parameters and the data. We can therefore call this structured data acquisition, and simplify it by abstracting the request structure.

Core functions can be defined for the parallel and vertical composition of structured data-acquisition tasks:

dispatch :: [param] -> [task]

compose :: strategy -> task

The dispatch function corresponds to flatMap in Rx. However, since our mobile iOS client does not integrate RxSwift or OpenCombine, and the official Combine framework is only available from iOS 13, we had to implement a lightweight equivalent ourselves.

By structuring tasks with dispatch and compose, task acquisition is abstracted and simplified.
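The sketch below shows one lightweight way such helpers could look without Rx; StreamTask, merge and the callback shapes are assumptions, not the actual implementation. A task streams data callbacks and then signals completion; dispatch fans a parameter list out into parallel per-channel tasks; compose chains the steps of one channel (local first, then remote, for example) so that every step's data flows downstream; merge joins the parallel channel tasks into a single streaming task.

// A streaming task: may call back data several times before completing.
typealias StreamTask<T> = (_ onData: @escaping ([T]) -> Void,
                           _ onCompleted: @escaping () -> Void) -> Void

// dispatch :: [param] -> [task]: one task per channel, to be run in parallel.
func dispatch<P, T>(_ params: [P], makeTask: (P) -> StreamTask<T>) -> [StreamTask<T>] {
    params.map(makeTask)
}

// compose :: strategy -> task: here simplified to a step list derived from the
// strategy; steps run one after another and every step streams data downstream.
func compose<T>(steps: [StreamTask<T>]) -> StreamTask<T> {
    return { onData, onCompleted in
        func run(_ index: Int) {
            guard index < steps.count else { onCompleted(); return }
            steps[index](onData) { run(index + 1) }
        }
        run(0)
    }
}

// Merge the parallel channel tasks into one streaming task whose completion
// fires after every channel has completed (a real version would also
// synchronize the counter and handle errors and timeouts).
func merge<T>(_ tasks: [StreamTask<T>]) -> StreamTask<T> {
    return { onData, onCompleted in
        var remaining = tasks.count
        guard remaining > 0 else { onCompleted(); return }
        for task in tasks {
            task(onData) {
                remaining -= 1
                if remaining == 0 { onCompleted() }
            }
        }
    }
}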

Technical solution

Core Technical Solution

Core modules:

  1. MergeDispatcher: structures data acquisition, unifies pulled data and changes into change events, and handles all exceptions;
  2. Calculator: implements the association and aggregation of master and sub data, thread synchronization of the calculation, and change reporting;
  3. DataStructure: performs the structural calculation on the master data.

In addition, Inject provides the request interface and change events for the calculation and is the injection point for all data; through ModelService, the upper layer obtains the data structure composed of the calculated aggregated data, together with the change events.
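As an illustration of this injection point, an Inject might expose roughly the following surface (a sketch with assumed names; the real interfaces differ per service): a streaming load for pulled data and an observer for change events, which MergeDispatcher and the Calculator then consume.

// One data source (BC, CC, IMBA, profile, group, ...) is wrapped as an Inject.
protocol Inject {
    associatedtype Item

    // Streaming pull: data may be called back several times before completion,
    // and the upper layer can pass a request strategy through the parameters.
    func load(param: [String: Any],
              onData: @escaping ([Item]) -> Void,
              onCompleted: @escaping (Error?) -> Void)

    // Change push from the underlying service (unified into insert/update/delete upstream).
    func observeChanges(_ handler: @escaping ([Item]) -> Void)
}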

Call relationships and data flow

ModelService initializes the DataStructure, the Calculator, the master-data and auxiliary-data Injects, and the MergeDispatcher.

  1. When the UI needs data, it calls the load interface of ModelService;
  2. ModelService calls MergeDispatcher's load interface directly;
  3. MergeDispatcher invokes the load interface of the master-data Inject in parallel and, each time master data is called back, invokes the load interface of the auxiliary-data Inject to request the auxiliary data. The timeout logic appropriate to the scene is executed, and the master data and auxiliary data are handed to the Calculator; data that arrives after a timeout is still handed to the Calculator as well;
  4. The Calculator serializes the calculation across threads, updates the caches of master data, auxiliary data and associations, generates the aggregated data, feeds the master data to the DataStructure to compute the structure, and then reports the returned full data and changes;
  5. On receiving the data, the DataStructure applies the additions, deletions and updates to the current state and returns the new state and an array of changes.
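Steps 3 to 5 can be compressed into a sketch like the one below (names and shapes are assumptions, not the production code): the Calculator serializes all calculation on one queue, updates its caches, aggregates master data with sub data through the injected pair predicate, and reports the result; in the real flow the aggregated master data is then fed to the DataStructure, which returns the new state and the change array.

import Foundation

final class Calculator<Main, Sub, Aggregated> {
    private let queue = DispatchQueue(label: "message.calculator")   // serializes all calculation
    private var mainCache: [String: Main] = [:]
    private var subCache: [Sub] = []

    private let mainID: (Main) -> String
    private let pair: (Main, Sub) -> Bool            // injected association predicate
    private let combine: (Main, [Sub]) -> Aggregated // builds the aggregated data
    var onChanged: (([Aggregated]) -> Void)?

    init(mainID: @escaping (Main) -> String,
         pair: @escaping (Main, Sub) -> Bool,
         combine: @escaping (Main, [Sub]) -> Aggregated) {
        self.mainID = mainID
        self.pair = pair
        self.combine = combine
    }

    // Called by MergeDispatcher with master data and the sub data it requested.
    func handle(mains: [Main], subs: [Sub]) {
        queue.async {
            // 1. Update the master-data and sub-data caches.
            mains.forEach { self.mainCache[self.mainID($0)] = $0 }
            self.subCache.append(contentsOf: subs)
            // 2. Aggregate each piece of master data with its associated sub data.
            let aggregated = self.mainCache.values.map { main in
                self.combine(main, self.subCache.filter { self.pair(main, $0) })
            }
            // 3. Hand off: in the real flow this goes to the DataStructure, which
            //    returns the new state and a change array; here we just report it.
            self.onChanged?(aggregated)
        }
    }
}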

Technical effect

In the end we separated calculation from data acquisition: the whole calculation process lives in the Calculator and data acquisition mainly in the MergeDispatcher, and the two parts are implemented independently without coupling. The logical complexity was reduced from (number of models × number of interfaces × number of data structures) to (number of events × number of data structures). The processing model is very clear and applies to any model.

For the calculation process we abstracted a higher-order calculation function, f :: (State1) -> (State2, Result). The function is very simple in form, yet it tightly captures the essence of this kind of complex state calculation, allowing us to unify the calculation process. The correctness of the whole calculation process rests on a complete theoretical basis, and adding models later does not add calculation logic.

For data acquisition, we classified data into master data and auxiliary data and abstracted the structure of the requests, unifying and simplifying all data acquisition.
