WeTest takeaway

Improving a metric always goes hand in hand with implementing the corresponding optimization strategies. Optimization is not a one-step task but a process of continuous iteration and revision. The deeper optimization schemes usually emerge, under a particular way of thinking, only after a thorough understanding of the problem scenario and of the strengths and weaknesses of the basic strategies, as the best trade-off available in the current situation. In this article, I start from one concrete topic, front-end optimization for high concurrency, and walk through some of my thinking about the “art” of optimization. I hope it resonates with you.

 

Background: I chose high concurrency as the theme for two reasons. First, I have been responsible for some ultra-high-concurrency businesses (such as Mobile QQ red packets), so I have hands-on experience in this area. Second, unlike the optimization of business function logic, which is a lightweight front-end concern designed in collaboration with product managers and many other people (so the logic is never purely in the front-end engineer's mind), front-end high-concurrency logic is fully owned by front-end engineers. As a front-end engineer myself, the views I can draw from it are perhaps deeper and more general.

 

First, the general optimization approach

When you receive a task to “optimize a metric”, two things usually get done: analyzing the pain points behind the metric, then finding and implementing technical solutions for those pain points. Is that enough? My answer is no. In my view, this is only the first layer of optimization. Better technology does improve results, but the outcome is not yet the best achievable. So what is missing? I will explain my optimization approach step by step. The general approach is the foundation, so let's start with it: what are the basic front-end high-concurrency strategies?

 

Second, analyzing the essential pain point

The core difference between a high-concurrency scenario and a normal one is that concurrent traffic surges. A front-end high-concurrency strategy is therefore essentially a way to solve the problems caused by that surge. So what exactly are those problems?

Let's start with a normal H5 access flow diagram:

[Figure: H5 access flow under normal traffic]

Under normal circumstances, the data flow between the client and the backend is balanced: the volume of user requests stays within what the backend can bear.

 

In a high-concurrency scenario with no high-concurrency strategy in place, the access flow diagram looks like this instead (the requests in red are rejected by the backend and may even bring it down):

[Figure: H5 access flow under high concurrency, excess requests rejected by the backend]

The figure makes the pain point of high concurrency clear: the data flow is unbalanced between the two ends. Addressing it means bringing the two ends back into balance, and we can work from both sides. On the backend side, provide as much carrying capacity as possible (for example, by adding machines). On the front-end side, strengthen as much as possible its ability to streamline and filter traffic, acting as the “gate” between the user and the backend.

 

After strengthening the streamlining and filtering capability of the front-end “gate”, the access flow diagram we expect looks like this:

[Figure: H5 access flow with front-end streamlining and filtering in place]

Although the number of concurrent users is still very large, the front-end high-concurrency strategies resolve the imbalance between the two ends. So what are these strategies? Let's look for them one by one.

 

Third, finding feasible technical solutions

The front-end “gate” needs to be strengthened in two respects: streamlining and filtering.

 

Streamlining

First, let's look at the technical solutions for streamlining. If the carrying capacity of the backend is a pipe with a fixed cross-section, then the water flowing through it is the requests from H5 pages. The pipe actually has two such fixed limits: maximum concurrency and maximum throughput, which correspond to the number and the size of our parallel requests. Reducing either lets more users' “water” flow in and out while the cross-section stays fixed.

Therefore, the streamlining solutions must reduce both the number and the size of parallel requests.

1. Reduce the number of requests

When the number of requests can no longer be reduced logically (for example, by removing useless requests), we turn to purely technical solutions.

The current general schemes for reducing the number of H5 requests are listed below; the core idea is merging.

[Figure: common H5 resource types and the merging techniques applicable to each]

The figure lists the resource types commonly used in H5 (there are others, such as video and audio, which I will not go into). As you can see, with the techniques listed, the number of requests can be reduced to the extreme: in the most extreme case, a business can be served by a single request.
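
To make the merging idea concrete, here is a minimal build-time sketch in Node/TypeScript (file names and paths are hypothetical; a real project would use a bundler such as webpack or rollup):

```ts
// A minimal merging sketch: several JS requests become one bundle,
// and a small image request disappears into the CSS as a data URI.
import * as fs from "fs";
import * as path from "path";

// Merge several JS files into one bundle: N requests become 1.
const jsFiles = ["a.js", "b.js", "c.js"]; // hypothetical module files
const bundle = jsFiles
  .map((f) => fs.readFileSync(path.join("src", f), "utf8"))
  .join("\n;\n");
fs.writeFileSync("dist/bundle.js", bundle);

// Inline a small icon into CSS as base64: the image request disappears.
const icon = fs.readFileSync("src/icon.png").toString("base64");
const css = `.icon { background: url(data:image/png;base64,${icon}); }`;
fs.writeFileSync("dist/style.css", css);
```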

 

2. Reduce the request size

Similarly, when the size of a request can no longer be reduced logically (for example, by removing unused features and code), we turn to purely technical solutions. The current general schemes for reducing H5 request size are listed below; the core idea is compression.

[Figure: common H5 resource types and the compression techniques applicable to each]

As you can see, with current technology, each resource type can be compressed further to reduce the request size.
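
As an illustration of transport-level compression, here is a minimal Node/TypeScript sketch that gzips a response when the client supports it (file paths are hypothetical; in production this is usually configured in the web server or CDN rather than hand-written):

```ts
// Serve a (already minified) bundle, gzipping it for clients that accept gzip.
import * as http from "http";
import * as zlib from "zlib";
import * as fs from "fs";

http
  .createServer((req, res) => {
    const body = fs.readFileSync("dist/bundle.js"); // hypothetical minified bundle
    // Only compress when the client advertises gzip support.
    if (/\bgzip\b/.test(String(req.headers["accept-encoding"] || ""))) {
      res.writeHead(200, {
        "Content-Type": "application/javascript",
        "Content-Encoding": "gzip",
      });
      res.end(zlib.gzipSync(body));
    } else {
      res.writeHead(200, { "Content-Type": "application/javascript" });
      res.end(body);
    }
  })
  .listen(8080);
```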

 

Filtering

Those are the technical solutions for the streamlining capability of the front-end “gate”; now let's look at its filtering capability. Staying with the water-pipe analogy, front-end filtering is like adding a mesh at the “gate” that bounces back water that does not need to get in (I am not sure the water analogy is perfect, but it conveys the idea). There are two ways to bounce the “water” back. One is passive: only let a certain amount of water through and keep the excess out. This strategy is generally applied in the backend and is called “overload protection”. The other is active: move the data further forward, at the expense of timeliness. At the front-end level this is usually “local caching”: when a request is made and the content is found in the front-end cache, the server no longer needs to be reached.

 

Therefore, the filtering capability on the front end can be implemented through caching.

 

1. Filter requests through caching

The current general schemes for filtering H5 requests are listed below; the core idea is caching.

[Figure: common front-end caching techniques]

With these front-end caching techniques, requests that would otherwise reach the backend are served directly from the front-end cache, achieving the “filtering” effect.
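
To make the “active filtering” idea concrete, here is a minimal sketch of a localStorage-backed fetch wrapper with a TTL; the helper name and key scheme are my own illustration, not part of the original scheme:

```ts
// Cache hits never leave the device, so they never reach the backend.
async function cachedFetch(url: string, ttlMs: number): Promise<string> {
  const key = "cache:" + url;
  const raw = localStorage.getItem(key);
  if (raw) {
    const { expires, data } = JSON.parse(raw);
    if (Date.now() < expires) return data; // hit: request filtered out
  }
  const data = await (await fetch(url)).text(); // miss: go to the backend
  localStorage.setItem(
    key,
    JSON.stringify({ expires: Date.now() + ttlMs, data })
  );
  return data;
}
```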

 

Fourth, the basic strategies under the general approach

After completing the two steps above, analyzing the essential pain point and finding feasible technical solutions, the common practice is to pick appropriate solutions and apply them to the project. For merging, we merge files of the same type; for compression, we compress all uncompressed code; for caching, we enable long HTTP caching across the board, use localStorage caching, and use offline packages. This overall strategy is effective to a degree, but I think it is often not enough. A more thorough optimization requires deeper thinking about both the schemes and the scenarios they are applied to, and adjusting the strategy accordingly, which is usually driven by the right thinking patterns. Below I share some of the thinking I have summarized for deeper optimization, focusing on differentiation thinking.

Differentiation thinking

Differentiation thinking means decomposing both the technology and the scenario after understanding them deeply, so that each distinct scenario can be optimized further with the technique that suits it best.

 

From the previous two steps, analyzing the essential pain point and finding feasible technical solutions, we learned that high concurrency can be addressed at the front-end technical level through merging, compression, and caching. Obviously, the more thoroughly these strategies are applied, the more concurrency the front-end layer can absorb. In practice, however, we often cannot push them to the limit and have to settle for a more moderate scheme.

 

For example, we do not merge an entire H5 project's resources into a single request, because of the impact on page load time. The underlying reason is that every purely technical strategy brings disadvantages along with its advantages; as the saying goes, every gain has its loss. When a disadvantage starts to affect a core capability of the project, such as page load time on the experience side, even a scheme that would improve concurrency is often rejected after weighing the pros and cons. This is why I said earlier that settling for a compromise is not thorough enough.

 

After weighing the pros and cons, we usually end up choosing a compromise (strategy 3 in the figure below):

[Figure: a single compromise strategy 3 applied to the whole project]

A more thorough optimization should work like this: understand the impact of the disadvantages of each scheme, decompose the project scenarios according to that impact, and differentiate the strategy by each scenario's tolerance for it. Scenarios that can accept the disadvantage get the optimal scheme; scenarios that accept it less get the next-best scheme; and so on down to the compromise. This achieves fine-grained, differentiated optimization. After such optimization the overall strategy looks like the figure below: where the project originally used only compromise strategy 3, after differentiated treatment some modules are carved out and use the better strategies 1 and 2.

[Figure: differentiated strategies 1, 2, and 3 applied per module after decomposition]

Next, I will apply this thinking to further differentiate and optimize the three front-end high-concurrency strategies from the previous steps: merging, compression, and caching.

 

Differentiated merging strategies

Merging code has disadvantages, and past a certain point they become magnified.

The disadvantages are: a single request becomes too large, which delays first-screen rendering; and merging dynamic and static requests (CGI + HTML) greatly raises the cache-timeliness requirement, because the effective cache lifetime is set by the most demanding resource in the merge (the bucket principle).

Based on the impact of each disadvantage, here is a differentiated analysis of the specific scenarios.

 

1. Decomposing by “relevance to the first-screen experience”

If a merged file is too large, it hurts first-screen rendering time. Starting from that impact point, we can split the page's network resources by their relevance to the first-screen experience, minimizing the impact of merging on the user experience. At its simplest, resources divide into two groups: high-relevance (first-screen) and low-relevance (non-first-screen). Each group is then merged as aggressively as possible on its own, maximizing concurrency capacity; if a merged file in one group again grows large enough to affect rendering time, subdivide within that group, and so on. For example, the CSS, the first-screen-related JS, and the first-screen images can be treated as high-relevance resources: embed the images into the CSS as base64, inline all of it into the HTML, and merge it with the page. JS and images unrelated to the first screen are merged separately as low-relevance resources, as in the sketch below. This keeps first-screen rendering time unaffected while minimizing the number of concurrent requests.
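
A minimal sketch of the split, assuming hypothetical bundle names: critical resources are inlined in the HTML, and low-relevance scripts are injected only after the first screen has loaded:

```ts
// Inject a low-relevance (non-first-screen) script without blocking rendering.
function loadLowPriorityScript(src: string): void {
  const s = document.createElement("script");
  s.src = src;
  s.async = true;
  document.body.appendChild(s);
}

// Wait for the first screen before fetching non-critical resources
// ("report.js" and "share.js" are hypothetical non-first-screen bundles).
window.addEventListener("load", () => {
  loadLowPriorityScript("//cdn.example.com/report.js");
  loadLowPriorityScript("//cdn.example.com/share.js");
});
```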

 

2. Decomposing by “timeliness requirements of resources”

Merging dynamic and static requests (CGI + HTML) raises the cache-timeliness requirement. Again starting from the impact point, we can split pages by how time-sensitive they are. At its simplest, pages divide into two categories: high-timeliness pages (the entry points are uncontrollable, so the page cannot be cached) and low-timeliness pages (the entry points are fully controllable, so the page can be cached and updated through, say, an offline-package update). For high-timeliness pages, merging the dynamic and static requests has no cache downside, so CGI and HTML can be merged, as sketched below. For low-timeliness pages that can be cached (for example, via offline packages), CGI and HTML are not merged.
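
Here is a minimal sketch of merging the dynamic and static requests: the server injects the first-screen data into the HTML template before responding, so the page and its data cost one request instead of two (template path, placeholder, and data shape are hypothetical):

```ts
// Merge CGI + HTML: the data rides inside the page response.
import * as http from "http";
import * as fs from "fs";

http
  .createServer((_req, res) => {
    const template = fs.readFileSync("dist/index.html", "utf8");
    const data = { activityId: 42, stock: 100 }; // would come from the backend service
    const html = template.replace(
      "<!--INITIAL_STATE-->",
      `<script>window.__INITIAL_STATE__=${JSON.stringify(data)}</script>`
    );
    res.writeHead(200, { "Content-Type": "text/html" });
    res.end(html);
  })
  .listen(8080);
```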

 

Applying the best merging strategy differentially to each specific scenario improves the optimization result further.

 

Differentiated compression strategies

Code compression, too, has its disadvantages: the heavier the compression, the less readable the code, which makes it harder to diagnose problems in production; and although better compression algorithms exist, each algorithm has its own limitations.

1. Decomposing by “readability requirements of resources”

For readability, solutions already exist at the code level, such as supporting a debug-mode switch in the project. This solution is itself a form of differentiation thinking: usage scenarios are split into those with high readability requirements and those with low ones. Online code has low readability requirements and uses the extremely compressed build; code in debug mode has high readability requirements and uses the uncompressed build; a parameter switches between the two modes, as in the sketch below. Source maps are another common answer.
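
A minimal sketch of such a switch, with hypothetical file names:

```ts
// Online users get the fully minified bundle; appending ?debug=1 to the URL
// loads the readable, uncompressed build instead.
const isDebug = /[?&]debug=1\b/.test(location.search);
const script = document.createElement("script");
script.src = isDebug ? "/static/app.debug.js" : "/static/app.min.js";
document.head.appendChild(script);
```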

 

2. Decomposing by “platform support for resources”

Each compression algorithm (or rather each of its output formats; image resources, for example, come in many formats, each with its own limitations) has platforms it does not support, and on those platforms the resource simply cannot be used. Starting from this impact point, we can split network resources by platform support. Sort the candidate schemes by compression effectiveness, then test platform support from best to worst: if the platform supports the current algorithm (format), use it; otherwise fall through to the next one.

 

Take image resources as an example. Image formats are rich and varied; the variety comes from the different compression algorithms behind each format, and each format has the content it handles best. Instead of settling on the single most widely supported format, we should load images using the differentiation idea above. Ordered by compression effectiveness: platforms that support TPG (known inside the company as SHARPP) request TPG images; platforms that do not are tested for WebP support, and those that support it request WebP; failing that, fall through again. As for each format's strengths, use JPG for colour-rich photos and PNG for images with simple colours or an alpha channel, choosing the most suitable format per image. Even image dimensions can be differentiated (dimensions have nothing to do with compression, but the goal of reducing request size is the same, hence the analogy): for example, return the image size best matched to the client's resolution, so that high-resolution clients get high-resolution images and low-resolution clients get low-resolution ones, as in the sketch below.
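
A minimal sketch of the format fall-through, assuming a hypothetical URL scheme and omitting the Tencent-internal TPG/SHARPP probe:

```ts
// Probe WebP support once: a browser that can encode WebP reports a
// "data:image/webp" prefix from canvas.toDataURL.
const supportsWebP: boolean = (() => {
  const canvas = document.createElement("canvas");
  canvas.width = canvas.height = 1;
  return canvas.toDataURL("image/webp").indexOf("data:image/webp") === 0;
})();

function imageUrl(name: string): string {
  const format = supportsWebP ? "webp" : "jpg"; // fall through the format list
  const width = Math.min(Math.ceil(devicePixelRatio || 1), 3) * 100; // size by DPR
  return `//img.example.com/${name}_${width}w.${format}`;
}
```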

 

Applying the best compression strategy differentially to each specific scenario improves the optimization result further.

 

Differentiated caching strategies

Similarly, caching has its disadvantages: the longer the cache lifetime, the staler the data, so a cache entry can still be “valid” while differing substantially from the latest data.

 

1. Decomposing by “timeliness requirements of resources”

Given that data freshness constrains cache lifetime, we can grade resources by timeliness. Roughly, resources divide into update-controllable and update-uncontrollable ones, where “controllable” means the page can perceive an update in real time after the resource changes. JS, CSS, images, and other resources deployed by front-end developers count as update-controllable and can be cached for extremely long periods, because when such a resource is updated its version information can be synchronized to the page in real time (for example, by changing the file name or a timestamp in the URL, as sketched below). Resources whose version information cannot be synchronized to the page in real time count as update-uncontrollable; for these, we grade each resource by the business's timeliness requirements.
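
A minimal sketch of the long-cache treatment for update-controllable resources, using content-hashed file names (paths and the hash convention are hypothetical):

```ts
// Files whose names carry a content hash can be cached "forever": a new
// release ships under a new name, so stale cache entries are never referenced.
import * as http from "http";
import * as fs from "fs";

http
  .createServer((req, res) => {
    if (/^\/static\/.+\.[0-9a-f]{8}\.(js|css|png)$/.test(req.url || "")) {
      res.writeHead(200, {
        "Cache-Control": "public, max-age=31536000, immutable",
      });
      res.end(fs.readFileSync("dist" + req.url));
    } else {
      res.writeHead(404);
      res.end();
    }
  })
  .listen(8080);
```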

 

Take the QQ avatar resources used by H5 projects inside Mobile QQ as an example. Avatars look like update-uncontrollable resources from a project's point of view: after a user changes their avatar in Mobile QQ or PC QQ, no H5 project perceives the change, and H5 receives no real-time update notification (unless both sides build a synchronization channel at the interface level). If avatars are cached for a long time, a user who updates their avatar will keep seeing the old one in the H5 project; yet with no caching at all, the avatar server comes under enormous pressure in high-concurrency scenarios. This is where the update-uncontrollable resources need a further differentiated decomposition.

 

For H5 projects in Mobile QQ with strong social characteristics (Mobile QQ red packets, Mobile QQ AA collections, and the like), there are many avatars on screen, but their timeliness requirements differ. At its simplest, avatars divide into two categories, high- and low-timeliness. A little analysis shows that users are most sensitive to changes of their own (host-state) avatar: someone who changes their avatar in Mobile QQ or PC QQ and then enters the H5 page will not tolerate still seeing the old one. So the user's own avatar counts as high-timeliness. By contrast, users rarely care whether other users' (guest-state) avatars are up to date, so those count as low-timeliness. The strategy, then, is to cache high-timeliness avatars briefly and low-timeliness avatars for a relatively long time. (It is also easy to implement: put the differentiation logic up front and use a timestamp to control the cache lifetime, as in the sketch below.)
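
A minimal sketch of the timestamp trick, with hypothetical domain and parameter names:

```ts
// The cache lifetime is encoded in the URL as a timestamp rounded to a
// bucket: short for the user's own (host-state) avatar, long for other
// users' (guest-state) avatars.
function avatarUrl(uin: string, isSelf: boolean): string {
  const bucketMs = isSelf ? 5 * 60 * 1000 : 24 * 60 * 60 * 1000; // 5 min vs 1 day
  const version = Math.floor(Date.now() / bucketMs);
  // Same URL within a bucket -> browser cache hit; new bucket -> refetch.
  return `//avatar.example.com/${uin}?t=${version}`;
}
```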

 

Applying the best caching strategy differentially to each specific scenario improves the optimization result further.

 

Fifth, optimization in more “dimensions”

Guided by differentiation thinking, the high-concurrency optimization strategy has been improved a step further. The core of this thinking is to weigh a scheme's advantages and disadvantages against the actual scenario. It is a universal pattern, applicable to many things at work and by no means limited to front-end high concurrency. At the same time, differentiation thinking alone does not solve every problem perfectly; it is just one pattern among many. An excellent optimization scheme is usually the final product of thinking and trade-offs across multiple dimensions.

 

Boundary-magnifying thinking

Boundary-magnifying thinking, for example, means that when we work on something, our vision should not stay only within the field we fully control; we should enlarge the boundary and look for solutions from a wider, more peripheral view.

 

As mentioned above, the current H5 caching strategy has a drawback that calls for boundary-magnifying thinking: browser caching has an inherent limitation in that its effectiveness depends on how often users make repeat visits. The heart of the problem is that the moment of caching is coupled to the user's first visit. As a result, in some high-concurrency H5 activities (which, unlike ordinary business H5 pages, are visited mostly by first-time users), caching helps far less than one might expect.

 

This cannot be solved at the level of pure front-end technology. But if we widen our view to the platform hosting the H5 page, the problem becomes solvable. Its heart is decoupling resource caching from page access, which the hosting platform (especially the client terminal) is well placed to do. In Mobile QQ this solution is called the “offline package”. Offline packages support both passive and active caching: page content can be cached onto the user's mobile client through pre-download or active push, with no visit required, so even first-time users can hit the cache directly. This greatly improves cache effectiveness. During the Spring Festival, every high-concurrency H5 page in Mobile QQ used this technique to improve its concurrency capacity.

 

Logical completeness thinking

Another example is logical completeness thinking: when we work on something, our vision should not stay on part of the logic but should cover all of it. If the logic has normal and abnormal states, we cannot consider only the normal ones; if the logic is bidirectional, we cannot consider only the forward direction.

 

In fact, all the front-end high-concurrency strategies above (including the figures) consider only the forward segment of the data flow, from client to server, and ignore the reverse segment, from server back to client. In high-concurrency scenarios the reverse segment is often a critical part of the logic: however well the forward path is optimized, it is only half the battle without a high-concurrency strategy for the reverse path. Here is the full logical process of the data flow (the red part is the reverse segment):

[Figure: full data-flow loop, with the server-to-client reverse segment highlighted in red]

On the reverse segment, the front end plays the role of data receiver, and the data it receives can arrive in various states that it must handle accordingly. Under high concurrency, an overloaded backend will return some data as errors. At its simplest, the states divide into two: success and failure. On success the page renders normally; on failure the experience should degrade gracefully rather than become completely unusable. For example, if a static-resource CDN request fails, the front end can add a layer of exception handling: temporarily switch the affected user's static-resource domain to a backup domain (such as the page's own domain or a standby one). This degrades what would have been a blank, unusable screen into a merely slower experience, and buys time to expand CDN capacity when the error rate crosses a threshold; a sketch follows below. Combined with the differentiation thinking above, we can go further: the server can report its overall status (such as its current load) through a configured return strategy, and the front end can treat these states differentially and degrade step by step. For example, when CGI concurrency exceeds a certain limit, the front end can progressively hide the entry points of non-core but high-traffic CGI pages, eventually keeping only the core CGI entry points, so that the project's core function is unaffected by high concurrency.
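
A minimal sketch of the fallback, with hypothetical domains:

```ts
// If a static resource fails to load from the primary CDN, retry it once
// from a backup domain so the page degrades to "slower" instead of "broken".
const BACKUP_HOST = "backup-static.example.com";

function loadScriptWithFallback(src: string): void {
  const s = document.createElement("script");
  s.src = src;
  s.onerror = () => {
    s.remove();
    const retry = document.createElement("script");
    retry.src = src.replace(/^(https?:)?\/\/[^/]+/, "//" + BACKUP_HOST);
    document.head.appendChild(retry);
  };
  document.head.appendChild(s);
}

loadScriptWithFallback("//cdn.example.com/static/app.min.js");
```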

 

Conclusion

This article collects some of the thinking from my four years of front-end work. Starting from one concrete topic, front-end high-concurrency strategy, it walks through my thoughts on both the “technique” and the “thinking” behind it, step by step. My views are inevitably limited by my own experience and vision; I hope the strategies and thinking patterns here give you, the reader, some useful insight. The article is long and text-heavy, so thank you for your patience in reading it!


Note:

Author: Xu Jiawei, senior engineer, Web front-end development, Tencent

Original: http://www.cnblogs.com/wetest/p/7069005.html