This talk comes from the Front-end Early Chat conference, a new starting point for front-end growth, held jointly with Juejin (Nuggets).






The text is as follows

This article is the 18th share in Front-end Early Chat's performance-optimization special (and the 123rd Early Chat overall), given by Haiyu from Alibaba UC.

I am Lin Jiewei from the UC kernel team; Haiyu is my Alibaba alias ("flower name"). Today I'm going to talk about how to rethink performance optimization and how to measure it.

This topic is divided into three parts. The first part introduces the development of performance optimization techniques and standards. The second part covers user-centric metrics, and I’ll end with some lessons on how to optimize performance.

Before we get started, let’s look at some terminology:

  1. PWA (Progressive Web App): an approach that aims to enhance Web capabilities, close the gap with native applications, and deliver a comparable user experience.
  2. Lighthouse: an automated auditing tool that measures page performance and recommends improvements.
  3. SSR (server-side rendering): the server requests the data and assembles the HTML before sending it to the client.
  4. NSR (native-side rendering): can be thought of as SSR running on the client side.
  5. Rehydration, commonly called "hydration": reusing the server-rendered DOM structure and data on the front end and binding events to it to bring the page to life.

The evolution of optimization techniques and standards

Evolution of optimization techniques

Let’s start by looking at the evolution of performance optimization over the last 20 years. Scattered across the timeline are technical practices covering methodologies, testing tools, network optimization, solutions, and metrics.

One of the earliest sets of guidelines came from Yahoo: its 34 performance rules, published in 2007, which focused on performance best practices. Then in 2015 Google put forward the RAIL performance model, which quantifies performance targets at the methodology level.

Testing tools include YSlow, WebPageTest, PageSpeed, and Lighthouse.

Network optimization: network optimization is mainly reflected in the continuous upgrading of protocols. In 2009 Google began to experiment with the SPDY protocol, which evolved into the HTTP/2 standard we use today. HTTP/2 uses multiplexing, binary framing, header compression, and other techniques to greatly improve performance, but some problems remain. To keep optimizing network performance, Google launched the initial version of QUIC in 2012. QUIC is a UDP-based multiplexed, secure transport; HTTP over QUIC, the HTTP protocol carried over QUIC, was officially named HTTP/3 in 2018. Google and Facebook have already deployed HTTP/3 in production environments.

Moving on to the solution side: in 2010 Facebook introduced BigPipe, a pipelined page-loading technique that significantly improved first-screen time. Then, in 2016, Google introduced the concept of PWA at I/O, hoping to improve Web applications by enhancing the capabilities of the Web platform, and promoted it heavily in 2017. Around the same period AMP was launched and widely applied on mobile pages, and AMP and PWA began to be used together. You can see that performance solutions have evolved from the pursuit of faster loading to the pursuit of a native experience.

The last item on the timeline is Google's launch of Web Vitals this year, an initiative to give developers a unified set of performance metrics. One thread running through the past 20 years is Google itself: as a main driver of Web technology, Google has had a profound impact on its development.

Next we will focus on the RAIL performance model and PWA.

RAIL model

First, the RAIL model. When we do performance optimization, we need a clear guide that tells developers what to optimize and how, or a standard that defines performance from the user's point of view. Against this background the Chrome team proposed the RAIL model. Simply put, RAIL provides a methodology for thinking about performance problems: it breaks the user experience down into key actions, such as click, scroll, and load, and sets corresponding performance goals for each of them.

Looking at the process of a user visiting a page, the user generally goes through loading, animation, interaction, and idle phases. RAIL stands for these four aspects of the web application life cycle: Response, Animation, Idle, and Load.

The first is Response, which represents the page's responsiveness to user actions. If a user clicks a button, the page needs to give feedback within 100 milliseconds so that the user doesn't perceive a delay. The 100-millisecond target is derived from human-computer-interaction research, and given that the main thread also performs other tasks, the model requires that the event handling itself complete within 50 milliseconds.

Animation refers to the animation process; the goal is to keep animation smooth. Generally we need 60 fps for smoothness, which requires each animation frame to be completed within 16 milliseconds. And since the browser's own rendering work also takes time, the model requires the application to finish its per-frame work within 10 milliseconds.

The third is Idle, which means keeping the main thread idle; the goal is to maximize main-thread idle time. Why? As mentioned earlier, we need to respond to the user within 100 milliseconds; if the main thread is busy, we can't. So, in line with the previous goal, the model requires that any deferred task running on the main thread take no more than 50 milliseconds.

The last is Load, which represents the page's loading speed. The target should be set according to network and device conditions; as a reference, the model suggests that on a 3G network a mid-range device should become interactive within 5 seconds.

We mentioned earlier that event processing should complete within 50 milliseconds and that deferred tasks should not exceed 50 milliseconds. Why 50 milliseconds?

To clarify, consider task scheduling on the main thread. Besides processing user events, the main thread has other tasks that take up part of its time, so event processing may have to wait its turn.

According to the previous goal, we need to respond to user events within 100 milliseconds. Combined with the task execution shown in the graph, those 100 milliseconds may have to cover two tasks, which gives each task a budget of 50 milliseconds. This is also why tasks that exceed 50 milliseconds are defined as long tasks.
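
The 50 ms budget suggests a practical pattern: split a long batch of work into chunks and yield to the event loop between them. Here is a minimal sketch under that assumption; the `estimatedMs` field, batching rule, and function names are hypothetical, not part of any standard API.

```javascript
// Sketch: split a batch of work into groups that each fit RAIL's
// 50 ms main-thread budget. `estimatedMs` values are hypothetical.
function splitIntoBatches(tasks, budgetMs = 50) {
  const batches = [];
  let current = [];
  let used = 0;
  for (const task of tasks) {
    // Start a new batch when adding this task would exceed the budget.
    if (used + task.estimatedMs > budgetMs && current.length > 0) {
      batches.push(current);
      current = [];
      used = 0;
    }
    current.push(task);
    used += task.estimatedMs;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// In a browser, run each batch in its own macrotask so input events
// can be handled between batches:
function runBatches(batches) {
  if (batches.length === 0) return;
  const [head, ...rest] = batches;
  head.forEach((t) => t.run());
  setTimeout(() => runBatches(rest), 0); // yield to the event loop
}
```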

PWA

Web applications are often inferior to native applications in user experience. First, Web applications rely on the network to load content, so they load slowly on weak networks and are inaccessible offline. Second, Web applications cannot be added to the home screen the way native apps can; users often need to type URLs to reach content. In addition, some native capabilities, such as message push, are missing on the Web. Native apps, despite their good experience, still have problems of their own.

For example, native applications have high development costs, poor dynamic-update capability, and users must download and install them before use. Against this background, Google proposed the concept of PWA in 2016, hoping to narrow the gap with native applications by enhancing the capabilities of the Web and to provide a comparable user experience. PWA has several important features. The first is the Service Worker, which can be regarded as a programmable network proxy. It provides offline support, including caching and preloading, and it is the foundation on which most other PWA features are built.

The Web App Manifest defines the appearance and behavior of the Web application, including the icon added to the home screen and the splash screen shown in full-screen mode.

The third is Push and Notification, which give Web applications the ability to push and receive messages.

The last is offline caching, which uses the Service Worker's offline capability so that users can still use some functions while offline. PWA also includes other features, such as reading device status and sharing via Bluetooth; the ultimate goal is to approach the native-app experience through progressive enhancement.
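
As a concrete illustration, here is a minimal Service Worker sketch with a cache-first strategy. The file names, cache name, and URL list are hypothetical, and the `typeof` guards only let the snippet parse outside a browser; treat it as a sketch, not a production setup.

```javascript
// Hypothetical cache name and precache list for the app shell.
const CACHE_NAME = 'app-shell-v1';
const PRECACHE_URLS = ['/', '/index.css', '/app.js'];

// In the page: register the worker (path is an assumption).
if (typeof navigator !== 'undefined' && 'serviceWorker' in navigator) {
  navigator.serviceWorker.register('/sw.js');
}

// In sw.js: precache on install, then answer fetches cache-first
// with a network fallback.
if (typeof self !== 'undefined' && typeof caches !== 'undefined') {
  self.addEventListener('install', (event) => {
    event.waitUntil(
      caches.open(CACHE_NAME).then((cache) => cache.addAll(PRECACHE_URLS))
    );
  });
  self.addEventListener('fetch', (event) => {
    event.respondWith(
      caches.match(event.request).then((hit) => hit || fetch(event.request))
    );
  });
}
```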

Mainstream browsers now support PWA to varying degrees, and sites at home and abroad, such as Alibaba and JD.com, have put PWA into practice and measured quantifiable benefits. According to a case shared by Google, JD.ID (JD.com Indonesia) saw a 53% increase in conversion rate after adopting PWA capabilities such as caching, home-screen installation, and notifications.

Standards organizations

We all know that standards cannot be established without standards organizations, and performance standards are no exception. Two organizations are important in this field. The first is the W3C, established in 1994, the most authoritative and influential neutral international standards body for Web technology.

The second is the Web Performance Working Group, established by the W3C in 2010, whose goal is to develop methods and APIs for measuring the performance of Web applications.

Let's take a look at the set of APIs developed by the Web Performance Working Group. As the diagram shows, these APIs fall into three categories.

The first category is framework APIs, the leftmost part: High Resolution Time and the Performance Timeline, which provide a high-precision clock and an interface for querying performance data.

The second category is measurement APIs, the middle section, used to collect performance data for the different stages of the page life cycle.

The third category, the rightmost section, is APIs for optimization strategies that improve page performance, including page visibility, task scheduling, and preloading.

First, take a look at the Performance Timeline, which consists of three parts:

The first part is the high-precision clock, i.e. the performance.now() method on the performance object.

The second is the main query interface the Performance Timeline provides: getEntries, getEntriesByType, and getEntriesByName.

Finally, the Performance Timeline defines two important objects that are fundamental to performance measurement. PerformanceEntry is the base class for all other entry types: the entries the browser returns from getEntries all inherit from PerformanceEntry.

The other is PerformanceObserver, which monitors metrics as they are delivered event-style. We'll see uses of this API below.

The Navigation Timing interface can be used to calculate the page's performance at each stage of navigation, from request initiation to load completion.

The navigation entry also records the navigation type, of which there are currently four: navigate, reload, back-forward, and prerender. From its definition we can also see that Navigation Timing extends Resource Timing and reuses its timing fields; that corresponds to the overlapping part of the diagram.
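
For instance, a few common phases can be derived from a navigation entry. The helper below is a sketch of our own (the field names follow the Navigation Timing spec; the browser-only part is guarded):

```javascript
// Derive common phases from a PerformanceNavigationTiming-like entry.
function navigationPhases(nav) {
  return {
    dns: nav.domainLookupEnd - nav.domainLookupStart,
    tcp: nav.connectEnd - nav.connectStart,
    ttfb: nav.responseStart - nav.requestStart, // time to first byte
    download: nav.responseEnd - nav.responseStart,
    domContentLoaded: nav.domContentLoadedEventEnd - nav.startTime,
    load: nav.loadEventEnd - nav.startTime,
  };
}

// In a browser, the real entry comes from the Performance Timeline:
if (typeof window !== 'undefined') {
  const [nav] = performance.getEntriesByType('navigation');
  if (nav) console.log(navigationPhases(nav));
}
```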

Resource Timing records the time from request to load completion for each subresource. Subresource performance data can be queried with the getEntriesByType interface, passing the type "resource".

One thing to note about this API: for cross-origin resources, the server must return the Timing-Allow-Origin response header correctly, otherwise the page cannot retrieve full timing data for those subresources.

Let's take a look at some newer APIs. Paint Timing measures two metrics, FP and FCP: first paint and first contentful paint. The calculation is relatively simple: listen through the PerformanceObserver interface introduced earlier.
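
A sketch of reading FCP with PerformanceObserver; the extraction helper is our own, and the observer part runs only in a browser:

```javascript
// Pick the first-contentful-paint time out of a list of paint entries.
function fcpFrom(entries) {
  const entry = entries.find((e) => e.name === 'first-contentful-paint');
  return entry ? entry.startTime : undefined;
}

if (typeof window !== 'undefined' && 'PerformanceObserver' in window) {
  new PerformanceObserver((list) => {
    const fcp = fcpFrom(list.getEntries());
    if (fcp !== undefined) console.log('FCP:', fcp);
  }).observe({ type: 'paint', buffered: true }); // buffered catches early paints
}
```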

Another relatively new interface is Event Timing.

The background of Event Timing is the need to track the processing delay of page input events; it is currently at the draft stage. This API is mainly used to calculate FID, and the algorithm is relatively simple: subtract the time the first input was received from the time the browser started processing it.
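
In code, that FID rule looks roughly like this (the helper is ours; the observer is browser-only and guarded):

```javascript
// FID = time the browser starts processing the first input
//       minus the time the input was received.
function fidFromEntry(entry) {
  return entry.processingStart - entry.startTime;
}

if (typeof window !== 'undefined' && 'PerformanceObserver' in window) {
  new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      console.log('FID:', fidFromEntry(entry));
    }
  }).observe({ type: 'first-input', buffered: true });
}
```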

Frame Timing records slow-frame information and is also a draft-stage standard. A frame is defined as the interval between two VSyncs; at a 60 Hz refresh rate, the upper limit for frame processing time is 16.6 ms, beyond which a frame counts as slow. Usage follows the same observer pattern as above.

Finally, the Long Tasks API monitors long tasks on the main thread. As we all know, long tasks block the main thread so the page cannot respond to user input quickly. The usual optimization for long tasks is therefore to split them sensibly into subtasks.
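
Monitoring long tasks might be sketched as follows; the aggregation helper is our own invention, and the observer only runs in a browser:

```javascript
// Summarize long-task entries: how many, and total time beyond
// the 50 ms budget per task.
function summarizeLongTasks(entries) {
  return {
    count: entries.length,
    blockingMs: entries.reduce(
      (sum, e) => sum + Math.max(0, e.duration - 50),
      0
    ),
  };
}

if (typeof window !== 'undefined' && 'PerformanceObserver' in window) {
  const seen = [];
  new PerformanceObserver((list) => {
    seen.push(...list.getEntries());
    console.log(summarizeLongTasks(seen));
  }).observe({ type: 'longtask', buffered: true });
}
```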

To summarize: Navigation Timing gives us the page's navigation performance, and Resource Timing gives us subresource loading performance. The four bolded parts in the middle measure the user-centric metrics we will introduce next. Finally, the Long Tasks API and Frame Timing point out the optimization hot spots our application needs to focus on. Besides the measurement APIs, the working group also provides a set of optimization-strategy APIs such as Page Visibility, task scheduling, and Resource Hints, which are not covered here.

User-centric metrics

From the introduction to the standard APIs, we know how to detect application performance and how to find and fix problems. However, we still lack a quantifiable standard for what users actually feel, and a way to evaluate that feeling more accurately.

Overview of user-centric metrics

Let's look at the latest metrics from the user's point of view. During page load, the first thing users notice is whether content appears quickly enough; if loading is too slow, they lose patience and leave. When users see content they are interested in, they instinctively interact with the page; if the page doesn't give timely feedback, they notice the delay. Finally, if the content jitters badly while rendering, for example elements suddenly pop in or the viewport content shifts a long way, users are unhappy. So when we formulate metrics we must account for all three aspects of the user's experience: perceived loading speed, page responsiveness, and the visual stability of content presentation.

The six metrics introduced here cover all three considerations. The green part contains first contentful paint (FCP) and largest contentful paint (LCP); these measure loading speed based on when the user actually sees content drawn. The blue part contains three metrics, first input delay (FID), time to interactive (TTI), and total blocking time (TBT), all reflecting page responsiveness. FID needs real user input to trigger, so it is generally used in production; in the lab, TBT and TTI are generally used instead. Finally, in orange, CLS (cumulative layout shift) accumulates layout shifts over the life of a page, reflecting the visual stability of content presentation; it is reported as a score and can be measured both in production and in the lab.

Let's look at the first pair of metrics. FCP is first contentful paint; LCP is largest contentful paint. FCP is relatively simple. For LCP, during loading the browser keeps dispatching events to the page for each new largest element it detects, so LCP is generally based on the last such event received. Currently the largest content can be images or text, including img elements, SVG images, video posters, and background images. LCP detection is supported on top of the Element Timing API, using the same event-listening approach.
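
The "take the last candidate" rule for LCP can be sketched like this; the helper is ours and the observer is guarded for non-browser environments:

```javascript
// The browser keeps emitting largest-contentful-paint candidates;
// the final LCP is the last one received.
function lcpFrom(entries) {
  return entries.length ? entries[entries.length - 1].startTime : undefined;
}

if (typeof window !== 'undefined' && 'PerformanceObserver' in window) {
  new PerformanceObserver((list) => {
    console.log('LCP candidate:', lcpFrom(list.getEntries()));
  }).observe({ type: 'largest-contentful-paint', buffered: true });
}
```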

TTI is the time to interactive, a complex metric that must consider both the main thread and the network.

The top of the figure shows network requests; the bottom shows the task distribution on the main thread. Starting from FCP, we search to the right for a 5-second window that contains no long tasks and no more than two in-flight network requests; the gray rectangle in the figure is the one that meets this requirement. Once we find this quiet window, we walk back from its start to the most recent long task; the end time of that long task is what we define as the page's time to interactive.

Next, FID and TBT. FID is the first input delay; TBT is the main thread's total blocking time. FID refers to the interval between the main thread receiving the user's first input and starting to respond to it. In the figure there is a long task, and the user event arrives while that task is running; the browser can only begin processing it when the long task ends. So input delay mainly occurs when the main thread is busy. A common scenario: the main thread is parsing and executing a huge JS bundle while the user is interacting with the page, so the main thread cannot respond to the user.
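
The quiet-window search can be sketched in a much-simplified form. This considers long tasks only; the real definition also caps in-flight GET requests at two, and the task records below are hypothetical:

```javascript
// Simplified TTI estimate: TTI is the end of the last long task before
// the first quiet window (default 5 s) after FCP.
// `longTasks` is a list of { start, end } records sorted by start time.
function estimateTTI(fcp, longTasks, windowMs = 5000) {
  let candidate = fcp;
  for (const task of longTasks) {
    if (task.start - candidate >= windowMs) break; // quiet window found
    if (task.end > candidate) candidate = task.end;
  }
  return candidate;
}
```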

Let's look at how TBT is calculated. TBT is generally used as a substitute for FID in lab tests. Its calculation rule: take all long tasks between FCP and TTI; each task's blocking time is the portion of its duration beyond 50 milliseconds. Mathematically, it is the sum of the long-task durations minus the number of long tasks times 50.
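
That lab blocking-time formula (sum of long tasks between first paint and interactive, minus 50 ms each) can be written out directly. The `{start, duration}` record shape is an assumption of this sketch:

```javascript
// TBT sketch: sum the over-50 ms portion of every long task
// that falls between FCP and TTI.
function totalBlockingTime(tasks, fcp, tti) {
  const longTasks = tasks.filter(
    (t) => t.start >= fcp && t.start + t.duration <= tti && t.duration > 50
  );
  const totalDuration = longTasks.reduce((sum, t) => sum + t.duration, 0);
  // Equivalent to summing (duration - 50) per long task.
  return totalDuration - longTasks.length * 50;
}
```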

Finally, CLS, the cumulative layout shift. When the starting position of a visible element in the viewport changes, we consider that a layout shift has occurred, like the text node shown in this diagram. The score of a single layout shift equals the impact fraction times the distance fraction. The impact fraction is the element's area of influence (the red region) divided by the viewport area; the distance fraction is the displacement distance (the arrow in the diagram) divided by the viewport height. The browser exposes this metric through the Layout Instability API.
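
Accumulating CLS from Layout Instability entries could be sketched like this. Per the definition, shifts that happen right after user input (`hadRecentInput`) are excluded; the helper name is ours:

```javascript
// Sum layout-shift scores, skipping shifts caused by recent user input.
function clsFrom(entries) {
  return entries
    .filter((e) => !e.hadRecentInput)
    .reduce((sum, e) => sum + e.value, 0);
}

if (typeof window !== 'undefined' && 'PerformanceObserver' in window) {
  new PerformanceObserver((list) => {
    console.log('CLS so far:', clsFrom(list.getEntries()));
  }).observe({ type: 'layout-shift', buffered: true });
}
```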

Next, let's look at Web Vitals. The background of Web Vitals is that Google wants to provide a unified standard for measuring application performance. Core Web Vitals is the most important subset of Web Vitals, comprising LCP, FID, and CLS, and it sets performance targets for each of these metrics. Another thing to note is that Google plans to add Core Web Vitals to its page-ranking algorithm in 2021. To simplify collection, Google also provides the web-vitals library; developers only need to import the module, configure it, and upload the data to complete metric collection.
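
Collection with the web-vitals package might look like the commented sketch below (function names follow that package's docs at the time of the talk; treat them as assumptions if your version differs). The reporter function is our own illustration:

```javascript
// With Google's web-vitals package, each metric takes a callback:
//   import { getCLS, getFID, getLCP } from 'web-vitals';
//   getCLS(sendToAnalytics);
//   getFID(sendToAnalytics);
//   getLCP(sendToAnalytics);

// A minimal reporter those callbacks could share. The '/analytics'
// endpoint is hypothetical.
function sendToAnalytics(metric) {
  const body = JSON.stringify({
    name: metric.name,
    value: metric.value,
    id: metric.id,
  });
  // sendBeacon survives page unload, unlike a plain fetch.
  if (typeof navigator !== 'undefined' && navigator.sendBeacon) {
    navigator.sendBeacon('/analytics', body);
  }
  return body;
}
```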

Another thing to note is that Web Vitals has only just been released, and browser support for it is not yet complete; so far only the latest Chromium supports it reasonably fully.

The latest DevTools also adds support for Web Vitals. Let's take a quick look with a recorded video.

Open DevTools with F12, select the Performance panel, check Web Vitals, and start recording. Next, enter the URL, stop recording when the page finishes loading, and you'll see that the latest DevTools performance panel has a lot of new content.

I've circled it in red here: the latest DevTools gives Web Vitals and Long Tasks their own lanes, and layout shifts are marked for us. On the timeline you can also see the LCP marker.

How to optimize performance

Finally, we’ll look at how to do performance tuning.

What does performance mean

Performance as perceived by the user includes loading speed, responsiveness, animation smoothness, power consumption, and memory usage. For power consumption and memory footprint, front-end performance optimization doesn't have particularly good technical levers; the least we can do is keep the phone from getting hot and the app from crashing.

Commonly used optimization methods

Take a look at some commonly used optimization methods. Generally speaking, caching and preloading are the most effective and direct means of performance optimization; different scenarios call for different options.

Let's look at a few principles of optimization. The first is to remember that the network is unreliable; if we can solve the network problem, we can solve most performance problems. The second is that JS is single-threaded, so we should use Workers for parallelism. In the browser environment that may mean a Web Worker or a Service Worker, while inside the native app we may use a better-performing App Worker.

The third is to make full use of both server and client: for example, SSR uses the server to request data and render it, while the native client can provide offline capability and JS Worker support.

Finally, we need to test in a real environment. Taking V8 as an example: V8 once used benchmarks to measure performance, which led to optimizations that were unhelpful or even counterproductive in real-world situations. So our performance work must ultimately be measured in the real world.

Next, caching. We mainly use CDNs, the browser cache, and application offline packages. The browser cache covers a lot of ground; V8's code cache, for example, stores bytecode and can greatly reduce JS parse and compile time.

BFCache caches the state of the entire page and is mainly used in back/forward navigation, pre-rendering, and similar scenarios. The network cache mainly refers to the DNS cache and other network-layer caches. The dashed boxes indicate the caches our front-end code uses directly, including IndexedDB, localStorage, cookies, and Cache Storage; in addition, the HTTP cache and memory cache are the ones the browser uses most widely.

For cache use, the general optimizations are to use a CDN to accelerate static resources and to set appropriate cache lifetimes for them. It is also worth noting that registering unload event handlers prevents a page from entering the back/forward cache, so avoiding them is a best practice.

We should avoid synchronous localStorage as much as possible; besides being synchronous at the bottom, it also has a storage limit (typically 5 MB). Instead, prefer the asynchronous IndexedDB and the Cache API: the Cache API for request/response resources, and IndexedDB for other data. Finally, with native support, we can deliver resources in advance through offline packages, and even build key resources such as the main document's JS directly into the memory cache, since memory access is fastest.

Preload techniques are mostly the standard-defined Resource Hints and Preload. Resource Hints include dns-prefetch, preconnect, prefetch, and prerender; preload is defined in a separate specification and is not part of Resource Hints.

preload is used to preload key resources for the current page, while prefetch loads resources for the next page; that is the most important difference between them. prerender pre-renders: it loads the document and its subresources, so we don't use prerender at scale, only on the page a user is most likely to visit next.
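
These hints normally live as link tags in the document head; the sketch below injects the equivalent tags from script. The helper and the URLs in the comments are hypothetical, and the `doc` parameter exists only so the logic can be exercised outside a browser:

```javascript
// Add a resource hint or preload <link> to the document head.
function addHint(rel, href, as, doc = globalThis.document) {
  const link = doc.createElement('link');
  link.rel = rel;
  link.href = href;
  if (as) link.as = as; // preload needs `as` so the browser can prioritize
  doc.head.appendChild(link);
  return link;
}

// Typical usage (hypothetical URLs):
// addHint('dns-prefetch', '//cdn.example.com');
// addHint('preconnect', 'https://cdn.example.com');
// addHint('preload', '/critical.css', 'style'); // current page, high priority
// addHint('prefetch', '/next-page.js');         // likely next navigation
```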

Let's take a look at common rendering schemes. By where and when rendering happens, they fall into four categories: static rendering (SR), client-side rendering (CSR), server-side rendering (SSR), and native-side rendering (NSR).

Let's look at the rendering pipeline of each method, starting with SR. Static rendering generates the HTML structure at build time, so the page renders directly after loading. It generally performs best, but it is only applicable to static pages.

CSR is the most common front-end rendering method: all the computation, including data requests and data rendering, happens on the client. Its performance mainly depends on network conditions and device capability.

For server-side rendering, anything that renders HTML directly on the server can be considered SSR. Traditional SSR used languages like PHP and Java with back-end templates to generate the HTML structure; with today's separation of front end and back end, that style is no longer a good fit.

The mainstream server rendering we'll discuss today is SSR with rehydration, which uses code that is isomorphic between front end and back end. Both the first-screen data request and the data assembly happen on the server, so the page can render quickly after the document loads. The disadvantage is that the JS still needs to execute on the front end to rehydrate, which can lead to long time-to-interactive. Compared with CSR, SSR with rehydration gives shorter first-screen and interaction times and also performs better on low-end devices and weak networks; but since server rendering itself takes processing time, it introduces the problem of a longer time to first byte.
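
A framework-free toy version of the idea, with hypothetical names throughout: the server renders the first screen to an HTML string, and the client "rehydrates" by binding events to the existing DOM instead of re-rendering it.

```javascript
// Shared (isomorphic) render function: data in, HTML string out.
function renderToHTML(items) {
  const lis = items.map((text) => `<li>${text}</li>`).join('');
  return `<ul id="feed">${lis}</ul>`;
}

// Server side (sketch): fetch the data, send the rendered first screen.
//   const html = renderToHTML(await fetchFeed());
//   res.end(`<!doctype html><body>${html}</body>`);

// Client side (rehydration sketch): reuse the server-rendered DOM and
// only attach event handlers, instead of rendering again.
if (typeof document !== 'undefined') {
  const feed = document.getElementById('feed');
  if (feed) feed.addEventListener('click', (e) => console.log(e.target));
}
```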

Another thing to consider is that SSR renders on the server, which increases server cost, so we generally add a caching layer behind our SSR.

NSR can be regarded as SSR running on the client: the work SSR does on the server is done on the native client instead. In addition, resource and data prefetching on the client can further improve performance. As the pipeline diagram shows, NSR runs the data request and first-screen rendering in parallel with WebView initialization and framework JS initialization, greatly reducing first-screen time.

This method is widely used in UC's in-app information feed and in mobile shopping campaign pages.

Of course, each rendering method has its advantages but also its limitations and drawbacks; what we need to do is choose the appropriate scheme for the actual scenario. SR performs best but has a fatal flaw: it works only for static pages. CSR's performance depends on device capability and network conditions, and it does poorly on weak networks and low-to-mid-range devices. SSR takes advantage of the server's good network and compute to greatly improve first-screen performance, and it also performs better on low-end devices, so it can serve as a general optimization method.

Generally, we use SSR for pages outside the app. NSR uses the client's cache and data-prefetch capability to run application initialization and rendering in parallel, which makes it the best choice after SR for pages inside the app. In addition, since it renders on the client, it adds no extra server cost.

There are other optimization methods too. For example, in mobile scenarios we can reuse WebViews to greatly reduce WebView initialization time. And in scenes that are hard to optimize further, we can give users timely feedback and thoughtful touches so that perceived performance doesn't feel too bad.

We can also see that the iPhone does this well: it is not always the best in measured performance, but it is the best in perceived performance. Now let's look at UC's practice of making feed article pages open instantly. This video compares the UC feed article page before and after optimization; you can see that the optimized page achieves an instant-open effect.

Some background first. The feed's list page uses WEEX, while the article page uses H5. Tapping an item on the list opens the article page with a 300-millisecond WebView transition animation, which is the interaction shown in this picture. So the technical goal is to render the first screen within 300 milliseconds, so that by the time the animation finishes the user already sees the page. In addition, the business also needs to distribute the article page outside the app, so cross-platform support must be considered.

Let’s look at a technical strategy for information flow optimization.

The first question is why WEEX was not chosen. WEEX performs better inside the app, but it is not a standards-based technology, and the pages outside the app would still need separate optimization. H5 is naturally cross-platform and is a standard technology; its out-of-the-box performance is not superior to WEEX, but its optimization ceiling is very high, that is, H5 can be optimized to achieve very good performance.

The second question is why SSR wasn't used. SSR is used for pages outside the app; inside the app we can do better, namely with the NSR rendering scheme.

The third question is why the feed team did not choose PWA. One problem with PWA is the high cost of starting and keeping the Service Worker alive, which takes about 100 ms to 1000 ms.

Given the technical goal we set, a PWA scheme could not achieve instant-open in most cases, so we chose NSR rendering as the feed's final optimization scheme.

Several key techniques are involved. The first is making full use of the client: putting all resources into the offline package, and, even more radically, prefabricating key resources such as the main document into memory.

The other is data prefetch, which loads the article data while the user is still on the list page. Through these two optimizations, we basically eliminate the performance impact of network instability.

In addition, NSR is used. As we saw earlier, NSR lets the page's framework initialization and data rendering proceed in parallel, so the page can render as soon as the WebView appears. Of course, to reach the limit of optimization, we also need to slim the page down thoroughly and ensure the first screen uses the smallest possible JS bundle.

During the feed optimization the technical solution changed from CSR to NSR, which brought an engineering problem. Previously the front end only needed one CSR bundle, but after the transformation we had to support CSR, SSR, and NSR at the same time, which means building three JS bundles for the different rendering modes from a single codebase.

So the engineering challenge here is how to optimize performance while keeping R&D costs as low as possible. Finally, the feed optimization also offers some inspiration; here is the team's own summary. Beyond the knowledge of performance optimization, the most important thing is not to set limits for yourself.


