Front-end page performance optimization is a topic that every front-end engineer cannot escape, because the user experience is our eternal goal. In this article, I will start from performance indicators and summarize my study and practice of front-end page performance optimization from the perspectives of network resource optimization and page rendering optimization.

This article was first published on my technology blog: Summary of Front-end Page Performance Optimization

Cover image: Photo by Chuttersnap on Unsplash

Establishment and interpretation of performance indicators

The first step is to clarify how performance is measured, that is, to be able to compare how bad the page performance was before optimization with how much it improved afterwards. So how do you collect this data? Here are two ways.

Web Navigation Timing API

What if we wanted to collect the raw metrics of the page ourselves? Browsers provide us with a native Timing API, and there are now two standards, which are described below.

  • Navigation Timing Level 1

    • window.performance.timing

      Before this API existed, if we wanted to measure the time it took to fully load a page, we might have done something like this:

      ```html
      <html>
        <head>
          <script type="text/javascript">
            var start = new Date().getTime();
            function onLoad() {
              var now = new Date().getTime();
              var latency = now - start;
              alert("page loading time: " + latency);
            }
          </script>
        </head>
        <body onload="onLoad()">
          <!-- Main page body goes from here. -->
        </body>
      </html>
      ```

      The script above calculates how long it takes to load the page after the first JS script in head executes, but it doesn’t give any information about how long it takes to get the page from the server, or the page initialization life cycle.

      To obtain accurate and reliable data for measuring site performance, window.performance.timing exposes data that was previously hard to get, such as the time needed to unload the previous page, the time spent on the domain lookup, the total time spent executing the window load handler, and so on. The image below shows each attribute of window.performance.timing and the page phase it corresponds to.

      From this we can calculate page performance points such as old-document unload, redirect, application cache, DNS lookup, TCP handshake, HTTP request processing, HTTP response processing, DOM processing, and document load completion. Please refer to the Navigation Timing W3C specification and to how several key page metrics are calculated.
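
      As a minimal sketch (attribute names follow the Level 1 spec; how these points are combined into "key metrics" varies by team), several of the points above could be computed like this:

        // run this after the window load event so loadEventEnd is populated
        var t = window.performance.timing;
        var metrics = {
          // time spent on the DNS lookup
          dns: t.domainLookupEnd - t.domainLookupStart,
          // time spent on the TCP handshake
          tcp: t.connectEnd - t.connectStart,
          // time from sending the request to receiving the first byte
          ttfb: t.responseStart - t.requestStart,
          // time until DOMContentLoaded finished, relative to navigation start
          domReady: t.domContentLoadedEventEnd - t.navigationStart,
          // time until the load event finished, relative to navigation start
          load: t.loadEventEnd - t.navigationStart
        };
        console.log(metrics);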

    • window.performance.navigation

      The PerformanceNavigation interface provides information about how the current document was navigated to; its type and redirectCount properties describe the operations involved in loading the page.

  • Navigation Timing Level 2

    The Level 2 standard abandons Level 1's timing and navigation interfaces and instead defines the PerformanceNavigationTiming object, which can be obtained like this:

    window.performance.getEntriesByType("navigation")[0];

    The attribute values of the Level 1 interface are JavaScript Date-based timestamps, while Level 2 uses High Resolution Time to solve the time-precision problem. The Level 2 Navigation Timing API also updates the processing model and extends the PerformanceResourceTiming interface, so each timing point yields more detailed information.

    Document processing is unique to Navigation Timing; Resource Timing is introduced later in this article. Overall, the Level 2 standard is more comprehensive and breaks Web performance timing down into finer-grained metrics, but there are still compatibility differences among the major browsers.
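
    As a short sketch, reading a few of the same points from the Level 2 entry looks like this (timestamps are high-resolution values relative to the navigation start, so startTime is 0):

      const [nav] = performance.getEntriesByType("navigation");
      // DNS lookup and TTFB, analogous to the Level 1 calculations above
      console.log("DNS lookup:", nav.domainLookupEnd - nav.domainLookupStart);
      console.log("TTFB:", nav.responseStart - nav.requestStart);
      // Level 2 values are already relative to the navigation start
      console.log("DOMContentLoaded:", nav.domContentLoadedEventEnd);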

Besides the Navigation Timing API, the main performance timing APIs also include the Resource Timing API, the Paint Timing API, and the Long Tasks API. Below we look at each of them, and at how to use a PerformanceObserver to collect performance data asynchronously.

  • window.performance.getEntriesByType("resource"): the PerformanceResourceTiming interface supports retrieving and analyzing detailed network timing data about the application's resource loads. We can use it to determine how long it takes to fetch a particular resource, such as an XMLHttpRequest, an image, or a script.

  • window.performance.getEntriesByType("paint"): the PerformancePaintTiming interface provides timing information about "paint" (also called "render") operations during page construction.

    • First-paint: The time from navigation until the browser renders the first pixel to the screen (white screen time).
    • First-contentful-paint: The time the browser renders the first content from the DOM (FCP).
  • PerformanceLongTaskTiming: the Long Tasks API can be used to find out when the browser's main thread is blocked long enough to affect frame rate or input latency. Currently, the API reports any task that executes for longer than 50 milliseconds.

  • Use the PerformanceObserver to listen for the performance metrics described above

    // Instantiate the performance observer
    var perfObserver = new PerformanceObserver(function(list, obj) {
      // Get all the resource entries collected so far
      // (You can also use getEntriesByType/getEntriesByName here)
      var entries = list.getEntries();
    
      // Iterate over entries
    });
    
    // Run the observer
    perfObserver.observe({
      // Observe Navigation, Resource and Long Task timing entries
      entryTypes: ["navigation", "resource", "longtask"]
    });

    The PerformanceObserver can passively subscribe to performance-related events, meaning that the API usually does not interfere with the performance of the main thread of the page because its callbacks are usually triggered during idle time. By default, the PerformanceObserver object can only observe entries as they appear. If we want to lazily load performance analysis code (without blocking higher-priority resources), we need to do this:

    // Run the observer
    // the buffered flag can only be combined with a single `type`
    // (not with entryTypes), so observe each entry type separately
    perfObserver.observe({ type: "navigation", buffered: true });
    perfObserver.observe({ type: "resource", buffered: true });

    Set buffered to true and the browser will return history entries in its PerformanceObserver buffer the first time the PerformanceObserver callback is called.

Web Vitals

Web Vitals is Google’s initiative to provide a unified guide to Web quality, metrics that are critical to delivering a great user experience on the Web. In order to simplify the scene and help the site focus on the most important metrics, Core Web Vitals was introduced. Core Web Vitals is a subset of Web Vitals, including LCP(Largest Contentful Paint), FID(First Input Delay) and Cumulative Layout Shift (CLS).

  • LCP (Largest Contentful Paint): largest contentful paint, which measures loading performance. To provide a good user experience, LCP should occur within 2.5 seconds of when the page first starts loading.
  • FID (First Input Delay): first input delay, which measures interactivity. To provide a good user experience, a page's FID should be 100 milliseconds or less.
  • CLS (Cumulative Layout Shift): cumulative layout shift, which measures visual stability. To provide a good user experience, pages should maintain a CLS of 0.1 or less.

From the meaning of these three indicators, we can see that they measure page performance from three aspects: loading speed, interactivity, and visual stability.

So how do you collect these indicators? Google provides the web-vitals library, which is only about 1K in size. Using it, we can capture not only the three metrics mentioned above but also First Contentful Paint (FCP) and Time to First Byte (TTFB). For details, see web-vitals.
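
A minimal sketch of collecting these metrics with the web-vitals library (the getX functions shown come from earlier versions of the library; newer releases rename them to onCLS, onLCP, and so on, and the /analytics endpoint is a hypothetical example):

  import { getCLS, getFID, getLCP, getFCP, getTTFB } from 'web-vitals';

  function report(metric) {
    // send each metric to your own analytics endpoint (hypothetical URL)
    navigator.sendBeacon('/analytics', JSON.stringify(metric));
  }

  getCLS(report);
  getFID(report);
  getLCP(report);
  getFCP(report);
  getTTFB(report);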

Network Resource optimization

Network resource optimization can be divided into network optimization and resource optimization; let's start with optimization at the network level.

CDN

CDN stands for content delivery network. CDN service providers cache resources from source stations to high-performance acceleration nodes across the country. When a user accesses corresponding service resources, the system schedules the user to the nearest node and returns the IP address of the nearest node to the user. In this way, the user can obtain the required content in a faster and more stable manner. There are two core points of CDN, one is cache and the other is back source:

  • Cache: Cache resources requested by the source server as required.
  • Back-to-origin: if a CDN node does not have the requested resource cached (it has never been cached, or the cache has expired), it goes back to the origin server to fetch the resource.

HTTP & TCP

Optimization at the HTTP application layer and the TCP transport layer is an unavoidable part of front-end performance work. We all know that merging (reducing) HTTP requests for static resources (for example with CSS sprites) is a classic front-end optimization guideline, but what is the rationale behind it, and is reducing HTTP requests always worthwhile? Let's take a look at how this works.

  • The disadvantages of HTTP/1.1

    • HTTP/1.1 head-of-line blocking

      We all know that HTTP/1.1 uses persistent connections, and while a TCP pipe can be shared, only one request can be processed in the pipe at a time; other requests are blocked until the current request finishes. This means that if some request is blocked for 10 seconds, subsequent queued requests are delayed by 10 seconds. This head-of-line blocking prevents requests from running in parallel, which is why browsers open multiple TCP connections per domain (up to six) and distribute requests over them to achieve parallel request processing.

    • TCP slow start

      TCP slow start is a TCP congestion control policy. TCP has a slow start after establishing a connection. This slow start means that the number of packets sent is increased bit by bit. The size of the congestion window increases by 1 each time the sender receives an acknowledgement packet. Slow starts cause performance problems because if a request for a small page-critical resource has to go through this slow start process, the rendering performance of the page will be significantly reduced.

    This is why the CSS sprites we mentioned at the beginning make sense given HTTP/1.1's limitations. A TCP connection can process only one HTTP request at a time, so when a site has many resources and the browser is limited in the number of TCP connections, page loading is slow; it is therefore desirable to combine small images into one large image to reduce HTTP requests.

  • Advantages and disadvantages of HTTP/2

    • Multiplexing (core advantage)

      HTTP/2 uses a multiplexing mechanism.

      HTTP/2 introduces a binary framing layer: the browser converts each request into multiple frames carrying the request ID. After receiving all the frames, the server reassembles frames with the same ID into one complete request; after processing it, the response is likewise split into frames carrying the response ID, and the browser merges them by ID. Through this mechanism, HTTP/2 achieves parallel transfer of resources. In addition, HTTP/2 uses only one TCP connection per domain, which solves HTTP/1.1's head-of-line blocking at the HTTP layer and also avoids multiple TCP connections competing for bandwidth.

      Let's go back to CSS sprites. We now know that in HTTP/2, multiple requests are no longer a performance-expensive affair; compared with image format and size optimization (WebP, etc.), the latter clearly produces better performance gains. So under HTTP/2, CSS sprites are no longer a best practice.

    • Other advantages

      • Based on the binary framing layer, HTTP/2 can also set the priority of requests, which solves the problem of resource prioritization.
      • Server push: the browser does not need to actively request the page's critical resources; the server can push them, so the resources on the critical rendering path are available as soon as the HTML has been parsed.
      • Head compression. HTTP/2 compresses the request and response headers.
    • Disadvantage: TCP head-of-line blocking in HTTP/2

      HTTP/2 solves head-of-line blocking at the application layer, but it still uses the same TCP transport-layer protocol as HTTP/1.1. We know that TCP is a connection-oriented (one-to-one, single-connection) protocol that guarantees reliability: if a packet is lost or delayed, the whole TCP connection pauses and waits for the lost or delayed packet to be retransmitted. However, in HTTP/2 a domain uses only one TCP connection, and every request runs over that long-lived connection, so if packet loss occurs in one data stream, all requests on the TCP connection are blocked, which hurts HTTP/2's transmission efficiency.

  • HTTP/3 outlook

    • The QUIC protocol

      The main difference between HTTP/3 and HTTP/2 is that HTTP/3 is based on QUIC as the transport layer to process streams, while HTTP/2 uses TCP to process streams in the HTTP layer.

      QUIC can be considered a set of protocols integrating "TCP-like reliability + HTTP/2 multiplexing + TLS". Its quick handshake (it runs on UDP) establishes a connection in 0-RTT or 1-RTT, which can greatly improve the speed of opening a page for the first time. However, browser support for HTTP/3 is still incomplete, and Safari does not enable it by default.

  • Preconnect & DNS-Prefetch

    preconnect and dns-prefetch are part of the W3C Resource Hints standard and are used as follows:

      <link rel="preconnect" href="https://example.com">
      <link rel="dns-prefetch" href="https://example.com">

    preconnect tells the browser to set up a connection ahead of time: DNS lookup + TCP handshake + TLS negotiation (for HTTPS). While this is very cheap to declare, the connection setup still takes valuable CPU time, especially on secure connections. If the connection is not used within 10 seconds, the browser closes it, and all the early connection work is wasted.

    preconnect is currently supported in Safari 11.1 and up, but not in newer versions of Firefox. dns-prefetch has better browser compatibility, but it only handles the DNS lookup.

Optimizing the resources themselves

Let's take a look at some common optimizations of resource size; most can be done at the webpack bundling stage, and some compression can also be done at the HTTP level.

  • HTML

    The first-screen HTML should be kept within 14KB; otherwise extra round trips are needed, which delays first-screen rendering. The webpack plugin html-webpack-plugin has a minify option to enable HTML compression. Also, don't abuse inline CSS styles and JS scripts.
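
    A sketch of enabling that option (webpack config excerpt; the template path is a placeholder):

      const HtmlWebpackPlugin = require('html-webpack-plugin');

      module.exports = {
        plugins: [
          new HtmlWebpackPlugin({
            template: './src/index.html',
            // collapse whitespace, remove comments, etc. in the emitted HTML
            minify: true
          })
        ]
      };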

  • JS

    Let’s focus on the cool things we can do with WebPack.

    • scope hoisting

      Scope hoisting can be translated as "scope lifting". In webpack, this feature detects whether an import chain can be inlined, reducing unnecessary module wrapper code. Scope hoisting is enabled in webpack by turning on the ModuleConcatenationPlugin, which is on by default in production mode. If you want to know more about scope hoisting, you can see another article of mine: Brief analysis of Webpack scope hoisting
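
      A sketch of enabling the plugin explicitly when not relying on production mode:

        const webpack = require('webpack');

        module.exports = {
          mode: 'none', // production mode would enable this automatically
          plugins: [new webpack.optimize.ModuleConcatenationPlugin()]
        };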

    • code splitting

      Webpack's code splitting provides the ability to split code into bundles that can then be loaded on demand or in parallel. Code splitting can be used to get smaller bundles and reduce file load time. Common solutions are listed below (a combined config sketch follows the list):

      1. The out-of-the-box SplitChunksPlugin configuration allows you to automatically split chunks.
      2. Dynamic imports allow code to be loaded on demand. An analysis of webPack dynamic import can be found in my other article: Analyzing the implementation of WebPack dynamic Import
      3. Using the Entry configuration, you can configure multiple code packaging entry points to manually separate code.
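
      A combined sketch of the three approaches above (all paths and selectors are placeholders):

        // webpack.config.js
        module.exports = {
          // 3. multiple entry points separate code manually
          entry: {
            main: './src/index.js',
            admin: './src/admin.js'
          },
          optimization: {
            // 1. SplitChunksPlugin: automatically split shared/vendor chunks
            splitChunks: { chunks: 'all' }
          }
        };

        // src/index.js: 2. dynamic import() creates a chunk loaded on demand
        document.querySelector('#show-chart').addEventListener('click', () => {
          import('./charts.js').then(({ drawChart }) => drawChart());
        });
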
    • tree shaking

      Tree shaking is a term for removing unused code from JS. Support for ES2015 modules is built into webpack, and detection of unused module exports has been supported since webpack 2. Webpack 4 extends this capability: by adding the "sideEffects" property to package.json, we can hint to the compiler which files in the project are "pure" and can therefore be safely removed. For an in-depth analysis of webpack tree shaking and its current shortcomings, see my article: Exploring Tree Shaking for Webpack

    • minify

      Webpack4 + uses TerserPlugin for code compression by default in production.

  • CSS

    • Inline the key CSS on the first screen

      Inline the critical first-screen CSS to improve page rendering time. Because CSS blocks the execution of JS, and JS blocks DOM construction, which in turn blocks page rendering, CSS can also block the rendering of the page. Some CSS-in-JS solutions, such as styled-components, are also critical-CSS friendly: styled-components keeps track of the components rendered on the page and injects their styles fully automatically, rather than emitting separate CSS links. Combined with component-level code splitting, less code can be loaded on demand.

    • Load the CSS dynamically and asynchronously

      A large CSS file can be split by media query into multiple CSS files for different purposes; this way a given CSS file only blocks rendering in the scenario its media query matches.

    • CSS File Compression

      CSS compression can be enabled by combining webpack's mini-css-extract-plugin (which extracts the CSS) with a CSS minimizer plugin such as optimize-css-assets-webpack-plugin (or css-minimizer-webpack-plugin in webpack 5).

  • Image compression

    Webpack's img-loader supports different image-compression plugins for different formats. Of course, if an image is small enough, we can also consider inlining it as a base64 data URI (url-loader).

  • HTTP level resource compression

    The Content-Encoding entity header indicates the compression applied to the body of a particular media type. With an nginx configuration like the following, we can enable gzip:

    # enable gzip
    gzip on;
    # lowest HTTP protocol version required for gzip compression (1.1 or 1.0)
    gzip_http_version 1.1;
    # compression level (1-9); higher levels compress more but take longer
    gzip_comp_level 4;
    # minimum response size (from Content-Length) in bytes to compress
    gzip_min_length 1000;
    # MIME types to compress
    gzip_types text/plain application/javascript text/css;

Caching of resources

  • HTTP cache

    • Strong cache

      Strong caching refers to the use of the browser’s cached data without making network requests when the cached data is not invalid. Strong caching is implemented by both Expires and cache-control response headers.

      • Expires

        Its value is a GMT timestamp, an absolute expiry time returned by the server. Expires depends on the local clock, so changing the local time can invalidate the cache.

      • Cache-Control

        There are the following values:

        • No-cache: The browser must revalidate with the server each time it uses the cached version of a URL
        • No-store: Browsers and other intermediate caches (such as CDN) never store any version of a file.
        • Private: The browser can cache files, but the intermediate cache cannot.
        • Public: Response content can be cached by any server.
        • Max-age: indicates the cache duration, which specifies the relative amount of time.

      Cache-control takes precedence over Expires if both headers are present together.

    • Negotiate the cache

      When the browser does not match the strong cache, it sends a request to the server. The server checks whether the resource is updated or not. If there is no update, the server returns the status code 304.

      The negotiated cache is managed with the Last-Modified / If-Modified-Since and ETag / If-None-Match header pairs. Last-Modified is the time at which the server determined the resource was last modified; the browser carries this value in the If-Modified-Since header on the next request so the server can check whether the resource has changed. ETag and If-None-Match work in a similar way, except that the ETag value is more complex to generate, usually a hash of the content or of the last-modified timestamp.

      ETag has a higher priority than Last-Modified.

  • Use long-term caching

    In order to make more efficient use of the cache, we usually set a long cache time for static resources. To still let the browser fetch the latest version after a change, static resources are given hashed, versioned file names, such as main.8e0d62a03.js. We can have webpack generate such file fingerprints; webpack has three hash types (a config sketch follows the list below):

    • Hash: Relates to the construction of the entire project. The hash value of the entire project will change whenever the project is modified.
    • Chunkhash: Related to chunks packed by Webpack, different entries will generate different chunkhash values.
    • Contenthash: Defines hash based on the content of the file. Contenthash does not change if the content of the file remains unchanged
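
    A sketch of the output configuration (webpack config excerpt):

      module.exports = {
        output: {
          // content-based fingerprint: unchanged files keep the same hash
          filename: '[name].[contenthash].js',
          chunkFilename: '[name].[contenthash].chunk.js'
        }
      };
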
  • The service worker cache

    The Service Worker intercepts network-type HTTP requests and uses a caching policy to determine which resources should be returned to the browser. The Service Worker cache serves the same purpose as the HTTP cache, but the Service Worker cache provides more caching capabilities, such as fine-grained control over what is cached and how it is done.

    Here are a few common Service Worker caching strategies (these are among the out-of-the-box strategies provided by Workbox); a minimal hand-rolled sketch of one of them follows the list.

    • Network only: Always get the latest content from the Network.
    • Network Falling back to cache: The latest content needs to be provided. However, if the network is down or unstable, slightly older content can be served.
    • Stale-while-revalidate: Cache content can be provided immediately, but newer cache content should be used in the future.
    • Cache first, fall back to network: Provide content from the Cache first to improve performance, but the Service Worker should occasionally check for updates.
    • Cache only: only Cache is used.
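
    As a minimal hand-rolled sketch (not Workbox itself), the stale-while-revalidate strategy from the list could look like this in a Service Worker ('runtime-cache' is an arbitrary cache name):

      self.addEventListener('fetch', event => {
        event.respondWith(
          caches.open('runtime-cache').then(cache =>
            cache.match(event.request).then(cached => {
              // always revalidate in the background
              const network = fetch(event.request).then(response => {
                cache.put(event.request, response.clone());
                return response;
              });
              // serve the (possibly stale) cached copy immediately,
              // otherwise wait for the network
              return cached || network;
            })
          )
        );
      });
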
  • Stale-while-revalidate

    We mentioned stale-while-revalidate in the Service Worker caching strategies above. It is also an HTTP cache invalidation strategy popularized by RFC 5861. This strategy first returns data from the cache (possibly stale), sends a fetch request (revalidation), and finally yields the latest data. stale-while-revalidate is used together with max-age:

       Cache-Control: max-age=1, stale-while-revalidate=59

    If the request is repeated within the next 1 second, the previously cached value is still fresh and is used as-is without any revalidation. If the request is repeated between 1 and 59 seconds later, the cached value is stale, but it can still be used directly while an asynchronous revalidation request is made. After 59 seconds, the cache is completely expired and a network request has to be made.

    Vercel introduced SWR, a React Hooks library for data fetching based on the stale-while-revalidate idea. It lets a component load cached data immediately and asynchronously revalidate it, so the component keeps receiving a stream of up-to-date data automatically.
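
    A minimal SWR sketch (the /api/user endpoint and the data shape are hypothetical): the component renders cached data immediately, then revalidates it in the background.

      import useSWR from 'swr';

      const fetcher = url => fetch(url).then(res => res.json());

      function Profile() {
        const { data, error } = useSWR('/api/user', fetcher);
        if (error) return 'failed to load';
        if (!data) return 'loading...';
        // cached data is shown first, then replaced once revalidation finishes
        return `hello ${data.name}`;
      }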

Above we have introduced the browser's HTTP cache and the Service Worker cache. Let's briefly review the order in which the browser looks up a requested resource (from highest priority to lowest):

  1. Memory cache (if available)
  2. Service worker cache
  3. HTTP caching (strong caching then negotiated caching)
  4. Server or CDN

Resource hints and preloading

  • Preload preload

      <link rel="preload" href="sintel-short.mp4" as="video" type="video/mp4">

    The preload value of the <link> element's rel attribute lets a request be declared in HTML, indicating that the resource will be used soon, so the browser can load it early (and raise its priority). This ensures the resource is available earlier and is less likely to block page rendering, which improves performance.

    • Application scenarios of Preload

      The basic use of preload is to load late-discovered resources as early as possible. Although the browser's preload scanner can find most resources in HTML tags early on, not all resources live in the HTML: some are hidden inside CSS and JavaScript, and the browser cannot discover and download them early enough. As a result, in many cases these resources end up delaying the first render or the loading of key parts of the page.

      Font resource loading is optimized using Preload. In most cases, fonts are critical to rendering text on a page, and the use of fonts is so buried in CSS that even if the browser’s preloader parses the CSS, it’s impossible to determine whether they’re needed or not.

        <link rel="preload" href="font.woff2" as="font" type="font/woff2">

      With Preload, we can increase the priority of font resources so that browsers can preload as early as possible. In some cases, using Preload for font loading can cut the loading time of the entire page in half.

    • Precautions for using Preload

      1. While preload’s benefits are obvious, it can waste users’ bandwidth if abused. And if preload resources are not used within 3s, a warning is displayed on the browser’s Console.
      2. Do not omit the as attribute. Omitting it or using an invalid value makes the preload request equivalent to an XHR request, where the browser doesn't know what it is fetching and fetches it with a fairly low priority.
    • Preload compatibility

      Preload is currently supported by all major browsers. If it is an unsupported browser, it is ignored instead of reporting an error.

  • Prefetch Resource prompt

      <link rel="prefetch" href="/library.js" as="script">

    Prefetch is one of the directives in the W3C Resource Hints standard. The use of prefetch is the same as preload, but the functionality is quite different. It basically tells the browser to get the resources it might need for the next navigation. This basically means that resources are fetched at a very low priority (because the browser knows that everything that is needed on the current page is more important than we guess what resources might be needed on the next page). This means that the resources prefetched are primarily used to speed up the next navigation rather than the current one.

    In terms of browser compatibility, prefetch is supported as far back as Internet Explorer 11. It is also important to note that prefetched and preloaded resources that are cacheable (for example, with a valid Cache-Control) are stored in the HTTP cache and placed in the browser's memory cache; if a resource is not cacheable, it is not stored in the HTTP cache but goes into the memory cache and stays there until it is used.

  • Use Webpack to support prefetch and preload

    Webpack V4.6.0 + adds support for prefetching and preloading.

    import(/* webpackPrefetch: true */ './path/to/LoginModal.js');

    This generates <link rel="prefetch" href="login-modal-chunk.js"> and appends it to the head of the page, instructing the browser to prefetch the login-modal-chunk.js file during idle time. webpack adds the prefetch hint as soon as the parent chunk has finished loading.

      import(/* webpackPreload: true */ 'ChartingLibrary');

    Preload Chunk starts loading in parallel when the parent chunk loads. Prefetch Chunk starts loading after the parent chunk finishes loading.

  • quicklink

    Quicklink is a small (< 1KB minified/gzipped) npm library from Google Chrome Labs designed to speed up subsequent page loads by prefetching in-viewport links during idle time.

    Its main principles are:

    • Detect links within the viewport (using Intersection Observer)
    • Wait until the browser is idle before prefetching page resources (using requestIdleCallback)
    • Check whether the user is on a slow connection (using navigator.connection.effectiveType) or has data saver enabled (using navigator.connection.saveData)
    • Prefetch the URLs of those links (using <link rel="prefetch"> or XHR). It provides some control over request priority: low priority by default (rel=prefetch or XHR), and for high-priority resources it tries fetch() and falls back to XHR.

    A demo provided by quicklink shows that page load performance can be improved by 4 seconds using quicklink.
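
    A minimal usage sketch, assuming the listen() API of quicklink 2.x:

      import { listen } from 'quicklink';

      // after the page has loaded, scan in-viewport links and prefetch
      // them when the browser is idle
      window.addEventListener('load', () => {
        listen();
      });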

Lazy loading technique

The previous section covered a few techniques for preloading; next comes lazy loading. Lazy loading means deferring loading, which can greatly reduce the loading of unneeded resources and thus improve page performance. The core scenario for lazy loading is that resources outside the current viewport (or resources not on the critical rendering path) do not need to be loaded yet. The most common code-level examples are the dynamic import of third-party libraries and React.lazy for components, both of which are essentially the dynamic import() syntax. Here are some other ways to implement lazy loading of resources:

  • Intersection Observer

    The Intersection Observer API lets us know when an observed element enters or exits the browser's viewport. With this capability, we can avoid loading resources that are not in the current viewport. In my open-source project, a React SSR music site imitating the MOO music style, lazy loading of images is also implemented with the IntersectionObserver API; the specific code is at the portal link.

    Among the many third-party lazy-loading libraries there is a high-performance, lightweight JS library, lozad.js, which supports lazy loading of img, picture, iframe, video, audio, responsive images, background images, multiple background images and other resources. Unlike older lazy-loading libraries, which hook into the browser's scroll events or periodically call getBoundingClientRect() on the lazily loaded elements, lozad.js uses the Intersection Observer API, which does not block the main JS thread. Every call to getBoundingClientRect() forces the browser to re-lay out the entire page and can cause the browser to stall.
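
    A minimal IntersectionObserver sketch for image lazy loading (the data-src attribute is a hypothetical convention for holding the real image URL):

      const io = new IntersectionObserver((entries, observer) => {
        entries.forEach(entry => {
          if (entry.isIntersecting) {
            const img = entry.target;
            img.src = img.dataset.src; // swap in the real image URL
            observer.unobserve(img);   // stop observing once it has loaded
          }
        });
      });

      document.querySelectorAll('img[data-src]').forEach(img => io.observe(img));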

  • Browser native image lazy loading

      <img src="image.png" loading="lazy" alt="..." width="200" height="200">

    The code above will enable lazy loading of images native to browsers (chromium-based browsers and Firefox, browsers that don’t support loading will ignore it). This way we don’t have to use other JS libraries for lazy loading of images.

    The loading property has three values:

    1. Auto: Uses the default loading behavior of the browser, which is the same as not using the loading property.
    2. Lazy: Delays loading a resource until it reaches a threshold of distance from the viewport.
    3. Eager: Loads the resource immediately, no matter where it is on the page.

    How to understand the viewport distance threshold when loading=lazy?

    Chromium’s lazy loading implementation tries to ensure that off-screen images load early enough so that they are finished loading by the time the user scrolls around them. By capturing image resources before they are visible in the viewport, you maximize the chances that they will have been loaded by the time they become visible. So how do you load images as early as possible? In other words, how far away from the current viewport is an invisible image before the browser loads the following image? The answer is that the Chromium distance threshold is not fixed, depending on several factors:

    1. The type of image resource.
    2. Whether data-savings is enabled.
    3. Current network status (effective Connection type).

    According to the above three factors, Chromium is constantly improving the algorithm of the distance threshold, which not only saves the image download, but also ensures that the image is loaded when the user scrolls to the image.

Page rendering optimization

In the previous chapter, we summarized the optimization of network resources from the two perspectives of network connection and resource loading. Next, we will analyze and optimize the rendering of the page after we get the page resources.

Brief analysis of rendering process

First of all, we know that when the network process receives the response headers for a request and sees that the Content-Type field is text/html, it determines that this is an HTML document and prepares a rendering process for the request; the subsequent page rendering pipeline unfolds inside this rendering process. We can think of the HTML code as the blueprint from which the browser builds the initial DOM of the page UI. Whenever a script element is parsed, the browser stops building the DOM from the HTML and starts executing the JavaScript code; when CSS text is received, it is converted into styleSheets that the browser can understand. So the core job of the renderer process is to turn HTML, CSS, and JavaScript into a web page that users can interact with.

  • HTML parsing

    When the renderer process receives the commit message for the navigation and begins to receive the HTML data, the main thread starts to parse the text string (HTML) and convert it into the Document Object Model (DOM).

    The general parsing process is as follows: when the renderer process receives a byte stream from the network process, the HTML parser converts the byte stream into multiple tokens (tag tokens and text tokens). Tag tokens are divided into StartTag and EndTag; for example, <body> is a StartTag and </body> is an EndTag. Then, by maintaining a token stack, newly generated tokens are continually pushed and popped, the tokens are resolved into DOM nodes, and the DOM nodes are added to the DOM tree.

    If external resources such as images, CSS, or scripts are referenced in the HTML document, the preload scanner looks at the tokens generated by the HTML parser and sends requests to the network process in advance.

  • Script parsing

    When the HTML parser encounters a <script> tag, it pauses parsing of the HTML document and must load, parse, and execute the JavaScript code. Why? Because JavaScript can use things like document.write() to change the shape of the document, that is, to change the entire DOM structure. This is why the HTML parser must wait for JavaScript to run before it can continue parsing the HTML document.

  • CSS parsing

    Having the DOM is not enough to know what the page will look like, because we can style page elements with CSS. As with HTML, the browser cannot directly understand plain-text CSS styles, so when the rendering engine receives CSS text it performs a conversion, turning the CSS text into styleSheets that the browser understands, which we can inspect via document.styleSheets.

    Now that the browser understands the structure of the CSS stylesheet, by standardizing CSS property values and adding CSS inheritance and cascading rules, we can calculate the specific style of each node in the DOM tree.

  • Layout

    Now the renderer process knows the structure of the document and the style of each node, but this is still not enough to render the page: there is a layout phase to go through. Layout is the process of finding the geometry of elements. The main thread traverses the DOM and the computed styles and creates a layout tree containing information such as x/y coordinates and bounding-box sizes. The layout tree may have a structure similar to the DOM tree, but it only contains information related to elements visible on the page. If display: none is applied to a node, that element is not part of the layout tree. Similarly, if you apply something like p::before { content: "Hi!" }, the generated content is included in the layout tree even though it is not in the DOM.

  • Paint

    Having DOM, style, and layout trees is still not enough to render the page; we also need to know the order in which these nodes are drawn. For example, we might set the Z-index for some elements, in which case drawing the elements in the order they were written in HTML would render incorrectly. In this drawing step, the main thread traverses the layout tree to create the drawing record. The painting record is the record of the painting process of “background first, then text, then rectangle”.

    One of the most important aspects of the rendering pipeline is that at each step, new data is created using the results of the previous operation. For example, if something in the layout tree changes, the drawing order needs to be regenerated for the affected portions of the document. And that brings up the idea of redrawing and rearranging, which we’ll talk about later.

  • Compositing

    Now that the browser knows the structure of the document, the style of each element, the geometry of the page, and the order in which it draws it, how does it draw the page? The easiest way to deal with this problem is to rasterize parts within the viewport (rasterization can be understood as converting layout information into pixels on the screen). If the user scrolls the page, the raster frame is moved and the missing parts are filled in with more rasters (pixels). That’s how Chrome handled rasterization when it was first released. Modern browsers, however, run a more complex process called compositing.

    Compositing is a technique for dividing parts of a page into layers, rasterizing them individually, and compositing them into a single page in a separate thread called a compositing thread. If scrolling happens, because the layer has been rasterized, all it has to do is compose a new frame.

    • Layering

      To figure out which elements need to be in which layers, the main thread creates a layer tree (LayerTree) by traversing the layout tree. CSS transform animations, page scrolling, or page nodes that use z-index get dedicated layers.

    • Rasterization operation

      Once the layer tree is created and the drawing order determined, the main thread submits this information to the compositor thread. The compositor thread then rasterizes each layer. A layer can be as large as the entire length of the page, so the compositor thread divides it into tiles and sends each tile to a raster thread. The raster threads rasterize each tile and store the results in GPU memory.

      The compositor thread can prioritize the raster threads so that content in (or near) the viewport is rasterized first.

    • Compositing and display

      Once all the blocks have been rasterized, the composition thread collects information about the blocks (location in memory and location on the page) to generate a composition frame (that is, a frame of the page that contains information about all the blocks).

      This compositor frame is then submitted to the browser process via IPC (inter-process communication), and successive compositor frames are sent to the GPU for display on the screen. If a scroll event occurs, the compositor thread creates the next compositor frame and sends it to the GPU.

From HTML, JS and CSS parsing to compositing of page frames, we’ve seen the rendering pipeline of a page. So how do we get performance optimizations for page rendering from this process?

Optimization methods derived from the rendering pipeline

We can take a few key points from the rendering pipeline above:

  1. Each step in the rendering pipeline uses the results of the previous operation to create new data. For example, if something in the layout tree changes, the drawing order needs to be regenerated for the affected portions of the document.
  2. Layout is a process of finding the geometry of elements. The main thread iterates through the DOM and evaluates the style and creates the layout tree.
  3. The composition of page frames is done without reference to the main thread. The composite thread does not need to wait for style calculations or JavaScript execution.
  4. When changes are made to a single layer, the rendering engine handles the transformations directly through the composite thread, and these transformations do not involve the main thread.

With these key points in mind, let’s look at rearrangement, redraw issues, and why CSS animations are more efficient than JavaScript animations:

  • Rearrangement: Updated element geometry (height, etc.). This means that reordering requires updating the rendering pipeline from the layout stage.
  • Redraw: Update the elements’ drawing attributes (font color, etc.) to enter the drawing phase directly, eliminating layout and layering.
  • Why CSS animations are efficient: If we animate elements with JS, the browser must run these operations between each frame. Most of our monitors refresh the screen 60 times per second (60 FPS); Only when each frame moves an object on the screen does the animation appear smooth to the human eye. If our animations were constantly changing the geometry of elements with JS, we would no doubt be triggering rearrangements frequently. Even if our animation rendering operation kept up with the screen refresh and JS calculations were running on the main thread, it might block our page. However, if we use CSS transform to animate, the browser will separate the animation element into a layer, and the subsequent transformations will be performed directly on the compositing thread and submitted to the GPU. Compared to JavaScript animation which requires JavaScript execution and style calculation, CSS animation is undoubtedly very efficient.

Also, there is the CSS will-change property. will-change gives web developers a way to tell the browser in advance how an element will change. With it, the browser creates a separate layer for the element, and when those transformations occur, the rendering engine handles them directly on the compositor thread, improving rendering performance. One thing we should not do is abuse it: layer information is stored in memory, and too many layers can slow the page down or consume excessive resources.

Streaming rendering

The rendering pipeline starts when the rendering process receives a byte-stream response of type text/html from the network process. The rendering process and the network process create a shared data pipe: the network process writes the data it receives into the pipe, while the renderer process reads data from the other end and parses it into the DOM. This means the browser does not wait for the entire HTML to arrive before it starts parsing and rendering, which is the browser's progressive HTML rendering capability.

To take full advantage of this capability, we can combine server-side rendering (SSR) with streaming on the server. Streaming server rendering lets us send HTML in chunks that the browser can progressively render as it receives them. This can greatly improve the FP (First Paint) and FCP (First Contentful Paint) metrics. In React SSR, renderToNodeStream produces an asynchronous stream, which also greatly improves the page's time to first byte. My open-source project, a React SSR music site imitating the MOO music style, also pipes the stream created by renderToNodeStream into the response object; the specific code is at the portal link.

In addition to streaming rendering on the server side, we can also use the Service Worker to implement streaming responses.

self.addEventListener('fetch', event => {
  var stream = new ReadableStream({
    start(controller) {
      if (/* there's more data */) {
        controller.enqueue(/* your data here */);
      } else {
        controller.close();
      }
    }
  });

  var response = new Response(stream, {
    headers: {'content-type': /* your content-type here */}
  });

  event.respondWith(response);
});

Once event.respondWith() is called, the page whose request triggered the fetch gets a streaming response, and it keeps reading from the stream as long as the Service Worker keeps enqueue()-ing more data. The response from the Service Worker to the page is truly asynchronous, and we have complete control over filling the stream. Here's a full demo.

That is, if we combine the dynamic data of the streaming response on the server side with the cached data of the streaming response implemented by the Service Worker, we can really achieve fast streaming response.

Article Reference:

  1. Web Vitals
  2. Assessing Loading Performance in Real Life with Navigation and Resource Timing
  3. Navigation Timing Level 2
  4. Custom metrics
  5. Working Principle and Practice of browser – Li Bing
  6. Browser-level image lazy-loading for the web
  7. Make use of long-term caching
  8. Service worker caching and HTTP caching
  9. Preload, Prefetch And Priorities in Chrome
  10. Inside look at modern web browser (part 3)
  11. Stream Your Way to Immediate Responses
  12. Rendering on the Web