When we open a web page, the browser fetches its content and presents it to us. Understanding how a page gets from the initial request to a fully rendered screen helps us build websites that perform better.

A more responsive website provides a better user experience. Waiting for resources to load and the browser's mostly single-threaded execution are the two main causes of poor web performance.

The request phase

When we enter a URL in the browser's address bar, the browser performs a series of actions to retrieve the document's contents

DNS lookup

When accessing a website, the browser must first look up the IP address that corresponds to the domain name

  1. The browser checks whether the local hosts file contains an entry for the domain name. If it does, the browser uses that IP address directly
  2. If the hosts file has no entry, the system checks whether the DNS cache holds the IP address of the domain name. If so, the system uses the cached IP address (cache entries are kept locally only for a limited time)
  3. If the domain name is not in the cache, the system requests its IP address from the DNS server configured on the local host
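
The lookup order above can be sketched as a small function. This is a minimal illustration, not how any particular operating system implements resolution: the `resolve` function, the lookup tables, and the `queryDnsServer` callback are hypothetical names, and cache expiry is left out.

```typescript
// Hypothetical lookup tables standing in for the hosts file and the DNS cache.
type Table = Record<string, string>;

interface Resolution {
  ip: string;
  source: "hosts" | "cache" | "dns-server";
}

function has(table: Table, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(table, key);
}

// Check the hosts file first, then the DNS cache, then ask the DNS server.
function resolve(
  domain: string,
  hosts: Table,
  cache: Table,
  queryDnsServer: (domain: string) => string
): Resolution {
  if (has(hosts, domain)) return { ip: hosts[domain], source: "hosts" };
  if (has(cache, domain)) return { ip: cache[domain], source: "cache" };

  const ip = queryDnsServer(domain);
  cache[domain] = ip; // remember the answer for later lookups (expiry omitted)
  return { ip, source: "dns-server" };
}
```

A second lookup of the same domain is then answered from the cache without contacting the DNS server.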

Establish a connection

After finding the IP address, the browser opens a TCP connection to the server using the three-way handshake: SYN, SYN-ACK, ACK
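
To make the three messages concrete, here is a small sketch of the segments exchanged and how the sequence and acknowledgment numbers relate. The `Segment` type and `threeWayHandshake` function are illustrative names only, and the 32-bit wraparound of real sequence numbers is ignored.

```typescript
interface Segment {
  from: "client" | "server";
  flags: string[];
  seq: number;
  ack?: number; // absent on the first SYN, which acknowledges nothing
}

// clientIsn / serverIsn: the initial sequence numbers each side picks.
function threeWayHandshake(clientIsn: number, serverIsn: number): Segment[] {
  return [
    // 1. The client asks to open a connection.
    { from: "client", flags: ["SYN"], seq: clientIsn },
    // 2. The server accepts and acknowledges the client's SYN.
    { from: "server", flags: ["SYN", "ACK"], seq: serverIsn, ack: clientIsn + 1 },
    // 3. The client acknowledges the server's SYN; data can now flow.
    { from: "client", flags: ["ACK"], seq: clientIsn + 1, ack: serverIsn + 1 },
  ];
}
```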

TLS encryption

If we are using HTTPS, in addition to establishing a connection through a three-way handshake, we also need to negotiate TLS to establish an encrypted connection

  1. The client requests a digital certificate from the server
  2. The server sends its certificate, which contains its public key, to the client
  3. The client verifies the validity of the certificate. If the certificate is trusted (or if the user chooses to trust an untrusted certificate), the browser generates a random key, encrypts it with the public key, and sends it to the server (asymmetric encryption)
  4. After receiving the encrypted key, the server decrypts it using its private key
  5. The server uses the key to encrypt the transmitted content (symmetric encryption) and sends it to the client. The client decrypts the content using the same key

Request and response

Once the connection is established, data transfer begins. TCP connections are governed by congestion control algorithms. The first response is usually limited to about 14KB (the initial congestion window). After the client acknowledges the data it has received, the server sends twice as much in the next round trip, and keeps doubling until a threshold is reached or packet loss occurs. This repeated probing of the network's capacity (TCP slow start) is how an appropriate transfer speed is found, and it is why keeping the page content (HTML only) under 14KB is an important point for optimizing Web performance.
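
The effect of this doubling on page size can be estimated with a short calculation. The sketch below assumes an initial window of 10 segments of 1460 bytes each (roughly the 14KB mentioned above); both constants and the `roundTripsToDeliver` name are assumptions made for illustration. Packet loss is not modeled, and the window is simply capped at the threshold.

```typescript
const MSS_BYTES = 1460;             // assumed payload bytes per segment
const INITIAL_WINDOW_SEGMENTS = 10; // assumed initial congestion window

// Estimate how many round trips it takes to deliver totalBytes
// when the window doubles after every acknowledged round trip.
function roundTripsToDeliver(totalBytes: number, thresholdBytes: number = Infinity): number {
  let windowBytes = INITIAL_WINDOW_SEGMENTS * MSS_BYTES; // 14600 bytes
  let sent = 0;
  let roundTrips = 0;
  while (sent < totalBytes) {
    sent += windowBytes;
    roundTrips += 1;
    windowBytes = Math.min(windowBytes * 2, thresholdBytes);
  }
  return roundTrips;
}
```

Under these assumptions a 14,000-byte document arrives in one round trip, while a 15,000-byte document already needs two.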

Parsing stage

When the browser receives the first chunk of data, it begins to parse it. Even if the page is larger than 14KB, the browser parses whatever data it has on hand, so it is important for optimization to include what the browser needs for the first render in the first 14KB. Although parsing begins as soon as data arrives, the browser still has to process the HTML, CSS, and JS that the first render depends on before it can display the page on the screen.

Build a DOM tree

The browser converts the HTML string into a DOM tree in roughly three steps:

  1. The HTML text is converted into tokens
  2. Each token is converted into a node that carries its attributes and their values
  3. The nodes are connected and organized into a DOM tree
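
The three steps can be illustrated with a toy parser. This is only a sketch under heavy simplifications: it handles plain open tags, close tags, double-quoted attributes, and text, and knows nothing about comments, void elements, or the error recovery a real HTML parser performs. The names `tokenize` and `buildTree` are made up for this example.

```typescript
type Token =
  | { kind: "open"; tag: string; attrs: Record<string, string> }
  | { kind: "close"; tag: string }
  | { kind: "text"; text: string };

interface DomNode {
  tag: string;
  attrs: Record<string, string>;
  text: string;
  children: DomNode[];
}

// Step 1: convert the HTML text into tokens.
function tokenize(html: string): Token[] {
  const tokens: Token[] = [];
  const pattern = /<\/([a-z0-9]+)>|<([a-z0-9]+)([^>]*)>|([^<]+)/gi;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(html)) !== null) {
    if (m[1]) {
      tokens.push({ kind: "close", tag: m[1].toLowerCase() });
    } else if (m[2]) {
      const attrs: Record<string, string> = {};
      const attrPattern = /([a-z-]+)="([^"]*)"/gi;
      let a: RegExpExecArray | null;
      while ((a = attrPattern.exec(m[3] || "")) !== null) {
        attrs[a[1].toLowerCase()] = a[2];
      }
      tokens.push({ kind: "open", tag: m[2].toLowerCase(), attrs });
    } else if (m[4] && m[4].trim() !== "") {
      tokens.push({ kind: "text", text: m[4].trim() });
    }
  }
  return tokens;
}

// Steps 2 and 3: turn tokens into nodes and link them into a tree.
function buildTree(tokens: Token[]): DomNode {
  const root: DomNode = { tag: "#document", attrs: {}, text: "", children: [] };
  const stack: DomNode[] = [root]; // currently open elements, innermost last
  for (const token of tokens) {
    const parent = stack[stack.length - 1];
    if (token.kind === "open") {
      const node: DomNode = { tag: token.tag, attrs: token.attrs, text: "", children: [] };
      parent.children.push(node);
      stack.push(node);
    } else if (token.kind === "text") {
      parent.children.push({ tag: "#text", attrs: {}, text: token.text, children: [] });
    } else if (stack.length > 1) {
      stack.pop(); // a close tag ends the innermost open element
    }
  }
  return root;
}
```

Calling `buildTree(tokenize('<div id="app"><p>Hello</p></div>'))` yields a document node whose only child is the `div`, which in turn contains the `p` and its text node.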

Well-formed HTML makes DOM tree construction more efficient. The more nodes there are, the longer DOM construction takes.

During HTML parsing, when the parser encounters non-blocking resources (images, link elements, external scripts with async/defer, etc.), it requests them and continues parsing the subsequent HTML without waiting. The following points need to be noted:

  • Although CSS loaded through link does not block HTML parsing directly, it can delay it indirectly: an ordinary script tag (without async/defer) blocks HTML parsing, and because JS has APIs for manipulating CSS, that script cannot run until the CSS before it has been downloaded and parsed
  • A script with the defer attribute does not block HTML parsing. It executes after HTML parsing is complete, after which the browser triggers the DOMContentLoaded event
  • A script with the async attribute does not block HTML parsing while it downloads, but its code executes as soon as the file has been downloaded. If the file has not finished downloading when HTML parsing completes, the browser triggers DOMContentLoaded without waiting for it

If the parser encounters a blocking resource (such as a script tag without async/defer), the browser pauses HTML parsing until that resource has been fetched and processed
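
The three loading modes for external scripts described above can be summarized in a small lookup function. The `ScriptBehavior` shape and the `describeExternalScript` name are invented for this summary, and the function only restates the rules in the text rather than measuring real browser behavior. When both attributes are present, async is treated as taking precedence.

```typescript
interface ScriptAttrs {
  async?: boolean;
  defer?: boolean;
}

interface ScriptBehavior {
  // Does the HTML parser stop and wait while the script is fetched?
  blocksHtmlParsing: boolean;
  runs: "when-encountered" | "when-downloaded" | "after-parsing";
  // Does DOMContentLoaded wait for this script to run?
  delaysDomContentLoaded: boolean;
}

function describeExternalScript(attrs: ScriptAttrs): ScriptBehavior {
  if (attrs.async) {
    return { blocksHtmlParsing: false, runs: "when-downloaded", delaysDomContentLoaded: false };
  }
  if (attrs.defer) {
    return { blocksHtmlParsing: false, runs: "after-parsing", delaysDomContentLoaded: true };
  }
  // Ordinary script: parsing pauses until it is downloaded and executed.
  return { blocksHtmlParsing: true, runs: "when-encountered", delaysDomContentLoaded: true };
}
```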

Preload scanner: while the browser builds the DOM tree, the preload scanner looks ahead through the HTML in the background and requests resources in advance according to their priority. By the time the parser reaches the point where a resource is referenced, it may already be downloading or fully downloaded, which reduces blocking
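
As a rough illustration of the idea, the sketch below scans an HTML string for resource URLs without building any tree. It is a toy under stated assumptions: a real preload scanner works on the token stream and assigns priorities, whereas this hypothetical `scanForResources` function only matches double-quoted `src` or `href` attributes on `script`, `img`, and `link` tags with a regular expression.

```typescript
interface Discovered {
  tag: string;
  url: string;
}

// Collect URLs that could be requested before the parser reaches them.
function scanForResources(html: string): Discovered[] {
  const found: Discovered[] = [];
  const pattern = /<(script|img|link)\b[^>]*?\b(?:src|href)="([^"]+)"/gi;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(html)) !== null) {
    found.push({ tag: m[1].toLowerCase(), url: m[2] });
  }
  return found;
}
```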

Build CSSOM tree

The CSSOM is built much like the DOM. The browser iterates over the CSS rules and attaches style properties to each associated node, starting with the most general rules that apply to the node and then applying progressively more specific rules, recursively refining the computed style
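
The "general first, specific last" order can be sketched as follows. This assumes the rules matching a node have already been found and that specificity has been reduced to a single number, which is a simplification of how CSS specificity is really compared. The `Rule` type and `computeStyle` function are hypothetical.

```typescript
type Style = Record<string, string>;

interface Rule {
  specificity: number; // simplified: higher means more specific
  declarations: Style;
}

// Apply matching rules from most general to most specific,
// so that more specific declarations overwrite general ones.
function computeStyle(matchingRules: Rule[]): Style {
  const ordered = matchingRules.slice().sort((a, b) => a.specificity - b.specificity);
  const computed: Style = {};
  for (const rule of ordered) {
    for (const property of Object.keys(rule.declarations)) {
      computed[property] = rule.declarations[property];
    }
  }
  return computed;
}
```

For example, a general rule setting `color: black` and `font-size: 16px` combined with a more specific rule setting `color: blue` produces `color: blue` and `font-size: 16px`.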

Rendering phase

Once the HTML and CSS are parsed (the DOM and CSSOM are built), the browser starts rendering the page: it merges the DOM and CSSOM and draws the result to the screen

Render tree

The DOM and CSSOM are combined to generate the render tree. Starting from the root node of the DOM, the browser traverses each visible node (display not equal to none) and applies the matching CSSOM style rules to determine the computed style of each node

A node with visibility: hidden still appears in the render tree and takes up space on the page, although it is not displayed
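
The difference between the two properties can be shown with a small tree walk. This sketch assumes each node already carries its computed style and ignores the fact that visibility is inherited by descendants. `StyledNode`, `RenderNode`, and `buildRenderTree` are names chosen for this example.

```typescript
interface StyledNode {
  tag: string;
  style: Record<string, string>;
  children: StyledNode[];
}

interface RenderNode {
  tag: string;
  painted: boolean; // false for visibility: hidden, but the box still exists
  children: RenderNode[];
}

// display: none removes the node and its whole subtree from the render tree;
// visibility: hidden keeps the node but marks it as not painted.
function buildRenderTree(node: StyledNode): RenderNode | null {
  if (node.style["display"] === "none") return null;
  const children: RenderNode[] = [];
  for (const child of node.children) {
    const rendered = buildRenderTree(child);
    if (rendered !== null) children.push(rendered);
  }
  return { tag: node.tag, painted: node.style["visibility"] !== "hidden", children };
}
```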

Calculate the Layout

Most nodes on a page are laid out as boxes under the box model. Layout is the process of determining the width, height, and position of each node in the render tree.

Once the render tree is built, the browser traverses it from the root node and determines the size and position of each element from the viewport size and the element's box model properties. If it encounters an element whose size is not yet known (such as an image without a declared width and height), it reserves a placeholder and recalculates once the size is determined

After the initial layout is complete, any later change that affects the layout of the page makes the browser recalculate node sizes and positions. This process is called reflow.
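
A very reduced version of this calculation, for block-level boxes stacked vertically, might look like the sketch below. It assumes every box stretches to the viewport width minus its margins, uses a single value for each of padding, border, and margin on all four sides, and ignores margin collapsing. The `Box`, `Placed`, and `layoutBlocks` names are hypothetical.

```typescript
interface Box {
  height: number;  // content height
  padding: number; // same value on all four sides
  border: number;
  margin: number;
}

interface Placed {
  x: number;
  y: number;
  width: number;  // border-box width
  height: number; // border-box height
}

// Stack boxes from top to bottom inside a viewport of the given width.
function layoutBlocks(viewportWidth: number, boxes: Box[]): Placed[] {
  const placed: Placed[] = [];
  let cursorY = 0;
  for (const box of boxes) {
    const outerHeight = box.height + 2 * (box.padding + box.border);
    cursorY += box.margin; // top margin
    placed.push({
      x: box.margin,
      y: cursorY,
      width: viewportWidth - 2 * box.margin,
      height: outerHeight,
    });
    cursorY += outerHeight + box.margin; // move past the box and its bottom margin
  }
  return placed;
}
```

Changing the height of the first box changes the `y` of every box after it, which is why a change in one element can force the positions of others to be recalculated.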

Paint

When the layout calculation is complete, the browser combines the viewport size and the page scroll position to determine which elements need to be displayed, and converts those elements into actual pixel data

  • To achieve a 60 FPS drawing rate, style calculation, layout, and painting for each frame need to finish within 1000ms / 60 ≈ 16.67ms
  • Painting can be split into layers. Certain elements (video, canvas, and elements with CSS properties such as a 3D transform, will-change, or opacity) are promoted to their own layer. This increases memory management cost but improves painting performance
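
The frame budget arithmetic generalizes to other refresh rates. The helper names below are illustrative, and the per-phase timings passed in would come from your own measurements.

```typescript
// Time available for one frame at the given frame rate, in milliseconds.
function frameBudgetMs(fps: number): number {
  return 1000 / fps;
}

// True if style, layout, and paint together fit into one frame.
function fitsInFrame(styleMs: number, layoutMs: number, paintMs: number, fps: number = 60): boolean {
  return styleMs + layoutMs + paintMs <= frameBudgetMs(fps);
}
```

At 60 FPS, phases taking 4ms, 6ms, and 5ms fit into the budget, while three phases of 6ms each do not.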

Compositing layers

If the page changes, only the affected layer needs to be repainted. Ideally, layers do not affect each other, which keeps the repainted area small. If a reflow occurs, however, the browser starts over from the Layout step