This post was originally posted on my GitHub blog

If there were only one question an interviewer could ask to gauge the breadth and depth of a candidate’s front-end knowledge, it would be: what happens from typing a URL to the page being displayed? Why? Because the path from URL input to page display is the “main line” running through front-end work; once you understand it, you can look at the front end from a higher vantage point. In this post I will combine what I have learned and accumulated recently and try to explain this main line as clearly as I can, to help both you and myself figure out what happens from entering a URL to seeing the page.

Multi-process architecture for browsers

Before we get started, I thought it would be worthwhile to take Chrome as an example and introduce the multi-process architecture of modern browsers, as shown below:

The main parts are:

  • The Browser Process, also called the main process

    • UI thread, which controls browser UI such as buttons and input fields
    • Network thread, which is responsible for downloading resources
    • Storage thread, which is responsible for accessing locally cached files
  • The Renderer Process, also known as the browser kernel

    • JS engine, which is responsible for executing JavaScript; this is why JS is said to be single-threaded
    • GUI rendering thread, responsible for rendering; mutually exclusive with the JS engine (when one runs, the other is suspended)
    • Event-trigger thread, which manages the event loop (mouse clicks, setTimeout, MutationObserver, Ajax, and so on) and places events on the JS execution queue in order
    • Timer thread; setTimeout is not a function of the JS engine itself, just an interface the browser exposes to JS
    • Asynchronous request thread, which handles AJAX requests and notifies the event-trigger thread via callbacks
  • The GPU process, which communicates with the GPU

  • Third-party plugin processes, one for each browser plugin we have installed

With the browser’s multi-process architecture out of the way, let’s talk about what happens from the moment a URL is entered until the page is displayed.

1. Build the request

After the user types in the address bar, the UI thread in the main process receives the input and determines whether it is a search query or a URL. If it is a URL, it is forwarded to the network thread, which builds the request line; once the request line is built, the browser is ready to make the network request.

// The request method is GET, the path is the root path, and the HTTP version is 1.1
GET / HTTP/1.1

2. Check the strong cache

Before making a real network request, the browser checks its strong cache and returns the cached copy of the resource if there is a hit. Otherwise, it moves on to the next step.

2.1 What is a strong cache

Browser caching strategies are divided into the strong cache and the negotiated cache. The fundamental difference between them is whether a request has to be sent. In simple terms, a strong cache is a local copy of the file (stored on disk or in memory) that can be used immediately, without asking the server; with a negotiated cache, a request is sent to the server to ask whether the resource has been updated. If it has not, the local copy is used; if it has, the server returns the updated resource.

2.2 Implementation of strong cache

In the HTTP/1.0 era, the strong cache was implemented through the Expires field in the HTTP response header. As the name suggests, this field holds an absolute expiration time, for example Expires: Wed, 05 Apr 2020 00:55:35 GMT. The browser decides whether to read the cached copy of the resource by comparing this time with the user’s local time. This has an obvious problem: the user can change the local time and invalidate the cache.

HTTP/1.1 added the Cache-Control field to solve this problem. Setting Cache-Control: max-age=xxx makes the cache expire xxx seconds after the response is received (a relative time), so changing the local clock can no longer invalidate the cache.

Cache-Control takes precedence when both Cache-Control and Expires are present.
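
To make this concrete, here is a minimal sketch of a server that marks its responses as strongly cacheable, written in TypeScript on top of Node’s built-in http module; the max-age value, port, and response body are assumptions for illustration only.

// A minimal sketch: a Node.js (TypeScript) server that marks responses as strongly cacheable.
// The max-age value, port, and body are assumptions for illustration only.
import * as http from 'node:http';

const server = http.createServer((req, res) => {
  // Strong cache: valid for 600 seconds relative to when the response is received
  res.setHeader('Cache-Control', 'max-age=600');
  // Legacy HTTP/1.0 fallback; ignored when Cache-Control is present
  res.setHeader('Expires', new Date(Date.now() + 600 * 1000).toUTCString());
  res.end('hello');
});

server.listen(8080);

Within those 600 seconds the browser serves the resource from its local copy without contacting the server at all.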

3. DNS resolution

Before the real network request can be sent, DNS resolution has to be performed to find the IP address of the server behind the URL. Briefly, the resolver first checks the local DNS cache, then asks the local DNS server, which in turn asks the root DNS server, the first-level DNS server, and the second-level DNS server, and the IP address is finally passed back layer by layer. Here’s a picture to make it a little more intuitive.
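
As a small illustration, here is a sketch that resolves a hostname to its IPv4 addresses using Node’s built-in dns/promises module; the hostname is just a placeholder.

// A minimal sketch: resolve a hostname to IPv4 addresses with Node's built-in resolver.
// 'example.com' is a placeholder hostname.
import { resolve4 } from 'node:dns/promises';

async function lookup(): Promise<void> {
  const addresses = await resolve4('example.com');
  console.log(addresses); // an array of IPv4 address strings
}

lookup().catch(console.error);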

4. Establish the TCP connection

Once the server’s IP address is known, the browser can establish a connection with it. There are two transport options: reliable TCP and unreliable UDP. HTTP is built on TCP, so a TCP connection has to be established with the server. How? Through the three-way handshake. The flow of the three-way handshake is shown below.

So have you ever wondered, why do you have to shake hands three times instead of two or one?

Both the browser and the server need to confirm that the other side can send and receive data properly. With only two handshakes, the client knows the server can both send and receive, but the server only knows the client can send data; it does not know whether the client can receive.
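
Application code never performs the handshake itself; the operating system does it. As a minimal sketch with Node’s net module (host and port are placeholders), the connect callback only fires after the three-way handshake has completed:

// A minimal sketch: the connect callback fires only after the three-way handshake completes.
// Host and port are placeholders for illustration.
import * as net from 'node:net';

const socket = net.connect({ host: 'example.com', port: 80 }, () => {
  console.log('TCP connection established, handshake done');
  socket.end();
});

socket.on('error', (err) => console.error(err));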

5. Send a request and receive a response

When a TCP connection is established, the browser can communicate with the server, and HTTP data is transferred during this communication. Here is a complete example of an HTTP request.

When the server receives an HTTP request, it returns an HTTP response to the browser. Here is a complete example of an HTTP response.

The server will tell the browser the result of its processing through the status code in the response line. Common status codes are as follows:

  • 2XX: success; the most common is 200 OK
  • 3XX: further action is required, such as 301 permanent redirect, 302 temporary redirect, and 304 Not Modified
  • 4XX: client error, such as the very common 404 Not Found and 403 Forbidden
  • 5XX: server error, such as 500 Internal Server Error and 502 Bad Gateway
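
At the application level all of this is hidden behind a single call. Here is a minimal sketch using the standard fetch API (the URL is a placeholder) that sends a request and inspects the status code and a response header:

// A minimal sketch: send an HTTP request and inspect the response status and headers.
// The URL is a placeholder for illustration.
async function request(): Promise<void> {
  const res = await fetch('https://example.com/');
  console.log(res.status);                      // e.g. 200
  console.log(res.headers.get('content-type')); // e.g. 'text/html; charset=UTF-8'
  const body = await res.text();
  console.log(body.length);
}

request().catch(console.error);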

6. Check the negotiated cache

In the previous step, if the status code in the HTTP response line is 304 (Not Modified) and the body is empty, the server is telling the browser: “the resource on the server is the same as your local copy, just take it from the cache.” This is how the negotiated cache works. When the strong cache has expired, or when Cache-Control is set to no-cache, the negotiated cache comes into play: the browser sends a request to the server and decides whether to read from the cache based on the response status code.

6.1 Implementation of negotiated cache

The negotiated cache is implemented with Last-Modified in HTTP/1.0 and ETag in HTTP/1.1.

Last-Modified validation rule: the first time the browser sends a request, the server puts Last-Modified in the response header and returns the resource. The next time the browser sends the same request, it puts the Last-Modified value into the If-Modified-Since request header. The server compares the last modification time of the requested resource with the received value and returns HTTP 304 if they are the same; if they differ, it returns HTTP 200 with the latest resource. The detailed process can be seen below:

ETag validation rule: the process is similar to Last-Modified, except that the browser receives an ETag the first time and sends If-None-Match the second time. The difference from Last-Modified is that the ETag value is a unique identifier of the resource content, whereas the Last-Modified value is the last modification time, and ETag has higher priority than Last-Modified.

Why, you may wonder, did HTTP/1.1 introduce ETag for the negotiated cache?

  • Some resources are periodically regenerated, but their content is exactly the same
  • Some resources are modified, but the changes do not require the user to re-download them (e.g. fixing comments or spelling)
  • Some resources change within a second (such as a live monitor), so Last-Modified’s one-second time granularity is insufficient

In these cases, ETag, which uniquely identifies the resource content, works better than Last-Modified.
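
Here is a minimal sketch of how a server might answer a conditional request, written with Node’s http and crypto modules; the body, port, and hash-based ETag scheme are assumptions for illustration only.

// A minimal sketch: negotiated caching with ETag / If-None-Match.
// The body, port, and ETag scheme are assumptions for illustration only.
import * as http from 'node:http';
import { createHash } from 'node:crypto';

const body = 'hello world';
const etag = '"' + createHash('md5').update(body).digest('hex') + '"';

const server = http.createServer((req, res) => {
  if (req.headers['if-none-match'] === etag) {
    // Resource unchanged: 304 with an empty body, the browser reads its local copy
    res.writeHead(304, { ETag: etag });
    res.end();
    return;
  }
  // First request, or resource changed: 200 with the latest content
  res.writeHead(200, { ETag: etag, 'Cache-Control': 'no-cache' });
  res.end(body);
});

server.listen(8080);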

7. Disconnect the TCP connection

After the browser has received the resources from the server, the TCP connection is closed. Closing a TCP connection requires four waves; the picture below shows the process.

So why four waves, not three?

Because in the third wave the server tells the client that it has finished sending its data. Without the third wave, if the connection were closed right after the client’s acknowledgment, any data the server had not yet finished sending would be lost. Robert Kahn and Vinton Cerf, who invented TCP more than 40 years ago, designed just such a rigorous transmission protocol.

8. Parse HTML and construct a DOM tree

After completing the network request, the browser’s renderer process parses and renders the resources. First, for an HTML file, the browser generates a DOM tree from it (a tree structure the browser can understand, i.e. the Document Object Model).

So how exactly do browsers build DOM trees? Through the following four steps.

  1. Conversion. The browser reads the raw bytes of the HTML and translates them into individual characters according to the specified encoding, such as UTF-8.
  2. Tokenizing. The browser converts the characters from the first step into distinct tokens, for example <html> and <body>; each token has its own meaning and rules.
  3. Lexing. These tokens are converted into “objects” that define their properties and rules.
  4. DOM construction. Because HTML tags have specific containment rules (for example, html contains body and body contains div), the parent-child relationships between tags are known from the objects generated in the previous step, so a DOM tree can be built.

Every time the browser processes an HTML file, it goes through these four steps. When the HTML is complex, the whole process can be time-consuming.
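
In the browser we can observe the end result of this process directly. Here is a minimal sketch using the standard DOMParser API; the HTML string is made up for illustration.

// A minimal sketch: parse an HTML string into a DOM tree and walk it.
// The HTML string is made up for illustration.
const html = '<html><body><div><p>hello</p></div></body></html>';
const doc: Document = new DOMParser().parseFromString(html, 'text/html');

// Parent-child relationships follow the containment rules described above
console.log(doc.body.firstElementChild?.tagName);                    // 'DIV'
console.log(doc.body.firstElementChild?.firstElementChild?.tagName); // 'P'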

9. Style calculation and CSSOM tree construction

The purpose of style calculation is to calculate the specific style of each element in the DOM node above. This stage can be roughly divided into three steps:

1. Convert CSS into styleSheets that browsers understand

As with HTML text, browsers cannot directly understand plain-text CSS, so when the renderer process receives CSS text it performs a conversion, turning it into styleSheets, a structure the browser can understand.

Type document.styleSheets in the Chrome console to see a structure like the following:

2. Transform property values in the stylesheet to standardize them

3. Calculate the specific style of each node in the DOM tree

Now that the style properties have been standardized, the next step is to calculate the style properties for each node in the DOM tree. How?

This is where CSS inheritance and cascading rules come in.

The first is CSS inheritance, where each DOM node inherits its parent’s style. The following figure shows the style inheritance process more clearly:

The second is the cascade, the “C” in Cascading Style Sheets. CSS selector priority:

  • Inline style > ID selector > class selector > tag selector

  • Inline style > internal stylesheet > external CSS file

  • Generally speaking, the more specific the selector, the higher the priority; !important has the highest priority of all, but use it with caution.

After calculation, a CSSOM (CSS Object Model) tree is generated, as shown in the following figure:

Let’s review how CSS is handled by the browser: similar to HTML, it starts with bytes, which are converted to characters, tokenized, turned into nodes, and finally assembled into the CSSOM (CSS Object Model).
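
The result of style calculation can also be inspected from script. Here is a minimal sketch (run in a browser console; the selector is a placeholder) that reads the computed, standardized style of an element:

// A minimal sketch: read the computed (standardized) style of an element in the browser.
// The selector is a placeholder for illustration.
const el = document.querySelector('div');
if (el) {
  const style = window.getComputedStyle(el);
  // Values are already standardized, e.g. colors as rgb(...) and lengths in px
  console.log(style.color, style.fontSize, style.display);
}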

10. Layout

We have the DOM tree and the style of each node in the DOM tree, but we don’t yet know the geometry of the DOM elements, so we need to figure out the geometry of the visible elements in the DOM tree. We call this process layout. The layout phase can be divided into two sub-phases, creating the layout tree and calculating the layout.

The layout tree is constructed like this:

We can observe that nodes with display: none in the DOM tree do not appear in the layout tree. So the process of building a layout tree can be summarized as follows:

  • Traverse all the visible nodes in the DOM tree and add them to the layout tree
  • Invisible nodes are ignored by the layout tree, such as all the content under the head tag and elements whose style is display: none

After building the layout tree, it is time to calculate the actual coordinates of the nodes in it. The specific calculation process is quite involved, so we will skip it for now and fill it in later.
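
The geometry that layout produces can be read back from script as well. A minimal sketch (browser console; the selector is a placeholder) using getBoundingClientRect:

// A minimal sketch: read an element's layout geometry (position and size) in the browser.
// The selector is a placeholder for illustration.
const target = document.querySelector('div');
if (target) {
  const rect = target.getBoundingClientRect();
  console.log(rect.x, rect.y, rect.width, rect.height);
}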

11. Dividing into layers

Now that we have a layout tree, and we have calculated the exact position of each element, can we start drawing the page? Not yet. Before drawing, there is one more step: generating the layer tree.

Why do we need a layer tree? Because modern pages have many complex and varied effects, such as page scrolling and ordering along the z-index axis. To implement these effects more easily, the rendering engine also generates dedicated layers for certain nodes, producing a layer tree. The mapping between the layout tree and the layer tree is roughly as follows:

What does it take for the rendering process to create a new layer for a particular node?

1. Elements with stacking-context properties are promoted to a separate layer

A page is a two-dimensional plane, but a stacking context gives HTML a notion of a third dimension: elements are distributed by priority along a z-axis perpendicular to that plane. The priority order is as follows:

positive z-index > z-index: 0 > inline > float > block > negative z-index > border > background

2. Areas that need to be cropped will also be created as layers

Clipping happens when the content to be displayed is larger than its container (for example, a 200×200-pixel div with 1,000 words inside). If scrollbars appear, the scrollbars are also promoted to separate layers, as shown below.
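
From script we can also hint to the browser that an element deserves its own layer. A minimal sketch (browser; the selector and the choice of will-change: transform are assumptions for illustration), since will-change is one common way an element ends up promoted:

// A minimal sketch: hint that an element should get its own compositing layer.
// The selector and the property choice are assumptions for illustration.
const box = document.querySelector<HTMLElement>('.box');
if (box) {
  // will-change: transform is a common hint that often results in a separate layer
  box.style.willChange = 'transform';
}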

12. Paint

After building the layer tree, the rendering engine draws each layer in the tree. It does this by breaking the drawing of a layer into small drawing instructions, which are then assembled into a draw list, as shown in the following image.

You can open the Layers tab of the Chrome developer tools and select the “Document” layer to see such a draw list for yourself. Here is a screenshot for reference:

The draw list of the document is circled in the figure; dragging the progress bar on the right replays the drawing process of the list. Isn’t it neat?

13. Rasterization

After the draw list is generated, rasterization takes place. The draw list is just a set of drawing orders and instructions; the actual drawing is carried out by the compositor thread in the rendering engine. The relationship between the render main thread and the compositor thread is shown in the following image:

As shown above, once a layer’s draw list is ready, the main thread commits (submits) it to the compositor thread. So what does the compositor thread do next? Before answering, let’s introduce a tag that will be familiar to anyone who has done mobile H5 page development:

<meta name="viewport" content="width=device-width, initial-scale=1">

The viewport is what the user can actually see.

Most of the time, the page is much longer than the height of the screen, so the layers are large, but the user can only see part of them through the viewport, so drawing all the layers at once is expensive and unnecessary.

For this reason, the compositor thread divides each layer into tiles, usually 256×256 or 512×512 pixels in size, as shown below:

The compositor thread then prioritizes generating bitmaps for the tiles near the viewport; the actual bitmap generation is performed by rasterization. Rasterization means converting a tile into a bitmap, and the tile is the smallest unit of rasterization. The renderer process maintains a rasterization thread pool, in which all tile rasterization is performed, as shown below:

Usually, the GPU is used to accelerate bitmap generation during rasterization. Using the GPU to generate bitmaps is called fast rasterization, or GPU rasterization, and the resulting bitmaps are stored in GPU memory.

14. Composite and Display

Once all the tiles have been rasterized, the composition thread generates a command to draw the tiles — “DrawQuad” — and submits the command to the browser process.

The browser process has a component called viz that receives DrawQuad commands from the compositing thread, draws its page contents into memory, and displays them on the screen. This process can be represented as follows:

At this point, after this long series of stages, the HTML, CSS, and JavaScript we wrote has been turned by the browser into a beautiful page on the screen.

Summary of rendering pipeline

The following diagram summarizes the entire rendering and display process after receiving resources from the server, which we call the rendering pipeline.

Combined with the above image, a complete rendering process can be summarized as follows:

  1. The renderer process transforms the HTML content into a DOM tree structure it can understand.
  2. The rendering engine converts the CSS stylesheets into styleSheets it can understand, then computes the styles and generates the CSSOM tree.
  3. A layout tree is created and the layout information of each element is calculated.
  4. The layout tree is divided into layers and the layer tree is generated.
  5. A draw list is generated for each layer and submitted to the compositor thread.
  6. The compositor thread divides each layer into tiles and converts the tiles into bitmaps in the rasterization thread pool.
  7. The compositor thread sends the DrawQuad command to the browser process.
  8. The browser process generates the page from the DrawQuad message and displays it on the screen.

Conclusion

The whole process from entering a URL to displaying the page is roughly like this; each step could be expanded into a lot more detail. If you want to understand some of these parts more thoroughly, check out the references below. After polishing this piece repeatedly, my two biggest takeaways are:

  1. By digging deep into the main line of knowledge in a field, we can string together discrete pieces of knowledge into a system.
  2. Learning any field is a process of first making the knowledge “thick” and then making it “thin”, but you can never get to thin without going through thick.



References

  1. Caching in HTTP
  2. Inside look at modern web browser (part 1)
  3. Inside look at modern web browser (part 3)
  4. Constructing the Object Model
  5. Interviewer, stop asking me for three handshakes and four waves
  6. Rendering Flow (Part 2) : How do HTML, CSS, and JavaScript become pages?