Multi-process architecture for browsers

Modern browsers are multi-process applications: the operating system allocates resources (CPU time, memory) to each of the browser's processes.

Take Chrome as an example to introduce the multi-process architecture of modern browsers:

The main parts are:

  1. Browser process (also called the main process). There is only one.
  • UI thread: controls the browser UI, such as buttons and input fields
  • Network thread: downloads network resources
  • Storage thread: accesses locally cached files
  2. Renderer process (the browser kernel), which is multithreaded internally
  • JS engine thread: executes JavaScript. There is only one per renderer, which is why JS is single-threaded
  • GUI rendering thread: renders the page; mutually exclusive with the JS engine thread (while one runs, the other is suspended)
  • Event-triggering thread: manages the event loop, placing events in order on the JS execution queue
  • Timer thread: does the timing for setTimeout, which is not part of the JS language itself but an interface the browser exposes to JS
  • Asynchronous request thread: handles Ajax requests and notifies the event-triggering thread through callback functions
  3. GPU process: communicates with the GPU and is used for 3D drawing. There is at most one.
  4. Third-party plug-in processes, for example an installed browser plug-in

As mentioned above, let’s take a look at the relationship between processes and threads

Let’s start with a figurative metaphor:

- A process is a factory; the factory has its own independent resources
- Factories are independent of each other
- Threads are workers in the factory; multiple workers collaborate to complete a task
- A factory has one or more workers
- Workers share the factory's space

To refine the concept:

- Factory resources -> memory allocated by the system (a separate block of memory)
- Factories independent of each other -> processes independent of each other
- Multiple workers collaborating on a task -> multiple threads collaborating on a task within a process
- One or more workers within a factory -> a process consists of one or more threads
- Shared space between workers -> the program's memory space (code segment, data segment, heap, etc.) is shared between threads of the same process

Official definition:

  • A process is the smallest unit of CPU resource allocation (the smallest unit that can own resources and run independently)
  • A thread is the smallest unit of CPU scheduling.
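
The factory metaphor can be sketched in a few lines of Python. This is only an illustration, not browser code: the names `run_workers` and `shared_log` are made up for the example. Several threads (workers) inside one process (factory) write to the same list, which works only because threads of one process share its memory.

```python
import threading


def run_workers(worker_count: int) -> list[int]:
    """Start several threads in this process; all of them write to one shared list."""
    shared_log: list[int] = []   # lives in the process's memory, visible to every thread
    lock = threading.Lock()      # workers coordinate access to the shared space

    def worker(worker_id: int) -> None:
        with lock:
            shared_log.append(worker_id)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                 # wait until every worker has finished
    return sorted(shared_log)


print(run_workers(4))
```

A second process would not see `shared_log` at all; processes have to exchange data through explicit inter-process communication instead.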

Opening a web page in the browser amounts to starting a new process (which has its own threads).

Open the Task Manager in Chrome and you can see that when multiple web pages are open, the browser process, network process, and GPU process are shared (if there is audio or video content, there is also an Audio Service process). Each plug-in runs in its own process. By default, each page gets a separate renderer process; however, if two pages have the same root domain and protocol (HTTPS or HTTP) and one was opened from the other, they reuse the same renderer process.

With the multi-process architecture of the browser in mind, what happens when the browser enters a URL and displays a page?

What happens when the browser goes from entering the URL to presenting the page?

1. Build the request

After the user types into the address bar, the UI thread of the main process receives the input and determines whether it is a search query or a URL.

If it is a URL, it is forwarded to the network thread, which builds the request line. Once that is built, the browser is ready to make the network request.
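
To make the "query or URL?" decision concrete, here is a rough sketch. The function name `classify_input` and its rules are assumptions made for illustration; real browsers use far more elaborate heuristics.

```python
from urllib.parse import urlsplit


def classify_input(text: str) -> str:
    """Return "url" or "query" for text typed into the address bar (rough heuristic)."""
    text = text.strip()
    if not text or " " in text:
        return "query"                       # URLs do not contain spaces
    # Without a scheme, prefix "//" so that urlsplit treats the text as a host.
    parts = urlsplit(text if "://" in text else "//" + text)
    host = parts.hostname or ""
    if parts.scheme in ("http", "https") and host:
        return "url"
    # No scheme: treat "example.com" or "localhost" style input as a URL.
    if not parts.scheme and ("." in host or host == "localhost"):
        return "url"
    return "query"


print(classify_input("https://example.com/index.html"))  # url
print(classify_input("how do browsers work"))            # query
```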

2. Check the strong cache

Before making a real network request, the browser checks the browser’s strong cache and returns a copy of the resource file if a match is made. Otherwise, go to the next step.

2.1 What is strong caching

What is caching?

You can think of a cache as a copy of a resource. When we request a resource from the server, we keep a copy locally so that it can be read quickly next time.

It differs from local storage such as localStorage and cookies:

  • Local storage mostly holds data records, has a small capacity, and is meant for local operations.

  • A cache mostly exists to reduce resource requests, usually stores files, and has a relatively large capacity.

As you can see, the primary purpose of caching is to reduce unnecessary requests.

  • For example, a user’s profile picture rarely changes, yet without a cache the same picture has to be requested every time. The round trip adds to the page’s display time, and too many unnecessary requests also put pressure on the server. If the image is cached locally, it can be loaded from the local copy each time without making a request.

The benefits are obvious:

  • Shorter load times and a better user experience
  • Less bandwidth consumed
  • Less load on the server

As far as browsers are concerned, caches generally fall into four categories, in order of browser read priorities:

  • Memory Cache
  • Service Worker Cache
  • HTTP Cache
  • Push Cache

What follows covers the HTTP cache, which is divided into the strong cache and the negotiated cache.

The fundamental difference is whether a request needs to be sent.

  • Strong cache: the resource is read directly from the local copy (on disk or in memory) without sending a request to the server; the status code shown is 200.

  • Negotiated cache: the browser sends a request to the server asking whether the resource has changed. If it has not, the server returns status code 304 and the browser uses the local cache. If it has, the server returns the updated resource with status code 200.

2.2 Implementation of strong cache

Strong caching is implemented mainly through Expires and Cache-Control.

  1. Expires

Expires is a cache field defined in HTTP/1.0. When we request a resource, the server can add the Expires field to the response headers to indicate when the resource expires.

expires: Thu, 03 Jan 2019 11:43:04 GMT

It is a timestamp (Greenwich Mean Time). When the client requests the resource again, it compares the client’s current time with the timestamp. If the current time is earlier than the timestamp, the cached resource is used directly.

However, there is a big problem: the comparison uses the client’s time. On the one hand, the client and server clocks may be out of sync. On the other hand, the client’s time can be changed by the user (the browser follows the system time, so changing the system time affects the browser), so the result may not meet expectations.

That is, users can invalidate the cache by changing their local time.

  2. Cache-Control

The Cache-Control field was added in HTTP/1.1 to solve this problem. Setting max-age=XXX makes the cache expire XXX seconds after the response was received (a relative time), so the user can no longer invalidate the cache simply by changing the local time.

The value is a duration in seconds: the number of seconds after which the resource becomes stale.

When the client requests the resource again and finds that it is still within its valid period, the cached copy is used, independent of the absolute time on the client.

  3. Precedence: when both Cache-Control and Expires are present, Cache-Control takes precedence.
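
The strong-cache decision described above can be sketched as one function. This is a simplified illustration, not how any particular browser implements it: `is_fresh`, the lowercase header keys, and the `stored_at` parameter (the time the response was cached) are assumptions of the example, and details such as the Age header and heuristic freshness are left out.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_fresh(headers: dict[str, str], stored_at: datetime, now: datetime) -> bool:
    """Return True if a cached response may be reused without contacting the server."""
    cache_control = headers.get("cache-control", "")
    if "no-cache" in cache_control or "no-store" in cache_control:
        return False                                   # must revalidate or must not reuse
    match = re.search(r"max-age=(\d+)", cache_control)
    if match:                                          # Cache-Control wins over Expires
        return now - stored_at < timedelta(seconds=int(match.group(1)))
    if "expires" in headers:                           # HTTP/1.0 fallback: absolute time
        return now < parsedate_to_datetime(headers["expires"])
    return False


stored = datetime(2019, 1, 1, tzinfo=timezone.utc)
print(is_fresh({"cache-control": "max-age=60"}, stored, stored + timedelta(seconds=30)))
```

Note that the `max-age` branch only looks at elapsed time, while the `Expires` branch compares against an absolute timestamp, which is exactly why the latter is sensitive to a wrong client clock.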

3. DNS resolution

To send a real network request, you first need to perform DNS resolution to find the server IP address that corresponds to the host name in the URL.

DNS resolution works as follows: the local DNS cache is searched first; if there is no match, the local DNS server is asked, which in turn asks the root DNS server, the top-level domain (level 1) DNS server, and the authoritative (level 2) DNS server. The IP address that is found is then passed back layer by layer.
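
The lookup order can be simulated without touching the network. Everything in this sketch is hypothetical: the three dictionaries stand in for the root, top-level domain, and authoritative servers, and the address comes from the 192.0.2.0/24 range reserved for documentation.

```python
# Hypothetical, hard-coded "servers"; a real resolver queries these over the network.
ROOT = {"com": "tld-com"}
TLD = {"tld-com": {"example.com": "ns-example"}}
AUTHORITATIVE = {"ns-example": {"www.example.com": "192.0.2.10"}}


def resolve(hostname: str, cache: dict[str, str]) -> tuple[str, list[str]]:
    """Return (ip, steps) for hostname, consulting the local cache first."""
    if hostname in cache:
        return cache[hostname], ["local cache"]
    labels = hostname.split(".")
    tld_server = ROOT[labels[-1]]                          # root: who handles ".com"?
    name_server = TLD[tld_server][".".join(labels[-2:])]   # TLD: who handles "example.com"?
    ip = AUTHORITATIVE[name_server][hostname]              # authoritative: the actual record
    cache[hostname] = ip                                   # remember the answer for next time
    return ip, ["root", "tld", "authoritative"]


cache: dict[str, str] = {}
print(resolve("www.example.com", cache))  # full walk through the hierarchy
print(resolve("www.example.com", cache))  # answered from the local cache
```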

Here’s another picture to make it a little bit more intuitive.

4. Establish the TCP connection

After you know the IP address of the server, you can establish a connection with the server in either of the following modes: reliable TCP or unreliable UDP.

The HTTP protocol is based on TCP, so you need to establish a TCP connection with the server. How is it set up? Through a three-way handshake, as shown below:

The three-way handshake is to confirm the sending and receiving capabilities of the client and server.

Both parties are initially in the CLOSED state, and then the server listens for a port and goes into LISTEN.

  • The client initiates the connection by sending a SYN and enters the SYN-SENT state.
  • When the server receives the SYN, it replies with SYN and ACK (acknowledging the client’s SYN) and enters the SYN-RCVD state.
  • The client then sends an ACK to the server and enters the ESTABLISHED state. After receiving the ACK, the server also enters the ESTABLISHED state.

A SYN must be acknowledged by the peer while a bare ACK need not be, so a SYN consumes a sequence number and an ACK does not. (Remember this rule: any TCP segment that the peer must acknowledge consumes a sequence number.)
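
The sequence-number rule is easier to see with numbers. The sketch below only models the three segments as dictionaries; `three_way_handshake` and the initial sequence numbers passed to it are made up for the example.

```python
def three_way_handshake(client_isn: int, server_isn: int) -> list[dict]:
    """Return the three segments exchanged, with sequence and acknowledgment numbers."""
    syn = {"from": "client", "flags": "SYN", "seq": client_isn, "ack": None}
    # The SYN consumed one sequence number, so the server acknowledges client_isn + 1.
    syn_ack = {"from": "server", "flags": "SYN+ACK", "seq": server_isn, "ack": syn["seq"] + 1}
    # The server's SYN also consumed one sequence number, hence server_isn + 1.
    ack = {"from": "client", "flags": "ACK", "seq": client_isn + 1, "ack": syn_ack["seq"] + 1}
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```

Because the final ACK does not consume a sequence number, the first data segment from the client would again start at `client_isn + 1`.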

Why not shake hands twice?

The browser and server each need to be sure that the other side can both send and receive data. With only two handshakes, the client knows that the server can receive and send, but the server only knows that the client can send; it has no confirmation that the client can receive.

Is it ok to carry data during a three-way handshake?

Yes, but only on the third handshake. The reasons why the first two handshakes cannot carry data are as follows:

  • Protection against attacks. If the first handshake could carry data, an attacker could put a large amount of data in the SYN packet, forcing the server to spend a large amount of time and memory processing it.
  • The ESTABLISHED state is relatively safe. By the third handshake, the client is already in the ESTABLISHED state and has confirmed that the server can both receive and send.

5. Send a request and receive a response

Once the TCP connection is established, the browser can communicate with the server, and HTTP data is transferred over this connection. Here is a complete example of an HTTP request:

When the server receives an HTTP request, it returns an HTTP response to the browser. Here is a complete example of an HTTP response:

The server tells the browser the result of its processing through the status code in the response line. Common status codes are as follows:

  • 2XX: Success, the most common is 200 OK
  • 3XX: Further operations are required, such as 301 permanent redirection, 302 temporary redirection, 304 unmodified
  • 4XX: Request error, such as the most common 404 resource not found, and 403 forbidden request
  • 5XX: server error, such as 500 server internal error, 502 gateway error
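
As a small illustration of where the status code sits, the sketch below pulls it out of the first line of a raw HTTP/1.x response and maps it to the classes listed above. The helper names `parse_status_line` and `status_class` and the sample response are assumptions made for the example.

```python
def parse_status_line(raw_response: str) -> tuple[str, int, str]:
    """Split the first line of an HTTP/1.x response into (version, code, reason)."""
    status_line = raw_response.split("\r\n", 1)[0]
    parts = status_line.split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""    # the reason phrase may be missing
    return parts[0], int(parts[1]), reason


def status_class(code: int) -> str:
    """Map a status code to its class, based on the first digit."""
    classes = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}
    return classes.get(code // 100, "other")


raw = "HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"
version, code, reason = parse_status_line(raw)
print(version, code, reason, "->", status_class(code))
```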

6. Check the negotiated cache

In the previous step, if the status code in the HTTP response line is 304 (Not Modified) and the body is empty, the server is telling the browser: “The resource on the server is the same as your locally cached copy, so just take it from the cache.”

This is how the negotiated cache works. When the strong cache has expired, or when Cache-Control is set to no-cache, the negotiated cache comes into play: the browser sends a request to the server and decides whether to read from the cache based on the status code of the response.

6.1 Implementation of negotiated cache

The negotiated cache is implemented mainly through Last-Modified and ETag.

  1. Last-Modified

Last-Modified indicates the time when the resource was last modified. When it is enabled, a Last-Modified field is added to the response for the resource, as follows:

last-modified: Thu, 20 Dec 2018 11:36:00 GMT

When the resource is requested again, the request header carries the If-Modified-Since field, whose value is the Last-Modified value returned previously, for example If-Modified-Since: Thu, 20 Dec 2018 11:36:00 GMT.

The server compares this field with the resource’s last modification time. If they are the same, the resource has not been modified, so the server returns 304 and the browser uses its cache directly. If they differ, the server returns the modified resource and sets Last-Modified to the new value.

The process can be seen in the following figure:

But Last-Modified has two disadvantages:

  • As soon as the file is edited, its modification time changes even if the content did not actually change, so it is returned as a new resource. This produces an unnecessary response, which is exactly what caching is supposed to avoid.
  • The time is only accurate to the second. If a change happens within the same second, the update goes undetected and the browser is still told to use the old cache.

  2. ETag

To solve these problems, ETag was created. An ETag is a unique identifier string generated from the content of the resource, so different content yields a different ETag. With ETag enabled, an ETag field is added to the response for the resource, as follows:

etag: "FllOiaIvA1f-ftHGziLgMIMVkVw_"

When the resource is requested again, the request header carries the If-None-Match field with the previously returned ETag value, for example If-None-Match: "FllOiaIvA1f-ftHGziLgMIMVkVw_".

The server generates the identifier string from the current content of the resource and compares it with the field. If they match, it returns 304 and the local cache can be used directly. If not, it returns the new resource (status code 200) and sets the ETag field to the new value.

When both are present, ETag takes priority over Last-Modified.

Why did HTTP/1.1 introduce ETag for negotiated caching?

  • Some resources are periodically rewritten, but the content is exactly the same
  • Some resources may be modified, but the changes do not require the user to re-download them (changes to comments or spelling)
  • Some resources change in less than a second (such as a live monitor), so the time granularity of Last-Modified is insufficient
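
Seen from the server side, the negotiated cache boils down to one comparison. The sketch below is an illustration under stated assumptions: `make_etag` and `respond` are hypothetical helpers, header keys are lowercase, the ETag is simply a truncated SHA-256 of the body, and the If-Modified-Since check is a plain string comparison (a real server compares dates).

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a validator from the content; identical content gives an identical tag."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers: dict[str, str], body: bytes,
            last_modified: str) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body) for a GET that may carry validators."""
    etag = make_etag(body)
    headers = {"etag": etag, "last-modified": last_modified}
    if "if-none-match" in request_headers:             # ETag takes priority
        unchanged = request_headers["if-none-match"] == etag
    else:                                              # fall back to Last-Modified
        unchanged = request_headers.get("if-modified-since") == last_modified
    if unchanged:
        return 304, headers, b""                       # empty body: use the cached copy
    return 200, headers, body


modified = "Thu, 20 Dec 2018 11:36:00 GMT"
status, headers, _ = respond({}, b"hello", modified)
print(status)                                                            # first visit
print(respond({"if-none-match": headers["etag"]}, b"hello", modified)[0])    # unchanged
print(respond({"if-none-match": headers["etag"]}, b"changed", modified)[0])  # changed
```

The last call shows the point of ETag: the modification time is the same, but the content differs, so the server still returns the new resource.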

7. Close the TCP connection

After the browser has received the resources from the server, the TCP connection is closed. This takes a four-way wave, as shown below:

  • First wave: the client sends a FIN to close data transfer from the client to the server, and enters the FIN_WAIT_1 state.
  • Second wave: after receiving the FIN, the server sends an ACK to the client whose acknowledgment number is the received sequence number + 1 (like a SYN, a FIN occupies one sequence number). The server enters the CLOSE_WAIT state.
  • Third wave: the server sends a FIN to close data transfer from the server to the client, and enters the LAST_ACK state.
  • Fourth wave: after receiving the FIN, the client enters the TIME_WAIT state and sends an ACK to the server whose acknowledgment number is the received sequence number + 1. On receiving it, the server enters the CLOSED state, completing the four-way wave.
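
The state changes on both sides can be written down as a small transition table. This is a sketch for illustration only: the event names and the `run` helper are invented, and two details not mentioned above are filled in from standard TCP behaviour, namely the FIN_WAIT_2 state the client enters after the second wave and the timeout that takes it from TIME_WAIT to CLOSED.

```python
# (current state, event) -> next state, for a close initiated by the client
TRANSITIONS = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",   # first wave (client)
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",    # second wave received (client)
    ("FIN_WAIT_2", "recv FIN"): "TIME_WAIT",     # third wave received; client sends the last ACK
    ("TIME_WAIT", "timeout"): "CLOSED",          # client waits before fully closing
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",   # first wave received; server sends an ACK
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",      # third wave (server)
    ("LAST_ACK", "recv ACK"): "CLOSED",          # fourth wave received (server)
}


def run(events: list[str], state: str = "ESTABLISHED") -> list[str]:
    """Apply events in order and return every state visited."""
    visited = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        visited.append(state)
    return visited


client = run(["send FIN", "recv ACK", "recv FIN", "timeout"])
server = run(["recv FIN", "send FIN", "recv ACK"])
print("client:", " -> ".join(client))
print("server:", " -> ".join(server))
```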

In plain terms:

  • Client: I have nothing more to say.

  • Server: Got it, but wait for me, I’m not finished yet.

  • Server: OK, I’m done too.

  • Client: OK, then our conversation is over.

Why four waves and not three?

Because the third wave is how the server tells the client that it has finished sending its data.

If the connection were closed right after the second wave, without waiting for the server’s FIN in the third wave, the client would never receive the data the server had not yet finished sending, and that data would be lost.

A data transfer that has been started must be allowed to finish.

8. Parse HTML and build a DOM tree

After completing the network request above, the browser’s renderer process parses and renders the resource.

First, from the HTML file the browser generates a DOM (Document Object Model) tree, a tree structure that the browser can understand.

So how do browsers build DOM trees? Through the following four steps.

  1. Conversion: the browser reads the HTML as raw bytes and translates those bytes into individual characters according to the specified encoding, such as UTF-8.
  2. Tokenizing: the browser converts the string from the first step into distinct tokens, such as <html>. Each token has its own meaning and rules.
  3. Lexing: the tokens are converted into “objects” that define their attributes and rules.
  4. DOM construction: HTML tags have specific containment rules (html contains body, body contains div), and the objects from the previous step record the parent-child relationships between tags, so a DOM tree can be built.

Each time a browser processes an HTML file, it goes through the four steps above. When the HTML is complex, the whole process can be time-consuming.
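
The tokens-to-tree step can be imitated with Python's standard `html.parser` module, which does the tokenizing and hands over start tags, end tags, and text. The `Node` and `TreeBuilder` classes below are a toy sketch written for this example; a real browser parser follows the much more involved HTML parsing rules, including error recovery.

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, tag: str, attrs: dict, parent: "Node | None" = None):
        self.tag, self.attrs, self.parent = tag, attrs, parent
        self.children: list["Node"] = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Turn start/end tag tokens into a parent-child tree."""
    VOID = {"meta", "link", "br", "img", "input", "hr"}   # tags that never have children

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#document", {})
        self.current = self.root                          # node currently being filled

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs), self.current)
        self.current.children.append(node)
        if tag not in self.VOID:
            self.current = node                           # descend into the new element

    def handle_endtag(self, tag):
        if self.current.tag == tag and self.current.parent is not None:
            self.current = self.current.parent            # climb back to the parent

    def handle_data(self, data):
        self.current.text += data.strip()


def build_dom(html: str) -> Node:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


dom = build_dom("<html><head><title>t</title></head><body><div>hi</div></body></html>")
print([child.tag for child in dom.children[0].children])
```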

9. Style calculation and CSSOM tree construction

The purpose of style calculation is to work out the specific style of each node in the DOM tree above. This stage can be roughly divided into three steps:

  1. Convert CSS to styleSheets that browsers can understand

As with HTML text, browsers cannot directly understand plain-text CSS, so when the renderer receives CSS text, it converts it into styleSheets, a structure that browsers can understand.

Type document.styleSheets in the Chrome console to see the following structure:

  2. Transform property values in the stylesheet to standardize them

  3. Work out the specific style of each node in the DOM tree

Now that the style properties have been standardized, the next step is to calculate the style properties for each node in the DOM tree.

CSS inheritance rules and cascading rules are involved here. CSS inheritance means that each DOM node inherits the inheritable style properties of its parent.

After the calculation, a CSSOM tree is generated, as shown in the following figure:

To summarize, the browser processes CSS much as it processes HTML: it starts with bytes, translates them into characters, tokenizes them, generates nodes, and eventually produces a CSSOM.
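
Inheritance is simple to sketch: start from the inheritable values of the parent, then let the node's own declarations override them. The `computed_style` helper and the short `INHERITED` set are assumptions of this example; the real list of inherited properties is much longer, and the cascade (specificity, source order) is ignored here.

```python
INHERITED = {"color", "font-size", "font-family"}   # a small subset of inherited properties


def computed_style(own: dict[str, str], parent: dict[str, str]) -> dict[str, str]:
    """Combine a node's own declarations with values inherited from its parent."""
    style = {prop: value for prop, value in parent.items() if prop in INHERITED}
    style.update(own)        # the node's own declarations override inherited values
    return style


body = computed_style({"color": "blue", "margin": "8px"}, parent={})
paragraph = computed_style({"font-size": "20px"}, parent=body)
print(paragraph)
```

The paragraph picks up `color` from the body but not `margin`, because `margin` is not an inherited property.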

10. Layout

Although we now have the DOM tree and the style of each node in it, we don’t yet know the geometry of these DOM elements, so we need to work out the geometry of the visible elements in the DOM tree. We call this process layout.

The layout phase can be divided into two sub-phases: creating the layout tree and calculating the layout.

The layout tree is constructed like this:

We can observe that all display: none nodes in the DOM tree do not appear in the layout tree. So the process of building a layout tree can be summarized as follows:

  • Walk through the visible nodes in the DOM tree and add them to the layout tree
  • Invisible nodes are ignored by the layout tree, such as everything under the head tag and elements whose style is display: none
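
The two rules above translate almost directly into a recursive walk. `DomNode` and `build_layout_tree` are hypothetical names for this sketch, which only filters nodes and does not compute any geometry.

```python
from dataclasses import dataclass, field


@dataclass
class DomNode:
    tag: str
    style: dict[str, str] = field(default_factory=dict)
    children: list["DomNode"] = field(default_factory=list)


def build_layout_tree(node: DomNode) -> "DomNode | None":
    """Copy the visible part of the DOM; return None for an invisible subtree."""
    if node.tag == "head" or node.style.get("display") == "none":
        return None                                   # the whole subtree is skipped
    visible = [build_layout_tree(child) for child in node.children]
    return DomNode(node.tag, dict(node.style), [c for c in visible if c is not None])


dom = DomNode("html", {}, [
    DomNode("head", {}, [DomNode("title")]),
    DomNode("body", {}, [DomNode("div"), DomNode("span", {"display": "none"})]),
])
layout = build_layout_tree(dom)
print([child.tag for child in layout.children])               # head is gone
print([child.tag for child in layout.children[0].children])   # the hidden span is gone
```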

After building the layout tree, the next step is to calculate the actual coordinates of the nodes in the layout tree. (Skip it for now)

11. Layering (Layer Tree)

We now have a layout tree and the exact position of each element. Before painting, there is one more step: generating the layer tree.

Why do I need a layer tree?

Because modern front-end pages have complex and varied effects, such as page scrolling and ordering along the z-axis with z-index, the rendering engine creates dedicated layers for specific nodes and generates a corresponding layer tree, which makes these effects easier to implement. The mapping between the layout tree and the layer tree is roughly as follows:

So what conditions must be met for the rendering engine to create a new layer for a particular node?

  1. Elements with stacking context properties are promoted to a separate layer

A page is a two-dimensional plane, but stacking contexts give HTML elements a third dimension: elements are arranged by priority along the z-axis, perpendicular to that plane. The priorities are as follows:

Positive z-index > z-index = 0 > inline > float > block > negative z-index > border > background

  2. Areas that need to be clipped are also created as layers

Clipping happens when the content to be displayed is larger than its container (such as a 200 x 200 pixel div with 1000 words in it). If scrollbars appear, they are also promoted to separate layers, as shown below.

12. Paint

After building the layer tree, it’s time for the rendering engine to draw each layer in the tree.

To do this, the rendering engine breaks down a layer’s rendering into smaller instructions, which are then sequentially assembled into a list of instructions to draw, as shown in the image below.

You can open the Layers tab of Chrome DevTools and select the “document” layer to see a draw list in practice. Here is a schematic diagram for reference:

The draw list of the document layer is circled in the figure. Dragging the progress bar on the right replays the drawing of the list step by step. Isn’t it amazing?

13. Rasterization

After the draw list is generated, it is rasterized. A draw list only records the drawing order and drawing instructions; the actual drawing is done by the compositing thread in the rendering engine. You can see the relationship between the main rendering thread and the compositing thread in the following image:

As shown above, when the draw list of a layer is ready, the main thread commits it to the compositing thread. How does the compositing thread work next? First we need to introduce a tag that will be familiar to anyone who has developed H5 mobile pages:

<meta name="viewport" content="width=device-width, initial-scale=1">

The viewport is what the user can actually see.

Most of the time, the page is much longer than the height of the screen, so the layers are large, but the user can only see part of them through the viewport, so drawing all the layers at once is expensive and unnecessary.

For this reason, the compositing thread divides each layer into tiles, which are usually 256×256 or 512×512 in size, as shown below:

The compositing thread then gives priority to generating bitmaps for the tiles near the viewport. The actual bitmap generation is performed by rasterization, which means converting a tile into a bitmap. The tile is the smallest unit of rasterization. The renderer process maintains a rasterization thread pool in which all tile rasterization is performed, as shown below:

Generally, the GPU is used to accelerate bitmap generation during rasterization. Generating bitmaps with the GPU is called fast rasterization, or GPU rasterization. The generated bitmaps are stored in GPU memory.
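
To illustrate the tiling and prioritization described above, the sketch below splits a layer into fixed-size tiles and orders them by distance from the viewport. The function `tiles_by_priority` and its "distance to the viewport centre" rule are assumptions made for the example, not Chrome's actual scheduling policy.

```python
def tiles_by_priority(layer_w: int, layer_h: int,
                      viewport: tuple[int, int, int, int],
                      tile: int = 256) -> list[tuple[int, int]]:
    """Return tile origins (x, y), nearest to the viewport centre first."""
    vx, vy, vw, vh = viewport                       # viewport as x, y, width, height
    cx, cy = vx + vw / 2, vy + vh / 2
    origins = [(x, y) for y in range(0, layer_h, tile) for x in range(0, layer_w, tile)]

    def distance(origin: tuple[int, int]) -> float:
        x, y = origin
        # squared distance from the tile centre to the viewport centre
        return (x + tile / 2 - cx) ** 2 + (y + tile / 2 - cy) ** 2

    return sorted(origins, key=distance)


# A 512 x 1024 layer with the viewport showing the top 256 pixels.
order = tiles_by_priority(512, 1024, viewport=(0, 0, 512, 256))
print(order[:2])   # tiles under the viewport are rasterized first
print(order[-2:])  # tiles at the bottom of the page come last
```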

14. Composite and Display

Once all the tiles have been rasterized, the compositing thread generates a command to draw the tiles, DrawQuad, and submits it to the browser process.

The browser process has a component called viz that receives the DrawQuad commands from the compositing thread, draws the page contents into memory accordingly, and displays them on the screen. This process can be represented as follows:

At this point, after this series of stages, the HTML, CSS, and JavaScript that we wrote are displayed by the browser as a beautiful page. (Applause 👏🏻)

Summary of rendering pipeline

The following diagram summarizes the entire rendering and display process after receiving resources from the server, which we call the rendering pipeline.

Combined with the above image, a complete rendering process can be summarized as follows:

  1. The renderer process converts the HTML content into a DOM tree structure that it can read.
  2. The rendering engine converts the CSS into styleSheets that the browser can understand, then calculates styles and generates the CSSOM tree.
  3. Create a layout tree and calculate the layout information for the elements.
  4. Layer the layout tree and generate the layer tree.
  5. Generate a draw list for each layer and submit it to the compositing thread.
  6. The compositing thread divides each layer into tiles and converts the tiles into bitmaps in the rasterization thread pool.
  7. The compositing thread sends the DrawQuad command to the browser process.
  8. The browser process generates the page from the DrawQuad message and displays it on the monitor.

References

  • Understand the front-end “core thread” in one go: what happens from entering the URL to displaying the page
  • Strong caching and negotiated caching
  • From the browser’s multiple processes to the single JS thread: a comprehensive walkthrough of how JS runs
  • TCP soul-searching questions: the three-way handshake for establishing a connection

Final words

This article is not original: 90% of the content comes from the article on the front-end “core thread” (what happens from entering the URL to displaying the page), and along the way I looked up a few other articles on the points I did not understand. Still, I typed out essentially the whole text myself, word by word, over two days. It was slow, but typing is also a learning process: compared with just reading the article, it leaves a deeper impression and reveals many details.

After writing this article, I became very interested in Li Bing’s course “Browser Working Principle and Practice”. I have since started the course and plan to update my study notes every day, recording them in my browser study notes.

I like this quote from the author of the linked article:

Study knowledge for what it is, not what it looks like

Let’s encourage each other.