Preface

Modern browsers are complex enough to count as an “operating system” running on top of the operating system. This article tries to summarize how they work, using examples that are as simple and understandable as possible.

Contents:

  1. Overview of processes and threads
  2. Browser architecture
  3. Input from the browser perspective
  4. How pages are rendered
  5. How to interact

Part 1. Overview of processes and threads

The CPU is the core of a computer and performs almost all of its computing work.

You can think of the CPU as a factory, running all the time.

Suppose the factory has a limited power supply that can run only one workshop at a time. While one workshop is running, the others must stay idle.

A process is like a workshop: it represents one task carried out in the factory. The implication is that a single CPU can run only one task at a time.

A workshop can have many workers working together to accomplish the same task.

A thread is a worker in a workshop.

Assume the workers are power-hungry robots that depend on electricity supplied by the factory. Each time, a robot receives just enough power to complete one job, and the factory can power only one robot at a time.

This is roughly how a single-core CPU works: it can do only one job at a time.

Yet it still feels as though many different tasks are running “simultaneously”. When the CPU switches between tasks fast enough, you no longer notice that it can only do one job at a time. That rapid switching is exactly what our CPU is doing.
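
The switching described above can be sketched as a toy round-robin scheduler. This is only an illustration, not how a real operating system is implemented: the task names and the idea of a "slice" as one generator step are assumptions made for the example.

```python
from collections import deque

def task(name, steps):
    """A toy task that needs `steps` time slices to finish."""
    for i in range(1, steps + 1):
        yield f"{name}:{i}"

def round_robin(tasks):
    """Run one slice of each task in turn until every task is done."""
    queue = deque(tasks)
    timeline = []
    while queue:
        current = queue.popleft()
        try:
            timeline.append(next(current))  # the single "core" runs one slice
            queue.append(current)           # then the task goes to the back of the line
        except StopIteration:
            pass                            # task finished, drop it from the queue
    return timeline

timeline = round_robin([task("music", 2), task("editor", 3), task("browser", 1)])
print(timeline)
# ['music:1', 'editor:1', 'browser:1', 'music:2', 'editor:2', 'editor:3']
```

Only one slice runs at any moment, but the slices of different tasks are interleaved, which is why all three tasks appear to make progress together.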

Every time we open an application, we start a process. The program also creates one or more threads to help it do its job.

The operating system gives each process a “block” of memory to use, like the floor space of a workshop, and all of the application’s state is stored there. When the program closes, the process disappears and the operating system frees that memory.

A process can ask the operating system to start another process to perform a different task. The new process is allocated its own separate region of memory.

If two processes need to talk, they can do so through interprocess communication (IPC).
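
As a minimal sketch of two processes talking to each other, the example below starts a child Python process and exchanges one message with it over pipes (its standard input and output). Pipes are just one form of IPC, chosen here for brevity; the helper name `ask_child` and the message format are made up for the illustration.

```python
import subprocess
import sys

# Code run by the child process: read one line from stdin, reply on stdout.
CHILD_CODE = "import sys; msg = sys.stdin.readline().strip(); print('child got: ' + msg)"

def ask_child(message: str) -> str:
    """Start a separate process and exchange one message with it over pipes."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=message + "\n",   # sent to the child's stdin
        capture_output=True,    # collect what the child writes to stdout
        text=True,
        check=True,
    )
    return result.stdout.strip()

reply = ask_child("hello")
print(reply)  # child got: hello
```

The parent never touches the child's memory. Everything the two processes share has to travel through the pipe.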

Many applications are designed so that if a worker process becomes unresponsive, that process can be restarted by another process without stopping the application.
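
The restart idea can be sketched with a tiny supervisor. The worker here is entirely hypothetical: it simply exits with an error code on its first two attempts so that there is something to restart. A real application would detect an unresponsive worker with timeouts or heartbeats rather than an exit code.

```python
import subprocess
import sys

# Hypothetical worker: fails (exit code 1) unless it is attempt 3 or later.
WORKER_CODE = "import sys; sys.exit(0 if int(sys.argv[1]) >= 3 else 1)"

def supervise(max_attempts: int = 5) -> int:
    """Restart the worker until it exits cleanly; return the attempt that succeeded."""
    for attempt in range(1, max_attempts + 1):
        worker = subprocess.run([sys.executable, "-c", WORKER_CODE, str(attempt)])
        if worker.returncode == 0:
            return attempt
        # The worker died, but the supervisor is a separate process and keeps running.
    raise RuntimeError("worker kept failing")

print(supervise())  # 3
```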

Part 2. Browser architecture

So how do you build a Web browser with processes and threads?

While there is no formal standard for how a web browser should be built, a navigation bar, an input field, tabs and the like have become the de facto convention among browsers.

Browser architectures generally fall into two categories: single-process and multi-process.

Single-process architectures are rarely seen anymore, because a single-process browser has so much to do (networking, rendering, managing plug-ins, etc.) that it is extremely unstable and insecure. As a result, the major browsers on the market have all moved to multi-process architectures.

Chrome, for example, uses the following architecture:

  • At the top level is the browser process, which coordinates the work of the other processes.
  • Inside the browser process, the UI thread is responsible for the address bar, tabs, etc.
  • The renderer process controls how web sites are presented inside tabs.
  • The plug-in process controls any plug-ins used by a site, such as Flash.
  • The GPU process handles GPU drawing tasks in isolation from the other processes.

The benefits of multiple processes are obvious. For example, if you have three tabs open and one of them crashes, you can close it without affecting the other two.

And because each process’s data is private to that process, this also provides a degree of security.

But the drawbacks are just as obvious. Earlier we compared a process to a workshop and a thread to a worker. “Building a workshop” clearly costs far more resources than “hiring a worker”, even if the workshop holds only one worker, and here the resource in question is memory.

To avoid excessive memory consumption, Chrome can merge some services into a single process rather than giving each one its own.

This reduces memory overhead to some extent.

Part 3. Input from the browser perspective

When you type a URL into the browser, what does the browser do?

Step 1: Process the input

We are used to input leading to an external website, but it can just as well be one of the browser’s own pages (e.g. chrome://settings/) or an address on the local hard drive (e.g. a file on a Mac).

So the first step is to determine what the input really is.
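
A rough sketch of this classification step is shown below. The category names and the rules are simplifications invented for the example, and the final fallback (treating anything that does not look like an address as a search query) is an assumption added here, not something this article spells out.

```python
from urllib.parse import urlsplit

def classify_input(text: str) -> str:
    """Guess what kind of thing the user typed into the address bar."""
    text = text.strip()
    scheme = urlsplit(text).scheme.lower()
    if scheme == "chrome":
        return "internal page"
    if scheme == "file":
        return "local file"
    if scheme in ("http", "https"):
        return "web url"
    # No known scheme: treat "example.com/path" as a URL, anything else as a search.
    host = text.split("/", 1)[0]
    if " " not in text and "." in host:
        return "web url"
    return "search query"

for typed in ["chrome://settings/", "file:///tmp/index.html",
              "https://example.com", "example.com/docs", "how browsers work"]:
    print(typed, "->", classify_input(typed))
```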

Step 2: Start navigation

When the user finishes typing and presses Enter, the UI thread asks the network thread to fetch the site’s content. The network thread is responsible for contacting the target host and retrieving the data.

A lot happens while the network thread retrieves the data, such as DNS resolution and TLS connection establishment. If you are not familiar with these, you can check out the earlier articles in this series.

Step 3: Read the response

In any case, the network thread eventually gets a response back from the site.

The response consists of headers and a payload. The headers are like the copyright and author information of a book, and the payload is the actual content.

The browser determines the type of the content from the Content-Type field in the response headers. For example, for text/html the browser parses the content as HTML, and for image/png the image renderer is invoked.
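
To make the header/payload split concrete, here is a small sketch that parses a hand-written HTTP/1.1 response and picks a handler from its Content-Type. The sample response and the handler names are invented for the example, and a real browser handles many more cases (chunked bodies, repeated headers, and so on).

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (status line, headers dict, payload bytes)."""
    head, _, payload = raw.partition(b"\r\n\r\n")  # a blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, payload

def choose_handler(headers: dict) -> str:
    """Pick a (hypothetical) handler name from the Content-Type header."""
    media_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if media_type == "text/html":
        return "html parser"
    if media_type.startswith("image/"):
        return "image decoder"
    return "unknown"

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html; charset=utf-8\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b"<p>Hello</p>\n")

status, headers, payload = parse_response(raw)
print(status)                   # HTTP/1.1 200 OK
print(choose_handler(headers))  # html parser
```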

However, the Content-Type of a response cannot be trusted completely, because unpredictable errors occur when the field is missing or has an incorrect value.

So when the payload arrives, the network thread checks the first few bytes of the data, if necessary, to make sure the content matches the Content-Type declared in the headers. If it does not, MIME type sniffing is used to guess the actual type of the data.
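
The idea of checking the first few bytes can be sketched as follows. This is a heavily simplified stand-in for real MIME sniffing: it knows only a handful of well-known file signatures, and the rule in `effective_type` (prefer the sniffed type whenever it disagrees with the declared one) is an assumption for the example. Real browsers are more conservative about when they override a declared type.

```python
# A few well-known "magic number" prefixes and the types they indicate.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]

def sniff_mime(payload: bytes):
    """Guess a MIME type from the first bytes of the payload, or return None."""
    for magic, mime in SIGNATURES:
        if payload.startswith(magic):
            return mime
    start = payload[:64].lstrip().lower()
    if start.startswith((b"<!doctype html", b"<html")):
        return "text/html"
    return None

def effective_type(declared, payload: bytes) -> str:
    """Combine the declared Content-Type with the sniffed one (toy rule)."""
    sniffed = sniff_mime(payload)
    if declared is None or (sniffed is not None and sniffed != declared):
        return sniffed or "application/octet-stream"
    return declared

print(effective_type("text/html", b"\x89PNG\r\n\x1a\n..."))  # image/png
print(effective_type(None, b"  <!DOCTYPE html><html>"))      # text/html
```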

When the response is an HTML file, a SafeBrowsing check is also performed at this point. If the domain and the response data appear to match a known malicious site, the network thread triggers a warning page.

In addition, a Cross-Origin Read Blocking (CORB) check ensures that sensitive cross-site data is not passed to the renderer process.

Step 4: Find the renderer process

Once all the checks have passed and the network thread is sure the browser should navigate to the requested site, it tells the UI thread that the data is ready. The UI thread then finds a renderer process to render the web page.

Because a network request can take hundreds of milliseconds to return, an optimization is applied here.

In step 2, when the UI thread sends the URL request to the network thread, it already knows which site is being navigated to. So, in parallel with the network request, the UI thread proactively tries to find or start a renderer process.

This way, if everything goes as expected, the renderer process is already on standby when the network thread receives the data.
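
The effect of this optimization can be imitated with two concurrent tasks. The sketch below uses asyncio coroutines as stand-ins for the network request and the renderer start-up. The delays, function names and log messages are all invented, and real browsers use separate threads and processes rather than coroutines.

```python
import asyncio

async def fetch_response(url, log):
    log.append("network: request sent")
    await asyncio.sleep(0.05)  # stand-in for network latency
    log.append("network: response received")
    return f"<html>content of {url}</html>"

async def prepare_renderer(log):
    log.append("renderer: starting")
    await asyncio.sleep(0.01)  # stand-in for process start-up time
    log.append("renderer: ready")
    return "renderer-1"

async def navigate(url):
    log = []
    # Start both jobs at once instead of waiting for the response first.
    html, renderer = await asyncio.gather(fetch_response(url, log), prepare_renderer(log))
    log.append(f"commit: {renderer} receives {len(html)} characters")
    return log

log = asyncio.run(navigate("https://example.com"))
for entry in log:
    print(entry)
```

In the printed log the renderer reports that it is ready before the response arrives, so no time is lost starting it afterwards.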

Step 5: Commit the navigation

Now that the data and the renderer process are ready, the browser process sends an IPC (inter-process communication) message to the renderer process to commit the navigation.

The address bar is updated, and the tab’s history is updated so that the back/forward buttons can step through the site you just navigated to. The renderer process starts parsing and rendering the page from the HTML content. Finally, you see the web site as its designer intended.
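
How a tab’s history supports the back/forward buttons can be sketched with a list and an index. The class below is a toy model written for this article, not Chrome’s actual data structure, and the example URLs are placeholders.

```python
class TabHistory:
    """Toy per-tab history: a list of URLs plus a pointer to the current one."""

    def __init__(self):
        self.entries = []
        self.index = -1

    def commit(self, url):
        # A new navigation discards any "forward" entries, then appends.
        del self.entries[self.index + 1:]
        self.entries.append(url)
        self.index += 1

    def current(self):
        return self.entries[self.index] if self.entries else None

    def back(self):
        if self.index > 0:
            self.index -= 1
        return self.current()

    def forward(self):
        if self.index < len(self.entries) - 1:
            self.index += 1
        return self.current()

history = TabHistory()
for url in ["a.example", "b.example", "c.example"]:
    history.commit(url)
print(history.back())     # b.example
print(history.back())     # a.example
history.commit("d.example")
print(history.entries)    # ['a.example', 'd.example']
```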

Part 4. How pages are rendered

Rendering touches many aspects of web performance, and the process is complex enough that we only cover the essentials here. If you want to dig deeper, you can find resources at web.dev.

The renderer process contains the main thread, worker threads, the compositor thread, and raster threads.

Before going into details, imagine this scenario: you are standing in front of a simple painting. How would you describe it to a friend over the phone so that they know what it looks like?

If you really had to do this, the way a browser parses HTML offers some good advice.

First, describe the elements in the picture separately from the attributes of each element (for example, a circle in the picture is an element, while its size and position are attributes).

The advantage is that it is easy to see which elements are present and what attributes each one has, and the two parts are easier to maintain and modify separately (much like a book’s table of contents and the text it points to).

You can also factor out attributes that several elements share, to shorten the description.

Finally, it is best to describe the picture in layers, because a picture has depth and the size and position of elements alone are not enough.

The description of the elements is what we call the HTML file, and it references CSS files that describe the elements’ attributes. Each browser also has default styles for common elements.

To work out which elements to draw and what attributes each one has, the browser goes through three steps: 1) build the element tree from the HTML (commonly known as the DOM tree); 2) build the style tree from the CSS files (commonly known as the CSSOM tree); 3) combine the two trees into the render tree.
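
The three steps can be sketched with toy data structures. Everything below is a simplification made up for illustration: the DOM is built by hand rather than parsed, the "CSSOM" is just a mapping from tag name to properties, and there is no cascade or inheritance. The one real-world rule it does reflect is that elements with display: none are left out of the render tree.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)

# Step 1: a hand-built DOM tree (a real browser builds this by parsing HTML).
dom = Node("body", [
    Node("h1"),
    Node("p"),
    Node("script"),
])

# Step 2: a toy CSSOM, reduced to "tag name -> declared properties".
cssom = {
    "h1": {"color": "red", "font-size": "32px"},
    "p": {"color": "black"},
    "script": {"display": "none"},
}

def build_render_tree(node, styles):
    """Step 3: attach styles to each node and drop nodes that are not displayed."""
    style = styles.get(node.tag, {})
    if style.get("display") == "none":
        return None
    rendered = [build_render_tree(child, styles) for child in node.children]
    return {
        "tag": node.tag,
        "style": style,
        "children": [child for child in rendered if child is not None],
    }

render_tree = build_render_tree(dom, cssom)
print([child["tag"] for child in render_tree["children"]])  # ['h1', 'p']
```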

Now that the browser knows the structure of the document, the style of each element, the geometry of the page, and the order in which to paint it, how does it actually draw the page? Turning this information into pixels on the screen is called rasterization.

A naive way to handle this is to rasterize only the part of the page inside the viewport and, when the user scrolls, move the rasterized frame and rasterize the newly exposed parts. This is how Chrome handled rasterization when it was first released.

However, modern browsers run a more complex process called compositing.

Compositing is a technique that separates parts of a page into layers, rasterizes them separately, and combines them into a page in a separate thread called the compositor thread. When scrolling happens, the layers are already rasterized, so all that is left to do is composite a new frame. Animation works the same way: move the layers and composite a new frame.
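
A text-mode sketch of compositing is given below. Each layer holds "pixels" that were produced once (here, plain characters, with a space meaning transparent), and producing a frame only stacks the layers at the right offsets. The layer format and the `fixed` flag are assumptions made for this toy model.

```python
def composite(layers, width, height, scroll_y=0):
    """Stack pre-rasterized layers into one frame, back to front."""
    frame = [[" "] * width for _ in range(height)]
    for layer in layers:  # later layers are drawn on top
        dy = 0 if layer["fixed"] else -scroll_y
        for row, line in enumerate(layer["pixels"]):
            y = layer["y"] + row + dy
            if not 0 <= y < height:
                continue  # this row is outside the viewport
            for col, pixel in enumerate(line):
                x = layer["x"] + col
                if pixel != " " and 0 <= x < width:
                    frame[y][x] = pixel
    return ["".join(row) for row in frame]

layers = [
    # Scrolling page content, rasterized once.
    {"pixels": ["aaaa", "bbbb", "cccc", "dddd"], "x": 0, "y": 0, "fixed": False},
    # A fixed header that stays put while the content scrolls.
    {"pixels": ["##"], "x": 2, "y": 0, "fixed": True},
]

print(composite(layers, width=4, height=2, scroll_y=0))  # ['aa##', 'bbbb']
print(composite(layers, width=4, height=2, scroll_y=1))  # ['bb##', 'cccc']
```

Scrolling by one row changes only the offset passed to `composite`. The layer contents themselves are reused unchanged, which is the point of the technique.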

Another thing to note is that how you phrase the description takes real skill. For example, “there is a circle of radius 2 in the center” is quite different from “there is a circle in the center whose diameter is 50% of the page width”.

Knowing how to organize the description comes with the experience of the site builder.

Part 5. How to interact

In the browser’s eyes, everything the user does is input. This covers far more than pressing keys: scrolling the mouse wheel, clicking or touching the screen, and similar gestures all count.

The browser process only knows the event type and its coordinates; only the renderer process knows what the page looks like and how to handle the event. So the browser process simply forwards events and coordinates to the renderer process.
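
What the renderer does with the coordinates it receives can be sketched as a simple hit test: find the topmost box that contains the point. The box list and names below are invented, and a real renderer works from its paint records rather than a flat list of rectangles.

```python
def hit_test(boxes, x, y):
    """Return the name of the topmost box containing (x, y), or None."""
    for box in reversed(boxes):  # later boxes are painted on top
        inside_x = box["x"] <= x < box["x"] + box["w"]
        inside_y = box["y"] <= y < box["y"] + box["h"]
        if inside_x and inside_y:
            return box["name"]
    return None

# Boxes in paint order: the page background first, the button on top of it.
boxes = [
    {"name": "page", "x": 0, "y": 0, "w": 800, "h": 600},
    {"name": "button", "x": 100, "y": 100, "w": 80, "h": 30},
]

print(hit_test(boxes, 120, 110))  # button
print(hit_test(boxes, 10, 10))    # page
print(hit_test(boxes, 900, 10))   # None
```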

We can also write our own logic (JS files) that listens for an event and handles it accordingly. The result is then composited by the renderer process. For smooth browsing, the renderer process needs to keep up with the screen refresh rate (about 60 frames per second).

Chrome also coalesces consecutive events to reduce the number of calls that reach the main thread.
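
Event merging can be sketched as follows. The rule used here (within a run of consecutive events of the same continuous type, keep only the most recent one) and the set of event types are simplifying assumptions for the example, not a description of Chrome’s exact behaviour.

```python
CONTINUOUS = {"mousemove", "pointermove", "touchmove"}

def coalesce(events):
    """Merge runs of consecutive same-type continuous events, keeping the latest."""
    merged = []
    for event in events:
        if merged and event["type"] in CONTINUOUS and merged[-1]["type"] == event["type"]:
            merged[-1] = event  # the newer position replaces the older one
        else:
            merged.append(event)
    return merged

events = [
    {"type": "mousemove", "x": 1, "y": 1},
    {"type": "mousemove", "x": 2, "y": 2},
    {"type": "click", "x": 2, "y": 2},
    {"type": "mousemove", "x": 3, "y": 3},
    {"type": "mousemove", "x": 4, "y": 4},
]

for event in coalesce(events):
    print(event)
```

Five incoming events become three: the click is never merged, and each run of mouse moves collapses to its final position.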

The browser process listens for events and sends them to the renderer process for handling; that is basically how interaction works in a browser.

Afterword.

Browsers are far too complex for one article to explain, and this article is only meant to give you an understanding of their basic workflow and principles. I tried to use GIFs to make things as clear as possible. I hope you enjoyed it.

This article draws heavily on the series of articles published by the official Chrome developer team (reference 2). If you want to know more, you can also read the article explaining how browsers work behind the scenes (reference 4).

By now we know a fair amount about browsers. Going forward, I will keep studying the fundamentals of computer networking with you, following the steps of the back-end learning roadmap.

This is “I Don’t Have Three Hearts”. You are welcome to follow my public account wmyskxz. In 2021, let’s grow together on the road to Be Better!

References

  1. A simple explanation of processes and threads – www.ruanyifeng.com/blog/2013/0…
  2. Xyz /blog/2020-0…
  3. Browser rendering principles explained simply – blog.fundebug.com/2019/01/03/…
  4. How browsers work: behind the scenes of modern web browsers – www.html5rocks.com/zh/tutorial…

(End)