How the browser rendering engine works, explained in 20 diagrams

Today we are going to learn about the working principle of the browser rendering engine.

Let’s start with the architecture of Chrome:

Usually we write HTML, CSS, JavaScript and so on, and when the browser runs them it displays a page. So how do these files turn into a page, and what is the principle behind it? This work is done by the browser’s rendering process, whose main task is converting static resources into a visual interface:

To us, the browser in the middle is a black box, and this article looks at how that black box turns static resources into a front-end interface. Because the rendering mechanism is complex, the rendering module is divided into many sub-stages during execution. The input static resources pass through these sub-stages, and the page comes out at the end. We call this process the rendering pipeline, which is roughly shown in the figure below:

There are five main processes:

DOM tree construction: The rendering engine uses the HTML parser to parse the HTML document and converts the HTML elements one by one into DOM nodes to generate the DOM tree;
CSSOM tree construction: The CSS parser parses the CSS and converts it into CSS objects, which are assembled into the CSSOM tree;
Render tree construction: After both the DOM tree and the CSSOM tree are built, the browser builds the render tree from these two trees;
Page layout: After the render tree is built, the relationships between the elements and the styles to apply are known, and the browser calculates the size and absolute position of every element;
Page rendering: Once layout is complete, the browser converts each layer of the page into pixels and decodes all media files.

Each of these five stages has a corresponding output: the DOM tree, the CSSOM tree, the render tree, the box model, and the interface.

Below are the modules corresponding to each step in the rendering engine workflow:

As can be seen from the figure, the rendering engine mainly contains modules as follows:

HTML parser: parses the HTML document; its main job is to convert the HTML document into a DOM tree;
CSS parser: computes style information for each element object in the DOM, which is used to construct the render tree;
JavaScript interpreter: JavaScript can modify the content of a web page, its CSS rules, and so on. The JavaScript interpreter interprets the JavaScript code and modifies the page content and style rules through the DOM and CSSOM interfaces, thus changing the rendering result;
Page layout: after the DOM is created, the rendering engine combines the element objects in the DOM with the style rules to create the render tree. Layout then works on the render tree and calculates the size, position and other layout information of each element;
Page rendering: uses a graphics library to draw the render tree, after layout calculation, into a visual image.

Let’s take a look at what each of these processes does.

1. DOM tree construction

Before we talk about building a DOM tree, we need to know why a DOM tree is built in the first place. The browser cannot understand and use HTML directly, so the HTML has to be turned into a structure that it can understand: a DOM tree.

If you are familiar with data structures, you already know the tree. A tree is a data structure composed of nodes (vertices) and edges that contains no cycles. A non-empty tree consists of a root node and further nodes, which together form a multi-level hierarchy. The following diagram shows what a tree structure is:

Of the three structures above, the first two are trees: each has a single root node and no cycle. The third contains a cycle, so it is not a tree.

With the tree structure covered, let’s get back to the DOM tree. On a page, each HTML tag is parsed by the browser into a document object. HTML is essentially a nested structure, so during parsing the document objects are organized as a tree, and all of them hang off the document. This organization is the most basic structure of HTML: the Document Object Model (DOM).

In the rendering engine, DOM serves three purposes:

From the perspective of the page, the DOM is the basic data structure from which the page is generated;
From the perspective of JavaScript, the DOM provides an interface for scripts. Through this interface, JavaScript can access the DOM structure and change the structure, style and content of the document;
From the perspective of security, the DOM is a line of defense: unsafe content is rejected during the DOM parsing phase.

Inside the rendering engine, the HTML parser is responsible for converting the HTML byte stream into a DOM structure as follows:

1. Character stream → tokens

The HTML parser first splits the byte stream into tokens using a tokenizer. Tokens are classified into tag tokens and text tokens. Here is how a piece of HTML is split:

<body>
    <div>
        <p>hello world</p>
    </div>
</body>

This code can be split into the following tokens:

Tag tokens are further divided into StartTag and EndTag. <body>, <div> and <p> are StartTags; </body>, </div> and </p> are EndTags. They correspond to the blue and red blocks in the figure, and the text token corresponds to the green block.

The characters are split into tokens by a state machine. A state machine breaks the features of each kind of token down into individual states, and then links the states of all tokens into one connected graph. So why use a state machine? Because every time a character is read, a decision has to be made, and that decision depends on the current state.

In fact, the state machine is used for lexical analysis, breaking character streams into tokens.
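To make the state machine idea concrete, here is a minimal sketch of a tokenizer with only two states. The function name tokenize, the state names DATA and TAG, and the token shapes are assumptions made for illustration; a real HTML tokenizer has many more states and also handles attributes, comments, entities and malformed input.

```javascript
// Simplified tokenizer: a two-state machine (DATA / TAG) over the input characters.
// It deliberately ignores attributes, comments, entities and self-closing tags.
function tokenize(html) {
  const tokens = [];
  let state = "DATA"; // DATA: reading text, TAG: reading between "<" and ">"
  let buffer = "";

  for (const ch of html) {
    if (state === "DATA") {
      if (ch === "<") {
        // Leaving text: emit a text token unless it is only whitespace
        if (buffer.trim() !== "") tokens.push({ type: "Text", value: buffer.trim() });
        buffer = "";
        state = "TAG";
      } else {
        buffer += ch;
      }
    } else {
      if (ch === ">") {
        // Leaving a tag: a leading "/" marks an end tag
        if (buffer.startsWith("/")) {
          tokens.push({ type: "EndTag", name: buffer.slice(1).trim() });
        } else {
          tokens.push({ type: "StartTag", name: buffer.trim() });
        }
        buffer = "";
        state = "DATA";
      } else {
        buffer += ch;
      }
    }
  }
  return tokens;
}

const tokens = tokenize("<body><div><p>hello world</p></div></body>");
console.log(tokens.map((t) => t.type + ":" + (t.name ?? t.value)).join(" "));
```

Each character only needs to be looked at once, and the action taken depends entirely on the current state, which is exactly why a state machine fits this job.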

2. Tokens → DOM tree

The next step is to turn the tokens into DOM nodes and add them to the DOM tree. This is implemented with a stack, which is mainly used to work out the parent-child relationships between nodes. The tokens generated in the previous step are pushed onto the stack in order. The rules are as follows:

If the tokenizer produces a StartTag token, the HTML parser creates a DOM node for it, adds the node to the DOM tree and pushes the token onto the stack. The node’s parent is the node generated by the adjacent element below it on the stack;
If the tokenizer produces a text token, a text node is generated and added to the DOM tree. A text token is not pushed onto the stack; its parent is the DOM node corresponding to the token currently on top of the stack;
If the tokenizer produces an EndTag token, for example EndTag div, the HTML parser checks whether the token on top of the stack is StartTag div. If so, it pops StartTag div off the stack, which means the div element has been fully parsed.

Tokens generated by the tokenizer are pushed and popped in this way, and parsing continues until the tokenizer has consumed the whole byte stream.

Here’s how the Token stack works, with the following HTML structure:

<html>
    <body>
        <div>hello juejin</div>
        <p>hello world</p>
    </body>
</html>

To start, the HTML parser creates an empty DOM structure rooted at document and pushes a StartTag document token onto the bottom of the stack. It then pushes the first token produced by the tokenizer, StartTag html, creates an html DOM node and adds it to document. At this point the token stack and DOM tree look like this:

The body and div tags are then pushed in the same way as above:

Next, the text token inside the div tag is parsed. The rendering engine creates a text node for it and adds the node to the DOM. Its parent is the node corresponding to the token on top of the stack:

Next comes the first EndTag div, at which point the HTML parser determines whether the current top element is a StartTag div, and if so, pops the StartTag div from the top of the stack, as shown below:

After that, the process is similar to the above, and the final result is as follows:
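The push and pop rules walked through above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the function name buildTree and the token and node shapes are made up for the example, and the sketch throws on a mismatched end tag, whereas a real HTML parser recovers from such errors.

```javascript
// Builds a tree from a token list using a stack of open elements.
// Token shapes ({ type, name } / { type, value }) are assumptions of this sketch.
function buildTree(tokens) {
  const documentNode = { name: "document", children: [] };
  const stack = [documentNode]; // the top of the stack is the current parent

  for (const token of tokens) {
    const parent = stack[stack.length - 1];
    if (token.type === "StartTag") {
      const node = { name: token.name, children: [] };
      parent.children.push(node); // attach to the element on top of the stack
      stack.push(node);           // it becomes the parent of what follows
    } else if (token.type === "Text") {
      // Text nodes are added to the tree but never pushed onto the stack
      parent.children.push({ name: "#text", value: token.value });
    } else if (token.type === "EndTag") {
      if (parent.name !== token.name) {
        throw new Error("Unexpected end tag: " + token.name);
      }
      stack.pop(); // the element has been fully parsed
    }
  }
  return documentNode;
}

// Tokens for the "hello juejin" / "hello world" example above
const domTokens = [
  { type: "StartTag", name: "html" },
  { type: "StartTag", name: "body" },
  { type: "StartTag", name: "div" },
  { type: "Text", value: "hello juejin" },
  { type: "EndTag", name: "div" },
  { type: "StartTag", name: "p" },
  { type: "Text", value: "hello world" },
  { type: "EndTag", name: "p" },
  { type: "EndTag", name: "body" },
  { type: "EndTag", name: "html" },
];
const tree = buildTree(domTokens);
console.log(JSON.stringify(tree, null, 2));
```

Because the top of the stack is always the innermost open element, the parent of every new node is known without searching the tree.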

2. CSSOM tree construction

You’ve seen the basic DOM building process above, but the DOM structure only contains nodes and does not contain any style information. Let’s take a look at how browsers apply CSS styles to DOM nodes.

Similarly, the browser cannot understand CSS code directly, so the CSS has to be parsed into a structure the browser can understand: the CSSOM tree. In fact, while the browser builds the DOM tree, it builds the CSSOM tree at the same time if the styles have been loaded. Like the DOM tree, the CSSOM tree serves two main purposes:

Provides JavaScript with the ability to manipulate styles
Provides basic style information for rendering tree composition.

However, the CSSOM tree and the DOM tree are separate data structures and do not correspond one to one. The DOM tree describes the hierarchy of HTML tags, while the CSSOM tree describes the hierarchy of selectors. You can inspect the CSSOM in the browser console with document.styleSheets:

So where do CSS styles come from?

As you can see, there are three main sources of CSS styles:

External CSS files referenced by link;
CSS styles inside a <style> tag;
CSS embedded in an element’s style attribute.

Before the CSS is converted into tree objects, the property values in the stylesheet also need to be normalized. Take the following CSS for example:

body { font-size: 2em }
p { color: blue; }
div { font-weight: bold }
div p { color: green; }
div { color: red; }

As you can see, the CSS contains many property values, such as 2em, blue, red and bold. The rendering engine cannot work with these values directly, so they all need to be converted into standardized computed values that it can easily handle. This process is called property value normalization. After normalization, the code above looks like this:

body { font-size: 32px }
p { color: rgb(0, 0, 255); }
div { font-weight: 700 }
div p { color: rgb(0, 128, 0); }
div { color: rgb(255, 0, 0); }

As you can see, 2em is resolved to 32px (assuming the default font size of 16px), blue is resolved to rgb(0, 0, 255), and bold is resolved to 700. Now that the property values have been normalized, the next step is to compute the style of each node in the DOM tree, which involves the CSS inheritance and cascading rules.
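As a rough illustration of what normalization does, the sketch below converts the three kinds of values from the example. The function normalizeValue, the tiny lookup tables and the parentFontSizePx parameter (defaulting to the usual 16px) are assumptions for this example only; a browser resolves far more units, keywords and properties.

```javascript
// Tiny lookup tables standing in for the browser's full value resolution.
const NAMED_COLORS = new Map([
  ["blue", "rgb(0, 0, 255)"],
  ["green", "rgb(0, 128, 0)"],
  ["red", "rgb(255, 0, 0)"],
]);
const FONT_WEIGHTS = new Map([
  ["normal", "400"],
  ["bold", "700"],
]);

// parentFontSizePx is an assumption: 16px is the usual browser default.
function normalizeValue(property, value, parentFontSizePx = 16) {
  if (property === "color" && NAMED_COLORS.has(value)) return NAMED_COLORS.get(value);
  if (property === "font-weight" && FONT_WEIGHTS.has(value)) return FONT_WEIGHTS.get(value);
  const em = /^(\d*\.?\d+)em$/.exec(value);
  if (property === "font-size" && em) return Number(em[1]) * parentFontSizePx + "px";
  return value; // already in a standard form, or not handled by this sketch
}

console.log(normalizeValue("font-size", "2em"));    // 32px
console.log(normalizeValue("color", "blue"));       // rgb(0, 0, 255)
console.log(normalizeValue("font-weight", "bold")); // 700
```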

(1) Style inheritance

CSS has a style inheritance mechanism: each DOM node inherits the inheritable styles of its parent node. For example, if font-size: 20px; is set on the html element, almost every element on the page inherits it.

The main inherited properties in CSS are:

Font properties

font-family: the typeface
font-weight: the weight of the font
font-size: the size of the font
font-style: the style of the font (for example normal or italic)

Text properties

text-indent: the indentation of text
text-align: the horizontal alignment of text
line-height: the line height
word-spacing: the spacing between words
letter-spacing: the spacing between Chinese characters or letters
text-transform: uppercase, lowercase, capitalize
color: the text color

Element visibility

visibility: controls whether an element is shown or hidden

List layout properties

list-style: the list style, including list-style-type and list-style-image

Cursor properties

cursor: the type of the cursor

(2) Style cascading

The second rule in style computation is the cascade. Cascading is a fundamental feature of CSS: it is an algorithm that defines how to combine property values from multiple sources. It sits at the heart of CSS, as the full name Cascading Style Sheets makes clear. I will not go into more detail here.

In short, the purpose of the style computation stage is to compute the concrete style of every element in the DOM. The computation has to follow the two CSS rules of inheritance and cascading. The final output of this stage is the style of each DOM node, stored in a ComputedStyle structure.
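The inheritance rule described above can be sketched as a walk up the parent chain. The node shape ({ style, parent }), the function computedValue and the short INHERITED set are assumptions for illustration, and the cascade itself is left out: each node is assumed to already hold its winning declared values.

```javascript
// A small subset of the inherited properties listed above.
const INHERITED = new Set(["color", "font-size", "font-family", "line-height", "visibility", "cursor"]);

// node: { style: { ...declared values }, parent: node | null } (hypothetical shape)
function computedValue(node, property) {
  for (let current = node; current !== null; current = current.parent) {
    if (current.style[property] !== undefined) return current.style[property];
    if (!INHERITED.has(property)) break; // non-inherited: do not look at ancestors
  }
  return undefined; // a browser would fall back to the property's initial value
}

const html = { style: { "font-size": "20px", margin: "10px" }, parent: null };
const body = { style: {}, parent: html };
const p = { style: { color: "blue" }, parent: body };
const span = { style: {}, parent: p };

console.log(computedValue(span, "font-size")); // 20px, inherited from html
console.log(computedValue(span, "color"));     // blue, inherited from p
console.log(computedValue(span, "margin"));    // undefined, margin is not inherited
```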

For the following code:

<html>
    <head>
        <link rel="stylesheet" href="./style.css">
        <style>
            .juejin {
                width: 100px;
                height: 50px;
                background: red;
            }

            .content {
                font-size: 25px;
                line-height: 25px;
                margin: 10px;
            }
        </style>
    </head>
    <body>
        <div class="juejin">
            <div>CUGGZ</div>
        </div>
        <p style="color: blue" class="content">
            <span>hello world</span>
            <p style="display: none;">The browser</p>
        </p>
    </body>
</html>

The resulting CSSOM tree is roughly as follows:

3. Render tree construction

After both the DOM tree and the CSSOM tree are built, the render tree construction phase begins. The render tree combines the DOM tree and the CSSOM tree into a data structure that knows which styles apply to each node. The combination works by traversing the entire DOM tree and looking up the matching styles in the CSSOM tree.

The process of building a render tree varies from browser to browser:

In Chrome, the attach() method is called on each node, and the nodes of the CSSOM tree are attached to the DOM tree to form the render tree;
In Firefox, a separate new structure is built to hold the mapping between the DOM tree and the CSSOM tree.

So why build a render tree? As the example above shows, the DOM tree may contain invisible elements, such as the head tag or elements with display: none;. So before displaying the page, the browser builds an additional render tree that contains only visible elements.

Let’s look at the process of building a render tree:

As you can see, the invisible nodes of the DOM tree are not included in the render tree. To build the render tree, the browser roughly does the following: it traverses all visible nodes of the DOM tree and adds them to the render tree, and ignores the invisible ones, such as everything under the head tag, or the inner p element, which is left out because its style contains display: none;. An element with visibility: hidden; does appear in the render tree, because it still takes up space even though it is not displayed.
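The filtering step described above might look roughly like the sketch below. The node shape ({ tag, style, children }), the function buildRenderTree and the NON_VISUAL_TAGS set are assumptions for illustration; real engines build separate render objects rather than copying DOM nodes.

```javascript
// Node shape ({ tag, style, children }) is an assumption of this sketch.
const NON_VISUAL_TAGS = new Set(["head", "script", "meta", "link", "style", "title"]);

function buildRenderTree(node) {
  if (NON_VISUAL_TAGS.has(node.tag)) return null;               // never rendered
  if (node.style && node.style.display === "none") return null; // whole subtree dropped
  const children = (node.children || [])
    .map((child) => buildRenderTree(child))
    .filter((child) => child !== null);
  // visibility: hidden is kept on purpose: the box still occupies space
  return { tag: node.tag, style: node.style || {}, children };
}

const dom = {
  tag: "html",
  children: [
    { tag: "head", children: [{ tag: "style" }] },
    {
      tag: "body",
      children: [
        { tag: "div", children: [{ tag: "div" }] },
        { tag: "span", style: { visibility: "hidden" } },
        { tag: "p", style: { display: "none" } },
      ],
    },
  ],
};

const renderTree = buildRenderTree(dom);
console.log(renderTree.children.map((c) => c.tag));             // [ 'body' ]
console.log(renderTree.children[0].children.map((c) => c.tag)); // [ 'div', 'span' ]
```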

During this lookup, for the sake of efficiency, matching starts from the leaf nodes of the CSSOM tree, which corresponds to reading a CSS selector from right to left. This is why it is not recommended to define styles with tag selectors or the universal selector, which match a large number of elements.
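Here is a minimal sketch of right-to-left matching, restricted to descendant selectors made of tag names such as "div p". The element shape ({ tag, parent }) and the function matchesDescendantSelector are assumptions for illustration and do not reflect how any particular engine stores its data.

```javascript
// element: { tag, parent } (hypothetical shape). Only descendant selectors
// made of tag names, such as "div p", are supported.
function matchesDescendantSelector(element, selector) {
  const parts = selector.trim().split(/\s+/);
  // Step 1: the rightmost part must match the element itself.
  if (element.tag !== parts[parts.length - 1]) return false;
  // Step 2: walk up the ancestors, consuming the remaining parts right to left.
  let index = parts.length - 2;
  for (let ancestor = element.parent; ancestor && index >= 0; ancestor = ancestor.parent) {
    if (ancestor.tag === parts[index]) index--;
  }
  return index < 0; // true only if every part found a matching ancestor
}

const bodyEl = { tag: "body", parent: null };
const divEl = { tag: "div", parent: bodyEl };
const pEl = { tag: "p", parent: divEl };
const spanEl = { tag: "span", parent: pEl };

console.log(matchesDescendantSelector(pEl, "div p"));    // true
console.log(matchesDescendantSelector(spanEl, "div p")); // false, rejected at step 1
console.log(matchesDescendantSelector(pEl, "ul p"));     // false, no ul ancestor
```

Most elements fail at step 1, so starting from the right lets the engine discard non-matching rules without walking the ancestor chain at all.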

In addition, the same DOM node may match several CSSOM nodes, and the final result is decided by the CSS rules, which is a matter of style priority. When a DOM element is targeted by multiple styles, the priority order is: inline style > ID selector > class selector > tag selector > universal selector > inherited style > browser default style.

The priorities of common CSS selectors are as follows:

Selector	Format	Priority weight
ID selector	#id	100
Class selector	.classname	10
Attribute selector	a[ref="eee"]	10
Pseudo-class selector	li:last-child	10
Tag selector	div	1
Pseudo-element selector	li:after	1
Adjacent sibling selector	h1+p	0
Child selector	ul>li	0
Descendant selector	li a	0
Universal selector	*	0

For selector priority:

Tag selector, pseudo-element selector: 1;
Class selector, pseudo-class selector, attribute selector: 10;
Id selector: 100;
Inline style: 1000;

Note:

Styles declared with !important have the highest priority;
If the priority is the same, the last style to appear takes effect;
Inherited styles have the lowest priority;
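The weights above can be turned into a small calculator. One caveat that the 100/10/1 shorthand hides: browsers compare the three components separately rather than adding them up in base 10, so no number of classes ever outranks a single ID. The sketch below uses that component-wise comparison. The functions specificity and compareSpecificity are made up for this example, and the regular expression only covers simple selectors (no :not(), :is() and similar functional pseudo-classes).

```javascript
// Pseudo-elements that may be written with a single colon for legacy reasons.
const LEGACY_PSEUDO_ELEMENTS = new Set(["before", "after", "first-line", "first-letter"]);

// Returns [ids, classes, tags]; handles only simple selectors and combinators.
function specificity(selector) {
  let ids = 0, classes = 0, tags = 0;
  const pattern = /#[\w-]+|\.[\w-]+|\[[^\]]*\]|::?[\w-]+|[a-zA-Z][\w-]*|\*/g;
  for (const part of selector.match(pattern) || []) {
    if (part.startsWith("#")) ids++;
    else if (part.startsWith(".") || part.startsWith("[")) classes++;
    else if (part.startsWith("::")) tags++;
    else if (part.startsWith(":")) {
      if (LEGACY_PSEUDO_ELEMENTS.has(part.slice(1))) tags++;
      else classes++;
    } else if (part !== "*") tags++; // the universal selector contributes nothing
  }
  return [ids, classes, tags];
}

// Compare component by component; a positive result means a wins.
function compareSpecificity(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0; // equal specificity: the later declaration wins
}

console.log(specificity("#id"));           // [ 1, 0, 0 ]
console.log(specificity('a[ref="eee"]'));  // [ 0, 1, 1 ]
console.log(specificity("li:last-child")); // [ 0, 1, 1 ]
console.log(specificity("li:after"));      // [ 0, 0, 2 ]

// Eleven classes still lose against one ID
const elevenClasses = ".a.b.c.d.e.f.g.h.i.j.k";
console.log(compareSpecificity(specificity("#nav"), specificity(elevenClasses)) > 0); // true
```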

4. Page layout

After the above steps, a render tree is generated, and this tree is the key to displaying the page. So far, you have the structural relationships between all the nodes that need to be rendered and their style information. Now you need to lay out the page.

By computing the style of each node in the render tree, the browser can work out how much space each element occupies and where it is located. Once the size and position of every element are known, the element boxes can be placed in the page area of the browser. This process is layout. During layout, the browser traverses the render tree and writes the nesting relationships between elements into the document flow in the form of the box model:

The box model yields the exact size and position of each element during layout. After the calculation, this information is written back to the render tree, forming the layout render tree. Each element box also carries its own style information, which serves as the basis for the subsequent drawing.

5. Page rendering

1. Build layers

After layout, we have the position and size of each element. Is it time to start drawing the page? The answer is no, because a page can contain many complex effects, such as 3D transforms, page scrolling, and ordering along the z-axis with z-index. To implement these effects, the rendering engine also needs to generate dedicated layers for specific nodes and build a corresponding layer tree.

So what is a layer? Those of you who have used Photoshop will be familiar with layers. You can also open the Layers tab in Chrome’s developer tools (if it is not shown, you can find it under More tools) to see how a page is layered. For example, the Juejin homepage is layered as follows:

As you can see, the rendering engine gives the page many layers, which are stacked in a certain order to form the final page. Splitting the page into multiple layers is called layering, and the final operation of combining these layers into one is called compositing. The two are usually used together. Chrome introduced the layering and compositing mechanisms to improve the rendering efficiency of each frame.

In general, not every node of the render tree has its own layer; a node without one belongs to the layer of its parent. So for which nodes does the rendering engine create a new layer? A node must meet one of the following conditions:

(1) Elements with stacking context properties

The page we see is usually a two-dimensional plane, and the stacking context gives the page a third dimension. HTML elements are distributed along the z-axis, perpendicular to that plane, according to the priority of their properties. Here are the stacking rules of the box model:

The levels in the figure above are, from the back of the stack to the front:

Background and border: the background and border of the element that establishes the current stacking context;
Negative z-index: elements in the current stacking context whose z-index is negative;
Block-level boxes: non-inline, non-positioned descendants in the document flow;
Float boxes: non-positioned floating elements;
Inline boxes: inline-level, non-positioned descendants in the document flow;
z-index: 0: positioned elements whose stacking level is 0;
Positive z-index: positioned elements whose z-index is positive.

Note: a positioned element with z-index: auto generates a box at stacking level 0 in the current stacking context and does not create a new stacking context, except for the root element.

(2) Elements that need to be clipped

What is clipping? Suppose a div has a fixed width and height and the text inside it exceeds that size. Clipping then occurs: the rendering engine displays only the part of the text that fits inside the div area. When clipping occurs, the rendering engine creates a separate layer for the text, and if a scroll bar appears, the scroll bar is also promoted to a separate layer.

2. Draw layers

After building the layer tree, the rendering engine will render each layer in the tree. Let’s see how the rendering engine does this.

When drawing a layer, the rendering engine breaks the drawing down into many small drawing instructions and then arranges them, in order, into a list of instructions to be drawn:

As you can see, the instructions in the draw list are a series of draw operations. Typically, drawing an element requires multiple drawing instructions because each element’s background, border, and other attributes require separate instructions to draw. So in the layer drawing phase, the output is the drawing list.
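To give a feel for what a draw list is, here is a small sketch that turns elements into an ordered list of instructions. The element shape, the function buildPaintList and the instruction names (drawRect, drawBorder, drawText) are all hypothetical; real engines use their own internal instruction types.

```javascript
// Element shape and instruction names (drawRect, drawBorder, drawText) are
// hypothetical; this only shows that one element yields several instructions.
function buildPaintList(elements) {
  const list = [];
  for (const el of elements) {
    const { x, y, width, height } = el.rect;
    // Order matters: background first, then border, then content
    if (el.background) list.push({ op: "drawRect", x, y, width, height, color: el.background });
    if (el.border) list.push({ op: "drawBorder", x, y, width, height, color: el.border });
    if (el.text) list.push({ op: "drawText", x, y, text: el.text });
  }
  return list;
}

const paintList = buildPaintList([
  { rect: { x: 0, y: 0, width: 100, height: 50 }, background: "red", border: "black" },
  { rect: { x: 0, y: 0, width: 100, height: 20 }, text: "CUGGZ" },
]);
console.log(paintList.map((item) => item.op)); // [ 'drawRect', 'drawBorder', 'drawText' ]
```

Nothing is drawn at this point: the list is only a record that a later stage replays.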

In Chrome’s developer tools, the Layers tab shows the draw list of a layer and the drawing process:

The draw list only records the drawing order and the drawing instructions. The actual drawing is done by the compositing thread of the rendering engine. When the draw list of a layer is ready, the main thread commits it to the compositing thread.

Note: compositing runs on the compositing thread, so it does not affect the main thread.

In many cases a layer can be very large. A long article on Juejin, for example, takes a lot of scrolling to reach the bottom, but the user only sees the viewport, so there is no need to draw the whole layer. The compositing thread therefore divides the layer into tiles, usually 256×256 or 512×512 pixels, and generates bitmaps for the tiles near the viewport first. The bitmaps are actually generated in the rasterization phase, which turns tiles into images by following the instructions in the draw list.
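A quick calculation shows how much work tiling can save. The sketch below works out which tiles of a layer intersect the viewport; the function visibleTiles, the viewport object shape and the page dimensions are assumptions for the example, and it ignores the extra tiles just outside the viewport that an engine may also prepare.

```javascript
// Returns the tiles of a layer that intersect the viewport.
// All sizes are in pixels; the default tile size of 256 follows the text above.
function visibleTiles(layerWidth, layerHeight, viewport, tileSize = 256) {
  const firstCol = Math.floor(viewport.x / tileSize);
  const firstRow = Math.floor(viewport.y / tileSize);
  // Clamp the visible area to the layer bounds
  const right = Math.min(viewport.x + viewport.width, layerWidth);
  const bottom = Math.min(viewport.y + viewport.height, layerHeight);
  const lastCol = Math.ceil(right / tileSize) - 1;
  const lastRow = Math.ceil(bottom / tileSize) - 1;

  const tiles = [];
  for (let row = firstRow; row <= lastRow; row++) {
    for (let col = firstCol; col <= lastCol; col++) {
      tiles.push({ row, col });
    }
  }
  return tiles;
}

// A 1024 x 10000 layer seen through a 1024 x 768 viewport scrolled to y = 3000
const totalTiles = Math.ceil(1024 / 256) * Math.ceil(10000 / 256);
const visible = visibleTiles(1024, 10000, { x: 0, y: 3000, width: 1024, height: 768 });
console.log(visible.length + " of " + totalTiles + " tiles intersect the viewport");
// prints "16 of 160 tiles intersect the viewport"
```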

When all the tiles have been rasterized, the compositing thread generates a command to draw the tiles. The browser process receives this command, draws the page contents into memory, and then displays them on the screen, which completes the drawing of the page.

At this point, the entire rendering process is complete, and the process is summarized as follows:

Build the HTML content into a DOM tree;
Build the CSS content into a CSSOM tree;
Combine the DOM tree and the CSSOM tree into a render tree;
Lay out the page elements according to the render tree;
Divide the render tree into layers and generate a layer tree;
Generate a draw list for each layer and commit it to the compositing thread;
The compositing thread divides each layer into tiles and turns the tiles into bitmaps through rasterization;
The compositing thread sends a command to draw the tiles to the browser process;
The browser process generates the page and displays it on the screen.

6. Other

1. Reflow and repaint

With the rendering flow of the browser engine covered, there are two important concepts left: reflow and repaint.

As we know, the render tree is built dynamically, so changes to DOM nodes and CSS nodes can cause the render tree to be rebuilt. Changes to the render tree result in a reflow or a repaint of the page. Let’s look at these two concepts, what triggers them, and what can be done to reduce them.

(1) Reflow

When an operation changes the geometry of elements in the DOM (their size, position, layout and so on), the changed nodes of the render tree and the nodes they affect have to be recalculated. This process is called reflow. After such a change, the whole rendering pipeline has to run again, so a reflow can be expensive.

The following operations cause a reflow:

The first render of the page;
The browser window is resized;
The content of an element changes;
The size or position of an element changes;
The font size of an element changes;
A CSS pseudo-class is activated;
Certain properties are queried or certain methods are called;
Visible DOM elements are added or removed.

Because the browser renders the page with a flow-based layout, a reflow also causes the surrounding DOM elements to be laid out again. The affected scope is one of two kinds:

Global scope: the entire render tree is laid out again, starting at the root node;
Local scope: only a part of the render tree, or a single render object, is laid out again.

(2) Repaint

When a change to the DOM alters an element’s style without affecting its geometry (for example, changing the color or background color), the browser does not need to recalculate the element’s geometry and simply draws the new style for the element, skipping the reflow. This process is called repaint. Put simply, a repaint is triggered by modifying an element’s paint properties.

When a paint property of an element changes, the layout phase does not run because the geometry has not changed, so the pipeline goes straight to the drawing phase and then runs the remaining sub-stages. A repaint skips layout and layering, so it is cheaper than a reflow.

The following properties cause a repaint:

color and background-related properties: background-color, background-image;
Outline-related properties: outline-color, outline-width, text-decoration;
border-radius, visibility, and box-shadow.

Note: a reflow always triggers a repaint, but a repaint does not necessarily involve a reflow.

A reflow is comparatively expensive, so try to trigger as few reflows as possible. To reduce reflows, the following optimizations can help:

Whenever possible, use CSS3 animations, which can use the GPU for rendering;
When manipulating the DOM, try to operate on low-level DOM nodes;
Do not use table layout: a small change can cause the whole table to be laid out again;
Avoid CSS expressions;
Do not manipulate the style of an element too often. For static pages, change the class name rather than individual styles;
Use absolute or fixed positioning to take elements out of the document flow so that their changes do not affect other elements;
Avoid frequent DOM manipulation: create a DocumentFragment, apply all DOM operations to it, and add it to the document at the end;
Set the element to display: none first and show it again after the operations, because DOM operations on an element with display: none do not cause reflow or repaint;
Group multiple DOM reads (or writes) together instead of interleaving reads with writes. This takes advantage of the browser’s render queue mechanism.

The browser optimizes reflow and repaint by putting all reflow and repaint operations into a queue. When the number of operations in the queue reaches a threshold, or a certain time interval has passed, the browser processes the queue as a batch, which turns many reflows and repaints into a single one.
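Why interleaving reads and writes defeats the render queue can be shown with a toy model. The class FakePage below is entirely hypothetical and runs outside the browser: writes are queued, and reading a layout value forces the queue to be flushed so that the value returned is up to date, which is the behaviour the advice above relies on.

```javascript
// Toy model of the render queue: writes are queued, and reading a layout
// property forces the queue to be flushed (one reflow) so the value is fresh.
class FakePage {
  constructor() {
    this.pendingWrites = 0;
    this.reflowCount = 0;
    this.width = 100;
  }
  setWidth(value) {   // a write: queued, no reflow yet
    this.width = value;
    this.pendingWrites++;
  }
  getOffsetWidth() {  // a read: flushes pending writes first
    this.flush();
    return this.width;
  }
  flush() {
    if (this.pendingWrites > 0) {
      this.reflowCount++;
      this.pendingWrites = 0;
    }
  }
}

// Interleaved: every read follows a write, so every read forces a reflow.
const interleaved = new FakePage();
for (let i = 0; i < 5; i++) {
  interleaved.setWidth(interleaved.getOffsetWidth() + 10);
}
interleaved.flush(); // end of frame

// Batched: read once, then write; the queue is flushed a single time.
const batched = new FakePage();
const startWidth = batched.getOffsetWidth();
for (let i = 1; i <= 5; i++) {
  batched.setWidth(startWidth + 10 * i);
}
batched.flush(); // end of frame

console.log(interleaved.reflowCount, batched.reflowCount); // 5 1
```

Both versions end with the same width, but the batched one lets the queue do its job.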

2. The impact of JavaScript on the DOM

Finally, let’s look at the impact of JavaScript on the DOM. When the parser is parsing HTML and encounters a <script> tag, it pauses parsing and lets the JavaScript engine execute the script.

Take a look at this code:

<html>
    <body>
        <div>hello juejin</div>
        <script>
            document.getElementsByTagName('div')[0].innerText = 'juejin yyds'
        </script>
        <p>hello world</p>
    </body>
</html>

Here, after the div tag has been parsed, the parser reaches the script tag. At this point the DOM structure looks like this:

At this point, the HTML parser pauses, and the JavaScript engine takes over and executes the content of the script tag. Because this script modifies the content of the first div, the text in the div becomes “juejin yyds” after the script has run. When the script is finished, the HTML parser resumes and continues parsing until the final DOM is generated.

The JavaScript above is embedded directly in the HTML through the script tag. Things get more complicated when the page references an external JavaScript file. For example:

<html>
    <body>
        <div>hello juejin</div>
        <script type="text/javascript" src='./index.js'></script>
        <p>hello world</p>
    </body>
</html>

In fact, the execution process is the same as above: when a script tag is encountered, the HTML parser pauses and executes the script file. However, to execute the JavaScript script here, you need to download the script first. DOM parsing is blocked during the script download process, which is usually time consuming and can be affected by network environment, JavaScript script file size, and other factors.

As the analysis above shows, JavaScript blocks DOM parsing. We can speed up script loading by using a CDN and compressing scripts. If a script file contains no code that manipulates the DOM, it can be loaded asynchronously by adding the async or defer attribute to the script tag. Both are used as follows:

<script async type="text/javascript" src='./index.js'></script>
<script defer type="text/javascript" src='./index.js'></script>

The following figure shows the difference between asynchronous loading and direct loading:

Blue represents the script download time, red represents the script execution time, and green represents HTML parsing.

Both defer and async load external JS files asynchronously, and neither blocks HTML parsing while the file is downloading. The differences are as follows:

Order of execution: multiple scripts with the async attribute are not guaranteed to execute in document order; multiple scripts with the defer attribute are executed in the order in which they appear in the document;
Whether the script runs in parallel with parsing: with async, the script is downloaded in parallel with the parsing of the rest of the document and executed as soon as the download finishes, which may happen before parsing is complete; with defer, the script is downloaded in parallel with the parsing of the rest of the document (downloaded only, not executed), and executed after the whole document has been parsed, before the DOMContentLoaded event fires.
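The ordering difference can be simulated with a few made-up numbers. In the sketch below, the function executionOrder, the script names and all timings are hypothetical, and it assumes every download starts at time 0; it only models the two rules above, not real network or parser behaviour.

```javascript
// scripts: [{ name, mode: "async" | "defer", downloadMs }] in document order.
// parseDoneMs is when HTML parsing finishes. All numbers are made up.
function executionOrder(scripts, parseDoneMs) {
  const events = [];
  let deferClock = parseDoneMs; // deferred scripts cannot run before parsing ends
  for (const script of scripts) {
    if (script.mode === "async") {
      // async: runs as soon as its own download completes
      events.push({ name: script.name, at: script.downloadMs });
    } else {
      // defer: runs after parsing, and never before an earlier deferred script
      deferClock = Math.max(deferClock, script.downloadMs);
      events.push({ name: script.name, at: deferClock });
    }
  }
  // Array.prototype.sort is stable, so ties keep document order
  return events.sort((a, b) => a.at - b.at).map((e) => e.name);
}

const order = executionOrder(
  [
    { name: "a.js", mode: "async", downloadMs: 300 },
    { name: "b.js", mode: "async", downloadMs: 100 },
    { name: "c.js", mode: "defer", downloadMs: 400 },
    { name: "d.js", mode: "defer", downloadMs: 50 },
  ],
  200
);
console.log(order); // [ 'b.js', 'a.js', 'c.js', 'd.js' ]
```

The async scripts run in the order their downloads finish (b.js before a.js), while d.js waits for c.js even though it arrived much earlier.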

For another case, here is the example code:

<html>
    <head>
        <link rel="stylesheet" href="./style.css">
    </head>
    <body>
        <div>hello juejin</div>
        <script>
            const ele = document.getElementsByTagName('div')[0];
            ele.innerText = 'juejin yyds';    // DOM operation
            ele.style.color = 'skyblue';      // CSSOM operation
        </script>
        <p>hello world</p>
    </body>
</html>

In the code above, line 9 manipulates the DOM and line 10 manipulates the CSSOM, so all the CSS above the script has to be parsed before the JavaScript can run. If the page references an external CSS file, the browser therefore has to wait until that file has been downloaded and parsed into CSSOM objects before executing the JavaScript. The JavaScript engine cannot know whether a script manipulates the CSSOM before parsing it, so whenever the rendering engine encounters a script, it first downloads and parses the CSS files, regardless of whether the script touches the CSSOM, and only then executes the script.

So JavaScript blocks DOM generation, and style files block JavaScript execution, and we need to be aware of this when developing.

Finally, take a look at the following example:

<html>
    <head>
        <link rel="stylesheet" href="./style.css">
    </head>
    <body>
        <div>hello juejin</div>
        <script type="text/javascript" src='./index.js'></script>
        <p>hello world</p>
    </body>
</html>

This HTML contains an external CSS reference and an external JavaScript file. During the pre-parsing that follows the receipt of the HTML data, the HTML pre-parser recognizes that a CSS file and a JavaScript file need to be downloaded and initiates both download requests at the same time.

That’s the end of this article, give a “like” if you find it useful!