Outline

  • Overview of browser architecture
    • Process, thread
    • Site isolation
  • Overview of the rendering process
    • Navigation phase
      • UI process 👉 assemble URL
      • The network process 👉 obtains data
        • redirect
        • Data is processed according to the Content-Type
      • Call up the render process
      • Update Tab status –> End of navigation phase
    • Rendering phase
      • The compiler
      • BNF
      • HTML parser
        • DOM
          • Tokenization algorithm
          • DOM tree building algorithm
            • Processing subresources
      • Add CSS attachment to DOM node ==> generate Render Tree
        • CSS parser
        • Creating a renderer
          • Attribute standardization
          • The style is computed and saved to ComputedStyle
      • Layout
      • Paint – Generates the order in which elements are drawn
      • Compositing -> Layering, rasterization -> Compositing
        • Layering
        • Rasterization

Introduction

In the first article of this year, we explained that our future writing will follow the front-end Roadmap.

Today we take a brief look at browsers and how they work.

According to authoritative statistics, the proportion of conventional mainstream browsers in the world is shown in the figure below.

Therefore, this article uses Chrome to show the workflow of the browser.

Overview of browser architecture

Process, thread

Before introducing the browser workflow, we need to say a few words about processes and threads.

Process: an executing instance of an application’s program. Thread: a unit of execution that lives inside a process and carries out part of its work.

When you start an application, the corresponding process is created. A process may create threads to help it do some of its work, and creating a new thread is optional. When a process is started, the operating system (OS) also allocates memory for the process to store private data. This memory space is independent of other processes.

When the application is shut down, the process is shut down, and the OS frees the memory occupied by the process.

(The figure here is an animated GIF in the original. Juejin cannot embed SVG images or video, so if you want to see the flow, follow the portal link.)

Wherever many parties work together, someone has to coordinate. To get many people to cooperate on one task, you need a coordinator who relays each person’s requests to the others. For computers, the OS is responsible for the overall coordination, and processes exchange messages through the Inter-Process Communication (IPC) mechanism.


Let’s take a look at how the Chrome architecture is organized.

Chrome’s default strategy is to give each tab its own render process. However, if a new page is opened from an existing page and belongs to the same site as that page, the new page reuses the parent page’s render process. This default policy is officially called process-per-site-instance. “Same site” here means sharing the same root domain: A.fake.com, www.fake.com, and B.fake.com:789789 all belong to the same site.
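To make the same-site idea concrete, here is a minimal sketch in Python. It assumes a site is identified by the URL scheme plus the last two labels of the host name; real browsers consult the Public Suffix List instead, so this shortcut breaks for domains such as example.co.uk. The function names `site_of` and `same_site` are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlsplit


def site_of(url: str) -> Tuple[str, str]:
    """Return (scheme, root domain), assuming the root domain is the last two labels."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    root = ".".join(host.split(".")[-2:])
    return (parts.scheme, root)


def same_site(a: str, b: str) -> bool:
    """Two URLs are treated as same-site when scheme and root domain match."""
    return site_of(a) == site_of(b)
```

Under this simplification, `https://a.fake.com/x` and `https://www.fake.com` are the same site (different subdomains and ports do not matter), while `https://www.other.com` is not.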

Site isolation

As described above, one tab generally corresponds to one render process. This leaves a loophole: if the page contains a cross-site iframe, that iframe shares the render process and can potentially reach its memory, which undermines the same-origin policy. So, in the latest Chrome architecture, each cross-site iframe in a page gets a render process of its own.


Overview of the rendering process

Although this article uses Chrome to explain the browser workflow, we still need to summarize the rendering process in a way that applies to other browsers as well.


Navigation phase

The browser navigation process covers all the intermediate stages from the user initiating the request to submitting the document to the rendering process.

UI process 👉 assemble URL

When a user enters a query keyword in the address bar, the address bar determines whether the entered keyword is the search content or the requested URL.

  • If it is search content, the address bar uses the browser’s default search engine to synthesize a new URL containing the search keywords.
  • If the input conforms to URL rules, the address bar combines the input with a protocol to form a complete URL.
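The two branches above can be sketched roughly as follows. This is only an illustration: the search URL template, the regular expressions, and the choice of `https://` as the default protocol are all assumptions, and real browsers use far more elaborate heuristics.

```python
import re
from urllib.parse import quote_plus

# Hypothetical search engine URL template (assumption for illustration).
SEARCH_TEMPLATE = "https://search.example/?q={}"


def assemble_url(text: str) -> str:
    """Turn address bar input into a complete URL."""
    text = text.strip()
    # Already has a scheme such as http:// or https://
    if re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", text):
        return text
    # Looks like a host name (optionally with port and path): add a protocol
    if re.match(r"^[\w-]+(\.[\w-]+)+(:\d+)?(/.*)?$", text):
        return "https://" + text
    # Otherwise treat the input as search keywords
    return SEARCH_TEMPLATE.format(quote_plus(text))
```

For example, `assemble_url("www.fake.com/a")` gains a protocol, while `assemble_url("how browsers work")` becomes a search URL.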


The network process 👉 obtains data

When the final URL is assembled, the UI process notifies the browser process via IPC, and the browser process passes the message on to the network process. (The browser process acts as a message relay.) When the network process receives the message that a network request is required, it starts the request.

This article is mainly about how the browser renders data, so how the browser establishes a connection to the server is only outlined here.

First, the network process checks whether the resource is in the local cache. If it is, the cached resource is returned directly to the browser process. If it is not, the network request proceeds. The first step is DNS resolution, which yields the server IP address for the requested domain name.

The next step is to establish a TCP connection with the server using that IP address. If the request protocol is HTTPS, a TLS connection is then established on top of the TCP connection. Once the connection is ready, the browser constructs the request line, request headers, and so on, attaches data associated with the domain name (such as cookies) to the request headers, and sends the request to the server.

After receiving the request information, the server generates response data (including response line, response header, and response body) based on the request information and sends it to the network process. After the network process receives the response line and header, it parses the contents of the header.

redirect

Upon receiving the response header from the server, the network process parses the response header, and if the status code returned is 301 (permanent move) or 302 (temporary redirect), the server needs the browser to redirect to another URL. The network process reads the redirected address from the Location field in the response header, then initiates a new HTTP or HTTPS request and starts all over again.

During navigation, if the status code in the server’s response line is 301 or 302, the browser redirects to the new address and continues the navigation from there. If the status code is 200, the browser can continue processing the request.
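The redirect loop described above can be sketched with a canned table of responses standing in for a real server. `FAKE_SERVER`, `navigate`, and the URLs are hypothetical; no network access takes place, and only the 301 and 302 codes mentioned in the text are handled.

```python
from typing import Dict, List, Tuple

# Hypothetical canned responses: URL -> (status code, response headers).
FAKE_SERVER: Dict[str, Tuple[int, Dict[str, str]]] = {
    "http://fake.com/": (301, {"Location": "https://fake.com/"}),
    "https://fake.com/": (302, {"Location": "https://www.fake.com/"}),
    "https://www.fake.com/": (200, {"Content-Type": "text/html"}),
}


def navigate(url: str, max_redirects: int = 10) -> Tuple[str, int, List[str]]:
    """Follow 301/302 responses until a non-redirect status is reached."""
    chain = [url]
    for _ in range(max_redirects + 1):
        status, headers = FAKE_SERVER[url]
        if status in (301, 302) and "Location" in headers:
            # Read the new address from Location and start over
            url = headers["Location"]
            chain.append(url)
            continue
        return url, status, chain
    raise RuntimeError("too many redirects")
```

The `max_redirects` guard is an added safety assumption so that a redirect cycle cannot loop forever.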

Data is processed according to the Content-Type

Content-Type: XX => XX is a MIME type string. A MIME type indicates the nature and format of a document, file, or assortment of bytes. Content-Type tells the browser what type of data the server is returning in the response body, and the browser then uses the Content-Type value to decide how to handle the content of the response body.

The subsequent processing differs considerably between Content-Types. If the browser determines that the Content-Type value is a download type, the request is handed to the browser’s download manager and the navigation for that URL ends. But if it is Content-Type: text/html, the browser continues with the navigation process.
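A minimal sketch of this dispatch is shown below. It assumes `application/octet-stream` as the example of a download type; the function name and the returned labels are hypothetical, and a real browser considers many more types and headers.

```python
def handle_response(content_type: str) -> str:
    """Decide what to do with a response based on its Content-Type header."""
    # Drop parameters such as "; charset=utf-8" and normalize case
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return "continue-navigation"
    if mime == "application/octet-stream":
        return "download"
    return "other"
```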


Call up the render process

Because the time the browser and server take to exchange data is unpredictable, waiting for the response body to come back before starting a render process would introduce a delay.

So when the UI process sends the assembled URL to the network process, it already knows which site is being navigated to. At that point the UI process proactively tries to find or start a render process, so that rendering can begin directly once the response body data is ready.

The UI process sends the URL to the network process and tries to start the renderer process at the same time.

Update Tab status –> End of navigation phase

After the response data and renderer are ready, the network process passes a message to the renderer process via IPC to submit the response body data.

  • The renderer process and the network process establish the data channel. Data from the network process will flow to the renderer process.
  • When all the data is transferred, the renderer sends a message to the UI process confirming the commit.
  • Upon receiving the commit confirmation, the UI process updates the browser interface state, including the security state, the URL in the address bar, and the back/forward history state, and updates the web page.

At this point, the navigation phase officially ends, and the page rendering phase begins.


Rendering phase

Let’s focus on how Chrome turns HTML and CSS into information the browser can work with.

You have probably seen this diagram, which shows the main rendering flow of the WebKit engine. I bring it up because the rendering engines of Chrome, Safari, and Edge are all WebKit or derived from it (Chrome and Edge use Blink, a fork of WebKit). So, by understanding WebKit’s rendering process, we can understand how most browsers on the market work.

The diagram above was made in 2011. It is now 2021, and some of the steps and details have been changed or filled in since. However, the general flow is the same.


Through the navigation phase, we get the data information from the server for the browser display – HTML/CSS. However, HTML and CSS are text information that cannot be recognized and used by browsers. Therefore, a mechanism is needed to convert HTML and other text information into a format that can be recognized by browsers. This transformation process is called parsing.

HTML is parsed differently from CSS and JS.

The compiler

Most compilers are divided into three steps: Parsing, Transformation, and Code Generation.

  • Parsing transforms raw code into an AST (abstract syntax tree) through lexical analysis and syntax analysis.
  • Transformation receives the AST produced by Parsing and transforms it according to the rules built into the compiler.
  • Code Generation takes the transformed AST and converts it into the desired output format according to certain rules.

Through the above three steps, most programs are compiled into object code. If you want to learn more about how a compiler works, you can refer to the simple analysis of compilers I wrote earlier. I won’t go into detail here.


BNF

Conventional context-free languages can be described in the BNF format.

BNF (Backus-Naur Form) is a formal notation for describing the grammar of a given language, typically used as a meta-language. It expresses grammar rules rigorously, and the grammars it describes are context-free. Its syntax is simple, its meaning is explicit, and it is easy to analyze and compile. ①: Non-terminals are written in angle brackets. ②: The left side of each rule is a non-terminal, and the right side is a string of non-terminals and terminals; the two sides are usually separated by ::=. ③: Rules that share the same left side can be merged into one, with the alternative right sides separated by a vertical bar “|”.

For example, when we parse the expression 2 + 3 - 1,

Lexical rules, we can use:

INTEGER: 0|[1-9][0-9]*
PLUS: +
MINUS: -

Grammar rules:

expression ::=  term  operation  term
operation ::=  PLUS | MINUS
term ::= INTEGER | expression

Generated AST structure

Further processing of the AST produces machine code that the target machine or engine can execute.
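As a rough illustration of the lexical and grammar rules above, the following sketch tokenizes and parses 2 + 3 - 1 into an AST and then evaluates it. One assumption to note: the grammar as written is recursive through term, so the parser here treats an expression as an INTEGER followed by any number of "operation INTEGER" pairs, grouped left to right. The AST node layout (`type`, `op`, `left`, `right`) is a hypothetical choice.

```python
import re

# INTEGER: 0|[1-9][0-9]*   PLUS: +   MINUS: -
TOKEN_RE = re.compile(r"\s*(?:(0|[1-9][0-9]*)|([+-]))")


def tokenize(src):
    """Lexical analysis: split the source text into (kind, value) tokens."""
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        if m.group(1) is not None:
            tokens.append(("INTEGER", int(m.group(1))))
        else:
            tokens.append(("PLUS" if m.group(2) == "+" else "MINUS", m.group(2)))
        pos = m.end()
    return tokens


def parse(tokens):
    """Syntax analysis: build a left-associative AST from the tokens."""
    def integer(i):
        if i >= len(tokens) or tokens[i][0] != "INTEGER":
            raise SyntaxError("expected INTEGER")
        return {"type": "Integer", "value": tokens[i][1]}

    node, i = integer(0), 1
    while i < len(tokens):
        kind = tokens[i][0]
        if kind not in ("PLUS", "MINUS"):
            raise SyntaxError("expected PLUS or MINUS")
        node = {"type": "Binary", "op": kind, "left": node, "right": integer(i + 1)}
        i += 2
    return node


def evaluate(node):
    """Walk the AST and compute the value of the expression."""
    if node["type"] == "Integer":
        return node["value"]
    left, right = evaluate(node["left"]), evaluate(node["right"])
    return left + right if node["op"] == "PLUS" else left - right
```

For the input `"2 + 3-1"`, the root of the AST is the MINUS node, whose left child is the PLUS node for 2 + 3.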

When talking about what BNF is, we mentioned the concept of being context-free. According to Wikipedia, a context-free grammar (CFG) is one whose production rules all have the form

A → α

where A is a single non-terminal symbol and α is a string of terminal and/or non-terminal symbols. Terminal symbols (tokens) are the indivisible elements of the language.

In other words, when a rule can be applied without considering the context in which its left-hand side appears, the language can be described in BNF.


HTML parser

HTML also has to be parsed before the browser can use it. However, because of the nature of the HTML language and its unique parsing process, HTML cannot be parsed with a conventional context-free parser.

The reasons are as follows:

  • The language is forgiving by nature.
  • Browsers have traditionally tolerated some common invalid HTML usage.
  • The parsing process is reentrant. The source content usually doesn’t change during parsing, but in HTML, script tags that call document.write can add extra tokens, so the parsing process actually changes the input.

DOM

The ultimate goal of the HTML parser is to convert HTML into a data structure the browser can work with: the DOM.

DOM is short for Document Object Model. It is an application programming interface (API) designed for XML and extended for HTML. So the DOM is essentially an API for manipulating web content. The DOM is a standard that maps the entire page into a multi-level node structure, in which each component of an HTML or XML page is a node of some kind. With the API provided by the DOM, developers can remove, add, replace, or modify any node.

Since normal parsing techniques are not available, browsers create custom parsers to parse HTML.

The algorithm consists of two phases: tokenization and tree construction.


Tokenization algorithm

Tokenization is the lexical analysis step, which translates the input into tokens. HTML tokens include start tags, end tags, attribute names, and attribute values.
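To get a feel for what such tokens look like, here is a small sketch built on Python’s standard `html.parser` module. This is not the tokenization algorithm that browsers implement; it only illustrates the kind of output a tokenizer produces. The class name `TokenCollector` and the token tuple layout are assumptions.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Collect start tags (with attributes), end tags, and text as tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("StartTag", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("EndTag", tag))

    def handle_data(self, data):
        # Skip whitespace-only text between tags
        if data.strip():
            self.tokens.append(("Text", data.strip()))


def tokenize_html(html):
    collector = TokenCollector()
    collector.feed(html)
    collector.close()
    return collector.tokens
```

For example, `tokenize_html('<p class="a">Hi</p>')` yields a start tag token carrying the attribute name and value, a text token, and an end tag token.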

One point to note here: HTML 2 through HTML 4 had to declare a reference to a DTD, because HTML up to 4.01 is based on SGML. DTDs define the rules of a markup language so that browsers can render content correctly.
HTML5 is not based on SGML, so there’s no need to reference a DTD.


DOM tree building algorithm

The tokenizer recognizes a token, passes it to the tree constructor, and then consumes the next character to recognize the next token. This repeats until the end of the input.

When the parser is created, the Document object is also created. During the tree construction phase, the DOM tree rooted at the Document is modified continually as elements are added to it. Each token emitted by the tokenizer is processed by the tree constructor. The specification defines which DOM element corresponds to each token, and that element is created when the token is received.
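The tree construction step can be sketched with a stack of open elements. This is a heavily simplified model, not the algorithm from the HTML specification: the token format, the `Node` class, and the handling of stray end tags (ignore them) are assumptions, and void elements such as img are not treated specially.

```python
class Node:
    def __init__(self, name, text=None):
        self.name = name
        self.text = text
        self.children = []


def build_tree(tokens):
    """Build a DOM-like tree rooted at a document node from a token list."""
    document = Node("#document")
    stack = [document]  # stack of currently open elements
    for token in tokens:
        kind = token[0]
        if kind == "StartTag":
            element = Node(token[1])
            stack[-1].children.append(element)
            stack.append(element)
        elif kind == "EndTag":
            # Forgiving behavior: close back to the matching open element,
            # and ignore end tags that match nothing
            if token[1] in [n.name for n in stack[1:]]:
                while stack.pop().name != token[1]:
                    pass
        elif kind == "Text":
            stack[-1].children.append(Node("#text", token[1]))
    return document


def outline(node, depth=0):
    """Return the tree as indented lines, useful for inspection."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines
```

Each start tag creates an element under the element currently on top of the stack, which mirrors the idea that the tree is modified continually as tokens arrive.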

Processing subresources

Non-text resources such as images, CSS, and JS are commonly embedded in web pages, and they have to be fetched separately from the server or the cache. When the main thread encounters such tags while building the DOM tree, it requests the corresponding resources one by one. To speed this up, the preload scanner runs concurrently with DOM tree construction. During tokenization, if a tag such as <img> or <link> is encountered, the preload scanner notifies the network process to request the corresponding resource asynchronously. (The main thread and the network requests proceed in parallel.)

Everything is fine so far, but things change if the tag encountered is <script>.

JS blocks DOM tree construction, because JS code may contain operations such as document.write() that can be devastating to the DOM tree built so far (discarding the original tree altogether).

Since <script> blocks parsing, HTML provides the async and defer attributes to relax this:

<script async src="A.js"></script>

With async, A.js is downloaded in parallel with the loading and parsing of subsequent document elements (asynchronously), and it is executed as soon as the download finishes, which can interrupt parsing.

<script defer src="B.js"></script>

With defer, B.js is downloaded in parallel (asynchronously) with the loading of subsequent document elements, but B.js is executed only after all elements have been parsed and before the DOMContentLoaded event fires.

From a practical point of view, placing all scripts just before </body> is a best practice, because for older browsers it is the only way to ensure that all non-script elements are loaded and parsed as quickly as possible. Next, let’s look at a figure: the blue line represents network reads and the red line represents execution time, both for scripts; the green line represents HTML parsing.

The figure tells us a few key points: defer and async behave the same on the network read (download) side. Both are asynchronous with respect to HTML parsing. The difference between them is when the script executes after it has been downloaded.

Note: defer executes scripts in the order they appear in the document, while async scripts execute in whatever order they finish downloading.
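The ordering difference between async and defer can be illustrated with a toy timeline model. This is only a sketch under simplifying assumptions: script execution takes no time, the times are arbitrary units, and the function name `execution_order` is hypothetical.

```python
def execution_order(scripts, parse_end):
    """Return script names in execution order.

    scripts: list of (name, mode, downloaded_at) in document order,
             where mode is "async" or "defer".
    parse_end: the time at which HTML parsing finishes.
    """
    timeline = []
    ready = parse_end  # deferred scripts cannot run before parsing ends
    for position, (name, mode, downloaded_at) in enumerate(scripts):
        if mode == "async":
            # Runs as soon as its own download finishes
            timeline.append((downloaded_at, position, name))
        elif mode == "defer":
            # Runs after parsing, and never before an earlier deferred script
            ready = max(ready, downloaded_at)
            timeline.append((ready, position, name))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return [name for _, _, name in sorted(timeline)]
```

With parsing ending at time 30, an async script that finishes downloading at time 20 runs first, deferred scripts then run in document order, and an async script that finishes at time 50 runs last.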

We can also preload resources with <link rel="preload">. For details on how to use it, refer to the documentation on preloading content with link preload. If you are interested in performance optimization, you can also refer to the English-language Fast Load Times guide (I plan to organize and translate it later).

After all of this, the HTML parser finally converts the HTML into a DOM structure that the browser can work with.

Let’s look at the rendering process of the Baidu home page.

At the bottom is the call stack, shown with a brown line and small circles, which represents the DOM building process. As the figure shows, the DOM is generated during the HTML parsing phase.


Add CSS attachment to DOM node ==> generate Render Tree

The process of resolving styles and creating renderers is called “attachment.” Each DOM node has an “attach” method. Attachment is synchronous: inserting a node into the DOM tree calls the new node’s “attach” method.

Processing the html and body tags builds the root of the render tree. This root render object corresponds to what the CSS specification calls the containing block: the topmost block that contains all the other blocks. Its size is the viewport, that is, the size of the display area of the browser window. WebKit calls it RenderView. This is the render object that the document points to. The rest of the render tree is built as nodes are inserted into the DOM tree.

CSS parser

Both HTML parsing and CSS parsing happen in the renderer process, and the renderer process has only one main thread, so the main thread can only do one thing at a time (it is single-threaded).

So the CSS parsing step starts only after the DOM has been built and the <script> tags placed before it have been processed.

Because CSS has a context-free grammar, it can be parsed with a conventional parser. The W3C CSS specification defines its lexical and syntactic grammar.

The parser parses the CSS file into StyleSheet objects, and each object contains CSS rules. CSS rule objects contain selectors and declaration objects, as well as other objects that correspond to CSS syntax.


Creating a renderer

The render tree is a tree of visual elements arranged in the order in which they are displayed. It is the visual representation of the document, and its purpose is to let content be drawn in the correct order.

class RenderObject {
  virtual void layout();        // compute size and position
  virtual void paint(PaintInfo);
  virtual Rect repaintRect();   // the area that needs repainting
  Node* node;                   // the DOM node
  RenderStyle* style;           // the computed style
  RenderLayer* containingLayer; // the containing z-index layer
};

Each renderer represents a rectangular area, usually corresponding to a CSS box for the node in question, containing geometric information such as width, height, and position.

The type of box is determined by the “display” style property of the node. For example, a display:block element occupies a line of its own by default, while display:inline elements sit within a line alongside other content. (The CSS box model is a big topic in its own right; you can refer to Zhang Xinxu’s explanation. I am also organizing my own notes on it and will publish a summary soon.)

Attribute standardization

Now we have attached the CSS to RenderObjects. But when we write CSS, we write declarations such as font-size: 2em, and em is a relative unit, not a fixed value. Values such as 2em, blue, and bold are not easy for the rendering engine to work with, so they need to be converted into standardized computed values that it can use directly. This process is called attribute value normalization.

Suppose the following style information exists:

body { font-size: 2em }
p { color: blue; }
span { display: none }
div { font-weight: bold }
div p { color: green; }
div { color: red; }
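A minimal sketch of what normalization might do with the values in this style sheet follows. It assumes a parent font size of 16px (a common browser default) for resolving em, and it only knows the handful of keywords listed in its tables; the function and table names are hypothetical.

```python
# Small lookup tables covering only the keywords used in the example
NAMED_COLORS = {
    "blue": "rgb(0, 0, 255)",
    "green": "rgb(0, 128, 0)",
    "red": "rgb(255, 0, 0)",
}
FONT_WEIGHTS = {"normal": "400", "bold": "700"}


def normalize(prop, value, parent_font_px=16.0):
    """Convert a declared value into a standardized computed value."""
    if prop == "color":
        return NAMED_COLORS.get(value, value)
    if prop == "font-weight":
        return FONT_WEIGHTS.get(value, value)
    if prop == "font-size" and value.endswith("em") and not value.endswith("rem"):
        # em is relative to the parent's font size (assumed 16px here)
        return f"{float(value[:-2]) * parent_font_px:g}px"
    return value
```

Under these assumptions, 2em becomes 32px, blue becomes rgb(0, 0, 255), and bold becomes 700; values the sketch does not recognize, such as display: none, pass through unchanged.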

The style is computed and saved to ComputedStyle

After the styles are normalized, the rendering engine can work with the actual values carried by each RenderObject, but the relationship between a DOM node and the style information that applies to it can be one-to-many. Therefore, this information has to be merged so that one final set of styles is applied to each DOM node.

One reason a DOM node is affected by multiple styles is that CSS comes from different sources.

There are three main sources of CSS styles:

①: external CSS files referenced via <link>

②: CSS inside <style> tags

③: CSS embedded in an element’s style attribute



The declaration of a style property may appear in more than one style sheet, or it may appear multiple times in the same style sheet. This means that the order in which rules are applied is extremely important. This is called the “cascade” order. According to the CSS3 specification, the cascade order (sorted from highest priority to lowest) can be simplified a little as follows:

  1. Transition declarations –> highest priority
  2. User agent important declarations –> !important properties in the user agent stylesheet
  3. User important declarations –> !important properties that the user writes directly in the browser
  4. Author important declarations –> !important properties in <link>/<style>/style attribute
  5. Animation declarations
  6. Author normal declarations –> properties in <link>/<style>/style attribute
  7. User normal declarations –> custom styles from user settings
  8. User agent declarations –> default properties from the user agent stylesheet
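The priority list above can be expressed as a simple lookup. This sketch only models the ordering of origins; it ignores specificity and source order, and the origin labels and function name are made up for illustration.

```python
# Origins from highest priority to lowest, following the list above
CASCADE_ORDER = [
    "transition",
    "user-agent-important",
    "user-important",
    "author-important",
    "animation",
    "author-normal",
    "user-normal",
    "user-agent-normal",
]
RANK = {origin: index for index, origin in enumerate(CASCADE_ORDER)}


def winning_declaration(declarations):
    """Pick the (origin, value) pair whose origin ranks highest in the cascade."""
    return min(declarations, key=lambda declaration: RANK[declaration[0]])
```

For instance, a user !important declaration beats an author normal declaration, which in turn beats the user agent default.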

For related links, see www.w3.org and css-cascade-4.

Also, when several different styles apply to the same DOM node, a weight (specificity) calculation is needed to decide between them.

A picture is worth a thousand words.

Through weight calculation and the other steps above, we can determine the final style information carried by a given DOM node, and this information is saved in the ComputedStyle structure. If you have used style = window.getComputedStyle(element) in actual development, you have already seen it: this method returns all the computed style properties of the specified element, and the data it returns is in fact the ComputedStyle structure produced by this series of calculations.
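The weight calculation mentioned above is commonly described as a tuple of counts: id selectors, class selectors, and type (element) selectors. The sketch below assumes selectors built only from those three kinds plus combinators; it does not handle attribute selectors, pseudo-classes, or the style attribute.

```python
import re


def specificity(selector):
    """Return (ids, classes, elements) for a simple CSS selector."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    # A type selector starts the string or follows whitespace or a combinator
    elements = len(re.findall(r"(?:^|[\s>+~])([a-zA-Z][\w-]*)", selector))
    return (ids, classes, elements)
```

Tuples compare left to right, so `specificity("div p")` is greater than `specificity("p")`. This is why, in the earlier style sheet, a p inside a div ends up green rather than blue.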

Finally, by attaching this style information to the DOM, we obtain the Render Tree.


Layout

Through HTML parsing and CSS parsing, the HTML and CSS information has been merged, and the appearance and style of each node are known. But style information alone is not enough to place nodes where they need to be rendered on the page. The position and size of each element are also needed.

Renderers do not contain position and size information when they are created and added to the render tree. The process of calculating these values is called layout or reflow.

HTML uses a flow-based layout model, in which elements later in the flow usually do not affect the geometry of elements earlier in the flow, so layout can traverse the document from left to right and top to bottom.

The coordinate system is relative to the root frame and uses top and left coordinates.

Layout is a recursive process. It starts at the root renderer (which corresponds to the <html> element of the HTML document) and recursively traverses some or all of the frame hierarchy, computing geometric information for each renderer that requires it.

The position of the root renderer is 0,0, and its size is the viewport (that is, the visible area of the browser window). All renderers have a “layout” or “reflow” method, and each renderer calls the layout method of those children that need to be laid out.
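The recursive nature of layout can be sketched with a toy block layout, in which every box takes the full available width and children are stacked vertically. The `Box` class, its fields, and the fixed leaf heights are assumptions for illustration; real layout also handles inline content, margins, floats, and much more.

```python
class Box:
    def __init__(self, name, height=0, children=None):
        self.name = name
        self.own_height = height  # content height for leaf boxes
        self.children = children or []
        self.x = self.y = self.width = self.height = 0


def layout(box, x, y, width):
    """Place the box at (x, y), then lay out its children top to bottom."""
    box.x, box.y, box.width = x, y, width
    cursor = y
    for child in box.children:
        layout(child, x, cursor, width)  # each box lays out its own children
        cursor += child.height
    # A container is as tall as its stacked children
    box.height = max(box.own_height, cursor - y)
    return box
```

Calling `layout(root, 0, 0, 800)` corresponds to starting at the root renderer at position 0,0 with a viewport 800 units wide.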

Layout is the process of finding the geometry of elements. The main thread walks the DOM and the computed styles and creates a layout tree that contains information such as x and y coordinates and bounding box sizes. Renderers correspond to DOM elements, but not one-to-one. Non-visual DOM elements, such as the “head” element, are not inserted into the layout tree. Elements whose display value is “none” do not appear in the render tree either (elements whose visibility value is “hidden” still appear).

Some render objects correspond to DOM nodes but sit at a different place in the tree. This is the case for floated and absolutely positioned elements: they are out of the normal flow, placed elsewhere in the tree, and mapped to their real frame, while a placeholder frame marks where they would otherwise have been.

The main thread traverses the Render Tree and generates a Layout Tree.


Paint – Generates the order in which elements are drawn

Layout has given us the size and position of every element. However, that is still not enough to render the page. Although HTML uses a flow-based (left to right, top to bottom) layout model, styles can take elements out of the default flow direction and rendering order.

For example, z-index lets us place things along the z-axis, which involves a new concept: the stacking context. (This is also a very big topic; if you are interested, you can refer to Zhang Xinxu’s article “In-depth understanding of stacking context and stacking order in CSS”.)

The figure above shows it directly; the concrete implementation and explanation are not discussed here.

So the renderer traverses the layout tree from the root, determines the final paint order of each element, and generates paint records for the elements.

Drawing an element usually requires several drawing instructions, because its background, foreground, and borders each need separate instructions. So the output of the paint phase is these lists of drawing instructions.
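A toy version of generating such a list might look like the following. It assumes that paint order is decided by z-index alone, with ties keeping tree order, and that each element is drawn as background, then border, then foreground; real stacking rules are considerably more involved.

```python
def paint_records(elements):
    """Produce an ordered list of drawing instructions.

    elements: list of (name, z_index) in tree order.
    """
    # sorted() is stable, so elements with equal z-index keep tree order
    ordered = sorted(elements, key=lambda element: element[1])
    records = []
    for name, _ in ordered:
        for step in ("background", "border", "foreground"):
            records.append(f"draw {step} of {name}")
    return records
```

An element with a higher z-index, such as a modal dialog, ends up at the end of the list and is therefore painted on top.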


Compositing -> Layering, rasterization -> Compositing

Compositing is a technique in which parts of a page are separated into layers, rasterized separately, and composited into a page on a separate thread called the compositor thread. If scrolling happens, the layers are already rasterized, so all the compositor has to do is composite a new frame. Animation can likewise be achieved by moving layers and compositing new frames.

Layering

We now know the order in which elements are drawn, but rendering everything from the root node each time would be a lot of work, so the rendering engine generates layers for specific nodes, along with a corresponding layer tree (divide and conquer).

The browser page is actually divided into many layers, which are superimposed to form the final page. The page is split into layers based on its stacking contexts.

Once the layer tree has been built, the main thread traverses the layers and generates a series of paint records that tell the rendering engine the order in which to draw each layer.

Rasterization

Rasterization: converting the draw list into pixels on the screen.

A draw list only records the drawing order and the drawing instructions; the actual drawing is carried out by the compositor thread in the rendering engine.

Usually a page may be large, but the user can only see part of it. We call the part that the user can see a viewport.

In some cases a layer can be very large. For example, on some pages it takes a long time to scroll to the bottom, yet through the viewport the user only sees a small portion of the page. Rasterizing the entire layer in this situation would be too expensive, and it is unnecessary.

For this reason, the compositor thread divides the layer into tiles, which are usually 256×256 or 512×512.

The compositor thread gives priority to generating bitmaps for the tiles near the viewport, and the actual bitmap generation is performed by rasterization. Rasterization means converting a tile into a bitmap, and the tile is the smallest unit of rasterization. The renderer process maintains a raster thread pool, in which all tile rasterization is performed.
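The tiling and prioritization idea can be sketched as follows. The distance measure (how far a tile lies outside the viewport rectangle, summed over both axes) and the function name are assumptions; the actual prioritization used by the compositor is more sophisticated.

```python
def tiles_by_priority(layer_w, layer_h, viewport, tile=256):
    """Split a layer into tiles and order them nearest-to-viewport first.

    viewport: (x, y, width, height) in layer coordinates.
    Returns a list of tile rectangles (x, y, width, height).
    """
    vx, vy, vw, vh = viewport

    def distance(rect):
        x, y, w, h = rect
        # Gap between the tile and the viewport on each axis (0 if they overlap)
        dx = max(vx - (x + w), x - (vx + vw), 0)
        dy = max(vy - (y + h), y - (vy + vh), 0)
        return dx + dy

    rects = [
        (x, y, min(tile, layer_w - x), min(tile, layer_h - y))
        for y in range(0, layer_h, tile)
        for x in range(0, layer_w, tile)
    ]
    return sorted(rects, key=distance)
```

For a 512×1024 layer and a viewport covering the top 300 pixels, the four tiles that intersect the viewport come first and the tiles at the bottom of the layer come last.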

After processing a frame, the compositor thread sends the result to the browser process via IPC for display. This does not occupy the main thread of the renderer.

This repeats until all the tiles handed to the raster thread pool have been processed and the page is rendered.


Note: This article is a hodgepodge of references. If you are interested in reading the original text, you can refer to it directly.

  • How Browsers Work: Behind the scenes of modern web browsers
  • Inside look at modern web browser
  • w3c
  • Geek time class