This article explains the process in the following parts:

  1. Domain name resolution

  2. Initiating the request

  3. HTML parsing

  4. CSS parsing

  5. Layout

  6. Drawing

Domain name resolution

On a computer network, a specific host can only be reached through its IP address, not directly through its domain name. Our front-end static resources and other files are stored on a server, so when a domain name is entered, the first step is to convert it into an IP address. The conversion goes through the following steps:

  1. First, the browser checks its own cache to see whether the domain name has already been resolved, and if so returns the cached address.

  2. If the browser's cache has no IP address for the domain name, the browser checks the operating system's cache.

  3. If the operating system has no entry either, DNS (Domain Name System) is used to resolve it.

  4. The query goes to the preferred DNS server set in the TCP/IP settings, called the local DNS server. If the local DNS server finds the domain name in its cache, it returns the result. Otherwise, it checks whether forwarding is configured. If it is, the query is forwarded to the upstream DNS server, level by level, until the name is resolved. If forwarding is not configured, the local DNS server sends the query to a root DNS server.

  5. The root DNS server returns the address of the top-level domain (TLD) server responsible for the name (for example, the .com servers). The local DNS server then queries that TLD server, which returns the address of the DNS server responsible for the second-level domain, and so on, until the name is resolved or resolution fails.
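The lookup order above can be sketched in JavaScript. This is a toy model under stated assumptions, not how a real resolver works: the browser and OS caches are plain Maps, and `queryLocalDns` is a hypothetical stand-in for the network query to the local DNS server (which, as described, may itself go on to the root servers).

```javascript
// Toy model of the lookup order: browser cache -> OS cache -> local DNS server.
// browserCache, osCache and queryLocalDns are hypothetical stand-ins.
function resolveDomain (domain, browserCache, osCache, queryLocalDns) {
  // 1. The browser's own DNS cache
  if (browserCache.has(domain)) {
    return { ip: browserCache.get(domain), source: 'browser cache' }
  }
  // 2. The operating system's DNS cache
  if (osCache.has(domain)) {
    const ip = osCache.get(domain)
    browserCache.set(domain, ip) // remember it for next time
    return { ip, source: 'OS cache' }
  }
  // 3-5. Ask the local DNS server; it returns null if resolution fails
  const ip = queryLocalDns(domain)
  if (ip !== null) {
    osCache.set(domain, ip)
    browserCache.set(domain, ip)
  }
  return { ip, source: 'local DNS' }
}

// Usage: nothing is cached yet, so the first lookup goes to the local DNS server.
// 203.0.113.10 is a documentation-only address, not real data.
const browserCache = new Map()
const osCache = new Map()
const fakeLocalDns = (domain) => (domain === 'example.com' ? '203.0.113.10' : null)

console.log(resolveDomain('example.com', browserCache, osCache, fakeLocalDns).source) // local DNS
console.log(resolveDomain('example.com', browserCache, osCache, fakeLocalDns).source) // browser cache
```

Caching the answer at each level on the way back is what makes the second lookup stop at the browser cache.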

We have mentioned DNS several times, so what exactly is it?

DNS stands for Domain Name System. It works at the application layer, and its main job is to translate domain names into IP addresses. Its architecture is a distributed hierarchy: root DNS servers at the top, top-level domain DNS servers below them, and second-level domain DNS servers below those.

DNS is designed as a distributed system rather than a single server for several reasons. If there were only one server and it failed, name resolution for the entire Internet would fail with it. If that one server were located in, say, the United States, clients far away from it would wait a long time for every query to cross the network. And keeping all domain name records on a single server would make both query load and storage serious problems. Hence the distributed design.

DNS queries for domain name resolution are classified into iterative queries and recursive queries.

Recursive query: the DNS client on the PC sends a query to the local DNS server. If the local DNS server cannot answer it from its own data, it acts as a DNS client itself and queries upper-level servers (root servers, top-level domain servers, and so on) on the PC's behalf. The PC only has to wait for the final result: either the resolved address or a failure.

Iterative query: the DNS client on the PC sends a query to the local DNS server. If the answer is not found, the local DNS server does not look further itself; instead, it returns to the client the address of a higher-level server (a root or top-level domain server). The client then queries that server directly, and so on, until the name is resolved or resolution fails.

In practice, the query from the computer's DNS client to the local DNS server is recursive, while the queries the local DNS server sends to the root and other upper-level DNS servers are iterative.
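To make the iterative part concrete, here is a sketch of the local DNS server walking the hierarchy. The `servers` table is a made-up, in-memory stand-in for real root, TLD and authoritative servers: each one either answers with an IP address or refers the asker to a more specific server. From the PC's point of view, the whole `iterativeResolve` call is a single recursive query. Internally, the local DNS server follows the referrals itself, which is the iterative part.

```javascript
// Hypothetical in-memory DNS hierarchy. Each server either knows the answer
// ({ answer: ip }) or refers the asker to the next server ({ referral: name }).
const servers = {
  root: (domain) => ({ referral: 'tld:' + domain.split('.').pop() }),
  'tld:com': (domain) => ({ referral: 'auth:' + domain.split('.').slice(-2).join('.') }),
  'auth:example.com': (domain) =>
    domain === 'www.example.com' ? { answer: '203.0.113.10' } : { answer: null }
}

// The local DNS server resolves iteratively: it asks the root, follows each
// referral itself, and stops at an answer, an unknown server, or too many hops.
function iterativeResolve (domain, maxSteps = 10) {
  const path = []
  let current = 'root'
  for (let step = 0; step < maxSteps; step++) {
    const server = servers[current]
    if (!server) return { ip: null, path } // referral to a server we cannot reach
    path.push(current)
    const reply = server(domain)
    if ('answer' in reply) return { ip: reply.answer, path }
    current = reply.referral
  }
  return { ip: null, path }
}

const result = iterativeResolve('www.example.com')
console.log(result.ip)   // 203.0.113.10
console.log(result.path) // [ 'root', 'tld:com', 'auth:example.com' ]
```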

Initiate a request:

Once the domain name has been resolved, the browser sends the request. Assume this domain name has never been visited before, so nothing is cached yet.

For a first request, the response may contain some strong cache and weak cache fields, for example:

Strong cache fields:

Expires: its value is an HTTP date. Before making a request, the browser compares the system time with the Expires value; if the system time is later, the cached copy has expired. The problem with this field is that it depends on the client's clock: if the client's time differs from the server's, the cache may be treated as expired when it is not, or an expired copy may keep being used instead of the latest resource being requested.

Cache-Control: a header added in HTTP/1.1. Its common directives are:

  1. max-age: how long the response stays fresh, in seconds, counted from the time of the response. Once that time has passed, the request is sent again.
  2. no-cache: the strong cache is not used; the cached copy must be revalidated with the server every time.
  3. no-store: no cache is used at all; the latest resource is requested from the server every time.
  4. private: the response may only be cached by the user's own browser; intermediate proxies and CDNs must not cache it.
  5. public: the response may also be cached by CDNs and intermediate proxies.

Pragma: an HTTP/1.0 header. Pragma: no-cache means the strong cache is not used and the cached copy must be revalidated with the server.

The priority among the strong cache fields is Pragma > Cache-Control > Expires.
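A rough JavaScript sketch of the strong-cache check, following the priority order given above. It assumes response headers are stored as a plain object with lowercase names and that times are millisecond timestamps. The function name and data shape are made up for illustration. Real browsers implement the full HTTP caching rules (including the Age header and heuristic freshness), which this ignores.

```javascript
// Hypothetical helper: can a cached response be used without asking the server?
// headers: response headers with lowercase names; responseTime, now: ms timestamps.
function isStrongCacheFresh (headers, responseTime, now) {
  // 1. Pragma: no-cache (HTTP/1.0) turns the strong cache off
  if ((headers.pragma || '').toLowerCase().includes('no-cache')) return false

  // 2. Cache-Control (HTTP/1.1)
  const cc = (headers['cache-control'] || '').toLowerCase()
  if (cc.includes('no-store') || cc.includes('no-cache')) return false
  const maxAge = cc.match(/max-age=(\d+)/)
  if (maxAge) {
    // fresh for max-age seconds after the response
    return now - responseTime < Number(maxAge[1]) * 1000
  }

  // 3. Expires: compare the local clock with an absolute HTTP date
  if (headers.expires) {
    const expiresAt = Date.parse(headers.expires)
    return !Number.isNaN(expiresAt) && now < expiresAt
  }

  return false // no strong-cache information: the server must be asked
}

// Usage: max-age=60 takes priority over a later Expires date
const t0 = Date.parse('Mon, 01 Jan 2024 00:00:00 GMT')
const headers = { 'cache-control': 'max-age=60', expires: 'Mon, 01 Jan 2024 00:10:00 GMT' }
console.log(isStrongCacheFresh(headers, t0, t0 + 30 * 1000)) // true
console.log(isStrongCacheFresh(headers, t0, t0 + 90 * 1000)) // false
```

Because the Expires branch reads `now` from the local clock, a skewed client clock changes its result. This is the weakness of Expires mentioned above, and max-age avoids it by working with a relative duration.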

Weak cache fields:

The server's first response may or may not contain strong cache fields. It may also contain weak cache (negotiated cache) fields:

  1. Last-Modified: the time the file was last modified, with a precision of one second. On the next request, the browser adds an If-Modified-Since header carrying that value. The server compares it with the file's current modification time: if they are the same, it responds 304 Not Modified and the browser keeps using its cached copy; otherwise it returns the latest resource along with an updated Last-Modified header. Because the precision is one second, there is a problem: suppose the server responds to a request and the file is then modified again within that same second. The modification time, in seconds, does not change, so on the next request the server tells the browser to keep using its cache, and the update is missed.

  2. ETag: introduced to avoid the update failures of Last-Modified / If-Modified-Since. An ETag is a string (typically a hash) that identifies a particular version of a resource; when the file on the server changes, its ETag changes too. On the next request, the browser sends the value in an If-None-Match header, and the server checks whether it matches the ETag of the current resource. If it matches, the server responds 304 and the browser keeps using its cache; if not, the server sends the latest resource with an updated ETag header. An ETag beginning with "W/" is a weak validator: it only changes when the resource changes in a way the server considers significant (how that is decided depends on how the server computes the ETag), so small differences in the file may not trigger a new download.
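On the server side, the negotiated-cache check can be sketched as below. The request and resource shapes are hypothetical. Following the HTTP specification, If-None-Match takes precedence over If-Modified-Since when both are sent, and If-None-Match uses weak comparison (the `W/` prefix is ignored). The Last-Modified branch compares whole seconds, so it reproduces the missed-update problem described in item 1. The usage example demonstrates this.

```javascript
// Hypothetical shapes, for illustration only:
//   request  = { headers: { 'if-none-match'?: string, 'if-modified-since'?: string } }
//   resource = { etag: string, mtimeMs: number, body: string }
function toHttpDate (ms) {
  return new Date(ms).toUTCString() // HTTP dates carry whole seconds only
}

function handleConditionalRequest (request, resource) {
  const ifNoneMatch = request.headers['if-none-match']
  const ifModifiedSince = request.headers['if-modified-since']

  if (ifNoneMatch !== undefined) {
    // ETag check wins when both headers are sent; W/ is ignored (weak comparison)
    const strip = (tag) => tag.trim().replace(/^W\//, '')
    const matches = ifNoneMatch.split(',').some((tag) => strip(tag) === strip(resource.etag))
    if (matches) return { status: 304 }
  } else if (ifModifiedSince !== undefined) {
    // Last-Modified check, at one-second precision
    const lastModifiedSec = Math.floor(resource.mtimeMs / 1000)
    const sinceSec = Math.floor(Date.parse(ifModifiedSince) / 1000)
    if (lastModifiedSec <= sinceSec) return { status: 304 } // not modified since then
  }

  // Changed (or no validators sent): send the full resource with fresh validators
  return {
    status: 200,
    headers: { etag: resource.etag, 'last-modified': toHttpDate(resource.mtimeMs) },
    body: resource.body
  }
}

// Usage: the file changes twice within the same second (at .200 and at .800)
const v1 = { etag: '"v1"', mtimeMs: Date.UTC(2024, 0, 1, 0, 0, 0, 200), body: 'old' }
const v2 = { etag: '"v2"', mtimeMs: Date.UTC(2024, 0, 1, 0, 0, 0, 800), body: 'new' }
const first = handleConditionalRequest({ headers: {} }, v1)

const byDate = { headers: { 'if-modified-since': first.headers['last-modified'] } }
const byEtag = { headers: { 'if-none-match': first.headers.etag } }
console.log(handleConditionalRequest(byDate, v2).status) // 304: the update is missed
console.log(handleConditionalRequest(byEtag, v2).status) // 200: the ETag catches it
```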

If you are interested in hashing, see a separate article on what a hash is.

To sum up in two pictures:

(Figure: the browser's first request)

(Figure: the browser's second request)
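From the browser's side, the first and second requests described in this section can be summarized in a short sketch. The cache-entry shape and the `planRequest` name are hypothetical. Freshness is judged by max-age alone to keep it short. When the strong cache does not apply, the stored validators are turned into a conditional request.

```javascript
// Hypothetical cache entry: { headers, responseTime, body }, header names lowercase.
// Returns either the cached body (strong cache hit) or the headers of the
// request to send (a conditional request when validators are available).
function planRequest (entry, now) {
  if (!entry) return { action: 'request', headers: {} } // first visit: nothing cached yet

  const cc = (entry.headers['cache-control'] || '').toLowerCase()
  const maxAge = cc.match(/max-age=(\d+)/)
  const fresh = !cc.includes('no-cache') && maxAge !== null &&
    now - entry.responseTime < Number(maxAge[1]) * 1000
  if (fresh) return { action: 'use-cache', body: entry.body } // strong cache hit: no request at all

  // Strong cache missing or expired: revalidate with the stored validators
  const headers = {}
  if (entry.headers.etag) headers['if-none-match'] = entry.headers.etag
  if (entry.headers['last-modified']) headers['if-modified-since'] = entry.headers['last-modified']
  return { action: 'request', headers }
}

const entry = {
  headers: { 'cache-control': 'max-age=60', etag: '"abc"', 'last-modified': 'Mon, 01 Jan 2024 00:00:00 GMT' },
  responseTime: 0,
  body: '<html>...</html>'
}
console.log(planRequest(entry, 10 * 1000).action) // use-cache
console.log(planRequest(entry, 120 * 1000).headers) // If-None-Match and If-Modified-Since headers
```

The server then answers the conditional request with 304 (keep the cached body) or 200 (replace it).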

Parsing HTML documents

After receiving the resource, the browser parses the HTML into a DOM tree and the CSS into a CSSOM tree. The two are combined into the render tree, which is then rendered.

When a browser parses HTML, it does two main things: lexical analysis and syntax analysis.

Lexical analysis:

Lexical analysis splits a long string into the smallest meaningful units according to the grammar rules, and creates a token object for each unit from its data.

The algorithm used at this stage is the tokenization algorithm: it reads the HTML characters from left to right, uses an internal state machine to track the current state, splits the input into HTML tokens according to the grammar rules, and passes each token on to the syntax analysis stage.

The smallest meaningful units in HTML are: start tags, end tags, comments, text, attributes, and CDATA nodes.

Token               Description
<xx                 Beginning of a start tag
> or />             End of a start tag
name='byeL'         Attribute
</xx>               End tag
I am a text         Text node
<!-- xx -->         Comment
<![CDATA[ xx ]]>    CDATA node
attr="xxxx"         Attribute

Before looking at lexical analysis in detail, we need to understand the data structure each token is stored in. In WebKit, a token (HTMLToken) has the following fields:

Type m_type;                        // DOCTYPE, StartTag, EndTag, Character, or Comment
Range m_range;                      // Starts at zero
int m_baseOffset;

// "name" for DOCTYPE, StartTag, and EndTag
// "characters" for Character
// "data" for Comment
DataVector m_data;                  // Token data

// For DOCTYPE
OwnPtr<DoctypeData> m_doctypeData;  // Document type data

// For StartTag and EndTag
bool m_selfClosing;                 // Whether the tag is self-closing
AttributeList m_attributes;         // Attribute list

// A pointer into m_attributes used during lexing.
Attribute* m_currentAttribute;      // The attribute currently being built

Take <a href="w3c.org">w3c</a> as an example:

  1. The initial state is DataState.
  2. Reading "<" switches to TagOpenState.
  3. Reading "a" switches to TagNameState and creates an HTMLToken of type StartTag.
  4. Reading a space switches to BeforeAttributeNameState, and the name collected in TagNameState is stored as the token's name.
  5. Reading "h" switches to AttributeNameState.
  6. "r", "e", "f" are read in the same state until "=" switches to BeforeAttributeValueState.
  7. Reading the opening double quote switches to AttributeValueDoubleQuotedState.
  8. "w", "3", "c", ".", "o", "r", "g" are read without changing state, building up the attribute value.
  9. Reading the closing double quote switches to AfterAttributeValueQuotedState.
  10. Reading ">" emits the StartTag token and returns to DataState.

The text and the end tag are handled the same way, so three HTML tokens are generated in total:

{ m_type: 'StartTag', m_data: 'a', m_attributes: [{ href: 'w3c.org' }], m_selfClosing: false }
{ m_type: 'Character', m_data: 'w3c', m_attributes: [], m_selfClosing: false }
{ m_type: 'EndTag', m_data: 'a', m_attributes: [], m_selfClosing: false }

For the detailed procedure, see a separate article on HTML lexical analysis in WebKit.
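Below is a stripped-down JavaScript version of the state machine from the walkthrough. It is a sketch, not the WHATWG tokenization algorithm. It handles only start tags with double-quoted attributes, end tags and text. Comments, DOCTYPE, CDATA, self-closing tags and character references are not supported. The state names follow the walkthrough, and the token objects mirror the m_type / m_data / m_attributes / m_selfClosing fields used above.

```javascript
// Minimal tokenizer sketch: start tags with double-quoted attributes, end tags, text.
function tokenize (html) {
  const tokens = []
  let state = 'DataState'
  let token = null // tag token currently being built
  let text = '' // pending character data
  let attrName = ''
  let attrValue = ''

  const emitText = () => {
    if (text) tokens.push({ m_type: 'Character', m_data: text, m_attributes: [], m_selfClosing: false })
    text = ''
  }

  for (const c of html) {
    switch (state) {
      case 'DataState':
        if (c === '<') { emitText(); state = 'TagOpenState' } else text += c
        break
      case 'TagOpenState':
        if (c === '/') {
          token = { m_type: 'EndTag', m_data: '', m_attributes: [], m_selfClosing: false }
          state = 'EndTagOpenState'
        } else {
          token = { m_type: 'StartTag', m_data: c, m_attributes: [], m_selfClosing: false }
          state = 'TagNameState'
        }
        break
      case 'EndTagOpenState': // first letter of the end tag's name
        token.m_data = c
        state = 'TagNameState'
        break
      case 'TagNameState':
        if (c === ' ') state = 'BeforeAttributeNameState'
        else if (c === '>') { tokens.push(token); state = 'DataState' }
        else token.m_data += c
        break
      case 'BeforeAttributeNameState':
        if (c === '>') { tokens.push(token); state = 'DataState' }
        else if (c !== ' ') { attrName = c; state = 'AttributeNameState' }
        break
      case 'AttributeNameState':
        if (c === '=') state = 'BeforeAttributeValueState'
        else attrName += c
        break
      case 'BeforeAttributeValueState':
        if (c === '"') { attrValue = ''; state = 'AttributeValueDoubleQuotedState' }
        break
      case 'AttributeValueDoubleQuotedState':
        if (c === '"') {
          token.m_attributes.push({ [attrName]: attrValue })
          state = 'AfterAttributeValueQuotedState'
        } else attrValue += c
        break
      case 'AfterAttributeValueQuotedState':
        if (c === '>') { tokens.push(token); state = 'DataState' }
        else if (c === ' ') state = 'BeforeAttributeNameState'
        break
    }
  }
  emitText()
  return tokens
}

console.log(tokenize('<a href="w3c.org">w3c</a>').map((t) => t.m_type))
// [ 'StartTag', 'Character', 'EndTag' ]
```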

Syntax analysis runs at the same time as lexical analysis: each token is handed to it as soon as it is produced.

Syntax analysis:

Syntax analysis converts the HTML tokens produced by lexical analysis into a tree structure called the DOM tree.

To build the DOM tree from these tokens, we need a data structure: the stack.

At the start, a root (document) node is pushed onto the stack. When parsing is complete, this root node holds the finished DOM tree.

Each time a token arrives, one of the following operations is performed:

  1. If it is a start tag, a new element is created, added to the child nodes of the element at the top of the stack, and pushed.

  2. If it is text and the top of the stack is already a text node, the new text is merged into that text node. Otherwise, a new text node is created, added to the child nodes of the element at the top of the stack, and then pushed.

  3. If it is a comment, it is added directly to the child nodes of the element at the top of the stack.

  4. If it is an attribute, it is added directly to the attributes of the element at the top of the stack.

  5. If it is an end tag, the stack is searched downward for the nearest matching start tag, and elements are popped until that start tag has been popped.

  6. If the token is not text and the top of the stack is a text node, the text node is popped first, before the current token is processed.

Here’s an example:

<div>
  <p>
  1234   45678   789
  </p>
</div>
After tokenization, this produces the tokens <div>, <p>, 1234, 45678, 789, </p>, </div>.
// After tokenization, each start tag becomes a token such as:
// { m_type: 'StartTag', m_attributes: [], m_data: 'div', m_selfClosing: false }
// Each token is pushed into the syntax analyzer as soon as it is produced.
// Minimal token classes (text tokens are plain strings):
class StartTagToken {
  constructor (name, attributes = []) {
    this.name = name
    this.attributes = attributes
  }
}
class EndTagToken {
  constructor (name) {
    this.name = name
  }
}
class HTMLDocument {
  constructor () {
    this.isDocument = true
    this.childNodes = []
  }
}
class Node {}
class Element extends Node {
  constructor (token) {
    super()
    for (const key in token) {
      this[key] = token[key] // copy name and attributes from the token
    }
    this.childNodes = []
  }
  get [Symbol.toStringTag] () {
    return `Element<${this.name}>`
  }
}
class Text extends Node {
  constructor (value) {
    super()
    this.value = value || ''
  }
}
function getTop (stack) {
  return stack[stack.length - 1]
}
function HTMLSyntacticalParser () {
  // The stack starts with the root node; when parsing is done, stack[0] is the complete DOM tree
  const stack = [new HTMLDocument()]
  this.receiveInput = function (token) {
    if (typeof token === 'string') { // text token
      if (getTop(stack) instanceof Text) {
        getTop(stack).value += token // merge with the previous text node
      } else {
        // otherwise, add a new text node to the children of the top element and push it
        const t = new Text(token)
        getTop(stack).childNodes.push(t)
        stack.push(t)
      }
      return
    }
    if (getTop(stack) instanceof Text) {
      // not a text token, but the top of the stack is a text node: pop it first
      stack.pop()
    }
    if (token instanceof StartTagToken) {
      // start tag: add it to the children of the top element and push it
      const e = new Element(token)
      getTop(stack).childNodes.push(e)
      stack.push(e)
    } else if (token instanceof EndTagToken) {
      // end tag: pop the top element, which must be the matching start tag
      // (assuming the document is well formed)
      stack.pop()
    }
  }
  this.getOutput = () => stack[0]
}

// Feeding the tokens of the example above:
const parser = new HTMLSyntacticalParser()
const tokens = [new StartTagToken('div'), new StartTagToken('p'), '1234', '45678', '789', new EndTagToken('p'), new EndTagToken('div')]
tokens.forEach((token) => parser.receiveInput(token))
console.log(parser.getOutput())

After the above steps, the HTML document will be converted into a DOM tree.

Note that:

If the page contains a JS script, the rendering thread is suspended while the script is being parsed and executed. See the article on browser threads and processes for details.

CSS parsing

CSS is parsed while the DOM tree is being built, and the two proceed in parallel. CSS styles (including, but not limited to, inline styles and external stylesheets) are tokenized and parsed according to the syntax specification. Parsing produces a stylesheet object containing the parsed CSS rules, each made up of a selector and declarations.

For example:

.btn-style {
  font-size: 12px;
  background-color: yellow;
}

After the above CSS is parsed, it produces:

Selector      Property           Value
.btn-style    font-size          12px
.btn-style    background-color   yellow
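A toy JavaScript version of this step turns simple rule blocks into (selector, property, value) rows like the table above. It assumes no comments, at-rules, nested braces, or semicolons inside values, so it is for illustration only. Real CSS parsers follow the tokenizer and parser defined in the CSS Syntax specification.

```javascript
// Toy CSS parser: "selector { prop: value; ... }" -> [{ selector, property, value }]
function parseCss (css) {
  const rows = []
  const rulePattern = /([^{}]+)\{([^{}]*)\}/g
  for (const [, selectorText, body] of css.matchAll(rulePattern)) {
    const selector = selectorText.trim()
    for (const declaration of body.split(';')) {
      const colon = declaration.indexOf(':')
      if (colon === -1) continue // e.g. the empty piece after the last ';'
      rows.push({
        selector,
        property: declaration.slice(0, colon).trim(),
        value: declaration.slice(colon + 1).trim()
      })
    }
  }
  return rows
}

const rows = parseCss('.btn-style {\n  font-size: 12px;\n  background-color: yellow;\n}')
console.log(rows) // two rows, matching the table above
```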

Render tree generation

Once the DOM tree and the CSS rule tree have been built, the render tree is generated from the two.

To generate it, the browser traverses the DOM tree and, based on each node and the CSS rules that apply to it, creates the corresponding render objects (zero, one, or several per DOM node).

In WebKit, every render object inherits from RenderObject, which declares virtual methods for layout (reflow) and painting (repaint), and holds a pointer to its DOM node, its computed style, and so on.

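// Simplified from WebKit; Node, RenderStyle, RenderLayer, PaintInfo and IntRect are WebKit types
class Node;
class RenderStyle;
class RenderLayer;
class IntRect;
struct PaintInfo;

class RenderObject {
public:
    virtual void layout();               // reflow: compute size and position
    virtual void paint(PaintInfo);       // repaint: draw this object
    virtual IntRect repaintRect();       // area that needs repainting
    Node* node;                          // the DOM node
    RenderStyle* style;                  // the computed style
    RenderLayer* containingLayer;        // the containing z-index layer
};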

If a render object's style specifies a fixed size, such as width: 12px;, that width is used directly during layout. If the width is not set, or is given as a percentage such as width: 50%;, the actual size must be computed at layout time.
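The width rule can be sketched as a small JavaScript function. The inputs are hypothetical: `width` is a CSS width string, and `containingBlockWidth` is the already-computed width of the parent's content area. `auto` (or no width) is treated as filling the containing block, which is how a block-level box in normal flow behaves when margins, padding and borders are zero.

```javascript
// Resolve a CSS width to pixels during layout (simplified: no margins/padding/borders).
function resolveWidth (width, containingBlockWidth) {
  if (width === undefined || width === 'auto') {
    return containingBlockWidth // block box fills its containing block
  }
  if (width.endsWith('px')) {
    return parseFloat(width) // fixed size: used directly
  }
  if (width.endsWith('%')) {
    return (parseFloat(width) / 100) * containingBlockWidth // depends on the parent
  }
  throw new Error('unsupported width: ' + width)
}

console.log(resolveWidth('12px', 800)) // 12
console.log(resolveWidth('50%', 800)) // 400
console.log(resolveWidth(undefined, 800)) // 800
```

Only the percentage and `auto` cases need the parent's width. This is why they can only be resolved during layout, once the containing block's size is known.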

Note that:

Nodes in the render tree do not correspond one-to-one to nodes in the DOM tree. Nodes with display: none are not added to the render tree at all, while some elements produce several render objects. A select element, for example, has three: one for the display area, one for the drop-down list box, and one for the button.
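Here is a simplified JavaScript sketch of render tree construction. It walks the DOM, looks up each node's computed style, and skips any subtree whose display is none. The node shapes and the `styleOf` lookup are hypothetical. For simplicity it creates exactly one render object per displayed node, so cases like the three renderers of a select element are not modeled.

```javascript
// DOM node shape (hypothetical): { tag, children: [] }
// styleOf(node) returns that node's computed style as a plain object.
function buildRenderTree (domNode, styleOf) {
  const style = styleOf(domNode)
  if (style.display === 'none') return null // neither it nor its descendants are rendered

  const renderObject = { node: domNode, style, children: [] }
  for (const child of domNode.children) {
    const childRender = buildRenderTree(child, styleOf)
    if (childRender) renderObject.children.push(childRender)
  }
  return renderObject
}

// Usage: the <span> with display: none does not appear in the render tree
const dom = { tag: 'div', children: [{ tag: 'p', children: [] }, { tag: 'span', children: [] }] }
const styles = { div: { display: 'block' }, p: { display: 'block' }, span: { display: 'none' } }
const renderTree = buildRenderTree(dom, (node) => styles[node.tag])
console.log(renderTree.children.map((r) => r.node.tag)) // [ 'p' ]
```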

Layout stage:

The browser traverses the render tree and determines the size and position of each element according to the type of its render node.
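As a rough JavaScript illustration of the layout pass, the sketch below handles only block boxes in normal flow. Each box takes its own fixed width or the width of its containing block, children stack vertically, and a box's height is either its fixed height or the sum of its children's heights. The render-node shape is hypothetical, and inline layout, margins, padding and floats are ignored.

```javascript
// Render node (hypothetical): { name, width?: number, height?: number, children: [] }
// Computes x, y, layoutWidth and layoutHeight for every node (block boxes only).
function layout (node, x, y, containingWidth) {
  node.x = x
  node.y = y
  node.layoutWidth = node.width ?? containingWidth // fixed width or fill the parent

  let cursorY = y
  for (const child of node.children) {
    layout(child, x, cursorY, node.layoutWidth)
    cursorY += child.layoutHeight // next sibling starts below this one
  }
  node.layoutHeight = node.height ?? cursorY - y // fixed height or sum of children
  return node
}

const page = {
  name: 'body',
  children: [
    { name: 'header', height: 60, children: [] },
    { name: 'main', width: 600, height: 400, children: [] }
  ]
}
layout(page, 0, 0, 800)
// header: y 0, width 800, height 60; main: y 60, width 600, height 400; body height 460
```

Layout is done top-down for widths (a child needs its parent's width) and bottom-up for heights (a parent needs its children's heights), which is why a single recursive traversal is enough here.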

Drawing stage

In the drawing stage, the browser traverses the render tree and calls each render object's paint method to display its content on the screen. Drawing is done through the user interface infrastructure component.

The CSS2 specification defines the painting order. Elements are painted in the order in which they are stacked in their stacking contexts. The stacks are painted from back to front, so this order affects the result. The stacking order of a block renderer is:

  1. Background color
  2. Background image
  3. Border
  4. Children
  5. Outline
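In JavaScript, the stacking order of a block renderer can be sketched as a recursive function that records paint operations in that order. The render-object fields are hypothetical, and the output is a list of strings rather than real drawing calls. z-index and separate stacking contexts are not modeled.

```javascript
// Record paint operations for a block renderer in stacking order:
// background color, background image, border, children, outline.
function paintBlock (box, ops = []) {
  if (box.backgroundColor) ops.push(`${box.name}: background-color ${box.backgroundColor}`)
  if (box.backgroundImage) ops.push(`${box.name}: background-image ${box.backgroundImage}`)
  if (box.border) ops.push(`${box.name}: border ${box.border}`)
  for (const child of box.children || []) paintBlock(child, ops) // descendants
  if (box.outline) ops.push(`${box.name}: outline ${box.outline}`)
  return ops
}

const ops = paintBlock({
  name: 'div',
  backgroundColor: 'yellow',
  border: '1px solid black',
  outline: '2px dashed red',
  children: [{ name: 'p', backgroundColor: 'white' }]
})
console.log(ops.join('\n'))
// div: background-color yellow
// div: border 1px solid black
// p: background-color white
// div: outline 2px dashed red
```

Since later operations are drawn on top of earlier ones, the child's background covers the parent's, and the parent's outline ends up above everything inside it.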

With that, the whole process is complete.