I. Process preview

  1. Enter the address

  2. The browser looks up the IP address of the domain name

  3. The browser sends an HTTP request to the Web server

  4. The server’s permanent redirection response

  5. The server processes the request

  6. The server returns an HTTP response

  7. The browser displays the HTML

  8. The browser sends requests for resources embedded in the HTML (such as images, audio, video, CSS, JS, and so on)

II. Detailed explanation of the process

(1) Enter the URL

When we start typing a URL into the browser, the browser already begins matching possible URLs. It looks at the history, bookmarks, and so on to find URLs that match the typed string, and then offers suggestions to complete the URL. For some pages, Google's Chrome browser can even display the page directly from its cache, so the page appears before you press Enter.

(2) The browser looks up the IP address of the domain name
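
As a rough illustration of this lookup, the sketch below asks the operating system's resolver for the addresses of a host name using Python's standard `socket.getaddrinfo`. The helper name `resolve` is hypothetical, and the sketch does not model the browser's own DNS cache; it only shows the final "name to IP address" step.

```python
import socket


def resolve(hostname, port=80):
    """Return the distinct IP addresses the OS resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # sockaddr is (ip, port) for IPv4, (ip, port, flow, scope) for IPv6
        if ip not in addresses:
            addresses.append(ip)
    return addresses


if __name__ == "__main__":
    # "localhost" is resolved locally, so this runs without network access.
    print(resolve("localhost"))
```

Calling `resolve` with a public domain name instead of `localhost` would go through the same resolver, but the addresses returned depend on the network it runs on.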

(3) The browser sends an HTTP request to the Web server

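Before the HTTP request is sent, a three-way handshake establishes the TCP connection: the client sends a SYN, the server replies with SYN+ACK, and the client answers with an ACK.

Why the three-way handshake is needed: it prevents a connection request segment that is no longer valid (for example, one that was delayed in the network) from suddenly reaching the server and making it open a connection by mistake.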
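
A minimal sketch of this step, assuming a throwaway server on the loopback interface: the operating system performs the three-way handshake inside `socket.create_connection`, and only then is the HTTP request written to the connection. The names `serve_once`, `fetch`, and `demo` are hypothetical helpers for this illustration.

```python
import socket
import threading


def serve_once(listener):
    """Accept one connection and answer it with a fixed HTTP response."""
    conn, _addr = listener.accept()  # returns once the kernel has finished the handshake
    with conn:
        conn.recv(4096)  # read the (small) request
        body = b"hello"
        conn.sendall(
            b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\nConnection: close\r\n\r\n" + body
        )


def fetch(host, port, path="/"):
    """Open a TCP connection, send a GET request, and return the raw response bytes."""
    # create_connection blocks until SYN, SYN+ACK, ACK have been exchanged.
    with socket.create_connection((host, port), timeout=5) as sock:
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty bytes: the server closed its side
                break
            chunks.append(data)
    return b"".join(chunks)


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    worker.start()
    raw = fetch("127.0.0.1", port)
    worker.join(timeout=5)
    listener.close()
    return raw
```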

Extension: closing the connection

The four-way wave

Why are four steps needed to close a connection?

When a connection is being established, a server in the LISTEN state that receives a SYN can send its ACK and its own SYN to the client in a single segment. When a connection is being closed, however, receiving the peer's FIN only means that the peer will send no more data; the peer can still receive data. The receiving side may not have sent all of its own data yet, so it can either close immediately or first send the remaining data and only then send its own FIN to agree to close the connection. For this reason, the ACK and the FIN are usually sent separately, which gives four steps in total.
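
The "no more sending, but still receiving" state can be observed with a half-close. In this sketch, which assumes a loopback connection and uses the hypothetical helpers `responder` and `half_close_demo`, the client calls `shutdown(SHUT_WR)` to send its FIN, and the server still delivers data before closing its own side.

```python
import socket
import threading


def responder(listener, log):
    conn, _addr = listener.accept()
    with conn:
        received = b""
        while True:
            data = conn.recv(1024)
            if not data:  # empty bytes: the peer's FIN has arrived
                break
            received += data
        log.append(received)
        # The peer has finished sending, but this side may still send data.
        conn.sendall(b"remaining data after your FIN")
    # Leaving the with-block closes the socket, which sends this side's FIN.


def half_close_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    log = []
    worker = threading.Thread(target=responder, args=(listener, log), daemon=True)
    worker.start()

    client = socket.create_connection(listener.getsockname(), timeout=5)
    client.sendall(b"last request")
    client.shutdown(socket.SHUT_WR)  # send FIN: "I will send no more data"
    reply = b""
    while True:  # the client can still receive after its own FIN
        data = client.recv(1024)
        if not data:
            break
        reply += data
    client.close()
    worker.join(timeout=5)
    listener.close()
    return log[0], reply
```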

(4) The server's permanent redirection response

The server sends the browser a 301 permanent redirect response, which means that the resource at the old address A has been permanently moved to a new address B and should no longer be requested at A. When search engines fetch the new content, they also replace the old URL with the redirected URL.

A 302 response indicates that the resource at the old address A is still there. The redirection is only a temporary jump from the old address A to address B. Search engines fetch the new content but keep the old URL.

Redirection reasons:

  • Web site adjustments (e.g. directory structure changes)
  • The page is moved to a new address
  • The page extension is changed
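
The following sketch shows what a 301 looks like on the wire, assuming a tiny local server built with Python's `http.server`. The paths `/old` and `/new` and the helper names are made up for the example. `http.client` does not follow redirects by itself, so the second request is issued explicitly, much as a browser would do after reading the Location header.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/old":
            # Permanent redirect: the new address goes in the Location header.
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"new page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def get(port, path):
    """Issue one GET without following redirects."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", path)
    resp = conn.getresponse()
    result = (resp.status, resp.getheader("Location"), resp.read())
    conn.close()
    return result


def redirect_demo():
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        first = get(port, "/old")       # 301 with a Location header
        second = get(port, first[1])    # follow the redirect manually
    finally:
        server.shutdown()
        server.server_close()
    return first, second
```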

(5) The server processes the request

After receiving a TCP packet on a fixed port, the back end handles the TCP connection, parses the HTTP protocol, and wraps the data in an HTTP Request object for the upper layers to use.
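
As a simplified sketch of that wrapping step, the code below parses raw request bytes into a small request object. The `HttpRequest` class and `parse_request` function are hypothetical names for this illustration; real servers also handle chunked bodies, repeated headers, and malformed input.

```python
from dataclasses import dataclass, field


@dataclass
class HttpRequest:
    method: str
    target: str
    version: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def parse_request(raw: bytes) -> HttpRequest:
    """Split raw bytes into request line, headers, and body."""
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: METHOD SP TARGET SP VERSION
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return HttpRequest(method, target, version, headers, body)
```

For example, `parse_request(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n")` yields an object whose `method` is `"GET"` and whose `headers["host"]` is `"example.com"`.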

Some larger sites send your request to a reverse proxy server first. When a site is visited very heavily, a single server becomes slower and slower and is no longer enough, so the same application is deployed on multiple servers and the requests from a large number of users are distributed across those machines.

In this case, the client does not access a web application server directly through HTTP. Instead, the client first sends the request to Nginx, Nginx then forwards it to an application server, and finally Nginx returns the result to the client. Here, Nginx acts as a reverse proxy server. As a bonus, if one of the servers goes down, users are not affected as long as the other servers are up and running.
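
The distribution idea can be sketched in a few lines. This is a toy model, not how Nginx is implemented: it assumes a simple round-robin policy, and `RoundRobinProxy`, `mark_down`, and the backend names are hypothetical.

```python
from itertools import cycle


class RoundRobinProxy:
    """Toy model of a reverse proxy choosing a backend for each request."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._order = cycle(self.backends)
        self.down = set()

    def mark_down(self, backend):
        self.down.add(backend)

    def pick(self):
        # Try each backend at most once per call, skipping those marked down.
        for _ in range(len(self.backends)):
            backend = next(self._order)
            if backend not in self.down:
                return backend
        raise RuntimeError("no healthy backend available")


if __name__ == "__main__":
    proxy = RoundRobinProxy(["app1", "app2", "app3"])
    print([proxy.pick() for _ in range(4)])
    proxy.mark_down("app2")  # simulate one server going down
    print([proxy.pick() for _ in range(3)])
```

Requests keep being served after `app2` is marked down, which mirrors the point that users are not affected as long as other servers are running.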

Behind Nginx's reverse proxy, the request reaches the web server, where a server-side script processes it, accesses the database, fetches the required data, and so on. Of course, this involves many complex operations in the back-end scripts.

(6) The server returns an HTTP response

1. Composition: the HTTP response is similar to the HTTP request. It also consists of three parts: the status line, the response headers, and the response body.

(1) Status line: it consists of the protocol version, a numeric status code, and the corresponding status description, with a space between each element.

1) Protocol version: HTTP/1.0 or another version

2) Status description: a short text description of the status code. For example, status code 200 is described as OK

3) Status code: the status code consists of three digits. The first digit defines the category of the response and has five possible values, as follows:

1xx: informational status code, indicating that the server has received the client request and the client can continue sending the request.

  • 100 Continue
  • 101 Switching Protocols

2xx: success status code, which indicates that the server has successfully received and processed the request.

  • 200 OK
  • 204 No Content: the request succeeded, but the response carries no body
  • 206 Partial Content: a range (partial content) request was fulfilled successfully

3xx: redirection status code, indicating that the server requires client redirection.

  • 301 Moved Permanently: permanent redirection. The Location header of the response should contain the new URL of the resource
  • 302 Found: temporary redirection. The URL in the Location header of the response is used to locate the resource temporarily
  • 303 See Other: the requested resource is available at another URI. The client should use the GET method to obtain it
  • 304 Not Modified: the content on the server has not changed, so it can be read directly from the browser cache
  • 307 Temporary Redirect: temporary redirection, with the same meaning as 302 Found. 302 was originally not meant to let a POST be changed to a GET, but many browsers do this in practice. 307 requires the client to repeat the request with the same method

4xx: client error status code, indicating that the client request contains invalid content.

  • 400 Bad Request: the client request has a syntax error and cannot be understood by the server
  • 401 Unauthorized: the request is not authorized. This status code must be used together with the WWW-Authenticate header field
  • 403 Forbidden: the server received the request but refuses to serve it. The reason is usually given in the response body
  • 404 Not Found: the requested resource does not exist, for example because an incorrect URL was entered

5xx: server error status code, indicating that the server failed to process a request from the client properly.

  • 500 Internal Server Error: an unexpected error occurred on the server, so the client request cannot be completed
  • 503 Service Unavailable: the server cannot process client requests at the moment. It may recover after a period of time
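
The structure of the status line and the role of the first digit can be sketched as follows. The helper names `parse_status_line` and `category` are hypothetical; `http.HTTPStatus` is part of Python's standard library and supplies the standard descriptions.

```python
from http import HTTPStatus

# The first digit of the status code selects one of five categories.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split 'HTTP/1.1 404 Not Found' into (version, code, description)."""
    version, code, description = line.split(" ", 2)
    return version, int(code), description


def category(code):
    return CATEGORIES[code // 100]


if __name__ == "__main__":
    version, code, description = parse_status_line("HTTP/1.1 404 Not Found")
    print(version, code, description, "->", category(code))
    print(HTTPStatus(301).phrase)  # standard description for 301
```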

(2) Response headers

(3) Response body

Contains the specific information we need, such as HTML, images, and the data returned from the back end. (Cookies are delivered in the Set-Cookie response header rather than in the body.)

(7) The browser displays HTML

1. Main functions of the browser

Request the resource you selected from the server and display it in the browser window.

2. Main components of the browser:

(1) User interface: includes the address bar, forward/back buttons, bookmark menu, and so on. Every part of the display belongs to the user interface, except for the main window in which the requested page is shown.

(2) Browser engine: passes instructions between the user interface and the rendering engine.

(3) Rendering engine: responsible for displaying the requested content. If the requested content is HTML, it parses the HTML and CSS and displays the parsed content on the screen.

(4) Networking: used for network calls, such as HTTP requests. Its interface is platform-independent, with an underlying implementation for each platform.

(5) User interface back end: used to draw basic widgets, such as combo boxes and windows. It exposes a common, platform-independent interface, while underneath it uses the operating system's user interface methods.

(6) JavaScript interpreter: used to parse and execute JavaScript code.

(7) Data storage: this is the persistence layer. Browsers need to keep all kinds of data, such as cookies, on disk. The new HTML specification (HTML5) defines a "web database", which is a complete (but lightweight) in-browser database.

3. Basic flow of the rendering engine

The rendering engine first obtains the content of the requested document from the network layer, usually in chunks of 8 KB. It then proceeds through the following basic flow: parse the HTML to build the DOM tree -> build the render tree -> lay out the render tree -> paint the render tree

Mainstream examples:

WebKit main flow:

The main flow of Mozilla’s Gecko rendering engine:

Detailed process overview:

1. Parsing and DOM tree building

(1) Parsing: overview

Parsing is divided into lexical analysis and syntax analysis.

Lexical analysis is the process of breaking the input into tokens. Tokens are the vocabulary of the language, its units of content; in a human language they correspond to the words found in a dictionary.

Syntax analysis is the process of applying the syntax rules of the language.

Parsers typically divide the work between two components: a lexical analyzer (lexer) and a parser.

Parsing process: Document -> Lexical analysis (using lexical analyzer) -> Syntax analysis (parser) -> Parse tree
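
The two stages can be sketched for a tiny made-up language of integers joined by `+` and `-`. This grammar is an assumption chosen for brevity and is far simpler than HTML or CSS; `tokenize` and `parse` are hypothetical names.

```python
import re

# One token is either an integer or a +/- operator, with optional leading spaces.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([+-]))")


def tokenize(text):
    """Lexical analysis: turn the input string into a list of tokens."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        if match.group(1) is not None:
            tokens.append(("INT", int(match.group(1))))
        else:
            tokens.append(("OP", match.group(2)))
        pos = match.end()
    return tokens


def parse(tokens):
    """Syntax analysis for the rule: expression := INT (OP INT)*"""
    if not tokens or tokens[0][0] != "INT":
        raise SyntaxError("expression must start with an integer")
    tree = tokens[0][1]
    i = 1
    while i < len(tokens):
        if tokens[i][0] != "OP" or i + 1 >= len(tokens) or tokens[i + 1][0] != "INT":
            raise SyntaxError("expected an operator followed by an integer")
        # Build a left-leaning parse tree: (operator, left, right)
        tree = (tokens[i][1], tree, tokens[i + 1][1])
        i += 2
    return tree
```

For the input `"2 + 3 - 1"`, `tokenize` produces five tokens and `parse` returns the nested tuple `("-", ("+", 2, 3), 1)`.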

Translation

Parsing is usually used in translation, which is to convert the input document into another format.

Source Code -> Parsing -> Parse Tree -> Translation -> Machine Code

(2) HTML parser

The HTML parser's job is to parse the HTML markup into a parse tree.

The parse tree that the parser outputs is a tree of DOM element and attribute nodes. DOM stands for Document Object Model. It is the object representation of the HTML document and the interface between HTML elements and the outside world (such as JavaScript).

The root node of the parse tree is the "Document" object.
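
A much simplified sketch of tree building, using Python's standard `html.parser` as the tokenizer. The `Node` and `TreeBuilder` classes are hypothetical, and unlike a real browser parser this sketch performs almost no error recovery; it only shows how start tags, end tags, and text become a tree rooted at a document node.

```python
from html.parser import HTMLParser

# Elements that never have children or an end tag (a partial list for the sketch).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag, attrs=None, parent=None, text=""):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")  # root of the tree, like the Document object
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend: following content belongs to this element

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent  # skip over elements that were left unclosed
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(Node("#text", parent=self.current, text=data.strip()))


def build_dom(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root
```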

(3) CSS parsing

Unlike HTML, CSS has a context-free grammar, so it can be parsed with the kinds of parsers described in the overview above.

(4) The order in which scripts and stylesheets are processed

Scripts: the model of the web is synchronous. Web page authors expect a script to be parsed and executed as soon as the parser encounters a <script> tag, so parsing of the document pauses until the script has run.

Speculative parsing: WebKit and Firefox both perform this optimization. While a script executes, another thread parses the rest of the document to find the other resources that must be loaded over the network and starts loading them. In this way, resources can be loaded over parallel connections, which improves overall speed. Note that the speculative parser does not modify the DOM tree; it leaves that job to the main parser. It only resolves references to external resources, such as external scripts, stylesheets, and images.

Stylesheets: stylesheets follow a different model. In theory, applying a stylesheet does not change the DOM tree, so there seems to be no need to wait for the stylesheet and stop parsing the document. The problem is that a script may request style information during the document parsing phase. If the style has not been loaded and parsed at that point, the script gets a wrong answer, which can obviously cause many problems. This may look like an edge case, but it is actually quite common. Firefox blocks all scripts while a stylesheet is still being loaded and parsed. WebKit, on the other hand, blocks a script only when it tries to access a style property that may be affected by a stylesheet that has not loaded yet.

2. Rendering tree construction

(1) The relationship between rendering tree and DOM tree

While the DOM tree is being built, the browser builds another tree structure: the rendering tree. This is a tree of visual elements in their display order and a visual representation of the document. It lets you draw the content in the correct order.

Render objects correspond to DOM elements, but the relationship is not one-to-one. For example, non-visual elements such as 'head' are not inserted into the rendering tree, and neither are elements whose display property is 'none'.
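
This filtering can be sketched with plain dictionaries standing in for DOM nodes. The node layout (`tag`, `style`, `children`), the `NON_VISUAL_TAGS` set, and the function names are assumptions made for the example and do not reflect any browser's internal data structures.

```python
# Tags that never produce a visual box (a partial list for the sketch).
NON_VISUAL_TAGS = {"head", "title", "meta", "link", "script", "style"}

SAMPLE_DOM = {
    "tag": "html",
    "children": [
        {"tag": "head", "children": [{"tag": "title"}]},
        {
            "tag": "body",
            "children": [
                {"tag": "p"},
                {"tag": "div", "style": {"display": "none"}, "children": [{"tag": "span"}]},
            ],
        },
    ],
}


def build_render_tree(dom_node):
    """Return a render node for dom_node, or None if it is not rendered."""
    style = dom_node.get("style", {})
    if dom_node["tag"] in NON_VISUAL_TAGS or style.get("display") == "none":
        return None  # the whole subtree is left out of the rendering tree
    children = []
    for child in dom_node.get("children", []):
        rendered = build_render_tree(child)
        if rendered is not None:
            children.append(rendered)
    return {"tag": dom_node["tag"], "children": children}


def tags(render_node):
    """List the tags of a render tree in document order."""
    result = [render_node["tag"]]
    for child in render_node["children"]:
        result.extend(tags(child))
    return result
```

For `SAMPLE_DOM`, only `html`, `body`, and `p` survive: the `head` subtree and the hidden `div` with its `span` are dropped.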

(2) The process of constructing the rendering tree

(3) Style calculation

When building a rendering tree, you need to calculate the visual properties of each rendering object. This is done by evaluating the style attributes of each element.

Style information comes from stylesheets of various origins, inline style attributes, and visual attributes in the HTML (such as the "bgcolor" attribute). The latter are translated into the matching CSS style properties.

(4) Progressive processing

3. Layout

(1) Dirty bit system

(2) Global and incremental layout

(3) Asynchronous and synchronous layouts

(4) Optimization

(5) Layout processing

(6) Width calculation

(7) Line break

4. Drawing

(1) Global rendering and incremental rendering

(2) Drawing sequence

(3) Firefox display list

(4) WebKit rectangular storage

(8) The browser sends requests to obtain resources embedded in HTML (such as images, audio, video, CSS, JS, etc.)

In fact, this step runs in parallel with step 7: while the browser is displaying the HTML, it notices tags that reference content at other addresses. The browser then sends requests to fetch those files.
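
A rough sketch of how such references can be discovered, using Python's standard `html.parser` and `urllib.parse.urljoin`. The `ResourceCollector` class, the set of tags it inspects, and the example URLs are assumptions for illustration; a real browser looks at many more places (CSS `url()` values, `srcset`, and so on).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags whose "src" attribute points at an embedded resource (partial list).
SRC_TAGS = {"img", "script", "audio", "video", "source"}


class ResourceCollector(HTMLParser):
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in SRC_TAGS and attrs.get("src"):
            self.urls.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.urls.append(urljoin(self.base_url, attrs["href"]))


def embedded_resources(html, base_url):
    """Return the absolute URLs of resources referenced by the page, in document order."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.urls
```

Each URL returned by `embedded_resources` would then be requested in the same way as the original page, going through the lookup, connection, and response steps again.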

Reference article: zhuanlan.zhihu.com/p/133906695