From entering a URL to page rendering: a full-path analysis

I. Page loading overview

When you enter the URL and load the page, what does the browser do?

First, type in a URL. A loading icon appears in the browser tab; it rotates counterclockwise at first, then clockwise. The current page disappears and a blank page is shown, and then the new page we requested appears. If the network is slow at this point, you might briefly see a bare DOM page without styles before the normally rendered page. That is only the surface of the loading process; the browser actually does much more.

Why does the browser go to so much trouble? Wouldn't displaying the page directly be better? Of course not. It is like making porridge: you cannot just eat the dry rice and then drink the water.

When the user enters the URL to display the page, the browser first obtains front-end resources from the server, and then converts the bytes returned by the server into the corresponding page. Each stage requires the corresponding capabilities of the browser to process.

As a front-end developer, it is very important to understand the whole process. Only by knowing how the browser loads a page can you avoid pitfalls during development, locate problems quickly once they appear, and propose more options for performance optimization.

For example, after reading this article you should be able to answer questions such as:

Why is there a blank page? (Major companies are implementing various schemes to solve this problem.)
Why is JS included after the DOM?
Why does a crash in one page cause several pages to crash?
What is the browser's caching strategy?
Why do we often edit the hosts file to access a given domain name?
And so on.

II. The browser's multi-process architecture

In order to understand the following, it is necessary to understand the multi-process architecture of modern browsers and the relationship between processes and threads.

Browsers have also iterated from a single-process architecture to a modern multi-process architecture.

As shown above, we can see the relationship between threads and processes:

Usually a running program instance is a process, and the operating system allocates memory for it.
Processes are isolated from one another and exchange data through inter-process communication (IPC). Single-threading means that a process contains one thread, and that one thread handles all tasks.
Multithreading means that a process contains multiple threads that can perform tasks at the same time, sharing the process's data and memory.
A thread cannot exist alone; it must belong to a process. If a thread fails, the whole process fails; the process is then destroyed and its memory is reclaimed.

1. Processes

The following figure shows what happens when we open the Juejin home page and then open the browser's task manager. The browser is running many processes at this point, including the browser main process, a GPU process, a network process, a storage process, an audio process, rendering processes, and multiple plug-in processes.

These processes are responsible for the following functions:

Process	Responsibilities
Browser process	Coordinates communication among the browser's child processes and handles the browser UI, including the address bar
Rendering process	The tab process seen in the figure, often referred to as the browser kernel; V8 runs in this process. Mainly responsible for parsing HTML, JS, and CSS and rendering the page
Network process	Initiates network requests and parses response headers
GPU process	Converts the tiles generated by the rendering process into bitmaps

The rendering process runs in a sandbox. It can execute JS but cannot obtain system permissions, and it communicates with the browser process through IPC. This protects the browser process: even malicious code, locked inside the sandbox, cannot break out to read or write system information.

2. Threads

From the figure above, we already know that threads cannot exist independently and must be attached to processes. We’ll focus on the rendering process here, because its main job is to complete the rendering and presentation of the page.

As you can see, the rendering process contains many threads. In addition to the main rendering thread, there are the compositor thread and others, each with its own function. We will discuss them in a later article on browser rendering principles.

III. The browser request process in detail

The process from the user entering the domain name to the browser rendering page can be divided into the following parts:

1. Input information processing
2. Network request
3. The server returns the requested resource
4. Browser rendering

Each of these steps involves a lot of knowledge, and caching plays a large role throughout. The first three stages are described below:

1. Address bar input information processing

When text is entered in the address bar, the browser determines whether it is a search query or a URL.

If it is a search query, the browser builds a request URL for it using the default search engine.
If it is a well-formed URL, the browser main process sends the URL to the network process through IPC.
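
The decision described above can be sketched roughly as follows. This is only an illustration: the function name classifyInput, the matching rules, the choice of https for bare host names, and the search.example address are all assumptions, and real browsers use far more elaborate heuristics.

```javascript
// Hypothetical helper: decide whether address-bar input is a URL or a search query.
function classifyInput(input, searchBase = "https://search.example/?q=") {
  const text = input.trim();

  // Input that already carries an http(s) scheme is parsed as a URL.
  if (/^https?:\/\//i.test(text)) {
    try {
      return { kind: "url", url: new URL(text).href };
    } catch {
      // Not parseable as a URL: fall through to the other branches.
    }
  }

  // A bare host such as "www.baidu.com/path": no spaces and at least one dot.
  // Assumption: the scheme defaults to https.
  if (/^[a-z0-9-]+(\.[a-z0-9-]+)+(:\d+)?(\/\S*)?$/i.test(text)) {
    return { kind: "url", url: new URL("https://" + text).href };
  }

  // Anything else is treated as a query for the default search engine.
  return { kind: "search", url: searchBase + encodeURIComponent(text) };
}
```

In this sketch a single word without a dot (for example localhost) is treated as a search query, which is a simplification.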

2. The network process initiates a network request

1) The network process first searches the browser cache to determine whether a cached copy exists and whether it has expired

If it exists and has not expired, the cached data is returned directly. See the browser cache strategy below for details

2) If there is no cached copy or it has expired, DNS resolution starts

DNS resolution is also a complicated process. Its ultimate purpose is to obtain the IP address of the target host. For details, see the domain name resolution section below

3) Establish an HTTP or HTTPS connection

HTTP runs over a TCP connection, which is established through a three-way handshake
HTTPS additionally requires a TLS connection, and the browser verifies that the website's digital certificate is valid, unexpired, and trusted. HTTPS has its own authentication logic, which we will not focus on here.
Two questions are often asked here; if you are unsure of the answers, they are worth studying:
- What's the difference between HTTP and HTTPS?
- Can you briefly describe the three-way handshake and the four-way wave?
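
As a starting point for the second question, here is a minimal sketch of the three segments exchanged in the TCP three-way handshake. It only models the sequence and acknowledgement numbers; the function name and the object shape are assumptions made for illustration, not a real network API.

```javascript
// Sequence numbers are 32-bit, so "+ 1" wraps around at 2^32.
const next = (n) => (n + 1) >>> 0;

// clientIsn / serverIsn: the initial sequence numbers each side picks.
function threeWayHandshake(clientIsn, serverIsn) {
  // 1. Client -> server: SYN carrying the client's initial sequence number.
  const syn = { from: "client", flags: ["SYN"], seq: clientIsn };
  // 2. Server -> client: SYN+ACK, acknowledging the client's SYN (seq + 1).
  const synAck = { from: "server", flags: ["SYN", "ACK"], seq: serverIsn, ack: next(syn.seq) };
  // 3. Client -> server: ACK, acknowledging the server's SYN (seq + 1).
  const ack = { from: "client", flags: ["ACK"], seq: next(syn.seq), ack: next(synAck.seq) };
  return [syn, synAck, ack];
}
```

The third segment is what tells the server that the client actually received its sequence number, so both directions are confirmed before data flows.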

4) Send the request

The network process builds the HTTP request header and sends the actual request to the server

5) Network transmission, server processing, and return of the corresponding resources

The request is sent from the application layer down through the transport layer, network layer, data link layer, and physical layer to reach the server. The server receives the request and returns the corresponding resources
The server may be a proxy server or a CDN node. It checks whether the requested data is cached; if it is cached and still valid, it is returned directly (depending on the configured cache policy).
Network transmission is itself a very complex process; see the brief introduction to the computer network model below

3. The server returns the corresponding resource

1) Process the returned information

The browser receives the response from the server, and the network process first parses the response header to see whether a Location field is present. If so, it issues a new request to that address. The most common case is an HTTP site redirecting to HTTPS.
As shown below, we enter http://www.taobao.com/, the response is a 307 internal redirect, and the browser then requests https://www.taobao.com/
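
The Location check can be sketched as below. The response shape (a status number plus a headers object with lower-case names) and the function name redirectTarget are assumptions for illustration; the real network process works on raw response headers.

```javascript
// Hypothetical response shape: { status, headers } with lower-case header names.
function redirectTarget(requestUrl, response) {
  const location = response.headers["location"];
  const isRedirect = response.status >= 300 && response.status < 400;
  if (!isRedirect || !location) return null; // nothing to follow

  // Location may be relative, so resolve it against the URL that was requested.
  return new URL(location, requestUrl).href;
}
```

A null result means the navigation continues with the current response; any other result is the URL the next request should go to.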

The Content-Type field of the response header determines the file type. If it is HTML, processing continues; any other type is handed to a different process.
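
A rough sketch of this branching is shown below. The function name and the two return labels are made up for illustration; the only behaviour taken from the text is that HTML continues and everything else is handed off.

```javascript
// Hypothetical helper: decide what happens to a response based on Content-Type.
function routeByContentType(contentType = "") {
  // Drop parameters such as "; charset=utf-8" and normalise the case.
  const mediaType = contentType.split(";")[0].trim().toLowerCase();
  return mediaType === "text/html" ? "continue-navigation" : "hand-off";
}
```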

When the network process receives the response, it sends a "commit navigation" message to the browser main process, which sends a "commit document" message telling the rendering process to prepare to receive the response data.
The rendering process and the network process set up a data channel. Once the rendering process has received the data, it replies to the browser main process to confirm that it is ready, and the browser main process starts updating the page, the URL, the security state, and so on.
The page fires a beforeunload event, which lets the user choose to cancel before leaving the page (often used on form pages). If nothing listens for this event, the browser simply replaces the current page.
The page then stays blank until the rendering thread paints it. Minimizing this blank period is something many technical teams are working on. Its length depends on how long the page takes to render, so we still need to understand how the browser renders the resources returned by the server.
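
For the beforeunload event mentioned above, a listener on a form page might look like the following sketch. The state object and its hasUnsavedChanges flag are hypothetical; a real page would set the flag when the user edits the form.

```javascript
// Hypothetical page state; a real form would flip this flag on input events.
const state = { hasUnsavedChanges: false };

function onBeforeUnload(event) {
  // Nothing to protect: let the browser replace the page as usual.
  if (!state.hasUnsavedChanges) return;
  // Asking the browser to show its "leave this page?" confirmation.
  event.preventDefault();
  // Older browsers look at returnValue instead of preventDefault().
  event.returnValue = "";
}

// Only register in a browser; the guard lets the sketch load elsewhere too.
if (typeof window !== "undefined") {
  window.addEventListener("beforeunload", onBeforeUnload);
}
```

The browser shows its own generic confirmation text; the page cannot customize the message.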

2) Four-way wave

Once the resource transfer is complete, the connection is closed

IV. Multiple tabs sharing a rendering process

We open the task manager and see something like the figure below

We find that most tabs have their own process, but some of them share the same rendering process. Why is this?

The browser applies an optimization to new tabs opened from the current site: if they are same-origin and the execution environment is the same, the new tab directly reuses the current site's rendering process.

This improves rendering performance and lets parent and child windows stay associated, but there are pitfalls:

If a thread in the shared process has a problem, the process crashes, and every page sharing that process crashes with it.
If the newly opened page contains a malicious script, that script can use window.opener to gain control over the parent page.

When the pages are not associated:

When the pages are associated:

This is what happens in practice when one page crashes and all the other same-origin pages go down with it.

So how does a rendering process come to be shared? In everyday coding we often use these three methods:

1. The a tag

We normally use the a tag for navigation within a project. If the target is a same-origin page, the rendering process can end up shared

<a href="http://www.baidu.com"></a>

In a test with the latest version of Google Chrome, a same-origin site opened this way used a separate process, which was not what I expected; I will test this again later.

In general, though, we can give the a tag the attribute rel="noopener noreferrer" to ensure that different pages use different processes.

2. window.open

window.open("http://www.baidu.com")

If you open a same-site page with window.open, it will almost certainly use the same rendering process. To avoid this, add the following code to remove the association between the two:

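const newWin = window.open("http://my.dome.com")
// window.open returns null if the popup was blocked
if (newWin) newWin.opener = null // drop the new page's reference to this page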
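
Another option, sketched here, is to pass the noopener window feature to window.open, so the new page never receives a reference to its opener in the first place. The helper name openDetached and its win parameter are hypothetical; win exists only so the sketch can be exercised outside a browser.

```javascript
// Hypothetical helper: `win` is the window object, passed in so the sketch is testable.
function openDetached(url, win = globalThis.window) {
  // "noopener" keeps window.opener null in the new page;
  // in that case window.open returns null rather than a window reference.
  return win.open(url, "_blank", "noopener");
}
```

Because no window reference comes back, this form suits cases where the opening page does not need to talk to the new one afterwards.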

3. iframe

If a page uses an iframe to embed another page, a cross-origin iframe becomes an independent subframe with its own rendering process. A same-origin iframe uses the same rendering process as the parent page.

V. Network requests

1. Browser cache strategy

Storage policy
Strong cache
Negotiated cache

The browser cache strategy speeds up page loading and reduces the load on the server.

The specific caching process is shown in the figure below:

Details:

1. When we enter the URL, the browser checks whether it has a cached copy. If not, it requests the resource directly from the server and stores a copy in the browser cache.

ETag is a hash value calculated from the file; if the file has not changed, the value does not change. Last-Modified is the time when the file was last modified; if the file is updated or overwritten, it shows the latest time.

2. If the browser has a cached copy, it checks the Cache-Control and Expires fields of the cached HTTP response header to see whether the expiration date has passed. If not, status code 200 and the cached data are returned.

This is a strong cache

Expires is defined in HTTP/1.0 and holds an absolute expiration time in GMT. The cache can therefore expire at the wrong moment if the server time and browser time are inconsistent.
Cache-Control is defined in HTTP/1.1 and can take the following values
- max-age=600 means the maximum validity period is 600 seconds
- no-cache means the cached copy is not used directly; the browser revalidates it with the server every time (negotiated cache)
- no-store means nothing is cached and the latest resource is requested every time
- private means the resource is cached only on the user's device, not on a CDN or proxy server
- public means the resource can be cached on all nodes
If both are present, Cache-Control takes priority
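
The strong-cache check can be sketched as follows. The entry shape (storedAt in milliseconds plus lower-case response headers) and the function names are assumptions for illustration, and heuristic freshness is left out for brevity.

```javascript
// Parse "max-age=600, private" into a Map of directive -> value.
function parseCacheControl(value = "") {
  const directives = new Map();
  for (const part of value.split(",")) {
    const [name, arg] = part.trim().split("=");
    if (name) directives.set(name.toLowerCase(), arg);
  }
  return directives;
}

// entry: { storedAt: <ms timestamp>, headers: { "cache-control": ..., "expires": ... } }
function isFresh(entry, now = Date.now()) {
  const cc = parseCacheControl(entry.headers["cache-control"]);
  // no-store and no-cache never allow use without going back to the server.
  if (cc.has("no-store") || cc.has("no-cache")) return false;
  // Cache-Control max-age wins over Expires when both are present.
  if (cc.has("max-age")) {
    return now - entry.storedAt < Number(cc.get("max-age")) * 1000;
  }
  if (entry.headers["expires"]) {
    const expiresAt = Date.parse(entry.headers["expires"]);
    return !Number.isNaN(expiresAt) && now < expiresAt;
  }
  return false; // no explicit lifetime: treat as stale in this sketch
}
```

When isFresh returns false, the browser has to fall back to revalidating the copy with the server or fetching it again.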

3. If the browser cache has expired, the browser sends the request to the server with If-None-Match and If-Modified-Since headers. If the server finds that these two fields are consistent with the current resource, it returns status code 304 and the browser uses its cached copy. The server usually validates If-None-Match against the ETag first, and then If-Modified-Since against Last-Modified.

If-Modified-Since carries the Last-Modified value of the cached response, and If-None-Match carries its ETag value. This is the negotiated cache.
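
On the server side, the negotiation might be sketched like this. The resource shape (etag string, lastModified in milliseconds) and the function names are assumptions. The sketch follows the HTTP specification's rule that If-None-Match takes precedence when both headers are present, and it splits the ETag list on commas, which is a simplification.

```javascript
// Weak comparison: ignore a leading W/ when comparing entity tags.
function matchesEtag(ifNoneMatch, etag) {
  if (ifNoneMatch.trim() === "*") return true;
  const strip = (tag) => tag.trim().replace(/^W\//, "");
  return ifNoneMatch.split(",").some((tag) => strip(tag) === strip(etag));
}

// Returns the status code the server would answer with: 304 or 200.
function revalidate(requestHeaders, resource) {
  const inm = requestHeaders["if-none-match"];
  const ims = requestHeaders["if-modified-since"];
  if (inm !== undefined) {
    // If-None-Match decides on its own; If-Modified-Since is ignored.
    return matchesEtag(inm, resource.etag) ? 304 : 200;
  }
  if (ims !== undefined) {
    const since = Date.parse(ims);
    // HTTP dates have one-second resolution, so round the file time down.
    const modified = Math.floor(resource.lastModified / 1000) * 1000;
    if (!Number.isNaN(since) && modified <= since) return 304;
  }
  return 200; // send the full resource
}
```

A 304 response carries no body, which is where the bandwidth saving of the negotiated cache comes from.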

On the first request nothing is cached yet, so the browser stores the response. What the browser caches is mostly derived files that do not change often, such as CSS, JS, and images. The memory cache is relatively small, so it typically holds files such as JS, and it is cleared when the page is closed; the disk cache lasts longer.

2. Domain name resolution

DNS is the Domain Name System, which resolves domain names into their corresponding IP addresses. The resolution process is as follows:

1. Enter a URL. The browser first parses the URL and extracts the host name
2. It then searches the browser's DNS cache and returns the corresponding IP address if found
3. If not found, it checks whether the hosts file has a corresponding IP address (the hosts file maps domain names to IP addresses)
4. If still not found, the local DNS server sends queries to DNS servers at different levels to obtain the corresponding IP address

During each lookup, the browser, applications, and DNS servers cache the domain name. If a match is found in a cache, the corresponding IP address is returned; otherwise the lookup continues with the next DNS server until the IP address is located

[Diagram: client --> client cache --> local hosts file --> local DNS server, which queries the root, top-level, managing, and other DNS servers]
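
The lookup order above can be sketched as a small resolver. Everything here is illustrative: createResolver, the hostsFile map, and the queryDns callback are hypothetical stand-ins, and the sketch is synchronous, whereas real lookups are asynchronous and go through the operating system.

```javascript
// hostsFile: Map of host -> IP; queryDns: function(host) -> IP or undefined.
function createResolver(hostsFile, queryDns) {
  const browserCache = new Map(); // filled as lookups succeed

  return function resolve(host) {
    // 1. Browser DNS cache.
    if (browserCache.has(host)) {
      return { ip: browserCache.get(host), source: "browser cache" };
    }
    // 2. Local hosts file.
    if (hostsFile.has(host)) {
      return { ip: hostsFile.get(host), source: "hosts file" };
    }
    // 3. Ask the DNS server and remember the answer for next time.
    const ip = queryDns(host);
    if (ip === undefined) return { ip: undefined, source: "not found" };
    browserCache.set(host, ip);
    return { ip, source: "dns server" };
  };
}
```

Because the hosts file is consulted before any DNS server, adding an entry there is enough to point a domain name at a different machine, which is why editing it is a common development trick.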

As this shows, there is limited room for optimization at this stage. The common methods are as follows:

1. Add DNS prefetch tags to HTML files
2. Perform DNS load balancing by resolving a domain name to multiple IP addresses

<link rel="dns-prefetch" href="//g.alicdn.com" />

The code above resolves g.alicdn.com in advance

<meta http-equiv="x-dns-prefetch-control" content="on">

The code above turns on automatic DNS prefetching

3. HTTP request process

For network requests, we assume HTTP requests by default.

1. If it is the first request, the domain name is resolved by DNS to obtain the mapped IP address
2. The client sends HTTP packets to the corresponding server. In this process, the application layer, transport layer, network layer, data link layer, and physical layer each process the data
3. Before any data is exchanged, the client and the server (or proxy server) establish a TCP connection through a three-way handshake
4. The server then returns the corresponding resource to the client
5. For the second and later requests, the browser or server uses the HTTP header fields to determine whether the resource has expired. If it has not, the cached copy is used

4. The five-layer computer network model

In step 2 above, the client sends HTTP packets to the corresponding server. The specific process is as follows:

1. The application layer provides many protocols, including HTTP, FTP, POP3, and IMAP. Here the browser uses HTTP.
2. The main protocols of the transport layer are TCP and UDP. After receiving the data, the transport layer first checks whether a connection with the destination host has been established. Once it has, the data is encapsulated further: the source and destination port numbers are added, error detection is applied, and the result is passed to the network layer.
3. The main protocol of the network layer is IP. It receives the data from the transport layer, adds the IP address of the destination host, encapsulates it into IP datagrams, and passes them to the data link layer; the datagrams are routed over several hops
4. The data link layer uses ARP, which resolves the MAC address of the destination from its IP address. The frame is forwarded through the physical layer and, after reaching the destination LAN, is delivered by broadcast and received by the destination host
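
The layering can be pictured as each layer wrapping the data handed down from the layer above. The sketch below models only that nesting; the field names, the addr object, and the example addresses are assumptions, and real headers contain many more fields (checksums, flags, TTL, and so on).

```javascript
// Wrap an HTTP message in transport, network, and data link layer "headers".
function encapsulate(httpMessage, addr) {
  // Transport layer: adds source and destination ports.
  const segment = { layer: "transport", srcPort: addr.srcPort, dstPort: addr.dstPort, payload: httpMessage };
  // Network layer: adds source and destination IP addresses.
  const packet = { layer: "network", srcIp: addr.srcIp, dstIp: addr.dstIp, payload: segment };
  // Data link layer: adds source and destination MAC addresses.
  const frame = { layer: "data link", srcMac: addr.srcMac, dstMac: addr.dstMac, payload: packet };
  return frame;
}

// The receiving host unwraps the layers in the opposite order.
function decapsulate(frame) {
  let unit = frame;
  while (typeof unit === "object" && unit !== null && "payload" in unit) {
    unit = unit.payload;
  }
  return unit;
}
```

Calling decapsulate on the result of encapsulate gives back the original HTTP message, which mirrors what the server's network stack does before handing the request to the web server.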

VI. Server processing stage

The server receives the request and uses cache negotiation to check whether the resource has changed. If it has not, the cached resource is used.

CDN cache

A proxy server or CDN node is equivalent to an additional cache node. The request is first forwarded to the nearest CDN node, which checks whether its copy of the resource has expired.

CDNs solve the latency problem of cross-region requests and take load off the origin server.
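
A CDN node's behaviour can be sketched as a cache in front of the origin server. The names createCdnNode and fetchFromOrigin and the fixed ttlMs lifetime are assumptions for illustration; a real CDN derives lifetimes from the configured cache policy and response headers.

```javascript
// fetchFromOrigin: function(path) -> body; ttlMs: how long a copy stays valid.
function createCdnNode(fetchFromOrigin, ttlMs) {
  const cache = new Map(); // path -> { body, storedAt }

  return function get(path, now = Date.now()) {
    const entry = cache.get(path);
    // Serve the local copy while it has not expired.
    if (entry && now - entry.storedAt < ttlMs) {
      return { body: entry.body, from: "cdn cache" };
    }
    // Missing or expired: go back to the origin and refresh the copy.
    const body = fetchFromOrigin(path);
    cache.set(path, { body, storedAt: now });
    return { body, from: "origin" };
  };
}
```

Every request answered from the node's cache is one the origin server never sees, which is how the load is taken off it.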

Four-way wave: when the request ends, the server and client exchange four messages to close the connection.

VII. Page rendering stage

Rendering itself is complicated and space is limited, so it is covered in the next article.

VIII. Soul-searching questions

Why does the browser need to parse the URL? What are the encoding rules, and how is a URL parsed?
The domain name resolution process: recursive queries and iterative queries
The principle of the three-way handshake and four-way wave: why three handshakes and not two?
The network request process and the computer network model
How does ping work?
HTTP cache types: strong cache, negotiated cache, and heuristic cache
How do I set up HTTP caching?
The difference between HTTP and HTTPS, and the differences between HTTP versions
Browser rendering principles and rendering order
Page rendering optimization
What are repaint and reflow?