What happens from entering the URL to loading the page

This article organizes my understanding of the page-loading process, based on a large number of references found online. I am well aware that my understanding is limited, so criticism and corrections are welcome.

From the browser's point of view, the process breaks down into these steps:

  1. Enter the URL and press Enter (the domain name is resolved to a specific IP address through DNS)
  2. The browser sends the request (an HTTP request over TCP)
  3. The server returns data (the server processes the request and returns an HTTP response)
  4. The browser renders the page

Enter the URL and press Enter

The browser's main thread parses the URL and looks up the IP address for the server's host name (DNS resolution).

DNS resolution process:

  • Step 1: The browser checks its own cache for an IP address already resolved for the domain name. If one is found, resolution ends here. The browser cache is short-lived (Chrome caches each domain for 60 seconds by default).
  • Step 2: If the browser cache has no mapping for the domain name, the operating system checks whether the local hosts file has one. If so, that IP address is used and resolution is complete. (This is why we configure hosts in development and test environments.)
  • Step 3: If the hosts file has no mapping for the domain name, the local DNS resolver cache is checked. If it has a mapping, that is returned and resolution is complete.
  • Step 4: If neither the hosts file nor the local DNS resolver cache has a mapping, the query goes to a recursive DNS server (regional or carrier level, working up to the root). The recursive DNS server looks up the domain's records recursively and returns the result.
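The four steps above amount to a chain of lookups that falls through to the next stage on a miss. The sketch below models that order with in-memory maps. Every name in it (`createResolver`, `browserCache`, `hostsFile`, `osCache`, `recursiveLookup`) is a hypothetical stand-in rather than a real browser or operating-system API, and the 60-second TTL is simply the Chrome figure quoted above.

```javascript
// Hypothetical in-memory stand-ins for the four lookup stages.
// None of these names are real browser or OS APIs.
function createResolver({ browserCache, hostsFile, osCache, recursiveLookup, now = Date.now }) {
  const BROWSER_TTL_MS = 60 * 1000; // the 60-second figure quoted in the text

  function remember(hostname, ip, source) {
    browserCache.set(hostname, { ip, storedAt: now() });
    return { ip, source };
  }

  return function resolve(hostname) {
    // Step 1: browser cache, honoring its TTL
    const cached = browserCache.get(hostname);
    if (cached && now() - cached.storedAt < BROWSER_TTL_MS) {
      return { ip: cached.ip, source: 'browser-cache' };
    }
    // Step 2: local hosts file
    if (hostsFile.has(hostname)) {
      return remember(hostname, hostsFile.get(hostname), 'hosts-file');
    }
    // Step 3: operating-system resolver cache
    if (osCache.has(hostname)) {
      return remember(hostname, osCache.get(hostname), 'os-cache');
    }
    // Step 4: recursive DNS server (modeled as a plain function)
    const ip = recursiveLookup(hostname);
    if (ip === undefined) throw new Error(`cannot resolve ${hostname}`);
    osCache.set(hostname, ip);
    return remember(hostname, ip, 'recursive-dns');
  };
}

// Usage with made-up host names and documentation-range addresses
const resolve = createResolver({
  browserCache: new Map(),
  hostsFile: new Map([['dev.example.test', '127.0.0.1']]),
  osCache: new Map(),
  recursiveLookup: (name) => (name === 'example.test' ? '192.0.2.10' : undefined),
});
resolve('example.test'); // first call falls through to the recursive server
resolve('example.test'); // second call is answered from the browser cache
```

Each stage that answers also fills the caches in front of it, which is why a repeated lookup never gets past step 1 until the TTL runs out.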

Note:

  1. In the beginning, a network information center maintained a single hosts file for the whole network. As the network kept expanding, a system was needed that could manage the mapping between host names and IP addresses efficiently: the DNS system. DNS is a distributed database on the Internet.
  2. Knowing how DNS resolution works, we can speed it up by prefetching DNS: <link rel="dns-prefetch" href="//img.alicdn.com">

How does the browser initiate a request?

Browsers make requests over HTTP, which is one protocol in the TCP/IP protocol family. The most important feature of the TCP/IP family is its layering (five layers):

  1. Application layer: resolve the domain name to an IP address and send the HTTP request (DNS and HTTP both belong to the application layer)
  2. Transport layer: establish a TCP connection with the three-way handshake (provides reliable, segmented data transmission for the application layer)
  3. Network layer: IP addressing (provides the route for the transport layer; its job is to choose one route among many across the computers and network devices between client and server)
  4. Link layer: the network interface card (NIC)
  5. Physical layer: physical transmission (hardware: bit streams, twisted pair, electromagnetic waves…)

What is TCP?

TCP is responsible for establishing connections, sending data, and closing connections. It provides reliable delivery of the data the application layer sends to the server. To do that, a TCP header (source port, destination port, sequence number, checksum) is added in front of the application-layer data, and the result is handed to the IP layer, which provides the route.

Web developers must be familiar with HTTP, that is, the HTTP interaction between front end and back end (HTTP message structure, caching, cross-domain requests, cookies, security…). HTTP is built on TCP, so…

How does TCP achieve reliable transmission? (The three-way handshake)

TCP uses a three-way handshake to make sure both sides can send and receive. A TCP connection is full-duplex: both sides can read and write data over the same connection.

An analogy:

// Client (Hairy Egg): Two Dogs, Two Dogs, this is Hairy Egg, this is Hairy Egg. Respond if you hear me, respond if you hear me!
// Server (Two Dogs): Two Dogs copies, Two Dogs copies. Go ahead, Hairy Egg, go ahead, over!
// Client (Hairy Egg): Hairy Egg copies...
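In terms of actual segments, the three-way handshake is SYN, SYN+ACK, ACK, with each side acknowledging the other's initial sequence number plus one. The toy model below only reproduces that arithmetic; it opens no sockets, the function name `threeWayHandshake` is made up for illustration, and the initial sequence numbers are passed in as parameters so the run is deterministic (real TCP stacks pick them unpredictably).

```javascript
// Toy model of the three-way handshake: no real networking, just the
// sequence/acknowledgement numbers carried by the three segments.
function threeWayHandshake(clientIsn, serverIsn) {
  const segments = [];

  // 1. Client -> Server: SYN carrying the client's initial sequence number
  const syn = { from: 'client', flags: ['SYN'], seq: clientIsn };
  segments.push(syn);

  // 2. Server -> Client: SYN+ACK. The SYN consumed one sequence number,
  //    so the server acknowledges clientIsn + 1 and announces its own ISN.
  const synAck = { from: 'server', flags: ['SYN', 'ACK'], seq: serverIsn, ack: syn.seq + 1 };
  segments.push(synAck);

  // 3. Client -> Server: ACK of the server's SYN. Both directions are now confirmed.
  const ack = { from: 'client', flags: ['ACK'], seq: synAck.ack, ack: synAck.seq + 1 };
  segments.push(ack);

  return segments;
}
```

Calling `threeWayHandshake(100, 500)` yields three segments whose `ack` fields are 101 (server acknowledging the client) and 501 (client acknowledging the server), which is how each side learns that the other received its SYN.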

The four-way wave

The four-way wave ensures that both sides have finished sending their data and both consider the connection closed. If the client sends a close request and the server has nothing more to pass to the client, the server's ACK and FIN can be merged into a single segment.

// Client (first wave): initiates the close by sending a FIN segment; the client enters the FIN_WAIT_1 state
// Server (second wave): replies to the client with an ACK segment and enters the CLOSE_WAIT state. After receiving the ACK, the client enters the FIN_WAIT_2 state
// Server (third wave): the server checks whether it still has data to send to the client. If so, it sends the data and then the FIN segment; if not, it sends the FIN segment right away. It then enters the LAST_ACK state
// Client (fourth wave): the client receives the server's FIN and replies with an ACK, then enters the TIME_WAIT state. The server closes the connection once it receives that ACK. If the client waits 2MSL without hearing from the server again, the server has closed normally and the client can close the connection too
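The state names in the four waves above form a small state machine on each side. The sketch below encodes only those transitions for the side that closes first (called `client` here) and the side that closes second (`server`). The event labels and the helper names `step` and `runTeardown` are invented for this illustration; real TCP has more states and transitions (simultaneous close, for instance) that are left out.

```javascript
// Simplified teardown state machine covering only the states named in the text.
const teardownTransitions = {
  client: {
    ESTABLISHED: { 'send FIN': 'FIN_WAIT_1' },
    FIN_WAIT_1: { 'recv ACK': 'FIN_WAIT_2' },
    FIN_WAIT_2: { 'recv FIN, send ACK': 'TIME_WAIT' },
    TIME_WAIT: { '2MSL timeout': 'CLOSED' },
  },
  server: {
    ESTABLISHED: { 'recv FIN, send ACK': 'CLOSE_WAIT' },
    CLOSE_WAIT: { 'send FIN': 'LAST_ACK' },
    LAST_ACK: { 'recv ACK': 'CLOSED' },
  },
};

// Apply one event; an event that is not valid in the current state is an error.
function step(role, state, event) {
  const fromState = teardownTransitions[role][state];
  const next = fromState && fromState[event];
  if (!next) throw new Error(`${role}: no transition from ${state} on "${event}"`);
  return next;
}

// Replay a list of events from ESTABLISHED and return every state visited.
function runTeardown(role, events) {
  const states = ['ESTABLISHED'];
  for (const event of events) {
    states.push(step(role, states[states.length - 1], event));
  }
  return states;
}
```

Replaying `['send FIN', 'recv ACK', 'recv FIN, send ACK', '2MSL timeout']` for the client walks through FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT, and CLOSED in that order, matching the four waves described above.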

Note:

  1. Why does the client need the final wait state (TIME_WAIT)? Its last ACK may be lost. In that case the server retransmits its FIN, and the client must still be around to acknowledge it.
  2. When does the four-way wave happen? It depends on the Connection request header. With Keep-Alive, the server keeps the TCP connection open. Without Keep-Alive, or with an explicit close, the server actively closes the TCP connection once the response has been sent. (HTTP/1.1 turns Keep-Alive on by default; the TCP connection is closed when the browser tab is closed.)

Server returns data

The request travels from the browser down through the HTTP, TCP, IP, link, and physical layers to reach the server. The server unwraps it layer by layer in the reverse order (link, IP, TCP, HTTP). Once it has the HTTP message, the server processes it (reverse proxy, security interception, cross-domain checks, business logic…). When the program has run, the response is encapsulated layer by layer again and sent back to the front end, completing the interaction.

All kinds of HTTP headers

General headers

14 representative status codes

  • 200: The request was processed successfully and the requested resource is sent back to the client
  • 204: The request was processed successfully, but the response has no body (used when the client sends information to the server and needs no new content in return)
  • 206: Partial content (the response to a range request; it carries Content-Range)
  • 301: Permanent redirect (for example, JD.com's old domain www.360buy.com)
  • 302: Temporary redirect (short links)
  • 304: Not modified; the cached copy is used (detailed below)
  • 307: Temporary redirect (similar to 302) but stricter (POST is not changed to GET)
  • 400: Bad client request (the request message has a syntax error; fix it and resend)
  • 401: Unauthorized request
  • 403: Access forbidden (no permission, permission problems…)
  • 404: Resource not found
  • 500: Internal server error (exceptions, bugs, temporary failures…)
  • 503: Service unavailable (the server is overloaded or down for maintenance)
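Two things in the list above are easy to express in code: the first digit of a status code gives its class, and 307 differs from 301/302 in whether the request method survives the redirect. The sketch below is a simplification under stated assumptions: `statusClass` and `methodAfterRedirect` are hypothetical helper names, and the 301/302 branch models the long-standing browser behavior of turning a POST into a GET rather than every case the HTTP specification allows.

```javascript
// Classify a status code by its first digit.
function statusClass(code) {
  if (!Number.isInteger(code) || code < 100 || code > 599) {
    throw new RangeError(`not an HTTP status code: ${code}`);
  }
  return ['informational', 'success', 'redirection', 'client-error', 'server-error'][
    Math.floor(code / 100) - 1
  ];
}

// Which method is used when following a redirect?
// 307 keeps the original method. For 301 and 302, browsers have historically
// rewritten POST to GET; that common behavior is what this sketch models.
function methodAfterRedirect(status, method) {
  if (status === 307) return method;
  if ((status === 301 || status === 302) && method === 'POST') return 'GET';
  return method;
}
```

This is why an API that must receive the POST body again after a temporary redirect answers with 307 rather than 302.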

Common request and response headers

Caching

Two types: strong cache (200 from cache) and negotiated cache (304)

  1. Strong cache: HTTP/1.1 (Cache-Control: max-age), HTTP/1.0 (Pragma / Expires)
  2. Negotiated cache: HTTP/1.1 (If-None-Match / ETag), HTTP/1.0 (If-Modified-Since / Last-Modified). The HTTP/1.1 mechanism always takes precedence over the HTTP/1.0 one. A better approach is to eliminate 304 responses altogether: use the strong cache for everything except the home page, and when a static resource needs updating, change the MD5 stamp on the resource so the cached copy is bypassed
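The two cache types can be read as a two-stage decision: if the stored response is still fresh, use it without contacting the server (strong cache); otherwise send its validators and let the server answer 304 or 200 (negotiated cache). The sketch below is a minimal model of that decision. The `cached` record shape (`{ headers, storedAt }` with lower-cased header names) and the function name `cacheDecision` are assumptions made for illustration, and directives such as `no-cache` and `no-store` are deliberately ignored.

```javascript
// Decide how to satisfy a request given a previously stored response.
// `cached` is a hypothetical record: { headers: {...}, storedAt: epochMillis }.
function cacheDecision(cached, nowMs) {
  if (!cached) return { action: 'fetch', requestHeaders: {} };

  const { headers, storedAt } = cached;

  // Strong cache: Cache-Control max-age (HTTP/1.1) wins over Expires (HTTP/1.0).
  const maxAge = /(?:^|,)\s*max-age=(\d+)/i.exec(headers['cache-control'] || '');
  let freshUntil = null;
  if (maxAge) {
    freshUntil = storedAt + Number(maxAge[1]) * 1000;
  } else if (headers['expires']) {
    freshUntil = Date.parse(headers['expires']); // NaN if unparsable, treated as stale
  }
  if (freshUntil !== null && nowMs < freshUntil) {
    return { action: 'use-cache', requestHeaders: {} }; // shows up as "200 (from cache)"
  }

  // Negotiated cache: send the validators; the server replies 304 if nothing changed.
  const requestHeaders = {};
  if (headers['etag']) requestHeaders['If-None-Match'] = headers['etag'];
  if (headers['last-modified']) requestHeaders['If-Modified-Since'] = headers['last-modified'];

  const hasValidator = Object.keys(requestHeaders).length > 0;
  return { action: hasValidator ? 'revalidate' : 'fetch', requestHeaders };
}
```

With a long `max-age` and a content hash in the file name, the first branch is the only one a static asset ever takes, which is the "eliminate 304" strategy described above.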

Browser render page

After the browser gets the HTML document:

  1. DOM parsing (tokenization and tree building) – DOM tree
  2. CSS parsing – CSSOM tree
  3. Merge the DOM tree with the CSSOM tree to produce the render tree – think of the root HTML element as being promoted to a layer of its own (just as a script as a whole is itself a macrotask) – Paint
  4. Layer tree: create a separate layer (layer promotion) for some nodes in the render tree – Paint
  5. Layer compositing – forms Graphics Layers (GPU hardware acceleration) – drawn directly by the GPU
  • Note 1: HTML cannot be parsed with a conventional top-down or bottom-up parser, because browsers have a long history of tolerating invalid HTML and because the parsing process is reentrant. The source normally does not change during parsing, but JS often adds extra tags to the structure (relayout, also called reflow).
  • Note 2: Tokenization is lexical analysis (start tags, end tags, attribute names, attribute values). Tree building: each token, once produced, is handed to the tree builder, which adds nodes one at a time, and so on.
  • Note 3: CSS parsing and DOM parsing proceed in parallel (DOM parsing handles structure, CSS parsing handles style; the two do not conflict).
  • Note 4: By default, DOM parsing and CSS loading are mutually exclusive with JS execution (JS may access both our CSS styles and the structure).
  • Note 5: Elements that get a Graphics Layer: video, canvas, Flash, CSS 3D, perspective, opacity, transform, animation, transition…
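Step 3 above, merging the DOM tree with the CSSOM tree, can be pictured as a tree walk that attaches a computed style to each node and leaves out anything that is not rendered. The sketch below uses plain objects instead of real DOM nodes. The node shape (`{ tag, children }`), the `styleFor` callback standing in for the CSSOM, and the function name `buildRenderTree` are all assumptions for illustration; a real engine does far more (pseudo-elements, anonymous boxes, and so on).

```javascript
// Toy render-tree construction: walk a DOM-like tree, attach the style that a
// CSSOM-like lookup returns, and skip subtrees with display: none.
function buildRenderTree(domNode, styleFor) {
  const style = styleFor(domNode);
  if (style.display === 'none') return null; // neither the node nor its children are rendered

  const children = (domNode.children || [])
    .map((child) => buildRenderTree(child, styleFor))
    .filter((child) => child !== null);

  return { tag: domNode.tag, style, children };
}
```

The point of the sketch is that the render tree is not a copy of the DOM tree: a node hidden with `display: none` is still in the DOM but never makes it into the tree that gets laid out and painted.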

While we are on the rendering process (a basic optimization checklist, once again)

Request-related optimizations

  1. Resource placement (parsing CSS blocks JS execution; parsing JS blocks CSS parsing and page rendering; so put CSS in the head and JS at the bottom)
  2. Asynchronous script tags

Sometimes loading a JS file in the head or the body is unavoidable. defer loads the script asynchronously and executes it right after DOM parsing finishes (the effect is the same as placing it at the bottom of the body; fine for the main entry file). async loads the script asynchronously and executes it as soon as it has loaded (recommended for independent code, such as Baidu analytics).

  3. Avoid CSS @import

I have hardly ever written it in my own CSS. It causes extra requests. If you really want imports, use Sass, which merges them automatically (importing placeholders does not duplicate code).

  4. Watch out for empty src

An a tag with an empty href navigates to the current page address, and a form with an empty action submits to the current page address.

  5. Prefetch DNS, based on how DNS resolution works: <link rel="dns-prefetch" href="//img.alicdn.com">

DOM related optimization

  1. Cache the DOM
let cacheEl = {
  // Only the NodeList returned by querySelectorAll is static
  container: document.querySelectorAll('.container'),
  // An HTMLCollection is live (dynamic)
  divCollection: document.getElementsByTagName('div'),
  // jQuery object, looked up once and reused
  $box: $('#box')
};
  2. Batch DOM operations
    1. Build the whole string first, then update the DOM once with innerHTML
    2. In theory, a DocumentFragment can also be used for this optimization
    3. In practice, however, the second option takes longer than the first, because modern browsers already optimize whatever they can
  3. DOM read/write separation (significant performance impact)

Do DOM reads and DOM writes in separate batches (reading layout from the DOM right after a write forces the browser to render once more).
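One way to enforce read/write separation is to queue measurements and mutations separately and flush them together, all reads first and then all writes. The sketch below is a minimal batcher along those lines, not a description of any particular library. `createDomBatcher` and its `schedule` parameter are names made up here; in a browser the default scheduler is `requestAnimationFrame`, with a `setTimeout` fallback assumed for other environments.

```javascript
// Minimal read/write batcher: callbacks are queued and flushed together,
// every read before any write, so writes cannot invalidate later reads.
function createDomBatcher(schedule) {
  const run =
    schedule ||
    (typeof requestAnimationFrame === 'function'
      ? (fn) => requestAnimationFrame(fn)
      : (fn) => setTimeout(fn, 0));

  let reads = [];
  let writes = [];
  let scheduled = false;

  function flush() {
    scheduled = false;
    const pendingReads = reads;
    const pendingWrites = writes;
    reads = [];
    writes = [];
    pendingReads.forEach((fn) => fn()); // all measurements first
    pendingWrites.forEach((fn) => fn()); // then all mutations
  }

  function enqueue(queue, fn) {
    queue.push(fn);
    if (!scheduled) {
      scheduled = true;
      run(flush);
    }
  }

  return {
    read: (fn) => enqueue(reads, fn),
    write: (fn) => enqueue(writes, fn),
  };
}
```

In page code this would look like `batcher.read(() => { height = el.offsetHeight; })` followed by `batcher.write(() => { el.style.height = height * 2 + 'px'; })`. No matter how the calls are interleaved across components, the flush runs the measurements before the style changes.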

  4. Event delegation (reduces memory footprint)

Back in the era of direct DOM manipulation, event delegation was the technique everyone knew best (it also catches events from nodes added dynamically).
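The idea is to attach a single listener to a container and work out, from the event target, which child the event was meant for. Because the lookup happens at event time, children added later are covered without registering anything new. The helper below is a minimal sketch. `delegate` is a hypothetical name, and the code assumes the standard `Element.closest`, `Node.contains`, and `addEventListener` methods are available on the objects it is given.

```javascript
// One listener on the container handles events for every matching descendant,
// including descendants added after the listener was registered.
function delegate(container, eventType, selector, handler) {
  function listener(event) {
    const target = event.target;
    // closest() walks up from the target to the nearest ancestor matching the selector
    const match = target && typeof target.closest === 'function' ? target.closest(selector) : null;
    if (match && container.contains(match)) {
      handler.call(match, event, match);
    }
  }
  container.addEventListener(eventType, listener);
  // Return a cleanup function so the listener can be removed when no longer needed
  return () => container.removeEventListener(eventType, listener);
}
```

A typical call would be `const stop = delegate(listEl, 'click', 'li', (event, li) => { /* handle the item */ })`, with `stop()` called when the list is torn down. One listener replaces one per item, which is where the memory saving comes from.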

  5. Minimize global impact (a programming mindset; reduces memory use)

Think in terms of life cycles: clear timers that are no longer in use, remove event listeners that are no longer in use, keep scopes as small as possible (for GC), and clear object references.

Resource loading optimization

  1. For first-screen loading, make use of the browser cache (images, JS, CSS, fonts, SWF, audio, Ajax GET requests)
  2. Ajax preloading (for data that stays fixed in the short term, such as an option list). GET requests can be cached. For POST requests, use localStorage as a local store: read from local storage first and use the value if it is there, since the request was already made while the page was rendering. This saves time.
  3. Request resources dynamically: create an Image object or a script object, set its src, and append the node to the page (not needed for Image objects). As soon as src is set, the browser issues the request. Even the simplest reporting beacons are done this way.
  4. In idle time, quietly preload the content of the next screen of the SPA
  5. Anticipate user actions and preload data (e.g. Amazon's second-level menu) – technically demanding
  6. Lazy-load resources and images, load JS on demand (in systems with complex business logic, a click triggers loading of the corresponding webpack bundle), and improve scroll performance with function throttling
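Function throttling, mentioned in the last item, caps how often a handler runs no matter how often the event fires. The sketch below is a leading-edge throttle: the first call runs immediately and later calls inside the wait window are dropped. The `now` parameter is an addition for this example so the timing can be controlled in a test; it defaults to `Date.now`.

```javascript
// Leading-edge throttle: `fn` runs at most once per `waitMs`;
// calls that arrive inside the window are ignored.
function throttle(fn, waitMs, now = Date.now) {
  let lastRun = -Infinity;
  return function throttled(...args) {
    const t = now();
    if (t - lastRun >= waitMs) {
      lastRun = t;
      return fn.apply(this, args);
    }
    return undefined; // dropped call
  };
}
```

On a page this would be wired up as `window.addEventListener('scroll', throttle(onScroll, 100))`, where `onScroll` is whatever handler checks for images to lazy-load. Note that this variant drops the trailing call; if the final scroll position matters, a trailing-edge or combined variant is the better fit.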

Conclusion:

  1. DNS
  2. TCP three-way handshake and four-way wave
  3. Various HTTP headers
  4. 14 representative status codes
  5. Caching
  6. Page rendering process
  7. Basic optimization (Webpack plus the default optimizations of the three major frameworks)

There are many finer points; I will continue when I get the chance (⊙﹏⊙)b

  • Cross-domain, Cookie, CSS (BFC), class, JS preprocessing, execution context, scope, this, ES6…
  • Web security, HTTPS, HTTP2.0…
  • Event Loop(browser, Node), Promise
