Recruitment advertisement

Resumes can be sent directly to [email protected]

Please name your resume in the format: name – position – years of experience – location (e.g. Zhang San – front-end development – five years – Changsha.pdf)


Company: Tencent

Location: Changsha

Position: Web Front-end Development Engineer

Job Responsibilities:

Responsible for research and development of the Tencent Cloud DNSPod product: implement the system's front-end features and back-end logic, and ensure product quality and development progress.

Job Requirements:

1. Bachelor's degree or above in computer science or a related major, with more than 2 years of work experience;

2. Proficient in JavaScript, HTML, CSS and other front-end technologies, with a solid foundation;

3. Familiar with current mainstream front-end frameworks (React/Vue, etc.); React and Redux development experience is preferred;

4. Familiar with HTTP and TCP/IP protocols; good security awareness and familiarity with common network attack and defense strategies;

5. Good analytical and problem-solving skills and enthusiasm for learning;

6. Node.js/PHP development experience is preferred;

7. Experience developing WP or DZ plugins is preferred.

Note: The headcount for this position belongs to a wholly-owned subsidiary of Tencent Group.


Company: Tencent

Location: Tencent headquarters in Shenzhen

Position: Senior Web Front-end Development Engineer

Job Responsibilities:

Responsible for the system architecture design and R&D of the Tencent Cloud domain name product (DNSPod)

Job Requirements:

1. Bachelor's degree or above in computer science or a related major, with at least 5 years of work experience;

2. Proficient in JavaScript, HTML, CSS and other front-end technologies, with a solid foundation;

3. Familiar with current mainstream front-end frameworks (React/Vue, etc.); React and Redux development experience is preferred;

4. Familiar with HTTP and TCP/IP protocols; good security awareness and familiarity with common network attack and defense strategies;

5. Good analytical and problem-solving skills and enthusiasm for learning;

6. Node.js/PHP development experience is preferred;

Preface

My understanding of HTTP has mostly come from reading material that I forget again a few days later. The following article is simply a walk-through of the whole HTTP process, from the HTTP request through to some simple performance optimization. The main topics are as follows:

  1. HTTP Request Process
  2. dns-prefetch, preconnect, preload, prefetch, defer, async
  3. TCP connection (three handshake, four wave (why three handshake, four wave))
  4. Cache-Control (max-age, public/private, no-cache, no-store, Pragma, must-revalidate), Last-Modified (If-Modified-Since), ETag (If-None-Match), browser cache (memory cache / disk cache), strong cache, negotiated cache
  5. PageSpeed and Lighthouse for performance analysis

The ordering above is based on the analyses of earlier writers; all the reference links are listed later.

HTTP Request Process

What actually happens when you type in a web address?

  1. Enter the address in your browser’s address bar, for example: facebook.com
  2. Browsers look up IP addresses by domain name (DNS resolution)
  3. The browser sends an HTTP request to the Web server
  4. The Facebook server responds with a permanent redirect

Because we typed facebook.com instead of http://www.facebook.com/, the server responds with a permanent redirect and returns a 301 status code.
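As a rough illustration (not Facebook's actual configuration), a server-side redirect from the bare domain to the www host might look like this in an Express handler; res.redirect sets the Location header telling the browser where to go, and 301 marks the move as permanent. The domain names below are placeholders:

// Illustrative sketch only: redirect the bare domain to the www host.
const express = require('express')
const app = express()

app.use((req, res, next) => {
  if (req.hostname === 'example.com') {
    // 301 = permanent redirect; browsers and search engines may cache it
    return res.redirect(301, 'http://www.example.com' + req.originalUrl)
  }
  next()
})

app.listen(3000)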

Why does the server have to redirect rather than simply send the content the user wants to see? There are many interesting answers to this question.

One reason has to do with search engine rankings. You see, if a page has two addresses, like http://www.igoro.com/ and http://igoro.com/, search engines will think of them as two sites, resulting in fewer search links for each and thus lower rankings. Search engines know what a 301 permanent redirect means, and will place all addresses with and without the WWW in the same ranking.

Another is that using different addresses makes the cache less friendly. When a page has several names, it may appear in the cache several times.

  1. The browser tracks the redirected address
  2. The server processes the request
  3. The server returns an HTML response
  4. The browser parses the HTML and draws the page
  5. The browser requests resources embedded in the HTML, such as images, CSS styles, JS files, fonts, etc.
  6. The browser sends an Ajax request

Now that we’ve seen the whole process from entering the URL to presenting the page, we’ll take a closer look at some of the key steps

DNS Resolution Process

We already know from the above that when we enter a URL and initiate a request, a server ultimately receives that request, and every server has an IP address. Generally a domain name maps to one IP address (it can also map to several; for now we only consider the single-IP case). But how does the browser know which IP address the domain name corresponds to? That is what domain name resolution does, and it works roughly as follows:

  1. If the browser cache already has the IP address, the lookup stops there. (Chrome lets you inspect the DNS entries it has cached.)
  2. If the browser doesn’t find it, the local machine is checked to see whether the domain name is cached there
  3. If it is not saved locally, the router is queried
  4. If the router doesn’t have it either, the query goes to your ISP’s DNS server
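To see resolution in action outside the browser, Node's built-in dns module can query the configured resolver directly. This is just a quick sketch for observing which IP addresses a domain maps to (the domain below is a placeholder):

// Quick sketch: resolve a domain name to its IPv4 addresses with Node's dns module.
const dns = require('dns')

dns.resolve4('example.com', (err, addresses) => {
  if (err) throw err
  console.log('IP addresses for example.com:', addresses) // a domain may map to several IPs
})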

From the above analysis, every time we enter a domain name, DNS resolution has to happen before we can find the IP. In real code we often put static resources on a CDN, and each CDN host needs its own DNS resolution as well, which costs time. We can do the DNS resolution in advance, so that when the request is actually made we don’t have to wait for DNS to resolve:

<!-- In the head tag, the earlier the better -->
<link rel="dns-prefetch" href="//example.com">

TCP connection

Reference: TCP three-way handshake & Render Tree => the process from entering a URL to displaying the page

First handshake: Establish a connection

The client sends a connection request segment, setting SYN to 1 and the Sequence Number to x. The client enters the SYN_SENT state and waits for confirmation from the server.

Second handshake: The server receives a SYN packet segment

The server acknowledges the SYN segment received from the client by setting the Acknowledgment Number to x+1 (the client’s Sequence Number + 1). It also sets its own SYN to 1 and its Sequence Number to y, puts all of this into one segment (the SYN+ACK segment) and sends it to the client. The server enters the SYN_RECV state.

Third handshake: The client receives a SYN+ACK packet

After receiving the SYN+ACK segment from the server, the client sets the Acknowledgment Number to y+1 and sends an ACK segment to the server. Once the ACK is sent, both the client and the server enter the ESTABLISHED state, completing the TCP three-way handshake.

After completing the three-way handshake, the client and server begin transferring data. In the process above, there are some important concepts:

Half-open connection queue (SYN queue): during the three-way handshake the server maintains a queue that opens an entry for each client SYN packet it receives. The entry indicates that the server has received a SYN, has sent back its acknowledgement, and is waiting for the client’s ACK. Connections identified by these entries are in the SYN_RECV state on the server. When the server receives the client’s ACK, the entry is removed and the connection moves to ESTABLISHED. Backlog parameter: the maximum length of this queue.

SYN+ACK retransmission count: after the server sends a SYN+ACK packet, if it receives no acknowledgement from the client, it retransmits the SYN+ACK a first time; after waiting a further period without an acknowledgement, it retransmits a second time, and so on. If the number of retransmissions exceeds the limit, the connection entry is deleted from the half-open queue. Note that the wait time before each retransmission is not necessarily the same.

Half-open connection lifetime: the longest time an entry may stay in the queue, i.e. the longest time between the server receiving a SYN and deciding the handshake has failed. It is the sum of the maximum wait times of all the retransmissions. It is sometimes called the SYN_RECV lifetime or timeout.
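As a rough sketch of where the handshake sits in code: with Node's net module the three-way handshake happens inside connect(), and the callback below only fires once the socket has reached ESTABLISHED. The host and request are placeholders:

// Sketch: the SYN / SYN+ACK / ACK exchange is handled by the OS inside connect().
const net = require('net')

const socket = net.connect({ host: 'example.com', port: 80 }, () => {
  // Reaching this callback means the three-way handshake has completed (ESTABLISHED).
  socket.write('HEAD / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n')
})

socket.on('data', (chunk) => console.log(chunk.toString()))
socket.on('end', () => console.log('connection closed'))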

Why three handshakes

Refer to the article

According to the fourth edition of Computer Networks by Xie Xiren, the purpose of the “three-way handshake” is to prevent an invalid (stale) connection request segment from suddenly arriving at the server and causing errors.

An invalid connection request segment is produced in this situation: the first connection request segment sent by the client is not lost, but is held up at some network node for a long time, so that it only reaches the server some time after the connection has already been released. By then it is an invalid segment. However, when the server receives this invalid connection request, it mistakes it for a new connection request from the client and sends back an acknowledgement agreeing to establish a connection. If the “three-way handshake” were not used, a new connection would be established as soon as the server sent its acknowledgement. Since the client never actually requested a connection, it ignores the server’s acknowledgement and sends no data, while the server assumes the new transport connection has been established and keeps waiting for data from the client. Many of the server’s resources are wasted this way. The three-way handshake prevents this: the client simply does not acknowledge the server’s acknowledgement, and when the server receives no ACK, it knows the client has not requested a connection.

Author: wuxinliulei

Link: www.zhihu.com/question/24…

Source: Zhihu

Why four waves

TCP Four Waves – Why four waves


Author: Li Taibai is not white

Source: CSDN

Original text: blog.csdn.net/daguairen/a…

  1. When host 1 sends a FIN packet, it only tells host 2 that it has no more data to send, but it can still receive data from host 2 (first wave).
  2. When host 2 sends back an ACK, it only tells host 1 that it has received the FIN and knows host 1 has no more data to send; host 2 may still have data to send to host 1 (second wave).
  3. When host 2 really has no more data to send to host 1, it sends its own FIN, telling host 1 that it too has no more data to send (third wave).
  4. After receiving that packet, host 1 sends an ACK back to host 2, and the connection can then be closed (fourth wave).
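A rough analogy in Node.js (just a sketch; some real servers close the whole connection as soon as they see the FIN): socket.end() sends our FIN, i.e. “no more data from me”, but with allowHalfOpen the readable side stays open so we can still receive the peer’s remaining data until it sends its own FIN.

// Sketch of a half-close: end() sends a FIN but the socket can still receive.
const net = require('net')

const socket = net.connect({ host: 'example.com', port: 80, allowHalfOpen: true }, () => {
  socket.write('HEAD / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n')
  socket.end() // our FIN: we will send nothing more, but we can keep reading
})

socket.on('data', (chunk) => console.log(chunk.toString()))
socket.on('end', () => console.log('peer sent its FIN; the connection is now fully closed'))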

Conclusion

From the above analysis, every resource request needs a TCP connection, which means a three-way handshake before the connection succeeds; only then can the server send data to the client. Establishing a new connection for every request wastes time. We can connect in advance, before the real request is made, so that by the time the resource is actually requested the connection already exists and the resource can be sent straight away. We can use the following approaches:

Reference: dns-prefetch, preconnect, prefetch, and prerender in the head tag

<link rel="preconnect" href="//example.com">
<link rel="preconnect" href="//cdn.example.com" crossorigin>

The browser does the following:

  1. Parse the href value; if it is a valid URL, proceed to check whether its protocol is HTTP or HTTPS, otherwise stop.
  2. If the href host differs from the current page’s host, crossorigin is effectively treated as anonymous (no cookies or other credentials are sent). If you want credentials such as cookies to be included, add the crossorigin attribute set to use-credentials. A sketch of adding these hints from script follows below.
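A minimal sketch of adding the hint from script instead of static HTML (the CDN host is a placeholder); writing the link tags directly in the head, as in the snippet above, has the same effect and works earlier:

// Sketch: programmatically adding a preconnect hint.
const hint = document.createElement('link')
hint.rel = 'preconnect'
hint.href = '//cdn.example.com'
// For CORS-fetched resources (e.g. fonts), the crossorigin attribute must match
// how the resource will later be requested; omit it for plain non-CORS requests.
hint.crossOrigin = 'anonymous'
document.head.appendChild(hint)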

Cache handling

Now that the TCP connection is established, the server can send resources to the client (the browser). But if a resource has already been requested once and we merely refresh the page, requesting it all over again is wasteful. The browser’s answer to repeated requests is its caching policy: caching tries to serve a repeated request from resources that have already been fetched, reducing the number of requests (and the need for new TCP connections). However, if the browser always used its cache and never re-requested, we might end up with stale resources. To balance these two concerns, the browser cache uses two strategies:

  1. Strong cache
  2. Negotiated cache (weak cache)

Let’s make a simple analysis of the two strategies.

Strong cache

What are we talking about when we talk about HTTP caching

Strong caching is determined by the browser mainly from two fields in the response headers:

  1. expires
  2. cache-control

Strong cache hit from memory cache & from disk cache

When the strong cache is hit, the request shows as 200 (from memory cache) or 200 (from disk cache).

  1. Memory cache: Stores and obtains resources from the memory.
  2. Disk Cache: Cache resources to disks and obtain them from disks. The biggest difference is that when you exit a process, the data in memory is wiped, while the data on disk is not.

In practice, if a page requests the same image resource more than once, the browser handles this automatically and serves the repeat requests from the memory cache. Once we close or refresh the page, the memory cache is discarded. This cache is managed entirely by the browser; there is nothing we can do to control it.

expires

Expires is an HTTP 1.0 feature that decides whether a resource has expired by specifying the absolute GMT time at which the cached copy expires. If it has not expired, the cache is used; otherwise, the resource is re-requested.

Disadvantage: because Expires uses an absolute time, the cache lifetime can be wrong if the client’s clock is incorrect or the time zone is not converted properly.
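A minimal sketch of what setting Expires might look like on the server side (Express syntax; the route, file path and timing are placeholders): the header carries an absolute GMT date, which is exactly why a wrong client clock breaks it.

// Sketch: HTTP/1.0-style strong caching with an absolute expiry time.
const express = require('express')
const path = require('path')
const app = express()

app.get('/logo.png', (req, res) => {
  const oneMinuteLater = new Date(Date.now() + 60 * 1000)
  res.set('Expires', oneMinuteLater.toUTCString()) // absolute GMT expiry, e.g. "Fri, 08 Jun 2018 10:03:30 GMT"
  res.sendFile(path.join(__dirname, 'public/logo.png'))
})

app.listen(3000)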

cache-control

Cache-Control was introduced in HTTP 1.1 to compensate for the flaws of Expires; when both Expires and Cache-Control are present, Cache-Control takes precedence.

Common Cache-Control directives:

| Directive | Description |
| --- | --- |
| max-age | The maximum time, in seconds, for which the cached copy is considered fresh; after that it is treated as expired. For example, cache-control: max-age=60 means 60 seconds. |
| public / private | public means intermediaries (proxies, CDNs) as well as the browser may cache the response, e.g. cache-control: max-age=60, public; private means only the user’s browser may cache it, and intermediate caches such as a CDN may not. |
| no-cache | The response may be cached, but it must be revalidated with the server before each use; if the cache is still valid the server returns a 304 with an empty body. The number of requests is not reduced, but the amount of data transferred can be. (In Express, if the request itself carries cache-control: no-cache, the full resource is returned rather than a 304.) |
| no-store | Do not cache the response at all; neither the strong cache nor the negotiated cache is used. |
| must-revalidate | Once the cached copy has expired, it must be revalidated with the server before use; an expired resource must never be used as-is. |

Even if the Cache-Control lifetime has expired (or no-cache is set), that does not necessarily mean the cached resource can no longer be used; the browser can still fall back to the negotiated cache described next.
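As an illustration of the strong-cache side (a sketch, assuming an Express app serving a public/ directory): express.static lets you set max-age, and by default it also emits the ETag and Last-Modified headers used by the negotiated cache described next.

// Sketch: strong-cache + negotiated-cache headers for static assets.
const express = require('express')
const app = express()

app.use('/static', express.static('public', {
  maxAge: '1d',        // emitted as Cache-Control: public, max-age=86400
  etag: true,          // ETag / If-None-Match (on by default)
  lastModified: true   // Last-Modified / If-Modified-Since (on by default)
}))

app.listen(3000)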

Negotiated cache

If the strong cache has expired (the Cache-Control lifetime has run out), the browser falls back to the negotiated cache strategy, which is handled through two pairs of headers:

  1. Last-Modified (If-Modified-Since) -> HTTP 1.0
  2. ETag (If-None-Match) -> HTTP 1.1

last-modified

When the browser requests a resource, the server puts the file’s last modification time into the Last-Modified response header, for example Last-Modified: Fri, 08 Jun 2018 10:02:30 GMT.

When the resource is requested again (refreshing the page — not a force refresh with Ctrl + F5 — or reopening the page), the request carries an If-Modified-Since header whose value is the previous Last-Modified value, e.g. If-Modified-Since: Fri, 08 Jun 2018 10:02:30 GMT. The server uses this value to decide whether the cache is still valid. If it is, it returns a 304 with an empty response body, and the browser reads the resource from its cache.
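A minimal hand-rolled sketch of this flow (the route and file path are placeholders; frameworks like Express do this for you): the server compares If-Modified-Since with the file’s mtime, truncated to seconds because the header only has second precision, and answers 304 with an empty body when the cache is still valid.

// Sketch: Last-Modified / If-Modified-Since negotiation by hand.
const express = require('express')
const fs = require('fs')
const path = require('path')
const app = express()

app.get('/report.json', (req, res) => {
  const file = path.join(__dirname, 'report.json')
  const stat = fs.statSync(file)
  // HTTP dates only have second precision, so drop the milliseconds.
  const mtimeMs = Math.floor(stat.mtime.getTime() / 1000) * 1000
  const ifModifiedSince = Date.parse(req.headers['if-modified-since'] || '')

  if (!Number.isNaN(ifModifiedSince) && mtimeMs <= ifModifiedSince) {
    res.status(304).end()                       // cache still valid: empty body
    return
  }

  res.set('Last-Modified', new Date(mtimeMs).toUTCString())
  res.sendFile(file)
})

app.listen(3000)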

Etag

ETag works much like Last-Modified: it is also a tag returned by the back end for the resource. The difference is that Last-Modified is the file’s last modification time, whereas ETag is an identifier for the resource itself, and different servers use different strategies to generate it. For example, the Express framework generates an ETag from the file’s last modification time and its size:

function stattag (stat) {
  // mtime: the file's last modification time
  // size: the file's size
  var mtime = stat.mtime.getTime().toString(16)
  var size = stat.size.toString(16)

  return '"' + size + '-' + mtime + '"'
}

When the resource is requested again (refreshing the page — not a force refresh with Ctrl + F5 — or reopening the page), the request carries an If-None-Match header with the previous ETag value. The server uses this value to decide whether the cache is still valid: if it is, it returns a 304 with an empty response body and the browser reads from its cache; otherwise it returns 200 with the new resource.

HTTP 1.1 introduced the concept of ETag mainly to solve the following problems:

  1. A file’s content may not have changed (only its modification time changed), and we don’t want the file to be reloaded. (The ETag value stays the same, so the cache is still used, whereas Last-Modified alone would force a reload.)
  2. If-Modified-Since has a granularity of one second; if a file changes several times within one second, Last-Modified cannot detect the change, while the ETag value changes and correctly invalidates the cache.
  3. Some servers cannot accurately determine a file’s last modification time.

What if both last-modified and etag are set? Which one has higher priority?

The rule is that ETag takes precedence over Last-Modified. But why does ETag have higher priority? Is it determined by the browser?

After some analysis, it turns out this is not decided by the browser but by the server. The browser simply carries the If-Modified-Since and If-None-Match headers when requesting the resource, and the server decides whether the cache can be used. We can look at the handling logic in Express’s source code:

    if (this.isCachable() && this.isFresh()) {
      this.notModified()
      return
    }

this.notModified() returns a 304:

SendStream.prototype.notModified = function notModified () {
  var res = this.res
  debug('not modified')
  this.removeContentHeaderFields()
  res.statusCode = 304
  res.end()
}

The main logic Express uses to decide whether the cache is still fresh sits behind the this.isFresh() method, implemented by the fresh function below:

function fresh (reqHeaders, resHeaders) {
  // fields
  var modifiedSince = reqHeaders['if-modified-since']
  var noneMatch = reqHeaders['if-none-match']

  // unconditional request
  if (!modifiedSince && !noneMatch) {
    return false
  }

  // Always return stale when Cache-Control: no-cache
  // to support end-to-end reload requests
  // https://tools.ietf.org/html/rfc2616#section-14.9.4
  var cacheControl = reqHeaders['cache-control']
  if (cacheControl && CACHE_CONTROL_NO_CACHE_REGEXP.test(cacheControl)) {
    return false
  }

  // if-none-match
  if (noneMatch && noneMatch !== '*') {
    var etag = resHeaders['etag']

    if (!etag) {
      return false
    }

    var etagStale = true
    var matches = parseTokenList(noneMatch)
    for (var i = 0; i < matches.length; i++) {
      var match = matches[i]
      if (match === etag || match === 'W/' + etag || 'W/' + match === etag) {
        etagStale = false
        break
      }
    }

    if (etagStale) {
      return false
    }
  }

  // if-modified-since
  if (modifiedSince) {
    var lastModified = resHeaders['last-modified']
    var modifiedStale = !lastModified || !(parseHttpDate(lastModified) <= parseHttpDate(modifiedSince))

    if (modifiedStale) {
      return false
    }
  }

  return true
}

Using the code above, we can walk through how Express decides whether the cache still works:

  1. If the request does not carry an if-modified-since or an if-none-match header, the cache is immediately treated as invalid:
  var modifiedSince = reqHeaders['if-modified-since']
  var noneMatch = reqHeaders['if-none-match']

  if (!modifiedSince && !noneMatch) {
    return false
  }
  2. If the request carries a cache-control header with no-cache set, the cache is immediately treated as invalid (the regular expression used is var CACHE_CONTROL_NO_CACHE_REGEXP = /(?:^|,)\s*?no-cache\s*?(?:,|$)/):
  var cacheControl = reqHeaders['cache-control']
  if (cacheControl && CACHE_CONTROL_NO_CACHE_REGEXP.test(cacheControl)) {
    return false
  }
  3. Next, if-none-match is checked: the server takes the resource’s current etag and compares it with the if-none-match value; if they do not match, the cache is treated as invalid:
  // if-none-match
  if (noneMatch && noneMatch !== '*') {
    var etag = resHeaders['etag']

    if (!etag) {
      return false
    }

    var etagStale = true
    var matches = parseTokenList(noneMatch)
    for (var i = 0; i < matches.length; i++) {
      var match = matches[i]
      if (match === etag || match === 'W/' + etag || 'W/' + match === etag) {
        etagStale = false
        break
      }
    }

    if (etagStale) {
      return false
    }
  }

Here var etag = resHeaders['etag'] is the ETag generated for the resource at the time of this request (taken from the response headers the server is preparing).

  4. Finally, if-modified-since is checked: if last-modified is missing, or the resource’s last-modified time is later than the if-modified-since value, the cache is treated as invalid:
  // if-modified-since
  if (modifiedSince) {
    var lastModified = resHeaders['last-modified']
    var modifiedStale = !lastModified || !(parseHttpDate(lastModified) <= parseHttpDate(modifiedSince))

    if (modifiedStale) {
      return false
    }
  }

From the above analysis, Express does not literally check ETag before Last-Modified when validating the cache — both checks must pass — but for invalidation ETag does take precedence: if the If-None-Match check fails, the cache is invalid regardless of Last-Modified.

To be continued…