Interview Essentials: HTTP Caching

The second most frequently asked question in a front-end interview is about HTTP caching. To be honest, HTTP caching has a lot of details. The protocol versions you will commonly meet are HTTP/1.0 and HTTP/1.1 (HTTP/2 is not covered in this article).

Cache-related headers

Let’s first list the cache-related request and response headers.

Expires

A response header that gives the absolute time at which the resource expires.

Cache-Control

A request and response header that gives precise control over the caching policy.

If-Modified-Since

A request header with which the browser tells the server the last modification time it has on record for the resource.

Last-Modified

A response header with which the server tells the browser when the resource was last modified.

ETag

A response header with which the server tells the browser the identifier of the resource.

If-None-Match

A request header with which the browser tells the server the identifier of its cached copy.

The headers work in pairs:

If-Modified-Since and Last-Modified
ETag and If-None-Match

Today we focus on the browser's caching mechanism. Browser caching generally targets static resources such as JS, CSS, and images, so the examples below use a single JavaScript file, a.js. Rather than lecturing on theory, we start from a realistic scenario and improve the caching mechanism step by step, which I believe makes it easier to understand.

Let's fix some conventions for later comparison.

a.js is 10 KB.
A request header is 1 KB.
A response header is 1 KB.

The original model

The browser requests the static resource a.js. (Request header: 1 KB)
The server reads a.js from disk and returns it to the browser. (10 KB (a.js) + 1 KB (response header) = 11 KB)
The browser requests again, and the server again reads a.js from disk and returns it.
And so on.

One round trip costs 10 (a.js) + 1 (request header) + 1 (response header) = 12 KB.

Ten visits cost 12 * 10 = 120 KB.

Traffic therefore grows with the number of visits:

L (traffic) = N (number of visits) * 12 KB.

The disadvantages of this approach are obvious:

It wastes the user's traffic.
It wastes server resources, because the server reads the file from disk and sends it to the browser on every request.
The browser must wait for a.js to download and execute before it can render the page, which hurts the user experience.

Executing the JS is usually much faster than downloading it, so shortening the download time improves the user experience considerably.

Add a caching mechanism to the browser

The browser requests a.js for the first time and caches it on the local disk. (1 + 10 + 1 = 12 KB)
When the browser needs a.js again, it reads it straight from the browser cache (200, from cache) and sends no request to the server. (0 KB)
And so on.

The first visit costs 1 + 10 + 1 = 12 KB. The second visit costs 0 KB. The 10,000th visit still costs 0 KB.

So traffic no longer depends on the number of visits:

L (traffic) = 12 KB.
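
As a rough check of the two traffic formulas so far, here is a minimal JavaScript sketch. The sizes follow the conventions stated earlier; the function names are made up for illustration.

```javascript
// Sizes in KB, taken from the conventions in this article.
const FILE_KB = 10;
const REQUEST_HEADER_KB = 1;
const RESPONSE_HEADER_KB = 1;
const ROUND_TRIP_KB = FILE_KB + REQUEST_HEADER_KB + RESPONSE_HEADER_KB;

// Original model: every visit downloads a.js again.
function trafficWithoutCache(visits) {
  return visits * ROUND_TRIP_KB;
}

// Cache-forever model: only the first visit touches the network.
function trafficWithBrowserCache(visits) {
  return visits > 0 ? ROUND_TRIP_KB : 0;
}

console.log(trafficWithoutCache(10));        // 120
console.log(trafficWithBrowserCache(10000)); // 12
```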

Advantages:

Bandwidth use drops sharply.
Because a.js no longer has to be downloaded on every visit, the user experience improves accordingly.

Disadvantage: when a.js is updated on the server, the browser has no way to notice and never gets the latest file.

The server and browser agree on a resource expiration time

The server and browser agree on an expiration time, which is carried in the Expires header. The value is an absolute time in GMT format, such as Mon, 01 Jan 1990 00:00:00 GMT.

The browser requests the static resource a.js for the first time. (1 KB)
The server sends a.js together with its cache expiration time (Expires: Wed, 26 Sep 2018 05:00:00 GMT) to the browser. (10 + 1 = 11 KB)

In effect the server tells the browser: you may cache the a.js I just sent you. Don't bother me with requests before 05:00 GMT on September 26, 2018; just use your own cached copy.

The browser receives a.js and remembers the expiration time.
Before 05:00 GMT on September 26, 2018, whenever the browser needs a.js it does not contact the server and uses the cached a.js directly. (0 KB)
At 05:01 GMT on September 26, 2018, the browser needs a.js again and finds that the cached copy has expired. It stops using the local cache and requests the server, which reads a.js from disk again, returns it, and tells the browser a new expiration time. (1 + 10 + 1 = 12 KB)
And so on.
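
The freshness check the browser performs in the steps above can be sketched as follows. This is a simplified illustration, not how any particular browser implements it; the function name is hypothetical, and the date is the September 26, 2018 05:00 GMT expiration time from the example.

```javascript
// Returns true while the cached copy may still be used without asking the server.
// expiresHeader: value of the Expires response header (an HTTP date in GMT).
// nowMs: current time on the client, in milliseconds since the epoch.
function isFreshByExpires(expiresHeader, nowMs) {
  const expiresMs = Date.parse(expiresHeader);
  // An unparseable date is treated as already expired.
  if (Number.isNaN(expiresMs)) return false;
  return nowMs < expiresMs;
}

const expires = "Wed, 26 Sep 2018 05:00:00 GMT";
console.log(isFreshByExpires(expires, Date.UTC(2018, 8, 26, 4, 59))); // true
console.log(isFreshByExpires(expires, Date.UTC(2018, 8, 26, 5, 1)));  // false
```

Note that `nowMs` comes from the client's own clock, so the result is only as trustworthy as that clock.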

This is a big improvement over the previous approach:

Until the expiration time, the user saves a lot of traffic.
The server no longer has to read the file from disk over and over.
After the cache expires, the browser can obtain the latest a.js.

The disadvantage:

After the cache expires, the server reads a.js from disk and returns it to the browser again, whether or not a.js has actually changed.

The server tells the browser when the resource was last modified

To solve the problem of the previous scheme, the server and browser agree on an extension: every time the server returns a.js, it also tells the browser when a.js was last modified on the server (Last-Modified, in GMT).

The browser requests a.js. (1 KB)
The server returns a.js together with its last modification time (Last-Modified, GMT) and its cache expiration time (Expires, GMT). (10 + 1 = 11 KB)
When the cached a.js expires, the browser requests the server with an If-Modified-Since header whose value is the Last-Modified time it received earlier. (1 KB)
The server compares the If-Modified-Since time in the request header with the current last modification time of a.js on the server:
- If they are the same, it tells the browser to keep using its local cache (304) and does not send a.js again. (1 KB)
- If they differ, it reads a.js from disk and returns it to the browser along with the new Last-Modified and Expires values. (1 + 10 = 11 KB)
- And so on.
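
The server-side comparison in the steps above might look roughly like this. It is a minimal sketch under stated assumptions: `respondByLastModified` and the shape of its return value are hypothetical, and the check uses "not modified since" (less than or equal) rather than strict equality, which also covers the equal case described above.

```javascript
// Server-side check for a conditional request (hypothetical helper).
// ifModifiedSince: value of the If-Modified-Since request header, or undefined.
// fileMtimeMs: last modification time of a.js on the server, in milliseconds.
function respondByLastModified(ifModifiedSince, fileMtimeMs) {
  // HTTP dates carry whole seconds, so drop the milliseconds before comparing.
  const fileSeconds = Math.floor(fileMtimeMs / 1000);
  const lastModified = new Date(fileSeconds * 1000).toUTCString();

  if (ifModifiedSince !== undefined) {
    const sinceMs = Date.parse(ifModifiedSince);
    if (!Number.isNaN(sinceMs) && fileSeconds * 1000 <= sinceMs) {
      // Unchanged: headers only, no file body.
      return { status: 304, headers: { "Last-Modified": lastModified }, bodyKB: 0 };
    }
  }
  // First request, or the file changed: send the full 10 KB file.
  return { status: 200, headers: { "Last-Modified": lastModified }, bodyKB: 10 };
}

const mtime = Date.UTC(2018, 8, 20, 8, 0, 0);
const first = respondByLastModified(undefined, mtime);
const second = respondByLastModified(first.headers["Last-Modified"], mtime);
console.log(first.status, second.status); // 200 304
```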

This scheme improves on the previous one:

After the cache expires, if the server finds that the file has not changed, it does not resend a.js, saving 10 KB of traffic.
After the cache expires, if the server finds that the file has changed, it sends the latest a.js, so the browser gets the up-to-date file.

Disadvantages:

Expires is unreliable: it is compared against the clock on the user's machine, which can be changed at will, so the cache may be used when it should not be.
The Last-Modified time is accurate only to the second.

Second-level accuracy causes two problems:

1. Suppose a.js changes several times within one second and the server marks a.js as no-cache, so the browser asks the server on every access. The server compares the modification time sent by the browser with the current modification time of a.js and finds them identical (both are accurate only to the second), so it answers 304 and the browser keeps using its local copy, even though a.js has in fact changed several times. In this case the browser cannot get the latest a.js.
2. If a.js is saved again on the server so that its modification time changes but its content does not, the times no longer match, and the server sends the whole file again even though nothing has really changed.
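
The first problem is easy to reproduce. In the sketch below (the helper name is made up for illustration), two saves 800 ms apart within the same second produce exactly the same header value, so the server has no way to tell them apart.

```javascript
// Format a modification time the way a Last-Modified header carries it.
// HTTP dates have no sub-second part, so milliseconds are lost here.
function toLastModified(mtimeMs) {
  return new Date(mtimeMs).toUTCString();
}

const firstSave = Date.UTC(2018, 8, 26, 5, 0, 0, 100);  // 05:00:00.100
const secondSave = Date.UTC(2018, 8, 26, 5, 0, 0, 900); // 05:00:00.900

console.log(toLastModified(firstSave) === toLastModified(secondSave)); // true
```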

Keep improving: add relative time control with Cache-Control

To stay compatible with browsers that already implement the scheme above while adding a new one, the server sends not only Expires but also a relative lifetime, Cache-Control: max-age=10 (in seconds). This means the cached copy of a.js may be used for 10 seconds after it was received.

The browser checks Cache-Control first. If Cache-Control (with max-age) is present, it takes precedence and Expires is ignored. If there is no Cache-Control, Expires is used.
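
The precedence rule can be sketched as a small function. This is an illustration only: the function name, the lower-case header object, and the choice to return 0 when neither header is present are assumptions of the sketch.

```javascript
// Returns how long (in seconds) a response may be served from cache,
// preferring Cache-Control: max-age over Expires.
// headers: plain object with lower-case header names (an assumption here).
// responseTimeMs: when the response was received, in ms since the epoch.
function freshnessLifetimeSeconds(headers, responseTimeMs) {
  const cacheControl = headers["cache-control"] || "";
  // Match "max-age=N" as a whole directive, so "s-maxage=N" is not picked up.
  const match = /(?:^|,)\s*max-age\s*=\s*(\d+)/i.exec(cacheControl);
  if (match) return Number(match[1]);

  if (headers["expires"] !== undefined) {
    const expiresMs = Date.parse(headers["expires"]);
    if (Number.isNaN(expiresMs)) return 0;
    return Math.max(0, Math.floor((expiresMs - responseTimeMs) / 1000));
  }
  return 0; // no explicit lifetime: treated as not fresh in this sketch
}

const receivedAt = Date.UTC(2018, 8, 26, 4, 59, 0);
console.log(freshnessLifetimeSeconds(
  { "cache-control": "max-age=10", "expires": "Wed, 26 Sep 2018 05:00:00 GMT" },
  receivedAt
)); // 10
console.log(freshnessLifetimeSeconds(
  { "expires": "Wed, 26 Sep 2018 05:00:00 GMT" },
  receivedAt
)); // 60
```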

Keep improving: compare file content with ETag

To solve the problem that the modification time is accurate only to the second, the server adds an ETag response header. The ETag changes only when the content of a.js changes; if the content stays the same, so does the ETag. You can think of the ETag as a unique ID for the file content. The matching request header If-None-Match is introduced at the same time: when the browser revalidates with the server, it sends If-None-Match, whose value is the ETag the server returned with the previous response for a.js.

The browser requests a.js.
The server returns a.js together with the absolute expiration time (Expires), the relative lifetime (Cache-Control: max-age=10), the Last-Modified time of a.js, and the ETag of a.js.
Within 10 seconds, the browser needs a.js again. It does not contact the server and uses the local cache directly.
At 11 seconds, the browser needs a.js again and requests the server, sending the previous modification time in If-Modified-Since and the previous ETag in If-None-Match.
When the server receives both If-Modified-Since and If-None-Match, it compares If-None-Match with the current ETag of a.js and ignores If-Modified-Since.
If the content of a.js has not changed, the ETag and If-None-Match are identical, and the server tells the browser to keep using its local cache (304).
And so on.
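
The server's decision in the last steps can be sketched like this. It is a simplified illustration: `revalidate` and its argument shapes are hypothetical, and a real If-None-Match header may also hold a list of ETags or `*`, which this sketch does not handle.

```javascript
// Decide between 304 and 200 for a revalidation request (hypothetical helper).
// request: { ifNoneMatch?: string, ifModifiedSince?: string }
// resource: { etag: string, lastModifiedMs: number }
function revalidate(request, resource) {
  if (request.ifNoneMatch !== undefined) {
    // If-None-Match is present: compare ETags and ignore If-Modified-Since.
    return request.ifNoneMatch === resource.etag ? 304 : 200;
  }
  if (request.ifModifiedSince !== undefined) {
    const sinceMs = Date.parse(request.ifModifiedSince);
    const unchanged = !Number.isNaN(sinceMs) &&
      Math.floor(resource.lastModifiedMs / 1000) * 1000 <= sinceMs;
    return unchanged ? 304 : 200;
  }
  return 200; // unconditional request: send the full file
}

const resource = { etag: '"v2"', lastModifiedMs: Date.UTC(2018, 8, 26, 5, 0, 0) };

// Same second, different content: the ETag comparison catches the change.
console.log(revalidate(
  { ifNoneMatch: '"v1"', ifModifiedSince: "Wed, 26 Sep 2018 05:00:00 GMT" },
  resource
)); // 200
```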

Is it over?

Is this the end of it? Yes, that is the whole HTTP caching mechanism, but one problem remains:

The browser cannot find out on its own that a.js has changed on the server.

Expires and Cache-Control only control when the cache expires. Until it does, the browser has no way of knowing whether the resource on the server has changed, because it sends a request to the server only after the cache has expired.

Final plan

Think about how a.js is actually used: we normally type in a URL and load an HTML file, and the HTML file pulls in JS, CSS, images, and other resources.

So let's do something with the HTML.

We do not cache the HTML file, and request the server every time the page is visited. The browser therefore always gets the latest HTML.

When the content of a.js is updated, we change the version number of a.js in the HTML.

On the first visit, the HTML contains:

<script src="http://test.com/a.js?version=0.0.1"></script>

The browser downloads version 0.0.1 of a.js.
The browser visits the HTML again, finds that a.js is still at version 0.0.1, and uses the local cache.
One day a.js changes, and our HTML file changes to:

<script src="http://test.com/a.js?version=0.0.2"></script>

The browser visits the HTML again and finds test.com/a.js?versio…, a URL it has not cached, so it requests the latest a.js from the server.
And so on.

So, by not caching the HTML and changing a resource's URL whenever the content of that resource changes, you solve the problem of resources not being updated in time.

Besides a version number, you can also tell versions apart with an MD5 hash of the content, for example:

<script src="http://test.com/a.[hash value].js"></script>

When you bundle with webpack, this is easy to automate with plugins.
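
To show the idea behind content-hashed file names without depending on any bundler, here is a minimal sketch. It uses FNV-1a, a small non-cryptographic hash, purely as a stand-in for MD5 so that the example needs no imports; both function names are hypothetical, and this is not how webpack itself computes its hashes.

```javascript
// FNV-1a (32-bit) over UTF-16 code units: a stand-in for MD5 in this sketch.
function fnv1aHex(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Build "a.<hash>.js" from "a.js" and the file content.
function hashedFileName(fileName, content) {
  const dot = fileName.lastIndexOf(".");
  const base = dot === -1 ? fileName : fileName.slice(0, dot);
  const ext = dot === -1 ? "" : fileName.slice(dot);
  return `${base}.${fnv1aHex(content)}${ext}`;
}

const before = hashedFileName("a.js", "console.log(1)");
const after = hashedFileName("a.js", "console.log(2)");
console.log(before !== after); // true: new content, new URL, no stale cache hit
```

Because the name depends only on the content, an unchanged file keeps its URL and stays cached, while any edit produces a URL the browser has never seen.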

Other Cache-Control directives

Besides the relative lifetime max-age, Cache-Control accepts the following directives:

public: the resource may be cached by intermediate servers.

When the browser requests the server and the copy cached on the intermediate server has not expired, the intermediate server returns it to the browser without contacting the origin server.

private: the resource must not be cached by intermediate proxy servers; only the browser may cache it.

When the browser requests the server, the intermediate server simply forwards the request to the origin server.

no-cache: the browser may store the resource, but must check with the server before using the cached copy.

Every time the resource is accessed, the browser asks the server; if the file has not changed, the server just tells the browser to keep using its cache (304).

no-store: neither the browser nor intermediate proxy servers may cache the resource.

Every time the resource is accessed, the browser must request the server, and the server returns the complete resource without checking whether the file has changed.

must-revalidate: the resource may be cached, but once it has expired it must be revalidated with the origin server before it is used again.
proxy-revalidate: the same requirement as must-revalidate, but it applies only to intermediate cache servers.
s-maxage: the maximum time an intermediate cache server may cache the resource; for those servers it takes precedence over max-age.
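
To make the directives concrete, here is a minimal sketch that parses a Cache-Control value and answers two questions an intermediate cache would ask. The function names are hypothetical, and the parser is deliberately simple: it splits on commas, so quoted values that contain commas are not supported.

```javascript
// Parse a Cache-Control header value into a plain object,
// e.g. "public, max-age=10" becomes { public: true, "max-age": 10 }.
function parseCacheControl(value) {
  const directives = {};
  for (const part of value.split(",")) {
    const trimmed = part.trim();
    if (trimmed === "") continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) {
      directives[trimmed.toLowerCase()] = true; // flag directive, e.g. no-store
      continue;
    }
    const name = trimmed.slice(0, eq).trim().toLowerCase();
    const raw = trimmed.slice(eq + 1).trim().replace(/^"|"$/g, "");
    directives[name] = /^\d+$/.test(raw) ? Number(raw) : raw;
  }
  return directives;
}

// May an intermediate (shared) cache store this response?
function sharedCacheMayStore(directives) {
  return !directives["no-store"] && !directives["private"];
}

// Lifetime in seconds for an intermediate cache: s-maxage wins over max-age.
function sharedCacheLifetime(directives) {
  if (typeof directives["s-maxage"] === "number") return directives["s-maxage"];
  if (typeof directives["max-age"] === "number") return directives["max-age"];
  return 0;
}

const parsed = parseCacheControl("public, max-age=10, s-maxage=60");
console.log(sharedCacheMayStore(parsed)); // true
console.log(sharedCacheLifetime(parsed)); // 60
```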

Cache-Control gives finer-grained control over caching, including caching by intermediate proxy servers.

That wraps up the article. If you are interested, try these mechanisms out for yourself.