Preface

In this article, you will learn how HTTP caching works, whether no-cache really means "no cache", how pressing Enter in the address bar, F5, and CTRL+F5 differ, and which caching schemes are currently recommended.

This article borrows ideas from several excellent articles; I have collated them here to share.

So let's start.

HTTP cache request and response headers

1. Cache-Control

A request/response header. This cache control field is arguably the highest-level directive governing HTTP caching, including whether to cache at all. It has the following common values:

1.1 no-store: nothing is cached.

1.2 no-cache: the response is cached, but before using the cached copy the browser asks the server whether it is still up to date. It is rather principled: it only uses a cached copy that the server has confirmed is still current.

1.3 max-age=x (in seconds): for x seconds after the response is cached, the browser does not send a request for the resource. This is an HTTP/1.1 attribute, similar to the HTTP/1.0 Expires attribute described below, but with higher priority than Expires.

1.4 s-maxage=x (in seconds): for x seconds after the proxy server caches the response from the origin, it does not request the origin again. It applies only to shared caches such as a CDN (more on this later).

1.5 public: both the client and proxy servers (CDN) can cache the response.

1.6 private: only the client can cache the response.
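
To make the values above concrete, here is a minimal sketch of splitting a Cache-Control value into its directives. The function name parse_cache_control is made up for illustration, and the simple split on commas assumes no directive carries a quoted value that itself contains a comma.

```python
def parse_cache_control(header_value: str) -> dict:
    """Split a Cache-Control value into a {directive: value} dict.

    Directives without a value (no-store, public, ...) map to True.
    Hypothetical helper: it does not handle commas inside quoted values.
    """
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        name = name.strip().lower()
        directives[name] = value.strip().strip('"') if sep else True
    return directives


parsed = parse_cache_control("public, max-age=60, s-maxage=600")
print(parsed)
```

Values are kept as strings here; a caller that needs the number of seconds would convert max-age or s-maxage with int().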

2. Expires

A response header giving the resource's expiration time. It is returned by the server as a date in GMT format. It is an HTTP/1.0 attribute and has lower priority when it coexists with max-age (HTTP/1.1).

3. Last-Modified

A response header: the time the resource was last modified, told to the browser by the server.

4. If-Modified-Since

A request header: the last modification time of the cached resource, told to the server by the browser (actually the Last-Modified value the server gave last time). The server compares it with the resource's current Last-Modified.

5. ETag

A response header: an identifier for the resource, told to the browser by the server.

6. If-None-Match

A request header: the identifier of the cached resource, told to the server by the browser (actually the ETag the server gave last time). The server compares it with the resource's current ETag.

Why use HTTP caching

Suppose we request a file from the server once: the request headers are 1 KB, the response headers are 1 KB, and the requested file is 10 KB.

Traffic of one request: 12 KB

Traffic of 10 requests: 120 KB

Traffic of N requests: 12 x N KB

This is just one hypothetical request. In practice there is more than one file and more than one client, so the problems are obvious:

1. The client requests the server every time, wasting traffic (think of mobile data).

2. The server has to look up the file and serve the download on every request. If the user base is large, the server will be under great pressure.

3. The client can only render the page after each request completes, resulting in a poor user experience.

Can we store the requested file locally and reuse it? That is exactly what HTTP caching is for.

Use HTTP caching

1. Have the server and the browser agree on a file expiration time (Expires, in GMT format)

Daily request dialogue

First request

Browser: Server, server, I need a.js right now. Please find it and send it to me.

Server: You keep coming to me again and again. Fine, here is the file. Let's agree on a time (Expires); don't bother me until the time is up. (Returns a.js and Expires.)

Follow-up requests...

The browser first checks whether the current time is already later than Expires, that is, whether the file has passed the agreed expiration time.

If the time is not up yet, the browser does not send a request and uses the local cache directly.

If the time is up, the browser sends a request and the browser-server dialogue above repeats.
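
The check the browser performs in this scheme can be sketched as follows. This is only an illustration: is_fresh_by_expires is a hypothetical name, and the current time is passed in explicitly so the example does not depend on the machine's clock.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh_by_expires(expires_header: str, now: datetime) -> bool:
    """Return True while the cached copy may be used without a request."""
    # Expires is an HTTP date in GMT, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    expires_at = parsedate_to_datetime(expires_header)
    return now < expires_at


expires = "Wed, 21 Oct 2015 07:28:00 GMT"
before = datetime(2015, 10, 21, 7, 27, 59, tzinfo=timezone.utc)
after = datetime(2015, 10, 21, 7, 28, 1, tzinfo=timezone.utc)
print(is_fresh_by_expires(expires, before), is_fresh_by_expires(expires, after))
```

Because the comparison uses the client's own notion of "now", the result is only as trustworthy as the client's clock.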

Question: Suppose Expires has passed and the browser requests the server again, but a.js has not changed since last time. Is there any way to avoid downloading it again?

2. Have the server and the browser add Last-Modified and If-Modified-Since on top of the file expiration time

Daily request dialogue

First request

Browser: Server, server, I need a.js now. Find it for me, and while you are at it give me an expiration time. I promise not to bother you until the time is up!

Server: Silly kid, OK. I will give you the expiration time, and I will also give you the file's latest modification time, Last-Modified. When Expires is up, the two of us check the file's modification time; if yours still matches, don't bother me for the file. (Returns a.js + Expires + Last-Modified.)

Follow-up requests...

While Expires has not passed, the browser cleverly uses the local cache to avoid being beaten up.

Once Expires has passed, the browser sends the request with the file's last modification time in If-Modified-Since. The server compares If-Modified-Since with Last-Modified.

If If-Modified-Since is not equal to Last-Modified, the file has changed: the server finds the latest a.js and returns it with a new Expires and the new Last-Modified.

If If-Modified-Since is the same as Last-Modified, the server returns status code 304: the file has not been modified, so keep using your local cache.

In a captured request of this kind, the modification time in the request header (If-Modified-Since) matches the one in the response header (Last-Modified), so 304 is returned and the local cache is used.
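
The server side of this comparison can be sketched as below. The function name and the (status, headers) return shape are assumptions made for the example. One deliberate difference from the dialogue: the sketch treats a file whose modification time is earlier than or equal to If-Modified-Since as unchanged, which includes the "same" case described above.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond_if_modified_since(if_modified_since, last_modified: datetime):
    """Return (status, headers) for a request that may carry If-Modified-Since.

    last_modified is the file's modification time on the server (aware, UTC).
    """
    # HTTP dates carry whole seconds only, so drop the microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        if last_modified <= parsedate_to_datetime(if_modified_since):
            return 304, headers  # not modified: the browser keeps its copy
    return 200, headers  # the caller would send the file body with this


first_version = datetime(2020, 1, 1, 12, 0, 0, 500000, tzinfo=timezone.utc)
status, headers = respond_if_modified_since(None, first_version)
print(status, headers["Last-Modified"])
```

Note that a second change made within the same second produces the same Last-Modified value, so this comparison cannot see it.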

Problem: The clock on the browser side can be changed at will, which makes Expires unreliable. Also, Last-Modified is only accurate to the second: if the file is changed again within the same second, Last-Modified does not reflect the change, and the browser will not get the latest file (imagine the extreme case).

3. Have the server and the browser add ETag and If-None-Match on top of Expires + Last-Modified. Also, since Expires is unreliable, we add max-age (one of the Cache-Control values) to take over from it.

Daily request dialogue

First request

Browser: Server, server, you know the drill ~~~~~~

Server: I don't! Here is a.js, along with the expiration time and max-age=60 (in seconds). Take Last-Modified as well, and I am adding ETag, a unique identifier of the file content.

Follow-up requests...

For 60 seconds, the browser does not send a request and uses the local cache directly. (max-age=60 means no request is sent within 60 seconds after the response is successfully cached. This is similar to Expires, but max-age has higher priority than Expires. The differences are explained later.)

After 60 seconds, the browser sends the request with If-Modified-Since and If-None-Match. The server compares If-None-Match with ETag first and does not go on to compare If-Modified-Since with Last-Modified.

If If-None-Match is not equal to ETag, the content of a.js has been modified, and the server returns the latest a.js together with the new ETag, max-age=60, Last-Modified and Expires.

If If-None-Match is equal to ETag, the content of a.js has not changed, and 304 is returned, telling the browser to keep using its local cache.

In a captured request of this kind, the server's ETag is the same as If-None-Match, so it returns status code 304. Although If-Modified-Since and Last-Modified are also present, they are not compared here.
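
The ETag comparison can be sketched like this. How a server derives its ETag is up to the server; hashing the file content with MD5 is only an assumption for this example, and make_etag and respond_if_none_match are hypothetical names.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Derive an identifier from the file content (quoted, as ETag values are)."""
    return '"' + hashlib.md5(content).hexdigest() + '"'


def respond_if_none_match(if_none_match, content: bytes):
    """Return (status, headers, body) for a request that may carry If-None-Match."""
    etag = make_etag(content)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if if_none_match == etag:
        return 304, headers, b""  # unchanged: no body is sent
    return 200, headers, content  # changed or first request: send the file


status, headers, body = respond_if_none_match(None, b"console.log(1)")
print(status, headers["ETag"])
```

Because the identifier depends on the content rather than on a timestamp, two edits within the same second still produce different ETag values.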

Problem: We can now accurately compare the server's file with the locally cached file, but the solutions above share a major flaw: until max-age or Expires runs out, the browser cannot actively detect changes to the file on the server.

4. HTTP cache schemes

1. MD5/hash caching

By not caching the HTML and adding an MD5 or hash identifier to the names of static files, the browser can pick up file changes without waiting for the cache expiration time.

Why do this? What is the principle behind it?

The HTTP caching schemes discussed earlier, whether they compare the file's modification time or its content identifier between server and browser, all rest on the premise that the file path on both sides is exactly the same.

module/js/a-hash1.js and module/js/a-hash2.js are two completely different files. Once the reference becomes module/js/a-hash2.js, the browser simply requests a-hash2.js, because to it these are two unrelated files and there is no cached copy to compare against. This fundamentally solves the problem of the browser not asking the server before the expiration time. So all we need to do is give each modified static file a different MD5 or hash identifier at every release of the project.

Note that caching the HTML file itself is not recommended (if you know a better approach, please leave a comment), so that every time the HTML is loaded and rendered, the browser sees which files have changed.

How do we change the names? One by one, by hand? That is drudgery. For webpack there is the webpack-md5-hash plugin, which helps developers change file identifiers automatically when a project is published.

Since our company uses the FIS3 build tool, here we use FIS3's file fingerprint feature (search for "FIS3 file fingerprint"); the principle is similar.
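
The idea behind such fingerprinting can be sketched in a few lines. This is not how webpack-md5-hash or FIS3 are implemented; fingerprint_name is a hypothetical helper that simply puts a short MD5 of the file content into the file name.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprint_name(path: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the extension: a.js -> a-<hash>.js."""
    digest = hashlib.md5(content).hexdigest()[:length]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}-{digest}{p.suffix}"))


v1 = fingerprint_name("module/js/a.js", b"alert('release 1')")
v2 = fingerprint_name("module/js/a.js", b"alert('release 2')")
print(v1)
print(v2)
```

A changed file gets a new name, so the HTML references a URL the browser has never cached; an unchanged file keeps its name and stays cached.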

2. CDN cache (for background)

When introducing the Cache-Control values at the beginning of the article, I mentioned s-maxage and the proxy server. While sorting out what I knew about HTTP caching, I learned from colleagues that there is another recommended HTTP caching scheme: CDN caching. It is covered here as an extension; for normal caching, the MD5/hash scheme is recommended.

2.1 What is a CDN

To understand CDN caching, we first need to know what a CDN is. A CDN is a content distribution network built on top of the existing network. It relies on edge servers deployed in many locations and on the load balancing, content distribution, and scheduling modules of a central platform, so that users fetch the content they need from a nearby node. This reduces network congestion and improves response speed for user access. The key technologies of a CDN are mainly content storage and distribution (this is the relatively official description).

I once saw a good example, and I will use it to explain the CDN.

Suppose that years ago our city had only one railway station. Every Spring Festival, the whole city had to go to that station to buy tickets; you can imagine the traffic and the demand for tickets. To alleviate this, ticket outlets were opened in different districts of the city, so people could buy tickets at a nearby outlet, and the pressure on the railway station was greatly reduced.

We can call each district's ticket outlet a CDN node, or proxy server. In short, we can think of the CDN as an intermediate station between the browser and the server, which handles a portion of the browser's requests on behalf of the server, thereby reducing the pressure on the origin server.

We can summarize the value of a CDN as follows:

1. A CDN greatly reduces the access pressure on the origin site by splitting the traffic.

2. Without outlets, someone living in a remote area has to travel to the city center every time they buy a ticket. Likewise, a CDN solves the problem of cross-region access and fundamentally speeds up access.

2.2 What is CDN cache

CDN edge nodes cache data, and when the browser sends a request, the CDN evaluates and handles it in place of the origin site.

Daily request dialogue

First request

Browser: Brother server, I need a.js.

Server: (flustered and angry) I have given the file to my little brother, the CDN. From now on, ask the CDN for it, not me. (The server returns a.js to the CDN, which caches it; the CDN returns it to the browser, which also caches it. Cache-Control: public is used for this.)

Follow-up requests...

Browser: Server, my cache time is up. Please compare the files and see whether you need to send me a new one.

CDN node: Stop, stop, what is all the shouting about? My big brother is busy. Hand it over to me; I am handling requests on his behalf.

Case 1: The CDN node's own cached file has not expired, so it returns 304 to the browser and turns the request away.

Case 2: The CDN node finds that its own cached file has expired. To be on the safe side, it sends a request to the server (origin site), retrieves the latest data, and then delivers it to the browser.

In fact, CDN caching has the same problem as HTTP caching: until the CDN cache time expires, the browser's requests are always intercepted and it cannot get the latest file.
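
The two cases, and the staleness problem that comes with them, can be illustrated with a toy model. EdgeNode and origin are invented for this sketch and do not reflect any real CDN's API; time is passed in as a plain number of seconds to keep the example deterministic.

```python
class EdgeNode:
    """A toy CDN node that keeps one cached copy per path for s_maxage seconds."""

    def __init__(self, fetch_from_origin, s_maxage: int):
        self.fetch_from_origin = fetch_from_origin  # callable: path -> bytes
        self.s_maxage = s_maxage
        self.store = {}  # path -> (body, stored_at)

    def get(self, path: str, now: float) -> bytes:
        cached = self.store.get(path)
        if cached is not None and now - cached[1] < self.s_maxage:
            return cached[0]  # case 1: still fresh, the origin is not contacted
        body = self.fetch_from_origin(path)  # case 2: expired or never cached
        self.store[path] = (body, now)
        return body


calls = []


def origin(path: str) -> bytes:
    """Stand-in origin server: every call returns a newer version of the file."""
    calls.append(path)
    return b"version %d" % len(calls)


edge = EdgeNode(origin, s_maxage=600)
first = edge.get("/a.js", now=0)
second = edge.get("/a.js", now=599)
third = edge.get("/a.js", now=600)
print(first, second, third, len(calls))
```

Within the 600 seconds the node keeps answering with its stored copy, even though the stand-in origin would already return a newer version.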

But let's go back to the nature of the HTTP cache problem. First, caching is aimed at static files that are not updated frequently. Second, CDN caching brings the additional advantages of traffic splitting and access acceleration. I asked colleagues about this, and what I was told is that a CDN is similar to a platform: you can log in and refresh the CDN cache manually, which solves the problem that the browser cache cannot be controlled manually.

So that is the end of the two HTTP caching schemes. Next, let's talk about some other issues and concepts of HTTP caching.

5. The impact of browser operations (pressing Enter in the address bar, opening a new window, F5 refresh, CTRL+F5 refresh) on the HTTP cache

When using Fiddler to capture packets and look at the caching of different websites, I found something interesting: most websites set Cache-Control to no-cache.

As I said earlier, no-cache does not mean "do not cache"; the response is cached. However, as a negotiated cache, the browser always asks the server whether its cached copy is up to date. If it is, the browser keeps using it; if not, it requests the latest file.

So is it still necessary to set Expires and max-age? Yes!

When we browse a page for the first time and close it, opening it a second time is still a new-window operation. If a cache time is set, the new window hits the strong cache, which avoids downloading the files again, speeds up page rendering, and improves the user experience.

(I did not think Expires and max-age needed to be set until I used PageSpeed to score my company's website and it still recommended caching, which makes sense.)

It does not matter if the above is unclear. Here we combine explanations from Baidu Baike and other blogs to summarize how different browser operations affect the cache.

1. When the user presses Enter in the address bar, follows a link, goes forward or back, or opens a new window, Expires or max-age takes effect. In other words, the browser checks whether the cached copy has expired and then decides whether to send a request.

2. F5, or the refresh button in the browser's navigation bar, ignores Expires and max-age and forces a request. Last-Modified and ETag still work in this case.

3. CTRL+F5 forces a full reload: no cached files are used and everything is requested and downloaded again, so Expires, max-age, Last-Modified and ETag are all ignored.

But in fact, we rarely press Enter in the address bar or navigate from it, so triggering the cache-time check requires specific operations. As I understand it, taking everything into account, this is why so many sites set Cache-Control to no-cache: checking whether the file is the latest before using the cache is the more reasonable choice.

6. Strong cache vs. negotiated cache (weak cache)

Understanding strong and negotiated caching is easy now that you know how different browser operations affect HTTP caching.

Strong cache: the local cache is used without sending an HTTP request, for example when pressing Enter in the browser address bar or following a link. The strong cache is triggered when Expires or max-age is in effect.

Negotiated cache (weak cache): before using the local cache, the browser negotiates with the server to check whether the cached file is up to date. For example, if Cache-Control: no-cache is set, a request is sent no matter what you do. This type of cache is called a negotiated cache.

7. Difference between max-age and Expires

When I used Fiddler to capture packets, I found that many websites set max-age and Expires at the same time. Why set both?

1. max-age is an HTTP/1.1 attribute and Expires is an HTTP/1.0 attribute. Both are written for backward compatibility; in an HTTP/1.1 environment, max-age takes precedence over Expires.

2. max-age is a relative expiration time, and Expires is an absolute expiration time. With max-age, once the browser has cached the file it only needs to compare the time elapsed since the successful request with max-age. Expires, however, requires the server to return an exact date in GMT format, and the browser has to decide whether the cache has expired based on that date, which is relatively troublesome. That is why max-age exists to take its place.
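
The precedence rule can be sketched as a small function that works out how long a response stays fresh. freshness_lifetime is a hypothetical helper; as a simplification it measures Expires against the time the response was received, which is passed in explicitly.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers: dict, response_time: datetime) -> float:
    """Seconds the response stays fresh; max-age wins over Expires."""
    for part in headers.get("Cache-Control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return float(value)  # relative: no date arithmetic needed
    if "Expires" in headers:
        # Absolute: the lifetime depends on comparing two dates.
        expires_at = parsedate_to_datetime(headers["Expires"])
        return max(0.0, (expires_at - response_time).total_seconds())
    return 0.0


received = datetime(2015, 10, 21, 7, 0, 0, tzinfo=timezone.utc)
both = {"Cache-Control": "max-age=60", "Expires": "Wed, 21 Oct 2015 07:28:00 GMT"}
only_expires = {"Expires": "Wed, 21 Oct 2015 07:28:00 GMT"}
print(freshness_lifetime(both, received), freshness_lifetime(only_expires, received))
```

When both headers are present, the Expires date is never consulted; it only matters for caches that do not understand max-age.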

Similarly, Cache-Control: no-cache and Pragma: no-cache also coexist: the former is HTTP/1.1 and the latter is HTTP/1.0, and both are written together for backward compatibility.

Well, that's about it. With all this in mind, browsers and servers can live happily ever after.