To make it easier to remember, here’s a summary:

Fields in the HTTP headers control the caching mechanism, and strong caching takes precedence over negotiated caching.

Strong caching (Expires and Cache-Control): Expires is the older mechanism and records an absolute expiry time. Cache-Control is the newer one and has many directives; its max-age records a relative time. When a resource is requested, the browser checks whether the cached copy has expired. If it has not, the cache is used, read first from memory and then from disk.

If it has expired, negotiated caching (Last-Modified/If-Modified-Since and ETag/If-None-Match) is performed, and the server decides whether the cached copy can still be used. If negotiated caching fails, the server returns the request result again and the browser stores it in its cache. If it takes effect, the server returns 304 and the cached copy is used.

1. Overview

The browser cache mechanism is also known as the HTTP cache mechanism. It is based on the cache identifiers carried in HTTP messages, so before analyzing it, we first introduce the structure of HTTP messages with the help of figures.

An HTTP request message has the following format: request line, HTTP headers (general headers, request headers, and entity headers), and request body (only some requests, such as POST, carry a body):

An HTTP response message has the following format: status line, HTTP headers (general headers, response headers, and entity headers), and response body, as shown in the following figure:

Note: General headers are the header fields supported by both request and response messages: Cache-Control, Connection, Date, Pragma, Transfer-Encoding, Upgrade, and Via. Entity headers are the fields that describe the entity (the message body): Allow, Content-Base, Content-Encoding, Content-Language, Content-Length, Content-Location, Content-MD5, Content-Range, Content-Type, ETag, Expires, Last-Modified, and extension-header. For ease of understanding, general headers, request/response headers, and entity headers are all referred to here simply as HTTP headers.
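To make the three-part structure concrete, here is a minimal Python sketch that splits a raw HTTP/1.1 response into its status line, headers, and body. The sample message and the function name `split_http_message` are illustrative assumptions, and the parser deliberately ignores details such as repeated header fields and chunked bodies.

```python
def split_http_message(raw: str):
    """Split a raw HTTP message into (start line, headers, body).

    Simplified sketch: header names are lower-cased and a repeated
    header overwrites the earlier one.
    """
    # A blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


raw_response = (
    "HTTP/1.1 200 OK\r\n"                      # status line
    "Date: Mon, 01 Jan 2024 00:00:00 GMT\r\n"  # general header
    "Cache-Control: max-age=600\r\n"           # general header (caching rule)
    "Content-Type: text/css\r\n"               # entity header
    "\r\n"
    "body { color: red; }"                     # response body
)

status_line, headers, body = split_http_message(raw_response)
print(status_line)               # HTTP/1.1 200 OK
print(headers["cache-control"])  # max-age=600
```

The caching rules discussed below (Cache-Control, Expires, ETag, Last-Modified) all live in the header section returned here.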

If browser caching is mishandled, the server-side code may be updated while users still see the old page. Therefore, we should define a sensible caching strategy for each resource in the project according to its actual situation.

Advantages of caching:

  • Reduce redundant data transmission and save network costs
  • Reduce server load and improve site performance
  • Speed up page loading on the client

2. Cache process analysis

The browser and the server communicate in request-response mode: the browser sends an HTTP request and the server responds to it. After the browser sends a request to the server for the first time and gets the result, it decides whether to cache the result according to the cache identifiers in the HTTP headers of the response. If so, the result and its cache identifiers are stored in the browser cache. The simplified process is as follows:

First HTTP request. From the figure above, we can see that:

  • Each time the browser sends a request, it first looks up the result of that request and its cache identifiers in the browser cache
  • Each time the browser receives a response, it stores the result and its cache identifiers in the browser cache
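The two rules above can be sketched as a tiny in-memory store in Python. The class names `CacheEntry` and `BrowserCache` are hypothetical, headers are assumed to be lower-cased, and the only "whether to cache" decision modeled is honoring `no-store`; a real browser cache applies many more rules.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CacheEntry:
    body: bytes
    headers: Dict[str, str]  # cache identifiers, e.g. cache-control, etag


@dataclass
class BrowserCache:
    entries: Dict[str, CacheEntry] = field(default_factory=dict)

    def lookup(self, url: str) -> Optional[CacheEntry]:
        # Rule 1: before sending a request, look for a stored result
        # and its cache identifiers.
        return self.entries.get(url)

    def store(self, url: str, body: bytes, headers: Dict[str, str]) -> None:
        # Rule 2: after receiving a response, store the result together
        # with its cache identifiers, unless the response forbids it.
        if "no-store" in headers.get("cache-control", "").lower():
            return
        self.entries[url] = CacheEntry(body, headers)


cache = BrowserCache()
cache.store("/app.css", b"body{}", {"cache-control": "max-age=600"})
cache.store("/index.html", b"<html>", {"cache-control": "no-store"})
print(cache.lookup("/app.css") is not None)  # True
print(cache.lookup("/index.html") is None)   # True
```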

These two conclusions are the key to the browser cache mechanism: they ensure that every request result is stored in and read from the cache. Once we understand the browser's caching rules, the related problems become easy to reason about. To make this easier to follow, we divide the caching process into two parts, strong (forced) caching and negotiated caching, depending on whether the HTTP request needs to be sent to the server again.

3. Cache rules

Strong and negotiated caching

Strong cache

Strong (forced) caching is the process of looking up the request result in the browser cache and deciding, based on that result's caching rules, whether to use it. There are three main cases (negotiated caching is set aside for now):

  1. If neither the cached result nor its cache identifier exists, strong caching does not apply and the request is sent directly to the server (just like the first request), as shown below:

  2. If the cached result and cache identifier exist but the result has expired, strong caching fails and negotiated caching is used (not analyzed for now), as shown below:

  3. If the cached result and cache identifier exist and the result has not expired, strong caching takes effect and the cached result is returned directly, as shown in the following figure:
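The three cases can be written as a small decision function. This Python sketch assumes a hypothetical cache record holding the client time at which the response was stored and a freshness lifetime in seconds (for example, taken from max-age); it is not how any particular browser stores entries.

```python
from typing import Optional, TypedDict


class Entry(TypedDict):
    stored_at: float  # client time (seconds) when the response was cached
    max_age: int      # freshness lifetime in seconds, e.g. from max-age


def strong_cache_case(entry: Optional[Entry], now: float) -> str:
    """Map a cache lookup onto the three strong-caching cases."""
    if entry is None:
        # Case 1: nothing cached, so behave like the first request.
        return "request server"
    if now - entry["stored_at"] >= entry["max_age"]:
        # Case 2: cached but expired, so fall through to negotiated caching.
        return "negotiated cache"
    # Case 3: cached and still fresh, so no request is sent at all.
    return "use cache"


entry: Entry = {"stored_at": 1000.0, "max_age": 600}
print(strong_cache_case(None, 1000.0))   # request server
print(strong_cache_case(entry, 1300.0))  # use cache
print(strong_cache_case(entry, 1700.0))  # negotiated cache
```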

So what are the caching rules for strong caching?

When the browser sends a request, the server returns the caching rules in the HTTP headers of the response, together with the request result. The fields that control strong caching are Expires and Cache-Control, and Cache-Control has a higher priority than Expires.

Expires

Expires is the HTTP/1.0 field for controlling web caching. Its value is the time, set by the server, at which the cached result expires. That is, when the client sends the request again, if the current time is earlier than the Expires value, the cached result is used directly.

Expires is an HTTP/1.0 field, but browsers now use HTTP/1.1 by default. Is web caching still controlled by Expires in HTTP/1.1?

In HTTP/1.1, Expires was superseded by Cache-Control. The reason is that Expires works by comparing the client's clock with the time returned by the server. If the client and server clocks disagree for some reason (for example, one side's clock is inaccurate), strong caching breaks down, which defeats its purpose. So how does Cache-Control handle this?
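The following Python sketch shows why clock differences undermine Expires: the same absolute Expires value gives opposite answers on an accurate client clock and on one running two hours fast. The dates and the helper name `fresh_by_expires` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def fresh_by_expires(expires_header: str, client_now: datetime) -> bool:
    """HTTP/1.0-style check: fresh while the client's clock is before Expires."""
    return client_now < parsedate_to_datetime(expires_header)


# Server says: this response expires at 12:00 GMT.
expires = "Mon, 01 Jan 2024 12:00:00 GMT"
true_time = datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)  # it is really 11:00

accurate_clock = true_time
fast_clock = true_time + timedelta(hours=2)  # client clock runs 2 h ahead

print(fresh_by_expires(expires, accurate_clock))  # True: cache is used
print(fresh_by_expires(expires, fast_clock))      # False: treated as expired
```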

Cache-Control

In HTTP/1.1, Cache-Control is the most important rule for controlling web caching. Its main values are:

  • public: the response may be cached by any cache, both the client and proxies
  • no-cache: the client may cache the content, but whether to use it must be verified through negotiated caching (important)
  • no-store: nothing is cached; neither strong caching nor negotiated caching is used
  • max-age=xxx (xxx is a number): the cached content expires after xxx seconds (important)

Note: Several directives can be combined in one header.
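Because several directives can appear together, a client has to split the header into individual directives. This is a simplified Python sketch (the name `parse_cache_control` is hypothetical, and quoted arguments that contain commas are not handled):

```python
from typing import Dict, Union


def parse_cache_control(value: str) -> Dict[str, Union[bool, int, str]]:
    """Parse a Cache-Control header into {directive: argument}."""
    directives: Dict[str, Union[bool, int, str]] = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        name = name.strip().lower()
        if not name:
            continue  # skip empty pieces such as a trailing comma
        arg = arg.strip().strip('"')
        if not sep:
            directives[name] = True      # e.g. public, no-cache, no-store
        elif arg.isdigit():
            directives[name] = int(arg)  # e.g. max-age=600
        else:
            directives[name] = arg
    return directives


print(parse_cache_control("public, max-age=600"))
# {'public': True, 'max-age': 600}
print(parse_cache_control("no-cache, no-store"))
# {'no-cache': True, 'no-store': True}
```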

Note: max-age=0 behaves much like no-cache in practice: the cached copy is treated as stale immediately, so the browser revalidates it with the server (negotiated caching) before using it.

Let’s go straight to an example, as follows:

From the above example we can know:

  • The Expires value in the HTTP response is an absolute time
  • The Cache-Control value in the HTTP response is max-age=600, a relative time counted from when the response was received

Because Cache-Control takes priority over Expires, caching is governed directly by the Cache-Control value: if the request is made again within 600 seconds, the cached result is used (strong caching).

Note: When you cannot be sure that the client's clock is synchronized with the server's, Cache-Control is a better choice than Expires. When both are present, only Cache-Control takes effect.
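The priority rule can be sketched in Python as follows: if a usable max-age is present, Expires is never consulted. The function name `is_fresh` and the sample headers are assumptions; headers are lower-cased, both times are timezone-aware readings of the client's own clock, and heuristic freshness (which real caches may apply when neither field is present) is left out.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Dict


def is_fresh(headers: Dict[str, str], received_at: datetime, now: datetime) -> bool:
    """Strong-cache freshness check where Cache-Control max-age beats Expires."""
    for directive in headers.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.strip().isdigit():
            # Relative lifetime: only time elapsed on the client's clock
            # matters, and Expires is ignored entirely.
            return now - received_at < timedelta(seconds=int(value.strip()))
    if "expires" in headers:
        # Absolute time compared with the client's clock, so skew matters.
        return now < parsedate_to_datetime(headers["expires"])
    return False  # simplified: no strong-cache rule, go to negotiated caching


received = datetime(2024, 6, 1, 8, 0, tzinfo=timezone.utc)
both = {
    "cache-control": "max-age=600",
    "expires": "Mon, 01 Jan 2024 00:00:00 GMT",  # already in the past
}
print(is_fresh(both, received, received + timedelta(seconds=300)))  # True
print(is_fresh(both, received, received + timedelta(seconds=700)))  # False
```

Even though the Expires date in this example is long past, the response is still served from cache for 600 seconds because max-age wins.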

Now that we understand the process of strong caching, let's look at it more broadly:

Where is the browser cache stored, and how can we tell whether strong caching has taken effect in the browser?

In the figure above, requests whose status code is shown in gray were served by strong caching. The Size column of such a request shows where the cache was read from: from memory cache or from disk cache.

What do from memory cache and from disk cache mean? When is the disk cache used, and when is the memory cache used?

From memory cache means the cache in memory was used, and from disk cache means the cache on the hard disk was used. The browser reads the cache in the order memory, then disk.

Next, let's analyze in detail how cached resources are read. Here is an example:

Visit page -> 200 -> Close tab -> Reopen page -> 200 (from disk cache) -> Refresh -> 200 (from memory cache)

The process is as follows:

  • Visit the page
  • Close the tab
  • Reopen the page
  • Refresh

In the last step, the refresh, some resources come from disk cache and some from memory cache.

To explain this, we need to understand from memory cache and from disk cache:

  • From memory cache: the memory cache has two characteristics, fast reads and a limited lifetime:
    • Fast reads: the memory cache stores compiled and parsed files directly in the process's memory. This occupies some of the process's memory but lets the files be read quickly the next time.
    • Limited lifetime: once the process is closed, its memory is cleared.
  • From disk cache: the cache is written directly to files on disk. Reading it requires I/O on those files and re-parsing the cached content, which is more complex and slower than reading from the memory cache.

In this example, JS and image files are stored in the memory cache after being parsed, so a refresh can read them directly from memory. CSS files are stored in files on disk, so each time the page is rendered they are read from the disk cache.
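The visit, close, reopen, and refresh sequence above can be replayed with a toy two-tier model in Python. The class `TwoTierCache` is hypothetical: it assumes a fetched resource is written to both tiers and that a disk hit is copied back into memory, which is a simplification of whatever heuristics a real browser uses.

```python
from typing import Dict, Optional, Tuple


class TwoTierCache:
    """Toy model of 'from memory cache' vs 'from disk cache'."""

    def __init__(self) -> None:
        self.memory: Dict[str, bytes] = {}  # emptied when the tab/process closes
        self.disk: Dict[str, bytes] = {}    # survives closing the tab

    def get(self, url: str) -> Tuple[Optional[bytes], str]:
        if url in self.memory:              # memory is checked first
            return self.memory[url], "from memory cache"
        if url in self.disk:                # then disk
            body = self.disk[url]
            self.memory[url] = body         # hypothetical promotion to memory
            return body, "from disk cache"
        return None, "network"

    def put(self, url: str, body: bytes) -> None:
        self.memory[url] = body
        self.disk[url] = body

    def close_tab(self) -> None:
        self.memory.clear()                 # limited lifetime: memory is emptied


c = TwoTierCache()
print(c.get("/app.js")[1])  # network (first visit, 200)
c.put("/app.js", b"...")
c.close_tab()
print(c.get("/app.js")[1])  # from disk cache (reopen page)
print(c.get("/app.js")[1])  # from memory cache (refresh)
```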

Negotiated cache

Negotiated caching is the process in which, after strong caching fails, the browser sends a request to the server carrying the cache identifiers, and the server decides based on them whether the cache can be used. There are two main cases:

Negotiated caching takes effect and 304 is returned, as follows:

Negotiated caching fails, and 200 is returned together with the request result, as follows:

Likewise, the identifiers for negotiated caching are returned to the browser in the HTTP headers of the response, together with the request result. The fields that control negotiated caching are Last-Modified/If-Modified-Since and ETag/If-None-Match, and ETag/If-None-Match has a higher priority than Last-Modified/If-Modified-Since.

Last-Modified / If-Modified-Since

Last-Modified is returned by the server in its response and gives the time at which the resource file was last modified on the server, as follows:

If-Modified-Since is sent when the client requests the resource again: it carries the Last-Modified value from the previous response, telling the server when the copy the client holds was last modified. When the server receives a request containing If-Modified-Since, it compares that value with the resource's last modification time on the server. If the resource on the server was modified later than the If-Modified-Since value, the server returns the resource with status code 200. Otherwise it returns 304, indicating that the resource has not changed and the cached file can still be used, as follows:
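On the server side, the comparison just described can be sketched in Python with the standard `email.utils` date helpers. The function name `check_if_modified_since` is hypothetical; request headers are assumed to be lower-cased and the modification time to be a UTC-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict


def check_if_modified_since(request_headers: Dict[str, str], last_modified: datetime) -> int:
    """Return 200 or 304 using Last-Modified / If-Modified-Since."""
    ims = request_headers.get("if-modified-since")
    if ims is None:
        return 200  # no validator: send the full resource
    # HTTP dates have one-second resolution, so drop sub-second detail first.
    if last_modified.replace(microsecond=0) > parsedate_to_datetime(ims):
        return 200  # modified after the client's copy: send the new resource
    return 304      # not modified: the client keeps using its cached copy


mtime = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
last_modified_header = format_datetime(mtime, usegmt=True)  # sent as Last-Modified
print(last_modified_header)  # Mon, 01 Jan 2024 12:00:00 GMT
print(check_if_modified_since({"if-modified-since": last_modified_header}, mtime))  # 304
newer = datetime(2024, 1, 2, 9, 30, tzinfo=timezone.utc)
print(check_if_modified_since({"if-modified-since": last_modified_header}, newer))  # 200
```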

ETag / If-None-Match

ETag is a unique identifier for the current version of the resource file, generated by the server and returned in its response, as follows:

If-None-Match is sent when the client requests the resource again: it carries the ETag value returned in the previous response, telling the server which version the client holds. When the server receives a request containing If-None-Match, it compares that value with the resource's current ETag on the server. If they match, it returns 304, indicating that the resource has not changed. If not, it returns the resource file with status code 200, as shown in the following figure.

Note: ETag/If-None-Match has a higher priority than Last-Modified/If-Modified-Since. If both are present, only ETag/If-None-Match takes effect.
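This Python sketch shows the priority rule on the server side: when If-None-Match is present, If-Modified-Since is never consulted. Deriving the ETag from a hash of the content is an assumption (servers compute ETags in different ways), the date comparison is a simplified exact match, and real If-None-Match values can also be lists, `*`, or weak tags, which are not handled here. All names are hypothetical.

```python
import hashlib
from typing import Dict


def make_etag(content: bytes) -> str:
    # Assumption: a strong ETag derived from a hash of the content.
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def conditional_status(request_headers: Dict[str, str], content: bytes,
                       last_modified: str) -> int:
    """Return 200 or 304; If-None-Match takes priority over If-Modified-Since."""
    inm = request_headers.get("if-none-match")
    if inm is not None:
        # ETag wins: If-Modified-Since is not consulted at all.
        return 304 if inm == make_etag(content) else 200
    ims = request_headers.get("if-modified-since")
    if ims is not None:
        # Simplified exact-match date comparison.
        return 304 if ims == last_modified else 200
    return 200


v1 = b"console.log('v1')"
v2 = b"console.log('v2')"
lm = "Mon, 01 Jan 2024 12:00:00 GMT"
req = {"if-none-match": make_etag(v1), "if-modified-since": lm}
print(conditional_status(req, v1, lm))  # 304: ETag matches
print(conditional_status(req, v2, lm))  # 200: ETag differs, although the date matches
```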

4. Other

  • How to configure resource caching rules

Caching rules can be configured in the backend server code or in Nginx.

In a Java backend, add the following code:

  • Why ETag is needed

You might think that Last-Modified alone is enough to let the browser know whether its local cached copy is up to date, so why ETag? ETag was introduced in HTTP/1.1 mainly to address several problems that Last-Modified cannot handle well:

Some files may be rewritten periodically without their content changing (only the modification time changes). In that case we do not want the client to think the file has been modified and GET it again;

Some files are modified very frequently, for example several times within one second, but If-Modified-Since can only detect changes at one-second granularity, so such changes are missed (on some systems the recorded file mtime is itself only accurate to the second).

Some servers do not know exactly when a file was last modified.
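The first two problems can be demonstrated in Python: Last-Modified changes when a file is rewritten with identical content, and stays the same for two edits within one second, while a content-based ETag behaves correctly in both cases. The helper names and the content-hash ETag are illustrative assumptions.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime


def last_modified_header(mtime: float) -> str:
    # HTTP dates have one-second resolution, so the fractional part is lost.
    return format_datetime(datetime.fromtimestamp(int(mtime), tz=timezone.utc), usegmt=True)


def etag_header(content: bytes) -> str:
    # Hypothetical content-hash ETag.
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


# Problem 1: the file is rewritten a day later with identical content.
print(last_modified_header(1_700_000_000) != last_modified_header(1_700_086_400))  # True: looks modified
print(etag_header(b"same") == etag_header(b"same"))  # True: ETag says unchanged

# Problem 2: two edits within the same second.
print(last_modified_header(1_700_000_000.2) == last_modified_header(1_700_000_000.7))  # True: change missed
print(etag_header(b"edit 1") != etag_header(b"edit 2"))  # True: ETag catches it
```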

  • The differences between strong caching and negotiated caching are summarized in the following table:

  • The impact of user behavior on caching

In short, F5 (refresh) skips the strong caching rules and goes straight to negotiated caching, while Ctrl+F5 (hard refresh) skips all caching rules and fetches the resource as if it were the first request.
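One way to see the difference is through the request headers each action produces. This Python function is a simplified, hypothetical model of commonly observed browser behavior: a normal reload sends `Cache-Control: max-age=0` together with the stored validators, and a hard reload sends `Cache-Control: no-cache` and `Pragma: no-cache` without validators. Exact behavior varies between browsers and versions.

```python
from typing import Dict


def reload_request_headers(action: str, cached: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical model of the request headers sent on reload.

    cached holds the validators stored with the cached copy
    ("etag", "last-modified").
    """
    headers: Dict[str, str] = {}
    if action == "F5":
        # Skip strong caching but allow negotiated caching (304).
        headers["cache-control"] = "max-age=0"
        if "etag" in cached:
            headers["if-none-match"] = cached["etag"]
        if "last-modified" in cached:
            headers["if-modified-since"] = cached["last-modified"]
    elif action == "Ctrl+F5":
        # Skip all caching: without validators the server cannot answer 304.
        headers["cache-control"] = "no-cache"
        headers["pragma"] = "no-cache"
    return headers


cached = {"etag": '"abc"', "last-modified": "Mon, 01 Jan 2024 12:00:00 GMT"}
print(reload_request_headers("F5", cached))
print(reload_request_headers("Ctrl+F5", cached))
```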

  • Project caching strategy

For Vue projects, the scaffolding already adds a content hash to the names of changed files, so ordinary JS and CSS files need no special handling. For index.html, however, we need to configure no-store in Nginx, that is, not cache index.html at all. Otherwise a cached index.html would still link to the old CSS.
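The per-resource decision behind this strategy can be sketched as a Python function. In the setup described above the rules would live in the Nginx configuration; this sketch only models the choice. The hash pattern (eight or more hex characters before `.js` or `.css`, as in `app.3f2a9c1d.js`), the one-year max-age, and the `no-cache` default for other files are all assumptions.

```python
import re

# Assumption: build output names hashed assets like "app.3f2a9c1d.js".
HASHED_ASSET = re.compile(r"\.[0-9a-f]{8,}\.(?:js|css)$")


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value per resource (illustrative policy)."""
    if path == "/" or path.endswith("/index.html"):
        # Entry page: never cache, so it always references the latest hashed files.
        return "no-store"
    if HASHED_ASSET.search(path):
        # A content change produces a new file name, so a long max-age is safe.
        return "public, max-age=31536000"
    # Anything else: cache, but revalidate on every use.
    return "no-cache"


print(cache_control_for("/index.html"))          # no-store
print(cache_control_for("/js/app.3f2a9c1d.js"))  # public, max-age=31536000
print(cache_control_for("/favicon.ico"))         # no-cache
```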

Conclusion

Strong caching takes precedence over negotiated caching. If strong caching (Expires and Cache-Control) is in effect, the cache is used directly. If not, negotiated caching (Last-Modified/If-Modified-Since and ETag/If-None-Match) is performed, and the server decides whether the cache can be used. If negotiated caching fails, the cached copy is invalid, so the request result is fetched again and stored in the browser cache. If it takes effect, 304 is returned and the cache continues to be used. The main process is as follows:
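The whole flow can be tied together in one small Python sketch. Everything here is a hypothetical model: the server is a plain function returning `(status, body, etag, max_age)`, only ETag validation is modeled, and time is a number of seconds on the client's clock.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical server: takes request headers, returns (status, body, etag, max_age).
Server = Callable[[Dict[str, str]], Tuple[int, str, str, int]]


@dataclass
class Entry:
    body: str
    etag: str
    stored_at: float
    max_age: int


def fetch(url: str, cache: Dict[str, Entry], server: Server, now: float) -> Tuple[str, str]:
    entry = cache.get(url)
    # 1. Strong cache: a fresh copy is used without contacting the server.
    if entry is not None and now - entry.stored_at < entry.max_age:
        return "strong cache", entry.body
    # 2. Negotiated cache: send the validator and let the server decide.
    request = {"if-none-match": entry.etag} if entry is not None else {}
    status, body, etag, max_age = server(request)
    if status == 304 and entry is not None:
        entry.stored_at, entry.max_age = now, max_age  # 304 renews freshness
        return "304 negotiated cache", entry.body
    # 3. Cache miss or changed resource: store the new result.
    cache[url] = Entry(body, etag, now, max_age)
    return "200 from server", body


def demo_server(request: Dict[str, str]) -> Tuple[int, str, str, int]:
    current_etag = '"v1"'
    if request.get("if-none-match") == current_etag:
        return 304, "", current_etag, 60
    return 200, "hello", current_etag, 60


cache: Dict[str, Entry] = {}
print(fetch("/a", cache, demo_server, now=0))    # ('200 from server', 'hello')
print(fetch("/a", cache, demo_server, now=30))   # ('strong cache', 'hello')
print(fetch("/a", cache, demo_server, now=100))  # ('304 negotiated cache', 'hello')
```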