Reproduced from

An in-depth understanding of the browser caching mechanism

Why is HTTP caching so confusing

See examples:

Explaining the Cache-Control header and how it is applied in HTTP-based services

1. Private browser caches and shared proxy caches

Caching is not easy to understand, and much of the information I found online is vague, for example on the difference between private browser caches and shared proxy caches.

With a browser cache, the architecture looks like this:

browser (cache) <=> server

With a proxy cache, the architecture looks like this:

browser <=> CDN (cache) <=> origin server

Different HTTP caches solve different problems and suit different scenarios. As I understand it, a browser cache mainly avoids unnecessary requests and heavy network traffic, while a proxy cache brings content closer to the user so it can be served more efficiently (and, of course, also reduces requests and network traffic).

For web developers, the browser cache is probably the most familiar: resources are stored when we visit a site, and on the next visit the page may be loaded from that cache, which can greatly speed up access.

2. Correctly understanding the Cache-Control header

Cache-Control is a general header field: it can be used in both requests and responses. It also accepts several directives, and the same directive can mean different things in a request and in a response. For example, what does max-age=0 mean in each case? It is important to keep the two apart.
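As for max-age=0: in a response it means the response is stale as soon as it is received, so it normally has to be revalidated before reuse; in a request it means the client will not accept a cached response older than zero seconds, which forces revalidation. The syntax is the same on both sides: a comma-separated list of directives, some with an argument. The sketch below uses a hypothetical helper, `parse_cache_control`, to split such a value into a dictionary; it is simplified and does not handle quoted arguments that contain commas.

```python
from __future__ import annotations


def parse_cache_control(value: str) -> dict[str, str | None]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: dict[str, str | None] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty items such as "a,,b"
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; normalize to lowercase.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


# Same syntax in a request and a response; only the meaning differs.
print(parse_cache_control("max-age=0"))                 # {'max-age': '0'}
print(parse_cache_control("Public, max-age=31536000"))  # {'public': None, 'max-age': '31536000'}
```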

3. Understanding the Cache-Control header further

In responses, its directives answer three questions:

(1) Cacheability (for response)

private: the response may be stored only in the browser's own cache.
public: the response may be stored in the browser and in shared caches such as a CDN.
no-cache: a confusing name. It does not mean the cache cannot be used; it means a cached copy must be revalidated with the server before it is used.
no-store: the response must not be cached at all.
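The difference between these directives is easiest to see from the point of view of a cache deciding whether it may keep a copy. A minimal sketch, assuming the directives have already been parsed into a lowercase dictionary; the function name `may_store` is hypothetical, and real caches apply further rules (status code, Authorization header, and so on) that are omitted here.

```python
from __future__ import annotations


def may_store(directives: dict[str, str | None], shared_cache: bool) -> bool:
    """Can this cache keep a copy of the response? (simplified sketch)

    directives: lowercased Cache-Control directives, e.g. {"private": None}.
    shared_cache: True for a proxy/CDN, False for the browser's own cache.
    """
    if "no-store" in directives:
        return False  # no cache of any kind may keep a copy
    if "private" in directives and shared_cache:
        return False  # only the user's own browser may keep it
    # Note: "no-cache" does NOT forbid storing; it only forces revalidation.
    return True


print(may_store({"private": None, "max-age": "600"}, shared_cache=True))   # False
print(may_store({"private": None, "max-age": "600"}, shared_cache=False))  # True
print(may_store({"no-cache": None}, shared_cache=True))                    # True
print(may_store({"no-store": None}, shared_cache=False))                   # False
```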

(2) How long to cache (for response)

max-age=<seconds>: tells the browser how many seconds the cached copy remains valid (fresh).
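max-age is a relative lifetime: the copy stays fresh while its age, counted roughly from when the response was received, is below the given number of seconds. A minimal sketch with a hypothetical `is_fresh` helper; timestamps are passed in explicitly so the result is deterministic.

```python
def is_fresh(stored_at: float, max_age: int, now: float) -> bool:
    """True while the cached copy's age is below max-age (all in seconds).

    Simplified: real caches also add the Age header and network delay
    when computing the current age of a response.
    """
    age = now - stored_at
    return age < max_age


# A response cached at t=1000 with Cache-Control: max-age=60
print(is_fresh(1000, 60, now=1030))  # True: 30 s old, still fresh
print(is_fresh(1000, 60, now=1060))  # False: 60 s old, now stale
```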

(3) Revalidation (for response; a conditional check)

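must-revalidate: once the cached copy has become stale (for example, after max-age has elapsed), the browser must check with the server that it is still valid before using it, and may not fall back to the stale copy. Combined with max-age=0 (Cache-Control: max-age=0, must-revalidate), it has roughly the same effect as a request that sends max-age=0: the cached copy is checked with the server on every use.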
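must-revalidate only matters once the copy is stale: it forbids serving that stale copy without a successful check, even when the server cannot be reached (RFC 9111 says the cache should then return an error, typically 504 Gateway Timeout). A simplified sketch of that decision, with a hypothetical `on_cache_lookup` function whose string results stand in for real cache behavior:

```python
def on_cache_lookup(directives: dict, fresh: bool, server_reachable: bool) -> str:
    """What a cache may do with a stored response (simplified sketch).

    directives: lowercased Cache-Control directives of the stored response.
    """
    if fresh and "no-cache" not in directives:
        return "use cached copy"  # no request needed
    if server_reachable:
        return "revalidate with server"  # send a conditional request
    # Stale (or no-cache) and the server cannot be reached:
    if "must-revalidate" in directives or "no-cache" in directives:
        return "error (504)"  # the stale copy must NOT be served
    return "may serve stale copy"  # allowed for some disconnected caches


print(on_cache_lookup({"max-age": "60"}, fresh=True, server_reachable=True))
# use cached copy
print(on_cache_lookup({"max-age": "60", "must-revalidate": None},
                      fresh=False, server_reachable=False))
# error (504)
print(on_cache_lookup({"max-age": "60"}, fresh=False, server_reachable=False))
# may serve stale copy
```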

In short, this header tells the browser whether it may cache the object, how long it may cache it, and whether it must check with the server every time it uses the cached copy.

The whole decision can be summarized as a series of questions:

  • Is this resource cacheable?
  • Does the client need to check with the server every time it uses the cache?
  • Is the cache Public or Private?
  • What is the cache time?
  • What is the resource identifier (ETag)?
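The questions in this list can be turned directly into a header value. A minimal sketch with a hypothetical `build_cache_control` function; the ETag is a separate response header and is left out here.

```python
from __future__ import annotations


def build_cache_control(reusable: bool, revalidate_every_time: bool,
                        shared_caches_ok: bool, max_age: int | None) -> str:
    """Walk the checklist and produce a Cache-Control value."""
    if not reusable:
        return "no-store"  # not cacheable at all
    parts = ["public" if shared_caches_ok else "private"]
    if revalidate_every_time:
        parts.append("no-cache")  # check with the server on every use
    elif max_age is not None:
        parts.append(f"max-age={max_age}")  # cache lifetime in seconds
    return ", ".join(parts)


print(build_cache_control(False, False, False, None))    # no-store
print(build_cache_control(True, True, False, None))      # private, no-cache
print(build_cache_control(True, False, True, 31536000))  # public, max-age=31536000
```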

4. Expires vs. Cache-Control

Expires comes from HTTP/1.0 and gives an absolute expiry date, while Cache-Control comes from HTTP/1.1. If both are present, Cache-Control takes precedence over Expires. Expires can still be useful in environments that do not support HTTP/1.1, so today it is an outdated header that survives mainly for compatibility.
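A minimal sketch of the precedence rule, with a hypothetical `freshness_lifetime` helper: a max-age directive wins; otherwise the lifetime is the difference between the Expires and Date headers. It assumes headers are given with their usual capitalization and treats an unparsable Expires value (such as "0") as already expired.

```python
from __future__ import annotations

from email.utils import parsedate_to_datetime


def freshness_lifetime(headers: dict[str, str]) -> float | None:
    """Seconds a response stays fresh; Cache-Control max-age wins over Expires."""
    for part in headers.get("Cache-Control", "").split(","):
        name, sep, arg = part.strip().partition("=")
        if name.lower() == "max-age" and sep:
            return float(int(arg))  # HTTP/1.1: relative lifetime
    if "Expires" in headers and "Date" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
            date = parsedate_to_datetime(headers["Date"])
        except (TypeError, ValueError):
            return 0.0  # invalid Expires means "already expired"
        return (expires - date).total_seconds()  # HTTP/1.0: absolute date
    return None  # no explicit lifetime


h = {"Date": "Fri, 22 Jul 2016 01:47:00 GMT",
     "Expires": "Fri, 22 Jul 2016 02:47:00 GMT"}
print(freshness_lifetime(h))                                     # 3600.0
print(freshness_lifetime({**h, "Cache-Control": "max-age=60"}))  # 60.0
```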

5. Negotiated cache

The negotiated cache is the process in which, after the strong cache has expired, the browser sends a request to the server carrying the cache identifier, and the server uses that identifier to decide whether the cached copy can still be used. There are two main outcomes:

  • The negotiated cache takes effect: the server returns 304 Not Modified.

  • The negotiated cache does not take effect: the server returns 200 with the requested resource.

Negotiated caching can be implemented by setting two HTTP headers: Last-Modified and ETag.

1. Last-Modified and If-Modified-Since

When the browser requests a resource for the first time, the server adds a Last-Modified header to the response. Its value is the time the resource was last modified on the server. After receiving the resource, the browser caches both the file and its headers:

Last-Modified: Fri, 22 Jul 2016 01:47:00 GMT

The next time the browser requests the resource, it finds the stored Last-Modified header and adds an If-Modified-Since request header carrying that Last-Modified value. When the server receives the request, it compares the If-Modified-Since value with the resource's last modification time on the server. If the resource has not changed, the server returns 304 with an empty response body, and the browser reads the resource directly from its cache. If the If-Modified-Since time is earlier than the resource's last modification time on the server, the file has been updated, and the server returns 200 with the new file.
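A minimal sketch of the server side of this exchange, using only the Python standard library. The function name `handle_conditional_get` is hypothetical, and the body of a 200 response is omitted.

```python
from __future__ import annotations

from email.utils import formatdate, parsedate_to_datetime


def handle_conditional_get(if_modified_since: str | None,
                           file_mtime: float) -> tuple[int, dict[str, str]]:
    """Server side of Last-Modified / If-Modified-Since (simplified sketch).

    file_mtime: the resource's last modification time as a Unix timestamp.
    Returns (status, headers); a real server would also send the body on 200.
    """
    mtime = int(file_mtime)  # HTTP dates only have one-second resolution
    headers = {"Last-Modified": formatdate(mtime, usegmt=True)}
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = None  # unparsable date: ignore the condition
        if since is not None and mtime <= since:
            return 304, headers  # not modified: empty body, browser uses its cache
    return 200, headers  # first visit or modified: send the new file


print(handle_conditional_get(None, 1469152020))
# (200, {'Last-Modified': 'Fri, 22 Jul 2016 01:47:00 GMT'})
print(handle_conditional_get("Fri, 22 Jul 2016 01:47:00 GMT", 1469152020)[0])  # 304
```

Because `int(file_mtime)` drops fractions of a second, a file modified half a second later still produces 304: this is the one-second limitation of Last-Modified.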

But Last-Modified has some drawbacks:

  • If the file is opened locally, its Last-Modified time can change even though the content has not. The server then misses the cache and sends the same resource again.

  • Last-Modified only has a resolution of one second. If the file is modified within a shorter interval than that, the server still considers the cached copy valid and does not return the updated resource.

Since the file modification time is not a reliable enough basis for caching, can the cache policy be based directly on whether the file content has changed? That is why HTTP/1.1 introduced ETag and If-None-Match.

2. ETag and If-None-Match

An ETag is a unique identifier for the current version of a resource, generated by the server and returned with the response. Whenever the resource changes, a new ETag is generated. The next time the browser requests the resource, it puts the previously returned ETag into the If-None-Match request header. The server only needs to compare the If-None-Match value sent by the client with the current ETag of the resource to determine whether the resource has changed relative to the client's copy. If the ETags do not match, the server sends the new resource (with its new ETag) in a regular 200 response to the GET. If they match, it returns 304, telling the client to use its local cache directly.
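A minimal sketch of both halves of this exchange. How servers actually compute ETags varies (some hash the content, others combine file size and modification time); hashing the body with SHA-256 here is an assumption, and `make_etag` and `handle_if_none_match` are hypothetical names.

```python
from __future__ import annotations

import hashlib


def make_etag(body: bytes) -> str:
    # Assumption: the ETag is a truncated SHA-256 of the content, in quotes.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_if_none_match(if_none_match: str | None,
                         body: bytes) -> tuple[int, dict[str, str]]:
    """Compare the client's If-None-Match with the current ETag."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # The header may list several tags; "W/" marks weak tags, which
        # If-None-Match compares by their opaque value only.
        tags = [t.strip().removeprefix("W/") for t in if_none_match.split(",")]
        if "*" in tags or etag in tags:
            return 304, headers  # unchanged: client uses its local cache
    return 200, headers  # changed or first request: send body and new ETag


status, headers = handle_if_none_match(None, b"v1")     # first visit: 200
print(handle_if_none_match(headers["ETag"], b"v1")[0])  # 304
print(handle_if_none_match(headers["ETag"], b"v2")[0])  # 200
```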

3. Comparison between the two:

  • First, ETag is more accurate than Last-Modified.

Last-Modified has a resolution of one second, so if a file changes several times within one second, its Last-Modified value does not change, whereas the ETag changes every time. In addition, if the server is load-balanced, the Last-Modified values generated by different servers may be inconsistent.

  • Second, in terms of performance, ETag is more expensive than Last-Modified, because Last-Modified only requires recording a time, whereas an ETag usually requires the server to compute a hash value with some algorithm.
  • Third, when both are present, the server gives ETag priority during validation.
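RFC 9110 spells out this priority: when a request carries If-None-Match, the server must ignore If-Modified-Since. A simplified sketch with a hypothetical `validate` function, showing that a mismatched ETag forces a 200 even when the date alone would have allowed a 304:

```python
from __future__ import annotations

from email.utils import parsedate_to_datetime


def validate(request_headers: dict[str, str], etag: str, mtime: int) -> int:
    """Return 304 or 200 for a conditional GET (simplified sketch).

    etag: the resource's current ETag, e.g. '"abc"'.
    mtime: its last modification time as a whole-second Unix timestamp.
    """
    inm = request_headers.get("If-None-Match")
    if inm is not None:
        # ETag takes priority: If-Modified-Since is ignored entirely.
        tags = [t.strip().removeprefix("W/") for t in inm.split(",")]
        return 304 if ("*" in tags or etag in tags) else 200
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            if mtime <= parsedate_to_datetime(ims).timestamp():
                return 304
        except (TypeError, ValueError):
            pass  # unparsable date: fall through to a full response
    return 200


date = "Fri, 22 Jul 2016 01:47:00 GMT"
print(validate({"If-Modified-Since": date}, '"new"', 1469152020))  # 304
print(validate({"If-None-Match": '"old"', "If-Modified-Since": date},
               '"new"', 1469152020))                                # 200
```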

6. Caching mechanism

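Browser caching strategies generally fall into two types: strong caching and negotiated caching. While a strongly cached copy is still fresh, the browser uses it without sending any request. Once it has expired, the browser falls back to the negotiated cache: it sends a conditional request, and the server returns 304 if the cached copy is still valid (or 200 with new content if it is not).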
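The two layers can be sketched from the browser's side as follows. This is a simplified model under stated assumptions: `CacheEntry`, `load`, and the `fake_server` transport are hypothetical, only ETag-based revalidation is shown, and a 200 response is not written back to the cache.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class CacheEntry:
    body: bytes
    etag: str
    stored_at: float  # when the response was cached, in seconds
    max_age: int      # from the response's Cache-Control: max-age


def load(entry: CacheEntry | None, now: float,
         send: Callable[[dict[str, str]], tuple[int, dict[str, str], bytes]]
         ) -> tuple[bytes, str]:
    """Strong cache first, then negotiated cache, then a full download."""
    # 1. Strong cache: a fresh copy is used without sending any request.
    if entry is not None and now - entry.stored_at < entry.max_age:
        return entry.body, "strong cache"
    # 2. Negotiated cache: ask the server, sending the stored ETag.
    request_headers = {"If-None-Match": entry.etag} if entry is not None else {}
    status, _headers, body = send(request_headers)
    if status == 304 and entry is not None:
        entry.stored_at = now  # revalidated: fresh again for max_age seconds
        return entry.body, "negotiated cache (304)"
    # 3. 200: use the new body (a real browser would also update its cache).
    return body, "network (200)"


def fake_server(headers: dict[str, str]) -> tuple[int, dict[str, str], bytes]:
    # Hypothetical server whose current version has ETag "v1".
    if headers.get("If-None-Match") == '"v1"':
        return 304, {}, b""
    return 200, {"ETag": '"v1"'}, b"new body"


entry = CacheEntry(b"old body", '"v1"', stored_at=0, max_age=60)
print(load(entry, 30, fake_server))  # (b'old body', 'strong cache')
print(load(entry, 90, fake_server))  # (b'old body', 'negotiated cache (304)')
print(load(None, 90, fake_server))   # (b'new body', 'network (200)')
```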

7. Applying caching policies in real scenarios

Frequently changing resources

Cache-Control: no-cache

For resources that change frequently, Cache-Control: no-cache makes the browser send a request to the server every time, and ETag or Last-Modified is then used to check whether the cached copy is still valid. This does not reduce the number of requests, but it can significantly reduce the amount of response data when the resource has not changed.

Resources that rarely change

Cache-Control: max-age=31536000

Such resources are typically given a large max-age, such as Cache-Control: max-age=31536000 (one year), so that subsequent browser requests for the same URL hit the strong cache.

To handle updates, a fingerprint such as a content hash or version number is added to the file name (or path). When the file changes, the fingerprint changes, so the page references a new URL and the old strongly cached copy is effectively invalidated (it is not removed immediately, but it is no longer used). Libraries published on public CDNs (jquery-3.3.1.min.js, lodash.min.js, etc.) use this pattern.
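A minimal sketch of content-hash fingerprinting. The helper name `fingerprinted_name` and the 8-character SHA-256 prefix are assumptions; build tools such as bundlers usually do this automatically, each with its own naming scheme.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(path)
    # e.g. static/app.min.js -> static/app.min.<hash>.js
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


# Safe to cache for a year: the content at a fingerprinted URL never changes.
response_headers = {"Cache-Control": "max-age=31536000"}

old = fingerprinted_name("static/app.min.js", b"console.log(1)")
new = fingerprinted_name("static/app.min.js", b"console.log(2)")
# After a change, the page must reference `new` instead of `old`.
print(old, new)
```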

8. The impact of user behavior on the browser cache

This refers to the caching behavior triggered by different user actions in the browser. There are three main cases:

  • Opening a page by typing its address in the address bar: the browser checks whether there is a match in the disk cache. If there is, it is used; if not, a network request is sent.

  • Normal refresh (F5): because the tab is still open, the memory cache is available and is used first (if there is a match); the disk cache comes next.

  • Forced refresh (Ctrl + F5): the browser does not use the cache. Requests are sent with Cache-Control: no-cache in the header (plus Pragma: no-cache for compatibility), and the server returns 200 with the latest content.
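A forced refresh takes effect through the request headers it sends. As a simplified sketch (the function name `must_contact_server` is hypothetical), a cache receiving a request could check those headers as follows; Pragma: no-cache is honored only when Cache-Control is absent, which is how RFC 7234 treats this HTTP/1.0 header.

```python
def must_contact_server(request_headers: dict) -> bool:
    """True if a cache must not answer this request from a stored copy
    without asking the server (simplified sketch)."""
    cc = request_headers.get("Cache-Control")
    if cc is None:
        # HTTP/1.0 fallback: Pragma: no-cache acts like Cache-Control: no-cache.
        return request_headers.get("Pragma", "").strip().lower() == "no-cache"
    for part in cc.split(","):
        name, sep, arg = part.strip().partition("=")
        name = name.lower()
        if name == "no-cache":
            return True  # e.g. sent by a forced refresh
        if name == "max-age" and sep and arg.strip() == "0":
            return True  # client refuses any cached copy older than 0 s
    return False


print(must_contact_server({"Cache-Control": "no-cache", "Pragma": "no-cache"}))  # True
print(must_contact_server({}))  # False
```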