Introduction

Front-end cache, HTTP cache, memory cache, strong cache, negotiated cache, Cache-Control, ETag, and so on. What are these, and how do they relate to and differ from one another? It is easy to be thoroughly confused.

Let’s start with the browser opening a web page.

When the browser requests a resource, it looks for it in the following order:

  1. Service Worker
  2. Memory Cache
  3. Disk Cache
  4. Network request
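
The lookup order above can be sketched as a simple loop. This is only a toy model: each cache layer is represented by a plain Map, and the names lookupResource, layers, and fetchFromNetwork are hypothetical, not browser APIs.

```javascript
// Toy model of the lookup order: every layer is a Map from URL to body.
// Real browser caches are far more involved; this only shows the ordering.
function lookupResource(url, layers, fetchFromNetwork) {
  for (const [name, cache] of layers) {
    if (cache.has(url)) {
      // The first layer that holds the URL wins.
      return { source: name, body: cache.get(url) };
    }
  }
  // No layer matched, so fall back to the network.
  return { source: 'network', body: fetchFromNetwork(url) };
}

// Hypothetical contents for the three cache layers, in lookup order.
const layers = [
  ['service worker', new Map()],
  ['memory cache', new Map([['/logo.png', 'logo bytes']])],
  ['disk cache', new Map([['/app.js', 'app source']])],
];
```

Calling lookupResource('/app.js', layers, fetchFn) skips the two empty layers and reports the disk cache as the source, while an unknown URL falls through to fetchFn.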

1. Service Worker

A Service Worker is a script that runs on a separate thread in the background of the browser and is typically used for caching. Service Workers require HTTPS (browsers make an exception for localhost during development). They are not yet widely used, so we will not study them in detail for now. See the Service Worker API reference on MDN (mozilla.org).

2. Memory Cache

The memory cache is a cache held in memory. The browser automatically adds almost every network response to the memory cache, but it is limited in size and short-lived: it is released together with the process. Once the tab is closed, the cached data in memory is freed.

The memory cache is not affected by max-age, no-cache, and similar directives. Even if no caching headers are set, some resources are still cached when enough memory is available. To keep a resource out of the memory cache, set no-store.

3. Disk Cache

The disk cache is a cache stored on disk. It is slower than the memory cache but has a large capacity and is persistent. It also allows the same resource to be reused across sessions. Historically it could even be shared across sites, such as two sites using the same image, although current browsers partition the cache by site for privacy.

Based on the fields in the HTTP headers, the disk cache determines which resources should be cached, which can be used without a request, and which have expired and must be requested again. Most cache hits are served from the disk cache.

When is Disk Cache preferred?

  1. Large files.
  2. When memory usage is high, files are stored on disk first.

4. Network request

When none of the above caches are hit, a network request is made to obtain resources.

After the resource is obtained, it is added to the cache according to the cache policy to speed up the next request.

The cache that front-end developers usually configure is the disk cache, also known as the HTTP cache.

Let’s take a closer look at Disk Cache.

Disk Cache (HTTP Cache)

There are two types of HTTP caching: strong caching and negotiated caching. Both are controlled through HTTP headers.

Strong cache

Resources are read directly from the cache without sending a request to the server. In the Network panel of Chrome DevTools, the request shows a status code of 200 and the Size column displays (from disk cache) or (from memory cache). Strong caching is implemented with two HTTP headers: Expires and Cache-Control.

1. Expires

Expires specifies the point in time, as set by the server, at which the cached resource expires. If a request is made after this time, the cache is considered invalid and the resource must be fetched again. Example: Expires: Mon, 08 Nov 2021 03:37:26 GMT

2. Cache-Control

Cache-Control is the most important header for browser caching, and it overrides the other cache settings. See the Cache-Control reference on MDN (mozilla.org).

Expires VS Cache-Control

Expires comes from HTTP/1.0 and Cache-Control from HTTP/1.1. When both are present, Cache-Control takes precedence over Expires. Expires is still useful in environments that do not support HTTP/1.1, so today it is a legacy header kept mainly for compatibility.
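
As a rough illustration of that precedence, the sketch below derives how long a response stays fresh. It assumes the headers arrive as a plain object with lowercase names. The function name freshnessLifetime is hypothetical, and real browsers handle many more directives than max-age.

```javascript
// Returns the freshness lifetime in seconds, or null when neither header provides one.
// Assumption: `headers` is a plain object keyed by lowercase header names.
function freshnessLifetime(headers) {
  const cacheControl = headers['cache-control'] || '';
  const match = /(?:^|,)\s*max-age=(\d+)/i.exec(cacheControl);
  if (match) {
    // Cache-Control: max-age takes precedence over Expires.
    return Number(match[1]);
  }
  if (headers['expires'] && headers['date']) {
    const expires = Date.parse(headers['expires']);
    const date = Date.parse(headers['date']);
    if (!Number.isNaN(expires) && !Number.isNaN(date)) {
      // Expires is an absolute time, so measure it against the response Date.
      return Math.max(0, (expires - date) / 1000);
    }
  }
  return null;
}
```

With both headers present, only max-age is consulted. With Expires alone, the lifetime is the gap between Expires and Date.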

Strong caching decides whether to use the cache purely from a point in time or a duration, not from whether the file on the server has actually been updated, so the file that is loaded may not be the latest content on the server. How do we find out whether the server content has changed? This is where the negotiated cache comes in.

Negotiated cache

With the negotiated cache, once the strong cache has expired, the browser sends a request to the server carrying a cache identifier, and the server uses that identifier to decide whether the cached copy can still be used.

There are two main outcomes:

  1. The negotiated cache hits: the server returns 304 Not Modified.
  2. The negotiated cache misses: the server returns 200 and the requested resource.

Negotiated caching can be implemented with two pairs of HTTP headers:

  1. Last-Modified
  2. ETag

1. Last-Modified and If-Modified-Since

When the browser requests the resource for the first time, the server returns it with a Last-Modified response header whose value is the time the resource was last modified on the server. The browser caches the file together with this header.

The next time the browser requests the resource, it finds the stored Last-Modified value and sends it in an If-Modified-Since request header. The server compares the If-Modified-Since value with the resource's last modification time. If the resource has not changed, the server returns 304 with an empty response body, and the browser reads the resource from its cache. If the If-Modified-Since time is earlier than the last modification time on the server, the file has been updated, and the server returns 200 with the new resource.
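
The server-side comparison described above might look like the following sketch. The function name checkIfModifiedSince and the return shape are hypothetical. In a real server, the modification time would typically come from the file system.

```javascript
// `ifModifiedSince` is the raw request header value (or undefined on a first visit).
// `lastModified` is the resource's modification time as a Date.
function checkIfModifiedSince(ifModifiedSince, lastModified) {
  // HTTP dates have one-second resolution, so drop milliseconds before comparing.
  const modifiedSeconds = Math.floor(lastModified.getTime() / 1000);
  const sinceMs = Date.parse(ifModifiedSince || '');
  if (!Number.isNaN(sinceMs) && modifiedSeconds <= Math.floor(sinceMs / 1000)) {
    // Unchanged since the client's copy: empty body, client reuses its cache.
    return { status: 304, headers: {} };
  }
  // First visit or updated file: send the resource with a fresh Last-Modified.
  return { status: 200, headers: { 'Last-Modified': lastModified.toUTCString() } };
}
```

A first request without If-Modified-Since yields 200 plus a Last-Modified header. Sending that value back on the next request yields 304 until the file's modification time moves forward by at least a second.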

But Last-Modified has some drawbacks:

  • If the file is touched or re-saved without any change to its content, its Last-Modified time still changes. The server then fails to match the cache and resends an identical resource.
  • Because Last-Modified has a resolution of one second, if the file is modified again within the same second, the server still treats the request as a hit and does not return the updated resource.

Since the file modification time alone is not reliable enough, can the cache decision be based directly on whether the file content has changed? For this purpose, HTTP/1.1 introduced ETag and If-None-Match.

2. ETag and If-None-Match

An ETag is a unique identifier for the current version of a resource. It is generated by the server and returned in the response, and it is regenerated whenever the resource changes.

The next time the browser requests the resource, it puts the ETag value it received earlier into the If-None-Match request header. The server only needs to compare the If-None-Match value sent by the client with the ETag of the resource on the server to determine whether the resource has changed relative to the client's copy. If the ETags do not match, the server sends the new resource (including the new ETag) in an ordinary 200 response. If they match, the server returns 304 to tell the client to use its local cache.
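
A minimal sketch of this exchange on the server side follows. It assumes the ETag is derived from a SHA-256 hash of the body, which is one common choice and not something the protocol requires. The names computeETag and checkIfNoneMatch are hypothetical.

```javascript
const { createHash } = require('node:crypto');

// Assumption: the ETag is a truncated SHA-256 hash of the content, wrapped in quotes.
function computeETag(body) {
  const hash = createHash('sha256').update(body).digest('hex');
  return `"${hash.slice(0, 16)}"`;
}

// `ifNoneMatch` is the raw request header value (or undefined on a first visit).
function checkIfNoneMatch(ifNoneMatch, body) {
  const etag = computeETag(body);
  // The header may carry several comma-separated tags, or "*".
  const candidates = (ifNoneMatch || '').split(',').map((tag) => tag.trim());
  const matches = candidates.some(
    // Ignore a weak-validator prefix (W/) when comparing.
    (tag) => tag === '*' || tag.replace(/^W\//, '') === etag
  );
  if (matches) {
    return { status: 304, headers: { ETag: etag } };
  }
  return { status: 200, headers: { ETag: etag }, body };
}
```

Because the tag depends only on the content, re-saving a file without changing it still produces a match, which avoids the first Last-Modified drawback listed earlier.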

Last-Modified VS ETag

  • ETag is more accurate than Last-Modified.

    Last-Modified has a resolution of one second. If a file changes several times within one second, its Last-Modified value does not change, whereas the ETag changes every time. If the server is load-balanced, the Last-Modified values generated by different servers may also be inconsistent.

  • In terms of performance, ETag is typically more expensive than Last-Modified, because Last-Modified only records a time, whereas ETag requires the server to compute a hash of the content.

  • In terms of priority, the server checks ETag first when both validators are present.

The strong cache takes precedence over the negotiated cache. If the strong cache (Expires and Cache-Control) is still valid, the cached copy is used directly. If not, the negotiated cache (Last-Modified/If-Modified-Since and ETag/If-None-Match) is used, and the server decides whether the cached copy is still good. If validation fails, the cached copy is discarded: the server returns 200 with the resource and new cache identifiers, which the browser stores again. If validation succeeds, the server returns 304 and the browser continues to use its cache.
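
The overall decision can be summarized in a small sketch. The cache entry shape ({ storedAt, maxAge, etag, lastModified }) and the function name planRequest are assumptions made for illustration. Real browser caches store considerably more state.

```javascript
// `entry` is a hypothetical cache record, or undefined when nothing is cached:
//   storedAt: time the response was stored (ms since epoch)
//   maxAge: freshness lifetime in seconds
//   etag / lastModified: validators saved from the response headers
function planRequest(entry, nowMs) {
  if (!entry) {
    return { action: 'fetch', headers: {} };
  }
  const ageSeconds = (nowMs - entry.storedAt) / 1000;
  if (ageSeconds < entry.maxAge) {
    // Strong cache still valid: no request is sent at all.
    return { action: 'use-cache', headers: {} };
  }
  // Strong cache expired: fall back to the negotiated cache if validators exist.
  const headers = {};
  if (entry.etag) headers['If-None-Match'] = entry.etag;
  if (entry.lastModified) headers['If-Modified-Since'] = entry.lastModified;
  const hasValidators = Object.keys(headers).length > 0;
  return { action: hasValidators ? 'revalidate' : 'fetch', headers };
}
```

A 'revalidate' result corresponds to the conditional request that the server answers with either 304 or 200.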

What will the browser do if no caching policy is set?

In this case, the browser uses a heuristic, typically taking 10% of the difference between the Date and Last-Modified response headers as the cache lifetime.
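
As a worked example of that 10% rule, the sketch below computes the heuristic lifetime from the two headers. The function name is hypothetical and the headers are assumed to be a plain object with lowercase names. Individual browsers may apply additional limits.

```javascript
// Returns the heuristic cache lifetime in seconds (10% of Date minus Last-Modified).
// Returns 0 when either header is missing or unparsable.
function heuristicLifetimeSeconds(headers) {
  const date = Date.parse(headers['date']);
  const lastModified = Date.parse(headers['last-modified']);
  if (Number.isNaN(date) || Number.isNaN(lastModified) || lastModified > date) {
    return 0;
  }
  return Math.floor((date - lastModified) / 1000 / 10);
}
```

For a resource last modified 10 days before the response Date, the result is 86400 seconds, so the browser would treat it as fresh for about one day.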

Caching best practices in real world scenarios

  1. index.html is not cached, so every request gets the latest version.
  2. Resource files such as JS, CSS, and images are strongly cached (for one month or one year). Use webpack or another front-end build tool to add a hash to resource file names.

Why not strongly cache index.html?

Most systems today are single-page applications, and index.html is the entry point for all other resources. If index.html were strongly cached, the browser could not discover the latest resource files, and the whole system would fail to update properly.

Why can other resource files be strongly cached?

Because other resources are packaged by the front-end build tool, the generated file names contain hash values, and each rebuild produces a new batch of hashed file names, for example index.de62f314.js.

When a new build is deployed to the server, the browser requests index.html, finds that the names of the referenced resource files have changed, and requests the new files instead of reusing the old cached ones.
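
One way to express this policy on the server is a small function that picks a Cache-Control value per URL. This is a sketch under assumptions: hashed files are recognized by a hypothetical pattern (a dot, at least eight hex characters, then a known extension), and no-cache is used for everything else so that the browser revalidates on every request.

```javascript
// Assumed naming scheme for build output, e.g. index.de62f314.js.
const HASHED_ASSET = /\.[0-9a-f]{8,}\.(?:js|css|png|jpe?g|gif|svg|woff2?)$/i;

function cacheControlFor(urlPath) {
  if (HASHED_ASSET.test(urlPath)) {
    // The name changes whenever the content changes, so a long lifetime is safe.
    return 'public, max-age=31536000';
  }
  // index.html and other unhashed files: always check with the server first.
  return 'no-cache';
}
```

The returned string can be passed straight to res.setHeader('Cache-Control', ...) in a Node.js request handler.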

How to set up cache

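Set meta information in HTML

Modern browsers generally ignore caching directives in meta tags, so treat this as a legacy fallback and prefer setting the response header on the server.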

<meta http-equiv="Cache-Control" content="max-age=31536000" />

Set the response header on the server

res.setHeader('Cache-Control', 'public, max-age=31536000')

How to disable caching

Set meta information in HTML

<meta http-equiv="pragma" content="no-cache"> 
<meta http-equiv="cache-control" content="no-cache"> 
<meta http-equiv="expires" content="0">
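
Caching can also be disabled from the server through response headers, mirroring the meta tags above. The helper below is a sketch: disableCaching is a hypothetical name, and `res` is assumed to be a Node.js http.ServerResponse or any object with a compatible setHeader method. It uses no-store rather than no-cache because no-store keeps the response out of the cache entirely.

```javascript
// Sets response headers that tell the browser not to store or reuse the response.
function disableCaching(res) {
  res.setHeader('Cache-Control', 'no-store');
  // Legacy headers, kept only for old HTTP/1.0 clients.
  res.setHeader('Pragma', 'no-cache');
  res.setHeader('Expires', '0');
}
```

Inside a request handler, call disableCaching(res) before writing the response body.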

The impact of user behavior on browser caching

User behavior affects which caching policy is triggered in the browser. There are three main cases:

  • Opening a page by entering the address in the address bar: the browser checks the disk cache for a match. If there is one, it is used; if not, a network request is sent.

  • Normal refresh (F5): since the tab has not been closed, the memory cache is available and is used first (if it matches), followed by the disk cache.

  • Forced refresh (Ctrl + F5): the browser does not use the cache. Requests are sent with Cache-Control: no-cache (plus Pragma: no-cache for compatibility) in the headers, and the server returns 200 with the latest content.