Caching is a complicated topic, but it is one of the best ways to improve performance. There are many kinds of caches, such as database caches, server caches (proxy server caches such as Nginx, and CDN caches), and browser caches.

Today's topic is the browser cache, also known as the HTTP cache. It is important knowledge for front-end developers (and I am just a front-end developer myself, you know), so let's get started.

Preface

  • Browser caching takes effect from the second request onward. On the first request, the server returns the resource and specifies a caching policy in the response headers.

  • There are two main types of browser cache: the strong cache and the negotiated cache (which some people call the comparison cache).

  • The browser cache is designed to reduce the load on the server, improve website performance, and speed up page loading on the client. The browser cache is therefore a client-side cache. Cached resource files are generally stored in memory or on the local disk.

Benefits of caching

  • Caching reduces redundant data transfers and saves network costs.
  • Caching alleviates network bottlenecks and lets pages load faster without requiring more bandwidth.
  • Caching reduces the request load on the server. With a cache, fewer requests reach the server, which matters especially for heavily visited websites.
  • Resources read from the cache do not require sending a request to the server and waiting for the response. This speeds up client access, because loading pages from a distant server is slower.

Most articles list much the same benefits, so a quick skim is enough. The one principle to hold on to is making your site fast.

Analyze the cache policy in the console

Open the developer console of a site and select the Network tab, and you can see something like this. We focus on the Status, Size, and Time columns. There we mainly see 200 status codes (some shown in gray), the resource size or a "from disk cache" label, and the load time.

Refresh the page and you can see status code 304, "from memory cache", and 0 ms.

There are many HTTP status codes. 200 and 304 are two of the more common ones and both are associated with caching. A 200 with a normal size indicates that the request was processed and a new copy of the resource was returned. A 304 indicates that the resource has not changed and the cached copy can be used directly. These status codes are returned by the server.

In the two figures above, some of the 200s are gray. A gray 200 indicates that the request was not sent to the server and the resource was read directly from the cache, either from the memory cache or from the disk cache.

Note that a 0 ms request is in the "from memory cache" state, meaning that the resource is read from memory, which takes practically no time. How could that not be fast?

| State | Description |
| --- | --- |
| 200 (normal) | The latest resource is downloaded from the server. The Size value is the total size of the resource obtained from the server. |
| 304 | The server is contacted, the resource has not been updated, and the local copy is used. The Size value is the size of the exchange with the server, not the size of the resource itself. |
| 200 (from memory cache) | The status code is gray. A previously loaded resource is read from memory without requesting the server. When the page is closed, the resource is released from memory. Scripts, fonts, and images are generally stored in memory. The request time is 0 ms. |
| 200 (from disk cache) | The status code is gray. A previously loaded resource is read from disk without requesting the server. The resource is not released when the page is closed. Non-script resources, such as CSS, are generally stored on disk. |

Caching workflow (Three-level principle)

First, look in memory. If the resource is there, load it from memory.

If it is not found in memory, look on the hard disk. If the resource is there, load it from disk.

If it is not found on disk either, make a network request, then cache the loaded resource to disk and memory.
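
The lookup order above can be sketched as a toy model. This is only an illustration, not how any browser is actually implemented: the `ThreeLevelCache` class, the `fetch` callback, and the use of dictionaries for both levels are assumptions made for the example.

```python
from typing import Callable, Dict, Tuple


class ThreeLevelCache:
    """Toy model of the lookup order: memory, then disk, then network."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self.memory: Dict[str, bytes] = {}  # released when the page is closed
        self.disk: Dict[str, bytes] = {}    # stands in for the on-disk cache
        self.fetch = fetch                  # hypothetical network fetcher

    def load(self, url: str) -> Tuple[bytes, str]:
        """Return the resource body and the level it was served from."""
        if url in self.memory:
            return self.memory[url], "memory"
        if url in self.disk:
            return self.disk[url], "disk"
        # Not cached anywhere: go to the network, then store in both levels.
        body = self.fetch(url)
        self.disk[url] = body
        self.memory[url] = body
        return body, "network"

    def close_page(self) -> None:
        """Closing the page releases the memory cache; the disk cache survives."""
        self.memory.clear()
```

Loading the same URL twice hits the network only once, and after `close_page()` the next load is served from the disk level.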

Caching strategies

There are two kinds of browser cache: the strong cache and the negotiated cache (which some people call the comparison cache).

Strong cache

The main feature of the strong cache is that no request is sent to the server. After the user first visits the page, the browser caches the resource. The cache location (memory or disk) is chosen by the browser, and the server is not requested again until the resource expires. The expiration time comes from the response headers of the first request. Whether the strong cache is used is controlled by the Expires, Cache-Control, and Pragma headers.

Key points

  1. After the user visits the page for the first time, HTTP lets the origin server attach an "expiration date" to each resource, just as drinks have an expiration date stamped on their bottles.
  2. This is done by adding an Expires or Cache-Control field to the HTTP response headers. These fields tell the browser how long the resource can be considered fresh.
  3. The Expires header and the Cache-Control header do essentially the same thing: both set an expiration time. The difference is that Cache-Control uses a relative time instead of an absolute date, and Cache-Control has a higher priority than Expires.
  4. Main Cache-Control directives:
    • max-age=t: the unit is seconds. The cached content expires after t seconds.
    • no-cache: the strong cache is not used; the negotiated cache is used to validate the cached data.
    • no-store: disables all caching, including the negotiated cache. The latest resource is requested from the server every time.
    • private: the response is for a single user; intermediate proxies, CDNs, and so on must not cache it.
    • public: the response can be cached by intermediate proxies, CDNs, and so on.
  5. The Expires header dates from HTTP/1.0, and the Cache-Control header was introduced in HTTP/1.1.
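  6. Pragma is a legacy HTTP/1.0 header with only one value, no-cache. Its effect is the same as Cache-Control: no-cache, that is, the strong cache is not used. It is kept for backward compatibility; per the HTTP caching specification, when a request also carries a Cache-Control header that the cache understands, Pragma is ignored.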
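
As a rough sketch of how these headers interact, the function below decides whether a stored response may be reused without contacting the server. It is a simplification under stated assumptions: the function names are invented for this example, header names are looked up with exact capitalization, Pragma and heuristic freshness are left out, and a real browser applies many more rules.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if arg else None
    return directives


def can_use_strong_cache(headers: Dict[str, str], stored_at: datetime, now: datetime) -> bool:
    """Return True if the cached copy is still fresh (timezone-aware datetimes expected)."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    # no-store forbids caching; no-cache forces validation with the server.
    if "no-store" in cc or "no-cache" in cc:
        return False
    # max-age is relative to when the response was stored and beats Expires.
    if cc.get("max-age") is not None:
        return now - stored_at < timedelta(seconds=int(cc["max-age"]))
    # Expires is an absolute HTTP date.
    if "Expires" in headers:
        return now < parsedate_to_datetime(headers["Expires"])
    # Simplification: with no explicit lifetime, treat the copy as not fresh.
    return False


stored = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(can_use_strong_cache({"Cache-Control": "max-age=60"}, stored, stored + timedelta(seconds=30)))
```

Here the response was stored 30 seconds ago with a 60-second lifetime, so the check passes and no request would be sent.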

Negotiated cache

The negotiated cache is, literally, a negotiation between the browser and the server, so the browser has to talk to the server every time. On the first request, the server returns the resource along with a cache identifier for it, and both are stored in the browser cache. When requesting the resource a second time, the browser first sends the cache identifier to the server, and the server checks whether it matches the current one. If the identifier does not match, the resource has been updated, and the server returns the new data and a new cache identifier. If the identifier matches, the resource has not been updated, the server returns status code 304, and the browser reads the data from its local cache.

Key points

  1. In the negotiated cache, the server decides whether the cached resource is still usable, so the client communicates with the server via an identifier. The identifier is carried by one of two header pairs: Last-Modified/If-Modified-Since and ETag/If-None-Match. The two headers in each pair always appear together.
  2. The response headers of the first request include a Last-Modified or ETag field, and subsequent requests carry the corresponding field (If-Modified-Since or If-None-Match). If the response has no Last-Modified or ETag field, the request will not carry the corresponding field either.
  3. Last-Modified and ETag can be used together. The server validates the ETag first. If the ETag matches, the server goes on to compare Last-Modified before deciding whether to return 304.
    • Last-Modified/If-Modified-Since
      1. When a browser requests a resource for the first time, the server returns a response header with Last-Modified, which indicates the last time the resource was modified. For example, Last-Modified: Thu, 31 Dec 2037 23:59:59 GMT. When the browser requests the resource again, the request header contains If-Modified-Since, whose value is the Last-Modified value stored with the cached copy.
      2. After receiving If-Modified-Since, the server determines whether the cache is hit based on the last modification time of the resource. If the cache is hit, status code 304 is returned; neither the resource content nor Last-Modified is returned.
    • ETag/If-None-Match
      1. ETag/If-None-Match differs from Last-Modified/If-Modified-Since in that the server returns a check code instead of a time. The ETag uniquely identifies a version of a resource, so any change in the resource changes its ETag.
      2. The server determines whether the cache is hit based on the If-None-Match value sent by the browser. Unlike Last-Modified, when the server returns a 304 Not Modified response, the response headers still include the ETag, because it is regenerated for the response even if its value is unchanged.
      3. ETag was introduced in HTTP/1.1 mainly to solve the following problems:
        • Some files are rewritten periodically without their contents changing (only the modification time changes). In that case we do not want the client to think the file has changed and GET it again.
        • Some files are modified very frequently, within less than a second (say N changes in 1 s). If-Modified-Since has a granularity of one second, so such changes cannot be detected (UNIX mtime records, likewise, are only accurate to the second).
        • Some servers cannot obtain the exact last modification time of a file.
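
The server side of this exchange can be sketched as follows. This is a minimal illustration under stated assumptions, not any real server's implementation: `make_etag` and `respond` are hypothetical names, the ETag is derived from a content hash (one possible scheme), `modified_at` must be a UTC-aware datetime, and If-None-Match is compared by exact string match (no lists, `*`, or weak validators). It follows the order described above: ETag first, then Last-Modified.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    """Derive a validator from the content (assumed scheme: truncated SHA-256)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers: Dict[str, str], body: bytes,
            modified_at: datetime) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, response headers, body) for a possibly conditional GET."""
    etag = make_etag(body)
    full_headers = {
        "ETag": etag,
        "Last-Modified": format_datetime(modified_at, usegmt=True),
    }
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    # First request: the browser has no validators to send.
    if if_none_match is None and if_modified_since is None:
        return 200, full_headers, body

    etag_ok = if_none_match is None or if_none_match == etag
    # HTTP dates have one-second resolution, so drop microseconds first.
    time_ok = (if_modified_since is None or
               modified_at.replace(microsecond=0) <= parsedate_to_datetime(if_modified_since))

    if etag_ok and time_ok:
        # Cache hit: no body and no Last-Modified, but the ETag is still sent.
        return 304, {"ETag": etag}, b""
    return 200, full_headers, body
```

A browser would store the `ETag` and `Last-Modified` values from the 200 response and send them back as `If-None-Match` and `If-Modified-Since` on the next request. When only `If-None-Match` is sent, a file whose modification time changed but whose content did not still yields a 304, which is the first problem ETag is meant to solve.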

Here’s what the two look like together:

The impact of user behavior on caching

| User action | Expires/Cache-Control | Last-Modified/ETag |
| --- | --- | --- |
| Pressing Enter in the address bar | Effective | Effective |
| Following a page link | Effective | Effective |
| Opening a new window | Effective | Effective |
| Going back | Effective | Effective |
| F5 refresh | Not effective | Effective |
| Ctrl+F5 forced refresh | Not effective | Not effective |

Configuring a cache policy in Nginx

The reason I want to talk about this topic is that I recently worked on an H5 project at my company that is updated frequently, and users were not automatically getting the latest version. My team lead asked me to try to solve the problem myself.

I knew it had to be a caching problem, and since I had learned some Nginx, I decided to start there.

Analysis

The project is a Vue project built with Vue CLI. During packaging, a hash is added to the CSS and JS file names, so the files generated after each change have unique names. The page requests them as new resources, so they have no caching problem. The entry file index.html, however, keeps the same name: if we deploy an update while the browser is still using its cached copy, users keep loading the old version. The entry file therefore must not use the strong cache. The browser needs to check with the server every time whether the file has been modified, that is, use the negotiated cache.
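
The idea can be sketched in a few lines. This is not how Vue CLI computes its hashes; the `hashed_name` and `cache_control_for` helpers, the SHA-256 digest, the 8-character length, and the one-year `max-age` for hashed assets are all assumptions made for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the extension: app.js -> app.<hash>.js."""
    path = PurePosixPath(filename)
    digest = hashlib.sha256(content).hexdigest()[:length]
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


def cache_control_for(filename: str) -> str:
    """Pick a Cache-Control value per file type (assumed policy)."""
    if filename.endswith((".html", ".htm")):
        # Entry file: always validate with the server (negotiated cache).
        return "no-cache"
    # Hashed assets get a new name whenever they change, so a long
    # strong-cache lifetime is safe (one year is an assumed value).
    return "max-age=31536000"


print(hashed_name("js/app.js", b"console.log(1)"))
print(cache_control_for("index.html"))
```

Because the name depends on the content, a changed file is always requested under a new URL, while an unchanged file keeps its name and stays cached.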

Nginx configuration

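server {
    listen 80;
    server_name domain;    # replace with your domain name
    root directory;        # replace with your site root directory
    index index.html;

    location / {
        try_files $uri /index.html;
    }

    # HTML files: no-store disables caching entirely.
    # The commented-out lines are alternatives that use the negotiated cache.
    location ~ .*\.(html)$ {
        add_header Cache-Control no-store;
        # add_header Cache-Control no-cache;
        # add_header Pragma no-cache;
    }
}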

or

location / {
    if ($request_filename ~ .*\.(htm|html)$) {
        add_header Cache-Control no-cache;
    }
}

Reference article: it is very good, with examples you can test yourself and well-drawn diagrams.
