Preface

Have you ever wondered:

  • When the server receives a request, does it need to check the cache?
  • Which header fields are checked?
  • Which kinds of caching need to be checked on the server side?
  • In what order are the strong (forced) cache and the negotiated cache applied?
  • Does setting max-age=0 determine whether the browser caches a resource?
  • What is the role of s-maxage?
  • How does the browser check whether a cached resource has expired? What is the priority of the fields involved?
  • What is the difference between Last-Modified and ETag?
  • How is the ETag string generated?
  • What are "from disk cache" and "from memory cache"? When is each triggered?
  • What is heuristic caching, and under what conditions does it apply?

If you can't answer all of the above questions with confidence, here are seven Node.js exercises that will help you find the answers yourself.

Project address: koa-cache

An overview of the article is as follows:

Before we do this, let’s take a look at some of the ways HTTP caches resources. Without further ado, here we go.

Caching methods

As shown in the picture below, a web page usually needs to load various types of resources, such as images, CSS, and JS files. The caching method controls the following for these resources:

  1. Will the resource be cached?
  2. If the resource is cached:
    • Where can it be cached?
    • When does it expire?
    • How is it updated once it has expired?

The HTTP header fields that control these caches are Cache-Control, Expires, and Last-Modified/ETag. Of these, Cache-Control has the highest priority. So we will look at caching from the following aspects:

  1. The cacheability of resources
  2. Cache storage policy
  3. Cache expiration policy
  4. Cache update policy

and see how the Cache-Control directives work and how they relate to the Expires, Last-Modified, and ETag fields.

Cache storage Policy

The storage policy for a cached resource determines whether and where the resource is cached. Whether a resource may be cached at all is controlled by the no-store value of Cache-Control, as follows:

Cache-Control: no-store

A resource with this response header is not cached by any client or proxy. Where a resource is stored is controlled by the private and public values of Cache-Control, which correspond to two kinds of storage: private (browser) caches and shared (proxy) caches.

Private (browser) cache

A private cache can be used by a single user only. This means that the response is stored only in that user's own browser cache. As shown in the figure below:

We can set the private cache mode of resources as follows:

Cache-Control: private

Shared (proxy) cache

A shared cache can be used by multiple users. With a shared cache, resources can be cached by any intermediary, such as a proxy or a CDN. See the figure below. We can set the shared cache mode of resources as follows:

Cache-Control: public 
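The storage rules above can be sketched as a small helper. This is only an illustration: parseCacheControl and mayStore are hypothetical names that do not come from the project, and the parser is simplified (it does not handle commas inside quoted values).

```javascript
// Parse a Cache-Control header value into a plain object.
// Simplified sketch: commas inside quoted values are not supported.
function parseCacheControl(value) {
  const directives = {};
  for (const part of String(value || '').split(',')) {
    const [rawName, ...rest] = part.trim().split('=');
    const name = rawName.toLowerCase();
    if (!name) continue;
    // Directives without a value (no-store, private, ...) are stored as true.
    directives[name] = rest.length ? rest.join('=').replace(/^"|"$/g, '') : true;
  }
  return directives;
}

// Hypothetical helper: may a cache of the given kind ('private' or 'shared')
// store a response that carries this Cache-Control value?
function mayStore(cacheControl, cacheKind) {
  const d = parseCacheControl(cacheControl);
  if (d['no-store']) return false;                        // nobody may store it
  if (d.private && cacheKind === 'shared') return false;  // browser cache only
  return true;
}
```

For example, mayStore('private, max-age=60', 'shared') is false, while the same header is storable by a browser's private cache.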

Cache Expiration Policy

The cache expiration policy determines when cached resources expire and which fresh or stale cached resources the client is willing to accept.

Set the expiration time of cache resources

To set the expiration time of a cached resource, use the Expires field or the following Cache-Control values.

  1. max-age: the maximum time, in seconds, that a resource can be cached. After this time the cache is considered expired.
  2. s-maxage: applies only to shared caches (such as proxies); private caches ignore it. s-maxage takes priority over max-age: if s-maxage is present, a shared cache uses it instead of max-age and Expires.

The max-age time is relative to the time of the request, so it is a relative time. Expires is an absolute time, so setting Expires is like setting a deadline for the cached resource. Expires is not recommended because client and server clocks are often out of sync. In addition, if max-age or s-maxage is set in the Cache-Control response header, the Expires header is ignored.
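The priority described above (s-maxage for shared caches, then max-age, then Expires) can be sketched as follows. freshnessLifetime is a hypothetical helper, not part of the project, and it assumes the headers object uses lowercase field names.

```javascript
// Hypothetical helper: freshness lifetime in seconds for a response.
// `headers` uses lowercase names; `shared` is true for a proxy/CDN cache.
function freshnessLifetime(headers, shared) {
  const cc = String(headers['cache-control'] || '').toLowerCase();
  const sMaxAge = /(?:^|,)\s*s-maxage=(\d+)/.exec(cc);
  const maxAge = /(?:^|,)\s*max-age=(\d+)/.exec(cc);

  if (shared && sMaxAge) return Number(sMaxAge[1]); // only shared caches honor it
  if (maxAge) return Number(maxAge[1]);             // takes priority over Expires
  if (headers.expires) {
    // Expires is absolute, so it is measured against the Date header.
    const date = headers.date ? Date.parse(headers.date) : Date.now();
    const expires = Date.parse(headers.expires);
    if (Number.isNaN(expires)) return 0; // an invalid date counts as expired
    return Math.max(0, Math.round((expires - date) / 1000));
  }
  return null; // no explicit lifetime was given
}
```

A response carrying Cache-Control: max-age=20, s-maxage=600 therefore yields 20 for a browser and 600 for a shared cache, whatever Expires says.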

The client sets the acceptable cache resources

The following Cache-Control request values determine which cached resources the client will accept.

  1. min-fresh: can only appear in requests. The client wants a cached response that will stay fresh for at least the given number of seconds. For example, Cache-Control: min-fresh=60 asks the cache to return only data that will still be fresh 60 seconds from now.
  2. max-stale: indicates that the client accepts cached data even if it has expired. For example, Cache-Control: max-stale=60 means that the client accepts a resource that expired no more than 60 seconds ago on the cache server.
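As a rough sketch of how a cache might evaluate these two request directives, consider the helper below. clientAccepts is a hypothetical name, and age and lifetime are assumed to be already known, in seconds.

```javascript
// Hypothetical helper. `age` and `lifetime` are in seconds; `request` holds the
// parsed request directives, e.g. { 'min-fresh': 60 } or { 'max-stale': 60 }.
function clientAccepts(age, lifetime, request) {
  const remaining = lifetime - age; // zero or negative once the response is stale
  if (request['min-fresh'] !== undefined) {
    // Must still be fresh for at least min-fresh seconds.
    return remaining >= request['min-fresh'];
  }
  if (request['max-stale'] !== undefined) {
    // May be stale, but by no more than max-stale seconds.
    return -remaining <= request['max-stale'];
  }
  return remaining > 0; // default: only fresh responses are acceptable
}
```

With a lifetime of 100 seconds, a response aged 50 seconds fails min-fresh=60 (only 50 seconds of freshness remain), while a response aged 130 seconds passes max-stale=60.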

Cache Update Policy

The purpose of the cache update policy is to give the client or cache server a way to determine whether a locally cached copy needs to be updated.

There are three cache update modes, which can be set using the following values of Cache-Control:

  1. no-cache: this value does not mean "do not cache". With no-cache the resource is still cached, but the cache is not read directly. Before the cache can be used, a request must be sent to the server to confirm that the resource is up to date. This validation process is called negotiated caching or comparison caching.
  2. must-revalidate: tells the browser and cache servers that the local copy can be used until it expires. Once the local copy has expired, it must be checked against the origin server.
  3. proxy-revalidate: similar to must-revalidate, but applies only to shared caches such as proxy servers that serve many users. Private caches are not affected.
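The three modes can be compared side by side in a small decision function. This is a simplified model, not how any particular browser or proxy is implemented: cacheAction is a hypothetical name, and allowStale stands for a cache that has been configured (or asked by the client) to serve expired content.

```javascript
// Hypothetical helper: what should a cache do with a stored response?
// `directives` is a Set of response directives, `fresh` says whether the stored
// copy is still within its lifetime, `shared` is true for a proxy/CDN cache,
// and `allowStale` models a cache that is willing to serve expired content.
function cacheAction(directives, fresh, shared, allowStale) {
  if (directives.has('no-cache')) return 'revalidate'; // always check first
  if (fresh) return 'serve-from-cache';
  const mustCheck = directives.has('must-revalidate') ||
    (shared && directives.has('proxy-revalidate'));
  if (allowStale && !mustCheck) return 'serve-stale';
  return 'revalidate';
}
```

Note that must-revalidate only changes the outcome in the allowStale case: without it an expired copy may still be served, with it the cache has to revalidate.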

Note that a cache normally revalidates an expired resource without needing any specific directive. As described above, must-revalidate takes effect only after the cache has expired: before that, no request is sent to check for updates, and the local cache is read directly. Since an expired cache is revalidated anyway, why do we need must-revalidate?

must-revalidate and cache servers

This is because caching servers such as NGINX, Varnish, and Squid more or less allow expired caches to be returned, either through Cache-Control directives or through software configuration. Adding must-revalidate prevents this: a cached response marked must-revalidate must be successfully revalidated once it has expired, with no exceptions. So a more fitting name for must-revalidate would be never-return-stale.

must-revalidate and the browser

Does the browser return stale content, that is, does it ever use an expired cache? The answer is yes. When we use the browser's forward and back buttons, the browser tries to reopen the page from its local cache and does not revalidate, even if the cache has expired. Even must-revalidate does not force the browser to revalidate in this case.

Caching in practice

To get a better understanding of HTTP caching, let’s implement a static server.

  1. Initialize the project
mkdir koa-cache
cd koa-cache
# Initialize
git init
yarn init
# Install dependencies
yarn add koa
  2. Project directory

The project directory is shown in the figure below:

  3. The code

Front-end code writing

// index.html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>test cache</title>
  <link rel="stylesheet" href="/static/css/index.css">
</head>
<body>
  <div id="app">Cache test</div>
  <img src="/static/image/cat.jpeg" alt="">
</body>
</html>
// index.css
#app {
  color: #FBDC5C
}

The purpose of the server-side code is to get the index.html page and the corresponding front-end request resources (images and CSS files) by accessing localhost:3000 in the browser. To do this, we need to implement a static resource service. As follows:

// index.js
const fs = require('fs')
const path = require('path')
const Koa = require('koa')
const app = new Koa()

// Resource type table: file extension -> MIME type
const mimes = {
  css: 'text/css',
  less: 'text/less',
  html: 'text/html',
  txt: 'text/plain',
  xml: 'text/xml',
  gif: 'image/gif',
  ico: 'image/x-icon',
  jpeg: 'image/jpeg',
  jpg: 'image/jpeg',
  png: 'image/png',
  svg: 'image/svg+xml',
  tiff: 'image/tiff',
  json: 'application/json',
  pdf: 'application/pdf',
  swf: 'application/x-shockwave-flash',
  wav: 'audio/x-wav',
  wma: 'audio/x-ms-wma',
  wmv: 'video/x-ms-wmv'
}

// Resolve the MIME type of the requested resource
const parseMime = (url) => {
  // "/" serves index.html; otherwise use the file extension without the dot
  const extName = url === '/' ? 'html' : path.extname(url).slice(1)
  return mimes[extName]
}

// Get the content of the requested resource
const parseStatic = (url) => {
  let filePath = path.resolve(__dirname, `.${url}`)
  if (url === '/') {
    filePath = `${filePath}/index.html`
  }

  return fs.readFileSync(filePath)
}

app.use(async (ctx) => {
  const url = ctx.request.url

  ctx.set('Content-Type', parseMime(url))
  ctx.body = parseStatic(url)
})

app.listen(3000, () => {
  console.log('starting at port 3000')
})
  4. Start the project
node index.js

Open the browser and go to localhost:3000. You can see our index.html page and the corresponding resources loaded.

Now that our static server is ready, let's dive into HTTP caching by implementing two resource caching strategies: strong caching and negotiated caching.

Strong cache

First, the definition: a strong cache policy sets an expiration time for the resource. During the strong cache phase, that is, while the resource has not expired, the browser does not send a request to the server but reads the resource directly from the cache. After the resource expires, the browser requests it again to update the cache. Now let's look at how to implement strong caching in terms of the three caching policies.

  1. Cache storage policy: if not set, requested resources are stored locally in the browser by default
  2. Cache expiration policy: set Expires or max-age
  3. Cache update policy: if not set, an expired cache is revalidated by default

Practice 1: Expires

// index.js
app.use(async (ctx) => {
  const url = ctx.request.url;
  ctx.set('Content-Type', parseMime(url))
  
  // Set the expiration time to 30,000 milliseconds (30 seconds) from now
  const deadline = new Date(Date.now() + 30000).toUTCString()
  ctx.set('Expires', deadline)  

  ctx.body = parseStatic(url)
});

Note: The browser mentioned below refers to Chrome

  1. Open a browser and go to localhost:3000

The first request for the page and its resources returns 200. Take the CSS resource as an example. The following figure shows the request and response headers:

  2. Refresh the page within 30 seconds

The second request for the page's resources takes 0 ms because they are read directly from the memory cache.

  3. After 30 seconds, refresh the page

Conclusion: For a resource with an Expires field, the browser reads its local cache directly until the Expires time passes, and requests the resource again afterwards.

Practice 2: max-age

// index.js
app.use(async (ctx) => {
  const url = ctx.request.url;
  ctx.set('Content-Type', parseMime(url))
  
  // Set the expiration time to 30 seconds from now
  ctx.set('Cache-Control', 'max-age=30')

  ctx.body = parseStatic(url)
});
  1. Open a browser and go to localhost:3000, as shown below

Take the CSS resource as an example. The following figure shows the request and response headers:

  2. Refresh the page within 30 seconds

  3. After 30 seconds, refresh the page, as shown below

Conclusion: If max-age is set, the browser reads the resource directly from its local cache until max-age elapses, and requests the resource again afterwards.

Practice 3: Expires vs max-age

app.use(async (ctx) => {
  const url = ctx.request.url;
  ctx.set('Content-Type', parseMime(url))
  
  // Set both max-age and Expires
  ctx.set('Cache-Control', 'max-age=20');
  const deadline = new Date(Date.now() + 60000).toUTCString()
  ctx.set('Expires', deadline);
  

  ctx.body = parseStatic(url)
});
  1. Open a browser and go to localhost:3000

  2. Refresh the page within 20 seconds

  3. Refresh the page between 20 and 60 seconds after the first request

Conclusion: When both max-age and Expires are set on a resource, max-age takes precedence.

Negotiated cache

First, the definition: with a negotiated cache, each request for a resource is sent to the server, which confirms whether the resource has changed by comparing the client's local copy with the server's copy. If the resource has changed, the server returns 200 and the latest resource. If it has not changed, the server returns 304 and the client reads its local cache. To implement the negotiated cache, you need the ETag or Last-Modified field. When the server adds an ETag or Last-Modified field to the response, the client automatically adds an If-None-Match or If-Modified-Since field, respectively, to the next request.

If you do not set an expiration time for the resource, the browser automatically applies heuristic caching. Heuristic caching assigns the resource an expiration time by default, using 10% of (Date - Last-Modified), that is, 10% of the difference between these two response header values, as the cache time. Therefore, when implementing a negotiated cache policy with Last-Modified, you need to disable heuristic caching with the no-cache value of Cache-Control.
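The 10% rule can be written out as a short calculation. heuristicLifetime is a hypothetical helper for illustration, and browsers may apply additional limits that are not modeled here.

```javascript
// Hypothetical helper: heuristic freshness lifetime in seconds, using the
// "10% of (Date - Last-Modified)" rule. `headers` uses lowercase names.
function heuristicLifetime(headers) {
  const date = Date.parse(headers.date);
  const lastModified = Date.parse(headers['last-modified']);
  // Without both dates there is nothing to base the heuristic on.
  if (Number.isNaN(date) || Number.isNaN(lastModified)) return 0;
  // Milliseconds / 1000 gives seconds, and / 10 takes 10% of them.
  return Math.max(0, Math.floor((date - lastModified) / 10000));
}
```

For a file last modified 10 minutes (600 seconds) before the response was generated, the heuristic cache time is 60 seconds.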

There are three ways to implement the negotiated cache:

  1. Case 1: No expiration time is set for the resource, so the browser automatically applies heuristic caching. After the resource expires, the browser uses the negotiated cache to verify whether the resource has been updated.
  2. Case 2: An expiration time is set for the resource. After the resource expires, the negotiated cache is used to verify whether the resource has been updated.
  3. Case 3: No expiration time is set for the resource, and heuristic caching is disabled. Every request must be sent to the server to verify whether the resource has changed.

Practice 4: ETag & If-None-Match

The code is as follows:

const crypto = require('crypto')

// Compute the MD5 digest of the file content, base64-encoded
const md5 = (data) => {
  const hash = crypto.createHash('md5');
  return hash.update(data).digest('base64');
}

app.use(async (ctx) => {
  const url = ctx.request.url;
  ctx.set('Content-Type', parseMime(url))
  
  // Compute the ETag and compare it with If-None-Match
  const buffer = parseStatic(url)
  const fileMd5 = md5(buffer); // Generate the MD5 value of the file
  const noneMatch = ctx.request.headers['if-none-match']
  
  if (noneMatch === fileMd5) {
    ctx.status = 304;
    return;
  }
  
  console.log('ETag cache invalid')

  ctx.set('ETag', fileMd5)
  ctx.body = buffer
});
  1. Open a browser and go to localhost:3000, as shown below

  2. Then refresh the page again

Let's look at the headers of the CSS request: the ETag equals If-None-Match, so the resource has not changed and the server returns 304.

  3. After the CSS is modified, refresh the page

Again, let's look at the response headers: the server returns 200 together with the latest resource because the CSS resource has changed.

  4. Next, refresh the page again
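Hashing the whole file with md5 on every request, as in the code above, is one way to generate an ETag. A cheaper alternative, sketched here under the assumption that a weak validator is good enough, derives the ETag from the file's size and modification time. statEtag is a hypothetical helper and not part of the project.

```javascript
// Hypothetical helper: a weak ETag built from file size and modification time,
// so the file content does not have to be read and hashed on every request.
// `stat` is an fs.Stats-like object with `size` and `mtimeMs` properties.
function statEtag(stat) {
  const size = stat.size.toString(16);
  const mtime = Math.floor(stat.mtimeMs).toString(16);
  // The W/ prefix marks a weak validator; the value itself is quoted.
  return `W/"${size}-${mtime}"`;
}

// Possible usage inside the middleware (filePath resolved as in parseStatic):
//   const etag = statEtag(fs.statSync(filePath))
//   if (ctx.request.headers['if-none-match'] === etag) { ctx.status = 304; return }
```

The trade-off is that two different contents with the same size and timestamp would get the same ETag, which is why the tag is marked weak.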

Practice 5: Last-Modified & If-Modified-Since

The code is as follows:

app.use(async (ctx) => {
  const url = ctx.request.url;
  ctx.set('Content-Type', parseMime(url))
  
  const filePath = path.resolve(__dirname, `.${url}`)
  const stat = fs.statSync(filePath)
  // The last time the file was modified
  const mtime = stat.mtime.toUTCString();
  const ifModifiedSince = ctx.request.headers['if-modified-since']
  
  if (mtime === ifModifiedSince) {
    ctx.status = 304
    return
  }
  console.log('Negotiated cache: Last-Modified invalid')
  
  // Disable heuristic caching
  ctx.set('Cache-Control', 'no-cache')
  ctx.set('Last-Modified', mtime)
  
  ctx.body = parseStatic(url)
});

With the code above, you can achieve the same effect as in the example above. You can verify this in the same way as in the example above.
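The example compares the two header strings for exact equality, which works when the browser echoes back the value it received. A slightly more tolerant variant, sketched below, compares the values as dates instead. notModifiedSince is a hypothetical helper, not part of the project.

```javascript
// Hypothetical helper: true when the file has not changed since the date the
// client sent in If-Modified-Since. HTTP dates have one-second resolution, so
// the file's mtime is truncated to whole seconds before comparing.
function notModifiedSince(mtime, ifModifiedSince) {
  const since = Date.parse(ifModifiedSince);
  if (Number.isNaN(since)) return false; // missing or malformed header
  const modified = Math.floor(mtime.getTime() / 1000) * 1000;
  return modified <= since;
}
```

In the middleware this would replace the string comparison, for example notModifiedSince(stat.mtime, ifModifiedSince).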

Strong cache + negotiated cache

Strong cache plus negotiation cache. This means that we will set an expiration time for the resource, and only after the resource expires will we send a request to the server to compare the local resource with the server resource to see if the resource has changed. If the resource changes, 200 and the latest resource are returned to the client. If the resource has not changed, 304 is returned and the resource is read from the local cache.

Practice 6: Last-Modified + heuristic cache

app.use(async (ctx) => {
  const url = ctx.request.url;
  ctx.set('Content-Type', parseMime(url))
  
  const filePath = path.resolve(__dirname, `.${url}`)
  const stat = fs.statSync(filePath)
  // The last time the file was modified
  const mtime = stat.mtime.toUTCString();
  const ifModifiedSince = ctx.request.headers['if-modified-since']
  
  if (mtime === ifModifiedSince) {
    ctx.status = 304
    return
  }
  console.log('Negotiated cache: Last-Modified invalid')

  // No max-age is set, so the browser falls back to heuristic caching
  ctx.set('Cache-Control', 'must-revalidate')
  ctx.set('Last-Modified', mtime)
  
  ctx.body = parseStatic(url)
});
  1. Open a browser and go to localhost:3000, as shown below

  2. Refresh the page again

In this case, the cache time of the resource is calculated by the heuristic caching algorithm. Because that cache time has not yet elapsed, the browser reads the resource directly from the memory cache.

  3. Modify the CSS file
#app {
  color: #FBDC5C;
  font-size: 50px;
}
  4. Refresh the page again

As shown above, the cache time calculated by the heuristic caching algorithm has elapsed, so the resource has expired and the request is sent again. Because the CSS file has changed, the server returns the latest CSS file.

  5. Refresh the page again

The resource expired long ago, so the browser goes straight to the negotiated cache phase. Because the CSS file has not changed, the server returns 304 and the browser reads the file from its local cache.

Practice 7: max-age + ETag/Last-Modified

const crypto = require('crypto')

// Compute the MD5 digest of the file content, base64-encoded
const md5 = (data) => {
  const hash = crypto.createHash('md5');
  return hash.update(data).digest('base64');
}

app.use(async (ctx) => {
  const url = ctx.request.url;
  ctx.set('Content-Type', parseMime(url))
  
  // Set the cache time for the resource to 60 seconds
  ctx.set('Cache-Control', 'max-age=60')

  // Compute the ETag and compare it with If-None-Match
  const buffer = parseStatic(url)
  const fileMd5 = md5(buffer); // Generate the MD5 value of the file
  const noneMatch = ctx.request.headers['if-none-match']
  
  if (noneMatch === fileMd5) {
    ctx.status = 304;
    return;
  }
  
  console.log('ETag cache invalid')
  
  ctx.set('ETag', fileMd5)
  ctx.body = buffer
});
  1. Open a browser and go to localhost:3000, as shown below

  2. Within 60s, refresh the page again

  3. After 60s, refresh the page again

  4. After the CSS file is modified, refresh the page within 60 seconds

The latest CSS resource is not returned, because the cached copy has not expired yet.

  5. After 60s, refresh the page again

Because the cached resource has expired, a request is sent to the server to verify whether the local copy is up to date. The locally cached CSS is out of date, so the server returns the latest CSS resource.

Memory cache & disk cache

In the examples above, an unexpired resource was read directly from the memory cache. In the browser, the memory cache is released once the tab is closed. The disk cache is stored on disk and is not released even when the tab is closed. Now, let's do a little experiment to see when the browser reads the memory cache and when it reads the disk cache.

The code is as follows:

// index.js
app.use(async (ctx) => {
  const url = ctx.request.url;
  ctx.set('Content-Type', parseMime(url))
  
  // Set the expiration time to 1 day
  ctx.set('Cache-Control', 'max-age=86400')

  ctx.body = parseStatic(url)
});
  1. Open a browser and go to localhost:3000

  2. Continue to refresh the page

  3. Now close that tab, open a new tab in your browser, and access localhost:3000

From the figure above, we can see that resources are read from the disk cache. We can therefore conclude that closing the tab clears the memory cache, and once the memory cache is cleared, the browser reads resources from the disk cache.

Conclusion

  1. Strong cache only: before the resource expires, it is read directly from the local cache. After it expires, the client requests it from the server, and the server returns the resource whether or not it has changed.
  2. Negotiated cache only: every request for the resource is sent to the server. If the server's comparison shows that the resource has not changed, it returns 304 and the client reads the resource from its local cache. If the resource has changed, the server returns 200 and the latest resource.
  3. Strong cache + negotiated cache: before the resource expires, it is read directly from the local cache. After it expires, the client sends a request to the server. If the server's comparison shows that the resource has not changed, it returns 304 and the client reads the resource from its local cache. If the resource has changed, the server returns 200 and the latest resource.
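The three strategies can be summarized as one client-side decision. The sketch below is a simplified model rather than a description of how a real browser is implemented: nextStep and the entry fields (storedAt, maxAge, etag) are hypothetical names, and times are plain seconds.

```javascript
// Hypothetical model of a browser's decision for one cached resource.
// entry: { storedAt, maxAge, etag } with times in seconds; maxAge may be
// undefined (no strong cache) and etag may be undefined (no validator).
function nextStep(entry, now) {
  if (!entry) return { action: 'fetch' }; // nothing cached yet
  const fresh = entry.maxAge !== undefined && now - entry.storedAt < entry.maxAge;
  if (fresh) return { action: 'use-cache' }; // strong cache hit, no request
  if (entry.etag) {
    // Negotiated cache: the server answers 304 (reuse) or 200 (replace).
    return { action: 'revalidate', headers: { 'If-None-Match': entry.etag } };
  }
  return { action: 'fetch' }; // expired and no validator to negotiate with
}
```

An entry with only maxAge behaves like strategy 1, an entry with only etag like strategy 2, and an entry with both like strategy 3.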