Preface

Most of us have some understanding of HTTP cache policy, but when asked to explain it in our own words, that understanding often turns out to be vague. So this article goes back over the topic and, this time, works it out properly.

The role of browser caching

  1. Caching reduces redundant data transmission and saves network bandwidth, so pages load faster and users consume less data.
  2. Caching reduces the load on the server, which makes it more responsive.

Overview

There are two types of browser caching strategies: strong caching and negotiated caching. The term “strong cache” may suggest that there is also a “weak cache”, so the word “strong” is not really apt. Strong caching actually refers to a cached response that is still fresh. Negotiated caching refers to a cached response that is stale (expired) and must be validated with a conditional request (If-XXX) before it can be reused. Both are explained below.

Basic principle

  1. When loading a resource, the browser decides whether the strong cache is hit based on the Expires and Cache-Control headers of the cached response. If it is hit, the browser reads the resource directly from the cache without sending a request to the server.
  2. If the strong cache is not hit, the browser sends a request to the server, which uses Last-Modified and ETag to check whether the negotiated cache is hit. If it is, the server returns 304 and the browser reads the resource from its cache.
  3. If neither is hit, the resource is loaded directly from the server.
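
The three steps above can be sketched as a small decision function. This is only an illustration of the flow, not how any browser is actually implemented; the function name decideCacheAction and the shape of the cache entry (expiresAt, etag, lastModified) are assumptions made for this example.

```javascript
// Hypothetical cache entry: { expiresAt: <ms timestamp>, etag?: string, lastModified?: string }
function decideCacheAction(entry, now = Date.now()) {
  // Nothing cached: load directly from the server (step 3)
  if (!entry) return { action: 'fetch' }

  // Still fresh: strong cache hit, no request is sent (step 1)
  if (now < entry.expiresAt) return { action: 'use-cache' }

  // Stale: build a conditional request from whatever validators were stored (step 2)
  const headers = {}
  if (entry.etag) headers['If-None-Match'] = entry.etag
  if (entry.lastModified) headers['If-Modified-Since'] = entry.lastModified

  // Without a validator there is nothing to negotiate with, so fetch in full
  if (Object.keys(headers).length === 0) return { action: 'fetch' }
  return { action: 'revalidate', headers }
}
```

If the server answers the revalidation request with 304, the cached body is reused; if it answers with 200, the new body replaces the cached one.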

Where do cached files go

Memory cache and disk cache

  • 200 from memory cache: Resource files are cached in memory and read directly from memory. The data is cleared when the process exits, for example when the browser is closed. Scripts, fonts, and images are generally stored in the memory cache.

  • 200 from disk cache: Resource files are cached on the hard disk, and the data remains after the browser is closed. Non-script files, such as CSS, are generally stored on disk.

The browser checks the memory cache first, then the disk cache, and only then requests the resource over the network. Reading from memory is faster than reading from disk, but not everything can be kept in memory because memory is limited.

Build a service simulation demo with Express

const path = require('path')
const fs = require('fs')
const express = require('express')
const app = express()

// Serve a minimal page that loads /test.js
app.get('/', (req, res) => {
  res.send(`<!DOCTYPE html>
<html>
  <head>
    <title>Init Demo</title>
  </head>
  <body>
    Hello Init Demo
    <script src="/test.js"></script>
  </body>
</html>`)
})

// Read the file from disk on every request; no cache headers are set
app.get('/test.js', (req, res) => {
  let sourcePath = path.resolve(__dirname, '../public/test.js')
  let result = fs.readFileSync(sourcePath)
  res.end(result)
})

app.listen(3000, () => {
  console.log('listening on 3000')
})

As shown in the code above, when entering http://localhost:3000/, the request process is as follows:

  • The browser requests the static resource test.js.
  • The server reads the file test.js from disk and returns it to the browser.
  • The browser requests it again, and the server reads test.js from disk again and returns it to the browser.

If you refresh the page several times or reopen it, you can see that test.js is requested again each time. Nothing is cached this way.

Strong cache

Strong caching is implemented through two response headers: Expires and Cache-Control.

Principle:

When loading a resource, the browser decides whether the strong cache is hit based on the header information (Expires and Cache-Control) stored with the locally cached resource. If it is hit, the cached resource is used directly. Otherwise, the browser goes on to send a request to the server.

Expires

Expires is defined in HTTP/1.0. It gives the expiration time of the resource as an absolute time (in GMT) returned by the server.

If the current time on the browser is earlier than the expiration time, the cached data is used directly.

const moment = require('moment')

app.get('/test.js', (req, res) => {
  let sourcePath = path.resolve(__dirname, '../public/test.js')
  let result = fs.readFileSync(sourcePath)
  res.setHeader(
    'Expires',
    moment().utc().add(1, 'm').format('ddd, DD MMM YYYY HH:mm:ss') + ' GMT' // Expire 1 minute from now
  )
  res.end(result)
})

In the code above we added the Expires header, so the first request to the server returns both the file and its expiration time. When you refresh again, you can see that the file is read directly from the memory cache. Once a minute has elapsed, the browser sends the request again.

This mode depends on the local clock. If the time on the server differs from the time on the client (for example, if the browser's clock is set ahead, the resource always appears expired), the cache may not work as intended. And after expiration, the server reads the file again and returns it to the browser, regardless of whether the file has changed.

Expires dates from HTTP/1.0, and browsers now default to HTTP/1.1 or later, so it matters little today. Cache-Control is the primary mechanism.

Cache-Control

To address the shortcomings of Expires, HTTP/1.1 introduced Cache-Control as an alternative. It uses max-age to set the cache lifetime in seconds, which is a relative time, and it takes precedence over Expires. That is, if both max-age and Expires are present, Expires is overridden by the max-age of Cache-Control.
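
To make the precedence concrete, here is a minimal sketch of how a freshness lifetime could be derived from the two headers. It is a simplification of what real caches do (they also account for the Age header and other directives); the function name freshnessLifetimeMs and the lowercase header object are assumptions of this example.

```javascript
// Returns the freshness lifetime in milliseconds, or 0 when neither header is usable.
// `headers` is a plain object with lowercase header names (an assumption of this sketch).
// `responseTimeMs` is the time the response was received, as a millisecond timestamp.
function freshnessLifetimeMs(headers, responseTimeMs) {
  const cacheControl = headers['cache-control'] || ''

  // max-age wins when present: a relative number of seconds
  const match = /(?:^|,)\s*max-age=(\d+)/i.exec(cacheControl)
  if (match) return Number(match[1]) * 1000

  // Otherwise fall back to Expires: an absolute point in time
  if (headers['expires']) {
    const expiresAt = Date.parse(headers['expires'])
    if (!Number.isNaN(expiresAt)) return Math.max(0, expiresAt - responseTimeMs)
  }
  return 0
}

const received = Date.parse('Wed, 21 Oct 2015 07:28:00 GMT')
console.log(
  freshnessLifetimeMs(
    { 'cache-control': 'max-age=60', expires: 'Wed, 21 Oct 2015 07:38:00 GMT' },
    received
  )
) // 60000: max-age (60 s) overrides the 10-minute Expires
```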

app.get('/test.js', (req, res) => {
  let sourcePath = path.resolve(__dirname, '../public/test.js')
  let result = fs.readFileSync(sourcePath)
  res.setHeader('Cache-Control', 'max-age=60') // Relative time: expire 60 seconds after the response
  res.end(result)
})

On requests after the first one (within the 60 seconds), you can see that the strong cache is hit.

Besides max-age, there are other directives that can be set:

public

The response can be cached by both browsers and proxy servers. Nginx is commonly used as the proxy server.

private

Only the client can cache the resource; proxy servers must not cache it.

no-cache

Skips the strong cache but does not prevent the negotiated cache. Normally the negotiated cache is used only after the strong cache has expired; with no-cache, the browser goes straight to the negotiated cache and revalidates with the server on every request.

no-store

Disallows caching entirely; the data is requested again every time.

For example, if I disable caching and repeat the steps above, the browser requests the server every time:

res.setHeader('Cache-Control', 'no-store, max-age=60') // Disable caching
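
The 'no-store, max-age=60' example shows that directives can be combined and that no-store wins. The following sketch parses a Cache-Control value and summarizes the resulting policy. It is a simplified illustration only: the function names parseCacheControl and describePolicy are made up for this example, and it ignores many directives that real caches handle.

```javascript
// Split a Cache-Control value into a { directive: value } map
function parseCacheControl(value) {
  const directives = {}
  for (const part of (value || '').split(',')) {
    const [name, arg] = part.trim().split('=')
    // Directives without an argument (e.g. no-store) are stored as true
    if (name) directives[name.toLowerCase()] = arg === undefined ? true : arg
  }
  return directives
}

// Summarize the policy in the order a cache would check it
function describePolicy(value) {
  const d = parseCacheControl(value)
  if (d['no-store']) return 'never store'
  if (d['no-cache']) return 'store, but revalidate before every reuse'
  if (d['max-age'] !== undefined) return `fresh for ${Number(d['max-age'])} seconds`
  return 'no explicit freshness'
}

console.log(describePolicy('no-store, max-age=60')) // never store
console.log(describePolicy('public, max-age=60'))   // fresh for 60 seconds
```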

Negotiate the cache

The drawback of the strong cache is that it always expires eventually. If the file has not changed by then, fetching it again in full wastes server resources, hence the negotiated cache.

When a browser request for a resource does not hit the strong cache, the browser sends a request to the server to check whether the negotiated cache is hit. If it is, the response has HTTP status 304 with the reason phrase Not Modified, telling the browser to read the resource from its cache. If it is not hit, the requested resource is returned.

The negotiated cache is managed with two pairs of headers: Last-Modified/If-Modified-Since and ETag/If-None-Match.

Principle:

When the client sends a request, the server checks whether it carries a validator. If it does not, the server returns one along with the response. On its next request the client sends that validator back, and the server checks it. If the check passes, the server responds with 304, telling the browser to read from its cache. If it does not pass, the requested resource is returned.

Last-Modified

The process is as follows:

  • The browser requests the resource, and each time the server returns the file it adds Last-Modified (the last modification time) to the response headers.
  • When the browser's cached file expires, the browser sends a request to the server with If-Modified-Since in the request headers (its value is the previous Last-Modified).
  • The server compares If-Modified-Since from the request headers with the file's last modification time. If they are the same, the cache is hit and 304 is returned. If not, a 200 response is returned with the file content and an updated Last-Modified, and the cycle repeats.

app.get('/test.js', (req, res) => {
  let sourcePath = path.resolve(__dirname, '../public/test.js')
  let result = fs.readFileSync(sourcePath)
  let status = fs.statSync(sourcePath)
  let lastModified = status.mtime.toUTCString()
  if (lastModified === req.headers['if-modified-since']) {
    // The client's copy is still current: send headers only
    res.writeHead(304, 'Not Modified')
    res.end()
  } else {
    res.setHeader('Cache-Control', 'max-age=1') // Expire after 1 second so that Last-Modified is used right away
    res.setHeader('Last-Modified', lastModified)
    res.writeHead(200, 'OK')
    res.end(result)
  }
})

The first request is as follows:

After the 1 second has expired, the next request returns 304, hitting the negotiated cache:

I ran into a small problem: for the 200 response, the request headers showed this message:

Provisional headers are shown. Disable cache to see full headers.

It may have been caused by browser extensions, since opening an incognito window solved it (if you know of other causes and solutions, please leave a comment and tell me).

ETag

Last-Modified also has its drawbacks. The GMT modification time is only accurate to the second, so if a file changes several times within one second, the server cannot tell that it has changed and the browser does not get the latest version. Conversely, if a file is modified and then reverted, the content is unchanged but the last modification time is not, so the file has to be requested again. It is also possible that the server cannot obtain the correct modification time, or that its clock is inconsistent with that of a proxy server.
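
The one-second precision is easy to see in isolation. In this small sketch, the two timestamps are made-up values 800 milliseconds apart; both produce the same header string, so a comparison based on Last-Modified cannot tell them apart.

```javascript
// Two hypothetical modification times within the same second
const first = new Date('2024-01-01T00:00:00.100Z')
const second = new Date('2024-01-01T00:00:00.900Z')

// toUTCString() drops the milliseconds, just like the Last-Modified header
const sameHeader = first.toUTCString() === second.toUTCString()

console.log(first.toUTCString()) // Mon, 01 Jan 2024 00:00:00 GMT
console.log(sameHeader)          // true
```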

To solve the problems caused by an inaccurate modification time, the server and the browser negotiate in a different way: instead of a time, the server returns a unique identifier for the file, the ETag. The ETag changes only when the content of the file changes. ETag has a higher priority than Last-Modified.

The process is as follows:

  • The browser requests the resource, and the server returns the file together with its unique ETag.
  • When the browser's cached file expires, the browser sends a request to the server with If-None-Match in the request headers (its value is the previous ETag).
  • The server compares If-None-Match from the request headers with the file's current ETag. If they are the same, the cache is hit and 304 is returned. If not, a 200 response is returned with the file content and an updated ETag, and the cycle repeats.

const md5 = require('md5')

app.get('/test.js', (req, res) => {
  let sourcePath = path.resolve(__dirname, '../public/test.js')
  let result = fs.readFileSync(sourcePath)
  let etag = md5(result) // Hash of the file content

  if (req.headers['if-none-match'] === etag) {
    // The client's copy is still current: send headers only
    res.writeHead(304, 'Not Modified')
    res.end()
  } else {
    res.setHeader('ETag', etag)
    res.writeHead(200, 'OK')
    res.end(result)
  }
})

When requesting again:

  • HTTP does not specify how an ETag is generated. Hashing the content is a common practice in real projects.

However, each time the server generates an ETag it has to read the file and compute a hash, whereas Last-Modified only requires reading the file's modification time, so ETag is more expensive to compute.

Priority of the cache

The first thing to make clear is that strong caching takes precedence over negotiated caching.

HTTP/1.0 also has a Pragma field, which belongs to strong caching as well. When its value is no-cache, it tells the browser not to use the cached resource directly, that is, to send a request to the server every time. It takes precedence over Cache-Control.

To see Pragma in action, check Disable cache in Chrome DevTools or press Ctrl + F5 to force a refresh, and Pragma appears in the request headers.

The final priority order is:

Pragma > Cache-Control > Expires > ETag > Last-Modified
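
On the server side, the last part of this order (ETag before Last-Modified) can be sketched as a single check. This is a simplified illustration in the same style as the handlers above: it uses exact string comparison, whereas a real server would also have to handle lists of ETags and weak validators. The function name isNotModified and the `current` object are assumptions of this example.

```javascript
// reqHeaders: request headers with lowercase names, as Express and Node provide them
// current: { etag, lastModified } computed for the file as it is now
function isNotModified(reqHeaders, current) {
  // If the client sent If-None-Match, it decides the outcome on its own
  const ifNoneMatch = reqHeaders['if-none-match']
  if (ifNoneMatch !== undefined) {
    return ifNoneMatch === current.etag
  }

  // Only without an ETag validator does If-Modified-Since come into play
  const ifModifiedSince = reqHeaders['if-modified-since']
  if (ifModifiedSince !== undefined) {
    return ifModifiedSince === current.lastModified
  }

  // No validator at all: this cannot be a 304
  return false
}
```

A handler would respond with 304 when this returns true and with 200 plus fresh ETag and Last-Modified headers otherwise.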

How to clear the cache

By default, the browser caches static resources such as images, CSS, and JS. Sometimes, in the development environment, the resources may not be updated in time due to strong cache. You can use the following methods:

  1. Press Ctrl+F5 to force a refresh (a plain F5 skips the strong cache and goes straight to the negotiated cache)
  2. In Chrome DevTools, check Disable cache in the Network panel
  3. Add a timestamp to the URL of the resource file
  4. Other settings, such as webpack configuration
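
For the timestamp approach, one common form is to append a version or timestamp as a query parameter, so that a changed value makes the browser treat the file as a new URL. The sketch below assumes this query-parameter form; the function name withVersion, the parameter name v, and the base URL are choices made for this example only.

```javascript
// Append (or replace) a `v` query parameter on a resource path
function withVersion(resourcePath, version) {
  // The base is only needed so that URL can parse a relative path
  const url = new URL(resourcePath, 'http://localhost:3000')
  url.searchParams.set('v', String(version))
  return url.pathname + url.search
}

console.log(withVersion('/test.js', 42)) // /test.js?v=42
```

In practice the value would be something like Date.now() at build time or a hash of the file content, which is roughly what build tools such as webpack automate when they put a hash in the output filename.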


Example Github code: http-cache-demo


