Recently, while going through interview questions, I kept running into questions about HTTP caching, but my understanding of it was always vague. In particular, HTTP has many header fields, such as If-Modified-Since and If-None-Match, that are easy to mix up. Then it occurred to me that I could build a server, add the headers, and see what happens. By looking up information online and setting headers with Express, you can get a good picture of how HTTP caching works.

Personal blog: Xie Xiaofei's blog

HTTP overview

The browser communicates with the server over the HTTP protocol, in which the client always initiates a request and the server responds. The model is as follows:

HTTP messages are the blocks of data exchanged between the browser and the server. The browser requests data by sending a request message, and the server returns data by sending a response message. Each message has two parts:

  1. Header: additional information about the message (cookies, cache information, and so on), including the cache rules
  2. Body: the content that the HTTP request or response actually transfers
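
To make the header/body split concrete, here is a minimal sketch that parses a raw response string. The function name `parseHttpResponse` and the sample message are made up for illustration, and the sketch assumes the whole message arrives as one string with an unchunked body.

```javascript
// Split a raw HTTP/1.1 response into status line, header fields, and body.
// Simplification: the whole message is one string and the body is not chunked.
function parseHttpResponse(raw) {
  const separator = raw.indexOf('\r\n\r\n');        // a blank line ends the header
  const head = separator === -1 ? raw : raw.slice(0, separator);
  const body = separator === -1 ? '' : raw.slice(separator + 4);

  const [statusLine, ...fieldLines] = head.split('\r\n');
  const headers = {};
  for (const line of fieldLines) {
    const colon = line.indexOf(':');
    if (colon === -1) continue;                      // skip malformed lines
    // Header names are case-insensitive, so store them lower-cased.
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }
  return { statusLine, headers, body };
}

// Hypothetical response carrying two of the cache headers discussed here.
const sample =
  'HTTP/1.1 200 OK\r\n' +
  'Cache-Control: public,max-age=120\r\n' +
  'ETag: "abc123"\r\n' +
  '\r\n' +
  'console.log("demo");';

console.log(parseHttpResponse(sample));
```

Everything the cache needs (Cache-Control, ETag, and so on) lives in `headers`; the body is only the payload.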

Some of the headers used in this article are as follows:

Field name           Belongs to
Pragma               General header
Expires              Response header
Cache-Control        General header
Last-Modified        Response header
If-Modified-Since    Request header
ETag                 Response header
If-None-Match        Request header

Classification of HTTP caches

HTTP caches fall into two main categories: mandatory caches (also known as strong caches) and negotiated caches. The two follow different rules. With a mandatory cache, the browser does not contact the server at all as long as the cached data has not expired. With a negotiated cache, as the name suggests, the browser first checks with the server whether the cached data can still be used.

Both sets of rules can apply at the same time, and the mandatory cache has higher priority than the negotiated cache. That is, if the mandatory cache rule is still in effect, the cache is used directly and the negotiated cache rule is never evaluated.
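
The priority described above can be sketched as a small decision function. This is only an illustration of the flow, not how any browser is actually implemented; the `entry` record shape and the function name `decideCacheAction` are assumptions made for the example.

```javascript
// Decide what a browser-like cache would do with a stored response.
// `entry` is a hypothetical record: { storedAt, maxAgeSeconds, etag, lastModified },
// with storedAt and now given as millisecond timestamps.
function decideCacheAction(entry, now) {
  if (!entry) return { action: 'fetch' };            // nothing cached yet

  const ageSeconds = (now - entry.storedAt) / 1000;
  if (ageSeconds < entry.maxAgeSeconds) {
    return { action: 'use-cache' };                  // mandatory cache: no request at all
  }

  // Expired: fall back to the negotiated cache if a validator is available.
  const headers = {};
  if (entry.etag) headers['If-None-Match'] = entry.etag;
  if (entry.lastModified) headers['If-Modified-Since'] = entry.lastModified;
  if (Object.keys(headers).length > 0) {
    return { action: 'revalidate', headers };        // server answers 304 or 200
  }
  return { action: 'fetch' };
}

const entry = { storedAt: 0, maxAgeSeconds: 120, etag: '"abc"' };
console.log(decideCacheAction(entry, 60 * 1000));    // still within max-age
console.log(decideCacheAction(entry, 180 * 1000));   // expired, so revalidate
```

The negotiated branch is only reached once the mandatory branch has failed, which is exactly the priority rule stated above.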

The original model

Let’s start by simply setting up an Express server without any cache headers.

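const express = require('express');
const fs = require('fs');
const path = require('path');

const app = express();
const port = 8080;

// Minimal page that loads demo.js (markup assumed; only the title and
// body text are given)
app.get('/', (req, res) => {
    res.send(`<!DOCTYPE html>
<html>
<head><title>Document</title></head>
<body>
    Http Cache Demo
    <script src="/demo.js"></script>
</body>
</html>`);
});

// Read the script from disk on every request; no cache headers are set
app.get('/demo.js', (req, res) => {
    const jsPath = path.resolve(__dirname, './static/js/demo.js');
    const cont = fs.readFileSync(jsPath);
    res.end(cont);
});

app.listen(port, () => {
    console.log(`listen on ${port}`);
});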

We can see the request result as follows:

The request process is as follows:

  • The browser requests the static resource demo.js
  • The server reads demo.js from disk and sends it back to the browser
  • The browser requests it again, and the server reads demo.js from disk again and sends it back
  • The cycle repeats for every request

With this approach, traffic grows in direct proportion to the number of requests, and the disadvantages are obvious:

  • It wastes the user's bandwidth
  • It wastes server resources, since the server has to read the file from disk and send it to the browser every time
  • The browser has to wait for the JS to download and execute before it can render the page, which hurts the user experience

Next, we start adding cache information to the response headers.

1. Mandatory caching

Mandatory caching is controlled by two headers: Expires and Cache-Control.

Expires

The value of Expires is the cache expiration time (in GMT) that the server gives the browser. On the next request, if the browser's current time has not yet reached the expiration time, the cached data is used directly. Let's set the Expires response header in our Express server.

// Other code...
const moment = require('moment');

app.get('/demo.js', (req, res) => {
    const jsPath = path.resolve(__dirname, './static/js/demo.js');
    const cont = fs.readFileSync(jsPath);
    res.setHeader('Expires', getGLNZ()); // expires in 2 minutes
    res.end(cont);
});

// Format the time two minutes from now as an HTTP date in GMT
function getGLNZ() {
    return moment().utc().add(2, 'm').format('ddd, DD MMM YYYY HH:mm:ss') + ' GMT';
}
// Other code...

We added an Expires response header to demo.js. Because the value must be in GMT, we use Moment.js to convert it. The first request is still sent to the server, and the expiration time is returned along with the file. But when we refresh, the magic happens:

You can see that the file was read directly from the memory cache without a request being sent. We set the expiration time to two minutes here; after two minutes, refresh the page and you will see the browser send the request again.
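
As an aside, the same GMT string can be produced without Moment.js, because the built-in `Date.prototype.toUTCString()` already returns the HTTP date format. The helper name `expiresIn` below is made up for this sketch.

```javascript
// Build an Expires header value `seconds` into the future.
// toUTCString() returns strings such as "Thu, 01 Jan 1970 00:02:00 GMT".
function expiresIn(seconds, now = new Date()) {
  return new Date(now.getTime() + seconds * 1000).toUTCString();
}

console.log(expiresIn(120));               // two minutes from now
console.log(expiresIn(120, new Date(0)));  // Thu, 01 Jan 1970 00:02:00 GMT
```

In the Express handler this would be used as `res.setHeader('Expires', expiresIn(120))`.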

Although this method adds cache control and saves traffic, it still has the following problems:

  • The browser's clock and the server's clock may not be synchronized. If the browser's clock is set ahead of the server's, the cache is treated as expired earlier than intended, or is never used at all
  • After the cache expires, the server reads the file again and returns it to the browser, whether or not the file has changed
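
The clock problem is easy to reproduce in a few lines. This sketch assumes a cache simply compares its own clock with the Expires value; the function name `isFreshByExpires` and all the dates are made up for the example.

```javascript
// A cache treats the response as fresh while its own clock is before Expires.
function isFreshByExpires(expiresHeader, clientNow) {
  const expiresAt = Date.parse(expiresHeader);
  if (Number.isNaN(expiresAt)) return false;   // unparseable date: treat as expired
  return clientNow.getTime() < expiresAt;
}

// Suppose the server sent this at 12:00:00 GMT, intending a two-minute lifetime.
const expires = 'Thu, 01 Jan 2026 12:02:00 GMT';

// Client clock in sync with the server: cached as intended.
console.log(isFreshByExpires(expires, new Date('2026-01-01T12:00:30Z'))); // true
// Client clock five minutes ahead: the response looks expired immediately.
console.log(isFreshByExpires(expires, new Date('2026-01-01T12:05:30Z'))); // false
// Client clock one hour behind: the response stays "fresh" for over an hour.
console.log(isFreshByExpires(expires, new Date('2026-01-01T11:00:30Z'))); // true
```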

In addition, Expires comes from HTTP/1.0. Browsers now default to HTTP/1.1 or later, where Cache-Control: max-age takes precedence over Expires when both are present, so Expires matters much less today.

Cache-Control

To deal with the clock mismatch between browser and server, HTTP/1.1 added a new caching scheme. Instead of giving the browser an absolute expiration time, the server gives a relative one. For example, Cache-Control: max-age=10 means the browser may use its cache directly for the next 10 seconds. The example below uses 120 seconds.

app.get('/demo.js', (req, res) => {
    const jsPath = path.resolve(__dirname, './static/js/demo.js');
    const cont = fs.readFileSync(jsPath);
    res.setHeader('Cache-Control', 'public,max-age=120'); // 2 minutes
    res.end(cont);
});
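
A rough sketch of how a cache could evaluate max-age follows. It is simplified on purpose: real caches also take the Age and Date headers into account, and the names `getMaxAge` and `isFreshByMaxAge` are assumptions for this example.

```javascript
// Extract max-age (in seconds) from a Cache-Control value, or null if absent.
function getMaxAge(cacheControl) {
  const match = /(?:^|,)\s*max-age\s*=\s*(\d+)/i.exec(cacheControl || '');
  return match ? Number(match[1]) : null;
}

// Fresh while the time elapsed since the response was stored is below max-age.
// In this simplified model both timestamps (milliseconds) come from the
// client's own clock, so the server's clock never enters the comparison.
function isFreshByMaxAge(cacheControl, storedAt, now) {
  const maxAge = getMaxAge(cacheControl);
  if (maxAge === null) return false;
  return (now - storedAt) / 1000 < maxAge;
}

console.log(isFreshByMaxAge('public,max-age=120', 0, 60 * 1000));   // true
console.log(isFreshByMaxAge('public,max-age=120', 0, 120 * 1000));  // false
```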

2. Negotiated caching

The drawback of mandatory caching is that the cache always expires eventually. If the file has not changed by then, fetching it again wastes server resources. The negotiated cache addresses this with two pairs of headers:

  1. Last-Modified and If-Modified-Since
  2. ETag and If-None-Match

Last-Modified

To save server resources, we improve the scheme again: the browser negotiates with the server. Each time the server returns a file, it tells the browser when the file was last modified on the server. The request process is as follows:

  • The browser requests the static resource demo.js
  • The server reads demo.js from disk and returns it to the browser together with Last-Modified (in the standard GMT format)
  • When the cached file expires in the browser, the browser sends a request with the If-Modified-Since header (set to the Last-Modified value from the previous response)
  • The server compares If-Modified-Since from the request with the file's last modification time. If they match, it tells the browser to keep using the local cache (304); if not, it returns the file contents and a new Last-Modified
  • The cycle repeats for every request

The code implementation process is as follows:

app.get('/demo.js', (req, res) => {
    const jsPath = path.resolve(__dirname, './static/js/demo.js');
    const cont = fs.readFileSync(jsPath);
    const status = fs.statSync(jsPath);

    // mtime formatted as an HTTP date in GMT
    const lastModified = status.mtime.toUTCString();
    if (lastModified === req.headers['if-modified-since']) {
        // Unchanged since the browser's copy: reply 304 with no body
        res.writeHead(304, 'Not Modified');
        res.end();
    } else {
        res.setHeader('Cache-Control', 'public,max-age=5');
        res.setHeader('Last-Modified', lastModified);
        res.writeHead(200, 'OK');
        res.end(cont);
    }
});
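
The handler above compares the two date strings for exact equality. A slightly more tolerant variant, sketched here under the assumption that any parseable date should be accepted, compares the parsed timestamps instead. The helper name `isNotModified` is made up for this example.

```javascript
// Return true when the file has not changed since the date the client sent.
// Last-Modified has one-second resolution, so mtime is truncated to whole
// seconds before the comparison.
function isNotModified(ifModifiedSince, mtime) {
  if (!ifModifiedSince) return false;             // no validator: send the full file
  const since = Date.parse(ifModifiedSince);
  if (Number.isNaN(since)) return false;          // unparseable date: ignore the header
  const modifiedSeconds = Math.floor(mtime.getTime() / 1000) * 1000;
  return modifiedSeconds <= since;
}

const header = 'Thu, 01 Jan 2026 12:00:00 GMT';
console.log(isNotModified(header, new Date('2026-01-01T12:00:00.500Z'))); // true
console.log(isNotModified(header, new Date('2026-01-01T12:01:00.000Z'))); // false
```

In the route it would replace the string comparison: `if (isNotModified(req.headers['if-modified-since'], status.mtime)) { ... }`.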

We refresh the page several times and see the request result as follows:

This scheme improves on the previous three: the server checks whether the file has been modified and, if it has not, does not send it again. But it still has the following disadvantages:

  • Last-Modified is a GMT time with a precision of one second. If a file is modified more than once within the same second, the server cannot tell that it changed again, and the browser does not get the latest file
  • If a file on the server is rewritten many times without its contents actually changing, the modification time still changes and the server has to send the file again
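
The one-second precision issue can be seen directly with two made-up modification times that fall inside the same second:

```javascript
// Two hypothetical saves, 400 ms apart, within the same second.
const firstSave = new Date('2026-01-01T12:00:00.100Z');
const secondSave = new Date('2026-01-01T12:00:00.500Z');

// toUTCString() drops the milliseconds, so both saves yield the same header value.
const firstHeader = firstSave.toUTCString();
const secondHeader = secondSave.toUTCString();

console.log(firstHeader);                   // Thu, 01 Jan 2026 12:00:00 GMT
console.log(firstHeader === secondHeader);  // true
```

A browser that cached the file after the first save would send that value in If-Modified-Since, get a 304, and never see the second save.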

ETag

To solve the problems caused by the imprecise modification time, the server and browser negotiate in a different way: instead of a time, the server returns ETag, a unique identifier for the file. The ETag changes only when the contents of the file change. The request process is as follows:

  • The browser requests the static resource demo.js
  • The server reads demo.js from disk and sends it back to the browser together with an ETag that uniquely identifies the file
  • When the cached file expires in the browser, the browser sends a request with the If-None-Match header (set to the ETag from the previous response)
  • The server compares If-None-Match from the request with the file's current ETag. If they match, it tells the browser to keep using the local cache (304); if not, it returns the file contents and a new ETag
  • The cycle repeats for every request

The code implementation is as follows:

const md5 = require('md5');

app.get('/demo.js', (req, res) => {
    const jsPath = path.resolve(__dirname, './static/js/demo.js');
    const cont = fs.readFileSync(jsPath);
    // ETag values are quoted strings, so wrap the hash in double quotes
    const etag = `"${md5(cont)}"`;

    if (req.headers['if-none-match'] === etag) {
        res.writeHead(304, 'Not Modified');
        res.end();
    } else {
        res.setHeader('ETag', etag);
        res.writeHead(200, 'OK');
        res.end(cont);
    }
});
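
The same idea can be sketched with Node's built-in crypto module instead of the md5 package. The version below also handles an If-None-Match header that carries several tags or the weak prefix W/. The names `makeEtag` and `etagMatches` are made up here, and splitting on commas assumes the tags themselves contain no commas, which holds for hex hashes.

```javascript
const crypto = require('crypto');

// Build an ETag from the file contents. The hash is wrapped in double quotes
// because the ETag header carries a quoted string.
function makeEtag(content) {
  const hash = crypto.createHash('md5').update(content).digest('hex');
  return `"${hash}"`;
}

// If-None-Match may hold "*", a single tag, or a comma-separated list, and any
// tag may carry the weak prefix W/. The prefix is ignored when comparing.
function etagMatches(ifNoneMatch, etag) {
  if (!ifNoneMatch) return false;
  if (ifNoneMatch.trim() === '*') return true;
  const strip = (tag) => tag.trim().replace(/^W\//, '');
  return ifNoneMatch.split(',').some((tag) => strip(tag) === strip(etag));
}

const tag = makeEtag('hello');
console.log(tag);                                   // "5d41402abc4b2a76b9719d911017c592"
console.log(etagMatches(`W/"old", ${tag}`, tag));   // true
console.log(etagMatches('"old"', tag));             // false
```

In the route, `etagMatches(req.headers['if-none-match'], etag)` would take the place of the strict equality check.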

The request result is as follows:

Something extra

In the header table above there is a field called Pragma, a piece of dusty history.

Back in the distant HTTP/1.0 era, two fields, Pragma and Expires, were used to control client caching. Although both are long obsolete, you can still see them on many sites for backward compatibility.

About Pragma

When this field is set to no-cache, the browser is told not to use its cached copy of the resource; it must send a request to the server every time.

res.setHeader('Pragma', 'no-cache'); // disable caching
res.setHeader('Cache-Control', 'public,max-age=120'); // 2 minutes

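Pragma disables caching while Cache-Control sets a two-minute lifetime. When we revisit the page, the browser issues the request again, which suggests that Pragma takes precedence over Cache-Control here. Note that the HTTP specification only defines Pragma: no-cache for requests, so how browsers treat it in a response may vary.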

About Cache-Control

One of the values we set in Cache-Control is public, so what does it mean? Besides max-age, Cache-Control has other common values such as private, public, no-cache, and no-store. The default value is private.

  • private: only the client can cache the response
  • public: both the client and proxy servers can cache the response
  • max-age=xxx: the cached content expires after xxx seconds
  • no-cache: the cached data must be validated with the server (negotiated cache) before it is used
  • no-store: nothing is cached, so neither the mandatory cache nor the negotiated cache is triggered
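
As an illustration of how these values combine, here is a small sketch that parses a Cache-Control string and maps it onto the behaviours listed above. The function names `parseCacheControl` and `describePolicy` are made up for the example, and only the directives from the list are interpreted.

```javascript
// Parse a Cache-Control value into an object, for example
// 'public, max-age=120' becomes { public: true, 'max-age': 120 }.
function parseCacheControl(value) {
  const directives = {};
  for (const part of (value || '').split(',')) {
    const [rawName, rawArg] = part.trim().split('=');
    if (!rawName) continue;                          // skip empty segments
    const name = rawName.trim().toLowerCase();
    if (rawArg === undefined) {
      directives[name] = true;                       // flag such as no-store
    } else {
      const arg = rawArg.trim().replace(/^"|"$/g, '');
      directives[name] = /^\d+$/.test(arg) ? Number(arg) : arg;
    }
  }
  return directives;
}

// Map the directives onto the behaviours in the list above.
function describePolicy(value) {
  const d = parseCacheControl(value);
  if (d['no-store']) return 'do not store';
  if (d['no-cache']) return 'store, but revalidate before every use';
  if (typeof d['max-age'] === 'number') return `fresh for ${d['max-age']} seconds`;
  return 'no explicit freshness lifetime';
}

console.log(parseCacheControl('public, max-age=120'));
console.log(describePolicy('public,max-age=120'));   // fresh for 120 seconds
console.log(describePolicy('no-cache'));             // store, but revalidate before every use
```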

In a request, Pragma: no-cache and Cache-Control: no-cache have the same meaning.

Priority of the cache

Above we said that the mandatory cache takes priority over the negotiated cache and that Pragma takes priority over Cache-Control. So what is the priority order of the other headers? I looked it up online and found the following order (PS: if you are interested, you can verify whether it is correct):

Pragma > Cache-Control > Expires > ETag > Last-Modified
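
Taken at face value, the order above can be written down as a lookup. This only encodes the order as stated in this article, which is itself unverified; it is not taken from the HTTP specification, and the names `PRIORITY` and `governingHeader` are made up for the sketch.

```javascript
// Header names in the priority order claimed above (lower-cased).
const PRIORITY = ['pragma', 'cache-control', 'expires', 'etag', 'last-modified'];

// Return the highest-priority cache header present in `headers`
// (a plain object keyed by lower-cased header name), or null if none is set.
function governingHeader(headers) {
  return PRIORITY.find((name) => headers[name] !== undefined) || null;
}

console.log(governingHeader({ 'cache-control': 'max-age=120', pragma: 'no-cache' })); // pragma
console.log(governingHeader({ etag: '"abc"', expires: 'Thu, 01 Jan 2026 12:02:00 GMT' })); // expires
```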


References:

HTTP cache priority issues

Thoroughly understand HTTP caching mechanisms and principles

Summary of HTTP cache control

Discussion on browser HTTP caching mechanism

The Express framework is a simple way to practice setting HTTP control over caching