Browser HTTP caching is a common way to optimize web performance and comes up often in front-end interviews. This article walks through the browser's HTTP caching mechanism in practice by configuring a Koa2 server.

To get a feel for the browser HTTP cache:

In the preceding screenshot, I am accessing the V2EX homepage directly from the browser. The Size column inside the rectangle shows "from disk cache", which means these resources hit the strong cache. The status code of a strong cache hit is 200.

Let's see what happens when I directly access the image that the arrow points to above:

How the cache rules are implemented

In fact, every network protocol is just a specification, and the client and the server each implement it accordingly. The same is true of the browser HTTP cache: browsers are developed in accordance with the HTTP caching specification, and our HTTP server should follow it as well. Of course, you wrote the server yourself, so you can deviate from the spec, but then the browser has no idea what you are doing and the HTTP cache will not work.

We know that when a browser interacts with a server, it sends request data and receives response data, which we call HTTP messages.

The browser HTTP caching protocol is essentially implemented by carrying cache-related fields in the headers of requests and responses.

Classification of browser HTTP caches

The browser HTTP cache comes in two kinds:

  1. Strong cache
  2. Negotiated cache

Strong caching means that the browser determines whether the cache has expired locally and reads the cache directly from memory or disk without communicating with the server.

With the negotiated cache, the browser sends a request to the server carrying request headers related to the negotiated cache, and the server decides whether the cached copy has expired. If it has not expired, the server returns status code 304, and the browser, seeing the 304, reads its local cache directly. If the server decides the copy has expired, it returns the requested resource together with a new last-modified value and status code 200.

The brief process a browser follows to check the cache when it requests a resource is as follows:

When a browser requests a resource, it first checks whether the resource is cached in memory or on disk. If there is no cached copy, either the browser has never accessed the resource or the cache has been cleared, so it has to request the resource from the server.

If there is a cached copy, the strong cache is checked first. If the strong cache is hit, the local copy is used directly. If the strong cache is not hit but the last response for the resource carried a negotiated-cache header such as last-modified, the browser sends a request to the server with the corresponding negotiated-cache request header. The status code returned by the server determines whether the negotiated cache is hit. If it is, the local copy is used. If not, the content returned by the request is used.
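The flow above can be sketched as a small decision function. This is only an illustrative model, not a browser API: the names `decideCacheAction` and `afterRevalidate`, and the shape of the cache entry (`expiresAt`, `lastModified`, `etag`), are assumptions made for the example.

```javascript
// Hypothetical model of the browser-side decision; not a real browser API.
// `entry` is the cached record for a URL (or null), `now` is a timestamp in ms.
function decideCacheAction(entry, now) {
    // No cached copy: the resource must be requested from the server.
    if (!entry) return 'fetch';
    // Strong cache hit: the copy is still fresh, no request is sent.
    if (now < entry.expiresAt) return 'use-local';
    // Strong cache missed, but a validator from the last response exists:
    // send a negotiation request carrying it.
    if (entry.lastModified || entry.etag) return 'revalidate';
    // Nothing to negotiate with: plain request.
    return 'fetch';
}

// After a negotiation request, the status code decides which body is used.
function afterRevalidate(status) {
    return status === 304 ? 'use-local' : 'use-response';
}
```

Keeping the two steps separate mirrors the text: the first decision is made locally, the second only after the server has answered.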

The difference between strong and negotiated caching

  1. The hit status codes are different. Strong cache returns 200, negotiated cache returns 304.
  2. The priorities are different. The strong cache is checked first, and the negotiated cache is checked only after the strong cache misses.
  3. The strong cache saves more than the negotiated cache, because the negotiated cache costs one extra request to the server.

Demo Server Description

The entire Koa2 demo server is here: koa2-browser-http-cache. src/index.js is the entry file, index.html is the source of the home page, and sunset.jpg and style.css are the image and stylesheet used by index.html.

The server code is very simple, with three routes built on koa-router. No caching code has been written yet.

// src/index.js
const Koa = require('koa');
const Router = require('koa-router');
const mime = require('mime');
const fs = require('fs-extra');
const Path = require('path');

const app = new Koa();
const router = new Router();

// Handle the first page
router.get(/(^\/index(\.html)?$)|(^\/$)/, async (ctx, next) => {
    ctx.type = mime.getType('.html');

    const content = await fs.readFile(Path.resolve(__dirname, './index.html'), 'UTF-8');
    ctx.body = content;

    await next();
});

// Process the image
router.get(/\S*\.(jpe?g|png)$/, async (ctx, next) => {
    const { path } = ctx;
    ctx.type = mime.getType(path);

    const imageBuffer = await fs.readFile(Path.resolve(__dirname, `.${path}`));
    ctx.body = imageBuffer;

    await next();
});

// Process CSS files
router.get(/\S*\.css$/, async (ctx, next) => {
    const { path } = ctx;
    ctx.type = mime.getType(path);

    const content = await fs.readFile(Path.resolve(__dirname, `.${path}`), 'UTF-8');
    ctx.body = content;

    await next();
});

app
    .use(router.routes())
    .use(router.allowedMethods());


app.listen(3000);
process.on('unhandledRejection', (err) => {
    console.error('Unhandled promise rejection (a promise without a catch):', err);
});

After visiting the home page, the page looks like this:

The server does not have any caching configured yet, so the Size column shows the actual size of each resource. On a strong cache hit it would show "from memory cache" or "from disk cache" instead.

Strong cache

Strong caching is one of the most rewarding tools in web performance optimization.

Strong cache related header fields

As mentioned earlier, caching protocols are essentially implemented through request and response headers. The header fields associated with strong caching are the following:

pragma

Pragma is a product of HTTP/1.0. It is similar to cache-control but can only be set to no-cache, and its effect is the same as cache-control: no-cache. That is, the strong cache is disabled and only the negotiated cache is used.

expires

A response header field containing a date/time that marks when the resource expires, for example: Thu, 31 Dec 2037 23:55:55 GMT. An invalid date, such as 0, represents a date in the past and means the resource has already expired.

If a max-age or s-maxage directive is set in the cache-control response header, the expires header is ignored; that is, cache-control takes precedence over expires.

Because expires is an absolute time, a large difference between the system clocks of the server and the client can make the cache behave inconsistently.

cache-control

A field added in HTTP/1.1 and designed to replace pragma. The cache-control header field can be used in both request headers and response headers.

We all know that in Chrome, when you reload with Shift + F5 or check Disable cache in the Network panel, the browser requests the latest version of every resource instead of using the cache. With caching disabled, the browser sends cache-control: no-cache with every request, telling the server that no cache negotiation is needed and that it should simply return the latest resource.

Here is a screenshot of a request for the image, with the negotiated cache configured, after I checked Disable cache:

As a response header field, cache-control is an improvement on expires. One of its values is cache-control: max-age=seconds, for example: cache-control: max-age=315360000. Here seconds is a duration rather than a fixed point in time, so the cache inconsistency caused by unsynchronized client and server clocks, mentioned above, does not occur.
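To make the clock problem concrete, here is a simplified sketch comparing the two approaches. The helper names `freshByExpires` and `freshByMaxAge` are hypothetical, and real browsers apply additional age calculations, so treat this only as an illustration of absolute versus relative expiry.

```javascript
// Hypothetical helpers; all times are millisecond timestamps.

// expires is an absolute time written by the server and read with the client clock.
function freshByExpires(expiresHeader, clientNow) {
    return clientNow < Date.parse(expiresHeader);
}

// max-age is a duration counted from when the client received the response.
function freshByMaxAge(maxAgeSeconds, receivedAtClient, clientNow) {
    return clientNow - receivedAtClient < maxAgeSeconds * 1000;
}

// The server believes it is 12:00:00 and allows 60 seconds of caching.
const serverNow = Date.parse('2030-01-01T12:00:00Z');
const expiresHeader = new Date(serverNow + 60 * 1000).toUTCString();

// Assumption for the example: the client clock runs 5 minutes ahead of the server.
const clientNow = serverNow + 5 * 60 * 1000;

// With expires, the response already looks expired the moment it arrives.
const expiresResult = freshByExpires(expiresHeader, clientNow);

// With max-age, 10 seconds after receipt the copy is still fresh.
const maxAgeResult = freshByMaxAge(60, clientNow, clientNow + 10 * 1000);
```

In this scenario `expiresResult` is false while `maxAgeResult` is true, because only the second check is independent of the gap between the two clocks.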

Priority of the strong cache header fields

Pragma > Cache-Control > Expires.

The specific process of strong caching

The brief flow of the browser HTTP cache was outlined earlier; here is the strong cache check in detail.

First, when the browser finds a cached copy of the requested resource in memory or on disk, it also checks whether the last response for that resource carried any of the strong cache headers described above. It then evaluates those fields one by one in the priority order given above. Some fields may be missing from the response, such as pragma, in which case the browser simply moves on to the next one.

If pragma: no-cache is present, the strong cache check fails immediately. If there is no pragma but there is cache-control: no-cache, the effect is the same as pragma: no-cache and the strong cache check fails.

If cache-control: max-age=seconds is present, the expiration time is computed from the time the browser last requested the resource plus seconds. If the current time is earlier than the expiration time, the strong cache is hit; if it is later, the strong cache check fails. This is why expires is ignored as soon as cache-control carries max-age or s-maxage.

If cache-control is absent or carries none of these values, the browser falls back to expires. If that time has not passed yet, the strong cache is hit. Otherwise the cached copy is stale and the browser requests the latest resource from the server.
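The check described above can be sketched as a single function that walks the fields in priority order. This is a hypothetical model written for this article, not browser source code; the function name `isStrongCacheHit` and its parameters are assumptions, and directives other than no-cache and max-age are ignored for brevity.

```javascript
// Hypothetical model of the strong cache check.
// `headers`: lower-cased response headers stored with the cached copy.
// `storedAt`: when the response was received (ms). `now`: current time (ms).
function isStrongCacheHit(headers, storedAt, now) {
    // 1. pragma has the highest priority.
    if (headers['pragma'] === 'no-cache') return false;

    // 2. cache-control comes next.
    const cacheControl = headers['cache-control'] || '';
    const directives = cacheControl.split(',').map((d) => d.trim().toLowerCase());
    if (directives.includes('no-cache')) return false;

    const maxAge = directives.find((d) => d.startsWith('max-age='));
    if (maxAge) {
        // max-age is relative to the time the response was received;
        // expires is not consulted at all in this case.
        const seconds = Number(maxAge.slice('max-age='.length));
        return now < storedAt + seconds * 1000;
    }

    // 3. expires is the fallback.
    if (headers['expires']) {
        const expiresAt = Date.parse(headers['expires']);
        // An unparseable date yields NaN and is treated as expired.
        return !Number.isNaN(expiresAt) && now < expiresAt;
    }

    // No strong cache information at all.
    return false;
}
```

For example, a stored response with both `cache-control: max-age=60` and an expires value two minutes out stops being a hit after 60 seconds, which matches the priority order Pragma > Cache-Control > Expires.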

Configure strong caching using Expires

Change the image route in src/index.js by adding an expires field to the response header with an expiration time of 2 minutes.

router.get(/\S*\.(jpe?g|png)$/, async (ctx, next) => {
    const { response, path } = ctx;
    ctx.type = mime.getType(path);
    // Added: the Expires field in the response header, with an expiration time of 2 minutes
    response.set('expires', new Date(Date.now() + 2 * 60 * 1000).toString());

    const imageBuffer = await fs.readFile(Path.resolve(__dirname, `.${path}`));
    ctx.body = imageBuffer;

    await next();
});

First Visit:

Note where the arrow points in the screenshot above: a long left click on the reload button brings up three different reload options. The last one, which clears the page cache, is especially useful during development.

Then immediately refresh the page:

Two minutes later, a refresh looks the same as the first screenshot, so I won't show it again. As you can see, configuring strong caching is very simple: just set the response headers according to the protocol.

Test the Pragma, Cache-Control, and Expires priorities

Add cache-control: no-cache to the response header, which disallows the strong cache.

router.get(/\S*\.(jpe?g|png)$/, async (ctx, next) => {
    const { response, path } = ctx;
    ctx.type = mime.getType(path);

    // Added: disallow the strong cache
    response.set('cache-control', 'no-cache');
    response.set('expires', new Date(Date.now() + 2 * 60 * 1000).toString());

    const imageBuffer = await fs.readFile(Path.resolve(__dirname, `.${path}`));
    ctx.body = imageBuffer;

    await next();
});

// Process CSS files
router.get(/\S*\.css$/, async (ctx, next) => {
    const { path } = ctx;
    ctx.type = mime.getType(path);

    const content = await fs.readFile(Path.resolve(__dirname, `.${path}`), 'UTF-8');
    ctx.body = content;

    await next();
});

After cache-control: no-cache is set, the browser no longer uses the strong cache. If the cache had been used, the Status Code section would say so, as in the earlier screenshot. Conclusion: Cache-Control does take precedence over Expires.

Setting cache-control: max-age=60 theoretically should make the cache expire after 1 minute, which it did.

Notice the expires time in the following two screenshots. The first one shows an expires time of 21:33. However, because cache-control has a higher priority, the cache expires one minute earlier than that, and the second screenshot shows the cache expiring at 21:22.

Let’s test pragma again.

router.get(/\S*\.(jpe?g|png)$/, async (ctx, next) => {
    const { response, path } = ctx;
    ctx.type = mime.getType(path);

    // Added: pragma should win over the two fields below
    response.set('pragma', 'no-cache');
    // max-age is in seconds; set the expiration time to 1 minute
    response.set('cache-control', `max-age=${1 * 60}`);
    response.set('expires', new Date(Date.now() + 2 * 60 * 1000).toString());

    const imageBuffer = await fs.readFile(Path.resolve(__dirname, `.${path}`));
    ctx.body = imageBuffer;

    await next();
});

// Process CSS files
router.get(/\S*\.css$/, async (ctx, next) => {
    const { path } = ctx;
    ctx.type = mime.getType(path);

    const content = await fs.readFile(Path.resolve(__dirname, `.${path}`), 'UTF-8');
    ctx.body = content;

    await next();
});

The result is the same as cache-control: no-cache; the local cache is never used. So the conclusion is:

Pragma > Cache-Control > Expires.

Negotiated cache

The negotiated cache has to send a request to the server, so it saves less than the strong cache. The larger the cached resource, the greater the saving.

The header fields associated with the negotiated cache

The header fields of the negotiated cache come in pairs:

  • The request header if-modified-since and the response header last-modified
  • Request header if-none-match and response header etag

if-modified-since and last-modified

Both values are times in GMT format, accurate to the second. Their literal meaning is easy to understand: if-modified-since asks "has this been modified since ...?", and last-modified says "this was last modified at time xyz".

What's the connection between them? The if-modified-since request header should carry the last-modified value from the response to the previous request for the resource.

When the browser sends a request carrying if-modified-since, the server compares that value with the last modification time of the requested resource. If the resource was modified later than if-modified-since, the cached copy has expired: the server returns status code 200, the response body is the requested resource, and the response header carries the latest last-modified value. If it has not expired, the server returns 304 and the negotiated cache is hit: the response body is empty and the response header does not need a last-modified value.
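As a rough illustration of that comparison, the sketch below picks the status code from the request header and the file's modification time. The helper name `statusForIfModifiedSince` is hypothetical. Unlike a plain string comparison, it parses the dates, and it truncates the modification time to whole seconds because HTTP dates carry no milliseconds.

```javascript
// Hypothetical helper: choose the status code for a conditional request.
// `ifModifiedSince` is the raw request header value (may be undefined),
// `mtime` is the resource's last modification time as a Date.
function statusForIfModifiedSince(ifModifiedSince, mtime) {
    // HTTP dates have one-second resolution, so drop the milliseconds.
    const modifiedAt = Math.floor(mtime.getTime() / 1000) * 1000;
    const since = Date.parse(ifModifiedSince);
    // Header missing or unparseable: answer with the full resource.
    if (Number.isNaN(since)) return 200;
    // Modified later than the client's copy: expired, send the resource.
    // Otherwise the negotiated cache is hit.
    return modifiedAt > since ? 200 : 304;
}
```

A caller would send the body and a fresh last-modified header on 200, and an empty body on 304.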

if-none-match and etag

The previous pair of fields was the original way to handle the negotiated cache, dating from HTTP/1.0; if-none-match and etag were only introduced in HTTP/1.1.

Here are a few key points from the MDN description of if-none-match (only GET requests are discussed here):

  1. The server returns the requested resource with status code 200 if and only if no resource on the server has an ETag value that matches one of those listed in this header.
  2. When the server generates a 304 response, it must also include whichever of the following headers it would have sent in a 200 response: Cache-Control, Content-Location, Date, ETag, Expires, and Vary.
  3. If-none-match has a higher priority than if-modified-since.

An etag typically looks like this: etag: "54984C2B-44E". As with the previous pair, the if-none-match request header carries the etag value from the response to the previous request for the resource.

You might look at this and ask: what exactly is an etag? An etag is a unique identifier for the requested resource. A simple way to implement it is to compute a digest string of the resource content with some hash algorithm and wrap it in double quotes.

MDN describes etag as follows:

They are ASCII strings between double quotes (such as "675AF34563DC-tr34"). The method for generating ETag values is not explicitly specified. Typically, the value is a hash of the content, a hash of the last modification timestamp, or simply a version number. For example, MDN uses a hexadecimal hash of the wiki content.

Handling the negotiated cache with the if-none-match/etag header fields is similar to doing it with if-modified-since/last-modified. It just compares hash values instead of dates.

Why do we need etag when we already have last-modified?

  1. If a resource is updated and then requested within the same second, the negotiated cache based on last-modified cannot deliver the latest version. This is because last-modified is only accurate to the second and cannot reflect changes that happen in less than a second.
  2. When a resource is modified several times without its content actually changing, last-modified is wasteful. Its value changes every time the file is touched, but if the content is the same, there is no need for the server to send the resource again; the local cache would do. Etag does not have this problem, because the same content always produces the same hash, however many times the file is modified.
  3. Etag is more flexible, because it does not have to be a hash of the content as described above. Etag also supports a weak comparison algorithm, under which two files can be considered the same when their content is equivalent, even if they are not identical bit for bit. For example, two pages are considered the same if they differ only in the generation time shown in the footer.
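The weak comparison mentioned in the last point can be illustrated with two small functions. In the HTTP specification a weak etag is marked with a `W/` prefix; the function names `opaqueTag`, `weakMatch`, and `strongMatch` are hypothetical helpers for this sketch, not part of any library.

```javascript
// Strip the optional weak prefix "W/" to get the quoted opaque tag.
function opaqueTag(etag) {
    return etag.startsWith('W/') ? etag.slice(2) : etag;
}

// Weak comparison: the opaque tags match, whether or not either is weak.
function weakMatch(a, b) {
    return opaqueTag(a) === opaqueTag(b);
}

// Strong comparison: neither tag is weak and they match character by character.
function strongMatch(a, b) {
    return !a.startsWith('W/') && !b.startsWith('W/') && a === b;
}
```

With these helpers, `weakMatch('W/"v1"', '"v1"')` is true while `strongMatch('W/"v1"', '"v1"')` is false. A server that only wants "equivalent enough" pages, such as the footer example above, would hand out weak etags.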

Priority of the negotiated cache header fields

if-none-match > if-modified-since.

If-modified-since is ignored when the server receives a request that contains both if-modified-since and if-none-match.
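A server-side check that respects this priority might look like the sketch below. It is deliberately simplified and makes several assumptions: the function name `isNotModified` and the `current` object shape are made up for the example, if-none-match is assumed to hold a single etag, and the etag is compared by exact string equality.

```javascript
// Hypothetical helper: should the server answer 304 for this request?
// `requestHeaders`: lower-cased request headers.
// `current`: { etag, lastModifiedMs } describing the resource on the server.
function isNotModified(requestHeaders, current) {
    const ifNoneMatch = requestHeaders['if-none-match'];
    if (ifNoneMatch !== undefined) {
        // if-none-match is present, so if-modified-since is ignored entirely.
        return ifNoneMatch === current.etag;
    }

    const since = Date.parse(requestHeaders['if-modified-since']);
    if (Number.isNaN(since)) return false; // no usable validator was sent
    return current.lastModifiedMs <= since;
}
```

Note that when the etag does not match, the function returns false even if the date would have matched, which is exactly what "ignored" means here.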

Testing the negotiated cache with last-modified

At this point I slightly refactored the server code and configured the negotiated cache using last-modified:

const Koa = require('koa');
const Router = require('koa-router');
const mime = require('mime');
const fs = require('fs-extra');
const Path = require('path');

const app = new Koa();
const router = new Router();

const responseFile = async (path, context, encoding) => {
    const fileContent = await fs.readFile(path, encoding);
    context.type = mime.getType(path);
    context.body = fileContent;
};

// Handle the first page
router.get(/(^\/index(\.html)?$)|(^\/$)/, async (ctx, next) => {
    await responseFile(Path.resolve(__dirname, './index.html'), ctx, 'UTF-8');
    await next();
});

// Process the image
router.get(/\S*\.(jpe?g|png)$/, async (ctx, next) => {
    const { request, response, path } = ctx;
    response.set('pragma', 'no-cache');

    // max-age is accurate to second. Set the expiration time to 1 minute
    // response.set('cache-control', `max-age=${1 * 60}`);
    // Add the Expires field to the response header with an expiration time of 2 minutes
    // response.set('expires', new Date(Date.now() + 2 * 60 * 1000).toString());

    const imagePath = Path.resolve(__dirname, `.${path}`);
    const ifModifiedSince = request.headers['if-modified-since'];
    const imageStatus = await fs.stat(imagePath);
    // toUTCString() is the standard replacement for the deprecated toGMTString()
    const lastModified = imageStatus.mtime.toUTCString();
    if (ifModifiedSince === lastModified) {
        response.status = 304;
    } else {
        response.lastModified = lastModified;
        await responseFile(imagePath, ctx);
    }

    await next();
});

// Process CSS files
router.get(/\S*\.css$/, async (ctx, next) => {
    const { path } = ctx;
    await responseFile(Path.resolve(__dirname, `.${path}`), ctx, 'UTF-8');
    await next();
});

app
    .use(router.routes())
    .use(router.allowedMethods());


app.listen(3000);
process.on('unhandledRejection', (err) => {
    console.error('Unhandled promise rejection (a promise without a catch):', err);
});

The first screenshot is the first access, with caching disabled. You can see that there is no if-modified-since in the request header, and that the server returns last-modified.

After unchecking Disable cache and requesting the image again, the response is 304 Not Modified. The code above has a slight flaw in that it does not set content-type on every branch, but such details do not get in the way of the core knowledge of negotiated caching.

When I replaced sunset.jpg with another image, the last modification time of the file changed, so the new image was returned and the latest last-modified was added to the response header. The if-modified-since attached to the next request is the last-modified value from this response.

Testing the negotiated cache with etag

Modify the route for processing images:

// Process the image
// Note: this route needs `const crypto = require('crypto');` at the top of the file
router.get(/\S*\.(jpe?g|png)$/, async (ctx, next) => {
    const { request, response, path } = ctx;
    ctx.type = mime.getType(path);
    response.set('pragma', 'no-cache');

    const ifNoneMatch = request.headers['if-none-match'];
    const imagePath = Path.resolve(__dirname, `.${path}`);
    const hash = crypto.createHash('md5');
    const imageBuffer = await fs.readFile(imagePath);
    hash.update(imageBuffer);
    const etag = `"${hash.digest('hex')}"`;
    if (ifNoneMatch === etag) {
        response.status = 304;
    } else {
        response.set('etag', etag);
        ctx.body = imageBuffer;
    }

    await next();
});

I won't show screenshots here; the result is pretty much the same as with last-modified.

Note: my code here is for demonstration purposes only. If you were to configure caching for production, you would keep an index of each resource's last-modified and etag values, rather than reading the file status and hashing the file contents on every request as my code does.
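One possible shape for such an index is sketched below, assuming the set of static files is known at startup and only changes on deployment. The function name `buildValidatorIndex` is hypothetical, and this is not how the demo repository does it; a real setup would also need a way to rebuild the index when files change.

```javascript
const fs = require('fs');
const crypto = require('crypto');

// Hypothetical helper: compute the validators once, e.g. at server startup,
// so that request handlers only need a Map lookup.
function buildValidatorIndex(filePaths) {
    const index = new Map();
    for (const filePath of filePaths) {
        const content = fs.readFileSync(filePath);
        const digest = crypto.createHash('md5').update(content).digest('hex');
        index.set(filePath, {
            etag: `"${digest}"`,
            lastModified: fs.statSync(filePath).mtime.toUTCString(),
        });
    }
    return index;
}
```

A route handler could then compare `if-none-match` against `index.get(imagePath).etag` and read the file only when it actually has to send the body.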

How do I update a resource configured with strong caching?

I was asked this question twice in Tencent interviews for a spring internship, could not answer it, and was rejected. The interviews felt very difficult and left a deep impression. I was also asked how to do DNS optimization and did not answer that well either; I will write an article about DNS optimization in the future.

How to update a strongly cached resource is a hard question if you have never looked into it. If the browser requests the same URL, it will serve the cached copy until it expires. So the solution is to change the URL: whenever a strongly cached resource needs to be updated, give it a new URL. Because the URL has to change, the page that references the resource must not itself be strongly cached; otherwise the new URL would never reach the browser. For example, you can put a version number in the URL: "/v1-0-0/sunset.jpg". You can also insert a hash of the resource content in the URL: /5bed2702-557a-sunset.jpg.

Here’s an example:

In index.html, the home page, the src of the img tag is "/v1-0-0/sunset.jpg". Suppose sunset.jpg is then replaced with another image on the server.

Visit index.html again. Since index.html is an HTML file that does not use the strong cache and is requested from the server on every visit, the browser sees that the src in the page has changed to "/v1-0-1/sunset.jpg" and fetches the new image, even though the image is strongly cached.

Finally, here is a borrowed diagram (I am really not good at drawing). It shows the decision process of the strong and negotiated caches in detail.
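For the hash variant mentioned earlier, the URL can be derived from the file content, so that it changes exactly when the content changes. The sketch below is a minimal illustration; the helper name `hashedUrl` and the choice of the first 8 hex characters of an MD5 digest are assumptions for the example, and in practice a build tool usually generates such names.

```javascript
const crypto = require('crypto');
const path = require('path');

// Hypothetical helper: prefix the file name with a short content hash,
// e.g. "/sunset.jpg" becomes "/<hash>-sunset.jpg".
function hashedUrl(urlPath, content) {
    const hash = crypto.createHash('md5').update(content).digest('hex').slice(0, 8);
    // Use POSIX path rules so the result is a URL path on every platform.
    const { dir, base } = path.posix.parse(urlPath);
    return path.posix.join(dir, `${hash}-${base}`);
}
```

The page template would emit `hashedUrl('/sunset.jpg', imageBuffer)` as the img src, and the server would need a route that maps the hashed name back to the file.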

Thank you for reading. If this article helped you, please follow and click the "like" button to show your support. If you find any errors in this article, please point them out in the comments section.

This article is original content, first published on my personal blog. Please credit the source when reposting.
