There is a lot of information about HTTP caching on the Internet, but I feel most of it does not cover the topic well, even though it is very important for Web developers. If you want to learn more, refer to MDN and the Google Developers website. Don't start with the RFC documents: they are hard to read, and it is easy to give up.

This article is a set of notes that explains the key points and why the topic is hard to write about well (most poor write-ups either lack a grasp of the overall picture or take a one-sided view).

Understand the difference between Private Browser caches and Shared proxy caches

HTTP caching covers both browser caches and proxy server caches (such as a CDN). Many articles do not clearly distinguish between the two, which makes the topic confusing.

With a browser cache, the architecture looks like this: Browser (Cache) => server. With a proxy server cache, it looks like this: Browser => CDN (Cache) => origin server.

Different HTTP caches solve different problems and are used in different scenarios. As I understand it, browser caching is primarily about avoiding unnecessary requests and reducing network traffic, while proxy caching is about bringing content closer to the user and serving it more efficiently (while also, of course, reducing requests and network traffic).

For Web developers, browser caches are probably the most common, and this article focuses on such caches.

The two types of cache interpret the HTTP cache headers (mainly Cache-Control) differently in many respects, so the directives need to be distinguished carefully.
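To make the distinction concrete, here is a minimal Python sketch of how the same response header can be treated differently by a browser cache and by a shared cache. The names `parse_directives` and `may_store` are hypothetical, and the rules are deliberately simplified; real caches check many more conditions.

```python
def parse_directives(header_value):
    """Split a Cache-Control value into a set of lowercase directive names."""
    names = set()
    for part in header_value.split(","):
        name = part.strip().split("=", 1)[0].lower()
        if name:
            names.add(name)
    return names


def may_store(header_value, shared_cache):
    """Return True if this (simplified) cache is allowed to store the response.

    shared_cache=True models a proxy or CDN, False models a browser cache.
    """
    directives = parse_directives(header_value)
    if "no-store" in directives:
        return False  # nobody may keep a copy
    if shared_cache and "private" in directives:
        return False  # only the end user's browser may keep it
    return True


print(may_store("private, max-age=600", shared_cache=False))  # True
print(may_store("private, max-age=600", shared_cache=True))   # False
```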

Browser behavior is not controllable

HTTP caching is controlled by the HTTP cache header directives, which are divided into request directives and response directives. Request directives are issued by the browser and are largely outside the developer's control; response directives tell the browser how to cache, and so partly control its future requests.

However, different browsers may handle request directives differently. For example, a browser may send different request directives for actions such as "Back", "F5" (reload), and "Ctrl+F5" (forced reload).

Take a look at the figure below to see why the "Back" action retrieves data from the browser cache.





Browser behavior control directives

Use directives from the HTTP/1.1 standard

HTTP keeps evolving, and so do browsers and servers. Try to use the headers defined by the current standard, because mixing old and new headers invites conflicting interpretations; older headers such as Expires and Pragma can be replaced by Cache-Control.

Understand the Cache-Control directive correctly

Cache-Control is a general header field: it can be used in both requests and responses, and it accepts multiple parameters. The same parameter can mean different things depending on the direction. Take max-age=0: in a request it tells caches that the client will not accept a stored copy older than 0 seconds, which forces a check with the server; in a response it tells the cache that the copy is stale as soon as it is stored. We must keep the two clearly apart.
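Because the header has the same syntax in both directions, one small parser can serve both. The sketch below is only illustrative: `parse_cache_control` is a hypothetical helper, and it ignores corner cases such as quoted values that contain commas.

```python
def parse_cache_control(header_value):
    """Parse a Cache-Control value into a dict of directive -> argument (or None)."""
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


# The same directive name can appear in either direction.
request_header = "max-age=0"                 # sent by the browser
response_header = "public, max-age=864000"   # sent by the server

print(parse_cache_control(request_header))   # {'max-age': '0'}
print(parse_cache_control(response_header))  # {'public': None, 'max-age': '864000'}
```

The parser only splits the header; deciding what a directive means still depends on whether it arrived in a request or a response.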

Further understanding of the cache-control directive

Understanding this directive basically means understanding HTTP caching. I personally feel that this sentence describes it accurately: "Cache-Control directives control who can cache the response, under which conditions, and for how long."

It covers three things:

(1) Who can cache (for responses)

  • private: the response may be stored only in the browser's cache.
  • public: the response may be stored by the browser or by a shared cache such as a CDN.
  • no-cache: the name is confusing. It does not mean the cache cannot be used; it means the stored copy must be checked with the server before each use.
  • no-store: the response must not be cached at all.

(2) How long to cache (for responses)

  • max-age=seconds: tells the browser how long the cached copy remains valid.

(3) Revalidation (for responses, a condition check)

  • must-revalidate: once the cached copy has expired, the browser must check with the server before using it again and must not fall back to the stale copy. Together with max-age=0, the effect is similar to the request directive max-age=0.

In short, this directive tells the browser whether it may cache the object, how long to cache it, and whether it should check with the server every time it uses the cache.
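The three questions above can be sketched as a small decision function. This is a simplified model under stated assumptions, not how any particular browser is implemented: `browser_decision` is a hypothetical name, a missing max-age is treated as 0, and every expired copy is revalidated (which is what must-revalidate demands; without it, real caches have some latitude).

```python
def parse_cache_control(header_value):
    """Parse a Cache-Control value into a dict of directive -> argument (or None)."""
    directives = {}
    for part in header_value.split(","):
        name, sep, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip() if sep else None
    return directives


def browser_decision(header_value, age_seconds):
    """Return 'do-not-store', 'revalidate', or 'use-cache' for a response."""
    d = parse_cache_control(header_value)
    if "no-store" in d:
        return "do-not-store"
    if "no-cache" in d:
        return "revalidate"   # stored, but checked with the server on every use
    max_age = int(d.get("max-age") or 0)
    if age_seconds < max_age:
        return "use-cache"    # still fresh: no request is sent
    return "revalidate"       # expired: ask the server before reusing


print(browser_decision("max-age=600", 10))             # use-cache
print(browser_decision("no-cache, max-age=600", 10))   # revalidate
print(browser_decision("no-store", 0))                 # do-not-store
```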

How to choose a Cache-Control policy

Choosing a Cache-Control strategy is something of an art for a developer. The first step is to understand the nature of the resource, and then define the HTTP caching policy based on that. This image from the Google Developers website illustrates the strategy.





The Cache-Control strategy

  • Is this resource cacheable?
  • Does the client need to check with the server every time it uses the cache?
  • Is the cache public or private?
  • How long should it be cached?
  • What is the resource identifier (ETag)?
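The first four questions can be turned into a header value mechanically. The following Python sketch shows one possible mapping; `build_cache_control` and its parameters are hypothetical names, and the mapping is an assumption for illustration rather than a rule from any standard.

```python
def build_cache_control(cacheable, check_every_time, shared, max_age_seconds=0):
    """Build a Cache-Control response value from the answers to the checklist."""
    if not cacheable:
        return "no-store"
    parts = ["public" if shared else "private"]
    if check_every_time:
        parts.append("no-cache")
    else:
        parts.append(f"max-age={max_age_seconds}")
    return ", ".join(parts)


# A static asset that any cache may keep for ten days.
print(build_cache_control(True, False, True, 864000))   # public, max-age=864000
# A per-user page that must be checked on every use.
print(build_cache_control(True, True, False))            # private, no-cache
# Sensitive data that should never be stored.
print(build_cache_control(False, False, False))          # no-store
```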

How does the browser validate the cache

The description above covers how developers set up HTTP caching. So how does the browser decide whether to use its cache? Understanding this will strengthen your understanding of the Cache-Control policy.

Validation relies on two more response headers, ETag and Last-Modified. ETag identifies a specific version of the resource: if the value changes, the resource has been updated. Last-Modified indicates when the resource was last updated.

As the figure above shows, developers can send both headers in a response. So what do they mean to the browser?

When the browser finds a local cached copy and the server has indicated that it does not need to be confirmed on each use, the browser can use the local copy directly.

When the browser finds that the cached copy has expired, it could simply fetch the resource again. But if the resource has not changed on the server, resending it wastes bandwidth, so the server instead returns a 304 Not Modified status without a body, telling the browser to keep using the copy it has stored. The question is, how does the server know that the resource has not changed since the browser cached it? If the first response carried a Last-Modified header, the client sends that value back in an If-Modified-Since request header once it finds that its cache has expired. If the resource's last update time on the server is earlier than or equal to the If-Modified-Since time, the resource has not changed, and the server sends a 304. ETag works the same way, with the client sending the value back in an If-None-Match request header.
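The server-side comparison can be sketched in a few lines of Python. `status_for` is a hypothetical helper that covers only the If-Modified-Since check; a real server would also handle If-None-Match and other preconditions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def status_for(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's copy is still current, otherwise 200."""
    if if_modified_since is None:
        return 200  # unconditional request: send the full resource
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the condition
    if client_time.tzinfo is None:
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= client_time:
        return 304
    return 200


resource_time = datetime(2016, 12, 29, 5, 35, 39, tzinfo=timezone.utc)
print(status_for(resource_time, "Thu, 29 Dec 2016 05:35:39 GMT"))  # 304
print(status_for(resource_time, "Wed, 28 Dec 2016 00:00:00 GMT"))  # 200
print(status_for(resource_time, None))                             # 200
```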

How to configure it through Nginx

A typical Nginx configuration looks like the following, but it's important to understand what it implies.

location ~* \.(ico|css|js|gif|jpe?g|png)(\?[0-9]+)?$ {
    expires 10d;
}

With the expires directive in place, the response carries headers like the following:

Cache-Control: max-age=864000
Date: Tue, 28 Mar 2017 10:00:38 GMT
ETag: "5864a0ab-1e75"
Expires: Fri, 07 Apr 2017 10:00:38 GMT
Last-Modified: Thu, 29 Dec 2016 05:35:39 GMT
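The numbers in these headers are related: max-age is the 10 days expressed in seconds, and Expires is the Date value plus that interval. A short Python check of the arithmetic, using only the values shown above:

```python
from datetime import timedelta
from email.utils import format_datetime, parsedate_to_datetime

max_age = 10 * 24 * 60 * 60   # 10 days expressed in seconds
date_header = "Tue, 28 Mar 2017 10:00:38 GMT"

# Expires = Date + max-age
expires = parsedate_to_datetime(date_header) + timedelta(seconds=max_age)

print(max_age)                                # 864000
print(format_datetime(expires, usegmt=True))  # Fri, 07 Apr 2017 10:00:38 GMT
```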

As long as the cached copy has not expired, the browser keeps using it without checking with the server. If you want the browser to check with the server every time it requests the resource, you can use the following configuration:

location ~* \.(ico|css|js|gif|jpe?g|png)(\?[0-9]+)?$ {
    # "expires 10d" is left out here: it would add a second,
    # conflicting "Cache-Control: max-age=864000" header.
    add_header Cache-Control "no-cache, must-revalidate, max-age=0";
}

How dynamic programs control caching

A dynamic program is responsible for sending all of the HTTP cache headers itself, and it also has to calculate the ETag itself.

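<?php
// Last-Modified is used here as a "valid until" stamp: the first response sets it
// 24 hours ahead, and conditional requests receive a 304 until that time passes.
// (A real handler would normally send the resource's actual modification time.)
$now = time();
$if_modified_since = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
    ? strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE'])
    : false;

// Compare timestamps, not date strings.
if ($if_modified_since && $if_modified_since > $now) {
    // The browser's copy is still considered current: send no body.
    header('HTTP/1.1 304 Not Modified');
    exit();
} else {
    $seconds_to_cache = 3600 * 24;
    $ts = gmdate("D, d M Y H:i:s", $now + $seconds_to_cache) . " GMT";

    header("Last-Modified: $ts");
    header("Cache-Control: no-cache, must-revalidate");
}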
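The example above validates with Last-Modified only. For the ETag side, here is a minimal Python sketch of a handler that computes the ETag from the response body and answers a matching If-None-Match with a 304. The `handle` function, the hash-based ETag scheme, and the plain dict of headers are all assumptions for illustration; it also compares a single ETag only, whereas a real If-None-Match header may carry a list or "*".

```python
import hashlib


def handle(body: bytes, request_headers: dict):
    """Return (status, headers, body) for a tiny ETag-aware handler."""
    # One possible scheme: a truncated hash of the body, in quotes.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""   # client copy is current: no body
    return 200, headers, body


# First request: no validator yet, so the full body is sent.
status, headers, _ = handle(b"hello", {})
print(status)  # 200

# Second request: the browser sends the ETag back.
status, _, payload = handle(b"hello", {"If-None-Match": headers["ETag"]})
print(status, payload)  # 304 b''
```

If the body changes, the hash changes, the stored ETag no longer matches, and the handler falls back to a full 200 response.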