1. Foreword

Starting with the interview question about what happens from entering a URL in the browser to the page being rendered, let's talk about caching in the browser.

Caching is a simple and efficient way to optimize performance. A good caching strategy shortens the distance between a page and the resources it requests, reduces latency, and lowers bandwidth and network load, because cached files can be reused. This is why the browser opens the same page faster the second time.

A data request can be divided into three steps: initiating the network request, back-end processing, and the browser handling the response. Browser caching helps us optimize performance in the first and third steps. For example, the cache may be used directly without making a request at all; or a request is made, but the back-end sees that it holds the same data the front-end already has and does not need to send the body back, thus shrinking the response.

In the following sections, we will explore the browser cache mechanism by analyzing the cache process and cache location.

2. Browser cache process analysis

The browser communicates with the server in a request/response model: the browser initiates an HTTP request and the server responds to it. After the browser sends a request to the server for the first time and gets the result, it decides whether to cache that result according to the cache-related HTTP headers in the response. If so, the result and the cache identifiers are stored in the browser cache.

From this process, we can see:

  • Each time the browser initiates a request, it first looks in the browser cache for the result of that request and its cache identifiers
  • Each time the browser receives a response, it stores the result and its cache identifiers in the browser cache

These two conclusions are the key to the browser cache mechanism: they ensure that every request's result can be stored and later looked up. As long as we understand the rules of the browser cache, the rest falls into place, and this article analyzes them in detail. To aid understanding, we divide the caching process into two parts, strong (forced) caching and negotiated caching, depending on whether an HTTP request must be re-sent to the server.
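
To make the split concrete, here is a rough JavaScript sketch of how such a decision could look. This is purely illustrative, not browser internals; the entry shape, the cacheDecision name, and the returned labels are all invented for this sketch.

```javascript
// Illustrative sketch only, not browser internals: decide, for a cached
// entry, between the strong cache, the negotiated cache, and a full request.
// The entry shape { headers, storedAtMs } and the labels are invented here.
function cacheDecision(entry, nowMs) {
  if (!entry) return 'request'; // nothing cached yet: go to the network

  // Strong cache: Cache-Control: max-age decides freshness by age alone,
  // without contacting the server.
  const cc = entry.headers['cache-control'] || '';
  const maxAge = cc.match(/max-age=(\d+)/);
  if (maxAge && (nowMs - entry.storedAtMs) / 1000 < Number(maxAge[1])) {
    return 'strong-cache';
  }

  // Negotiated cache: stale by age, but the stored validators (ETag /
  // Last-Modified) let the server answer 304 instead of a full body.
  if (entry.headers['etag'] || entry.headers['last-modified']) {
    return 'negotiate';
  }
  return 'request'; // no validators: a normal request is needed
}
```

A fresh max-age hit never touches the network; a stale entry with validators still sends a request, but may get back a bodyless 304.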

3. Cache location

In terms of where the cache lives, there are four kinds, each with its own priority. The browser searches them in order, and only when none is hit does it fall back to a network request:

  • Service Worker
  • Memory Cache
  • Disk Cache
  • Push Cache
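
As a mental model, the priority lookup above can be sketched as a loop over cache layers. This is only an illustration; the lookup function and the layer objects are invented for this sketch, not a real browser API.

```javascript
// Mental model of the lookup order only; the layer objects and the
// lookup function are invented for this sketch, not a real browser API.
function lookup(url, layers) {
  // layers are probed in priority order, e.g.
  // Service Worker -> Memory Cache -> Disk Cache -> Push Cache.
  for (const layer of layers) {
    const hit = layer.get(url);
    if (hit !== undefined) return { from: layer.name, value: hit };
  }
  // No layer had the resource: fall back to a network request.
  return { from: 'network', value: null };
}
```

Each layer only needs to answer "do you have this URL?"; the first hit wins, which is exactly why the sections below discuss the four locations in priority order.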

1. Service Worker

A Service Worker is an independent thread that runs in the background, separate from the browser's main thread, and is commonly used for caching. To use a Service Worker, the page must be served over HTTPS: because Service Workers can intercept requests, HTTPS is required for security. The Service Worker's cache differs from the browser's other built-in caching mechanisms in that it gives us control over which files are cached, how cache entries are matched, and how they are read, and the cache is persistent.

Implementing caching with a Service Worker takes three steps: first, register the Service Worker; then, once the install event fires, cache the required files; finally, on the user's next visit, intercept the request and check for a cached copy. If there is one, the cached file is read directly; otherwise the data is requested from the network.
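
The second and third steps can be sketched in a hypothetical sw.js like the one below. CACHE_NAME, PRECACHE_URLS, and isCacheable are illustrative names of our own; the install and fetch handlers use the standard Service Worker APIs (caches.open, cache.addAll, caches.match).

```javascript
// Hypothetical sw.js; CACHE_NAME, PRECACHE_URLS and isCacheable are our own
// illustrative names, while the install/fetch handlers use the standard
// Service Worker APIs (caches.open, cache.addAll, caches.match).
const CACHE_NAME = 'static-v1';
const PRECACHE_URLS = ['/index.html', '/styles/main.css', '/scripts/app.js'];

// Pure helper: is this URL one of the files we chose to pre-cache?
function isCacheable(url) {
  return PRECACHE_URLS.includes(new URL(url, 'https://example.com').pathname);
}

// The event wiring only runs inside a real Service Worker context.
if (typeof self !== 'undefined' && typeof caches !== 'undefined') {
  // Step 2: after install, cache the required files.
  self.addEventListener('install', (event) => {
    event.waitUntil(
      caches.open(CACHE_NAME).then((cache) => cache.addAll(PRECACHE_URLS))
    );
  });

  // Step 3: on later visits, intercept requests and serve from cache,
  // falling back to the network on a miss.
  self.addEventListener('fetch', (event) => {
    event.respondWith(
      caches.match(event.request).then((hit) => hit || fetch(event.request))
    );
  });
}
```

The first step, registration, lives in page code, typically a call along the lines of navigator.serviceWorker.register('/sw.js').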

When the Service Worker does not hit its cache, it calls the fetch function to get the data; that is, a miss in the Service Worker falls through to the remaining caches according to the lookup priority. But whether the data ultimately comes from the Memory Cache or from a network request, the browser reports it as being served by the Service Worker.

2. Memory Cache

The Memory Cache is the cache held in memory. It contains resources already fetched for the current page, such as styles, scripts, and images that have been downloaded. Reading from memory is certainly faster than reading from disk, but although the memory cache is efficient, it is short-lived: it is released along with the process. Once we close the tab, the cache in memory is freed.

So if in-memory caching is so efficient, can we keep all our data in memory? Unfortunately, no. A computer's memory is necessarily much smaller than its hard disk, and operating systems must be very careful about how much memory they use, so there is not much left over for us.

When we visit a page and then refresh it, we can see that much of the data comes from the memory cache.

One important class of resources in the memory cache is those downloaded by preloader-related directives (e.g. <link rel="preload">). The preloader is one of the most common page-optimization techniques: it lets the browser fetch upcoming resources while JS/CSS files are still being parsed.

Note that the memory cache does not honor the Cache-Control header returned with the resource, and resource matching goes beyond the URL: it may also check the Content-Type, CORS mode, and other characteristics.

3. Disk Cache

A Disk Cache is a cache stored on the hard disk. It is slower to read, but almost anything can be stored on disk, so it beats the Memory Cache in both capacity and storage duration.

The Disk Cache has by far the widest coverage of all browser caches. Based on the fields in the HTTP headers, it determines which resources need to be cached, which can be used without a new request, and which have expired and must be re-requested. Even across sites, a resource with the same address, once cached on disk, will not be requested again. Most of the cache comes from the Disk Cache, which we will discuss in more detail when we get to the HTTP protocol headers.

Which files does the browser put into memory, and which onto the hard disk? Opinions vary online, but a reasonable rule of thumb is:

  • Large files are most likely not stored in memory; conversely, small files are preferentially kept there
  • If the current system's memory usage is high, files are preferentially saved to the hard disk

4. Push Cache

The Push Cache comes with HTTP/2 (server push) and is used only when all three caches above miss. It exists only within a session, is released once the session ends, has a short lifetime (about 5 minutes in Chrome), and does not strictly implement the caching directives in the HTTP headers.

If none of the above caches is hit, then you have to make a request to fetch the resource.

Therefore, for the sake of performance, most interfaces should adopt a good cache policy. Browser cache policy generally falls into two types, strong caching and negotiated caching, both implemented by setting HTTP headers.

4. Applying caching policies in real scenarios

1. Frequently changing resources

Cache-Control: no-cache

For resources that change frequently, Cache-Control: no-cache is used so that the browser contacts the server each time, and ETag or Last-Modified is then used to verify whether the resource is still valid. This does not reduce the number of requests, but it can significantly reduce the size of the responses.
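
The server side of this validation can be sketched as follows. The revalidate function and the resource shape are hypothetical, but the header pairs are the standard ones: If-None-Match against ETag, If-Modified-Since against Last-Modified.

```javascript
// Sketch of the server side of negotiated caching; revalidate and the
// resource shape are hypothetical, but the header pairs are the standard
// ones: If-None-Match vs. ETag, If-Modified-Since vs. Last-Modified.
function revalidate(reqHeaders, resource) {
  // ETag comparison takes precedence over Last-Modified.
  if (reqHeaders['if-none-match'] !== undefined) {
    return reqHeaders['if-none-match'] === resource.etag ? 304 : 200;
  }
  if (reqHeaders['if-modified-since'] !== undefined) {
    const since = Date.parse(reqHeaders['if-modified-since']);
    return since >= resource.lastModifiedMs ? 304 : 200;
  }
  return 200; // no validators sent: respond with the full body
}
```

A 304 response carries no body, which is exactly where the response-size saving comes from.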

2. Resources that are not constantly changing

Cache-Control: max-age=31536000

Such resources are typically handled by setting Cache-Control to a large max-age=31536000 (one year), so that subsequent browser requests for the same URL hit the strong cache. To solve the update problem, a dynamic token such as a content hash or version number is added to the file name (or path); changing the token changes the referenced URL, which invalidates the previous strong cache (it is not immediately evicted, it is simply never used again). Libraries available online (jquery-3.3.1.min.js, lodash.min.js, etc.) use this pattern.

5. Thank you


Thank you for reading this article. I hope it has been helpful to you. If you have any questions, please point them out.

I’m Pumpkin (✿◡‿◡); give it a thumbs up if you think it’s OK ❤.