Preface

The browser cache comes up frequently in interviews, and it is a simple, efficient way to optimize performance, so it is worth understanding and mastering. Let’s work through it together.

The cache

Baidu Baike explains a cache as memory that can exchange data at high speed: it exchanges data with the CPU ahead of main memory, so it is very fast. In my view, the browser cache plays a similar role: the client exchanges data with it ahead of the server (a cache hit means no request is sent), and it is very fast. That is what makes it a simple and efficient way to optimize performance.

In practice, the browser saves a copy of a resource that the client got from the server in a specific location. When the browser needs the resource again, then depending on the caching policy, it may not have to ask the server for it. The result is simple, but the process is not. Where are resources cached? Is a given resource cacheable? Can a cached resource be used forever? Let’s explore cache locations and cache strategies.

The cache location

If you open Chrome DevTools and look at the Network panel, the Size column shows values such as (disk cache) and (memory cache), which indicate that a resource was served from a cache rather than from the server. There are four cache locations:

  • Service Worker
  • Memory cache
  • Disk cache
  • Push cache

Service Worker

One of the core features of the increasingly popular PWA technology is the Service Worker, which essentially acts as a proxy between the client and the server. A Service Worker can intercept network requests and apply different caching strategies depending on network conditions. Unlike the memory cache and disk cache built into the browser, a Service Worker gives us much more freedom: we control which files are cached, how requests are matched against the cache, and how cached entries are read. Its cache is also persistent: it survives closing the tab or the browser, and entries disappear only when an API is called explicitly to delete them or the browser evicts them because storage is over capacity.

If the Service Worker does not hit its cache, we need to call fetch() to continue fetching the resource. The browser then looks in the memory cache or disk cache, in priority order. Note that anything retrieved after calling fetch(), whether from another cache or from the network, is still displayed as coming from the Service Worker.
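The hit-or-fall-through flow described above is often called a cache-first strategy. Here is a minimal sketch of it that runs outside a browser. The CacheLike interface, the Fetcher type, and createMapCache are hypothetical stand-ins; in a real worker their roles would be played by caches.match() and fetch() inside a fetch event handler that calls event.respondWith().

```typescript
// Hypothetical minimal interfaces standing in for the browser's Cache API and fetch().
interface CacheLike {
  match(url: string): Promise<string | undefined>;
  put(url: string, body: string): Promise<void>;
}

type Fetcher = (url: string) => Promise<string>;

// Cache-first: answer from the worker's cache on a hit, otherwise fall through
// to the fetcher (network or the browser's other caches) and keep a copy.
async function cacheFirst(url: string, cache: CacheLike, fetcher: Fetcher): Promise<string> {
  const cached = await cache.match(url);
  if (cached !== undefined) {
    return cached; // hit: nothing is requested beyond the worker
  }
  const body = await fetcher(url); // miss: continue fetching
  await cache.put(url, body);
  return body;
}

// In-memory stand-in for a named cache, for trying the function outside a browser.
function createMapCache(): CacheLike {
  const store = new Map<string, string>();
  return {
    match: async (url) => store.get(url),
    put: async (url, body) => { store.set(url, body); },
  };
}
```

Calling cacheFirst twice with the same URL should invoke the fetcher only once; the second call is answered from the map. Cache-first is only one possible strategy, and a worker is free to choose network-first or something else per request.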

Also note that, for security reasons, Service Workers can only be used over HTTPS or on localhost: the ability to modify network requests is too dangerous to expose to man-in-the-middle attacks.


Memory cache

The memory cache is a cache held in memory, so it can be read faster than the disk cache. Almost any resource the current page has fetched can be held there, mainly resources fetched by the preloader (which lets the browser request the next resource over the network while it parses JS/CSS files) and resources loaded via preload. But memory is limited, so not every resource can be kept in the memory cache. When the tab is closed, its memory cache is invalidated. Sometimes it does not even take closing the tab: if the page is using too much memory, earlier entries are evicted as later ones come in. So the memory cache is only a short-lived store.

If a page makes several requests for the same resource, the memory cache ensures it is actually fetched at most once, and the memory cache pays little attention to the Cache-Control value in the HTTP headers. Control over the memory cache lies with the browser’s own implementation rather than with HTTP caching headers. When matching resources, it does not match on the URL alone; other attributes such as Content-Type and CORS mode may also be checked.
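The "fetched at most once" behavior can be pictured as a map of in-flight or finished loads. This is a toy sketch, not how any browser implements it: the Loader type and createMemoryCache are hypothetical names, and the key is the URL alone for brevity.

```typescript
// Hypothetical loader type: resolves to the resource body for a URL.
type Loader = (url: string) => Promise<string>;

// Wraps a loader so that repeated requests for the same URL share one load.
// Keyed by URL only for brevity; a real browser may also compare attributes
// such as Content-Type or CORS mode.
function createMemoryCache(load: Loader): Loader {
  const entries = new Map<string, Promise<string>>();
  return (url) => {
    let entry = entries.get(url);
    if (entry === undefined) {
      entry = load(url); // the only time the underlying loader runs for this URL
      entries.set(url, entry);
    }
    return entry;
  };
}
```

Two calls with the same URL receive the same promise, so the underlying loader runs once even if the first load has not finished yet.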

Disk cache

The disk cache, also known as the HTTP cache, is slower than the memory cache (though still faster than requesting the resource from a server), but it has far more space and persists in the file system. Most cached resources are served from the disk cache. It also allows the same resource to be reused across sessions and even across sites.

The disk cache strictly follows the cache policy (discussed next): based on the fields in the HTTP headers, it determines whether a resource is cacheable, whether a cached copy can be used, and whether it has expired, and acts accordingly.

Push cache

The push cache is a product of HTTP/2, and since HTTP/2 is not yet very widespread, I’ll cover it only briefly. A push cache is tied to an HTTP/2 connection: each connection has its own push cache. Since multiple pages can share the same connection, multiple pages can also share the same push cache. The lifetime of an entry is bounded by the connection, so the push cache is only valid until the connection is closed. If the connection closes before a pushed resource has been used, the resource is gone and has to be requested again over a new connection, which is wasted effort. It is therefore best to use the push cache only for resources that will be used immediately. Finally, each pushed resource can be used only once: when the browser retrieves it from the push cache, the push cache removes it.
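The two rules above (entries live only as long as the connection, and each entry can be used once) can be captured in a toy model. The PushCache class and its method names are made up for illustration and do not correspond to any browser API.

```typescript
// Toy model of a per-connection push cache (all names here are hypothetical).
class PushCache {
  private entries = new Map<string, string>();
  private open = true;

  // The server pushes a resource over this connection.
  push(url: string, body: string): void {
    if (this.open) this.entries.set(url, body);
  }

  // A page claims a pushed resource; the entry is removed on first use.
  take(url: string): string | undefined {
    const body = this.entries.get(url);
    this.entries.delete(url);
    return body;
  }

  // Closing the connection discards everything that was pushed but not used.
  close(): void {
    this.open = false;
    this.entries.clear();
  }
}
```

In this model, a second take() for the same URL returns undefined, and so does any take() after close().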

Caching strategies

Cache strategies are discussed here in terms of the disk cache. They fall into two main categories:

  • Strong cache
  • Negotiated cache

Strong cache

Strong caching, which I personally think of as “tough caching”, means the cached copy is used without asking the server for as long as it has not expired. It relies on the Expires and Cache-Control HTTP headers.

Expires is a holdover from HTTP/1.0 that specifies the time at which a resource expires. Before that time, the browser does not need to request the resource from the server. Its drawback is that it depends on the local clock: if the local time is changed, the cache may be invalidated incorrectly. Hence the Cache-Control header in HTTP/1.1.

Cache-Control is the most important caching header in HTTP/1.1 and is the main way to control the cache. Its main directives are:

  • public: the response may be cached by any party, that is, by both clients and proxy servers. [response directive]
  • private: the response is intended for a single user, so only the client may cache it; shared caches such as proxies may not. [response directive]
  • no-cache: the cached copy must be validated before it is used. That is, the response may be stored, but the cache has to confirm with the server that it is still valid before using it (negotiated cache). [request directive, response directive]
  • no-store: nothing from the request or response may be stored. That is, no caching at all. [request directive, response directive]
  • max-age=[seconds]: the maximum age of the response, that is, how long the cached copy stays fresh. For example, max-age=50 means the cached copy expires after 50 seconds. [request directive, response directive]
  • s-maxage=[seconds]: the maximum age of the response for shared (public) caches. It works like max-age but applies only to proxy servers, where it overrides max-age and Expires when present. [response directive]
  • max-stale[=seconds]: the client will accept an expired response, where the value is how many seconds past expiry it tolerates. If no value is given, a response is acceptable however long ago it expired. [request directive]
  • min-fresh=[seconds]: the client wants a response that will remain fresh for at least the given time. For example, with min-fresh=50, a cached copy that will have expired 50 seconds from now is not used. [request directive]
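To make the directive syntax concrete, here is a rough sketch of splitting a Cache-Control value into directives, plus a check that applies the two request directives min-fresh and max-stale. Both function names are my own, the parser does not handle quoted values that contain commas, and the check ignores every other directive, so treat it as an illustration rather than a complete implementation.

```typescript
// Parses a Cache-Control header value into a map of directive -> argument.
// Directives without an argument (e.g. no-cache) map to an empty string.
// Simplification: quoted values containing commas are not handled.
function parseCacheControl(header: string): Map<string, string> {
  const directives = new Map<string, string>();
  for (const part of header.split(",")) {
    const trimmed = part.trim();
    if (trimmed === "") continue;
    const eq = trimmed.indexOf("=");
    const name = (eq === -1 ? trimmed : trimmed.slice(0, eq)).trim().toLowerCase();
    const value = eq === -1 ? "" : trimmed.slice(eq + 1).trim().replace(/^"|"$/g, "");
    directives.set(name, value);
  }
  return directives;
}

// Decides whether a stored response of the given age and freshness lifetime
// satisfies the request's min-fresh / max-stale directives (others ignored).
function isAcceptable(ageSeconds: number, lifetimeSeconds: number, request: Map<string, string>): boolean {
  const remaining = lifetimeSeconds - ageSeconds;
  const minFresh = request.get("min-fresh");
  if (minFresh !== undefined) {
    return remaining >= Number(minFresh); // must stay fresh this much longer
  }
  if (remaining > 0) return true; // still fresh
  const maxStale = request.get("max-stale");
  if (maxStale === undefined) return false; // stale, and staleness not allowed
  return maxStale === "" || -remaining <= Number(maxStale);
}
```

For example, a response that is 40 seconds old with a 50-second lifetime is fresh, but a request carrying min-fresh=20 would reject it because only 10 seconds of freshness remain.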

HTTP/1.1 sets the cache expiration time through the max-age directive of Cache-Control, and max-age takes precedence over Expires. However, for compatibility with HTTP/1.0, real projects should set both headers.
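The precedence between these fields can be written down as a small function. This is a sketch under a few assumptions of mine: header names arrive lowercased in a plain object (the ResponseHeaders shape is hypothetical), Expires is measured against the response's Date header, and an unparseable Expires is treated as already expired.

```typescript
// Simplified response headers; field names are lowercased by assumption.
interface ResponseHeaders {
  "cache-control"?: string;
  expires?: string;
  date?: string;
}

// Reads an integer-valued directive such as max-age=50 from a Cache-Control value.
function directiveSeconds(cacheControl: string, name: string): number | undefined {
  const match = new RegExp("(?:^|,)\\s*" + name + "\\s*=\\s*(\\d+)", "i").exec(cacheControl);
  return match ? Number(match[1]) : undefined;
}

// Freshness lifetime in seconds, or undefined if the response gives no explicit one.
// Precedence: s-maxage (shared caches only) > max-age > Expires.
function freshnessLifetime(headers: ResponseHeaders, sharedCache: boolean): number | undefined {
  const cc = headers["cache-control"] ?? "";
  if (sharedCache) {
    const sMaxAge = directiveSeconds(cc, "s-maxage");
    if (sMaxAge !== undefined) return sMaxAge;
  }
  const maxAge = directiveSeconds(cc, "max-age");
  if (maxAge !== undefined) return maxAge;
  if (headers.expires !== undefined && headers.date !== undefined) {
    const ms = Date.parse(headers.expires) - Date.parse(headers.date);
    return Number.isNaN(ms) ? 0 : Math.max(0, ms / 1000);
  }
  return undefined;
}
```

With both max-age=50 and an Expires one hour ahead, the function returns 50, which is the "max-age wins" rule; Expires only matters when no max-age is present.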

Negotiated cache

When the strong cache has expired, or when Cache-Control is set to no-cache, the negotiated cache comes into play. As the name implies, the browser has to negotiate with the server before using the cached copy. The negotiated cache relies on two pairs of HTTP headers: Last-Modified with If-Modified-Since, and ETag with If-None-Match.
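Put together, the client's choice between the strong cache and the negotiated cache looks roughly like the sketch below. The StoredResponse shape and the decide function are hypothetical, and the freshness check is reduced to a single max-age value for brevity.

```typescript
// Hypothetical shape of a stored response.
interface StoredResponse {
  storedAt: number;       // ms since epoch when the response was stored
  maxAgeSeconds: number;  // freshness lifetime taken from max-age (or Expires)
  noCache: boolean;       // true if the response carried Cache-Control: no-cache
  etag?: string;
  lastModified?: string;
}

type Decision =
  | { action: "use-cache" }
  | { action: "revalidate"; headers: Record<string, string> }
  | { action: "fetch" };

function decide(entry: StoredResponse | undefined, now: number): Decision {
  if (entry === undefined) return { action: "fetch" };
  const ageSeconds = (now - entry.storedAt) / 1000;
  if (!entry.noCache && ageSeconds < entry.maxAgeSeconds) {
    return { action: "use-cache" }; // strong cache hit: no request is sent
  }
  // Negotiated cache: ask the server whether the stored copy is still valid.
  const headers: Record<string, string> = {};
  if (entry.etag !== undefined) headers["If-None-Match"] = entry.etag;
  if (entry.lastModified !== undefined) headers["If-Modified-Since"] = entry.lastModified;
  if (Object.keys(headers).length === 0) return { action: "fetch" }; // no validator stored
  return { action: "revalidate", headers };
}
```

A "revalidate" decision means the request is sent with the listed conditional headers, and the server answers with either 304 Not Modified or a fresh 200 OK.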

When the browser accesses a resource for the first time, the server adds a Last-Modified header to the response that returns the resource.

  • Last-Modified: the time when the resource was last modified

The client caches the resource together with its headers. The next time the browser requests the resource, it sends the cached Last-Modified value to the server as the value of the If-Modified-Since request header.

  • If-Modified-Since: asks the server to compare against the resource’s update time

When the server receives If-Modified-Since, it compares the value with the last modified time of the corresponding resource on the server. If they are the same, it returns 304 Not Modified, meaning the resource has not changed and the cached copy can be used. If not, it returns 200 OK along with the changed resource, and the client caches the new copy.
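On the server side, the comparison described above might look like this sketch. The function name and the ConditionalResult shape are assumptions for illustration; a real server would also set response headers and handle more edge cases.

```typescript
interface ConditionalResult {
  status: 200 | 304;
  body?: string;
}

// Server-side check: resend the body only if the resource changed after the
// date the client sent in If-Modified-Since.
function handleIfModifiedSince(
  ifModifiedSince: string | undefined,
  lastModifiedMs: number,
  body: string
): ConditionalResult {
  // HTTP dates have one-second resolution, so drop the milliseconds first.
  const lastModifiedSec = Math.floor(lastModifiedMs / 1000);
  const sinceMs = ifModifiedSince === undefined ? NaN : Date.parse(ifModifiedSince);
  if (!Number.isNaN(sinceMs) && lastModifiedSec <= Math.floor(sinceMs / 1000)) {
    return { status: 304 }; // Not Modified: the client reuses its cached copy
  }
  return { status: 200, body }; // changed (or no usable date): send the resource
}
```

A missing or unparseable If-Modified-Since falls through to a normal 200 response, which is the safe default.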

But Last-Modified has some flaws:

  • Last-Modified has a resolution of one second. If a file is modified again within the same second, the server still treats the request as a hit and does not return the updated resource
  • If the file is opened and saved again without its contents changing, Last-Modified still changes, so the server misses the cache and sends an identical resource
  • If a file is generated dynamically by the server, Last-Modified is always the time it was generated, so even though the content may not have changed, caching never takes effect
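The one-second resolution problem is easy to see with two timestamps that fall inside the same second. The helper name toLastModified and the example times are made up for the illustration.

```typescript
// Formats a modification time the way a Last-Modified header carries it.
function toLastModified(mtimeMs: number): string {
  return new Date(mtimeMs).toUTCString();
}

const firstEdit = Date.UTC(2015, 9, 21, 7, 28, 0, 100);  // 07:28:00.100
const secondEdit = Date.UTC(2015, 9, 21, 7, 28, 0, 900); // 07:28:00.900, same second

// Both edits produce the same header value, so a client that cached the first
// version would be told 304 Not Modified and would not receive the second one.
const sameHeader = toLastModified(firstEdit) === toLastModified(secondEdit);
```

Here sameHeader is true: the milliseconds are lost when the time is formatted as an HTTP date.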

Because of these flaws in Last-Modified, HTTP/1.1 added two headers, ETag and If-None-Match.

  • ETag: an identifier for a resource, a string that uniquely identifies a version of it. The server assigns an ETag value to each resource and must update the value whenever the resource is updated. There is no uniform algorithm for generating ETag values; it is up to the server. ETags come in two kinds, strong and weak. A strong ETag changes no matter how subtly the entity changes. A weak ETag only indicates whether two versions of the resource are equivalent, so it changes only when the resource changes in a meaningful way. A weak ETag is marked by prefixing the value with W/.

When a client accesses a resource, the server adds an ETag header to the response. The client caches the resource together with the ETag from the response. When it requests the resource again, the client sends that ETag as the value of the If-None-Match request header.

  • If-None-Match: asks the server to compare entity tags (ETag)

When the server receives If-None-Match, it compares the value with the current ETag of the resource on the server. If they match, the server returns 304 Not Modified. If they differ, the server returns 200 OK and carries the changed resource in the response body. The client then replaces the cached resource and its ETag value.
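Here is a sketch of the ETag round trip on the server side. Since there is no standard algorithm for generating ETags, I use a simple FNV-1a-style hash over the body purely as a stand-in; makeETag, weakMatch, and handleIfNoneMatch are hypothetical names, and splitting the header on commas is a simplification.

```typescript
// FNV-1a-style hash over UTF-16 code units, used as a stand-in ETag generator
// (an assumption for this sketch; servers choose their own scheme).
function makeETag(body: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < body.length; i++) {
    hash ^= body.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return '"' + hash.toString(16) + '"';
}

// Weak comparison: two tags match if they are equal after dropping any W/ prefix.
function weakMatch(a: string, b: string): boolean {
  const strip = (tag: string) => tag.replace(/^W\//, "");
  return strip(a) === strip(b);
}

// Returns 304 if any tag in If-None-Match matches the current ETag, else 200 with the body.
function handleIfNoneMatch(
  ifNoneMatch: string | undefined,
  body: string
): { status: 200 | 304; etag: string; body?: string } {
  const etag = makeETag(body);
  const candidates = (ifNoneMatch ?? "").split(",").map((t) => t.trim()).filter((t) => t !== "");
  if (candidates.some((t) => t === "*" || weakMatch(t, etag))) {
    return { status: 304, etag };
  }
  return { status: 200, etag, body };
}
```

The first request has no If-None-Match and gets 200 with an ETag; sending that ETag back while the body is unchanged yields 304, and any change to the body produces a different tag and a 200 again.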

Summary

That wraps up this fairly comprehensive summary of browser caching. I hope it helps. If you spot any mistakes or anything inappropriate while reading, please feel free to point them out. And if you think it is well written and you really learned something, a like would be appreciated. Let’s keep going together!