Front end 123: How browser caching works


Workflow for browser caching

Fetching content over the network is slow and expensive. Large responses need multiple round trips between client and server, which delays the moment the browser can start processing the content and can raise data charges for visitors. The ability to cache and reuse previously fetched resources is therefore a key part of performance optimization.

Here's a look at the familiar DevTools Network panel:

The cyan, green, and orange circles mark resources served from the memory cache, from the disk cache, and over an HTTP request (not cached). There is also a request that returns status code 304 and then reads its data from the memory/disk cache. The difference between a 304 and a plain memory/disk cache hit is this: when a cached resource has expired, the browser asks the server whether it has changed, and the server answers 304 if it has not. On receiving the 304, the browser refreshes the resource's expiration time and reads the resource from the disk/memory cache. If the resource has not expired, the browser skips the check with the server and goes straight to the memory/disk cache.

The general process is as follows:

1) Check the Service Worker cache first. If a Service Worker answers the request from its cache, use that response; otherwise continue to the next step.
2) Check the memory cache. If the resource is there, load it directly (200, from memory cache).
3) If it is not in memory, check the disk cache. If the resource is on disk and has not expired, load it directly (200, from disk cache). If it is on disk but has expired, send a request to the server to validate it. If the resource has not changed, the server returns 304, and the browser reads the resource from the disk cache and updates its expiration time/ETag/Last-Modified. If the resource has changed, the server returns the new version, and the browser caches it again and updates the expiration time/ETag/Last-Modified.
4) If the disk cache has no copy either, send an HTTP request to the server.
5) Store the loaded resource in the disk and memory caches, and update its expiration time/ETag/Last-Modified.
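
The lookup order above can be sketched as a small pure function. This is a simplified model rather than how any browser is actually implemented: all type and function names are hypothetical, and the serverEtag parameter stands in for the validation round trip to the server. Step 5 (writing the result back into the caches) is left out to keep the sketch short.

```typescript
// Simplified model of the lookup order. All names are hypothetical.
type CacheSource = "service-worker" | "memory" | "disk" | "disk-after-304" | "network";

interface DiskEntry {
  expiresAtMs: number; // absolute time at which the entry stops being fresh
  etag: string;        // validator stored together with the entry
}

interface CacheLayers {
  serviceWorker: Set<string>;   // URLs a Service Worker answers from its cache
  memory: Set<string>;          // URLs held in the memory cache
  disk: Map<string, DiskEntry>; // URLs held in the disk (HTTP) cache
}

// serverEtag is the validator the server currently holds for the resource;
// comparing against it replaces the real conditional request.
function resolveSource(url: string, nowMs: number, layers: CacheLayers, serverEtag: string): CacheSource {
  if (layers.serviceWorker.has(url)) return "service-worker"; // step 1
  if (layers.memory.has(url)) return "memory";                // step 2
  const entry = layers.disk.get(url);
  if (entry !== undefined) {
    if (nowMs < entry.expiresAtMs) return "disk";             // step 3, still fresh
    if (entry.etag === serverEtag) return "disk-after-304";   // step 3, expired but unchanged
  }
  return "network";                                           // step 3 (changed) or step 4
}

const layers: CacheLayers = {
  serviceWorker: new Set<string>(),
  memory: new Set<string>(["/app.css"]),
  disk: new Map<string, DiskEntry>([["/app.js", { expiresAtMs: 1000, etag: "v1" }]]),
};

console.log(resolveSource("/app.css", 500, layers, "v1"));  // memory
console.log(resolveSource("/app.js", 500, layers, "v1"));   // disk
console.log(resolveSource("/app.js", 2000, layers, "v1"));  // disk-after-304
console.log(resolveSource("/app.js", 2000, layers, "v2"));  // network
```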

Service Worker Cache has the highest priority, offers the most fine-grained control over the data, and gives the most freedom of operation. Memory Cache is best understood as a storage location governed by the browser's own memory-caching strategy. HTTP Cache is also called Disk Cache, by analogy with where Memory Cache keeps its data. It relies on the full HTTP cache validation process (strong cache and negotiated cache) to decide when to read from the cache and when to fetch an updated resource from the server. Push Cache is sparsely documented and not widely used.

Service Worker Cache (highest priority)

A Service Worker is a script that runs on a separate thread in the background of the browser and is commonly used for caching. Because a Service Worker can intercept requests, it must be served over HTTPS for security (browsers generally make an exception for localhost during development). Its cache differs from the browser's built-in caching mechanisms in that we control which files are cached, how requests are matched against the cache, and how cached responses are read, and the cache is persistent.

Caching with a Service Worker takes three steps. First, register the Service Worker. Second, listen for the install event and cache the required files when it fires. Third, on the user's next visit, intercept each request and check whether a cached copy exists: if it does, return the cached file directly; otherwise request the data.

When the Service Worker does not hit its cache, we call the fetch function to get the data. In that case the lookup continues down the cache priority order described earlier. Note that whether the data then comes from the Memory Cache or from a network request, the browser reports it as coming from the Service Worker.
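
The install and intercept steps can be sketched as follows. To keep the example runnable outside a browser, the Cache API and fetch are replaced by minimal stand-ins: CacheLike, Fetcher, MapCache, precache, and cacheFirst are all hypothetical names, and response bodies are reduced to strings. In a real Service Worker the same logic would sit inside the install and fetch event handlers and use caches.open, cache.put, caches.match, and event.respondWith.

```typescript
// Minimal stand-ins for the browser's Cache and fetch APIs (hypothetical
// interfaces, with bodies reduced to strings so the sketch runs anywhere).
interface CacheLike {
  match(url: string): Promise<string | undefined>;
  put(url: string, body: string): Promise<void>;
}
type Fetcher = (url: string) => Promise<string>;

class MapCache implements CacheLike {
  private store = new Map<string, string>();
  async match(url: string): Promise<string | undefined> {
    return this.store.get(url);
  }
  async put(url: string, body: string): Promise<void> {
    this.store.set(url, body);
  }
}

// Install step: fetch and store the files the page needs up front.
async function precache(cache: CacheLike, urls: string[], fetcher: Fetcher): Promise<void> {
  for (const url of urls) {
    await cache.put(url, await fetcher(url));
  }
}

// Intercept step: answer from the cache on a hit. On a miss, fall through to
// the fetcher (in a browser, fetch() would then consult the memory cache,
// the HTTP cache, and finally the network) and store the result.
async function cacheFirst(
  url: string,
  cache: CacheLike,
  fetcher: Fetcher
): Promise<{ body: string; fromCache: boolean }> {
  const cached = await cache.match(url);
  if (cached !== undefined) return { body: cached, fromCache: true };
  const body = await fetcher(url);
  await cache.put(url, body);
  return { body, fromCache: false };
}
```

Cache-first is only one possible strategy. Because the handler is ordinary code, it could just as well try the network first and fall back to the cache, which is the control the built-in caches do not offer.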

Memory Cache (priority)

The Memory Cache lives in memory and holds resources already fetched by the current page, such as styles, scripts, and images that the page has downloaded. Reading from memory is faster than reading from disk. The trade-off is that the memory cache is short-lived: it is released together with the process, so once we close the tab, the cached data is freed. The memory cache does not look at the Cache-Control response header of the resources it stores. In other words, this form of caching depends heavily on the browser's own memory management strategy, and browsers handle it slightly differently.

Memory Cache follows these policies:

Large files are most likely not stored in memory; small files are preferred.
If the system's memory usage is currently high, files are preferentially stored on disk.

HTTP Cache (priority)

The HTTP cache is divided into the strong cache and the negotiated cache according to how each works. The browser checks the strong cache first and attempts the negotiated cache only when the strong cache misses.

1) Strong cache

> HTTP 1.0 era – Expires

When the browser fetches a remote resource, the server returns an Expires timestamp in the HTTP response headers. For example, Expires: Wed, 13 Oct 2021 22:15:05 GMT means that the resource expires at 22:15:05 GMT on Wednesday, October 13, 2021 (Beijing time = GMT + 8h). Until that time, the resource is read from the cache if a copy exists there; after it, the request is sent to the server again. Because Expires is compared against the client's clock, it works only if the clock difference between client and server is small; otherwise a cache update may fail to take effect for some time.
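
A minimal sketch of the Expires check and of the clock problem it creates. The function name isFreshByExpires is hypothetical, and the client time is passed in as a parameter so that a skewed clock can be simulated.

```typescript
// Strong-cache check based on Expires: the header's absolute time is compared
// with the client's own clock.
function isFreshByExpires(expiresHeader: string, clientNowMs: number): boolean {
  const expiresAtMs = Date.parse(expiresHeader);
  // A date that cannot be parsed is treated as already expired.
  if (isNaN(expiresAtMs)) return false;
  return clientNowMs < expiresAtMs;
}

const expiresHeader = "Wed, 13 Oct 2021 22:15:05 GMT";
const trueNowMs = Date.UTC(2021, 9, 13, 22, 20, 0); // 22:20:00 GMT, after expiry

// Accurate client clock: the copy has expired.
console.log(isFreshByExpires(expiresHeader, trueNowMs)); // false

// Client clock running 10 minutes slow: the expired copy is still served.
console.log(isFreshByExpires(expiresHeader, trueNowMs - 10 * 60 * 1000)); // true
```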

> HTTP 1.1 era – Cache-Control

Cache-Control: max-age is likewise delivered in the response headers that the server sends with the resource. For example, Cache-Control: max-age=31536000 means that the resource expires 31536000 seconds (365 days) after the browser receives it. Unlike the absolute timestamp in Expires, Cache-Control carries a duration, so the browser can decide accurately from the time elapsed on its own clock, and the clock difference between client and server no longer matters.

Other common directives:

i. public/private: In large architectures that rely on intermediate proxies, proxy caching has to be considered as well. public and private control whether a proxy may cache the resource. A public resource can be cached by both the browser and proxies; a private resource can be cached by the browser only. Do not rely on a default when neither is given: shared caches may store responses that are otherwise cacheable, and a response that sets only s-maxage can be cached by a proxy. Mark user-specific responses private explicitly.
ii. s-maxage: sets how long the cached copy is valid on proxy servers (for example, a CDN cache). It applies to shared caches only, where it overrides max-age. Example: Cache-Control: max-age=3600, s-maxage=31536000.
iii. no-cache: the browser may still store the resource, but it no longer decides freshness on its own. Every request goes to the server, which checks whether the cached copy is still valid. In other words, the strong cache is skipped and the negotiated cache is used directly.
iv. no-store: no caching of any kind. Every request fetches the resource from the server, and the browser keeps no copy.
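
To make the directives concrete, here is a small sketch that parses a Cache-Control value and works out how long a browser cache or a shared (proxy) cache may reuse the response without contacting the server. parseCacheControl, freshSeconds, and CacheKind are hypothetical names. The sketch deliberately ignores Expires and heuristic freshness, so a header without max-age or s-maxage yields 0.

```typescript
// Parse a Cache-Control value into a directive map. Directives without a
// value (public, private, no-cache, no-store) map to true.
function parseCacheControl(header: string): Map<string, string | true> {
  const directives = new Map<string, string | true>();
  for (const part of header.split(",")) {
    const eq = part.indexOf("=");
    const name = (eq === -1 ? part : part.slice(0, eq)).trim().toLowerCase();
    if (name === "") continue;
    directives.set(name, eq === -1 ? true : part.slice(eq + 1).trim());
  }
  return directives;
}

// Convert a directive value to whole seconds; anything invalid counts as 0.
function toSeconds(value: string | true | undefined): number {
  if (typeof value !== "string") return 0;
  const n = parseInt(value, 10);
  return isNaN(n) || n < 0 ? 0 : n;
}

type CacheKind = "browser" | "shared";

// Seconds for which a cache of the given kind may reuse the response without
// asking the server. 0 means "do not reuse without validation".
function freshSeconds(header: string, kind: CacheKind): number {
  const d = parseCacheControl(header);
  if (d.has("no-store") || d.has("no-cache")) return 0;
  if (kind === "shared" && d.has("private")) return 0;
  // s-maxage applies to shared caches only and overrides max-age there.
  if (kind === "shared" && d.has("s-maxage")) return toSeconds(d.get("s-maxage"));
  return toSeconds(d.get("max-age"));
}

const header = "max-age=3600, s-maxage=31536000";
console.log(freshSeconds(header, "browser")); // 3600
console.log(freshSeconds(header, "shared"));  // 31536000
console.log(freshSeconds("private, max-age=600", "shared")); // 0
```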

> Cache-Control and Expires together

Cache-Control takes precedence: when both Cache-Control (max-age) and Expires are present, Cache-Control prevails. For backward compatibility with HTTP 1.0 clients, both headers can still be sent together.
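
The precedence rule can be sketched like this. lifetimeSeconds and FreshnessHeaders are hypothetical names, and as a simplification the Expires fallback is measured from the moment the client received the response.

```typescript
interface FreshnessHeaders {
  cacheControl?: string; // value of Cache-Control, if present
  expires?: string;      // value of Expires, if present
}

// Freshness lifetime in seconds, counted from the moment the response was
// received. max-age wins whenever it is present; Expires is only a fallback.
function lifetimeSeconds(headers: FreshnessHeaders, receivedAtMs: number): number {
  // Matches max-age at the start or after a comma; it does not match s-maxage.
  const match = /(?:^|,)\s*max-age=(\d+)/i.exec(headers.cacheControl ?? "");
  if (match !== null) return parseInt(match[1], 10);

  if (headers.expires === undefined) return 0;
  const expiresAtMs = Date.parse(headers.expires);
  if (isNaN(expiresAtMs)) return 0;
  return Math.max(0, Math.floor((expiresAtMs - receivedAtMs) / 1000));
}

const receivedAtMs = Date.UTC(2021, 9, 13, 22, 0, 0); // 22:00:00 GMT
const expires = "Wed, 13 Oct 2021 22:15:05 GMT";

// Both headers present: max-age decides.
console.log(lifetimeSeconds({ cacheControl: "max-age=60", expires }, receivedAtMs)); // 60

// Only Expires present: 15 minutes and 5 seconds remain.
console.log(lifetimeSeconds({ expires }, receivedAtMs)); // 905
```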

2) Negotiated cache

The negotiated cache relies on communication between the browser and the server. On the first fetch, the browser stores a validator from the response headers, Last-Modified or ETag, along with the resource. When the strong cache misses, the browser sends that value back to the server as the marker for deciding whether the cached copy is still valid. If the server determines that the resource has changed, it returns the new resource (status 200) and the browser updates the stored validator. If the resource has not changed, the server returns status code 304 and the browser reuses its cached copy.

> Last-Modified and If-Modified-Since

Last-Modified is a timestamp that the server returns in the HTTP response headers to indicate when the resource was last updated, for example Last-Modified: Wed, 13 Jan 2021 15:34:55 GMT. On later requests for the resource, the browser sends that value in the If-Modified-Since request header so that the server can check whether the resource has changed since then, for example If-Modified-Since: Wed, 13 Jan 2021 15:34:55 GMT.

Last-Modified has some drawbacks:

i. False miss: if a file is rewritten on the server without its content actually changing, its modification timestamp still changes. The server judges by the timestamp, so the resource is downloaded again in full even though nothing changed.
ii. False hit: If-Modified-Since has one-second resolution, so a change made within the same second goes undetected. Some cache updates are therefore delayed.
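
Both drawbacks show up in a small server-side sketch of the If-Modified-Since check. statusForIfModifiedSince is a hypothetical name, and the file's modification time is passed in as a number rather than read from disk.

```typescript
// Server-side check for If-Modified-Since. HTTP dates carry whole seconds,
// so the file's modification time is truncated to seconds before comparing.
function statusForIfModifiedSince(ifModifiedSince: string | undefined, fileMtimeMs: number): 200 | 304 {
  if (ifModifiedSince === undefined) return 200; // first visit, nothing to validate
  const sinceMs = Date.parse(ifModifiedSince);
  if (isNaN(sinceMs)) return 200;                // unusable header, send the full resource
  const mtimeWholeSecondsMs = Math.floor(fileMtimeMs / 1000) * 1000;
  return mtimeWholeSecondsMs > sinceMs ? 200 : 304;
}

const lastModified = "Wed, 13 Jan 2021 15:34:55 GMT";
const cachedMtimeMs = Date.UTC(2021, 0, 13, 15, 34, 55, 100); // mtime when the browser cached the file

// File untouched: the cached copy is reused.
console.log(statusForIfModifiedSince(lastModified, cachedMtimeMs)); // 304

// File edited 400 ms later, within the same second: the change goes unnoticed.
console.log(statusForIfModifiedSince(lastModified, cachedMtimeMs + 400)); // 304

// File rewritten a minute later with identical content: full download anyway.
console.log(statusForIfModifiedSince(lastModified, cachedMtimeMs + 60000)); // 200
```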

> ETag and If-None-Match

ETag is a newer validator for the negotiated cache that makes up for the shortcomings of Last-Modified. It is an identifier for the resource that the server returns in the HTTP response headers, for example ETag: W/"2a3b-1602480f459", and it is typically generated from the resource's content. As long as the content stays the same, the ETag value does not change, no matter how many times the file is rewritten. The next time the browser requests the resource, it sends the same value in the If-None-Match request header: If-None-Match: W/"2a3b-1602480f459".

> ETag detects file changes more accurately than Last-Modified and takes priority over it, but generating it costs some server performance, so ETag can be used alongside Last-Modified as a supplementary validator. When ETag and Last-Modified are both present, ETag prevails.
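
A minimal sketch of content-based validation. The tag format (content length plus a hash, both in hex) and the names fnv1a, makeEtag, stripWeak, and statusForIfNoneMatch are assumptions made for illustration; real servers build their tags in their own ways. The hash is an FNV-1a-style function over UTF-16 code units, chosen only so that the example has no dependencies.

```typescript
// Small non-cryptographic 32-bit hash (FNV-1a style, over UTF-16 code units).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Hypothetical tag format: a weak validator derived from the content alone.
function makeEtag(content: string): string {
  return `W/"${content.length.toString(16)}-${fnv1a(content).toString(16)}"`;
}

// Weak comparison ignores the W/ prefix on both sides.
function stripWeak(tag: string): string {
  return tag.trim().replace(/^W\//, "");
}

// Server-side check for If-None-Match. The header may list several tags
// separated by commas (this sketch assumes the tags themselves contain none).
function statusForIfNoneMatch(ifNoneMatch: string | undefined, currentContent: string): 200 | 304 {
  if (ifNoneMatch === undefined) return 200;
  const current = stripWeak(makeEtag(currentContent));
  const candidates = ifNoneMatch.split(",").map((t) => stripWeak(t));
  return candidates.indexOf(current) !== -1 ? 304 : 200;
}

const tag = makeEtag("body { color: red }");

// Same content, even if the file was rewritten in the meantime: reuse the cache.
console.log(statusForIfNoneMatch(tag, "body { color: red }"));  // 304

// Content changed: send the new version.
console.log(statusForIfNoneMatch(tag, "body { color: blue }")); // 200
```

Hashing the content on every request is where the extra server cost mentioned above comes from, which is why servers usually cache the tag or derive it from cheaper metadata.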

Push Cache (lowest priority)

Push Cache is the HTTP/2 cache that exists during the server push phase:

Push Cache is the last line of defense for caching. The browser consults Push Cache only when the Service Worker Cache, Memory Cache, and HTTP Cache all miss.
Push Cache lives only for the duration of the session and is released when the session ends.
Different pages can share the same Push Cache as long as they share the same HTTP/2 connection.
