Front-end interview Prep 06: Browser caching

1. Browser caching process

The browser and the server communicate in a request-response model: the browser makes a request and the server returns a result based on the parameters in that request. Caching reduces data transfer and improves access efficiency. When the browser first receives a response from the server, it decides whether to cache the result based on the cache-related headers in the HTTP response. The specific process is as follows:

Each time the browser initiates a request, it first looks in the browser cache for the result of that request and its cache identifier.
Each time the browser receives a response, it stores the result and its cache identifier in the browser cache.

These two points ensure that every request is both written to and read from the cache. Once you understand the browser's caching rules, the rest follows. Depending on whether an HTTP request needs to be sent to the server again, the caching process can be divided into two parts: strong caching and negotiated caching.

2. Browser cache mechanism

Browser cache policies are divided into strong cache and negotiated cache. When a resource is loaded, the process is as follows:

The browser first checks the Expires and Cache-Control headers of the cached response to see whether the strong cache is hit. If it is, the resource is read directly from the cache and no request is sent to the server.
If the strong cache is not hit, the browser sends a request to the server, which uses Last-Modified and ETag to check whether the negotiated cache is hit. If it is, the server responds without the resource body, and the browser still reads the resource from the cache.
If the negotiated cache is not hit either, the resource is loaded from the server.

The similarity between the two caches is that if a hit is made, the resource is loaded from the client, not from the server. The difference is that the strong cache does not send a request to the server, whereas the negotiated cache sends a request to the server.

The parameters of the response header are explained in detail:

2.1 Strong Cache

The difference between Expires and Cache-Control is that the former comes from HTTP/1.0 and the latter from HTTP/1.1. When both are present, Cache-Control takes precedence.

2.1.1 Expires

Expires is a response header that states when a resource expires. Its value is an absolute time returned by the server.

Expires: Fri, 11 May 2018 07:20:00 GMT

2.1.2 Cache-Control

The Cache-Control header is optional and can be used in both requests and responses.

Cache-Control is the main caching mechanism in HTTP/1.1. Its common directives are:

public: The response may be cached by any cache, both the client and proxy servers.
private: The response may be cached only by the client, not by shared caches such as proxy servers.
no-cache: The resource is still cached, but the cache MUST validate it with the server before each use. In other words, it always goes through the negotiated cache.
no-store: Nothing is cached, so neither the strong cache nor the negotiated cache is used.
max-age=xxx (xxx is a number): The cached content expires after xxx seconds. While the age of the cached resource is less than this value, the client uses it directly from the cache. With max-age=0 the cached copy is stale immediately, so the browser normally has to go back to the server.
min-fresh=60 (in seconds): A request directive asking the cache to return only a cached resource that will stay fresh for at least the specified time. For example, only resources that will not expire within the next 60 seconds may be returned.
s-maxage (in seconds): Same as max-age, but it applies only to shared caches such as proxy servers and CDNs. For example, with s-maxage=60 the CDN keeps serving its cached copy for 60 seconds, even if the content at the origin has been updated in the meantime. max-age is for ordinary caches, while s-maxage is for proxy caches. In a shared cache s-maxage has the higher priority: if it is present, it overrides max-age and the Expires header.
must-revalidate: If max-age is set, the cache is used while the age of the resource is less than max-age. Once the resource has expired, it must be revalidated with the server before it is used.
max-stale=3600 (in seconds): A request directive indicating that the client accepts a cached resource even if it has expired, as long as it has been stale for no longer than the specified time.
only-if-cached: A request directive indicating that the client wants the resource only if the cache already holds it. In other words, the cache must not reload the response from the origin server or revalidate the resource.
proxy-revalidate: Like must-revalidate, but it applies only to shared caches: a proxy cache must revalidate an expired resource with the origin server before returning it.
no-transform: Intermediaries must not transform the entity body, for example by changing its media type. It can be used in both requests and responses.

2.1.3 Flow diagram of strong cache
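
As a rough textual stand-in for the flow, the strong-cache decision can be sketched in TypeScript. This is a minimal sketch, not how any particular browser implements it: the CachedResponse shape and the function names are hypothetical, and only max-age, s-maxage, no-cache, no-store and Expires are considered.

```typescript
// Hypothetical shape of a stored response; real browser caches keep far more state.
interface CachedResponse {
  storedAt: number; // ms since epoch when the response was stored
  headers: { [name: string]: string | undefined }; // lower-cased header names
}

// Parse "public, max-age=60" into { public: "", "max-age": "60" }.
function parseCacheControl(value: string): { [name: string]: string | undefined } {
  const directives: { [name: string]: string | undefined } = {};
  for (const part of value.split(",")) {
    const trimmed = part.trim();
    if (trimmed === "") continue;
    const eq = trimmed.indexOf("=");
    const name = (eq === -1 ? trimmed : trimmed.slice(0, eq)).trim().toLowerCase();
    const arg = eq === -1 ? "" : trimmed.slice(eq + 1).trim().replace(/^"|"$/g, "");
    directives[name] = arg;
  }
  return directives;
}

// Returns true when the entry may be used without contacting the server.
function isStrongCacheHit(entry: CachedResponse, now: number, sharedCache = false): boolean {
  const cc = parseCacheControl(entry.headers["cache-control"] ?? "");
  if ("no-store" in cc || "no-cache" in cc) return false;

  // s-maxage is honoured by shared caches only, where it overrides max-age and Expires.
  const seconds = sharedCache && "s-maxage" in cc ? cc["s-maxage"] : cc["max-age"];
  if (seconds !== undefined) {
    const lifetimeMs = Number(seconds) * 1000;
    return Number.isFinite(lifetimeMs) && now - entry.storedAt < lifetimeMs;
  }

  // Without max-age, fall back to the absolute Expires time.
  const expires = entry.headers["expires"];
  if (expires !== undefined) {
    const expiresAt = Date.parse(expires);
    return !Number.isNaN(expiresAt) && now < expiresAt;
  }
  return false; // no explicit freshness information
}
```

Note that max-age is checked before Expires, which mirrors the precedence of Cache-Control over Expires described in 2.1.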

2.2 Negotiated Cache

When the strong cache is not hit, the browser sends a request to the server to check whether the negotiated cache is hit. If it is, the response has HTTP status 304 with the reason phrase Not Modified.

The negotiated cache is managed with two pairs of headers: Last-Modified / If-Modified-Since and ETag / If-None-Match.

2.2.1 Last-Modified, If-Modified-Since

Last-Modified is the time the file was last modified on the server. On the next request the browser adds If-Modified-Since to the request headers, asking the server whether the resource has been updated since that time. If it has, the server sends the new resource back. The drawback is that simply opening the file locally can change its Last-Modified time even though the content is the same. This is why HTTP/1.1 introduced ETag.

2.2.2 ETag, If-None-Match, If-Match

ETag is a hash value generated by the server for a specific resource and used as a unique identifier for that resource. A change to the resource changes its ETag, regardless of the last modification time.

The If-None-Match header sends the ETag from the previous response back to the server, asking whether the resource's ETag has changed. If it has, the server sends the new resource back.

If-Match tells the server to match the entity tag of the resource, and weak ETag values cannot be used here. The server executes the request only when the If-Match field value matches the ETag of the resource. Otherwise it returns 412 Precondition Failed. Using "*" as the value matches any ETag, so the ETag check is effectively skipped.

ETag has a higher priority than Last-Modified for several reasons:

Some files are rewritten periodically, so that only the modification time changes while the content stays the same. In that case the client should still be able to read from the cache.
Last-Modified / If-Modified-Since can only detect changes at one-second granularity. If a file is modified N times within 1s, those changes cannot be told apart.
Some servers do not know exactly when a file was last modified.
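
The priority of ETag over Last-Modified can be illustrated from the server's side. The following is a minimal sketch under assumptions: ResourceState and conditionalGetStatus are hypothetical names, header names are assumed to be lower-cased, and splitting If-None-Match on commas is a simplification.

```typescript
// Hypothetical description of the current resource on the server.
interface ResourceState {
  etag?: string; // e.g. '"v2"'
  lastModified?: number; // ms since epoch
}

// Status a server would send for a conditional GET:
// 304 if the client's copy is still valid, otherwise 200 with a new body.
function conditionalGetStatus(
  requestHeaders: { [name: string]: string | undefined },
  resource: ResourceState
): 200 | 304 {
  const ifNoneMatch = requestHeaders["if-none-match"];
  if (ifNoneMatch !== undefined && resource.etag !== undefined) {
    // ETag is checked first; when it can be compared, Last-Modified is not consulted.
    if (ifNoneMatch.trim() === "*") return 304;
    // If-None-Match uses weak comparison, so the W/ prefix is ignored.
    const normalize = (tag: string) => tag.trim().replace(/^W\//, "");
    const current = normalize(resource.etag);
    const sent = ifNoneMatch.split(",").map(normalize);
    return sent.indexOf(current) !== -1 ? 304 : 200;
  }

  const ifModifiedSince = requestHeaders["if-modified-since"];
  if (ifModifiedSince !== undefined && resource.lastModified !== undefined) {
    const since = Date.parse(ifModifiedSince);
    // HTTP dates have one-second resolution, so compare whole seconds.
    const modifiedSec = Math.floor(resource.lastModified / 1000);
    if (!Number.isNaN(since) && modifiedSec <= Math.floor(since / 1000)) return 304;
  }
  return 200;
}
```

The whole-second comparison in the fallback branch also shows the granularity problem from the list above: a change made within the same second as the If-Modified-Since time still yields 304.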

2.2.3 Strong ETag and Weak ETag

Strong ETag: The value changes whenever the entity changes, no matter how slight the change.
Weak ETag: Only indicates whether the resources are equivalent. The value changes only when the resource changes in a fundamental way. A weak ETag is marked by the prefix W/ at the beginning of the field value.
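
The two comparison modes can be sketched as follows. The function names are hypothetical, and the tags are assumed to be passed exactly as they appear in the header, including quotes.

```typescript
// An entity tag is weak when it carries the W/ prefix.
function isWeakTag(etag: string): boolean {
  return etag.slice(0, 2) === "W/";
}

// Strip the W/ prefix, leaving the quoted opaque value.
function opaqueTag(etag: string): string {
  return isWeakTag(etag) ? etag.slice(2) : etag;
}

// Strong comparison: both tags must be strong and identical (used by If-Match).
function strongEquals(a: string, b: string): boolean {
  return !isWeakTag(a) && !isWeakTag(b) && a === b;
}

// Weak comparison: the opaque values must match, weakness is ignored (used by If-None-Match).
function weakEquals(a: string, b: string): boolean {
  return opaqueTag(a) === opaqueTag(b);
}
```

Under strong comparison, two weak tags never match each other, even when their values are identical.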

2.3 Browser Status Code

200: The strong cache (Expires/Cache-Control) has expired, so the server returns the new resource file.
200 (from disk cache / from memory cache): The strong cache headers (Expires/Cache-Control) are present and not expired, with Cache-Control taking priority over Expires. The browser obtains the resource from the local cache.
304 (Not Modified): The negotiated cache (Last-Modified/ETag) is still valid, so the server returns status code 304.

2.4 Heuristic caching

Heuristic caching applies when the response carries no caching fields, that is, when no caching policy has been set.
In that case the browser typically takes 10% of the Date value minus the Last-Modified value in the response headers as the cache time.
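
The 10% rule can be written out as a small calculation. This is a sketch only: the function name is hypothetical, and the fraction actually used is up to each browser.

```typescript
// Heuristic freshness lifetime in seconds: a fraction (10% by default) of the
// interval between Last-Modified and Date. Returns 0 if either header is unusable.
function heuristicLifetimeSeconds(
  dateHeader: string,
  lastModifiedHeader: string,
  fraction = 0.1
): number {
  const date = Date.parse(dateHeader);
  const lastModified = Date.parse(lastModifiedHeader);
  if (Number.isNaN(date) || Number.isNaN(lastModified) || lastModified > date) return 0;
  return Math.floor(((date - lastModified) / 1000) * fraction);
}
```

For example, a resource last modified 10 days before the response date would be treated as fresh for about one day.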

2.5 Actual Scenarios

The general order is as follows:

Cache-Control (checked before requesting the server)
Expires (checked before requesting the server)
If-None-Match with ETag (sent with the request to the server)
If-Modified-Since with Last-Modified (sent with the request to the server)

Negotiated caching needs to be used in conjunction with strong caching. If strong caching is not enabled, negotiated caching is meaningless

Most web servers enable negotiated caching by default, with both Last-Modified / If-Modified-Since and ETag / If-None-Match turned on.

But the following scenarios need to be noted:

In a distributed system, the Last-Modified time of a file must be consistent across machines. Otherwise, when load balancing sends requests to different machines, the comparison fails. Distributed systems should turn off ETags where possible, because each machine generates ETags differently.

2.6 Summary

The overall cache request flow is as follows:
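
In place of a diagram, the three outcomes can be sketched as one function. This is a deliberately simplified, hypothetical model: freshness is reduced to a single timestamp, the validator is a single ETag string, and the server is represented by a plain object.

```typescript
// Simplified model of a cached entry and of the current state on the server.
interface CacheEntry { body: string; etag: string; freshUntil: number; }
interface Origin { body: string; etag: string; }
interface LoadResult {
  status: 200 | 304;
  source: "cache" | "network";
  body: string;
  requestSent: boolean;
}

function loadResource(entry: CacheEntry | undefined, origin: Origin, now: number): LoadResult {
  // 1. Strong cache: still fresh, so no request is sent at all.
  if (entry !== undefined && now < entry.freshUntil) {
    return { status: 200, source: "cache", body: entry.body, requestSent: false };
  }
  // 2. Negotiated cache: the validator still matches, so the server answers 304
  //    and the stored body is reused.
  if (entry !== undefined && entry.etag === origin.etag) {
    return { status: 304, source: "cache", body: entry.body, requestSent: true };
  }
  // 3. No usable cache: the server returns the full resource.
  return { status: 200, source: "network", body: origin.body, requestSent: true };
}
```

Cases 1 and 2 both read the body from the cache. The difference is whether a request reaches the server.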

3. Position of the browser cache

Section 2.3 mentioned that a response with status code 200 can be read from different cache locations. This section sorts out the related points.

From the point of view of location, there are four kinds of cache, each with its own priority. They are searched in order, and the network is requested only when none of them is hit. They are:

Service Worker
Memory Cache
Disk Cache
Push Cache

3.1 Service Worker

Service Workers essentially act as a proxy server between the Web application and the browser, and can also act as a proxy between the browser and the network when the network is available. They are designed, among other things, to enable the creation of an effective offline experience, intercept network requests and take appropriate action based on whether the network is available and whether updated resources reside on the server. They also allow access to push notifications and background synchronization apis. — MDN DOCS

A detailed introduction to the Service Worker, covering its use and principles, deserves a separate article. Only a few points are mentioned here.

Service Workers must be served over HTTPS, because they can intercept requests and the transport therefore has to be secure. A Service Worker gives free control over which files to cache, how to match the cache and how to read it, and its cache is persistent.

Implementing caching with a Service Worker can be roughly divided into three steps:

Register the Service Worker.
Listen for the install event.
Cache the required files. On later visits, intercept each request and check whether a cached copy exists. If it does, read the cached file directly. Otherwise, request the data.

When the Service Worker does not hit its cache, we need to call the fetch function to get the data. That is, if the Service Worker finds no cached copy, the data is looked up according to the browser's cache lookup priority. However, no matter where the data ultimately comes from, the browser shows it as fetched from the Service Worker.
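
The cache-then-fetch step can be sketched outside the browser. The ResponseStore and Fetcher types below are hypothetical stand-ins for the Cache Storage API and fetch, so that the strategy can run anywhere. Response bodies are plain strings to keep the example short.

```typescript
// Minimal stand-ins for the browser APIs. In a real Service Worker these roles
// are played by a Cache object (from caches.open) and the global fetch function.
interface ResponseStore {
  match(url: string): Promise<string | undefined>;
  put(url: string, body: string): Promise<void>;
}
type Fetcher = (url: string) => Promise<string>;

interface CacheFirstResult { body: string; fromCache: boolean; }

// Cache-first: answer from the store when possible, otherwise fetch and keep a copy.
async function cacheFirst(url: string, store: ResponseStore, fetcher: Fetcher): Promise<CacheFirstResult> {
  const cached = await store.match(url);
  if (cached !== undefined) return { body: cached, fromCache: true };
  const body = await fetcher(url);
  await store.put(url, body);
  return { body, fromCache: false };
}

// In-memory store used only for this sketch.
function createMemoryStore(): ResponseStore {
  const entries = new Map<string, string>();
  return {
    match: (url) => Promise.resolve(entries.get(url)),
    put: (url, body) => {
      entries.set(url, body);
      return Promise.resolve();
    },
  };
}

// In an actual worker script the same logic is wired to events, roughly:
//   page:   navigator.serviceWorker.register("/sw.js")
//   worker: self.addEventListener("install", (e) => e.waitUntil(/* cache.addAll([...]) */))
//           self.addEventListener("fetch", (e) => e.respondWith(/* match, else fetch */))
```

The first call for a URL goes to the fetcher and stores the result. Later calls for the same URL are answered from the store.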

3.2 Memory Cache

Memory Cache is the cache held in memory. It mainly contains the resources fetched for the current page, such as CSS, JS, and images that the page has already downloaded. The memory cache is fast to read but short-lived: it is released when the process is released. Once we close the tab, the cache in memory is freed.

An important class of resources in the memory cache is those downloaded by preload-related directives. This is a common method of page optimization, where the browser can parse JS/CSS files while requesting the next resource.

It is important to note that the memory cache does not care about the Cache-Control header of the returned resource, and resources are matched not only by URL but also by other characteristics such as Content-Type and CORS.

3.3 Disk Cache

Disk Cache is the cache stored on the hard disk. It is slower to read, but it has a larger capacity and keeps entries longer than the Memory Cache.

This cache is the most widely used of all caches and is the primary storage location for the cache policy described in Section 2. Which files go into the Memory Cache and which go into the Disk Cache depends on the browser's policy. The general logic may be:

Large files are most likely not stored in memory, while small files are preferentially stored in memory.
When memory usage is high, files are preferentially stored on the hard disk.

3.4 Push Cache

Push Cache belongs to HTTP/2 and is used only when the three caches above all miss. It exists only within the session and is released once the session ends. It is kept for a short time (about 5 minutes in Chrome) and does not strictly follow the cache directives in the HTTP headers.

Here are a few general conclusions for understanding (source):

All resources can be pushed and cached, but Edge and Safari support is relatively poor
You can push no-cache and no-store resources
Once the connection is closed, the Push Cache is released
Multiple pages can use the same HTTP/2 connection and thus use the same Push Cache. This depends on the browser implementation. For performance reasons, some browsers may use the same HTTP connection for different tabs with the same domain name.
An entry in the Push Cache can only be used once
The browser can refuse a push for a resource it already has
You can push resources to other domains

4. Actual scenarios and user behaviors

Frequently changing resources: Cache-Control: no-cache
Resources that rarely change: Cache-Control: max-age=31536000
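
One way to apply these two rules is to choose the header from the file name. This sketch rests on an assumption that is not part of the text above: that the build output embeds a content hash in the names of static assets (for example app.3f9a1c2b.js), so a changed file always gets a new URL. The function name and the pattern are hypothetical.

```typescript
// Assumption: hashed static assets look like name.<8+ hex chars>.<extension>.
const HASHED_ASSET = /\.[0-9a-f]{8,}\.(js|css|png|jpg|svg|woff2)$/i;

function cacheControlFor(path: string): string {
  if (HASHED_ASSET.test(path)) {
    return "max-age=31536000"; // rarely changes: cache for one year
  }
  return "no-cache"; // changes often: store, but revalidate on every use
}
```

With this split, the HTML entry point is revalidated on every visit, while the assets it references are served from the strong cache until their names change.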

There are three main types of user behavior:

Open the web page: The browser checks whether there is a match in the disk cache. If there is, it is used. If not, a network request is sent.
Normal refresh: Since the tab is not closed, the memory cache is available and is used first (if it matches). The disk cache comes next.
Forced refresh: The browser does not use the cache, so requests are sent with Cache-Control: no-cache in the header (plus Pragma: no-cache for compatibility), and the server returns 200 and the latest content.
