This is the 28th day of my participation in the August Text Challenge.

Introduction

To improve the speed and efficiency of website access, we design various caches. A cache avoids unnecessary data transfers and requests, which speeds up website requests. The HTTP protocol comes with its own caching mechanism.

Today we’ll take a deeper look at caching mechanisms and their use in HTTP.

Types of cache in HTTP

A cache saves a copy of the requested resource locally. The next time the resource is requested, the copy is returned to the client directly, eliminating the need to download it from the server again. This reduces resource transfer and improves efficiency.

HTTP caches can be divided into two types. One is the shared cache: its stored resources are accessible to multiple clients. The other is the private cache, which can only be accessed by a single user or client and is not accessible to other users.

It is easy to understand that the caches in common browsers are basically private caches. These caches are unique to the browser and are not shared with other browsers.

Shared caches are mainly used on web proxies, such as web proxy servers. A web proxy server may serve resources to many users, so there is no need to save a separate copy for each user. A single copy kept on the proxy server is enough, which avoids redundant copies of the resource.

The status of the cached response in HTTP

For HTTP caches, GET requests are typically cached, because GET requests have no additional parameters other than the URI and are meant to fetch resources from the server.

Different GET requests return different status codes.

If the resource is returned successfully, 200 (OK) is returned.

If the resource has been permanently redirected, 301 is returned. If the resource is not found, 404 is returned. If only part of the resource is returned, 206 is returned.

Cache control in HTTP

Cache control in HTTP is expressed through HTTP headers. The Cache-Control header was added in HTTP/1.1 to control the caching of requests and responses.

If caching is not required, use:

Cache-Control: no-store

If the cache may store the response but must validate it with the server before each reuse, use:

Cache-Control: no-cache

If you want to force validation once the cached resource has expired, you can use:

Cache-Control: must-revalidate

In this case, an expired resource must not be used until it has been revalidated with the server.

For the server, Cache-Control can be used to control whether the cache is private or public:

Cache-Control: private
Cache-Control: public

Another very important cache control is expiration time:

Cache-Control: max-age=31536000

Setting max-age takes precedence over the Expires header. It states for how many seconds the resource can be considered up to date, during which it does not need to be fetched from the server again.

Cache-Control is a header field defined in HTTP/1.1. HTTP/1.0 has a similar field called Pragma. Setting Pragma: no-cache has an effect similar to Cache-Control: no-cache, that is, it forces the cache to submit the cached response to the server for validation.

However, Pragma is not specified for server-side responses, so it is not a complete replacement for Cache-Control.
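
The Cache-Control directives described in this section can be read programmatically. The following is a minimal sketch, not a complete parser: the function names `parse_cache_control` and `may_store` are hypothetical, and splitting on commas is a simplification that ignores quoted arguments containing commas.

```python
from typing import Dict, Optional


def parse_cache_control(header_value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into a {directive: argument} mapping."""
    directives: Dict[str, Optional[str]] = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        # Directive names are case-insensitive; directives without an
        # argument (no-store, public, ...) map to None.
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def may_store(directives: Dict[str, Optional[str]]) -> bool:
    """A response marked no-store must not be written to any cache."""
    return "no-store" not in directives
```

For example, `parse_cache_control("public, max-age=31536000")` yields a mapping with the keys `public` and `max-age`, and `may_store` returns False for a header value of `no-store`.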

Cache refresh

After the cache is stored on the client, it can be used on request. However, to be safe, we need to set an expiration time for the cache. The cache is valid only before the expiration time. If the expiration time is exceeded, we need to retrieve it from the server.

This mechanism keeps the resources obtained by the client reasonably up to date and lets resource updates on the server reach the client in a timely manner.

If the resource on the client has not yet expired, the state of the resource is fresh; otherwise, the state of the resource is stale.

When a resource becomes stale, the cache does not immediately clear it. Instead, on the next request it sends a request with an If-None-Match header to the server to determine whether the resource is still fresh on the server. If the resource has not changed, the server returns 304 (Not Modified), indicating that the cached resource is still valid.

The duration of the fresh state is determined by “Cache-Control: max-age=N”.

If that directive is not present in the response, the Expires header is checked, and if it exists, the fresh time is the Expires value minus the Date value.

If you don’t even have an Expires header in the response, how can you determine the fresh time of a resource?

In this case, the Last-Modified header is looked up, and if it exists, the fresh time is estimated as (Date - Last-Modified) / 10.
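
The three-step lookup above can be sketched in Python. This is an illustration under assumptions rather than a full implementation: the function name `freshness_lifetime` is hypothetical, header names are assumed to be lowercased already, and malformed dates are not handled.

```python
import re
from datetime import timedelta
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def freshness_lifetime(headers: Mapping[str, str]) -> Optional[timedelta]:
    """Return how long a response stays fresh, or None if it cannot be determined."""
    # 1. Cache-Control: max-age=N has the highest priority.
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)",
                      headers.get("cache-control", ""), re.IGNORECASE)
    if match:
        return timedelta(seconds=int(match.group(1)))

    if "date" not in headers:
        return None
    date = parsedate_to_datetime(headers["date"])

    # 2. Otherwise fall back to Expires minus Date.
    if "expires" in headers:
        return parsedate_to_datetime(headers["expires"]) - date

    # 3. Otherwise use the heuristic (Date - Last-Modified) / 10.
    if "last-modified" in headers:
        return (date - parsedate_to_datetime(headers["last-modified"])) / 10

    return None
```

With a Date of `Wed, 21 Oct 2015 07:28:00 GMT` and a Last-Modified ten hours earlier, the heuristic gives a freshness lifetime of one hour.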

Revving

In order to improve the efficiency of HTTP requests, we certainly want to cache as long as possible, but as mentioned earlier, too long caching times can make it difficult to update server resources. How do you solve it?

For files that are not frequently updated, the URL to request them can be determined by filename + version number. The same version number means that the resource content is fixed and can be cached for a very long time.

When the content of the server resource changes, you simply update the version number in the requested URL.

Although this means changing both the server resource and the URL requested by the client, modern front-end packaging tools automate it, so this is not a major problem.
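
As an illustration of revving, the sketch below derives the version part of the filename from a hash of the file content instead of a hand-maintained version number. That substitution is an assumption made for this example, as is the helper name `revved_name`. The property described above still holds: the same content always produces the same URL, and any change produces a new one.

```python
import hashlib
from pathlib import PurePosixPath


def revved_name(filename: str, content: bytes, digest_length: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_length]
    path = PurePosixPath(filename)
    # "static/app.js" becomes "static/app.<digest>.js"
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))
```

A file served under such a name can be sent with a long max-age, because a changed file gets a different name and therefore a different URL.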

Cache invalidation

When the cached resources expire, you can request the resources from the server again or verify the cached resources again.

Of course, revalidation requires server support; the “Cache-Control: must-revalidate” header forces it for expired resources.

How does a client verify that a resource is valid? Obviously, we can’t send resources from client to server for validation, which is too complicated and wastes resources in case of large file requests.

An easy way to do this is to hash resource files and simply send the results of the hash for comparison.

HTTP provides the ETag header for this purpose. It serves as a unique tag for a version of a resource and can be validated on both the client and server sides. The client can send the tag in an If-None-Match request header and let the server determine whether the resource matches. This is called strong validation.

The alternative is weak validation: if Last-Modified is included in the response, the client can send an If-Modified-Since request header to ask the server whether the file has changed.

The server can choose whether or not to validate the resource. If it does not, it directly returns a 200 OK status code together with the resource. If it does and the resource is unchanged, it returns 304 Not Modified, indicating that the client can continue to use the cached resource. The response may also carry other header fields, such as an updated expiration time for the cache.
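
The server-side decision between 200 and 304 can be sketched as follows. This is a simplified illustration, not a real server: `make_etag` and `respond` are hypothetical names, the ETag is assumed to be a truncated SHA-256 hash of the body, and the max-age value of 60 is arbitrary.

```python
import hashlib
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted entity tag from the response body (an assumed scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET that may carry If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}

    candidates = []
    for tag in request_headers.get("If-None-Match", "").split(","):
        tag = tag.strip()
        # If-None-Match compares weakly, so ignore a leading W/ marker.
        candidates.append(tag[2:] if tag.startswith("W/") else tag)

    if etag in candidates or "*" in candidates:
        # Unchanged: no body is sent and the client keeps using its cached copy.
        return 304, headers, b""
    return 200, headers, body
```

A first request without If-None-Match gets 200 and the body. Repeating the request with the returned ETag gets 304 and an empty payload for as long as the body is unchanged.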

Vary response

When the server responds, it can include the Vary header. The Vary header value lists request header names, such as Accept-Encoding, which tells the cache that the stored response applies only to requests carrying the same values for those headers.

For example, the client first requests:

GET /resource HTTP/1.1
Accept-Encoding: *

The server side returns:

HTTP/1.1 200 OK
Content-Encoding: gzip
Vary: Accept-Encoding

The resource is cached in gzip encoding, keyed by the Accept-Encoding value of the request.

When the client requests again:

GET /resource HTTP/1.1
Accept-Encoding: br

Because the Accept-Encoding value differs from the one that produced the cached gzip response, the cached copy cannot be reused and the resource has to be fetched from the server again:

HTTP/1.1 200 OK
Content-Encoding: br
Vary: Accept-Encoding

At this point, the client caches another copy of the resource in br format.

The next time the client requests the resource with Accept-Encoding: br, the cache is hit.

To summarize, Vary tells caches to store and match responses separately according to additional request properties such as the accepted encoding.
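
One way to picture this is a cache key that includes the request headers named by Vary. The sketch below is an assumption-laden illustration: the function name `cache_key` is hypothetical, the Vary value is assumed to name request headers such as Accept-Encoding, and real caches typically store variants per URL rather than building one flat key.

```python
from typing import Dict, Tuple


def cache_key(method: str, url: str, request_headers: Dict[str, str], vary: str) -> Tuple:
    """Build a lookup key from the URL plus the request headers named in Vary."""
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
    # A header that is absent from the request is represented by an empty string.
    selected = tuple((name, lowered.get(name, "")) for name in names)
    return (method.upper(), url, selected)
```

Two requests for the same URL with `Accept-Encoding: *` and `Accept-Encoding: br` produce different keys, so the second one misses the cache; a later request with `Accept-Encoding: br` produces the same key as the second and hits.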

However, this can cause duplicate storage: the same resource is cached in multiple copies because of different encodings. To solve this problem, requests need to be normalized.

Normalizing means checking the encodings accepted by a request before it is sent on and selecting only one of them, which avoids multiple cached copies.
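
A minimal sketch of such normalization is shown below. The function name `normalize_accept_encoding` and the preference order (br first, then gzip) are assumptions for illustration; apart from treating q=0 as a rejection, quality values are ignored.

```python
from typing import Sequence


def normalize_accept_encoding(value: str, preferred: Sequence[str] = ("br", "gzip")) -> str:
    """Collapse an Accept-Encoding value to a single canonical encoding."""
    offered = set()
    rejected = set()
    for item in value.split(","):
        token, _, params = item.strip().partition(";")
        token = token.strip().lower()
        params = params.replace(" ", "").lower()
        if not token:
            continue
        if params.startswith("q="):
            try:
                if float(params[2:]) == 0:
                    # q=0 means the client explicitly refuses this encoding.
                    rejected.add(token)
                    continue
            except ValueError:
                continue
        offered.add(token)

    for encoding in preferred:
        if encoding in offered or ("*" in offered and encoding not in rejected):
            return encoding
    # Nothing suitable offered: fall back to no content encoding.
    return "identity"
```

With this, `gzip, deflate, br` and `br` both normalize to `br`, so the two requests share one cached copy instead of creating two.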

Conclusion

This concludes the introduction to HTTP caching. You can deepen your understanding by applying HTTP caching in practice.

This article is available at www.flydean.com/04-http-cac…
