Web caches can be roughly divided into database caches, server-side caches (proxy server caches, CDN caches), and browser caches. Browser caches in turn cover several things: cookies, local storage, HTTP caches, and so on.

cookie

1. What is a cookie?

A cookie is a special string stored on the client. It exists in the form of key-value pairs and is exposed on the document object, so you can read and set cookie information directly through document.cookie.

2. How do cookies work?

The HTTP protocol itself is stateless, so the server cannot determine the user's identity on its own. Therefore, we often use cookies to carry the user's basic information and identity information.

When a user visits and logs in to a website for the first time, setting and sending the cookie goes through four steps: the client sends a request to the server -> the server sends the client an HTTP response containing a Set-Cookie header -> the client saves the cookie and sends it back in a Cookie header on its next HTTP request -> the server returns the relevant data.
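
The four steps above can be sketched with the two headers involved. This is only an illustration under stated assumptions: the helper name parseCookieHeader and the sessionId and theme cookies are hypothetical and do not come from any particular server framework.

```javascript
// Hypothetical helper: parse the value of a Cookie request header
// ("a=1; b=2") into a plain object of name/value pairs.
function parseCookieHeader(header) {
  const cookies = {};
  if (!header) return cookies;
  for (const part of header.split(';')) {
    const index = part.indexOf('=');
    if (index === -1) continue; // skip malformed pairs without "="
    const name = part.slice(0, index).trim();
    const value = part.slice(index + 1).trim();
    cookies[name] = decodeURIComponent(value);
  }
  return cookies;
}

// Step 2: the server's response carries a Set-Cookie header (example value).
const setCookieHeader = 'sessionId=abc123; Path=/';

// Step 3: the client stores the pair and sends it back in a Cookie header.
const cookieHeader = 'sessionId=abc123; theme=dark';

// Step 4: the server reads the cookie to identify the user.
const cookies = parseCookieHeader(cookieHeader);
console.log(setCookieHeader, '->', cookies.sessionId);
```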

3. Cookie attributes and their uses

A cookie set through document.cookie mainly has five attributes:

| Attribute | Purpose | Notes |
| --- | --- | --- |
| NAME=VALUE | The key/value pair to store | A cookie whose NAME matches an existing cookie (for the same domain and path) overwrites the previous one |
| Expires | Sets the cookie's expiration time; the client deletes the cookie once that time has passed | Expires takes an absolute date, while the related Max-Age attribute takes a number of seconds. In the Java Servlet API this value is read and written with getMaxAge() and setMaxAge(int maxAge). maxAge can be positive, negative, or 0. Positive: the cookie expires after that many seconds. Negative: the cookie is temporary and is not persisted; it is valid only in the current browser window and the child windows it opens, and becomes invalid as soon as the browser is closed. 0: the cookie is deleted immediately |
| Domain | Sets the domain that can access the cookie | |
| Path | Sets the path whose pages can access the cookie | A path of / means that all pages under the root directory can access the cookie. If omitted, the path defaults to the directory of the current page |
| Secure | The cookie is transmitted only in secure mode | If this attribute is set, the cookie is only sent over HTTPS (SSL/TLS) connections |
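
As a minimal sketch of how the attributes listed above combine into a single cookie string, the following helper builds one from a name, a value, and a few options. The function name serializeCookie and its options object are assumptions made for illustration, not a built-in API.

```javascript
// Hypothetical helper: build a cookie string from a name, a value, and
// the optional attributes (Expires, Max-Age, Domain, Path, Secure).
function serializeCookie(name, value, options = {}) {
  const parts = [`${name}=${encodeURIComponent(value)}`];
  if (options.expires instanceof Date) {
    parts.push(`Expires=${options.expires.toUTCString()}`); // absolute date
  }
  if (typeof options.maxAge === 'number') {
    parts.push(`Max-Age=${Math.floor(options.maxAge)}`); // lifetime in seconds
  }
  if (options.domain) parts.push(`Domain=${options.domain}`);
  if (options.path) parts.push(`Path=${options.path}`);
  if (options.secure) parts.push('Secure');
  return parts.join('; ');
}

const cookie = serializeCookie('theme', 'dark mode', {
  maxAge: 3600,
  path: '/',
  secure: true,
});
console.log(cookie);
```

In a browser, the resulting string could be assigned to document.cookie; on a server it could be used as the value of a Set-Cookie header.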

4. Cookie helper functions

Cookies have no built-in methods of their own; to work with a cookie you have to operate directly on document.cookie. Commonly used cookie helpers are wrapped as follows:

Write cookies

    function setCookie(name, value) {
      var days = 30; // cookie lifetime in days
      var exp = new Date();
      exp.setTime(exp.getTime() + days * 24 * 60 * 60 * 1000);
      // encodeURIComponent and toUTCString replace the deprecated
      // escape and toGMTString
      document.cookie = name + "=" + encodeURIComponent(value) +
        "; expires=" + exp.toUTCString();
    }

Read the cookies

    function getCookie(name) {
      // Match "name=value" at the start of the string or after "; "
      var reg = new RegExp("(^| )" + name + "=([^;]*)(;|$)");
      var arr = document.cookie.match(reg);
      return arr ? decodeURIComponent(arr[2]) : null;
    }

Delete the cookies

    function delCookie(name) {
      var exp = new Date();
      exp.setTime(exp.getTime() - 1); // a time in the past expires the cookie
      var cval = getCookie(name);
      if (cval !== null) {
        document.cookie = name + "=" + encodeURIComponent(cval) +
          "; expires=" + exp.toUTCString();
      }
    }

The local cache

Web Storage is a technology introduced with HTML5. It mainly consists of localStorage and sessionStorage. Details are as follows:

1. Built-in methods

localStorage and sessionStorage are two properties of the window object, and they share the same built-in methods:

    setItem(key, value): creates or modifies a stored item.
    getItem(key): reads a stored item.
    removeItem(key): deletes a stored item.
    clear(): clears all stored items.

2. Operation mechanism

Through an API provided by the browser, web applications can store data on the user's computer. Client-side storage follows the same-origin policy, so pages from different sites cannot read each other's stored data, while different pages of the same site can share it. This gives us a communication mechanism: for example, form data filled out on one page can be displayed on another. Web applications can also choose how long the data is kept. Temporary storage keeps data until the current window closes or the browser exits; permanent storage keeps data on the hard disk indefinitely.

Differences between cookie, sessionStorage, and localStorage

  • Cookie: size is limited to about 4 KB. Its main use is storing login information, such as the "remember password" option when you log in to a website. This is usually done by storing a piece of data in the cookie that identifies the user.
  • sessionStorage: maintains a separate storage area for each origin that lasts as long as the current browser window is open, including across page reloads and navigation to same-origin pages.
  • localStorage (long-term storage): works the same way as sessionStorage, but the data still exists after the browser is closed.

| Feature | cookie | localStorage | sessionStorage |
| --- | --- | --- | --- |
| Data life cycle | Generally generated by the server; an expiration time can be set | Permanent unless cleared | Valid only for the current session; cleared after the page or browser is closed |
| Storage size | About 4 KB | Generally 5 MB | Generally 5 MB |
| Communication with the server | Carried in the HTTP header on every request; storing too much data in cookies causes performance problems | Saved only on the client; not sent to the server | Saved only on the client; not sent to the server |
| Ease of use | No ready-made methods; you need to wrap your own | The native interface is usable and can be wrapped again for better support of objects and arrays | The native interface is usable and can be wrapped again for better support of objects and arrays |
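
The remark about wrapping the native interface for objects and arrays can be sketched as follows. The names MemoryStorage and createJsonStore are hypothetical; MemoryStorage is only an in-memory stand-in with the same methods as localStorage, so that the sketch also runs outside a browser.

```javascript
// In-memory stand-in exposing the same four methods as localStorage and
// sessionStorage. Values are stored as strings, as in the real API.
class MemoryStorage {
  constructor() { this.items = new Map(); }
  setItem(key, value) { this.items.set(key, String(value)); }
  getItem(key) { return this.items.has(key) ? this.items.get(key) : null; }
  removeItem(key) { this.items.delete(key); }
  clear() { this.items.clear(); }
}

// Hypothetical wrapper: serialize values as JSON on write, parse on read,
// so objects and arrays survive the string-only storage.
function createJsonStore(storage) {
  return {
    set(key, value) { storage.setItem(key, JSON.stringify(value)); },
    get(key) {
      const raw = storage.getItem(key);
      return raw === null ? null : JSON.parse(raw);
    },
    remove(key) { storage.removeItem(key); },
  };
}

// In a browser, pass window.localStorage or window.sessionStorage instead.
const store = createJsonStore(new MemoryStorage());
store.set('cart', { items: ['book', 'pen'], total: 2 });
console.log(store.get('cart').items.length);
```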

HTTP cache

Before we dive into HTTP caching, let’s clarify a few terms:

  • Cache hit ratio: the ratio of the number of requests served from the cache to the total number of requests. The higher, the better.
  • Expired content: content that is marked as "stale" once its expiration time has passed. Expired content usually cannot be used to answer client requests; the cache must either request new content from the origin server or verify that the cached content is still valid.
  • Validation: checking whether expired content in the cache is still valid. If validation passes, the expiration time is refreshed.
  • Invalidation: removing content from the cache. Cached content must be invalidated when the underlying resource changes.
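
As a small worked example of the first term, the hit ratio can be computed from two counters. The class name CacheStats is an assumption made for this sketch.

```javascript
// Hypothetical counter that tracks cache hits and misses.
class CacheStats {
  constructor() { this.hits = 0; this.misses = 0; }
  record(hit) { if (hit) this.hits += 1; else this.misses += 1; }
  // Hit ratio = requests served from the cache / total requests.
  hitRatio() {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}

const stats = new CacheStats();
[true, true, true, false].forEach((hit) => stats.record(hit));
console.log(stats.hitRatio()); // 3 hits out of 4 requests
```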

Browser caching relies primarily on the caching mechanism defined by the HTTP protocol. Caching hints can also be written in HTML, for example with a meta tag such as <meta http-equiv="Pragma" content="no-cache">.

A tag like this tells the browser not to cache the current page. However, proxy servers do not parse HTML content, so HTTP headers are what is commonly used to control caching.

Browser Cache classification

The browser cache is divided into strong cache and negotiated cache. The simple process for the browser to load a page is as follows:

  1. The browser checks the resource's HTTP headers to determine whether the strong cache is hit. If it is, the resource is loaded directly from the cache and no request is sent to the server.

  2. If the strong cache is not hit, the browser sends a request for the resource to the server, and the server determines whether the browser's local copy is still valid. If it is, the server does not return the resource body and the browser continues to load the resource from the cache.

  3. If the negotiated cache is not hit either, the server returns the full resource to the browser, which loads the new resource and updates the cache.
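
The loading process above can be sketched as a small decision function. The shape of the cache entry (storedAtMs, freshForSeconds, etag, lastModified) and both function names are assumptions for illustration; real browsers keep considerably more state.

```javascript
// Hypothetical model of a cache entry: when it was stored (ms), how long
// it stays fresh (s), and the validators it carries, if any.
function decideCacheAction(entry, nowMs) {
  if (!entry) return 'fetch'; // nothing cached: plain request
  const ageSeconds = (nowMs - entry.storedAtMs) / 1000;
  if (ageSeconds < entry.freshForSeconds) {
    return 'use-cache'; // strong cache hit: no request is sent
  }
  if (entry.etag || entry.lastModified) {
    return 'revalidate'; // negotiated cache: send a conditional request
  }
  return 'fetch';
}

// After a conditional request, the status code decides what is loaded.
function resolveRevalidation(status) {
  return status === 304 ? 'use-cache' : 'use-response';
}

const entry = { storedAtMs: 0, freshForSeconds: 60, etag: '"abc"' };
console.log(decideCacheAction(entry, 30 * 1000)); // still fresh
console.log(decideCacheAction(entry, 120 * 1000)); // stale, has a validator
```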

Strong cache

When a strong cache is hit, the browser does not send the request to the server. In Chrome developer tools, the HTTP return code is 200, but the Size column will display as (from cache).

Strong caching is controlled by the Expires or Cache-Control field in the HTTP response header, which indicates how long a resource may be cached.

Expires

Expires gives the cache expiration time of a resource as a specific point in time on the server, that is, Expires = max-age + request time. It needs to be used together with Last-Modified, and Cache-Control takes priority over it when both are present. Expires is a response header field: in answering an HTTP request, the web server tells the browser that it can use the cached data directly until the expiration time, without requesting it again.

This field holds a time, such as Expires: Thu, 31 Dec 2037 23:59:59 GMT. That is the expiration time of the resource: until 23:59:59 on December 31, 2037 the resource is valid and the cache is hit. An obvious drawback of this approach is that the expiration time is absolute, so when the client's local clock is changed or differs greatly from the server's, caching becomes unreliable. Cache-Control was introduced to address this.

Cache-Control

    <meta http-equiv="cache-control" content="no-cache,max-age,must-revalidate,no-store">
    <meta http-equiv="Pragma" content="no-cache">
    <meta http-equiv="Expires" content="0">
    <meta http-equiv="cache" content="no-cache">

Cache-Control expresses a relative time. For example, Cache-Control: max-age=3600 means the resource is valid for 3600 seconds. Because the time is relative and measured against the client's own clock, clock differences between server and client do not cause problems. The server can enable either Cache-Control or Expires, or both; when both are enabled, Cache-Control takes priority.
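
The priority rule can be illustrated with a helper that computes how long a response stays fresh. This is a simplified sketch: the function name freshnessLifetimeSeconds, the lower-case header object, and the use of the response time as the reference point for Expires are all assumptions made here.

```javascript
// Hypothetical helper: compute how many seconds a response stays fresh.
// headers is a plain object with lower-case header names.
function freshnessLifetimeSeconds(headers, responseTimeMs) {
  const cacheControl = headers['cache-control'] || '';
  const match = /(?:^|,)\s*max-age=(\d+)/i.exec(cacheControl);
  if (match) return Number(match[1]); // max-age wins when both are present
  if (headers['expires']) {
    const expiresMs = Date.parse(headers['expires']);
    if (!Number.isNaN(expiresMs)) {
      // Expires is absolute, so the lifetime depends on the clock used.
      return Math.max(0, (expiresMs - responseTimeMs) / 1000);
    }
  }
  return 0; // no explicit freshness information
}

const receivedAt = Date.parse('Thu, 31 Dec 2037 23:00:00 GMT');
const both = {
  'cache-control': 'public, max-age=3600',
  'expires': 'Thu, 31 Dec 2037 23:59:59 GMT',
};
console.log(freshnessLifetimeSeconds(both, receivedAt));
```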

Cache-Control can combine multiple directives. The values are as follows:

  1. max-age specifies how long the cache is valid, in seconds. For example, Cache-Control: max-age=31536000 means the cache expires in 31536000 / (24 * 60 * 60) = 365 days. The first time the resource is accessed, the server also returns an Expires field with an expiration time one year ahead. As long as caching is not disabled and the expiration time has not passed, revisiting the resource hits the cache: the browser takes it directly from its cache without asking the server.

  2. s-maxage works like max-age and overrides max-age and Expires, but it applies only to shared caches and is ignored by private caches.

  3. public indicates that the response can be cached by any cache (the client sending the request, a proxy server, and so on).

  4. private indicates that the response can only be cached for a single user (operating system user or browser user). The response is not shared and cannot be cached by a proxy server.

  5. no-cache forces every cache holding the response to send a request with a validator to the server before using the cached data. It does not literally mean "do not cache".

  6. no-store disallows caching; the data is fetched from the server again on every request.

  7. must-revalidate specifies that once the cached response has expired, the cache must revalidate it with the server before using it. This directive is not commonly used and will not be discussed further.
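
To make the directive list concrete, here is a minimal sketch that splits a Cache-Control value into its directives and redoes the one-year arithmetic. The function name parseCacheControl is hypothetical, and the sketch only handles directives that are bare flags or carry a numeric value.

```javascript
// Hypothetical parser: turn a Cache-Control value into an object.
// Directives without a value (such as no-cache) map to true; directives
// with a value are assumed to be numeric (such as max-age=3600).
function parseCacheControl(value) {
  const directives = {};
  for (const part of value.split(',')) {
    const [rawName, rawValue] = part.trim().split('=');
    if (!rawName) continue; // skip empty segments
    const name = rawName.toLowerCase();
    directives[name] = rawValue === undefined ? true : Number(rawValue);
  }
  return directives;
}

const parsed = parseCacheControl('public, max-age=31536000, must-revalidate');
const days = parsed['max-age'] / (24 * 60 * 60); // seconds to days
console.log(parsed, days);
```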

Negotiated cache

If the strong cache is not hit, the browser sends the request to the server. The server determines whether the negotiated cache is hit based on Last-Modified/If-Modified-Since or ETag/If-None-Match in the HTTP headers. On a hit, the HTTP status code is 304 and the browser loads the resource from the cache.

Last-Modified/If-Modified-Since

When the browser requests a resource for the first time, the server adds Last-Modified to the response header. Last-Modified is the time the resource was last modified, for example Last-Modified: Thu, 31 Dec 2037 23:59:59 GMT.

When the browser requests the resource again, the request header contains If-Modified-Since, whose value is the Last-Modified value returned with the cached response. After receiving If-Modified-Since, the server decides whether the cache is hit based on the resource's last modification time.

If the cache is hit, HTTP 304 is returned, the resource content is not returned, and Last-Modified is not returned. Because the comparison uses the server's time, a clock difference between client and server does not cause problems. However, the last modification time is not always an accurate way to tell whether a resource has changed (a resource can be updated while its last modification time stays the same). Hence ETag/If-None-Match.
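
The server-side comparison can be sketched as below. The function name statusForIfModifiedSince is an assumption, and the sketch ignores everything except the date comparison.

```javascript
// Hypothetical server-side check: returns 304 when the resource has not
// changed since the date in If-Modified-Since, otherwise 200.
function statusForIfModifiedSince(ifModifiedSince, lastModifiedMs) {
  const sinceMs = Date.parse(ifModifiedSince);
  if (Number.isNaN(sinceMs)) return 200; // missing or unparsable header
  // HTTP dates have one-second resolution, so drop the milliseconds.
  const lastModifiedSeconds = Math.floor(lastModifiedMs / 1000);
  return lastModifiedSeconds <= sinceMs / 1000 ? 304 : 200;
}

const headerValue = 'Thu, 31 Dec 2037 23:59:59 GMT';
const modifiedAt = Date.parse(headerValue);
console.log(statusForIfModifiedSince(headerValue, modifiedAt));
console.log(statusForIfModifiedSince(headerValue, modifiedAt + 1000));
```

Note that a modification 500 ms after the header's time would still yield 304 in this sketch, which is the one-second precision limit that motivates ETag.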

ETag/If-None-Match

Unlike Last-Modified/If-Modified-Since, ETag/If-None-Match works with an entity tag (ETag) rather than a time. An ETag uniquely identifies a version of a resource, and any change to the resource changes its ETag. If the ETag value has changed, the resource has been modified. The server determines whether the cache is hit based on the If-None-Match value sent by the browser.


More on ETag

We expect an ETag to be unique for each URL and to change whenever the resource changes. So how is this mysterious ETag generated? Taking Apache as an example, the ETag is derived from factors such as the file's inode number, its last modification time, and its size. The inode number is what Linux/Unix uses to identify a file (files are identified by this number, not by name); you can see it with the command 'ls -i'. An ETag can be generated from one or more of these factors, using a collision-resistant hash function. In theory two ETags can therefore collide, but the probability is negligible.
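
As a rough sketch of ETag generation and the If-None-Match comparison, assume the tag is derived from a file's size and modification time. Both function names and the tag format are hypothetical; this is not Apache's actual algorithm, and a real server would typically hash the factors or the content.

```javascript
// Hypothetical ETag built from file size and modification time (both
// written in hexadecimal and wrapped in quotes, as ETags are).
function makeETag(stat) {
  const size = stat.size.toString(16);
  const mtime = Math.floor(stat.mtimeMs).toString(16);
  return `"${size}-${mtime}"`;
}

// Returns 304 when one of the tags in If-None-Match equals the current
// ETag, otherwise 200. A leading W/ (weak validator) is ignored.
function statusForIfNoneMatch(ifNoneMatch, currentETag) {
  if (!ifNoneMatch) return 200;
  if (ifNoneMatch.trim() === '*') return 304;
  const strip = (tag) => tag.trim().replace(/^W\//, '');
  const tags = ifNoneMatch.split(',').map(strip);
  return tags.includes(strip(currentETag)) ? 304 : 200;
}

const etag = makeETag({ size: 255, mtimeMs: 4096 });
console.log(etag, statusForIfNoneMatch(etag, etag));
```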

Why ETag when there is already Last-Modified?

You might think Last-Modified is enough to let the browser know whether its locally cached copy is fresh, so why ETag? ETag was introduced in HTTP/1.1 mainly to solve several problems with Last-Modified:

  1. Last-Modified is only accurate to the second. If a file is modified more than once within a second, the header cannot accurately reflect the modification time of the file.

  2. Some files are regenerated periodically; sometimes the contents are unchanged but Last-Modified changes, so the cached copy cannot be reused.

  3. The server may not obtain the correct file modification time, or its clock may differ from that of the proxy server.

An ETag is a unique identifier for a resource on the server side, generated automatically by the server or by the developer, and it allows more precise control of the cache. Last-Modified and ETag can be used together. The server checks the ETag first; the HTTP specification gives If-None-Match precedence over If-Modified-Since, although some servers, when the ETag matches, also go on to compare Last-Modified before finally deciding whether to return 304.

User behavior and caching

Browser cache behavior also depends on what the user does:

| User action | Expires/Cache-Control | Last-Modified/ETag |
| --- | --- | --- |
| Entering the address in the address bar | Effective | Effective |
| Following a page link | Effective | Effective |
| Opening a new window | Effective | Effective |
| Going forward or back | Effective | Effective |
| F5 refresh | Ineffective | Effective |
| Ctrl+F5 refresh | Ineffective | Ineffective |

Conclusion:

[Figure: first request from the browser]

[Figure: subsequent requests from the browser]

See the blog posts "HTTP caching mechanism" and "Common front-end interview questions: storage/caching".