The latency of common hardware components gives a useful reference point: a main-memory access takes roughly 100 ns, while a disk seek takes on the order of 10 ms.

From these numbers you can see that using memory as the storage medium for a cache is orders of magnitude faster than using disk as the primary storage medium for the database. That is why memory is the most common medium for caching data.

1. Cache use cases

1. TLB

Linux memory management uses a piece of hardware called the MMU (Memory Management Unit) to translate virtual addresses into physical addresses. However, performing this relatively expensive translation on every access would waste performance, so a component called the TLB (Translation Lookaside Buffer) caches the mappings between recently translated virtual addresses and physical addresses. The TLB is a caching component.

2. Douyin (TikTok)

Short videos on the platform are actually played with a built-in network player. The player receives the data stream, downloads the data, separates the audio and video streams, decodes them, and outputs the result to peripheral devices for playback. Players usually include caching components that cache part of the video data before a video is opened. For example, when we open Douyin, the server may return information for three videos at a time; while we play the first video, the player has already cached part of the data for the second and third videos, so the second video appears to start instantly from the user's point of view.

3. HTTP protocol cache

When we first request a static resource, such as an image, the server returns an “ETag” field in the response headers in addition to the image data. The browser caches both the image and the value of this field. The next time the image is requested, the browser sends an “If-None-Match” header carrying the cached “ETag” value to the server. The server checks whether the image has changed; if not, it returns status code 304 to the browser, and the browser continues to use the cached image. This cache negotiation reduces the amount of data transmitted over the network and improves page rendering performance.
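To make the negotiation concrete, here is a minimal sketch using Java's built-in java.net.http.HttpClient; the resource URL is hypothetical, and it assumes the server actually returns an ETag on the first response.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EtagCacheDemo {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String url = "https://example.com/static/logo.png"; // hypothetical resource

        // First request: the server returns the image plus an ETag header.
        HttpResponse<byte[]> first = client.send(
                HttpRequest.newBuilder(URI.create(url)).build(),
                HttpResponse.BodyHandlers.ofByteArray());
        String etag = first.headers().firstValue("ETag").orElseThrow(); // assume an ETag was sent
        byte[] cachedBody = first.body();

        // Later request: send the cached ETag back via If-None-Match.
        HttpRequest conditional = HttpRequest.newBuilder(URI.create(url))
                .header("If-None-Match", etag)
                .build();
        HttpResponse<byte[]> second = client.send(conditional,
                HttpResponse.BodyHandlers.ofByteArray());

        if (second.statusCode() == 304) {
            // Not Modified: keep using the locally cached bytes.
            System.out.println("Use cached copy, " + cachedBody.length + " bytes");
        } else {
            cachedBody = second.body(); // resource changed, refresh the local copy
        }
    }
}
```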

2. Cache classification

1. Static caching

Static caching was well known in the Web 1.0 era. It is typically implemented by rendering Velocity templates into static HTML files ahead of time. Deploying this static cache on Nginx reduces the pressure on the backend application servers.

2. Distributed caching

Distributed caches are household names: Memcached and Redis are the typical examples. They offer strong performance, and through various distributed schemes they can be combined into clusters that break through the limits of a single machine. Distributed caching therefore plays a very important role in the overall architecture.

3. Local cache

Local caches such as Guava Cache or Ehcache are deployed in the same process as the application. Their advantage is that they require no cross-network calls and are extremely fast, so they can be used to absorb hot-spot queries over short periods of time.
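As an illustration, here is a minimal local-cache sketch using Guava Cache; the UserInfo type, the sizes and TTL, and the loadUserFromDb lookup are hypothetical placeholders.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

public class LocalCacheDemo {
    // A small in-process cache: bounded size plus a short TTL,
    // so hot entries are served from memory and stale ones expire quickly.
    private final LoadingCache<Long, UserInfo> userCache = CacheBuilder.newBuilder()
            .maximumSize(10_000)                      // cap memory usage
            .expireAfterWrite(30, TimeUnit.SECONDS)   // short TTL for hot data
            .build(new CacheLoader<Long, UserInfo>() {
                @Override
                public UserInfo load(Long userId) {
                    return loadUserFromDb(userId);    // called only on a cache miss
                }
            });

    public UserInfo getUser(long userId) throws Exception {
        return userCache.get(userId);
    }

    private UserInfo loadUserFromDb(long userId) {
        // hypothetical database lookup
        return new UserInfo(userId, "user-" + userId);
    }

    record UserInfo(long id, String name) {}
}
```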

3. Cache read and write policies

1. Cache Aside

When data is updated, the corresponding cache entry is deleted rather than updated in place. When data is read and found to be missing from the cache, it is read from the database and written back into the cache.

This is the most common caching strategy we use: Cache Aside (also known as the bypass cache pattern). The database remains the source of truth, and the cache is loaded on demand.
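A minimal sketch of the read and write paths described above, assuming hypothetical CacheClient and Database wrappers (for example around a Redis client and a DAO):

```java
public class CacheAside {
    private final CacheClient cache;
    private final Database db;

    public CacheAside(CacheClient cache, Database db) {
        this.cache = cache;
        this.db = db;
    }

    // Read path: try the cache first, fall back to the database and repopulate the cache.
    public String read(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = db.query(key);
            if (value != null) {
                cache.set(key, value);
            }
        }
        return value;
    }

    // Write path: update the database first, then delete (not update) the cache entry.
    public void write(String key, String value) {
        db.update(key, value);
        cache.delete(key);
    }

    interface CacheClient { String get(String k); void set(String k, String v); void delete(String k); }
    interface Database { String query(String k); void update(String k, String v); }
}
```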

Cache Aside is one of the most frequently used caching strategies in daily development, but we should adapt it to the situation rather than treat it as fixed. Its biggest problem is that when writes are frequent, cache entries are invalidated frequently, which hurts the cache hit ratio. If your business has strict requirements on the hit ratio, there are two solutions to consider:

One approach is to update the cache when the data is updated, but guard the cache update with a distributed lock, so that only one thread updates the cache at a time and concurrency problems are avoided. Of course, this has some impact on write performance (recommended);

The other approach is also to update the cache when the data is updated, but to give the cached entry a short expiration time, so that even if the cache becomes inconsistent, the stale data expires quickly and the impact on the business is acceptable.

2. Read/Write Through

The core principle of this strategy is that the user interacts only with the cache, and the cache communicates with the database, writing or reading data on the user's behalf.

Write Through

If the data to be written already exists in the cache, the cached entry is updated and the cache component synchronously updates the database. If the data is not in the cache, we call this a “Write Miss”. Generally, there are two ways to handle a Write Miss: one is “Write Allocate”, which writes the data into the cache and lets the cache component synchronously update the database; the other is “No-Write Allocate”, which skips the cache and writes directly to the database. Note that in the Write Through policy the database write is synchronous, which has a significant impact on performance, because the latency of writing to the database is much higher than that of writing to the cache. To update the database asynchronously instead, use the Write Back policy described below.
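A minimal Write Through sketch under these assumptions; the Store interface and the in-memory map standing in for the cache are illustrative only:

```java
public class WriteThroughCache {
    private final Store store;                 // backing database
    private final java.util.Map<String, String> cache = new java.util.concurrent.ConcurrentHashMap<>();
    private final boolean writeAllocate;       // true = Write Allocate, false = No-Write Allocate

    public WriteThroughCache(Store store, boolean writeAllocate) {
        this.store = store;
        this.writeAllocate = writeAllocate;
    }

    public void write(String key, String value) {
        if (cache.containsKey(key) || writeAllocate) {
            cache.put(key, value);     // hit, or miss with Write Allocate: update the cache...
            store.write(key, value);   // ...and synchronously write through to the database
        } else {
            store.write(key, value);   // No-Write Allocate: bypass the cache on a miss
        }
    }

    interface Store { void write(String key, String value); }
}
```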

Read Through

This strategy is simpler: the cache is queried for the data; if it is present, it is returned; if not, the cache component is responsible for synchronously loading the data from the database, storing it, and returning it.
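And a corresponding Read Through sketch, with a hypothetical Loader that knows how to fetch a value from the database:

```java
public class ReadThroughCache {
    private final java.util.Map<String, String> cache = new java.util.concurrent.ConcurrentHashMap<>();
    private final Loader loader;   // fetches a value from the database on a miss

    public ReadThroughCache(Loader loader) {
        this.loader = loader;
    }

    public String read(String key) {
        // computeIfAbsent loads synchronously on a miss and stores the result in the cache
        return cache.computeIfAbsent(key, loader::load);
    }

    interface Loader { String load(String key); }
}
```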

3. Write Back

The core idea of this strategy is to write only to the cache when data is written and to mark the cache block as “dirty”. A dirty block's data is written to back-end storage only when the block is about to be reused. In the case of a “Write Miss”, we use “Write Allocate”, that is, writing the data into the cache as well as to back-end storage, so that subsequent write requests only need to update the cache and not the back-end storage. Contrast this with the Write Through policy above, where every write goes to the database synchronously.

If we find a cache hit when reading, we return the cached data directly. On a cache miss, we look for an available cache block: if that block is “dirty”, its data is first written back to back-end storage and the requested data is then loaded from back-end storage into the block; if it is not dirty, the cache component simply loads the data from back-end storage into the block. Finally, the block is marked as not dirty and the data is returned.
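A simplified sketch of this read/write flow over a fixed set of cache blocks; the Store interface and the naive block-replacement policy are illustrative assumptions, not a real storage engine:

```java
import java.util.HashMap;
import java.util.Map;

public class WriteBackCache {
    private static class Block { String key; String value; boolean dirty; }

    private final Block[] blocks;
    private final Map<String, Integer> index = new HashMap<>(); // key -> block slot
    private final Store store;
    private int clock = 0; // naive replacement pointer

    public WriteBackCache(int capacity, Store store) {
        this.blocks = new Block[capacity];
        this.store = store;
    }

    public void write(String key, String value) {
        Block b = lookupOrAllocate(key);      // Write Allocate on a write miss
        b.value = value;
        b.dirty = true;                       // only the cache is updated; the store write is deferred
    }

    public String read(String key) {
        Integer slot = index.get(key);
        if (slot != null) {
            return blocks[slot].value;        // cache hit
        }
        Block b = lookupOrAllocate(key);      // miss: a dirty victim was flushed before reuse
        b.value = store.read(key);
        b.dirty = false;
        return b.value;
    }

    // Finds the block for key, or evicts one: a dirty victim is written back to the store first.
    private Block lookupOrAllocate(String key) {
        Integer slot = index.get(key);
        if (slot == null) {
            slot = clock++ % blocks.length;
            Block victim = blocks[slot];
            if (victim != null) {
                if (victim.dirty) {
                    store.write(victim.key, victim.value); // flush dirty data before reuse
                }
                index.remove(victim.key);
            }
            Block fresh = new Block();
            fresh.key = key;
            blocks[slot] = fresh;
            index.put(key, slot);
        }
        return blocks[slot];
    }

    interface Store { String read(String key); void write(String key, String value); }
}
```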

The Write Back policy is widely used when writing data to disk, for example the operating system's Page Cache, asynchronous flushing of logs, and asynchronous writing of messages to disk in message queues. The performance advantage of this strategy is undeniable: it avoids the penalty of writing directly to disk, since the latency of writing to memory is several orders of magnitude lower than that of random disk I/O.

4. High availability of cache

The cache hit ratio is a metric that a cache needs to monitor. Making the cache highly available reduces the probability of cache penetration to some extent and improves system stability. High-availability schemes for caches mainly fall into three categories: client-side schemes, intermediate proxy layer schemes, and server-side schemes:

1. Client solution

In the client-side scheme, you need to pay attention to both writing and reading the cache. When writing data, the data written to the cache needs to be distributed across multiple nodes, i.e. sharded. When reading data, multiple groups of caches can be used for fault tolerance to improve the availability of the cache system. For reads there are two strategies, master-slave and multiple copies, which address different problems. The specific implementation details are: data sharding, master-slave, and multiple copies.

Data sharding

Consistent hashing works as follows: we organize the entire hash value space into a virtual ring and place each cache node on the ring according to the hash of its IP address or host name. When we need to determine which node a key should go to, we first hash the key to find its position on the ring and then “walk” clockwise along the ring; the first cache node we encounter is the node to access.

If Node 5 is added between Node 1 and Node 2, Key 3, which previously hit Node 2, now hits Node 5, while all other keys remain unchanged. Similarly, if Node 3 is removed from the cluster, only Key 5 is affected. So when nodes are added or removed, only a small number of keys “drift” to other nodes, while most key-to-node mappings stay the same, ensuring that the hit ratio does not drop significantly. [Hint] The cache avalanche that consistent hashing can cause is mitigated by using virtual nodes. The difference between consistent-hash sharding and plain hash (modulo) sharding lies in the cache hit ratio: with plain hash sharding, adding or removing machines remaps most keys, invalidating large parts of the cache and lowering the hit ratio.
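A minimal consistent-hashing sketch with virtual nodes; the CRC32 hash and the node-naming scheme are illustrative choices, not a specific library's implementation:

```java
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // hash position -> physical node
    private final int virtualNodes;

    public ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    public void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node); // spread each physical node around the ring
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // Walk clockwise from the key's position; wrap around to the first entry if needed.
    public String nodeFor(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

With, say, 100 virtual nodes per machine, adding a node such as `ring.addNode("10.0.0.5:11211")` only remaps the keys whose positions fall just before the new virtual nodes; all other keys keep their mapping, which is exactly the hit-ratio behaviour described above.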

Master-slave

Redis itself supports master-slave deployment, but Memcached does not, so Memcached's master-slave mechanism has to be implemented on the client side: configure a group of Slaves for each group of Masters. When data is updated, both the Master and the Slave are written. When data is read, it is read preferentially from the Slave; if it cannot be found there, it is read from the Master and written back into the Slave to keep the Slave's data warm. The biggest advantage of the master-slave mechanism is that when a Slave goes down, the Master still acts as a backstop, preventing a large number of requests from penetrating to the database and improving the availability of the cache system.
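A sketch of this client-side master-slave read/write path, with a hypothetical CacheNode interface standing in for a Memcached client:

```java
public class MasterSlaveReader {
    private final CacheNode master;
    private final CacheNode slave;

    public MasterSlaveReader(CacheNode master, CacheNode slave) {
        this.master = master;
        this.slave = slave;
    }

    public String get(String key) {
        String value = slave.get(key);
        if (value == null) {
            value = master.get(key);      // Slave miss: fall back to the Master
            if (value != null) {
                slave.set(key, value);    // write back to the Slave to keep its data warm
            }
        }
        return value;
    }

    public void set(String key, String value) {
        master.set(key, value);           // writes go to both groups to keep them in sync
        slave.set(key, value);
    }

    interface CacheNode { String get(String key); void set(String key, String value); }
}
```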

Multiple copies

The master-slave mode solves the problem in most scenarios. However, under extreme traffic a single group of Slaves may not be able to handle all of it, and the bandwidth of the Slaves' network cards may become a bottleneck. To solve this problem, we consider adding a copy layer in front of Master/Slave. The overall architecture looks like this:

In this solution, when the client initiates a query, it first picks one copy group from the multiple copy groups and queries it. If that query misses, it falls back to the Master/Slave group and writes the result back into all copy groups to avoid dirty data in them. For cost reasons, each copy group has a smaller capacity than the Master and Slave, so it stores only the hotter data. In this architecture the request volume reaching the Master and Slave is greatly reduced; in practice we also treat the Master and Slave as one of the copy groups, to ensure that they hold hot data.

2. Intermediate proxy layer

There are many intermediate proxy layer solutions, such as Facebook's Mcrouter, Twitter's Twemproxy, and Wandoujia's Codis. Their principle can basically be summarized by one diagram:

3. Server solution

In version 2.4, Redis introduced Redis Sentinel mode to solve the high-availability problem of master-slave Redis deployments. It can automatically promote a slave node to master after the master node fails, ensuring the availability of the whole cluster. The overall architecture is shown in the following figure:

Redis Sentinel is itself deployed as a cluster, so that automatic failover is not blocked when a Sentinel node fails. Each Sentinel node is stateless. The address of the Master is configured in Sentinel, and Sentinel monitors the status of the Master continuously. When the Master does not respond within the configured time interval, it is considered down, and Sentinel selects one of the slave nodes to promote to Master and makes all the other slave nodes replicate from the new Master. Arbitration within the Sentinel cluster is determined by a configured quorum value: only when at least that many Sentinel nodes agree that the master has failed can the master/slave switchover be performed. In other words, the cluster must reach agreement on the state of the cache nodes.
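On the client side, connecting through Sentinel might look like the following sketch, assuming the Jedis client library; the master name and Sentinel addresses are placeholders and must match your Sentinel configuration:

```java
import java.util.HashSet;
import java.util.Set;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool;

public class SentinelClientDemo {
    public static void main(String[] args) {
        // Addresses of the Sentinel nodes (hypothetical); the master name must
        // match the "sentinel monitor <name> ..." entry in the Sentinel config.
        Set<String> sentinels = new HashSet<>();
        sentinels.add("10.0.0.1:26379");
        sentinels.add("10.0.0.2:26379");
        sentinels.add("10.0.0.3:26379");

        // The pool asks the Sentinels for the current master address and
        // switches to the new master after a failover.
        try (JedisSentinelPool pool = new JedisSentinelPool("mymaster", sentinels);
             Jedis jedis = pool.getResource()) {
            jedis.set("greeting", "hello");
            System.out.println(jedis.get("greeting"));
        }
    }
}
```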

[Tip] The connection from the client to the Sentinel cluster is drawn as a dashed line above because read and write requests to the cache do not pass through the Sentinel nodes.

5. Cache penetration

1. The Pareto principle

The data access model of Internet systems generally follows the “80/20 principle”. “80/20 principle”, also known as Pareto principle, is an economic theory put forward by Italian economist Pareto. In simple terms, it means that in a group of things, the most important part is usually only 20%, and the other 80% is not that important. When applied to data access, 20% of the hot data is frequently accessed, and the other 80% is not. Since the cache capacity is limited, and most accesses request only 20% of the hot data, in theory, we only need to store 20% of the hot data in the limited cache space to effectively protect vulnerable back-end systems, and we can forgo caching the other 80% of the non-hot data. So this small amount of cache penetration is inevitable, but does no harm to the system.

2. Caching null values

When we get a null value from the database, or an exception occurs, we can store a null value in the cache. However, because a null value is not real business data and occupies cache space, we give it a relatively short expiration time so that it expires quickly. Caching null values can block a large number of penetrating requests, but if a large number of null values are cached, they waste cache space; and if the cache fills up, some genuinely cached user data will be evicted, lowering the cache hit ratio. So if you use this solution, I recommend evaluating the cache capacity first. If it would take a large number of cache nodes to support, then caching null values cannot solve the problem well, and you can consider using a Bloom filter instead.
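A sketch of caching null results with a short TTL; the sentinel string, TTL values, and the Cache/Database wrappers are illustrative assumptions:

```java
public class NullValueCaching {
    private static final String NULL_SENTINEL = "__NULL__"; // marks "known to be absent"

    private final Cache cache;
    private final Database db;

    public NullValueCaching(Cache cache, Database db) {
        this.cache = cache;
        this.db = db;
    }

    public String getUser(String userId) {
        String cached = cache.get(userId);
        if (cached != null) {
            return NULL_SENTINEL.equals(cached) ? null : cached; // a cached null blocks penetration
        }
        String fromDb = db.query(userId);
        if (fromDb == null) {
            cache.set(userId, NULL_SENTINEL, 60);      // short TTL: null values expire quickly
        } else {
            cache.set(userId, fromDb, 30 * 60);        // normal data can live longer
        }
        return fromDb;
    }

    interface Cache { String get(String key); void set(String key, String value, int ttlSeconds); }
    interface Database { String query(String key); }
}
```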

3. Bloom filter

In 1970, Bloom proposed the Bloom filter algorithm for determining whether an element belongs to a set. The algorithm consists of a binary (bit) array and a hash function. Its basic idea is as follows: for each value in the set, we compute its hash with the given hash function, take the hash value modulo the array length to get an index into the array, and set that position from 0 to 1. To determine whether an element is in the set, we compute its index with the same algorithm: if the value at that position is 1, the element is considered to be in the set; otherwise it is considered not to be.

How do you solve cache penetration using bloom filters?

This section uses a table that stores user information as an example. First we initialize a large bit array, say 2 billion bits long, and choose a hash function; then we hash all currently existing user IDs and map them into the array, setting the mapped positions to 1 and leaving the rest at 0. When a new user registers, in addition to writing the user to the database, we also update the corresponding position in the Bloom filter array using the same algorithm. When we need to query a certain user's information, we first check whether the ID exists in the Bloom filter; if it does not, we return an empty result directly without querying the cache or the database, which greatly reduces the cache penetration caused by abnormal queries.
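A sketch of this read path guarded by a Bloom filter, here using Guava's BloomFilter (which already combines multiple hash functions, addressing the false-positive concern discussed below); the sizing parameters and the UserCache/UserDao interfaces are hypothetical:

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class BloomFilterGuard {
    // Sized for 2 billion user IDs with a ~1% false-positive rate.
    private final BloomFilter<Long> userIds =
            BloomFilter.create(Funnels.longFunnel(), 2_000_000_000L, 0.01);

    private final UserCache cache;
    private final UserDao dao;

    public BloomFilterGuard(UserCache cache, UserDao dao) {
        this.cache = cache;
        this.dao = dao;
    }

    public void onUserRegistered(long userId) {
        dao.insert(userId);
        userIds.put(userId);                  // keep the filter in sync with the table
    }

    public String getUser(long userId) {
        if (!userIds.mightContain(userId)) {
            return null;                      // definitely not present: skip cache and database
        }
        String user = cache.get(userId);
        if (user == null) {
            user = dao.query(userId);         // may still miss (false positive)
            if (user != null) {
                cache.set(userId, user);
            }
        }
        return user;
    }

    interface UserCache { String get(long id); void set(long id, String value); }
    interface UserDao { void insert(long id); String query(long id); }
}
```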

Advantages of Bloom filter:

(1) High performance. Both writes and reads run in constant time, O(1).

(2) Space savings. For example, an array of 2 billion bits requires 2,000,000,000 / 8 / 1024 / 1024 ≈ 238 MB of space, whereas storing the IDs themselves, assuming each user ID takes 4 bytes, would require 2,000,000,000 × 4 / 1024 / 1024 ≈ 7,600 MB, about 32 times the size of the Bloom filter.

Disadvantages of Bloom filter:

(1) It has a certain probability of misjudging whether an element is in the set; for example, it may report that an element not in the set is in the set (a false positive).

Cause: an inherent limitation of the hash function: different elements can collide on the same array position.

Solution: Use multiple Hash algorithms to compute multiple Hash values for an element. The element is considered to be in the collection only when the values in the array corresponding to all Hash values are 1.

(2) It does not support deleting elements. This shortcoming is also related to hash collisions. For example, if two elements A and B are both in the set and happen to have the same hash value, they map to the same position in the array. If we delete A and change the value at that position from 1 to 0, then when we look up B we will find a 0 and wrongly conclude that B is not in the set.

Solution: instead of storing just 0 and 1 in the array, store a count. For example, if A and B both hit the same array index, the value there is 2; if A is deleted, the value is decremented from 2 to 1. The array now stores counts instead of bits, which increases space consumption.

4. The dog-pile effect

For example, when a very hot cache entry exists, once it expires a large number of requests penetrate to the database, placing sudden and extreme stress on it. This situation is called the “dog-pile effect”. The idea for solving it is to minimize concurrency after cache penetration, and the solutions are relatively simple:

(1) Control it in code: when a hot cache item expires, start a single background thread to penetrate to the database and load the data back into the cache; until the cache is repopulated, requests that miss the cache return directly instead of penetrating to the database.

(2) Set a distributed lock in Memcached or Redis, so that only the request that acquires the lock is allowed to penetrate to the database, as sketched below.
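A sketch of option (2) using a short-lived Redis lock via the Jedis client; the key names, TTLs, and loadFromDb helper are hypothetical, and a production version would also need a strategy for waiting or serving stale data while the lock is held:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class DogPileGuard {
    private final Jedis jedis;

    public DogPileGuard(Jedis jedis) {
        this.jedis = jedis;
    }

    public String get(String key) {
        String value = jedis.get(key);
        if (value != null) {
            return value;
        }
        // Cache miss: try to take the rebuild lock for this key (NX = only if absent,
        // EX = auto-expire so a crashed holder cannot block others forever).
        String lockKey = "lock:" + key;
        String ok = jedis.set(lockKey, "1", SetParams.setParams().nx().ex(10));
        if ("OK".equals(ok)) {
            try {
                value = loadFromDb(key);                 // only this request hits the database
                jedis.setex(key, 300, value);            // repopulate the cache
            } finally {
                jedis.del(lockKey);
            }
            return value;
        }
        // Another request is already rebuilding; degrade instead of hitting the database.
        return null;
    }

    private String loadFromDb(String key) {
        return "value-of-" + key;                        // hypothetical database query
    }
}
```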

6. CDN

1. Reasons for static resource acceleration

There are a lot of requests for static resources in our systems: for mobile apps these are mainly images, videos, and streaming media; for websites they include JavaScript files, CSS files, static HTML files, and so on. These resources see huge volumes of read requests, demand high access speed, and occupy a lot of bandwidth; slow access or saturated bandwidth would also affect dynamic requests. So we need to consider how to accelerate reads of these static resources.

2. CDN

The key to static resource access is serving it from nearby: users in Beijing access data in Beijing and users in Hangzhou access data in Hangzhou, which gives optimal performance. We therefore consider adding a special cache layer in front of the business servers to handle most of the access to static resources. The nodes of this cache layer need to be spread across the country so that users can access the nearest node, and its hit ratio must be kept high to minimize requests back to the origin site where the resources are stored (back-to-origin requests). This cache layer is the CDN.

CDN stands for Content Delivery Network. Simply put, a CDN distributes static resources to servers located in multiple geographical regions, which solves the problem of accessing data nearby and accelerates static resource access.

3. Setting up a CDN system

Two points should be considered when setting up a CDN system:

(1) How to map user requests to CDN nodes

You might think this is simple: just tell the user the IP address of a CDN node and have them request the CDN service deployed at that address. In practice it is not done that way; the IP is replaced by a corresponding domain name, and DNS is relied on to solve the mapping problem. The Domain Name System (DNS) is essentially a distributed database that stores mappings between domain names and IP addresses. There are two types of resolution results: an “A record”, which returns the IP address for a domain name, and a “CNAME record”, which returns another domain name, meaning that resolution of the current domain name is delegated to that other domain name.

Here is an example. If your company's first-level domain is example.com, you can define the domain name of your image service as “img.example.com” and configure that domain's resolution result as a CNAME pointing to the domain provided by the CDN vendor; UCloud, for example, might provide a domain such as “80f21f91.cdn.ucloud.com.cn”. Your e-commerce system can then use image addresses like “img.example.com/1.jpg”.

When the user requests this address, the DNS server first resolves the domain name to 80f21f91.cdn.ucloud.com.cn, which then resolves to the IP of a CDN node, so the resource can be fetched from the CDN.

Optimizing hierarchical domain name resolution

Domain name resolution is hierarchical, and each level has dedicated DNS servers, so a resolution may require multiple DNS queries across the public network, which hurts performance. One solution is to pre-resolve the domain names that will be needed when the app starts and cache the results in a local LRU cache. When we need a domain name later, we simply take the IP address from the cache; if it is not there, we go through the full DNS query process. At the same time, a timer can periodically refresh the cached data to avoid staleness caused by changes in DNS resolution results.
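A sketch of this approach: pre-resolve hot domains at startup, keep the results in a small LRU cache, and refresh them on a timer; the domain list, cache size, and refresh interval are illustrative:

```java
import java.net.InetAddress;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DnsPrefetcher {
    // Access-ordered LinkedHashMap used as a simple LRU cache of domain -> IP.
    private final Map<String, String> lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
            return size() > 100; // bound the cache
        }
    };
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void start(String... hotDomains) {
        Runnable refresh = () -> {
            for (String domain : hotDomains) {
                try {
                    String ip = InetAddress.getByName(domain).getHostAddress();
                    synchronized (lru) { lru.put(domain, ip); }
                } catch (Exception ignored) {
                    // keep the previous result if this refresh fails
                }
            }
        };
        refresh.run();                                                       // pre-resolve at startup
        scheduler.scheduleAtFixedRate(refresh, 60, 60, TimeUnit.SECONDS);    // periodic refresh
    }

    public String resolve(String domain) throws Exception {
        synchronized (lru) {
            String ip = lru.get(domain);
            if (ip != null) {
                return ip;              // served from the local cache, no DNS round trip
            }
        }
        String ip = InetAddress.getByName(domain).getHostAddress(); // full DNS lookup on a miss
        synchronized (lru) { lru.put(domain, ip); }
        return ip;
    }
}
```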

(2) How to select the nearest node according to the user’s geographical location information.

GSLB (Global Server Load Balancing) performs load balancing across servers deployed in different regions; below it, it may manage many local load-balancing components. It has two roles: on the one hand it is a load balancer, which, as the name implies, distributes traffic evenly so that the servers it manages are evenly loaded; on the other hand it must also ensure that the traffic is routed to servers geographically close to its source.

GSLB can use a variety of strategies to ensure that the returned CDN node is in the same geographical region as the user whenever possible. For example, the user's IP address can be mapped to one of several regions by geographic location, each CDN node can be assigned to a region, and an appropriate node returned based on the user's region. Alternatively, the choice can be made by sending probe packets and measuring the RTT.

Summary: DNS is the core technology in CDN implementation; it maps user requests to CDN nodes. Caching DNS resolution results locally reduces resolution time. GSLB returns a node close to the user, speeding up access to static resources.

Extensions

(1) Baidu domain name resolution process

The domain name resolution request first checks the hosts file on the local machine to see whether it contains an IP address for www.baidu.com. If not, the request goes to the Local DNS server, which keeps a cache of resolution results; if the answer is cached there, it is returned. Otherwise the Local DNS starts an iterative query: the root DNS returns the address of the .com top-level DNS; the .com top-level DNS is then asked for the address of baidu.com's authoritative name server; finally, baidu.com's name server is queried for the IP address of www.baidu.com. The IP is returned, marked as coming from the authoritative DNS, and written into the Local DNS cache, so the next resolution of the same domain name does not need to repeat the iterative query.

(2) CDN delay

Generally, static resources are written to one CDN node through the CDN vendor's interface and then dispersed to the other CDN nodes by the CDN's internal synchronization mechanism. Even though the CDN's internal network is optimized, this synchronization has some delay. If we cannot obtain the data from the selected CDN node, we have to fetch it from the origin site, and the path from the user's network to the origin site may cross multiple backbone networks, which not only hurts performance but also consumes the origin site's bandwidth, raising costs. Therefore, when using a CDN, we need to pay attention to the CDN hit ratio and the origin site's bandwidth.