This article is mostly theoretical. Its main goals are to:

  1. Understand the positioning of Nginx in a cluster
  2. Understand asynchronous, non-blocking IO
  3. Understand load balancing algorithms commonly used in Nginx
  4. Understand the difference between browser caching and Nginx caching

For front-end developers, Node is a good entry point into the back end, especially for event-driven thinking, which comes up frequently on both sides. Node and Nginx also have a lot in common.

Broadly, Nginx excels at lower-level server concerns such as serving and forwarding static resources, reverse proxying, and load balancing, while Node is better suited to upper-level business logic; the two combine very well.

Where Nginx comes from

Nginx is a web server built on an asynchronous event framework that can also be used as a reverse proxy, load balancer, and cache (it was created in Russia and acquired by F5 in 2019). It has been benchmarked to support around 50,000 concurrent connections, and in practice it typically handles 20,000 to 40,000. Nginx itself is assembled from a number of module extensions, much like webpack or ESLint in its modularity.

Understanding Nginx's position from a site architecture perspective

As a member of the back-end cluster, Nginx earns its place from the standpoint of both availability and concurrency. It is a pillar of modern Internet architecture; few designs can do without it.

For the front end to understand Nginx, it helps to walk through how modern server clusters evolved, though I won't start from ancient history. Generally speaking, a website only needs three parts: the web client, the application/file server, and the database server. Everything that follows is an iteration on these three.

Originally, everything ran on the same machine, which limited concurrency and storage capacity, and once that machine went down the entire application became unreachable. This led to separating the servers.

After separating the servers, the next problem is database pressure. As business volume and request traffic grow, the load on the database rises and interface latency increases, which of course hurts the user experience. We also observe that most user operations are reads rather than writes. By adopting read-write separation, the master database handles write requests while the slave databases handle reads, distributing traffic across nodes and improving the database's load capacity.

Later, since a single server cannot support the application's concurrency and provides no disaster recovery, we clustered both the application servers and the database. To coordinate traffic among the servers in these clusters, we also introduced a load balancer (Nginx) that distributes requests to the application service cluster (for example, Tomcat) using various algorithms.

Then we realized that even with a load balancer, the application could still become unavailable if the load balancer itself went down. So we introduced LVS active-standby disaster recovery, which is worth expanding a little because LVS plays a role very similar to Nginx in this article: LVS can serve as the front-end access point for load balancing, or be paired with Keepalived for high-availability clusters.

  1. In terms of high availability, if a web server goes down or fails, Keepalived detects the fault, removes the faulty server from the pool, and lets the other servers take over its work. When the server recovers, Keepalived automatically adds it back to the pool. All of this happens automatically, without human intervention; all you need to do is repair the faulty server. LVS active-standby disaster recovery is one example, where one machine stands by to take over in case the other goes down. Of course, active-standby hot backup is only one type of HA cluster, and the topology changes with requirements.

  2. In terms of load balancing, a single Nginx instance typically handles about 40,000 concurrent connections, so if the real workload exceeds that, multiple Nginx instances must work together. We usually place LVS or DNS above Nginx to distribute traffic across the Nginx instances.

    • Nginx works at layer 7 and forwards asynchronously. In other words, it keeps the downstream client connection open while creating a new connection to the upstream server, waits for the upstream server to return data, and then passes it to the downstream client. The advantage is that Nginx, as an intermediate layer, can intercept errors and data from upstream servers; if an upstream server has a problem, Nginx can immediately switch to another upstream connection and keep the downstream connection stable.
    • LVS works at layer 4 and forwards synchronously: when a downstream client initiates a request, LVS forwards the connection directly to the upstream server, so downstream and upstream are connected directly, and there is no error handling. A common high-availability setup uses LVS plus Keepalived for layer-4 load balancing and Nginx for layer-7 load balancing.

In fact, besides layer-4 forwarding with LVS, we can also use DNS resolution to balance load according to users' IP addresses, which will be discussed later.

As the site's business and data volume keep growing, we also shard the database: data from the same table is hashed to different databases according to certain rules and algorithms. To reduce database pressure further, we introduce a search engine between the application service cluster and the cache/database clusters, so that massive user search requests are shielded from the database by an extra layer.

Finally, when business pressure grows even higher, we isolate specific business domains into separate subsystems, which is the idea behind microservices. Horizontal scaling also comes into play here, for example Docker and Kubernetes, along with MQ and ZooKeeper for communication between the different service clusters. But that is beyond the scope of this article.

Asynchronous, non-blocking, and event-driven

Nginx is a lightweight HTTP server with an event-driven, asynchronous, non-blocking processing framework, which gives it excellent I/O performance; it is often used for server-side reverse proxying and load balancing. It offers high concurrency, high reliability, and high performance, and supports hot deployment.

Nginx is very similar to Node in its event-driven, asynchronous I/O design philosophy. Nginx is written in pure C and Node's core in C++, and both perform very well. The difference is that Nginx has a powerful ability to manage client-facing connections, whereas Node aims to be an all-purpose application platform.

What is asynchronous I/O

Suppose a business scenario has a set of unrelated tasks that need to be completed. There are two prevailing approaches:

  1. A single thread executes sequentially.
  2. Multithreading is done in parallel.

Multithreading is preferred when the cost of creating the threads is smaller than the benefit of parallel execution. The cost of multithreading lies in thread creation and in thread context switching at execution time. In addition, in complex business logic, multithreaded programming often runs into problems such as locking and state synchronization, which is the main reason multithreading gets criticized. Still, on multi-core CPUs, multithreading can effectively improve CPU utilization; that advantage is beyond doubt.

Single-threaded sequential execution matches the programmer's sequential way of thinking more closely. The downside of serial execution is performance: any task that is slightly slow blocks all the code that follows.

In terms of computer resources, I/O and CPU computation can normally proceed in parallel. The problem with the synchronous programming model, however, is that while I/O is in progress the subsequent tasks must wait, so resources are not used to their full potential. The operating system therefore reassigns the CPU time slice to other processes so that resources are shared fairly and efficiently.

Because of this, some servers start multiple worker processes to serve more users and improve responsiveness (scaling out).

However, for a single set of tasks, if the tasks cannot be distributed across multiple processes, resources still cannot be used efficiently and completing them all takes longer. This model is like tripling the number of servers to speed up service by consuming more resources; it does not really solve the underlying problem. Adding hardware is one way to improve service quality, but not the only way.

The single-threaded synchronous programming model blocks on I/O and leaves hardware resources underused, while the multithreaded programming model gives developers headaches with deadlocks, state synchronization, and similar problems.

Node gives its solution in between:

  • Use a single thread, avoiding multithreaded deadlock, state synchronization, and related problems;
  • Use asynchronous I/O to keep the single thread from blocking, so the CPU is better utilized.

The expectation for asynchronous I/O is that the call no longer blocks subsequent operations, and the time that would have been spent waiting for I/O is given to other work.
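As a concrete illustration, here is a minimal Node/TypeScript sketch contrasting a blocking read with an asynchronous one (the file name app.log is just a placeholder):

import { readFileSync } from "node:fs";
import { readFile } from "node:fs/promises";

// Blocking: the single thread stops here; nothing else runs until the read finishes.
const blocking = readFileSync("app.log", "utf8");
console.log("blocking read done:", blocking.length);

// Non-blocking: the call returns a promise immediately; the thread keeps doing
// other work, and the callback runs once the kernel reports the I/O is complete.
readFile("app.log", "utf8").then((data) => {
  console.log("async read done:", data.length);
});
console.log("this line runs before the async read completes");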

What is blocking/non-blocking IO

In terms of computer kernel I/O, asynchronous/synchronous and blocking/non-blocking are really two different things. The operating system kernel has only two ways of dealing with I/O: blocking and non-blocking.

When calling blocking I/O, the application waits for the I/O to complete before returning the result. A characteristic of blocking I/O is that the call must wait until all operations at the system kernel level have been completed.

Blocking I/O leaves the CPU waiting on I/O, wasting the waiting time and underusing the CPU's processing power. To improve performance, the kernel also provides non-blocking I/O, which differs from blocking I/O in that the call returns immediately.

After the non-blocking I/O returns, the CPU’s time slice can be used to process other transactions, where the performance improvement is significant.

But non-blocking I/O has its own problem: because the I/O has not actually completed, what is returned immediately is not the data the business layer expects but merely the state of the current call. To obtain the complete data, the application has to call the I/O operation repeatedly to check whether it has finished. This technique of repeatedly calling to check for completion is called polling, and there are several ways to implement it, described below:

Step1: Read mode

This is the most primitive and lowest-performing approach: the program checks the I/O state through repeated read calls until the data is finally available, and the CPU is spent entirely on waiting. In plain terms, this is true polling.

Step2: Select mode

Select is an improvement on read polling: instead of reading the data itself, it judges completion by the event status on file descriptors.

Step3: Epoll mode

This is the most efficient I/O event notification mechanism on Linux. If no I/O event is detected during polling, the process sleeps until an event occurs and wakes it up. It makes full use of event notification and callbacks instead of traversal queries, so it does not waste CPU and executes more efficiently. This is also Nginx's approach to non-blocking I/O polling.

To date, both Node and Nginx use epoll for non-blocking I/O; in Node, libuv encapsulates epoll.
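As a rough illustration of what this buys at the application level, here is a minimal single-threaded Node/TypeScript echo server; the port number is arbitrary. Under the hood, libuv registers every socket with epoll (on Linux) and only invokes the callbacks when data actually arrives, so one thread can watch thousands of connections without blocking on any of them:

import net from "node:net";

const server = net.createServer((socket) => {
  // This callback fires per connection; no extra thread is created per client.
  socket.on("data", (chunk) => {
    socket.write(`echo: ${chunk}`);
  });
});

server.listen(3000, () => console.log("echo server listening on port 3000"));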

Why event-driven

This calls for a brief detour through the evolution of server architecture models:

Step1: synchronous

The earliest servers used a synchronous execution model: only one request is served at a time, and all other requests wait their turn, so every request except the current one is delayed. Its processing capacity is quite low. Assuming each response takes a stable N seconds, the QPS of such a service is 1/N; for example, if each request takes 0.1 seconds, the QPS is 10. (QPS, queries per second, is the number of queries a server can handle per second, a measure of how much traffic a query server can sustain in a given period.)

Step2: process replication

To address the concurrency problem of the synchronous architecture, we can serve more requests and more users at the same time by replicating the process.

This requires one process per connection: 100 connections would need 100 processes, which is also unacceptable, because replicating a process copies a lot of data and startup is slow.

Later, to solve the slow-start problem, we turned to pre-forking: a certain number of processes are copied in advance and then reused, avoiding the overhead of repeated process creation and destruction. However, this model still does not scale: once concurrency gets too high, memory is exhausted as the number of processes grows. Assume a server built on replication or pre-replication hits its resource limit at a maximum of M processes; the QPS of this service is M/N.

Step3: Multithreading

To solve the waste in process replication, the multithreaded service model was introduced.

Each thread serves one request. Threads have low overhead relative to processes and can share data between them, and thread pools can further reduce the cost of creating and destroying threads. In short, multithreading handles concurrency better than multi-processing, but each thread's stack still occupies a certain amount of memory.

In addition, since a CPU core can only do one thing at a time, the operating system divides the CPU into time slices to share it evenly. But when the kernel switches threads it must also switch thread context, and when there are too many threads, time is wasted on context switching.

So the multithreaded architecture is still not scalable enough for large amounts of concurrency. If we ignore the context-switching overhead and assume each thread occupies 1/L of a process's resources, its QPS is M * L / N, still subject to the resource ceiling.

This thread-per-request approach is the server model used by Apache.

Step4: event-driven

When concurrency rises into the tens of thousands, the memory consumption of multithreading becomes apparent. To handle such high concurrency, the event-driven service model appeared. Both Node and Nginx are built on the event-driven model, using a multi-process/single-thread design to avoid unnecessary memory and context-switching overhead.

Event-driven architecture is based on a common pattern in software development known as publish-subscribe, or the observer pattern. A topic is like an FM radio station broadcasting to any observer interested in what it has to say; it makes no difference whether there is one observer or a hundred, as long as the topic has something to broadcast.
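A minimal sketch of this publish-subscribe idea using Node's built-in EventEmitter (the topic name "news" is made up for illustration):

import { EventEmitter } from "node:events";

const radio = new EventEmitter();

// Any number of observers can subscribe to the same topic.
radio.on("news", (msg: string) => console.log("listener A heard:", msg));
radio.on("news", (msg: string) => console.log("listener B heard:", msg));

// The subject broadcasts without caring how many observers there are.
radio.emit("news", "a new broadcast");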

In summary, Nginx's architecture outperforms the traditional multithreaded architecture, especially under high concurrency. Apache uses a synchronous multithreaded model with one thread per connection; Nginx is asynchronous, and multiple connections (on the order of tens of thousands) can map to a single process.

Nginx process structure

Nginx is multi-process rather than multi-threaded because threads share the same address space: a failure in a third-party module could easily bring down the entire service, so a multithreaded design cannot maintain high availability.

Nginx mainly maintains a parent (master) process, which does not handle requests itself but monitors and manages the other child processes. There are two kinds of child process: cache processes and worker processes (a rough Node analogy follows the list below).

  • Requests are sent and received inside worker processes, and each worker can maintain many requests. The parent process monitors the workers to decide whether they need reloading or hot deployment.
  • The cache is actually shared among the worker processes. The cache manager manages the cache, and the cache loader loads it. We match the number of worker processes to the number of CPUs to make better use of the CPU cache.
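As an analogy only (not Nginx itself), Node's cluster module expresses the same master/worker idea: a primary process that forks and supervises workers, each worker serving requests on its own. The port and worker count here are illustrative:

import cluster from "node:cluster";
import { cpus } from "node:os";
import http from "node:http";

if (cluster.isPrimary) {
  // The primary process handles no requests; it only forks and monitors workers,
  // restarting any that exit, much like the Nginx master process.
  for (let i = 0; i < cpus().length; i++) cluster.fork();
  cluster.on("exit", () => cluster.fork());
} else {
  // Each worker process serves requests independently.
  http
    .createServer((_req, res) => res.end(`handled by pid ${process.pid}`))
    .listen(8080);
}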


Load balancing in Nginx

What is load balancing

Load balancing is a key component of high-availability architecture. It improves both performance and availability by spreading traffic across multiple servers, and having multiple servers removes the single point of failure in that tier. Nginx can load balance at the HTTP layer, and it can also forward transport-layer TCP and UDP traffic. Forwarding at the application layer is commonly called layer-7 load balancing; forwarding at the transport layer is layer-4 load balancing.

  • Layer-4 load balancing works at the transport layer of the OSI model and forwards traffic to application servers by rewriting the address information of the packets received from clients. This kind of load balancer does not handle the application protocol; examples include Nginx, LVS, and F5.
  • Layer-7 load balancing works at the application layer of the OSI model. Because it must parse application-layer traffic, a layer-7 load balancer needs a complete TCP/IP stack behind the client connection: it establishes a full connection with the client, parses the application-layer request, selects an application server according to its scheduling algorithm, and then establishes another connection to that server to send the request. Its main job is therefore proxying. Beyond layer-4 capabilities, a layer-7 device can analyze application-layer information such as the HTTP URI or cookies to make balancing decisions. Examples include Nginx, Apache, and HAProxy.

Generally, LVS handles layer-4 load balancing and Nginx handles layer-7.
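To make the layer-4 versus layer-7 distinction concrete, here is a rough Node/TypeScript sketch (ports and addresses are made up): the layer-4 style just splices two TCP sockets together without understanding HTTP, while the layer-7 style terminates HTTP, sees the URI and headers, and opens a second connection to the upstream, which is what makes routing, caching, and error interception possible:

import net from "node:net";
import http from "node:http";

// Layer-4 style: forward raw bytes; no knowledge of HTTP, no error interception.
net
  .createServer((client) => {
    const upstream = net.connect(3001, "127.0.0.1");
    client.pipe(upstream).pipe(client);
  })
  .listen(8004);

// Layer-7 style: parse the request, then proxy it over a new upstream connection.
http
  .createServer((req, res) => {
    const proxyReq = http.request(
      { host: "127.0.0.1", port: 3001, path: req.url, method: req.method, headers: req.headers },
      (proxyRes) => {
        res.writeHead(proxyRes.statusCode ?? 502, proxyRes.headers);
        proxyRes.pipe(res);
      }
    );
    // Because layer 7 terminates the protocol, upstream failures can be intercepted here.
    proxyReq.on("error", () => {
      res.statusCode = 502;
      res.end("bad gateway");
    });
    req.pipe(proxyReq);
  })
  .listen(8007);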

Load Balancing classification

The layer 4 and layer 7 mentioned above are distinguished by their position in the OSI reference model. By that hierarchy, load balancing can be roughly divided into the following types:

  • Layer-2 (data link) load balancing: the load balancer receives requests addressed to a virtual MAC address and rewrites it to the real back-end server's MAC address for the response.
  • Layer-3 (network) load balancing is based on a virtual IP address; the load balancer substitutes the real back-end IP address for the response.
  • Layer-4 (transport) load balancing receives requests on a virtual IP address plus port and then distributes and forwards them to real servers. It is simpler than layer 7. LVS, HAProxy, and F5 are typical layer-4 load balancers.
  • Layer-7 (application) load balancing receives requests based on a virtual URL, IP address, or host name and then dispatches them to the corresponding servers. The common options here are Nginx and Apache. DNS load balancing deserves a few more words ⬇️

DNS load balancing:

Because a DNS query resolves a domain name to an IP address, we can also do a layer of load balancing at the DNS level, returning different server IP addresses based on geographic information. CDNs (content delivery networks) currently rely on cooperating DNS resolution servers: by resolving the domain name to the server closest to the user's location, response speed improves.

Software load balancing

Software load balancing means the LVS and Nginx mentioned above. Layer-4 load balancing forwards requests using IP addresses and ports, while layer-7 load balancing can route requests to specific hosts based on the visiting user's HTTP headers and URL. LVS does layer-4 load balancing; layer-7 load balancing is left to Nginx.

LVS is generally used as the front-most server to absorb the pressure, with traffic then forwarded on to the various server clusters via Nginx. The advantage is that this is cheap and easy to scale out; the disadvantage is that it is not as powerful or as security-hardened as the hardware load balancing described below.

Hardware load balancing

Hardware load balancing uses dedicated network devices, similar to switches, to balance the load (for example F5 and A10).

Hardware load balancers typically provide certain firewall and security features, and their performance is very strong. The only problem is cost: they are very, very expensive. They also have essentially no elasticity, so if traffic fluctuates frequently they basically cannot scale dynamically.

Load balancing algorithms in Nginx

Next come the specific load balancing algorithms. Two aspects matter most when evaluating them: efficiency and availability.

  1. In terms of efficiency, a badly designed load balancing algorithm hurts the back end's use of its caches and wastes back-end resources. For example, if the load balancer proxies requests to application servers at random, there is a high probability that cached data has to be regenerated on every request.
  2. In terms of availability, the goal is mainly to prevent an avalanche when a server goes down. Common high-availability strategies include failover, fail-fast, parallel invocation, and timed retransmission; the idea is to keep the service responding. Suppose there are five servers and each server's cache is working properly; when one goes down and only four remain, the original cache distribution may break. The hash algorithm discussed next is a good example.
  • The hash algorithm: the user ID or IP address is used as the key, its hash value is computed, and the result is taken modulo the number of nodes (hash % n), where n is the number of server nodes. As long as the node count does not change, requests from a given client always go to the same upstream server. But if the cluster is scaled up or a server goes down, every client's result changes, and large-scale cache invalidation is likely (see the sketch after this list).
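A minimal sketch of the plain hash % n scheme and why scaling breaks it (the hash function is a toy one for illustration):

function hash(key: string): number {
  let h = 0;
  for (const c of key) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h;
}

const pick = (key: string, n: number) => hash(key) % n;

const clients = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"];
console.log(clients.map((ip) => pick(ip, 5))); // routing with 5 servers
console.log(clients.map((ip) => pick(ip, 4))); // one server removed: most keys map
                                               // to a different server, so most
                                               // per-server caches are missed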

1. Weighted round-robin

Round-robin is the most basic and most common scheduling algorithm, whether you are allocating CPUs or servers. It hands requests to the back-end servers in turn, treating every server equally and ignoring its actual connection count and current load. Weighted round-robin exists because upstream servers are not equally capable of handling the business, so the pressure should not be equal either: servers with less capacity are given a lower weight, and servers with more capacity a higher one.

The upstream block is where Nginx load balancing is configured.

  • weight: the server's weight; the default value is 1
  • max_conns: the maximum number of concurrent connections to the server
  • max_fails: the number of failed attempts after which the server is considered unavailable
  • fail_timeout: the period during which the server is considered unavailable after max_fails failures; the default is 10 seconds
upstream http_nginx {
    server 127.0.0.1:3001 weight=10;
    server 127.0.0.1:3002;
    server 127.0.0.1:3003;
}

Smooth weighted round-robin

Plain weighted round-robin actually ignores one problem: with weights designed as {2, 1, 1}, the requests Nginx receives are sent as two to server 1, one to server 2, and one to server 3, so consecutive requests keep landing on server 1. This relatively uneven distribution can briefly increase the load on server 1. This is where the smooth weighted round-robin algorithm comes in: it spreads server 1's two requests across the four-request cycle, a small optimization over weighted round-robin.

The principle is fairly simple: the current weight of each server is adjusted dynamically during forwarding, for example by lowering server 1's current weight each time server 1 is chosen. A sketch of the idea follows.
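Here is a simplified TypeScript sketch of the smooth weighted round-robin idea, with the {2, 1, 1} weights from the example above (server names are made up):

interface Peer {
  name: string;
  weight: number;
  current: number;
}

const peers: Peer[] = [
  { name: "server1", weight: 2, current: 0 },
  { name: "server2", weight: 1, current: 0 },
  { name: "server3", weight: 1, current: 0 },
];

function next(): Peer {
  const total = peers.reduce((sum, p) => sum + p.weight, 0);
  let best = peers[0];
  for (const p of peers) {
    p.current += p.weight; // every peer gains its weight each round
    if (p.current > best.current) best = p;
  }
  best.current -= total; // the chosen peer "pays back" the total weight
  return best;
}

console.log(Array.from({ length: 4 }, () => next().name));
// -> [ "server1", "server2", "server3", "server1" ]
// interleaved, instead of server1, server1, server2, server3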

Virtual Node Smooth Weighted Round-Robin

There are further optimizations to smooth weighted round-robin. When SWRR adjusts a weight, traffic suddenly concentrates on the server whose weight was raised and its QPS spikes; moreover, SWRR's linear O(n) selection cost degrades performance in large-scale back-end scenarios. VNSWRR (virtual node smooth weighted round-robin) was therefore introduced: it keeps SWRR's smooth, dispersed behavior while maintaining a stable cost, essentially by initializing virtual server nodes in batches to reduce short-term computation.

upstream backend {
    vnswrr;  # enable the VNSWRR load balancing algorithm
    server 127.0.0.1:81;
    server 127.0.0.1:82 weight=2;
    server 127.0.0.1:83;
    server 127.0.0.1:84 backup;
    server 127.0.0.1:85 down;
}

2. Hash algorithms (ip_hash | hash)

The weighted round-robin algorithms above make essentially no use of caching: polling means that whatever the client sends is handed to the upstream servers in sequence, and the downstream requests carry no marker that would route them back to the same upstream server's cache.

Hash algorithms mainly solve this, letting Nginx make full use of the cache. Here the user ID or IP address serves as the user's marker. The principle of the hash algorithm was described above; these are the specific directives:

  • ip_hash: uses the first three octets of the client's IPv4 address as the hashing key
  • hash: a general-purpose hash that can key on various values; for example, to hash on the request URI, use hash $request_uri;
upstream http_nginx {
    # ip_hash;
    # hash $request_uri;
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
    server 127.0.0.1:3003;
}

3. Least connections (least_conn)

Nginx can also send each request to the server currently handling the fewest connections, which spreads the pressure across servers. However, if the servers' capacities differ, simply routing to the server with the fewest connections can still overload a weaker server.
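A minimal sketch of least-connections selection; in practice the proxy tracks the live connection counts, so the numbers here are only illustrative:

interface Upstream {
  name: string;
  active: number; // current number of in-flight connections
}

const upstreams: Upstream[] = [
  { name: "127.0.0.1:3001", active: 12 },
  { name: "127.0.0.1:3002", active: 4 },
  { name: "127.0.0.1:3003", active: 9 },
];

const leastConn = (list: Upstream[]): Upstream =>
  list.reduce((best, u) => (u.active < best.active ? u : best));

console.log(leastConn(upstreams).name); // -> 127.0.0.1:3002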

4. Consistent hashing

Although the hash algorithm makes use of the cache, when a server goes down or the cluster is scaled, the hash result changes for a large share of requests, which is very likely to cause wide-ranging cache invalidation. Consistent hashing solves exactly this problem.

Consistent hashing also uses modulo arithmetic, but instead of taking the modulo of the number of nodes, it takes the modulo of 2^32, a fixed value, so the result space can be pictured as a ring.

If we need to scale at some point, we only need to pick a position on the ring and place the new server there. This keeps the cache largely stable: when a new server node is added to the ring, only the user keys that fall within a small arc of the ring have to recalculate where their cache lives.

worker_processes 1;

http {
    upstream test {
        consistent_hash $request_uri;
        server 127.0.0.1:9001 id=1001 weight=3;
        server 127.0.0.1:9002 id=1002 weight=10;
        server 127.0.0.1:9003 id=1003 weight=20;
    }
}
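To make the ring idea concrete, here is a minimal TypeScript sketch of a consistent-hash ring (no virtual nodes, toy hash function): keys and nodes are hashed onto the same 0..2^32 ring, and a key is served by the first node clockwise from its position, so adding or removing one node only remaps the keys in that node's arc:

function hash32(s: string): number {
  let h = 0;
  for (const c of s) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h; // already within [0, 2^32)
}

class HashRing {
  private ring: { point: number; node: string }[] = [];

  add(node: string): void {
    this.ring.push({ point: hash32(node), node });
    this.ring.sort((a, b) => a.point - b.point);
  }

  locate(key: string): string {
    const p = hash32(key);
    const hit = this.ring.find((e) => e.point >= p) ?? this.ring[0]; // wrap around
    return hit.node;
  }
}

const ring = new HashRing();
["10.0.0.1:9001", "10.0.0.2:9002", "10.0.0.3:9003"].forEach((n) => ring.add(n));
console.log(ring.locate("/product/42")); // stays stable when unrelated nodes change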

The cache

Client cache

The main advantage of the browser cache is that, while it is fresh, there is no network cost at all. Even when the strong cache expires, the negotiated cache lets the upstream decide whether anything has changed; if not, the upstream server does not need to return the full data and simply tells the downstream client via a 304 that its cached copy is still usable.

Browser caching reduces traffic consumption to some extent, but the downside is that every client keeps its own cache: the browser cache only benefits a single user's device.
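A minimal sketch of the negotiated-cache exchange using Node's http module (the timestamp and port are illustrative): if the client's If-Modified-Since matches, the server answers 304 with no body and the browser reuses its local copy:

import http from "node:http";

const lastModified = new Date("2023-01-01T00:00:00Z").toUTCString();

http
  .createServer((req, res) => {
    if (req.headers["if-modified-since"] === lastModified) {
      res.writeHead(304); // not modified: the client keeps using its cache
      return res.end();
    }
    res.writeHead(200, {
      "Last-Modified": lastModified,
      "Cache-Control": "max-age=60", // strong cache for 60 seconds
    });
    res.end("full response body");
  })
  .listen(8080);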

Caching in Nginx

If the user goes through the strong and negotiated caches and still finds no usable result, Nginx will send a request to the upstream server. Before doing so, however, Nginx first checks its local cache for an unexpired copy of the upstream server's response. Compared with browser caching, the Nginx cache improves the experience of every user whose traffic passes through that Nginx node, and it shields the upper-level servers from part of the traffic. Since the request still has to reach Nginx, though, some network cost is unavoidable for the user.

# Syntax: expires [modified] time | epoch | max | off;
server {
  # Set the cache to the maximum, e.g.
  #   Expires: Thu, 07 Mar 2042 22:01:05 GMT
  #   Cache-Control: max-age=315360000
  expires max;

  # Do not add caching headers
  expires off;

  # Mark the cache as already expired
  expires epoch;

  # Set a specific cache time (e.g. 100 seconds)
  expires 100;
}

Nginx stores cached content on disk while keeping the file metadata in memory. To keep the cache from occupying the server's disk and memory indefinitely, Nginx uses LRU eviction to discard entries that have not been accessed for a long time (a minimal sketch of the LRU idea follows the list below). Two cache-related processes come into play at this step:

  1. The cache manager handles cache eviction on disk; it too is supervised by the parent process.
  2. The cache loader loads the cache metadata from disk into shared memory.
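A minimal TypeScript sketch of the LRU idea itself (not Nginx's actual implementation): a Map preserves insertion order, so the first key is always the least recently used one:

class LruCache<K, V> {
  private store = new Map<K, V>();

  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    const value = this.store.get(key);
    if (value !== undefined) {
      // Re-insert to mark the entry as most recently used.
      this.store.delete(key);
      this.store.set(key, value);
    }
    return value;
  }

  set(key: K, value: V): void {
    if (this.store.has(key)) this.store.delete(key);
    this.store.set(key, value);
    if (this.store.size > this.capacity) {
      // Evict the least recently used entry, like the cache manager trimming
      // the cache once it grows past max_size.
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
  }
}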
# proxy_cache zone | off;                    the cache namespace
# proxy_cache_path path keys_zone=name:size  the file path and parameter settings for the cache
# proxy_cache_valid [code ...] time;         which responses are cached, and for how long

http {
  proxy_cache_path  /nginx-demo/tempcache levels=2:2 keys_zone=cache_test:2m loader_threshold=300 loader_files=200 max_size=200m inactive=1m;
  server {
    proxy_cache         cache_test;
    proxy_cache_valid   200 1m;
    proxy_cache_methods GET;
    location / {
      proxy_pass http://localhost:8091;  # the upstream server
    }
  }
}

Specifically, let's go through the Nginx configuration related to caching. Many directives are involved here; it is best to compare them directly against the ngx_http_proxy_module documentation.

  • proxy_cache zone | off (contexts: http, server, location): the cache namespace. Default: off
  • proxy_cache_path path [levels=levels] [use_temp_path=on|off] keys_zone=name:size [inactive=time] [max_size=size] [manager_files=number] [manager_sleep=time] [manager_threshold=time] [loader_files=number] [loader_sleep=time] [loader_threshold=time] [purger=on|off] [purger_files=number] [purger_sleep=time] [purger_threshold=time] (context: http): the cache path on disk and the parameters that control the cache processes
  • proxy_cache_key string (contexts: http, server, location): the cache key, i.e. how a cached entry is looked up. Default: $scheme$proxy_host$request_uri
  • proxy_cache_valid [code ...] time (contexts: http, server, location): which responses are cached, and for how long
  • proxy_no_cache string (contexts: http, server, location): conditions under which the response is not saved to the cache
  • proxy_cache_bypass string (contexts: http, server, location): conditions under which the cache is not consulted for a request
  • proxy_cache_methods GET | HEAD | POST (contexts: http, server, location): which request methods have their responses cached

The key directive is proxy_cache_path; some of its parameters are listed here:

  • path: the disk location where cache files are stored
  • keys_zone: the name of the shared memory zone
  • size: the size of the shared memory zone
  • levels: the directory depth of the cache files; it should not be too deep
  • use_temp_path: the directory for storing temporary files
  • inactive: how long an entry may go unaccessed before it is flushed (LRU)
  • max_size: the maximum size of the cache files; exceeding it also triggers LRU eviction
  • manager_files: the maximum number of files the cache manager process discards in one pass, to prevent excessive resource usage
  • manager_sleep: how long the cache manager sleeps after a flush pass, to avoid high-frequency eviction
  • manager_threshold: the maximum duration of one cache manager pass
  • loader_files: the maximum number of files the cache loader loads from the disk cache into shared memory in one pass, to prevent excessive resource usage
  • loader_sleep: how long the cache loader sleeps after a loading pass
  • loader_threshold: the maximum time the cache loader may spend loading in one pass

The caching workflow that follows is relatively easy to understand. Caching is an integral part of the Internet: it is by adding layers of cache that our services support high concurrency.