Why use a distributed cache? Consider a high-concurrency scenario such as Taobao's Double 11 flash sale, where hundreds of millions of users pour into the site within a few minutes. If these visits are not intercepted, a flood of read and write requests lands directly on the database, and because disk processing speed and memory speed are not on the same order of magnitude, the servers go down. To reduce the pressure on the database and improve the system's response time, a caching layer is added in front of the database. When the access pressure grows even greater, a CDN steps in to intercept requests for static assets such as images before they even reach the cache.

In addition, the memory and capacity of a single machine are limited. If only local caches are used, the same data ends up stored in multiple copies on different nodes, wasting a great deal of memory. This is why distributed caches were born.

Typical application scenarios for distributed caching:

(1) Page caching: caching fragments of web page content, including HTML, CSS, and images.

(2) Application object caching: serving as the second-level cache of an ORM framework, relieving database load and speeding up application access.

(3) Session synchronization and state caching in distributed web deployments: this covers session state and the state data of horizontally scaled applications. Such data is hard to recover and has high availability requirements, so it is usually kept in high-availability clusters.

(4) Parallel processing, which usually involves sharing large amounts of intermediate computation results.

(5) Cloud platforms that provide distributed caching as a service.

Comparing Redis and Memcached:

1. Redis supports not only simple key/value data but also list, set, zset (sorted set), hash, and other data structures, as the sketch below illustrates. Memcached supports only simple data types, leaving clients to handle complex objects themselves.
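A minimal sketch of these data structures, assuming a local Redis server and the redis-py client (both assumptions for illustration, not part of the original text):

```python
import redis

# Assumes a Redis server on localhost:6379 and the redis-py client.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

r.set("page:home", "<html>...</html>")             # plain key/value
r.rpush("recent:orders", "o1", "o2", "o3")         # list
r.sadd("online:users", "alice", "bob")             # set
r.zadd("leaderboard", {"alice": 120, "bob": 95})   # sorted set (zset)
r.hset("user:1", mapping={"name": "alice", "age": "30"})  # hash

print(r.zrange("leaderboard", 0, -1, withscores=True))
```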

2. Redis supports persistence: it can write in-memory data to disk and reload it for use when restarting (PS: persistence via RDB snapshots or the AOF log). A sketch of triggering these from a client follows.
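A minimal sketch, assuming a local Redis server with persistence enabled and the redis-py client:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

r.set("session:42", "some state")

# Ask the server to snapshot the dataset to an RDB file in the background.
r.bgsave()

# If AOF is enabled, the append-only file can be compacted in the background.
r.bgrewriteaof()

# Timestamp of the last successful RDB save.
print(r.lastsave())
```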

3. Because Memcached has no persistence mechanism, all cached data is lost when the process goes down. With persistence configured, Redis automatically reloads the data that existed at shutdown back into the cache after a restart, providing a better disaster recovery mechanism.

4. Memcached relies on the client side (or a proxy such as Magent) for consistent-hash distribution, while Redis supports server-side distribution (PS: Twemproxy/Codis/Redis Cluster). A consistent-hash sketch follows.
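To make the client-side approach concrete, here is a minimal consistent-hash ring in plain Python (the node and key names are made up for illustration):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring, as a Memcached client might use."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas   # virtual nodes per server
        self.ring = []             # sorted list of hash positions
        self.node_at = {}          # hash position -> node
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            bisect.insort(self.ring, h)
            self.node_at[h] = node

    def get_node(self, key):
        h = self._hash(key)
        idx = bisect.bisect(self.ring, h) % len(self.ring)
        return self.node_at[self.ring[idx]]

ring = ConsistentHashRing(["cache-a:11211", "cache-b:11211", "cache-c:11211"])
print(ring.get_node("user:1"))  # the same key always maps to the same node
```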

5. Memcached imposes simple limits on keys and values: a key can be at most 250 characters, and a stored value cannot exceed 1 MB by default (the configuration can be modified to allow larger values), because 1 MB is the maximum size of a typical slab; it is therefore poorly suited to storing large objects. Redis keys and string values can each be up to 512 MB.

6. Redis uses a single-threaded model, so commands are executed serially and data is committed in order. Memcached needs CAS to ensure data consistency. Check and Set (CAS) is a mechanism for ensuring concurrent consistency and belongs to the category of optimistic locking. The principle is simple: read the value together with its version number, perform the operation, then compare the version number on write-back; if it is unchanged, commit, otherwise do nothing (and typically retry). A sketch follows.
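A self-contained sketch of the check-and-set idea (an in-memory stand-in for a Memcached server, purely for illustration):

```python
import threading

class VersionedStore:
    """Toy store demonstrating CAS: every value carries a version number."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (value, version)

    def gets(self, key):
        with self._lock:
            return self._data.get(key, (None, 0))

    def cas(self, key, value, expected_version):
        with self._lock:
            _, version = self._data.get(key, (None, 0))
            if version != expected_version:
                return False              # someone else wrote first
            self._data[key] = (value, version + 1)
            return True

store = VersionedStore()

def increment(key):
    while True:                           # optimistic retry loop
        value, version = store.gets(key)
        if store.cas(key, (value or 0) + 1, version):
            return

increment("counter")
increment("counter")
print(store.gets("counter"))  # (2, 2)
```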

On CPU utilization: because Redis uses only one core while Memcached can use multiple cores, Redis performs better than Memcached per core on average when storing small data. For data over 100 KB, Memcached performs better than Redis.

7. Memcached memory management uses Slab Allocation. The principle is fairly simple: pre-allocate groups of fixed-size chunks, then store each item in the smallest chunk that fits. This avoids external memory fragmentation (at the cost of some internal waste within a chunk). By default, each Memcached slab class is 1.25 times the size of the previous one. The arithmetic below makes this concrete.
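A small sketch of how the slab class sizes grow (the 1.25 growth factor and the 1 MB cap come from the text; the 96-byte starting chunk is an assumption for illustration):

```python
# Sketch: slab class sizes growing by the 1.25 factor described above.
# The 96-byte starting chunk is an assumption; the 1 MB cap matches
# the default maximum item size from the text.
GROWTH_FACTOR = 1.25
MAX_ITEM = 1024 * 1024  # 1 MB

size = 96
slab_classes = []
while size <= MAX_ITEM:
    slab_classes.append(size)
    size = int(size * GROWTH_FACTOR)

print(slab_classes[:5])   # [96, 120, 150, 187, 233]
print(len(slab_classes))  # number of slab classes

# An item is stored in the smallest chunk that fits, so a 100-byte
# item would occupy a 120-byte chunk (20 bytes of internal waste).
```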

Redis memory management: Redis keeps an array that tracks all memory allocations and uses wrapped malloc/free, which is much simpler than Memcached's memory management. Since malloc searches a free list for an available block of sufficient size, memory fragmentation is comparatively high.

In practice, choosing between Memcached and Redis should be driven by the business scenario (both offer more than enough performance and stability). If a business scenario requires persistent caching, or caching that supports multiple data structures, Redis is the better choice.

(PS: Redis's clustering solutions are also superior: Memcached clusters rely on client-side consistent hashing, whereas Redis offers a decentralized server-side cluster solution.)

To sum up: Redis supports more business scenarios and is the better choice for a caching system.

Common problems and challenges of distributed caching

1. Cache avalanche. A cache avalanche occurs when old cache entries expire before new ones are in place. For example, if we set the same expiration time on many keys, a large swath of the cache expires at the same moment; every request that should have hit the cache instead queries the database, putting enormous pressure on database CPU and memory and potentially bringing the database down. This triggers a chain reaction that can make the whole system collapse.
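Since the cause named above is identical expiration times, one widely used mitigation (an addition here, not from the original text) is to add random jitter to each key's TTL so that keys do not all expire together; a minimal sketch with redis-py:

```python
import random
import redis

r = redis.Redis(host="localhost", port=6379)

BASE_TTL = 600   # 10 minutes (assumed base TTL)
JITTER = 120     # up to 2 extra minutes, so keys don't expire together

def cache_set(key, value):
    # Each key gets a slightly different TTL, avoiding a mass expiry.
    r.set(key, value, ex=BASE_TTL + random.randint(0, JITTER))

cache_set("product:1", "...")
cache_set("product:2", "...")
```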

2. Cache penetration. Cache penetration refers to queries for data that exists neither in the database nor, consequently, in the cache. Because nothing is found, nothing gets cached, so the query misses the cache and falls through to the database on every single request, then returns null (effectively two useless lookups each time). These requests bypass the cache and go straight to the database, which is a common cause of poor cache hit ratios. The read-through sketch below shows where this happens.
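A minimal read-through sketch showing where penetration occurs (db_lookup and the key names are hypothetical placeholders):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def db_lookup(key):
    """Hypothetical database query; returns None for nonexistent rows."""
    return None  # imagine a SELECT that finds nothing

def read_through(key):
    value = r.get(key)
    if value is not None:
        return value            # cache hit
    value = db_lookup(key)      # cache miss: query the database
    if value is None:
        # Nothing to cache, so the NEXT request for this key will
        # miss the cache and hit the database again: penetration.
        return None
    r.set(key, value, ex=600)
    return value

read_through("user:does-not-exist")  # hits the database every time
```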

3. Cache preheating. Cache preheating (warm-up) should be a fairly familiar concept: when the system goes online, the relevant data is loaded into the cache system up front. This avoids the pattern where a user's first request queries the database and only then populates the cache; instead, users directly query data that was preheated in advance. A warm-up sketch follows.
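A minimal warm-up sketch run at startup (load_hot_items is a hypothetical query for the data worth preloading):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def load_hot_items():
    """Hypothetical DB query returning (key, value) pairs to preload."""
    return [("product:1", "..."), ("product:2", "...")]

def preheat_cache():
    # Run once at deployment/startup, before user traffic arrives.
    for key, value in load_hot_items():
        r.set(key, value, ex=3600)

preheat_cache()
```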

4. Cache update. In addition to the cache server's built-in invalidation policies, we can also evict cache entries according to specific business requirements. There are two common strategies:

(1) Periodically clean up expired cache entries;

(2) When a user request arrives, check whether the cache entry it uses has expired; if so, fetch fresh data from the underlying system and update the cache (see the sketch after this list).

Both have their pros and cons: the drawback of the first is that maintaining a large number of cache keys gets troublesome; the drawback of the second is that every user request has to check for staleness, making the logic relatively complex. Weigh which solution to use against your own application scenario.
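A minimal sketch of strategy (2), checking staleness on each request (the fetch_from_source helper and the 300-second freshness window are assumptions):

```python
import json
import time
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
MAX_AGE = 300  # seconds a cached entry is considered fresh (assumption)

def fetch_from_source(key):
    """Hypothetical call to the underlying system."""
    return "fresh value"

def get_with_staleness_check(key):
    raw = r.get(key)
    if raw is not None:
        entry = json.loads(raw)
        if time.time() - entry["cached_at"] < MAX_AGE:
            return entry["value"]            # still fresh
    # Expired or missing: refresh from the underlying system.
    value = fetch_from_source(key)
    r.set(key, json.dumps({"value": value, "cached_at": time.time()}))
    return value
```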

5. Cache degradation. When traffic surges, when a service has problems (such as slow or unresponsive response times), or when non-core services threaten the performance of the core flow, the service must still be kept available, even in a degraded form. The system can degrade automatically based on key metrics, or be degraded manually via configuration switches.

The ultimate goal of a downgrade is to keep the core service available, even if it is lossy. Note that some services cannot be degraded at all (such as add to cart and checkout).

Before degrading, the system should be audited to see whether it can sacrifice pawns to protect the king: sort out what must be protected at all costs and what can be degraded. For example, the levels can be modeled on log level settings (a switch sketch follows the list):

(1) General: for example, some services occasionally time out because of network jitter or because a service is just coming online; these can be degraded automatically.

(2) Warning: some services see their success rate fluctuate over a period of time (for example, between 95% and 100%); they can be degraded automatically or manually, and an alarm should be sent.

(3) Error: for example, availability drops below 90%, the database connection pool is exhausted, or traffic suddenly jumps to the maximum threshold the system can withstand; degrade automatically or manually as the situation demands.

(4) Serious error: for example, data is corrupted for some special reason; an emergency manual downgrade is needed.
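A minimal sketch of a degradation switch along these lines (the service names, thresholds, and the switch mechanism are all assumptions for illustration):

```python
import threading

class DegradeSwitch:
    """Per-service degradation flags: manual override plus a success-rate rule."""

    def __init__(self):
        self._lock = threading.Lock()
        self._manual = {}   # service -> forced degraded flag
        self._stats = {}    # service -> (success_count, total_count)

    def force(self, service, degraded):
        with self._lock:
            self._manual[service] = degraded   # operator flips a switch

    def record(self, service, ok):
        with self._lock:
            s, t = self._stats.get(service, (0, 0))
            self._stats[service] = (s + (1 if ok else 0), t + 1)

    def is_degraded(self, service):
        with self._lock:
            if self._manual.get(service):
                return True
            s, t = self._stats.get(service, (0, 0))
            # Automatic rule: degrade when success rate falls below 95%.
            return t >= 100 and s / t < 0.95

switch = DegradeSwitch()

def recommendations(user_id):
    # Non-core service: return a cheap fallback when degraded,
    # so the core flow (e.g. checkout) stays fast and available.
    if switch.is_degraded("recommendations"):
        return []                # lossy but available
    return ["..."]               # normal expensive path

switch.force("recommendations", True)
print(recommendations(42))       # -> []
```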