
As one of the largest social media sites in China, Weibo handles hundreds of millions of user requests every day, consuming enormous compute, memory, network, and I/O resources behind the scenes. Moreover, because of the nature of the product, holidays and trending events can bring sudden traffic peaks of several times, or even more than ten times, normal load. This places strict requirements on the infrastructure underpinning Weibo, which must deliver:

  1. Hundreds of thousands of user requests per second
  2. Real-time data updates
  3. Low response times for service requests
  4. 99.99%+ service availability

To keep pace with business growth, the Weibo platform has built a high-performance, highly available CacheService architecture to support its online business systems. But, as the Chinese saying goes, “three feet of ice does not form in a single day”: Weibo's Cache architecture reached this point through a continuous evolution from nothing.

MySQL-based Web architecture

In the early days of Weibo, traffic was relatively light. A simple architecture built directly on a database (MySQL) was enough to meet business needs, and development was straightforward. A schematic of this simple architecture is shown below:

As Weibo gained traction and celebrity users joined, the user base and traffic grew rapidly. The simple MySQL-based architecture began to struggle and system response slowed, because MySQL is a persistent storage solution whose reads and writes go through disk. Although MySQL has a buffer pool, it cannot be tuned at a fine granularity to match service characteristics. In Weibo's workload, a single MySQL server with SAS disks can sustain only a few thousand requests per second, far short of Weibo's request volume.

Web architecture based on single-layer Cache+MySQL

There are several solutions to this problem:

  1. Restructure the business architecture. In this scenario, however, the feasibility of this approach was low.
  2. Scale out with MySQL read replicas. This could absorb the request volume, but at a high cost, and even then response times would fall short of expectations, because disk reads are slow: a typical 15,000 RPM SAS disk has an average rotational latency of about 2 ms (half a revolution, i.e. 0.5 × 60 s / 15000 ≈ 2 ms), before seek and transfer time.
  3. Put a Cache layer in front of MySQL, cache the hot data, and serve requests from a Cache+MySQL architecture.


Weighing the scope of the changes against the cost, solution 3 was the best fit for Weibo's business scenario. But what type of Cache should be used?

Common Cache solutions are:

  1. Local Cache: embed a local cache inside the Web application. Access is fastest, but there are obvious drawbacks: it is hard to keep updates consistent across instances, so its applicability is limited.
  2. Stand-alone remote Cache: deploy a separate remote Cache service that applications talk to over the network. To handle horizontal scaling and failover, data routing is usually implemented on the client side.
  3. Distributed Cache: the Cache service itself is a large cluster shared by many applications, providing distributed features such as horizontal scaling, failover, and data consistency out of the box.

Weighing system simplicity against the fit for Weibo's scenario, we ultimately chose option 2, building Weibo's Cache solution on open-source Memcached.

Memcached is a high-performance cache server that stores key-value data in memory, supports LRU eviction and data expiration, manages memory in slabs, offers a simple protocol with operations such as SET, GET, and DELETE, and has been widely proven in industry. The server itself is stand-alone; distribution is implemented on the client side. Multiple Memcached nodes are deployed, and the client shards data across them using consistent hashing (or another hashing strategy) to locate the node for each key and exchange data with it. When a node fails, the client removes it from the ring and its requests are redistributed to the remaining nodes. In other words, failover and scaling are handled by the client.
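A minimal sketch of this client-side routing, assuming MD5 hashing and 160 virtual points per physical node (real client libraries use ketama-style rings that differ in detail):

```python
import bisect
import hashlib

class ConsistentHashRouter:
    """Client-side consistent hashing over Memcached nodes (a sketch)."""

    def __init__(self, nodes, vnodes=160):
        self.vnodes = vnodes   # virtual points per node smooth the distribution
        self.ring = {}         # hash point -> physical node
        self.points = []       # sorted hash points
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self.ring[point] = node
            bisect.insort(self.points, point)

    def remove_node(self, node):
        # Called when failure is detected: only this node's keys remap.
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            del self.ring[point]
            self.points.remove(point)

    def get_node(self, key):
        # Walk clockwise to the first point at or after the key's hash.
        idx = bisect.bisect(self.points, self._hash(key)) % len(self.points)
        return self.ring[self.points[idx]]

router = ConsistentHashRouter([f"mc{i}:11211" for i in range(1, 6)])
print(router.get_node("user:12345:feed"))  # route a key to one of 5 nodes
router.remove_node("mc3:11211")            # on failure, ~1/5 of keys move
```

With plain modulo hashing, removing a node would remap almost every key; with the ring, only the failed node's share of keys moves to other nodes.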

After a honeymoon period, this architecture gradually ran into problems.

  • Transient penetration peaks when a node goes down

    For example, suppose five Memcached nodes are deployed and keys are consistently hashed across them. If one node fails, roughly 20% of cache requests will miss and fall through to back-end resources (such as the DB). For Weibo, the cache hit ratio of most core resources is 99%, and a single resource group may serve more than 1,000,000 (100W) QPS. A 20% penetration at that scale means the back end suddenly has to absorb over 200,000 requests per second, far too much pressure for it to bear.

  • A single resource group needs too many nodes

    Weibo's Feed service is a heavy consumer of Cache resources, drawing hundreds of thousands of QPS and bandwidth above the GB (byte) level, which takes at least a dozen or so Memcached nodes to absorb. But spreading keys across that many nodes weakens multiget performance: a pull-and-aggregate touches many nodes, and the response time of a multiget is bounded by its slowest node, so the service's SLA cannot be met. The sketch below illustrates this fan-out.
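Here, `fetch_fn(node, keys)` is a hypothetical stand-in for a real Memcached multi-get call, and `router` is anything with a `get_node(key)` method, such as the ring sketched earlier:

```python
import concurrent.futures
from collections import defaultdict

def multiget(router, keys, fetch_fn, timeout=0.05):
    """Fan a multiget out to the nodes owning each key, merge the results."""
    if not keys:
        return {}
    by_node = defaultdict(list)
    for key in keys:
        by_node[router.get_node(key)].append(key)   # group keys per node

    results = {}
    with concurrent.futures.ThreadPoolExecutor(len(by_node)) as pool:
        futures = [pool.submit(fetch_fn, node, node_keys)
                   for node, node_keys in by_node.items()]
        # Even when issued in parallel, the call completes only when the
        # slowest node answers: the tail-latency effect described above.
        for future in concurrent.futures.as_completed(futures, timeout=timeout):
            results.update(future.result())
    return results
```

The more nodes a multiget spans, the higher the chance that at least one of them is momentarily slow, which is why piling on nodes hurts rather than helps.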

  • Cache scaling and node replacement are too cumbersome

    Real-time dynamic scaling is essential for a service like Weibo, which sees abnormal peaks (often several times or tens of times normal load) during trending events and holidays. Because the Memcached nodes instantiated by each client are relatively fixed, scaling requires:

    1. An online code change plus a node configuration change. If many application systems depend on a given resource group, such as the low-level authentication resources, every one of those business systems has to be changed. This is no small undertaking, and in an emergency it makes the operation slow.
    2. Solving the read/write consistency problem: while some service systems are reading Cache data and others are writing it, it is difficult to switch all of their configurations at the same moment.
    3. Replacing old nodes with new ones (for example, swapping out a physical machine), which runs into the same problems.
  • O&M problems caused by resource sprawl

    Cache resource groups are requested per service, so as services multiply, the resource groups do too, making them hard to manage and adjust. As time passes, some of the older, neglected resource groups become even harder to operate and maintain.

  • The Cache architecture demands too much expertise

    Being able to use a Cache and being able to use it well are two different things. If using the Cache architecture well requires deep expertise from every business team, so that improper use does not cause stability problems or wasted cost, then such problems will be routine for a team that is constantly onboarding new people. The fix is to give business applications a Cache that is simple enough to operate with basic commands such as set/get/delete, without requiring them to care about any underlying details.

Distributed CacheService architecture

To solve these problems, Weibo's Cache architecture evolved again. By turning the Cache into a service, a distributed CacheService architecture now simplifies usage for service developers and provides features such as dynamic scaling, failover, and multi-tier caching.

The CacheService architecture is as follows:

The system consists of the following modules:

  • ConfigService

    This module is built on the existing Weibo configuration service center. It is a remote service that manages static configuration and dynamic naming, and it notifies listening Config clients in real time when configuration changes.

  • Proxy layer

    Deployed as an independent application, this stateless module receives requests from the business side and forwards them to back-end Cache resources according to routing rules. It contains the following components:

    • Asynchronous event handler: manages connections, receives data requests, and writes back responses.
    • Processor: parses and processes request data.
    • Adapter: adapts to the underlying protocols, such as the MC and Redis protocols.
    • Router: routes requests to the appropriate Cache resource pool, isolating different services from one another.
    • LRU Cache: optimizes performance and offsets the degradation caused by the extra proxy hop (a network round trip); a sketch follows this section.
    • Timer: runs background tasks, such as probing the health of the underlying Cache resources.

    When the Proxy starts, it loads the list of back-end Cache resources from the ConfigService for initialization, and thereafter receives real-time notifications of configuration changes from the ConfigService.
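As a rough illustration of the proxy's LRU Cache component (the article does not give its implementation, so this is only a sketch):

```python
from collections import OrderedDict

class LruCache:
    """Small in-proxy LRU cache that absorbs the extra proxy hop (sketch)."""

    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.data = OrderedDict()   # insertion order doubles as recency order

    def get(self, key):
        if key not in self.data:
            return None             # miss: proxy forwards to the Cache pool
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the least recently used
```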

  • Cache resource pool

    This module holds the actual cached data and uses a multi-tier structure for high availability. The main-node is the primary cache node and the ha-node is its backup: when a main-node goes down, data can still be read from the ha-node instead of penetrating to back-end resources. L1-nodes mainly absorb hotspot traffic; their capacity is generally smaller than the main-node's, and multiple L1 groups are supported, making it easy to scale horizontally for higher throughput. A sketch of the resulting read path follows.
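The node objects below and their `get` method are assumptions for illustration, not the actual CacheService interfaces:

```python
import random

def tiered_get(key, l1_groups, main_node, ha_node):
    """Read path over the tiered pool: L1 first, then main, then HA backup."""
    # 1. L1 groups absorb hotspot reads; pick one group (e.g. at random).
    if l1_groups:
        value = random.choice(l1_groups).get(key)
        if value is not None:
            return value
    # 2. Fall through to the primary cache node.
    value = main_node.get(key)
    if value is None:
        # 3. On a main-node miss or failure, try the HA backup
        #    before letting the request penetrate to the DB.
        value = ha_node.get(key)
    return value   # None means the caller must hit back-end storage
```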

  • Client

    This module is the client SDK provided to business developers. It hides all underlying details and exposes only the simplest protocol interface: get, set, and delete.

    At application startup, the Client fetches the proxy node list for its namespace from the ConfigService and establishes connections to the back-end proxies. In normal protocol handling, such as a set command, the Client selects the least-loaded proxy node according to its load-balancing policy, issues the SET request, receives the proxy's response, and returns it to the caller.

    The Client rebuilds its proxy connection list whenever the ConfigService pushes a change to the proxy nodes. It also performs its own failover: if a proxy node fails, the Client removes it and periodically probes whether it has recovered, as sketched below.
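A sketch of that client-side behavior, with hypothetical names; real load measurement and health probing are more involved:

```python
import time

class ProxySelector:
    """Pick the least-loaded proxy, drop failed ones, retry them later."""

    def __init__(self, proxies, retry_interval=5.0):
        self.inflight = {p: 0 for p in proxies}  # crude per-proxy load measure
        self.failed = {}                         # proxy -> time it failed
        self.retry_interval = retry_interval

    def on_config_push(self, proxies):
        # ConfigService pushed a new proxy list: rebuild local state.
        self.inflight = {p: self.inflight.get(p, 0) for p in proxies}
        self.failed = {p: t for p, t in self.failed.items() if p in proxies}

    def pick(self):
        now = time.time()
        for proxy, failed_at in list(self.failed.items()):
            if now - failed_at > self.retry_interval:
                del self.failed[proxy]           # give it another chance
        live = [p for p in self.inflight if p not in self.failed]
        return min(live, key=self.inflight.get)  # least-loaded proxy

    def mark_failed(self, proxy):
        self.failed[proxy] = time.time()
```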

At present, the Cache layer of several Weibo platform subsystems has been migrated to CacheService, with good results in production: the cluster sustains more than 3,000,000 (300W) QPS online, with an average response time under 1 ms.

The architecture has the following characteristics:

  • High availability

    CacheService double-writes every data update to the HA node, so that when a main-node fails it can read from the ha-node instead, preventing a node failure from putting excessive pressure on back-end resources (the DB). A sketch of this write path follows.
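A sketch of the double-write just described; whether the L1 groups are also refreshed on write is an assumption here, and the node objects are the same hypothetical ones as in the read-path sketch:

```python
def tiered_set(key, value, l1_groups, main_node, ha_node):
    """Write path sketch: double-write main and HA so reads can fail over."""
    ok_main = main_node.set(key, value)
    ok_ha = ha_node.set(key, value)   # keeps the backup warm for failover
    for l1 in l1_groups:
        l1.set(key, value)            # assumed: refresh hotspot copies too
    return ok_main and ok_ha
```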

  • Horizontal scaling

    The CacheService proxy nodes are stateless, so if the proxy cluster runs into performance problems, capacity can be adjusted simply by adding or removing nodes. For the back-end Cache resources, request pressure on the main-node is shared by adding or removing L1 Cache resource groups: most requests for hotspot data land in the L1 layer, which scales easily by changing the number of L1 groups.

  • Real-time operational changes

    By integrating with the internal Config Service, O&M changes such as resource scaling and node replacement take effect within seconds.

  • Cross-datacenter support

    Weibo is deployed across multiple datacenters, and network latency and packet loss between datacenters are far higher than within a single one; for example, the latency between the Guangzhou and Beijing datacenters exceeds 40 ms. CacheService is therefore deployed across datacenters: cache reads follow a nearest-access principle, while cache updates are synchronized across all datacenters, roughly as sketched below.
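In rough terms (room discovery and the replication transport are omitted, and all names here are hypothetical):

```python
def cross_room_get(rooms, local_room, key):
    """Nearest-access read: always serve from the local room's Cache pool."""
    return rooms[local_room].get(key)

def cross_room_set(rooms, key, value):
    """Updates are applied to the Cache pool in every room."""
    for pool in rooms.values():
        pool.set(key, value)
```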

Today, Weibo's distributed CacheService architecture has simplified business development and improved the system's operability and availability. The next direction for the architecture is a lower-cost solution for the back-end Cache resources, optimized for per-machine storage capacity and peak single-machine performance. In Weibo's service scenarios the split between hot and cold data is pronounced, yet the share of long-tail requests for cold data is not small: shrinking the Cache would leave the back end unable to withstand the misses, growing it wastes money, and an all-memory solution is expensive. So one promising direction is to keep hot data in memory and use an LRU-based policy to swap cold data out to solid-state drives (SSDs), as sketched below.
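A sketch of that direction, with a `shelve` file standing in for an SSD-backed store:

```python
import shelve
from collections import OrderedDict

class HybridStore:
    """Hot keys in memory; LRU-cold keys swapped out to an SSD-backed store."""

    def __init__(self, ssd_path, mem_capacity=10000):
        self.mem = OrderedDict()
        self.mem_capacity = mem_capacity
        self.ssd = shelve.open(ssd_path)   # stand-in for a real SSD engine

    def set(self, key, value):
        self.mem[key] = value
        self.mem.move_to_end(key)
        if len(self.mem) > self.mem_capacity:
            cold_key, cold_value = self.mem.popitem(last=False)
            self.ssd[cold_key] = cold_value   # swap the coldest key to SSD

    def get(self, key):
        if key in self.mem:
            self.mem.move_to_end(key)         # a hit keeps the key hot
            return self.mem[key]
        if key in self.ssd:
            value = self.ssd[key]
            del self.ssd[key]
            self.set(key, value)              # promote back into memory
            return value
        return None
```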

Thanks to Cui Kang for reviewing this article.
