Two places and three centers, say no to "business interruption"!

With the rapid development of Internet business, the demands on IDC (Internet data center) service capacity keep rising. The "two places and three centers" scheme has become popular in the industry, and this article explains what it actually is.

"Two places" refers to the local city and a remote city. The "three centers" are the production center, the same-city disaster recovery (DR) center, and the remote DR center.

In the early stage, banks at home and abroad typically built "two places and three centers" in active/standby mode: the data centers have primary and secondary roles, and their service deployment priorities differ. In this mode the disaster response and switchover cycle is very long, RTO and RPO cannot reach zero (so service interruption cannot be avoided), resource utilization is low, and the return on investment falls short of expectations. In essence, this form of "two places and three centers" improves availability through simple resource accumulation. The gains in high availability and business continuity are only quantitative; business continuity and disaster recovery (DR) backup are not substantially improved.

At present, users in many industries, including banking, government, public transportation, and energy and power, are turning their attention to the "distributed multi-active data center".

The active-active (hypermetro) scheme has two characteristics.

First, the IDCs have equal status. They all work in normal mode and serve business traffic in parallel, which makes full use of resources and avoids leaving one or two backup centers idle, wasting resources and investment. Through resource integration, the service capacity of active-active data centers can be double, or even several times, that of the primary/standby mode. Second, if one data center suffers a fault or disaster, the other data centers keep running normally and take over key services or all services. The centers back each other up, and users are not aware of the fault.

Considering the company's current business, the IDC rooms are mainly located in XXXX and XXX, with some also deployed in the XXXX area, and the data centers are mainly located in XXX. Therefore, same-city active-active is the development direction that fits within the two-place, three-center solution.

The two-place, three-center design requires not only a distributed transformation of the database layer but also matching adaptations in the business, system, and network layers.

This section describes five solutions for two places and three centers.

Goals and plans:

The design principles are same-city active-active plus remote DR. HB30 and HB21 form the same-city pair, and IDCs in central or eastern China are used for remote DR.
The transformation design must be worked out closely with the business side, choosing a solution that fits each service scenario.
To support cross-room access, a Consul-based solution should be introduced for high-availability management of Service_name.
Data in the same-city active-active pair must be consistent. For remote DR, services may be briefly suspended but must be recoverable within 30 minutes.
Short-term and long-term goals can be set. Short-term goals make full use of open-source components and existing business scenarios, with continuous iterative improvement during implementation. Long-term goals can be more general and technically more challenging, and deliver better business results (such as geo-distributed multi-active).
To keep the plan effective, regular drills are needed.

Project introduction

In the geo-redundant solution, the combination of same-city active-active and remote DR can be chosen based on the short-term goals.

The design focuses on same-city active-active. Data centers in the same city are generally connected by high-speed optical fiber. As long as network bandwidth is guaranteed and network latency stays within an acceptable range, the two equipment rooms can be treated as being on the same LAN.

The design can be combined with the application layer. There are two deployment modes: application-layer active-active with a single-active database, and application-layer active-active with an active-active database.

1) Application-layer active-active with a single-active database:

Application-layer active-active with a single-active database: the applications in both equipment rooms serve external traffic at the same time, but only the database in one room serves reads and writes. Applications in the other room access that database across rooms, and replication between the databases is one-way. This mode works well in a same-city environment with relatively low network latency. If the distance exceeds 100 km, however, the latency between rooms can reach 2 ms or more, which significantly slows every database request made across rooms.
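A back-of-envelope calculation shows why distance matters. It assumes signals in fiber travel at about 200,000 km/s (about 0.005 ms per km one way) and ignores switching and queuing delays, so real round-trip times are higher; the 20 round trips per transaction is a hypothetical figure.

```python
# Back-of-envelope estimate of cross-room latency cost.
# Assumption: ~200,000 km/s signal speed in fiber, i.e. 0.005 ms per km one way.
# Equipment and queuing delays are ignored, so real numbers will be higher.

FIBER_MS_PER_KM = 0.005  # one-way propagation delay per km (assumed)


def fiber_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay over a fiber path of the given length."""
    return 2 * distance_km * FIBER_MS_PER_KM


def added_latency_ms(distance_km: float, round_trips: int) -> float:
    """Extra latency a transaction pays if each of its statements crosses rooms."""
    return round_trips * fiber_rtt_ms(distance_km)


if __name__ == "__main__":
    for km in (10, 50, 100):
        # Hypothetical transaction issuing 20 sequential SQL round trips.
        print(f"{km:>4} km: RTT {fiber_rtt_ms(km):.2f} ms, "
              f"20 round trips add {added_latency_ms(km, 20):.1f} ms")
```

Propagation alone costs about 1 ms per round trip at 100 km. Longer fiber routes and network equipment push real figures toward the 2 ms cited above, and a transaction that makes many sequential round trips pays that cost every time.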

Because same-city network latency is low and the two rooms can be treated as one LAN, the "application active-active + database single-active" mode lets applications access the database across rooms. If one room fails, application requests are switched to the database in the surviving room, so the failure of any single same-city data center does not affect overall business operation.
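A minimal sketch of that failover rule, assuming one database per room of which only one is active, and treating the standby as already caught up through replication. Room names and connection strings are hypothetical; in a real deployment this decision usually lives in a proxy or in service discovery rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class DbRouter:
    """Routes all application traffic to the single active database.

    Hypothetical model: each room has a database, but only one is active
    at a time; the other is a replication standby.
    """
    databases: dict        # room name -> connection string (hypothetical)
    active_room: str
    healthy: dict = field(default_factory=dict)  # room name -> bool

    def endpoint(self) -> str:
        """Return the DSN every application, in either room, should use."""
        return self.databases[self.active_room]

    def mark_down(self, room: str) -> None:
        """Record a room failure and fail over if the active DB was lost."""
        self.healthy[room] = False
        if room == self.active_room:
            survivors = [r for r in self.databases if self.healthy.get(r, True)]
            if not survivors:
                raise RuntimeError("no healthy database room left")
            self.active_room = survivors[0]
```

For example, a router created with `active_room="room_a"` returns room_a's DSN until `mark_down("room_a")` is called, after which every application is pointed at room_b.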

Because network conditions within the same city are relatively good, MySQL's native replication can meet most service scenarios. The LOGICAL_CLOCK-based parallel replication introduced in MySQL 5.7 effectively reduces replay (apply) lag on the replica in the DR room, and MGR (MySQL Group Replication) / InnoDB Cluster, introduced in 5.7.17, can meet strong data-consistency requirements.

Solution 1: MGR cluster high-availability architecture

The whole architecture is based on a distributed design, and node communication relies on a Paxos-based protocol. As the core component of InnoDB Cluster, MGR currently supports single-primary and multi-primary modes. This scheme prefers single-primary mode. A group can have up to 9 nodes, and at least 3 are needed to tolerate the failure of one node.
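The node-count rule follows from majority quorum: a Paxos-based group can only commit while a majority of its members is reachable. The sketch below shows that arithmetic; the two-room layout in the usage is hypothetical.

```python
def quorum(n: int) -> int:
    """Smallest majority of an n-member group."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return n - quorum(n)


def room_survives(members_per_room: dict, failed_room: str) -> bool:
    """True if the members outside failed_room still form a majority."""
    total = sum(members_per_room.values())
    remaining = total - members_per_room[failed_room]
    return remaining >= quorum(total)
```

A 2-node group tolerates no failure, 3 nodes tolerate 1, and 9 nodes tolerate 4. With a hypothetical layout of two members in room_a and one in room_b, losing room_b is survivable but losing room_a is not; recovering from that case needs manual intervention (for example, forcing a new membership with `group_replication_force_members`).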

The MGR-based design works as follows: at the database layer, first set the weights of the instance nodes in each room so that failover prefers a node in the same room. Only if the same room fails is the primary switched to the other room in the same city.
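The room-preference rule can be sketched as follows. On MySQL 5.7.20 and later, the real setting is the `group_replication_member_weight` system variable (0 to 100, default 50): when the primary leaves, the surviving member with the highest weight is elected, with ties broken by the lowest `server_uuid`. The model below is a simplification (the server also considers MySQL versions in mixed-version groups), and the member names and weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    uuid: str      # stands in for server_uuid (hypothetical values)
    room: str      # equipment room the member lives in
    weight: int    # mirrors group_replication_member_weight, 0-100


def elect_primary(members, failed):
    """Pick the next primary: highest weight first, then lowest uuid.

    Simplified model of MGR single-primary election; the version checks
    the real server performs are omitted.
    """
    candidates = [m for m in members if m.uuid not in failed]
    if not candidates:
        raise RuntimeError("no surviving member")
    return min(candidates, key=lambda m: (-m.weight, m.uuid))
```

Giving both room_a members higher weights than the room_b members keeps the primary in room_a when only the current primary fails. An election only takes place if the surviving members still form a majority.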

This solution has a low implementation cost and little intrusion into services, and suits users in the initial stage of cross-room DR.

2) Application-layer active-active with an active-active database:

Application-layer active-active with an active-active database: applications in both rooms serve traffic at the same time, and the databases in both rooms also serve reads and writes at the same time. Each room's applications read and write the database in their own room, the two databases replicate bidirectionally, and write conflicts are resolved by a consensus protocol. In theory this mode achieves a multi-writer database, but in real cross-room deployments, especially in service scenarios with frequent write conflicts, performance degrades sharply, so it is not suitable for cross-room OLTP systems.

In the "application active-active + database multi-active" scenario, data latency and data synchronization must be considered. First, services must be isolated, and the data-consistency goal is eventual consistency. There are currently five types of solutions.

Solution 1: MGR cluster multi-active architecture

Based on MGR's multi-primary mode, writes are replicated among multiple nodes to meet strong data-consistency requirements, and the service can degrade automatically when communication between nodes is delayed.
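In multi-primary mode, MGR resolves conflicting writes through certification: each transaction runs optimistically on its local member, its write set is broadcast and placed in a single global order, and a transaction whose rows were changed by an already-certified transaction it had not seen is rolled back. The toy model below illustrates that rule only; the version counter and row-key strings are simplifications, not MGR's internal data structures.

```python
class Certifier:
    """Toy model of write-set certification in a multi-primary group.

    Assumption: every transaction carries the set of row keys it wrote and
    the group version it had seen when it started (its snapshot), and
    transactions reach the certifier in one global order.
    """

    def __init__(self):
        self.version = 0        # number of certified transactions so far
        self.last_write = {}    # row key -> version that last wrote it

    def certify(self, write_set, snapshot_version):
        """Commit if no row was changed after the snapshot; else abort."""
        for key in write_set:
            if self.last_write.get(key, 0) > snapshot_version:
                return False    # concurrent update from another member
        self.version += 1
        for key in write_set:
            self.last_write[key] = self.version
        return True
```

If two members both update `acct:1` from the same snapshot, the first in the global order commits and the second is rolled back. Every rollback is wasted work, which is why write-conflict-heavy workloads degrade sharply across rooms.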

For this kind of scheme, multiple writers can be used within one room, while the other room in the same city is read-only.

Solution 2: Distributed data synchronization

Based on a distributed design, Syncer and Writer components can be introduced to meet multi-active service requirements across equipment rooms. Syncer and Writer act as the data publisher and consumer, respectively, and are coordinated through a distributed protocol.
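A minimal sketch of the publisher/consumer split, assuming an in-process queue stands in for the real transport. The class names follow the text, but their methods, the `ChangeEvent` fields, and the room names are hypothetical. The Writer skips events that originated in its own room (so changes do not loop back) and events it has already applied (so replays are harmless).

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    event_id: int      # globally unique, trend-increasing ID
    origin_room: str   # room where the write first happened
    key: str
    value: str


class Syncer:
    """Publisher: ships local change events to a channel (hypothetical)."""

    def __init__(self, room, channel):
        self.room = room
        self.channel = channel

    def publish(self, event):
        self.channel.put(event)


class Writer:
    """Consumer: applies remote events to the local store exactly once."""

    def __init__(self, room, channel, store):
        self.room = room
        self.channel = channel
        self.store = store
        # In-memory only; a real Writer would persist this as a checkpoint.
        self.applied = set()

    def drain(self):
        """Apply every queued event; return how many were applied."""
        applied_now = 0
        while True:
            try:
                event = self.channel.get_nowait()
            except queue.Empty:
                return applied_now
            # Skip our own writes (prevents loops) and replays (idempotence).
            if event.origin_room == self.room or event.event_id in self.applied:
                continue
            self.store[event.key] = event.value
            self.applied.add(event.event_id)
            applied_now += 1
```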

There are three key techniques:

1) Data operations are identified by a distributed ID, which uniquely locates each operation and increases over time.

2) Stability of the synchronization component. The synchronization component can be viewed as a general-purpose service, so latency and data-conflict handling between rooms must be designed in to keep the service stable and efficient.

3) High availability of the synchronization component. Synchronization components should be weighted according to service characteristics, with non-IDC services taken into account, and the data they hold should be stored redundantly so it can be recovered promptly after an exception.
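For the first technique, a common way to build a unique, trend-increasing ID is a Snowflake-style layout: a millisecond timestamp in the high bits, a node number in the middle, and a per-millisecond sequence in the low bits. The text does not specify the ID scheme, so this layout, the custom epoch, and putting the room number into the node bits are assumptions for illustration.

```python
import threading
import time


class SnowflakeLikeId:
    """Trend-increasing 64-bit IDs: 41-bit ms timestamp | 10-bit node | 12-bit seq.

    The bit layout follows the well-known Snowflake scheme; encoding the
    room into the node bits is an assumption made for this example.
    """

    EPOCH_MS = 1_600_000_000_000   # custom epoch (arbitrary choice)

    def __init__(self, node_id: int):
        if not 0 <= node_id < 1024:
            raise ValueError("node_id must fit in 10 bits")
        self.node_id = node_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now < self.last_ms:
                now = self.last_ms     # clock went backwards: reuse last ms
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF
                if self.seq == 0:      # 4096 IDs used this ms: wait for next
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - self.EPOCH_MS) << 22) | (self.node_id << 12) | self.seq
```

IDs from one generator are strictly increasing. IDs from different rooms interleave roughly by time, which is what "increasing trend" means here, and this assumes the rooms' clocks are kept in sync.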

This solution is hard to achieve in the short term, but in the long run it supports multi-active across rooms and delivers greater service value.

Solution 3: Dual-master multi-active mode

In the database's native dual-master mode, both nodes accept writes, which gives low-latency cross-room replication. Isolation is required at the service layer. When a fault occurs, traffic can be quickly switched to the slave node in the same room.
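With two writable masters, auto-increment primary keys would collide unless the key series are interleaved. MySQL's `auto_increment_increment` and `auto_increment_offset` system variables do this: with an increment of 2, one master uses offset 1 (1, 3, 5, ...) and the other uses offset 2 (2, 4, 6, ...). The function below models how the next value is picked from that series, as a sketch assuming the offset is not greater than the increment (MySQL ignores the offset otherwise).

```python
def next_auto_increment(current_max: int, increment: int, offset: int) -> int:
    """Next AUTO_INCREMENT value one master would hand out.

    Values come from the series offset + k * increment (k >= 0); the next
    value is the smallest member of that series above current_max.
    Assumes 1 <= offset <= increment.
    """
    if current_max < offset:
        return offset
    k = (current_max - offset) // increment + 1
    return offset + k * increment
```

Even after replication brings in the other master's rows (raising `current_max`), each master stays on its own odd or even series. This prevents duplicate keys only; two rooms updating the same row still conflict, which is why the text calls for isolation at the service layer.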

This solution is practical with two IDC rooms but not suitable for more than two.

Solution 4: Service-interleaved active-active solution

This solution is an adapted form of active-active. Suppose there are two services, A and B. Their data is split at the database-schema level, and each service writes to a different IDC node. When a problem occurs, traffic can be switched to a designated IDC node through the domain name.
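The routing idea can be sketched as a table that maps each service's schema to the IDC that owns its writes, plus one domain name per IDC. Service names, IDC names, and domain names are hypothetical; in practice the switch would be a DNS change rather than an in-process dictionary update.

```python
class InterleavedRouter:
    """Maps each service's schema to the IDC that currently owns its writes.

    Hypothetical example: service A's schema is written in idc1 and
    service B's in idc2; each IDC exposes one domain name for its DB entry.
    """

    def __init__(self, owner, domains):
        self.owner = dict(owner)      # schema -> IDC owning its writes
        self.domains = dict(domains)  # IDC -> database domain name

    def write_endpoint(self, schema):
        """Domain name the service should use for writes."""
        return self.domains[self.owner[schema]]

    def fail_over(self, failed_idc, standby_idc):
        """Repoint every schema owned by failed_idc to standby_idc."""
        for schema, idc in self.owner.items():
            if idc == failed_idc:
                self.owner[schema] = standby_idc
```

Because each schema has exactly one writing IDC, the two services never write the same data in two places; the cost is that routing depends entirely on knowing which service owns which schema.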

This solution depends heavily on the services and is not suitable for deployments with many equipment rooms.

Solution 5: Upgrade solution based on NewSQL

You can refer to NewSQL solutions in the industry that natively support the MySQL protocol, such as PolarDB, SequoiaDB, and TiDB.
