
Virtualization can improve system reliability, but it cannot solve every problem. To make key business application systems more reliable, you still need dual-machine clusters and fault-tolerant systems at the architecture level. Dual-machine clusters have their own drawbacks: the cluster software is technically demanding and complex to manage, and when the system suffers an unexpected outage, the fault is hard to locate and recovery takes time, so the system cannot provide uninterrupted service. The following sections cover the physical architecture aspects.

Dual network planes:

Network adapters, cabling, access switches, aggregation switches, and firewalls are all deployed in two planes. This redundancy at the physical layer keeps communication across the platform reliable.

Communication plane separation:

A virtualization platform can be logically divided into three planes: the management plane, the storage plane, and the service plane. To ensure the reliability and security of data on each network plane, the planes are isolated from one another by VLANs. The failure of one plane does not affect the operation of the others.
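As a rough illustration of the plane-to-VLAN mapping, the sketch below checks that each plane has its own VLAN ID. The IDs 10, 20, and 30 and the function name are assumptions chosen for the example, not values from any particular platform; the 1 to 4094 range is the usable ID range in IEEE 802.1Q.

```python
# Hypothetical plane-to-VLAN assignment; the IDs are illustrative only.
PLANE_VLANS = {"management": 10, "storage": 20, "service": 30}


def validate_plane_vlans(plane_vlans):
    """Check that every plane has a distinct, valid 802.1Q VLAN ID."""
    ids = list(plane_vlans.values())
    if len(set(ids)) != len(ids):
        # Two planes sharing a VLAN would not be isolated from each other.
        raise ValueError("each plane needs its own VLAN ID")
    for plane, vid in plane_vlans.items():
        if not 1 <= vid <= 4094:
            raise ValueError(f"{plane}: VLAN ID {vid} is outside 1..4094")
    return True
```

A check like this could run before a network configuration is applied, so that a typo does not silently merge two planes.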

Active and standby management nodes:

The active and standby management nodes use heartbeat detection on the management plane. The standby management node checks the health status of the active node in real time. Once the active management node is found to be faulty, the standby management node immediately takes over the services of the active management node and continues to provide services.
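The takeover logic described above can be sketched as a timeout on the last heartbeat seen by the standby node. This is a minimal sketch, not the platform's implementation: the class name, the 3-second timeout, and the injectable clock are all assumptions made for the example.

```python
import time


class StandbyMonitor:
    """Standby-side view of the active node's heartbeats (hypothetical names)."""

    def __init__(self, timeout_s=3.0, clock=time.monotonic):
        self.timeout_s = timeout_s      # assumed threshold, tune per deployment
        self.clock = clock              # injectable so the logic can be tested
        self.last_heartbeat = clock()
        self.role = "standby"

    def on_heartbeat(self):
        """Record a heartbeat received from the active node."""
        self.last_heartbeat = self.clock()

    def check(self):
        """Promote this node if the active node has been silent too long."""
        silent_for = self.clock() - self.last_heartbeat
        if self.role == "standby" and silent_for > self.timeout_s:
            self.role = "active"        # take over the active node's services
        return self.role
```

In a real deployment the standby would call `check()` on a short interval, and the promotion step would also need to guard against both nodes becoming active at once, which this sketch does not address.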

Flow control:

To provide users with stable, highly available concurrent services and to avoid system crashes caused by heavy traffic, the management nodes implement a complete flow control mechanism for key system processes.
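The text does not say which flow control algorithm the management nodes use. One common option is a token bucket, sketched below under that assumption; the class name, rate, and capacity are illustrative.

```python
import time


class TokenBucket:
    """Token-bucket limiter: an assumed technique, not the platform's own."""

    def __init__(self, rate_per_s, capacity, clock=time.monotonic):
        self.rate = rate_per_s          # tokens added per second
        self.capacity = capacity        # maximum burst size
        self.clock = clock
        self.tokens = float(capacity)   # start with a full bucket
        self.updated = clock()

    def allow(self):
        """Return True if a request may proceed, False if it should be rejected."""
        now = self.clock()
        # Refill according to elapsed time, never beyond capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A management process would call `allow()` before handling each request and reject or queue the request when it returns False, so that a traffic spike degrades service gracefully instead of crashing the node.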

Fault detection:

The system provides fault detection and alarm functions, and includes a tool for displaying fault information in a web browser. Once the cluster is running normally, the system also provides data visualization tools for managing the cluster and its load distribution. These tools help users determine whether there is a load-balancing problem, a runaway process, or a declining trend in hardware performance. That information plays an important role in adjusting and allocating system resources sensibly and in improving overall system performance.

Data consistency verification:

The virtualization platform provides a data consistency audit function. In addition to the self-audit and recovery capabilities that the system provides for key resources, the system periodically audits the data and status consistency of key resources, such as VMs, volumes, and networks. If exceptions are detected, the system automatically records them and guides management personnel to rectify the faults.
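A consistency audit of this kind can be thought of as comparing what the management database records with what the hosts actually report. The sketch below assumes both views are available as dictionaries keyed by resource ID; the function name, the finding labels, and the sample IDs are hypothetical.

```python
def audit(recorded, observed):
    """Compare the recorded state of resources with the observed state.

    recorded: resource ID -> state according to the management database
    observed: resource ID -> state reported by the hosts
    Returns a list of (resource_id, finding, detail) tuples to be logged.
    """
    findings = []
    for res_id in sorted(recorded.keys() | observed.keys()):
        if res_id not in recorded:
            # Exists on a host but is unknown to the management node.
            findings.append((res_id, "orphan", observed[res_id]))
        elif res_id not in observed:
            # Recorded in the database but not found on any host.
            findings.append((res_id, "missing", recorded[res_id]))
        elif recorded[res_id] != observed[res_id]:
            detail = f"{recorded[res_id]} != {observed[res_id]}"
            findings.append((res_id, "mismatch", detail))
    return findings
```

Running this on a schedule and writing the findings to a log gives administrators a concrete list of resources to inspect.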

Management data backup and recovery:

The system supports regular local and remote backup of management node configuration data and service data, and can be configured to interconnect with a third-party FTP server. If the management node service is abnormal and cannot be restored automatically, the local backup data can be used to restore the service immediately. If both management nodes fail because of a catastrophic fault and cannot be recovered by restarting, the remote backup data can be used to restore the nodes, shortening the fault recovery time.
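The choice between local and remote backup data described above can be written as a small decision function. This is only a sketch of the rule as stated in the text; the function name, parameters, and return values are assumptions.

```python
def choose_restore_source(auto_recovered, nodes_failed, total_nodes=2):
    """Pick which backup to restore from, following the rule in the text.

    auto_recovered: True if the service came back without manual action
    nodes_failed:   number of management nodes that cannot be restarted
    """
    if auto_recovered:
        return "none"       # nothing to restore
    if nodes_failed >= total_nodes:
        return "remote"     # catastrophic fault: local copies are unavailable
    return "local"          # fastest path when a node is still reachable
```

For example, `choose_restore_source(False, 2)` returns `"remote"`, matching the case where both management nodes are lost.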
