Author: JackTian

Wechat Official Account: Jack’s IT Journey (ID: Jake_Internet)



1. What is high availability?

High availability is one of the factors that must be considered in distributed system architecture design. It usually refers to designing the system to reduce the time during which it cannot provide service.

2. How to measure high availability?

For example, if a system has been working the entire time, its availability is 100%; if over 100 units of time it is out of service for 1 or 2 units, its availability is 99% or 98%. A system with 99% availability over a one-year period can have up to 3.65 days of downtime (1%). These values are calculated from several factors, including scheduled and unscheduled maintenance windows and the time needed to recover from possible system failures.

The current high availability goal for most businesses is four nines, that is, 99.99%. The more nines, the higher the availability.

  • Two nines (99%): basically available; the site is unavailable for less than 88 hours per year;
  • Three nines (99.9%): highly available; the site is unavailable for less than 9 hours per year;
  • Four nines (99.99%): highly available with automatic recovery; the site is unavailable for less than 53 minutes per year;
  • Five nines (99.999%): extremely available, an ideal state; the site is unavailable for less than 5 minutes per year.

How are the nines of availability calculated?

  • Site unavailable time = time the site was recovered – time the fault was discovered
  • Annual site availability = (1 – site unavailable time / total time in a year) × 100%
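As a quick illustration, the two formulas above can be expressed in a few lines of Python (a sketch; the downtime figures are the examples used in this article):

```python
# Annual availability = (1 - unavailable time / total time in a year) * 100%
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def availability(unavailable_minutes: float) -> float:
    """Annual site availability as a percentage."""
    return (1 - unavailable_minutes / MINUTES_PER_YEAR) * 100

# Four nines: roughly 53 minutes of downtime per year
assert round(availability(53), 4) == 99.9899
# Two nines: 1% of the year, i.e. 87.6 hours of downtime
assert round(availability(87.6 * 60), 2) == 99.0
```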

Availability assessment: website availability is tied to performance assessments in many areas, such as technology and operations, so a large part of early architecture design is devoted to discussing system high availability. Different Internet companies adopt different strategies, and various factors directly affect a system's high availability. As a website's business grows, so does its user base, and the high availability standard is adjusted accordingly, with relevant strategies and back-end equipment put in place to support the website.

Availability is generally assessed by fault, and classifying and weighting faults is also a way for websites to calculate fault responsibility. A weight is assigned to each fault category (for example, a critical fault might be weighted 100 and a minor one 20). The calculation formula is: fault score = fault time (minutes) × fault weight.
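The weighted-fault formula can be sketched as follows (the category names and weights here are illustrative, not any particular company's scheme):

```python
# Fault score = fault time (minutes) x fault weight
FAULT_WEIGHTS = {"critical": 100, "minor": 20}  # illustrative weights

def fault_score(fault_minutes: float, category: str) -> float:
    """Weighted responsibility score for a single fault."""
    return fault_minutes * FAULT_WEIGHTS[category]

# A 10-minute critical fault outweighs a 30-minute minor one
assert fault_score(10, "critical") == 1000
assert fault_score(30, "minor") == 600
```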

3. What is the purpose of highly available website architecture design?

Frequent data reads and writes from cluster devices may cause hardware faults.

Its high availability architecture is designed to ensure that services are still available and data is still saved and accessible when server hardware fails.

4. What are the main means to achieve high availability?

  • Data layer: redundant backup

Once a server goes down, switch services to other available servers;

Redundant backup is divided into cold backup and hot backup.

Cold backup is periodic replication and cannot guarantee data availability.

Hot backup is divided into asynchronous hot backup and synchronous hot backup. Asynchronous hot backup means that data is written to multiple data copies asynchronously. Synchronous hot backup means that data is written to multiple data copies simultaneously.
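The difference between the two hot-backup modes can be modelled in a few lines (a simplified in-memory sketch; real systems replicate over the network):

```python
import queue
import threading

replicas = [dict(), dict(), dict()]  # three in-memory "data copies"

def sync_write(key, value):
    """Synchronous hot backup: the write returns only after every copy has it."""
    for replica in replicas:
        replica[key] = value

write_queue = queue.Queue()

def async_write(key, value):
    """Asynchronous hot backup: write one copy now, replicate the rest later."""
    replicas[0][key] = value
    write_queue.put((key, value))  # a background worker drains this queue

def replication_worker():
    while True:
        key, value = write_queue.get()
        for replica in replicas[1:]:
            replica[key] = value
        write_queue.task_done()

threading.Thread(target=replication_worker, daemon=True).start()

sync_write("a", 1)   # all three copies hold a=1 before the call returns
async_write("b", 2)  # copy 0 holds b=2 immediately; the others catch up
write_queue.join()   # wait until background replication finishes
```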

  • Service level: failover

If a disk is damaged, data is read from the backup disk (provided the data has been synchronized to the backup in advance);

If any server in the data server cluster goes down, all application reads and writes directed at that server are rerouted to other servers, ensuring that data access does not fail.

5. Highly available applications

The application layer handles the business logic of web applications. The most notable feature is the stateless nature of applications.

Stateless applications are as follows: The application server does not store the context information of the business, but only processes the corresponding business logic according to the data submitted in each request. In addition, multiple service instances (servers) are completely peer, and the processing result is the same when the request is submitted to any server.

1) Failover of stateless services via load balancing

The application of not saving state brings great convenience to the high availability architecture. The server does not save the state of the request, and all servers are completely equal.

When one or more servers go down, the request is submitted to any other available server in the cluster for processing. To the client user, the request is always successful and the whole system is still available.

For application server clusters, load balancing is the mechanism that detects server availability in real time and automatically transfers failed requests. When traffic and data volume are high, a single server cannot bear the entire load; load balancing can then spread the traffic and data across the other servers in the cluster to improve overall load-handling capacity.

Whether open-source free load balancing software or hardware devices are used in future work, the failover function is required. In website applications, as long as the servers in the cluster are stateless peers, load balancing can in practice play its high availability role.

When every Web server in the cluster is available, the load balancing server distributes client access requests to any of them for processing. If server 2 goes down at some moment, the load balancing server learns through its heartbeat detection mechanism that the server is not responding, removes it from the server list, and sends requests to the other servers in the Web server cluster instead. Since these servers are exactly the same, the final result is unaffected by which server handles the request.

In a real environment, load balancing at the application layer provides high availability for the system. Even when an application's traffic is small enough that a single server could provide the service, achieving high availability requires deploying the service on at least two servers and using load balancing technology to build a small Web server cluster.
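The heartbeat-detection-plus-failover behaviour described above can be modelled with a tiny round-robin balancer (a sketch; real load balancers probe back-ends over HTTP or TCP):

```python
class LoadBalancer:
    def __init__(self, servers):
        self.servers = servers      # all known servers
        self.alive = set(servers)   # servers that passed the last heartbeat

    def heartbeat(self, probe):
        """probe(server) -> bool; drop servers that stop responding."""
        self.alive = {s for s in self.servers if probe(s)}

    def pick(self, request_id):
        """Distribute requests across the currently alive servers."""
        alive = sorted(self.alive)
        if not alive:
            raise RuntimeError("no available servers")
        return alive[request_id % len(alive)]

lb = LoadBalancer(["web1", "web2", "web3"])
lb.heartbeat(lambda s: s != "web2")  # simulate web2 going down
# Requests keep succeeding; web2 simply stops receiving traffic
assert lb.pick(0) != "web2" and lb.pick(1) != "web2"
```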

2) Session management in application server clusters

In Web applications, the context object maintained and modified across multiple requests is called a Session. In a single-node scenario, the Session is managed by the Web container (such as IIS or Tomcat) on that server.

In a cluster environment where load balancing is used, the load balancing server may distribute requests to any application server in the Web server cluster, so ensuring that the correct Session is obtained on each request is much more complex than in a stand-alone environment.

In a clustered environment, there are several common methods of Session management:

  • Session replication

Session replication: Simple and easy to implement, it is a server cluster Session management mechanism widely used in early enterprise application systems. If Session replication is enabled for the Web container on the application server, Session objects will be synchronized between other servers in the cluster, so that each server will store the Session information of all users.

When any server in the cluster breaks down, the Session data will not be lost, and when the server uses Session, it only needs to obtain the Session data on the local computer.

Session replication is only suitable for small-scale cluster environment. When the scale is large, a large number of Session replication operations will occupy a large number of server and network resources, and the system will face great pressure.

The Session information of all users is backed up on every server, so under heavy user traffic the servers may not even have enough memory for Sessions. The core application clusters of large websites can exceed a thousand servers, with tens of millions of users online at the same time.

  • Session binding

Session binding is implemented using the load balancing source address Hash algorithm. The load balancing server always sends requests from the same IP address to the same server. During the entire Session, all user requests are processed on the same server. The guarantee that the Session is always available on this server is referred to as Session stickiness.

However, Session binding does not meet the requirements for high availability. Once a server breaks down, the Sessions on that machine disappear; users whose requests are switched to another server cannot complete their business processing because the Session is gone. Most load balancing servers provide the source-address algorithm, but few websites use it for Session management.
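The source-address hash just described can be sketched in a few lines (an illustration only; the server list is hypothetical, and real load balancers hash at the connection level):

```python
import hashlib

SERVERS = ["app1", "app2", "app3"]

def pick_server(client_ip: str) -> str:
    """Source-address hash: the same IP always lands on the same server."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

# Every request from this client sticks to one server for the whole Session
assert pick_server("192.168.1.50") == pick_server("192.168.1.50")
```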

  • Recording the Session in a Cookie

Early enterprise applications used the C/S (client/server) architecture, and one way to manage Sessions was to record them on the client: when the client sent a request to the server, it included the Session; after processing the request, the server returned the modified Session to the client in its response. A website has no dedicated client, so the Session is recorded using Cookies supported by the browser.

Disadvantages of using browser-supported cookies to record sessions:

  • Limited by Cookie size, the information that can be recorded is limited
  • The Cookie needs to be transmitted on each request and response, affecting performance
  • If the user closes the Cookie, the access will be abnormal

Cookies are simple to use, highly available, and support linear scaling of application servers, and most applications only need to record a small amount of Session information, so many websites use Cookies to record the Session.
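A minimal signed-cookie Session might look like this (a sketch using HMAC; the secret key and field layout are assumptions, not any specific framework's format):

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-secret"  # hypothetical key, never sent to the client

def to_cookie(session: dict) -> str:
    """Serialize the Session and sign it so the client cannot tamper with it."""
    payload = base64.urlsafe_b64encode(json.dumps(session).encode()).decode()
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"

def from_cookie(cookie: str) -> dict:
    """Verify the signature, then restore the Session on the server side."""
    payload, sig = cookie.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("tampered cookie")
    return json.loads(base64.urlsafe_b64decode(payload))

cookie = to_cookie({"user": "jack", "cart": 3})
assert from_cookie(cookie) == {"user": "jack", "cart": 3}
```

Note how the Session travels with every request, which is exactly the size and bandwidth trade-off listed above.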

  • Session server

An independently deployed Session server or cluster is used to centrally manage sessions. The application server accesses the Session server every time it reads and writes a Session. In fact, the state of the application server is separated into stateless application server and stateful Session server, and their architectures are designed according to the different characteristics of the two servers.

For the stateful Session server, distributed caches and databases can be used, with a layer of encapsulation on top of these products so that they meet the Session storage and access requirements. If the business scenario has high demands on Session management, for example integrating the Session service with single sign-on (SSO) and user services, a dedicated Session service management platform needs to be developed.

6. Highly available services

A highly available service is a service module that provides basic public services for business products. In large websites, these services are usually independently deployed and remotely invoked by specific applications. Reusable services, like applications, are stateless services that can be implemented using a fail-over strategy similar to load balancing.

In practice, there are several high availability service policies:

  • Hierarchical management: servers are managed in tiers during operations. Core applications and services get better hardware first, and their operational response is also exceptionally fast. Necessary isolation is applied at deployment time to prevent a failure from triggering a chain reaction: lower-priority services are isolated in separate threads or deployed in different virtual machines, higher-priority services are deployed on different physical machines, and core services and data are deployed in data centers in different regions.

  • Timeout setting: set a timeout for service invocations in the application. Once a call times out, the communication framework throws an exception, and the application, according to its service scheduling policy, either retries or transfers the request to another server providing the same service.
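The timeout-then-retry-or-transfer policy can be sketched like this (illustrative; real frameworks set the timeout on the RPC client, and the transport below is simulated):

```python
import socket

def call_with_failover(servers, send_request, timeout=2.0, retries=2):
    """Try servers in turn; on timeout, transfer the request to the next one."""
    last_error = None
    for server in servers[: retries + 1]:
        try:
            return send_request(server, timeout)
        except socket.timeout as exc:  # the communication framework raises on timeout
            last_error = exc           # scheduling policy here: move to the next server
    raise last_error

# Simulated transport: svc1 hangs (times out), svc2 answers
def fake_send(server, timeout):
    if server == "svc1":
        raise socket.timeout(f"{server} did not answer within {timeout}s")
    return f"ok from {server}"

result = call_with_failover(["svc1", "svc2"], fake_send)
assert result == "ok from svc2"
```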

  • Asynchronous invocation: It is completed in asynchronous mode, such as message queue, to avoid the failure of a service resulting in the failure of the entire application request.

  • Service degradation: During the peak visit period, when a large number of concurrent service calls occur, the performance of the website will be degraded, which may lead to service breakdown. To ensure the normal operation of core applications and functions, the service must be degraded.

There are two ways to downgrade:

1. Denial of service: denies calls from applications with lower priorities to reduce the number of concurrent service calls and ensure the normal operation of core applications.

2. Closing functions: shut down some unimportant services, or some unimportant functions within a service, to save system overhead and free up resources for core application services;

  • Idempotent design: after an application fails to invoke a service, it resends the request to another server, so repeated invocation of the service is unavoidable. The application layer does not care whether the service actually failed; as long as no successful response is received, it treats the call as failed and retries. The service must therefore be designed to be idempotent at the service layer, guaranteeing that repeatedly invoking the service produces the same result as invoking it once.
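Idempotency is often implemented by deduplicating on a request ID, so a retried call returns the first call's result instead of repeating its effect (a sketch; a production system would persist the dedup table):

```python
processed = {}  # request_id -> result of the first invocation

def transfer(request_id: str, account: str, amount: int, balances: dict):
    """Idempotent debit: replaying the same request_id changes nothing."""
    if request_id in processed:
        return processed[request_id]  # repeated call, same result as the first
    balances[account] -= amount
    processed[request_id] = balances[account]
    return balances[account]

balances = {"jack": 100}
assert transfer("req-1", "jack", 30, balances) == 70
assert transfer("req-1", "jack", 30, balances) == 70  # retry: no double debit
assert balances["jack"] == 70
```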

7. Common Layered Architecture of the Internet

The high availability of the whole system is achieved through the redundancy and automatic failover of each layer. The common distributed architecture of the Internet is as follows:

  • Client layer: The typical caller is a browser or mobile application
  • Reverse proxy layer: system entry, reverse proxy
  • Site application layer: Implements the core application logic and returns HTML or JSON
  • Service layer: This layer exists if you implement servitization
  • Data-cache layer: Cache speeds access to storage
  • Data-database layer: the database provides persistent data storage

Layered high availability architecture

High availability from the client layer to the reverse proxy layer

High availability from the client layer to the reverse proxy layer is achieved through redundancy at the reverse proxy layer. Take the Nginx service as an example: two Nginx servers are needed, one providing the service online and the other standing by as a redundant backup to ensure high availability. The common practice is Keepalived survival detection, with the same virtual IP providing the service.

Automatic failover: when Keepalived detects that one Nginx has gone down, it fails over automatically; the same virtual IP continues to be used, and the switchover is transparent to the caller.

High availability from the reverse proxy layer to the site layer

High availability from the reverse proxy layer to the site layer is achieved through redundancy at the site layer. Assuming the reverse proxy layer is Nginx, nginx.conf can be configured with multiple Web back-ends, and Nginx can detect the liveness of each of them.

Automatic failover: When a Web-server is down, Nginx can detect it and automatically failover the traffic to another Web-server. The whole process is automatically completed by Nginx and is transparent to the caller.
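A minimal nginx.conf fragment for this setup might look as follows (the upstream name and back-end IPs are illustrative; max_fails and fail_timeout are nginx's built-in passive health-check parameters):

```nginx
upstream web_backend {
    # two redundant web-servers; nginx stops sending traffic to a back-end
    # after max_fails failed attempts within fail_timeout
    server 192.168.1.21:8080 max_fails=3 fail_timeout=30s;
    server 192.168.1.22:8080 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    location / {
        proxy_pass http://web_backend;
    }
}
```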

High availability from the site layer to the service layer

High availability from the site layer to the service layer is achieved by redundancy of the service layer. The Service Connection Pool establishes multiple connections to downstream services, and each request selects connections “randomly” to access downstream services.

Automatic failover: When a service is down, the service-connection-pool detects the failure and automatically transfers traffic to other services. The connection pool automatically completes the process. It is transparent to the caller (the service connection pool in RPC-client is an important underlying component).

High availability from the service layer to the cache layer

High availability from the service layer to the cache layer is achieved through the redundancy of the cache data. The data redundancy of the cache layer can be achieved by using the encapsulation of the client, and the service can double read or double write the cache.

The cache layer's high availability problem can also be solved with a cache cluster that supports master/slave synchronization. Redis natively supports master/slave synchronization, and Redis also has a Sentinel mechanism to detect the liveness of Redis instances.

Automatic failover: Sentinel can detect when the Primary Redis is down and notify the caller to access a new Redis. The process is done transparently by the Sentinel and Redis clusters.
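As an illustration, a minimal Sentinel configuration might look like this (the master name, IP, and thresholds are example values):

```
# sentinel.conf (example values)
sentinel monitor mymaster 192.168.1.30 6379 2    # 2 sentinels must agree the master is down
sentinel down-after-milliseconds mymaster 30000  # mark down after 30s of silence
sentinel failover-timeout mymaster 180000
```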

High availability from the service layer to the database layer

Most Internet database layers adopt a master/slave replication, read/write splitting architecture, so database-layer high availability falls into two categories: read high availability and write high availability.

High availability from the service layer to the database layer ("read")

The high availability of reads from the service layer to the database is achieved by the redundancy of the read libraries. Generally speaking, there are at least two slave libraries. The database connection pool will establish multiple connections to the read libraries, and each request will be routed to these read libraries.

Automatic failover: When one read library is down, db-connection-pool can detect the failure and automatically migrate the traffic to another read library. The process is automatically completed by the connection pool and transparent to the caller. The database connection pool is an important basic component.

High availability from the service layer to the database layer ("write")

High availability of writes from the service layer to the database is achieved through redundancy of the write library. Two MySQL instances can be set up in dual-master synchronization, one providing the service online and the other standing by as a redundant backup to ensure high availability. The common practice is Keepalived survival detection, with the same virtual IP providing the service.

Automatic failover: when the write library goes down, Keepalived detects it and fails over automatically, migrating traffic to the shadow-db-master (the standby write library) using the same virtual IP. The switchover is transparent to the caller.

Preparations for configuring high availability

1. Prepare two Nginx servers (IP addresses: 192.168.1.10 and 192.168.1.11), install Keepalived on the two Nginx servers, and configure virtual IP addresses.

2. The Nginx service is already installed on both 192.168.1.10 and 192.168.1.11, so there is no need to reinstall it; for installation steps, see "Nginx series (1) | Setting up the Nginx service in a Linux environment".

3. Install the Keepalived service on the two Nginx servers, either from an RPM package or via one-click yum installation; both methods work.

Install from the RPM package:

```shell
# rpm -ivh keepalived-1.2.7-3.el6.x86_64.rpm
```

Or install with yum in one step:

```shell
# yum -y install keepalived
```

4. Start Nginx service and Keepalived service on two Nginx servers.

```shell
# cd /usr/local/nginx/sbin
# ./nginx
# service keepalived start
```

5. Enter 192.168.1.10 and 192.168.1.11 in the browser of the client to check whether the Nginx service can be accessed.

Example for configuring high availability in active/standby mode

Active/standby scheme: this is also a high availability solution commonly used in enterprises today. Simply put, while one server provides the service, another server stands by; when the active server goes down, traffic automatically switches to the standby server, so client requests never see a failure.

The preparations above mentioned that high availability will be implemented with Keepalived, so what is Keepalived?

Keepalived is a server status detection and failover tool, originally designed for LVS load balancing software to manage and monitor the status of various service nodes in the LVS cluster system. Later, the Virtual Router Redundancy Protocol (VRRP) function was added to achieve high availability.

Therefore, besides managing LVS software, Keepalived can serve as a high availability solution for other services (e.g. Nginx, HAProxy, MySQL). Keepalived's configuration file defines the active and standby servers and the status-detection requests for a server: if a detection request returns status code 200, the server is considered normal; otherwise Keepalived takes the server offline and brings the standby server online. When the active node recovers, the standby node releases the IP resources and services it took over during the failure and returns to its original standby role.

1. Configure the keepalived.conf configuration file on the primary server

```shell
# vim /etc/keepalived/keepalived.conf

global_defs {
   notification_email {
     [email protected]
     [email protected]
     [email protected]
   }
   notification_email_from [email protected]
   smtp_server 192.168.1.10                 # IP address of the primary server
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}

vrrp_script chk_http_port {
    script "/usr/local/src/nginx_check.sh"  # path of the nginx_check.sh script
    interval 2                              # detection interval, in seconds
    weight 2
}

vrrp_instance VI_1 {
    state MASTER                 # the current node is the master node
    interface eth1               # network interface the VRRP instance is bound to
    virtual_router_id 51         # must be identical on the master and backup nodes
    priority 90                  # node priority; a larger value means higher priority
    advert_int 1                 # interval for sending VRRP advertisements, in seconds
    authentication {
        auth_type PASS           # authentication type, PASS by default
        auth_pass 1111           # authentication password
    }
    virtual_ipaddress {
        192.168.1.100            # virtual IP address
    }
}
```

2. Configure the keepalived.conf configuration file on the secondary server

```shell
# vim /etc/keepalived/keepalived.conf

global_defs {
   notification_email {
     [email protected]
     [email protected]
     [email protected]
   }
   notification_email_from [email protected]
   smtp_server 192.168.1.10                 # IP address of the primary server
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}

vrrp_script chk_http_port {
    script "/usr/local/src/nginx_check.sh"  # path of the nginx_check.sh script
    interval 2                              # detection interval, in seconds
    weight 2
}

vrrp_instance VI_1 {
    state BACKUP                 # the current node is the backup node
    interface eth1               # network interface the VRRP instance is bound to
    virtual_router_id 51         # must be identical on the master and backup nodes
    priority 80                  # node priority; a larger value means higher priority
    advert_int 1                 # interval for sending VRRP advertisements, in seconds
    authentication {
        auth_type PASS           # authentication type, PASS by default
        auth_pass 1111           # authentication password
    }
    virtual_ipaddress {
        192.168.1.100            # virtual IP address
    }
}
```

3. Place the nginx_check.sh script in /usr/local/src/ on the two Nginx servers; Keepalived uses it to check whether the Nginx service on the primary server is alive.

```shell
# vi /usr/local/src/nginx_check.sh

#!/bin/bash
A=`ps -C nginx --no-header | wc -l`
if [ $A -eq 0 ]; then
    /usr/local/nginx/sbin/nginx    # try to restart Nginx
    sleep 2
    if [ `ps -C nginx --no-header | wc -l` -eq 0 ]; then
        killall keepalived         # Nginx did not come back; stop Keepalived so the VIP fails over
    fi
fi
```

4. Bind the virtual IP address on the two Nginx servers so that the real server can return responses directly to the client through it.

On the primary server:

```shell
# ifconfig eth1:1 192.168.1.100 netmask 255.255.255.0
```

On the secondary server:

```shell
# ifconfig eth1:1 192.168.1.100 netmask 255.255.255.0
```

5. Restart the Nginx service and the Keepalived service on the two Nginx servers.

```shell
# ./nginx -s stop
# ./nginx
# service keepalived restart
```

6. Enter the virtual IP address 192.168.1.100 in the address box of the browser on the client.

Simulate the primary server failure to verify the effect of high availability

Stop both Nginx and Keepalived services on the master server.

```shell
# ./nginx -s stop
# service keepalived stop
```

Enter the virtual IP address 192.168.1.100 in the client's browser again to verify: the Nginx service can still be accessed normally. In other words, when the primary server goes down, the system automatically switches to the secondary server, so client access is unaffected.

Conclusion

This article introduced what high availability is, how to measure it, the purpose of highly available website architecture design, the main means of achieving high availability, highly available applications and services, the common layered Internet architecture, layered high availability architecture, the preparations for configuring high availability, an active/standby high availability configuration example, and a simulated primary server failure to verify the high availability effect.

The high availability of the entire Internet layered system architecture is achieved by redundancy + automatic failover at each layer, specifically:

  • High availability from [client layer] to [Reverse proxy layer] : This is achieved through redundancy at the reverse proxy layer, a common practice is keepalived + virtual IP failover;

  • High availability from [reverse proxy layer] to [site layer] : It is achieved through redundancy at the site layer. The common practice is survival detection and automatic failover between Nginx and web-server.

  • High availability from [site layer] to [service layer] : It is achieved through redundancy of the service layer. A common practice is to ensure automatic failover through service-connection-pool.

  • High availability from [service layer] to [cache layer] : It is achieved through redundancy of cached data. Common practices are double read and double write in the cache client, or a cache cluster using master/slave data synchronization with Sentinel detection and automatic failover. For business scenarios with lower availability requirements on the cache, cache servitization can be used to shield callers from the underlying complexity;

  • High availability from [service layer] to [database “reads”] : It is achieved by redundancy of read libraries. A common practice is to ensure automatic failover through DB-connection-pool.

  • High availability from [service layer] to [database "write"] : this is achieved by redundancy of write libraries, common practice is keepalived + virtual IP failover;


Recommended reading

Nginx series (1) | Setting up the Nginx service in a Linux environment

Nginx series (2) | Understanding Nginx forward and reverse proxies

Nginx series (3) | Understanding Nginx load balancing

Nginx series (4) | Understanding Nginx dynamic/static separation


Original writing is not easy. If you feel this article has been of some use to you, please give it a like, a comment, or a share, because that will be my motivation to produce more quality articles. Thanks!

See you next time!