Although HAProxy is very stable, it cannot avoid the risks of operating system failures, host hardware failures, network failures, and even power outages. Therefore, a high availability solution must be implemented for HAProxy.

This article describes a hot-standby setup for HAProxy using Keepalived: two HAProxy instances run on two hosts at the same time, and the instance on the host with the higher priority (weight) is the MASTER. If the MASTER fails, the other HAProxy instance automatically takes over all traffic.

How it works

A Keepalived instance runs on each of the two HAProxy hosts. The two Keepalived instances compete for the same virtual IP address (VIP), and both HAProxy instances try to bind a port on that VIP. Only one Keepalived host can hold the VIP at a time, and the HAProxy on that host is the current MASTER. Keepalived maintains an internal priority (weight) value, and the instance with the highest value takes the VIP. Keepalived also periodically checks the status of HAProxy on the local host; when the status is OK, the value increases.

HAProxy is covered in detail in my other articles; this article only introduces Keepalived.

Keepalived overview

Keepalived is a cluster management service that keeps a cluster highly available. Like Heartbeat, it is used to prevent single points of failure.

Keepalived is based on VRRP, the Virtual Router Redundancy Protocol.

VRRP can be thought of as a protocol for making routers highly available. N routers providing the same function form a router group with one master and multiple backups. The master holds a VIP that provides the service (other machines in the router's LAN use this VIP as their default route). The master sends VRRP multicast packets; when the backups stop receiving them, they conclude that the master is down and elect a new master from among themselves based on VRRP priority. This keeps the router highly available.

Keepalived has three main modules: core, check, and vrrp. The core module is the heart of Keepalived; it starts and maintains the main process and loads and parses the global configuration file. The check module performs health checks, including the common check methods. The vrrp module implements the VRRP protocol.

Keepalived configuration

Keepalived has a single configuration file, keepalived.conf, which consists of the following configuration areas: global_defs, static_ipaddress, static_routes, vrrp_script, vrrp_instance, and virtual_server.

global_defs area

This area mainly configures who is notified when a fault occurs and the ID of the machine.

global_defs {
    notification_email {
        [email protected]
        [email protected]
        ...
    }
    notification_email_from [email protected]
    smtp_server smtp.abc.com
    smtp_connect_timeout 30
    enable_traps
    router_id LVS_DEVEL
}
  • notification_email: the recipients of email notifications when a fault occurs.
  • notification_email_from: the sender address used for notification emails.
  • smtp_server: the address of the SMTP server used to send notification emails.
  • smtp_connect_timeout: the timeout for connecting to the SMTP server.
  • enable_traps: enables Simple Network Management Protocol (SNMP) traps.
  • router_id: an ID that identifies this node. It is usually, but not necessarily, the hostname. It is used in email notifications when a fault occurs.

The static_ipaddress and static_routes areas

The static_ipaddress and static_routes areas configure the IP addresses and routes of this node. If the machine already has its IP addresses and routes configured, both areas can be omitted. In practice a machine almost always has them already, so these two areas are rarely needed.

static_ipaddress {
    10.210.214.163/24 brd 10.210.214.255 dev eth0
    ...
}
static_routes {
    10.0.0.0/8 via 10.210.214.1 dev eth0
    ...
}

This is equivalent to running the following commands on the machine: the two "add" commands when Keepalived starts, and the two "del" commands when it stops:

# /sbin/ip addr add 10.210.214.163/24 brd 10.210.214.255 dev eth0
# /sbin/ip route add 10.0.0.0/8 via 10.210.214.1 dev eth0
# /sbin/ip addr del 10.210.214.163/24 brd 10.210.214.255 dev eth0
# /sbin/ip route del 10.0.0.0/8 via 10.210.214.1 dev eth0

Note: you can ignore these two areas, since your machine almost certainly already has its IP address and routes configured.

vrrp_script area

On its own, Keepalived only detects network failures and failures of Keepalived itself, and switches over only in those cases. That is not enough: we also need to monitor the other service processes on the Keepalived host, such as HAProxy. In a Keepalived + HAProxy setup that makes HAProxy load balancing highly available, if HAProxy fails while Keepalived keeps running normally, no switchover happens. The decision to perform an active/standby switchover therefore also has to depend on the state of the service process, which we can detect and monitor with a script.

vrrp_script check_haproxy {
        script  "/etc/keepalived/bin/keepalived_check.sh"
        interval 5
        fall 3
        rise 1
}

This block is used, together with other configuration, for health checks and for moving the floating IP. The script runs every 5 seconds (interval 5). fall 3 means the node is considered unavailable after three consecutive failed checks; rise 1 means the node is considered available again after one successful check.
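The configuration above points at /etc/keepalived/bin/keepalived_check.sh, whose contents are not shown in this article. A minimal sketch of what such a script might contain is given below. It assumes HAProxy runs as a process named exactly haproxy and that pgrep is installed; the function name check_process is hypothetical.

```shell
#!/bin/bash
# Hypothetical health check for use with vrrp_script.
# Keepalived only looks at the exit status: 0 = healthy, non-zero = failed.

# check_process NAME: succeed if a process with exactly this name is running.
check_process() {
    local name="$1"
    # pgrep -x matches the process name exactly; its output is discarded.
    pgrep -x "$name" > /dev/null 2>&1
}

# In the real script, the last line would be:
#   check_process haproxy
```

Checking only that the process exists is the simplest option. A stricter script could also probe a listening port, at the cost of a slower check.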

Priority update policy

With a weight configured, the vrrp_script block looks like this:

vrrp_script checkhaproxy
{
    script "/etc/check.sh"
    interval 3
    weight -20
}

Keepalived runs the script periodically and, based on its exit status, dynamically adjusts the priority of the vrrp_instance.

  • If the script execution result is 0 and the weight configuration value is greater than 0, the priority is increased accordingly
  • If the script execution result is non-zero and the weight configuration value is less than 0, the priority is reduced accordingly
  • In other cases, keep the original priority, that is, the value of priority in the configuration file.

Here are some things to note:

  • 1) The priority does not keep increasing or decreasing with every check; the weight is applied once to the configured priority
  • 2) You can write multiple check scripts and give each one a different weight
  • 3) Whether the priority is raised or lowered, the final priority stays within [1, 254]; it never drops to 0 or below and never reaches 255 or above

In this way, scripts can be used to detect the status of service processes and dynamically adjust priorities to achieve active/standby switchover.
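The rules above can be written out as a small shell function. This is only a model of the behavior described in this section, not Keepalived's own code, and the function name effective_priority is made up for illustration.

```shell
#!/bin/bash
# Model of the priority update policy described above.

# effective_priority BASE WEIGHT EXIT_CODE
#   BASE      - "priority" from vrrp_instance
#   WEIGHT    - "weight" from vrrp_script (may be negative)
#   EXIT_CODE - exit status of the check script (0 = success)
effective_priority() {
    local base="$1" weight="$2" rc="$3"
    local prio="$base"

    if [ "$rc" -eq 0 ] && [ "$weight" -gt 0 ]; then
        prio=$((base + weight))      # success with positive weight: raise
    elif [ "$rc" -ne 0 ] && [ "$weight" -lt 0 ]; then
        prio=$((base + weight))      # failure with negative weight: lower
    fi

    # Clamp to the valid range [1, 254].
    if [ "$prio" -lt 1 ]; then prio=1; fi
    if [ "$prio" -gt 254 ]; then prio=254; fi
    echo "$prio"
}
```

For example, effective_priority 100 -20 1 prints 80 (the check failed with weight -20), while effective_priority 100 -20 0 prints 100 (the check passed, so the configured priority is kept).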

Switching strategy

In a Keepalived cluster there is no strict distinction between primary and secondary nodes. You can set the “state” option to “MASTER” in the Keepalived configuration file, but that does not mean the node is always the MASTER. The “priority” value in the configuration file controls the node's role, but not on its own: the “weight” value set in the vrrp_script block can also change it. Both options take an integer value, and “weight” may be negative. A node's role in the cluster is determined by these two values together.

Without weight

If the “weight” option is not set in the vrrp_script block, the election is decided by the “priority” value in the Keepalived configuration file alone. When more flexible control is needed, set the “weight” value in the vrrp_script block.

With weight

If script in vrrp_script returns 0, the detection is considered successful, and other values are considered failed.

If weight is positive, it is added to priority when the script succeeds and not added when the script fails.

  • Primary fails: if primary priority < secondary priority + weight, a switchover occurs.

  • Primary succeeds: if primary priority + weight > secondary priority + weight, the primary stays primary.

If weight is negative, priority is unchanged when the script succeeds; when the script fails, the priority becomes priority - abs(weight).

  • Primary fails: if primary priority - abs(weight) < secondary priority, a switchover occurs.
  • Primary succeeds: if primary priority > secondary priority, the primary stays primary.
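A practical consequence of these rules is that the absolute value of weight must be larger than the gap between the two priorities, otherwise a failed check on the primary never causes a switchover. The sketch below checks this for a pair of values. The function name is hypothetical, it assumes the secondary's check is passing, and it deliberately does not model weight 0 or equal priorities.

```shell
#!/bin/bash
# failover_on_primary_failure PRIMARY_PRIO SECONDARY_PRIO WEIGHT
# Succeeds (exit 0) if a failed check on the primary, with the secondary
# still healthy, makes the secondary win according to the rules above.
failover_on_primary_failure() {
    local primary="$1" secondary="$2" weight="$3"
    if [ "$weight" -gt 0 ]; then
        # Positive weight: only the healthy secondary gets the bonus.
        [ "$primary" -lt $((secondary + weight)) ]
    elif [ "$weight" -lt 0 ]; then
        # Negative weight: only the failed primary is penalised.
        [ $((primary + weight)) -lt "$secondary" ]
    else
        echo "weight 0 is not modelled by this sketch" >&2
        return 2
    fi
}
```

With priorities 100 and 90, a weight of -20 triggers a switchover (80 < 90), but a weight of -5 does not (95 is still greater than 90).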

vrrp_instance and vrrp_sync_group areas

vrrp_instance defines a VIP that provides service externally, together with its related attributes.

vrrp_sync_group defines a group of vrrp_instances whose members act together. For example:

If two vrrp_instances belong to the same vrrp_sync_group and a failover occurs in one of them, the other vrrp_instance is switched as well (even though it has not failed itself).

vrrp_sync_group VG1 {
    group {
        VI_1
    }
    notify_backup "/etc/keepalived/bin/notify_backup.sh"
    notify_master "/etc/keepalived/bin/notify_master.sh"
    notify_fault "/etc/keepalived/bin/notify_fault.sh"
}

vrrp_instance VI_1 {
    state BACKUP
    nopreempt
    interface eth0
    virtual_router_id 254
    priority 80
    advert_int 1
    authentication {
        auth_type AH
        auth_pass K!f26b90a0743cdf9591024c5e533b7152
    }
    virtual_ipaddress {
        30.3.3.61/16
    }
    track_script {
        check_haproxy
    }
}
  • notify_master / notify_backup / notify_fault: the scripts executed when the node switches to the master, backup, or fault state, respectively.
  • state: can be MASTER or BACKUP. However, when Keepalived starts it holds an election with the other nodes and the node with the higher priority becomes MASTER, so this setting has little practical effect.
  • interface: the network adapter that carries the node's own (non-VIP) IP address; it is used to send VRRP packets.
  • virtual_router_id: a value from 1 to 255 that distinguishes the VRRP multicast groups of different instances. Note: different instances on the same network segment must not use the same virtual_router_id, otherwise errors may occur.
  • priority: used to elect the master. To make a node the master, it is recommended to set its value 50 higher than on the other machines. The value ranges from 1 to 255 (a value outside this range is treated as the default, 100).
  • advert_int: the interval, in seconds, at which VRRP packets are sent, that is, how often a master election takes place (think of it as the health check interval).
  • authentication: the authentication area. The authentication type can be PASS or AH (IPSEC). PASS is recommended (only the first eight characters of the password are used).
  • virtual_ipaddress: the VIP; no further explanation needed.
  • nopreempt: allows a node with a lower priority to remain master even after a node with a higher priority starts. First, nopreempt only takes effect on a node whose state is BACKUP, because it is the BACKUP node that decides whether to become MASTER. Second, to get behavior similar to turning off auto failback, either set state to BACKUP on all nodes, or give the master node a lower priority than the backup. I personally recommend setting state to BACKUP and adding nopreempt on all nodes, which disables auto failback. To switch a node to MASTER manually, remove nopreempt from that node, raise its priority above the other nodes, and reload the configuration file (once the node has become MASTER, change the configuration back and reload again).

MASTER and BACKUP

The Keepalived node with the highest priority is the MASTER. One of the MASTER's responsibilities is to answer ARP requests for the VIP, telling the other hosts on the LAN which MAC address the VIP maps to. In addition, the MASTER announces its priority to the LAN through VRRP multicast advertisements (destination address 224.0.0.18).

All BACKUP nodes on the network only process the multicast packets sent by the MASTER. If a BACKUP node finds that the MASTER's priority is lower than its own, or stops receiving VRRP advertisements from the MASTER, it switches to the MASTER state and takes over the following duties:

  • 1. Respond to ARP packets.

  • 2. Send VRRP notification.

In addition, when the network does not support multicast (for example, in some cloud environments) or a network partition occurs, the Keepalived BACKUP nodes cannot receive the VRRP advertisements from the MASTER. The result is split brain: more than one MASTER node exists in the cluster at the same time.
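One simple way to spot this situation is to check on each node whether it currently holds the VIP; if both nodes report that they do, the cluster is in split brain. The helper below is a sketch that parses the one-line output of ip -o -4 addr show. The function name has_vip is hypothetical.

```shell
#!/bin/bash
# has_vip VIP
# Reads the output of "ip -o -4 addr show" on stdin and succeeds if one of
# the listed addresses equals VIP (the prefix length is ignored).
has_vip() {
    awk -v vip="$1" '
        $3 == "inet" {
            split($4, parts, "/")        # "30.3.3.61/16" -> "30.3.3.61"
            if (parts[1] == vip) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

On a node, it could be used as: /sbin/ip -o -4 addr show dev eth0 | has_vip 30.3.3.61 && echo "this node holds the VIP".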

An active/standby switchover can be driven in the following ways:

1. Rely on Keepalived's own ability to move the floating IP.
2. Check the following:

  • Check whether the node's floating IP address is reachable
  • Check whether the HAProxy process exists
  • Probe the service itself

To avoid split brain, add callback scripts to notify_backup / notify_fault / notify_master that:

  • Delete or add the floating IP address
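A callback of this kind could be sketched as follows. The VIP and interface are taken from the vrrp_instance example earlier in this article; the function name vip_command is hypothetical, and it only prints the command so that it can be reviewed before anything is changed on the host.

```shell
#!/bin/bash
# Hypothetical notify callback helper. VIP and DEV are taken from the
# vrrp_instance example in this article; adjust them for a real setup.
VIP="30.3.3.61/16"
DEV="eth0"

# vip_command STATE: print the ip command to run for a given VRRP state.
vip_command() {
    case "$1" in
        MASTER)       echo "/sbin/ip addr add $VIP dev $DEV" ;;
        BACKUP|FAULT) echo "/sbin/ip addr del $VIP dev $DEV" ;;
        *)            echo "unknown state: $1" >&2; return 1 ;;
    esac
}
```

For instance, notify_backup.sh could run the command printed by vip_command BACKUP, and notify_master.sh the one printed by vip_command MASTER. Keepalived normally adds and removes the VIP itself, so the callback acts as a safety net, and a failing add or del (address already present or already gone) can be ignored.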