“This is the 12th day of my participation in the August More Text Challenge. For details, see: August More Text Challenge.”

1. Environment construction

1.1 Configuring Key Soft connections

ln -s /data/mysql/bin/mysqlbinlog    /usr/bin/mysqlbinlog
ln -s /data/mysql/bin/mysql          /usr/bin/mysql
Copy the code

1.2 Configuring Mutual Trust among Nodes

Db01: SSH mv id_rsa.pub authorized_keys SCP -r /root/. SSH 10.0.0.52:/root SCP -r SSH 10.0.0.53:/root Db01: SSH 10.0.0.51 date SSH 10.0.0.52 date SSH 10.0.0.53 date DB02: SSH 10.0.0.51 date SSH 10.0.0.52 date SSH 10.0.0.53 date DB03: SSH 10.0.0.51 date SSH 10.0.0.52 date SSH 10.0.0.53 dateCopy the code

1.3 Downloading the MHA Software

Mha website: https://code.google.com/archive/p/mysql-master-ha/ making download address: https://github.com/yoshinorim/mha4mysql-manager/wiki/DownloadsCopy the code

1.4 Install Node software dependency packages on all nodes

Yum install perl-dbd -mysql -y RPM -ivh mha4mysql-nod-0.56-0.el6.noarch.rpmCopy the code

1.5 Creating MHA users in the DB-master primary database

Grant all privileges on *.* to Mha @'10.0.0.%' IDENTIFIED by 'mha';Copy the code

1.6 Installing Manager Software (db-Slave02)

yum install -y perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes rpm -ivh Mha4mysql - manager - 0.56-0. El6. Noarch. RPMCopy the code

1.7 Preparing a Configuration File (db-slave02)

Mkdir -p /var/log/mha/app1 Edit the mha configuration file vim /etc/mha/app1.cnf [server default] manager_log=/var/log/mha/app1/manager manager_workdir=/var/log/mha/app1 master_binlog_dir=/data/binlog user=mha Password =mha ping_interval=2 REPL_password =123 REPL_user =repl ssh_user=root [server1] Hostname =10.0.0.51 port=3306 [Server2] Hostname =10.0.0.52 Port =3306 [Server3] Hostname =10.0.0.53 port=3306Copy the code

1.8 Status Check

Masterha_check_ssh --conf=/etc/mha/app1.cnf Fri Apr 19 16:39:34 2019 - [Warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Fri Apr 19 16:39:34 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf.. Fri Apr 19 16:39:34 2019 - [info] Reading server configuration from /etc/mha/app1.cnf.. Fri Apr 19 16:39:34 2019 - [info] Starting SSH connection tests.. Fri Apr 19 16:39:35 2019 - [debug] Fri Apr 19 16:39:34 2019 - [debug] Connecting via SSH from [email protected] (10.0.0.51:22) to [email protected] (10.0.0.52:22). Fri Apr 19 16:39:34 2019 - [debug] ok. Fri Apr 19 16:39:34 2019 - [debug] Connecting via SSH from [email protected] (10.0.0.51:22) to [email protected] (10.0.0.53:22). Fri Apr 19 16:39:35 2019 - [debug] ok. Fri Apr 19 16:39:36 2019 - [debug] Fri Apr 19 16:39:35 2019 - [debug] Connecting Via SSH from [email protected](10.0.0.52:22) to [email protected](10.0.0.51:22).. Fri Apr 19 16:39:35 2019 - [debug] ok. Fri Apr 19 16:39:35 2019 - [debug] Connecting via SSH from [email protected] (10.0.0.52:22) to [email protected] (10.0.0.53:22). Fri Apr 19 16:39:35 2019 - [debug] ok. Fri Apr 19 16:39:37 2019 - [debug] Fri Apr 19 16:39:35 2019 - [debug] Connecting Via SSH from [email protected](10.0.0.53:22) to [email protected](10.0.0.51:22).. Fri Apr 19 16:39:35 2019 - [debug] ok. Fri Apr 19 16:39:35 2019 - [debug] Connecting via SSH from [email protected] (10.0.0.53:22) to [email protected] (10.0.0.52:22). Fri Apr 19 16:39:36 2019 - [debug] ok. Fri Apr 19 16:39:37 2019 - [info] All SSH connection tests passed successfully.Copy the code

1.9 Checking the Primary/Secondary Status

[root@db03 ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf Fri Apr 19 16:39:34 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Fri Apr 19 16:39:34 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf.. Fri Apr 19 16:39:34 2019 - [info] Reading server configuration from /etc/mha/app1.cnf.. Fri Apr 19 16:39:34 2019 - [info] Starting SSH connection tests.. Fri Apr 19 16:39:35 2019 - [debug] Fri Apr 19 16:39:34 2019 - [debug] Connecting via SSH from [email protected] (10.0.0.51:22) to [email protected] (10.0.0.52:22). Fri Apr 19 16:39:34 2019 - [debug] ok. Fri Apr 19 16:39:34 2019 - [debug] Connecting via SSH from [email protected] (10.0.0.51:22) to [email protected] (10.0.0.53:22). Fri Apr 19 16:39:35 2019 - [debug] ok. Fri Apr 19 16:39:36 2019 - [debug] Fri Apr 19 16:39:35 2019 - [debug] Connecting Via SSH from [email protected](10.0.0.52:22) to [email protected](10.0.0.51:22).. Fri Apr 19 16:39:35 2019 - [debug] ok. Fri Apr 19 16:39:35 2019 - [debug] Connecting via SSH from [email protected] (10.0.0.52:22) to [email protected] (10.0.0.53:22). Fri Apr 19 16:39:35 2019 - [debug] ok. Fri Apr 19 16:39:37 2019 - [debug] Fri Apr 19 16:39:35 2019 - [debug] Connecting Via SSH from [email protected](10.0.0.53:22) to [email protected](10.0.0.51:22).. Fri Apr 19 16:39:35 2019 - [debug] ok. Fri Apr 19 16:39:35 2019 - [debug] Connecting via SSH from [email protected] (10.0.0.53:22) to [email protected] (10.0.0.52:22). Fri Apr 19 16:39:36 2019 - [debug] ok. Fri Apr 19 16:39:37 2019 - [info] All SSH connection tests passed successfully. [root@db03 ~]# masterha_check_repl --conf=/etc/mha/app1.cnf Fri Apr 19 16:40:50 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Fri Apr 19 16:40:50 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf.. Fri Apr 19 16:40:50 2019 - [info] Reading server configuration from /etc/mha/app1.cnf.. Fri Apr 19 16:40:50 2019 - [info] MHA::MasterMonitor version 0.56. Fri Apr 19 16:40:51 2019 - [info] GTID Failover mode = 1 Fri Apr 19 16:40:51 2019 - [info] Dead Servers: Fri Apr 19 16:40:51 2019 - [info] Alive Servers: Fri Apr 19 16:40:51 2019 - [info] 10.0.0.51(10.0.0.51:3306) Fri Apr 19 16:40:51 2019 - [info] 10.0.0.52(10.0.0.52:3306) Fri Apr 19 16:40:51 2019 - [info] All Slaves Fri Apr 19 16:40:51 2019 - [Info] 10.0.0.52(10.0.0.52:3306) Version=5.7.20-log (Simple major Version between Slaves) log-bin:enabled Fri Apr 19 16:40:51 2019 - [info] GTID ON Fri Apr 19 16:40:51 2019 - [info] Replicating from 10.0.0.51(10.0.0.51:3306) Fri Apr 19 16:40:51 2019 - [Info] 10.0.0.53(10.0.0.53:3306) Version=5.7.20-log (10.0.0.53:3306) in simple major version between slaves) log-bin:enabled Fri Apr 19 16:40:51 2019 - [info] GTID ON Fri Apr 19 16:40:51 2019 - [info] Replicating from 10.0.0.51(10.0.0.51:3306) Fri Apr 19 16:40:51 2019 - [info] Current Alive Master: 10.0.0.51(10.0.0.51:3306) Fri Apr 19 16:40:51 2019 - [INFO] Checking Slave Configurations.. Fri Apr 19 16:40:51 2019 - [info] read_only=1 is not set on slave 10.0.0.52(10.0.0.52:3306). Fri Apr 19 16:40:51 2019 - [info] read_only=1 is not set on slave 10.0.0.53(10.0.0.53:3306). Fri Apr 19 16:40:51 2019 - [info] Checking Replication  filtering settings.. Fri Apr 19 16:40:51 2019 - [info] binlog_do_db= , binlog_ignore_db= Fri Apr 19 16:40:51 2019 - [info] Replication filtering check ok. Fri Apr 19 16:40:51 2019 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking. Fri Apr 19 16:40:51 2019 - [info] Checking SSH publickey authentication settings on the current master.. Fri Apr 19 16:40:51 2019 - [info] HealthCheck: SSH to 10.0.0.51 is reachable. Fri Apr 19 16:40:51 2019 - [Info] 10.0.0.51(10.0.0.51:3306) (current master) +--10.0.0.52(10.0.0.52:3306) +--10.0.0.53(10.0.0.53:3306) Fri Apr 19 16:40:51 2019 - [INFO] Checking Replication Health On 10.0.0.52.. Fri Apr 19 16:40:51 2019 - [info] OK. Fri Apr 19 16:40:51 2019 - [info] Checking Replication health on 10.0.0.53.. Fri Apr 19 16:40:51 2019 - [info] ok. Fri Apr 19 16:40:51 2019 - [warning] master_ip_failover_script is not defined. Fri  Apr 19 16:40:51 2019 - [warning] shutdown_script is not defined. Fri Apr 19 16:40:51 2019 - [info] Got exit code 0 (Not  master dead). MySQL Replication Health is OK.Copy the code

1.10 Enabling MHA(db-slave02) :

nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
Copy the code

1.11 Checking the MHA Status

[root@db03 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:4719) is running(0:PING_OK), master:10.0.0.51
[root@db03 ~]# mysql -umha -pmha -h 10.0.0.51 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 51    |
+---------------+-------+
[root@db03 ~]# mysql -umha -pmha -h 10.0.0.52 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 52    |
+---------------+-------+
[root@db03 ~]# mysql -umha -pmha -h 10.0.0.53 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 53    |
+---------------+-------+
Copy the code

2. Description of the MHA architecture software structure

2.1 Node Planning

Database node, must be at least 1 master and 2 slave independent instances. (Multiple instances per machine not supported)

MHA management node, preferably an independent machine. The MHA architecture is as follows:

  • Node planning
    • The manager end: db03
    • Node: DB01, DB02, and DB03
      • Main: db01
      • Two: DB02, DB03

2.2 Composition of MHA software

  • The Manager toolkit includes the following tools:

    • Masterha_manager: Starts the MHA
    • Masterha_check_ssh: checks the SSH configuration of the MHA
    • Masterha_cheak_repl: Checks the replication status of MySQL
    • Masterha_master_monitor: detects whether the master is down
    • Masterha_check_status: Checks the current MHA running status
    • Masterha_master_switch: Control failover (automatic or manual)
    • Masterha_conf_host: Adds or deletes configured server information
  • The Node toolkit consists of the following tools (these tools are usually triggered by MHA Manager scripts without human intervention) :

    • Save_binary_logs: Saves and copies the binary information of the master
    • Apply_diff_relay_log: The relay log event that identifies the difference and applies the time of its difference to the others
    • Purge_relay_logs: Clears relay logs (does not block SQL threads)

3. Details about the MHA configuration process

3.1 soft connection

ln -s /application/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
ln -s /application/mysql/bin/mysql /usr/bin/mysql
Copy the code

/usr/bin/mysqlbinlog /mysqlbinlog /mysqlbinlog /mysqlbinlog /mysqlbinlog /mysqlbinlog /mysqlbinlog

3.2 Configuring Mutual Trust

3.3 Installing Software Packages

3.4 Creating MHA Users on the DB01 primary database

3.5 Installing the Manager Software (DB03)

Preferably on a new machine or, if not, on a slave library.

Cannot be installed on the primary database, because the Manager is used to monitor the primary database if the primary database is powered off. Then there’s no way to monitor it.

3.6 Preparing configuration Files

It should be the same except for the server_id.

3.7 Status Check

3.8 Enabling the MHA function (DB03)

4. Explain the MHA Failover process

4.1 What Is Failover?

  • failover
  • Process when the primary database breaks down until services are restored (automatic)

4.2 What does Failover allow you to do?

  1. Quick monitoring of primary database downtime: mysqladmin ping

  2. Choosing a new master

    1. show slave status\G
      1. Retrieved_Gtid_Set
      2. Executed_Gtid_Set
  3. Data compensation (prerequisite: SSH connected)

  4. Deactivate the slave database

  5. The remaining slave libraries and the new master library build a master-slave relationship

  6. The application of transparent

  7. Failed node self-healing (to be developed…)

  8. Failure to remind

4.3 How Can I Implement the MHA Failover?

Start -> Fault -> Failover -> Service recovery

  1. MHA uses the masterHA_manager script to start the MHA function

  2. Before manager starts, SSH trust (masterha_check_SSH) and master/slave status (masterha_check_repl) are automatically checked.

  3. Mha-manager runs the masterha_master_monitor script (every ping_interval seconds)

  4. The masterha_MASTER_monitor detects that the primary database has no heartbeat for three times and considers that the primary database is down.

  5. Conduct the main selection process

    1. Algorithm is a:
      1. Read whether the primary parameter is mandatory in the configuration file

      2. candidate_master=1

      3. If you want to preselect that as the primary database, set it under the configuration

      4. check_repl_delay=0

        1. The check_REPL_delay is used to check for differences between the primary and secondary libraries, and if the differences are too large, the library is not selected as the master
        2. Set check_REPL_delay =0 if you do not want to check the difference between the primary and secondary delays
      5. Do not set the mandatory main parameter, enter algorithm two

    2. Algorithm 2:
      1. Automatically determines the log quantity of all slave databases and considers the slave database closest to the master database as the new master database
    3. Algorithm 3:
      1. Select a new master according to the configuration file sequence
  6. Data compensation

    1. Determine the SSH connectivity of the primary database
      1. Situation one: can connect

        1. Call the save_binary_logs script to immediately save the missing binlog to each slave node and recover
      2. Situation two: Unable to connect

        1. Call the apply_diff_relay_logs script to calculate the difference in relaylog of the slave library and restore to slave library number 2
  7. Deactivate the slave database

  8. The remaining slave libraries and the new master library build a master-slave relationship

  9. The application of transparent

  10. Failed node self-healing (to be developed)

  11. Failure to remind

Extension: Candidate_master =1 Application scenario?

  1. MHA + KeepAlive VIP (early MHA architecture)
  2. Many places and many centers

5. MHA Application Transparency (VIP)

  • Copy the file to/usr/local/bindirectory
cp /root/master_ip_failover.txt /usr/local/bin/master_ip_failover
Copy the code
  • Modifying a Configuration File
Vim /usr/local/bin/master_ip_failover My $VIP = '10.0.0.55/24'; my $key = '1'; my $ssh_start_vip = "/sbin.ifconfig eth0:$key $vip"; my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";Copy the code
  • Change Chinese characters to English:
dos2unix master_ip_failvoer
Copy the code
  • Add execute permission to the file
chmod +x master_ip_failvoer
Copy the code
  • Add the script to the configuration file
vim /etc/mha/app1.cnf
master_ip_failover_script=/usr/local/bin/master_ip_failover
Copy the code
  • Db01: Manually add a VIP
The ifconfig eth0:1 10.0.0.55/24Copy the code
  • Db03: Restarts the MHA
masterha_stop --conf=/etc/mha/app1.cnf

nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null> /var/log/mha/app1/manager.log 2>&1 &

masterha_check_status --conf=/etc/mha/app1.cnf
Copy the code

6. MHA fault notification

MHA is a one-time function. Once a fault occurs, the MHA function cannot be provided again. Therefore, when a problem occurs, the architecture needs to be restored immediately.

cp -r email /usr/local/bin/
cd /usr/local/bin/email/
chmod -x *
Copy the code
vim /etc/mha/app1.cnf
report_script+/usr/local/bin/email/send
Copy the code

Restart the MHA

masterha_stop --conf=/etc/mha/app1.cnf

nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null> /var/log/mha/app1/manager.log 2>&1 &

masterha_check_status --conf=/etc/mha/app1.cnf
Copy the code

7. Additional data compensation (binlog_server)

  1. Find an extra machine, must have version 5.6 or above, GTID support and open, we directly use the second slave (DB03)
/etc/mha.app1.cnf [binlog1] no_master=1 Hostname =10.0.0.53 master_binlog_dir=/data/mysql/binlogCopy the code
  1. Create the necessary directories
mkdir -p /data/mysql/binlog
chown -R mysql.mysql /data/*
Copy the code
  1. Pull the binlog log of the primary library
Mysqlbinlog -r --host=10.0.0.51 --user=mha --password=mha --raw --stop-never mysql-bin.000001 &Copy the code

Note: The starting point for the pull log is the binlog currently used by the primary library.

  1. Restart the MHA – manager
masterha_stop --conf=/etc/mha/app1.cnf

nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null> /var/log/mha/app1/manager.log 2>&1 &

masterha_check_status --conf=/etc/mha/app1.cnf
Copy the code

7.1 Why Must BinLog Data Be Saved Successfully

On a single node, when we enter commit, the data will be saved in the redo log, the binlog, and then the state will be set to COMMIT.

There is a binlog server. The binlog will only be executed successfully if it is successfully synchronized to the remote server. One more step (a little doubt)

8. Fault simulation and troubleshooting

8.1 DB01 Database Is Down

The/etc/init. D/mysqld stop observation manager log tail -f/var/log/mha/app1 / manager must show successfully, at the end of switch is normal.Copy the code

8.2 Fault Recovery

  1. Starting the faulty node
[root@db01 ~]# /etc/init.d/mysqld start
Copy the code
  1. Restore 1 master 2 slave (now 1 master 1 slave)

Do this on 10.0.0.51:

CHANGE MASTER TO
MASTER_HOST='10.0.0.52',
MASTER_PORT=3306,
MASTER_AUTO_POSITION=1,
MASTER_USER='repl',
MASTER_PASSWORD='123';
start slave ;
Copy the code
  1. Restoring configuration Files (DB03)

Server1 will be lost because the configuration information of the faulty node will be cleared when the server breaks down. Therefore, it needs to be added

The hostname [server1] = 10.0.0.51 port = 3306Copy the code
  1. Start the MHA
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
Copy the code
  1. Restore the binlog server
CD /data/mysql/binlog rm -rf /data/mysql/binlog/* mysqlbinlog -r --host=10.0.0.52 --user=mha --password=mha --raw --stop-never mysql-bin.000001 &Copy the code

9. Rectify the MHA fault

9.1 Troubleshooting During Construction

  1. Check the script
masterha_check_ssh --conf=/etc/mha/app1.cnf
masterha_check_repl --conf=/etc/mha/app1.cnf
Copy the code

1 primary 2 secondary replication environment

  1. The configuration file

    1. The node address
    2. port
    3. VIP and send mail, the script specifies the location and permissions
  2. Soft connection

9.2 Faults During switchover

Look at the/var/log/mha/app1 / manager

Scripting is a bit more problematic

  • vip
  • send
  • binlog

9.3 Recovering the Fault Environment

  1. Check whether each node is started

  2. Find out who the master repository is?

    1. show slave status\G
  3. Restore one master and two slaves

    1. change master to;
    2. start slave;
  4. Check the configuration file and restore node information

  5. Check VIP and Binlog Server

    1. Check whether the VIP is in the primary library, if not, manually adjust to the primary library

      1. ifconfig
    2. Restart binlog Server pull

  6. Start the Manager

10. Manager Describes additional parameters

Who takes over when the primary library is down?

  1. All slave logs are consistent, and by default a new master is selected in the order of the configuration file.

  2. If the logs of the secondary node are inconsistent, the secondary node closest to the primary node is automatically selected

  3. If a weight is set for a node (candidate_master=1), the weight node will be preferentially selected. However, if the number of logs in this node is 100 MB behind the primary database, it will not be selected. You can use check_REPL_delay =0 to turn off the log quantity check and force the selection of candidate nodes.

  4. Ping_interval = 1:

    1. Set the interval for sending ping packets from the primary database. If no response is received after three attempts, failover is automatically performed
  5. Candidate_master = 1:

    1. Set this parameter to candidate master. If this parameter is set, the slave database will be promoted to the master database after a master/slave switchover occurs, even if the master database is not the latest slave in the cluster
  6. Check_repl_delay = 0:

    1. By default, if a slave falls 100M behind the master’s relay logs, MHA will not select this slave as a new master, because it will take a long time to recover. By setting check_REPL_delay =0, MHA triggers the switch to select a new master ignoring the replication delay. This parameter is useful for hosts with candidate_master=1. Because this candidate master must be the new master during the switchover

11. VIP function of MHA

  1. Prepare a script
  2. Modifying script Configuration
  3. Example Modify the Manager configuration file
  4. The first VIP address is generated on the primary repository
  5. Restart the MHA