What is the experience of unplugging a server during peak hours?

Hello everyone, I am Glacier ~~

I am not afraid of god-like opponents, but I am afraid of pig-like teammates. I experienced a “painful” experience that my teammates “pulled out” the power supply of the server when I was unprepared during the peak of business.

At that time of I play outside of be up to strength, suddenly a phone call call: “glacier, where are you? The server is suddenly inaccessible!” . Me: “again have what circumstance?” . “I accidentally stepped on the power socket of the server, the server is powered off, and the restart prompt cannot start”. My heart ten thousand words, asked: “which server power”. “I accidentally stepped on the switch of the little switchboard, and all the servers connected to the switchboard are out of power.” Me: What the hell, that’s a database.

So I hurried back to the company, began the helpless painful data recovery process…

The article has been included in:

Github.com/sunshinelyz…

Gitee.com/binghe001/t…

Solve the primary database problem

Master database problem reappears

Back to the company a look, the power is the company’s message service subsystem database, a total of three databases, two from one, and the use of sub-database sub-table way to store data. I started the three servers, but found that the process of the primary database could not be started, and the synchronization status of the two secondary databases to the primary database was abnormal. In order, I first look at the log information of the primary database, and found that the MySQL error log output is as follows:

-----------------------------------------161108 23:36:45 mysqld_safe Starting mysqld daemon with 
databases from /usr/local/mysql/var2021-02-28 23:36:46 0 [Warning] TIMESTAMP with implicit DEFAULT 
value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation 
for more details).2021-02-28 23:36:46 5497 [Note] Plugin 'FEDERATED' is disabled.2021-02-28 23:36:46 

7f11c48e1720 InnoDB: Warning: Using innodb_additional_mem_pool_size is DEPRECATED. This option may be 
removed in future releases, together with the option innodb_use_sys_malloc and with the InnoDB's internal memory allocator.2021-02-28 23:36:46 5497 [Note] InnoDB: Using atomics to ref count buffer pool pages2021-02-28 23:36:46 5497 [Note] InnoDB: The InnoDB memory heap is disabled2021-02-28 23:36:46 5497 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins2021-02-28 23:36:46 5497 [Note] InnoDB: Memory barrier is not used2021-02-28 23:36:46 5497 [Note] InnoDB: Compressed tables use zlib 1.2.32021-02-28 23:36:46 5497 [Note] InnoDB: Using CPU crc32 instructions2021-02-28 23:36:46 5497 [Note] InnoDB: Initializing buffer pool, size = 16.0m2021-02-28 23:36:46 5497 [Note] InnoDB: Completed initialization of buffer poolInnoDB: Database page corruption on disk or a failedInnoDB: file read of page 5.InnoDB: You may have to recover from a backup.2021-02-28 23:36:46 7f11c48e1720 InnoDB: Page dump in ascii and hex (16384 bytes): len 16384; hex 7478d078000000050000000000000000000000000f271f4d000700000000000000000000000000000000001b4000000000000 000000200 f20000000000000006000000000000002d000000000000002e000000000000002f0000000000000030000000000 InnoDB (omit many similar code) : End of page dump2021-02-28 23:36:46 7f11c48e1720 InnoDB: uncompressed page, stored checksum in field1 1954074744, calculated checksums for field1: crc32 993334256, innodb 2046145943, none 3735928559, stored checksum in field2 1139795846, calculated checksums for field2: crc32 993334256, innodb 1606613742, none 3735928559, page LSN 0 254222157, low 4 bytes of LSN at page end 254221236, Page number (if created with >= mysql-4.1.1 and stored already) 0InnoDB: Page may be a transaction system pageInnoDB: Database page corruption on disk or a failedInnoDB: file read of page 5.InnoDB: You may have to recover from a backup.InnoDB: It is also possible that your operatingInnoDB: system has corrupted its own file cacheInnoDB: and rebooting your computer removes theInnoDB: error.InnoDB: If the corrupt page is an index pageInnoDB: you can also try to fix the corruptionInnoDB: by dumping, dropping, and reimportingInnoDB: the corrupt table. You can use CHECKInnoDB: TABLE to scan your table for corruption.InnoDB: See also http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.htmlInnoDB: about forcing recovery. InnoDB: Ending processing because of a corrupt database page.2021-02-28 23:36:46 7f11c48e1720 InnoDB: Assertion failure in thread 139714288817952 in file buf0buf.cc line 4201InnoDB: We intentionally generate a memory trap.InnoDB: Submit a detailed bug report to http://bugs.mysql.com.InnoDB: If you get repeated assertion failures or crashes, evenInnoDB: immediately after the mysqld startup, there may beInnoDB: corruption in the InnoDB tablespace. Please refer toInnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.htmlInnoDB: about forcing recovery.03:36:46 UTC - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binaryor one of the libraries it was linked against is corrupt, improperly built,or misconfigured. This error can also be caused by malfunctioning hardware.We will try our best to scrape up some info that will hopefully helpdiagnose the problem, but since we have already crashed,something is definitely wrong and this may fail.key_buffer_size=16777216read_buffer_size=262144max_used_connections=0max_threads=1000thread_coun t=0connection_count=0It is possible that mysqld could use up tokey_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 798063 K bytes of memoryHope that's ok; if not, decrease some 
variables inthe equation.Thread pointer: 0x0Attempting backtrace. You can use the following information to find outwhere mysqld died. If you see no messages after this, something wentterribly wrong... stack_bottom = 0 thread_stack 0x40000/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x35)
[0x8e64b5]/usr/local/mysql/bin/mysqld(handle_fatal_signal+0x41b)
[0x652fbb]/lib64/libpthread.so.0(+0xf7e0)[0x7f11c44c77e0]/lib64/libc.so.6(gsignal+0x35)
[0x7f11c315d625]/lib64/libc.so.6(abort+0x175)

[0x7f11c315ee05]/usr/local/mysql/bin/mysqld[0xa585c5]/usr/local/mysql/bin/mysqld[0xa6c7b4]/usr/local/
mysql/bin/mysqld[0xa6cbc7]/usr/local/mysql/bin/mysqld[0xa5bce2]/usr/local/mysql/bin/mysqld[0xa1e2ba]/usr/local/mysql/bin/mysqld[0xa0bf60]/usr/local/mysql/bin/mysqld[0x95a427]/usr/local/mysql/bin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x48)
[0x58f788]/usr/local/mysql/bin/mysqld[0x6e4a36]/usr/local/mysql/bin/mysqld(_Z11plugin_initPiPPci+0xb3e)
[0x6e826e]/usr/local/mysql/bin/mysqld[0x582d85]/usr/local/mysql/bin/mysqld(_Z11mysqld_mainiPPc+0x4d8)
[0x587d18]/lib64/libc.so.6(__libc_start_main+0xfd)
[0x7f11c3149d5d]/usr/local/mysql/bin/mysqld[0x57a019]The manual page at 
http://dev.mysql.com/doc/mysql/en/crashing.html containsinformation that should help you find out 
what is causing the crash.161108 23:36:46 mysqld_safe mysqld from pid file 
/usr/local/mysql/var/VM_241_49_centos.pid 
ended------------------------------------------------------------------------------
Copy the code

Main database problem analysis

From the log, it can be seen that innoDB engine has a problem. Note the log in dev.mysql.com/doc/refman/… View the forcibly restore method. Innodb_force_recovery =1 innodb_force_recovery=1 innodb_force_recovery=1 innodb_force_recovery=1

[mysqld]innodb_force_recovery = 1
Copy the code

If innodb_force_recovery = 1 does not work, try numbers 2 — 6

Then restart mysql. The restart succeeds. Then use mysqldump or PMA to export data, perform repair operations, and so on. When the fix is complete, comment out the parameter and revert to the default value of 0.

Configuration file parameter: innodb_force_recovery

Innodb_force_recovery Affects the recovery status of the entire InnoDB storage engine. The default value is 0, which means that all restore operations (purge undo/ Insert Buffer merge/ Rolling back&Forward) are performed when the restore operation is needed. If the restore operation is not performed, mysql may fail to start and record an error log.

Innodb_force_recovery can be set to 1-6, with large numbers containing the impact of all previous numbers. When the parameter value is greater than 0, select,create, and drop operations can be performed on the table, but insert, UPDATE, or DELETE operations are not allowed.

(SRV_FORCE_IGNORE_CORRUPT): Ignores the detected corrupt page.
(SRV_FORCE_NO_BACKGROUND): Blocks the main thread from running. For example, the main thread needs to run the Full purge operation, causing crash.
(SRV_FORCE_NO_TRX_UNDO): The transaction rollback operation is not performed.
(SRV_FORCE_NO_IBUF_MERGE): Do not perform a merge operation to insert the buffer.
(SRV_FORCE_NO_UNDO_LOG_SCAN): Without viewing redo logs, InnoDB storage engine will treat uncommitted transactions as committed.
(SRV_FORCE_NO_LOG_REDO): Do not perform rollback forward.

Master repository solution

General repair method reference:

The first method

Create a new table:

create tableDemo_bak # is the same as the original table structure, but INNODB is changed to MYISAM.Copy the code

I’m going to import the data in

insert into demo_bak select * from demo;
Copy the code

Delete from table_name;

drop table demo;
Copy the code

After innodb_force_recovery is commented out, restart.

Rename:

rename table demo_bak to demo;
Copy the code

Finally change back to the storage engine:

alter table demo engine = innodb
Copy the code

The second way

Another method is to use mysqldump to export the table and then import it back to the InnoDB table. The result of both methods is the same. Backup export (including structure and data) :

mysqldump -uroot -p123 test > test.sql
Copy the code

Reduction method 1:

use test; source test.sqlCopy the code

Restore method 2 (system command line) :

mysql -uroot -p123 test < test.sql;
Copy the code

Note that the CHECK TABLE command is basically useless in InnoDB databases.

The third way

(1) Configure my.cnf

Innodb_force_recovery = 1 or 2 — 6 to restart MySQL

(2) Export data script

mysqldump -uroot -p123 test > test.sql
Copy the code

Export SQL scripts. Or use Navicat to import all databases/tables into databases on other servers. Note: The data must be backed up successfully. Then delete the data in the original database.

(3) Delete ib_logfile0, ib_logfile1, and ibdata1

Back up the ib_logfile0, ib_logFile1, and ibdata1 files in the MySQL data directory and delete the three files

(4) Configure my.cnf

Delete innodb_force_recovery = 1 or 2 — 6 from my.cnf or set innodb_force_recovery = 0 and restart MySQL service

(5) Import data into the MySQL database

mysql -uroot -p123 test <test.sql; Or use Navicat to import the backup data into the database.Copy the code

This method should pay attention to the following issues:

The ib_logfile0, ib_logfile1, and ibdata1 files must be backed up before being deleted.
Make sure the original data is exported successfully
If a message is displayed indicating that the data in the database cannot be deleted after the data is successfully exported, you can access the MySQL data directory on the CLI and manually delete the database folder or the data table files in the database folder. The prerequisite is that the data is successfully exported or backed up.

Here, I used the third method to recover data from the main database.

Next, let’s look at restoring data from the database.

Solve the slave library problem

Principle of master-slave replication

Here, I will briefly say MySQL database master-slave replication principle.

MySQL master-slave replication principle, also known as A/B principle.

(1) The Master records the data changes to binary logs, which are specified by the log-bin configuration file. These records are called binary log events.

(2) The Slave reads binary log events in the Master and writes them to its relay log via the I/O thread.

(3) Slave redoes the events in the relay log and executes the event information in the relay log one by one locally to complete the local storage of data, so as to reflect the changes to its own data (data replay).

Replication filtering allows you to replicate only a portion of the data in the server. There are two types of replication filtering:

(1) Filter events in binary logs on Master;

(2) Filter events in relay logs on the Slave. As follows:

Relay_log Configures the relay log. Log_slave_updates indicates that slave writes replication events to its binary log. When setting log_slave_updates, you can make a slave act as the master of other slaves. At this point, the slave writes the events executed by the SQL thread to its binary log, where its slave can retrieve the events and execute them. Send replication events to other slaves as shown in the figure below:

The slave library problem reappears

After data is recovered from the primary database, a batch of about 1000 test data is inserted into the primary database. However, data is not synchronized from the secondary database after data is inserted. So I log in to the primary database and run the following command.

mysql>show processlist;
Copy the code

Check whether the process sleeps too much. That’s normal.

Check the status of the primary database.

show master status;
Copy the code

Also normal.

mysql> show master status;
+-------------------+----------+--------------+-------------------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB              |
+-------------------+----------+--------------+-------------------------------+
| mysqld-bin000001. |     3260 |              | mysql,test,information_schema |
+-------------------+----------+--------------+-------------------------------+
1 row in set (0.00 sec)
Copy the code

Then go to the slave library to check the status of the slave library.

mysql> show slave status\G                                                
 
Slave_IO_Running: Yes
Slave_SQL_Running: No
Copy the code

The Slave is not synchronized. In this case, there are two solutions if the master and slave database versions are consistent or inconsistent.

Master and slave version consistency solution

Here are two solutions

Method 1: Ignore the error and continue synchronization

This method is suitable for the situation that the data of master and slave database are not different, or the data can not be completely unified, and the data requirements are not strict

Solution:

stop slave; # indicates a skipped step error, and the following number is variableset global sql_slave_skip_counter =1;
start slave;
Copy the code

Mysql > show slave status\G

mysql> show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Copy the code

Ok, now the master/slave synchronization status is normal…

Way two: remaster from, complete synchronization

This method is applicable to the situation where the data of master and slave databases differ greatly or the data is required to be completely unified

The solution is as follows:

(1) Advanced master library, lock table, prevent data writing

Using a command:

mysql> flush tables with read lock;
Copy the code

Note: The statement is locked as read-only and is case insensitive

(2) Back up data

# Back up data to mysql.bak.sql

mysqldump -uroot -p -hlocalhost > mysql.bak.sql
Copy the code

One thing to note here is that database backups must be done regularly. You can use shell or Python scripts to make sure your data is safe.

(3) Check the Master status

mysql> show master status;
+-------------------+----------+--------------+-------------------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB              |
+-------------------+----------+--------------+-------------------------------+
| mysqld-bin000001. |     3260 |              | mysql,test,information_schema |
+-------------------+----------+--------------+-------------------------------+
1 row in set (0.00 sec)
Copy the code

(4) Upload the mysql backup file to the slave library machine for data recovery

SCP mysql. Bak. SQL [email protected]: / TMP /Copy the code

(5) Stop the slave library

mysql> stop slave;
Copy the code

(6) Then run the mysql command to the slave library to import the data backup

mysql> source /tmp/mysql.bak.sql
Copy the code

(7) set from the repository synchronization, pay attention to the synchronization points here, is the main library show master status information in | File | Position two

change master to master_host = '192.168.128.100', master_user = 'rsync',  master_port=3306, master_password=' ', master_log_file =  'mysqld-bin.000001', master_log_pos=3260;
Copy the code

(8) Enable secondary synchronization again

mysql> start slave;
Copy the code

(9) Check the synchronization status

mysql> show slave status\G  

Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Copy the code

(10) Go back to the primary database and run the following command to unlock the table.

UNLOCK TABLES;
Copy the code

If the versions of the primary and secondary databases are the same, this is ok. If the versions of the primary and secondary databases are not the same, this may cause problems.

Solution the master and slave versions are inconsistent

If the versions of the master and slave MySQL databases are different, run the show slave status \G command to check the slave database status, the following information may be displayed:

Slave_IO_Running: Yes Slave_SQL_Running: No...... Last_Errno: 1755 Last_Error: Cannot execute the current event groupin the parallel mode. Encountered event Gtid, relay-log name ./mysql-relay.000002, position 123321 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly...
Copy the code

Note the following output.

Last_Errno: 1755
Last_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name ./mysql-relay.000002, position 123321 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly...
Copy the code

This is due to a database version inconsistency between the master library using MySQL5.6 and the slave library using MySQL5.7.

The error of 1755 is caused by the parallel copy bug. The solution is as simple as turning off parallel replication.

stop slave;
set global slave_parallel_workers=0;
start slave;
Copy the code

If the slave database still has problems, you can restore the data of the slave database according to the scheme that the master and slave versions are consistent.

Here, attach a description of error code 1755 from the MySQL official documentation.

Bugs.mysql.com/bug.php?id=…

After half a day and half a night of data recovery, the database was finally restored, but this lesson is also very painful. In order to avoid the occurrence of similar accidents, I later made the database of remote multi-live, and then the database of remote multi-live plan to share with you!

Well, that’s all for today, I’m Glacier, and I’ll see you next time