About the author

Liu Shuhao is a DBA at China Mobile, responsible for database operation, maintenance, and standardization of the “mobile cloud” business system. He specializes in MySQL, is familiar with MySQL replication topologies, cluster architecture, and operations tuning, has experience with operations automation, and is responsible for building the “mobile cloud” database management platform.

In performance monitoring scenarios, large volumes of data are imported and tablespaces grow rapidly. To keep disk space from running out and to keep SQL execution efficient, historical data must be cleared periodically. Based on the data collection frequency and retention period, different timers can be embedded in the application to delete historical data. This simple periodic cleanup mechanism works well in the early stage of a service. However, as the service grows, and especially when data surges, the timers may fail to keep up. Worse, the delete transactions may hold table locks for a long time, causing database congestion and flow control.
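As a minimal sketch of such a timer (the table name, schema name, and credentials below are placeholders, not the actual OP application code), a cron-driven shell function can issue a single DELETE with a cutoff computed from the retention period:

```shell
#!/bin/bash
# Hypothetical sketch of a timer-style purge job; not the actual OP code.
# DRY_RUN=1 (the default here) only prints the statement instead of running it.
RETENTION_DAYS=2
TABLE=perf_biz_vm

build_purge_sql() {
    # Cutoff = now minus the retention period, as a DATETIME literal (GNU date).
    local cutoff
    cutoff=$(date -d "-${RETENTION_DAYS} day" '+%Y-%m-%d %H:%M:%S')
    echo "DELETE FROM ${TABLE} WHERE CREATE_TIME < '${cutoff}';"
}

purge_history() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        build_purge_sql                               # print only
    else
        build_purge_sql | mysql -u"$DB_USER" -p"$DB_PASS" monitor_alarm
    fi
}
```

Run daily from cron, this is exactly the kind of single-statement delete that works early on but turns into one giant transaction once the table grows.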

In this article I will walk through a database failure caused by the breakdown of this cleanup mechanism, and show how to use partitioned tables and stored procedures to clean up data instead.

1. Problem review

Earlier this year we had a brief cloud monitoring outage in our production environment. The fault turned out to be caused by the OP application periodically deleting data from the performance database. Specifically, the DELETE transaction was too large and exceeded the PXC cluster's synchronous replication write-set limit. After the transaction was logically committed on the local node, it could not be replicated to the other two nodes of the cluster and was eventually rolled back locally. Because the transaction held the table lock for a long time, a large number of threads were blocked on System lock, which triggered database flow control and ultimately slowed the updating of cloud monitoring data on the North China node.

The following describes the troubleshooting process:

At 2018/02/27 06:14:05, Zabbix raised an alarm indicating that memory usage of the OP performance database on the North China node exceeded 80%.

Note: OP is short for the “mobile cloud” portal system. The OP performance database stores performance data for the cloud products ordered by users; it runs on a 3-node PXC multi-master cluster.

After we logged in to the database, we saw the number of threads waiting for execution rising rapidly and the database in a flow-control state. The SQL statement blocking the database was:

DELETE FROM perf_biz_vm WHERE '2018-02-25 02:00:00' > CREATE_TIME

This statement was issued by the OP application to delete historical data older than two days from the perf_biz_vm table. At the time of the failure it had been running for more than four hours, and the execution plan expected to delete about 200 million rows.

In the end, the statement failed and database flow control was triggered.
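One standard mitigation for this class of problem (not what the OP application did; shown here only as a hedged sketch with placeholder credentials and an illustrative batch size) is to delete in small batches, so that each transaction's write set stays far below the Galera limit:

```shell
#!/bin/bash
# Sketch: batched deletes keep each transaction small. The credentials and
# batch size are illustrative assumptions, not values from the incident.
BATCH=10000

build_batch_sql() {
    # One bounded DELETE plus ROW_COUNT() so the caller can tell when to stop.
    local cutoff="$1"
    printf '%s\n' \
        "DELETE FROM perf_biz_vm WHERE CREATE_TIME < '${cutoff}' LIMIT ${BATCH};" \
        "SELECT ROW_COUNT();"
}

purge_in_batches() {
    local cutoff="$1" rows
    while :; do
        rows=$(build_batch_sql "$cutoff" | mysql -N -u"$DB_USER" -p"$DB_PASS" monitor_alarm)
        [ "${rows:-0}" -gt 0 ] || break
        sleep 1   # let replication drain between batches
    done
}
```

Each iteration commits a small transaction, so no single write set can approach the cluster limit, and table locks are released between batches.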

Let us analyze the failure mechanism based on the replication principle of Galera Cluster.

First, synchronous replication between Galera cluster nodes is mainly based on broadcasting write sets and certifying transactions, which enables simultaneous commit on multiple nodes and rollback of conflicting transactions.

Second, an optimistic policy is adopted while a transaction executes on the local node: conflict detection is performed only after the transaction has been successfully broadcast to all nodes. If a conflict is detected, the local transaction is rolled back first; if not, each node applies the write set in its queue independently and asynchronously.

Finally, once the transaction has executed successfully on the local node and returned to the client, the other nodes are guaranteed to apply it. The Galera replication architecture diagram is as follows:

According to this replication principle, when the local node commits a transaction successfully, it replicates the transaction to the other two nodes in the cluster as a write set, and each node then performs certification independently and asynchronously. Because the amount of data to be deleted was huge, the transaction exceeded the synchronous replication write-set limit (set to 1 GB in our production environment), so the local node could not obtain certification; the transaction was never queued for physical commit and was instead rolled back locally.
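The write-set limit can be inspected on a live node. The sketch below (placeholder credentials) prints the SHOW statements for the wsrep_max_ws_size variable and for the provider options string, in which Galera also reports its repl.max_ws_size setting:

```shell
#!/bin/bash
# Sketch: statements to inspect the write-set size limit on a PXC node.
# Credentials in the usage comment are placeholders.
show_ws_limit_sql() {
    cat <<'SQL'
SHOW GLOBAL VARIABLES LIKE 'wsrep_max_ws_size';
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';
SQL
}

# Against a live node:
#   show_ws_limit_sql | mysql -u"$DB_USER" -p"$DB_PASS"
show_ws_limit_sql
```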

The error log is as follows:

Because the transaction held the X lock on the perf_biz_vm table for a long time, cloud host monitoring data on the local node could not be written to the database. As waiting threads accumulated, the local node's execution queue grew longer and longer, eventually triggering the Flow Control mechanism of the PXC cluster.

This mechanism exists to ensure that all cluster nodes apply transactions faster than their queues grow, so that slow nodes do not lose transactions. The implementation principle is that only one node in the cluster may broadcast messages at any given time, and every node receives each broadcast; when a slow node's execution queue exceeds a certain length, it broadcasts an FC_PAUSE message, and on receiving it the other nodes postpone further broadcasts. Once the slow (local) node's transactions are rolled back and its execution queue shrinks below the threshold, data synchronization in the Galera cluster resumes and flow control is lifted.
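Flow control is observable through the wsrep status counters. The following sketch (placeholder credentials) prints the SHOW statements for the pause ratio, the number of FC_PAUSE messages sent, and the local receive queue length:

```shell
#!/bin/bash
# Sketch: status counters that expose Galera flow control on a live node.
show_flow_control_sql() {
    cat <<'SQL'
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_sent';
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue%';
SQL
}

# Against a live node:
#   show_flow_control_sql | mysql -u"$DB_USER" -p"$DB_PASS"
show_flow_control_sql
```

A wsrep_flow_control_paused value approaching 1.0 means the cluster is spending nearly all of its time paused, which matches the symptoms described above.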

When flow control occurred in the OP performance database, the "DELETE FROM perf_biz_vm WHERE '2018-02-25 02:00:00' > CREATE_TIME" statement on the local node had filled the buffer pool (innodb_buffer_pool_size=128G in the production environment); together with the memory the database itself normally occupies, this pushed system memory usage past the 80% warning threshold.

By February 28, with the historical data cleanup mechanism broken, the business table had grown to 250 GB, leaving the database severely short of storage space and in need of expansion. To eliminate the risk and free disk space, we decided to rebuild the data cleanup mechanism on the database side using partitioned tables + stored procedures + events.

2. Rebuilding the cleanup mechanism

After analyzing the fault above, we decided to build a safe, robust, and efficient database cleanup mechanism based on partitioned tables and stored procedures.

Looking at the execution plan, we can see that deleting data with a DELETE statement is inefficient and can trigger a System lock even when an index is hit. The fundamental solution for cleaning up a large table is therefore to introduce partitioned tables: instead of performing DML to delete data, simply drop the expired partition (a DDL operation).

When data is deleted with DELETE, the write set records information about every affected row; when a table (or partition) is dropped, the write set records only the physical storage location, the table structure, and the dependent constraints, triggers, indexes, and stored procedures. For a table with a large amount of data, the DROP operation is several orders of magnitude faster.

Another benefit of partitioned tables is that no application code needs to change: by configuring the backend database to partition on the time field, the table can easily be split up. Note that the query condition must include the partitioning key, otherwise every partition will be scanned. The implementation process is as follows:
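Whether a query actually prunes down to a single partition can be checked with EXPLAIN. A sketch using the MySQL 5.6-style EXPLAIN PARTITIONS syntax (in 5.7 and later, plain EXPLAIN shows a partitions column; the date range here is illustrative):

```shell
#!/bin/bash
# Sketch: verify partition pruning for a time-bounded query on perf_biz_vm.
explain_pruning_sql() {
    cat <<'SQL'
EXPLAIN PARTITIONS
SELECT * FROM perf_biz_vm
WHERE CREATE_TIME >= '2018-02-27 00:00:00'
  AND CREATE_TIME <  '2018-02-28 00:00:00';
SQL
}

# Against a live node:
#   explain_pruning_sql | mysql -u"$DB_USER" -p"$DB_PASS" monitor_alarm
explain_pruning_sql
```

If the partitions column lists only p20180227, pruning is working; a query that omits CREATE_TIME from the WHERE clause would list every partition instead.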

Step 1: Create a partitioned table. Create a new table with the same structure as perf_biz_vm, name it perf_biz_vm_new, and use the indexed create_time field as the partitioning key.

The code is as follows:

CREATE TABLE `perf_biz_vm_new` (
  `CREATE_TIME` datetime NOT NULL COMMENT 'Creation time',
  `VM_ID` varchar(80) NOT NULL COMMENT 'VM ID',
  `PROCESSOR_USED` varchar(100) DEFAULT NULL COMMENT 'CPU usage (%)',
  `MEM_USED` varchar(100) DEFAULT NULL COMMENT 'Memory usage (%)',
  `MEM_UTILITY` varchar(100) DEFAULT NULL COMMENT 'Available memory (bytes)',
  `BYTES_IN` varchar(100) DEFAULT NULL COMMENT 'Inbound traffic rate (Mbps)',
  `BYTES_OUT` varchar(100) DEFAULT NULL COMMENT 'Outbound traffic rate (Mbps)',
  `PROC_RUN` varchar(100) DEFAULT NULL COMMENT 'Number of processes in the CPU run queue',
  `WRITE_IO` varchar(100) DEFAULT NULL COMMENT 'Virtual disk write rate (Mb/s)',
  `READ_IO` varchar(100) DEFAULT NULL COMMENT 'Virtual disk read rate (Mb/s)',
  `PID` varchar(36) NOT NULL,
  PRIMARY KEY (`PID`,`CREATE_TIME`),
  KEY `mytable_categoryid` (`CREATE_TIME`) USING BTREE,
  KEY `perf_biz_vm_vm_id_create_time` (`VM_ID`,`CREATE_TIME`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50500 PARTITION BY RANGE COLUMNS(CREATE_TIME)
(PARTITION p20180225 VALUES LESS THAN ('2018-02-26') ENGINE = InnoDB,
 PARTITION p20180226 VALUES LESS THAN ('2018-02-27') ENGINE = InnoDB,
 PARTITION p20180227 VALUES LESS THAN ('2018-02-28') ENGINE = InnoDB,
 PARTITION p20180228 VALUES LESS THAN ('2018-03-01') ENGINE = InnoDB,
 PARTITION p20180301 VALUES LESS THAN ('2018-03-02') ENGINE = InnoDB) */;


Step 2: Replace the old table with the new partitioned table. Note that the RENAME operation modifies the metadata of the perf_biz_vm table, so you must first check that no DELETE, UPDATE, or INSERT transactions or DDL operations are running on the table; otherwise a metadata lock will block the rename.

Our approach was to stop the timer on the business side in advance and execute the following statements during off-peak hours, swapping the old table for the new one with RENAME so that the new table takes over. Any service calls made during the swap are briefly interrupted.

rename table perf_biz_vm to perf_biz_vm_old;

rename table perf_biz_vm_new to perf_biz_vm;
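Before issuing the two RENAMEs above, one way to confirm that nothing will hold the metadata lock is to look for long-running transactions. A hedged sketch (the 5-second threshold and credentials are illustrative assumptions):

```shell
#!/bin/bash
# Sketch: spot open transactions that could block RENAME with a metadata lock.
check_open_trx_sql() {
    cat <<'SQL'
SELECT trx_id, trx_started, trx_mysql_thread_id, trx_query
FROM information_schema.innodb_trx
WHERE trx_started < NOW() - INTERVAL 5 SECOND;
SQL
}

# Against a live node:
#   check_open_trx_sql | mysql -u"$DB_USER" -p"$DB_PASS"
check_open_trx_sql
```

If this returns rows, wait for those transactions to finish (or kill the offending threads) before attempting the swap.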

Step 3: Once data is being written to the new table and the cloud monitoring page displays normally, service is restored. Because cloud host monitoring data is retained for two days, the data generated in the last two days must be copied from the old table into the new one. The following script completes this step:

The code is as follows:

#!/bin/bash

function insert(){
    end_time="$1 $2"
    start_time="$3 $4"
    mysql -u'user' -p'passwd' << !
use monitor_alarm_openstack;
set innodb_flush_log_at_trx_commit=0;
start transaction;
insert into perf_biz_vm select * from perf_biz_vm_old where create_time < '$end_time' and create_time > '$start_time';
commit;
select TABLE_ROWS from information_schema.tables where TABLE_SCHEMA="monitor_alarm" and TABLE_NAME="perf_biz_vm";
!
}

base_time="2018-02-27 2:00:00"

while true
do
    #end_time=$(date -d "-1hour $base_time" +%Y-%m-%d" "%H:%M:%S)
    end_time=$base_time
    start_time=$(date -d "-1hour $end_time" +%Y-%m-%d" "%H:%M:%S)
    #base_time=$end_time
    base_time=$start_time
    echo "Cur_time: $(date +%Y%m%d" "%H%M%S)" | tee -a 1.log
    echo "Range: $end_time $start_time" | tee -a 1.log
    insert ${end_time} ${start_time} | tee -a 1.log
    sleep 2
done


Step 4: Write a stored procedure that periodically creates new partitions and drops old partitions from several days ago:

The code is as follows:

delimiter ?
CREATE PROCEDURE `clean_partition`(SCHEMANAME VARCHAR(64), TABLENAME VARCHAR(64), reserve INT)
BEGIN
    -- Note: this procedure applies to partitioned tables whose partition key is a
    -- datetime field, partitioned by day with names like p20180301.
    -- It fetches the oldest partition, checks whether it is older than `reserve`
    -- days, and if so drops it (one partition per invocation).
    -- It creates partitions 14 days in advance, skipping any name that already exists.
    -- It assumes a history_partition table (VARCHAR(200), datetime) that records
    -- the SQL statements executed successfully.
    DECLARE PARTITION_NAMES VARCHAR(16);
    DECLARE OLD_PARTITION_NAMES VARCHAR(16);
    DECLARE LESS_THAN_TIMES varchar(16);
    DECLARE CUR_TIME INT;
    DECLARE RETROWS INT;
    DECLARE DROP_PARTITION VARCHAR(16);

    SET CUR_TIME = DATE_FORMAT(NOW(),'%Y%m%d');

    BEGIN
        SELECT PARTITION_NAME INTO DROP_PARTITION FROM information_schema.partitions WHERE table_schema = SCHEMANAME AND table_name = TABLENAME ORDER BY PARTITION_ORDINAL_POSITION ASC LIMIT 1;
        IF SUBSTRING(DROP_PARTITION,2) < DATE_FORMAT(CUR_TIME - INTERVAL reserve DAY, '%Y%m%d') THEN
            SET @sql = CONCAT('ALTER TABLE ', SCHEMANAME, '.', TABLENAME, ' DROP PARTITION ', DROP_PARTITION, ';');
            PREPARE STMT FROM @sql;
            EXECUTE STMT;
            DEALLOCATE PREPARE STMT;
            INSERT INTO history_partition VALUES (@sql, now());
        END IF;
    END;

    SET @__interval = 1;
    create_loop: LOOP
        IF @__interval > 15 THEN
            LEAVE create_loop;
        END IF;
        SET LESS_THAN_TIMES = DATE_FORMAT(CUR_TIME + INTERVAL @__interval DAY, '%Y%m%d');
        SET PARTITION_NAMES = DATE_FORMAT(CUR_TIME + INTERVAL @__interval - 1 DAY, 'p%Y%m%d');
        IF (PARTITION_NAMES != OLD_PARTITION_NAMES) THEN
            SELECT COUNT(1) INTO RETROWS FROM information_schema.partitions WHERE table_schema = SCHEMANAME AND table_name = TABLENAME AND LESS_THAN_TIMES <= SUBSTRING(partition_description,2,8);
            IF RETROWS = 0 THEN
                SET @sql = CONCAT('ALTER TABLE ', SCHEMANAME, '.', TABLENAME, ' ADD PARTITION (PARTITION ', PARTITION_NAMES, ' VALUES LESS THAN ("', LESS_THAN_TIMES, '"));');
                PREPARE STMT FROM @sql;
                EXECUTE STMT;
                DEALLOCATE PREPARE STMT;
                INSERT INTO history_partition VALUES (@sql, now());
            END IF;
        END IF;
        SET @__interval = @__interval + 1;
        SET OLD_PARTITION_NAMES = PARTITION_NAMES;
    END LOOP;
END
?
delimiter ;


Step 5: Create an event named clean_perf_biz_vm that calls the clean_partition stored procedure at 00:30:00 every day, creating the next new partition and dropping the old partition from two days before.

delimiter |
CREATE DEFINER=`root`@`localhost` EVENT clean_perf_biz_vm ON SCHEDULE EVERY 1 DAY STARTS DATE_ADD(DATE_ADD(CURDATE(), INTERVAL 1 DAY), INTERVAL 30 MINUTE)
ON COMPLETION PRESERVE
do
begin
    call clean_partition('monitor_alarm', 'perf_biz_vm', 2);
end |
delimiter ;

Step 6: Dispose of the old perf_biz_vm_old table. During off-peak hours, execute: drop table if exists perf_biz_vm_old. Dropping the entire old table took about 3 minutes and released 150 GB of disk space. Note that even though the DROP TABLE itself is quick, it still causes a brief block because the drop takes an instance-level lock, so perform it during off-peak hours and watch the database state in real time.
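While the DROP runs, sessions piling up behind the metadata lock are visible in the processlist. A minimal watch loop, shown as a sketch with placeholder credentials:

```shell
#!/bin/bash
# Sketch: count sessions waiting on a metadata lock while the DROP executes.
# Credentials in the function are placeholders.
count_mdl_waiters() {
    mysql -u"$DB_USER" -p"$DB_PASS" -e 'SHOW PROCESSLIST;' \
        | grep -c 'Waiting for table metadata lock'
}

# Poll once a second while the DROP is in flight:
#   while sleep 1; do count_mdl_waiters; done
```

A steadily growing count means the DROP is blocking real traffic and the operation should be rescheduled to a quieter window.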

As the following figure shows, the instantaneous length of the waiting receive queue recorded during the actual DROP was 169, with a maximum of 202:

At this point the transformation is complete: we have established a safe, robust, and efficient data cleanup mechanism on the database side.

3. Closing notes

Although this solution makes heavy use of stored procedures, the procedures above perform only simple CREATE and DROP operations and involve no complex logic or computation. MySQL is an OLTP database, good at simple insert, delete, select, and update operations; it is not suited to logic-heavy computation and analysis, so complex stored procedures should be avoided where possible.

Of course, not every scenario suits partitioned tables. Many DBAs forbid partitioned tables in certain scenarios and instead split data by rotating whole tables. In our scheme, time is the partitioning key and the application's query statements essentially always hit a single partition, so performance improves for SELECT, INSERT, and other statements.

That concludes this share on data cleanup. For more content, please continue to follow the dbaplus community!