When we talk about indexes, we tend to focus on their benefits: a good index can dramatically improve the efficiency of queries, and of write requests (UPDATE, DELETE) that filter on the index keys. In those scenarios indexes deliver a huge performance boost. The opposite is true, however, for writes that touch the whole table: there, the presence of indexes reduces write efficiency to a certain extent. To keep index data near real time and let the optimizer build good execution plans from it, the database automatically maintains the index structures as data changes, for example by splitting and merging index pages.
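One way to watch this maintenance happen is InnoDB's metrics table. A minimal sketch, assuming the index-subsystem counters are enabled (most are off by default and have to be switched on first):

-- Enable the index-page counters (they default to off):
SET GLOBAL innodb_monitor_enable = 'module_index';

-- Splits and merges accumulate here as secondary indexes are maintained:
SELECT name, count
FROM information_schema.innodb_metrics
WHERE name LIKE 'index_page_split%'
   OR name LIKE 'index_page_merge%';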

How much does the existence of indexes affect write requests? In other words, is there a correlation between the number of indexes and the drop in write efficiency? In this post I’ll use a few simple examples to see if I can reach a clear conclusion. These are simple tests run in a laptop-based virtual machine against a single MySQL instance.

My examples focus on four types of statements: INSERT, UPDATE, DELETE, and LOAD DATA.
Deploy a clean instance with mysqlsh (specifying the port, a remote administrator account, and the report_host option):
 MySQL  Py > dba.deploy_sandbox_instance(3500,{"password":"root","allowRootFrom":"%","mysqldOptions":["report_host=debian-ytt1"]})
A new MySQL sandbox instance will be created on this host in
/home/ytt/mysql-sandboxes/3500

...

Deploying new MySQL instance...

Instance localhost:3500 successfully deployed and started.
Use shell.connect('root@localhost:3500') to connect to the instance.

There are 10 tables (t1 to t10) used to test the effect of the index count. Their field definitions are identical; only the number of indexes differs: t1 has one secondary index, t2 has two, and so on, up to t10 with ten. All indexed columns are int (the count covers secondary indexes only, not the primary key). The tables are copied from the template table t0, whose structure is:

(debian-ytt1:3500)|(ytt)>show create table t0\G
*************************** 1. row ***************************
       Table: t0
Create Table: CREATE TABLE `t0` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `r0` int DEFAULT NULL,
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `r3` int DEFAULT NULL,
  `r4` int DEFAULT NULL,
  `r5` int DEFAULT NULL,
  `r6` int DEFAULT NULL,
  `r7` int DEFAULT NULL,
  `r8` int DEFAULT NULL,
  `r9` int DEFAULT NULL,
  `r10` int DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

Create a batch of these 10 tables from the shell command line, and add the corresponding indexes to each table:

root@debian-ytt1:~# for i in `seq 1 10`; do mysql --login-path=root_ytt -e "use ytt; create table t$i like t0"; done;
root@debian-ytt1:~# for i in `seq 1 10`; do for j in `seq 1 "$i"`; do mysql --login-path=root_ytt -e "use ytt; alter table t$i add key idx_r$j (r$j)"; done; done;
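As a quick sanity check (an extra step I am adding here, not part of the original setup), the secondary-index counts can be verified from information_schema; t1 should report 1, t2 should report 2, and so on up to t10 with 10:

-- Subtract 1 to exclude PRIMARY from each table's distinct index count:
SELECT table_name,
       COUNT(DISTINCT index_name) - 1 AS secondary_indexes
FROM information_schema.statistics
WHERE table_schema = 'ytt'
GROUP BY table_name
ORDER BY secondary_indexes;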
Here is a simple stored procedure I wrote to simulate the INSERT, UPDATE, and DELETE write operations:
DELIMITER $$

USE `ytt`$$

DROP PROCEDURE IF EXISTS `sp_batch_write`$$

CREATE DEFINER=`root`@`%` PROCEDURE `sp_batch_write`(
    IN f_write ENUM('insert','update','delete'),
    IN f_table_name VARCHAR(64),
    IN f_num INT UNSIGNED
    )
BEGIN
  DECLARE i INT UNSIGNED DEFAULT 0;
  
  IF f_write = 'insert' THEN
    SET @stmt = CONCAT('insert into ',f_table_name,'(r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10)
                        values (ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000),
                        ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000),
                        ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000))');

    -- Prepare once, execute f_num times; commit every 50 rows so the
    -- inserts are batched instead of paying one commit per row.
    PREPARE s1 FROM @stmt;
    SET @@autocommit=0;
    WHILE i < f_num
    DO
      EXECUTE s1;
      IF MOD(i,50) = 0 THEN
        COMMIT;
      END IF;
      SET i = i + 1;
    END WHILE;
    DROP PREPARE s1;
    COMMIT;
    SET @@autocommit=1;

  ELSEIF f_write = 'update' THEN
    -- A single UPDATE that rewrites every indexed column of every row:
    SET @stmt = CONCAT('update ',f_table_name,' set r0=ceil(rand()*10000),r1 = ceil(rand()*10000),
    r2 = ceil(rand()*10000),r3 = ceil(rand()*10000),r4 = ceil(rand()*10000),
    r5 = ceil(rand()*10000),r6 = ceil(rand()*10000),r7 = ceil(rand()*10000),
    r8 = ceil(rand()*10000),r9 = ceil(rand()*10000),r10 = ceil(rand()*10000)');
    PREPARE s1 FROM @stmt;
    EXECUTE s1;
    DROP PREPARE s1;
  ELSEIF f_write = 'delete' THEN
    -- A single full-table DELETE (no WHERE clause):
    SET @stmt = CONCAT('delete from ',f_table_name);
    PREPARE s1 FROM @stmt;
    EXECUTE s1;
    DROP PREPARE s1;
  END IF;

END$$

DELIMITER ;
  1. Compare the INSERT times of the 10 tables:
(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t1',100000);
Query OK, 0 rows affected (1 min 24.61 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t2',100000);
Query OK, 0 rows affected (1 min 25.40 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t3',100000);
Query OK, 0 rows affected (1 min 29.12 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t4',100000);
Query OK, 0 rows affected (1 min 36.65 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t5',100000);
Query OK, 0 rows affected (1 min 39.85 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t6',100000);
Query OK, 0 rows affected (1 min 41.97 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t7',100000);
Query OK, 0 rows affected (2 min 29.07 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t8',100000);
Query OK, 0 rows affected (2 min 17.62 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t9',100000);
Query OK, 0 rows affected (2 min 14.14 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t10',100000);
Query OK, 0 rows affected (2 min 32.32 sec)

Broadly, the more indexes a table has, the longer its inserts take. The trend is plotted in the simple chart below:

  2. Compare the UPDATE times of the 10 tables:
(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t1',100000);
Query OK, 0 rows affected (22.81 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t2',100000);
Query OK, 0 rows affected (47.42 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t3',100000);
Query OK, 0 rows affected (1 min 8.96 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t4',100000);
Query OK, 0 rows affected (1 min 55.50 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t5',100000);
Query OK, 0 rows affected (2 min 13.13 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t6',100000);
Query OK, 0 rows affected (2 min 6.79 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t7',100000);
Query OK, 0 rows affected (2 min 35.01 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t8',100000);
Query OK, 0 rows affected (2 min 44.50 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t9',100000);
Query OK, 0 rows affected (3 min 1.27 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t10',100000);
Query OK, 0 rows affected (4 min 43.74 sec)

UPDATE takes the biggest hit: because the statement rewrites every indexed column, each row update must maintain every secondary index, so the tables get slower and slower as the index count grows. See below:

  3. Compare the DELETE times of the 10 tables:

Before deleting, look at the on-disk file size of each table. The files line up with the index counts: the fewer indexes a table has, the less disk space it occupies.

root@debian-ytt1:/home/ytt/mysql-sandboxes/3500/sandboxdata/ytt# ls -Sihlr
total 521M
2229717 -rw-r----- 1 ytt ytt 112K Jun  1 14:03 t0.ibd
2229718 -rw-r----- 1 ytt ytt  28M Jun  1 16:14 t1.ibd
2229719 -rw-r----- 1 ytt ytt  36M Jun  1 16:16 t2.ibd
2229720 -rw-r----- 1 ytt ytt  40M Jun  1 16:16 t3.ibd
2229721 -rw-r----- 1 ytt ytt  44M Jun  1 16:16 t4.ibd
2229722 -rw-r----- 1 ytt ytt  52M Jun  1 16:03 t5.ibd
2229723 -rw-r----- 1 ytt ytt  56M Jun  1 16:06 t6.ibd
2229725 -rw-r----- 1 ytt ytt  60M Jun  1 15:25 t8.ibd
2229724 -rw-r----- 1 ytt ytt  64M Jun  1 16:08 t7.ibd
2229734 -rw-r----- 1 ytt ytt  68M Jun  1 15:30 t9.ibd
2229735 -rw-r----- 1 ytt ytt  72M Jun  1 15:32 t10.ibd
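The same sizes can also be read from inside the server. A sketch using information_schema (the figures are estimates based on sampled statistics, so they will not match ls byte for byte):

-- Approximate data and index footprint per table, smallest first:
SELECT table_name,
       ROUND(data_length/1024/1024)  AS data_mb,
       ROUND(index_length/1024/1024) AS index_mb
FROM information_schema.tables
WHERE table_schema = 'ytt'
ORDER BY data_length + index_length;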

Delete time should therefore increase with the disk file size:

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t1',100000);
Query OK, 0 rows affected (31.80 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t2',100000);
Query OK, 0 rows affected (59.89 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t3',100000);
Query OK, 0 rows affected (1 min 21.83 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t4',100000);
Query OK, 0 rows affected (2 min 0.33 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t5',100000);
Query OK, 0 rows affected (2 min 17.66 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t6',100000);
Query OK, 0 rows affected (2 min 9.37 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t7',100000);
Query OK, 0 rows affected (2 min 53.61 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t8',100000);
Query OK, 0 rows affected (2 min 39.29 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t9',100000);
Query OK, 0 rows affected (3 min 38.26 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t10',100000);
Query OK, 0 rows affected (4 min 22.88 sec)

The results were as expected, as shown below:

This is why DELETE is not recommended for clearing out a whole table! (DELETE only marks each row for deletion; the purge thread cleans up later in the background, so the space is not freed immediately. Use TRUNCATE instead of DELETE!)
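For completeness, the fast way to empty one of these test tables:

-- TRUNCATE drops and re-creates the tablespace instead of marking
-- 100,000 rows one by one, so it finishes almost instantly and frees
-- the disk space right away (note: it cannot be rolled back):
TRUNCATE TABLE t10;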

  4. Compare the LOAD DATA import times of the 10 tables:
(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t1;
(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t2;
(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t3;
(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t4;
(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t5;
(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t6;
(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t7;
(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t8;
(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t9;
(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t10;

The import time also increases with the number of indexes, as shown in the following figure:
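A common mitigation for bulk loads (my suggestion here, not something measured above) is to build the indexes after the load rather than maintaining them row by row during it. Sketched against the test schema:

-- Drop t10's secondary indexes, load, then rebuild them in one pass,
-- so each index is built once from sorted data:
ALTER TABLE t10
  DROP KEY idx_r1, DROP KEY idx_r2, DROP KEY idx_r3, DROP KEY idx_r4,
  DROP KEY idx_r5, DROP KEY idx_r6, DROP KEY idx_r7, DROP KEY idx_r8,
  DROP KEY idx_r9, DROP KEY idx_r10;

LOAD DATA INFILE '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' INTO TABLE t10;

ALTER TABLE t10
  ADD KEY idx_r1 (r1), ADD KEY idx_r2 (r2), ADD KEY idx_r3 (r3),
  ADD KEY idx_r4 (r4), ADD KEY idx_r5 (r5), ADD KEY idx_r6 (r6),
  ADD KEY idx_r7 (r7), ADD KEY idx_r8 (r8), ADD KEY idx_r9 (r9),
  ADD KEY idx_r10 (r10);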

As these simple tests show, every additional index adds extra write time. In a write-heavy business, more indexes are not always better: keep the necessary ones and drop the rest. That said, one of the most common patterns on the Internet exploits this trade-off deliberately: in a master-slave architecture, the master and slave share the same table structure but carry different numbers of indexes. The master keeps only the primary key, while the slave may add multiple secondary indexes on top of it. This fits workloads where writes go to the master, reads go to the slave, and the slave can tolerate a certain amount of replication lag. If long lag is unacceptable, the slave's rarely used indexes must be dropped to improve its log-apply efficiency.
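To decide which indexes are "unnecessary", one option is the sys schema (a sketch; it assumes performance_schema is enabled and that the server has been up long enough for the counters to be representative):

-- Lists indexes with no recorded reads since the server started;
-- observe over a representative period before actually dropping any.
SELECT object_schema, object_name, index_name
FROM sys.schema_unused_indexes
WHERE object_schema = 'ytt';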


Is there anything else you’d like to know about MySQL? Leave a comment and let us know!