preface

Mysql > add primary key id auto_increment; mysql > add primary key ID auto_increment; mysql > add primary key ID auto_increment; What are the disadvantages of using Uuid? Get a set of more than 500 pages of PDF summary of MySQL study notes.

Summary of MySQL knowledge point a mind map to share with you

Mysql and program instance

1.1. To illustrate this problem, let’s first create three tables

User_auto_key, user_UUID, and user_random_key, respectively, meaning auto_key, uUID as primary key, and user_random_key as primary key, and we leave everything else exactly the same.

In the control variable method, we only use a different policy to generate the primary key of each table, and the other fields are exactly the same, and then test the insert speed and query speed of the table:

Note: The random key here refers to the random id calculated by the snowflake algorithm: a string of 18 bits long values

Id automatic generation table:

User uuid table

Random primary key table:

1.2. The light theory is not good, directly on the program, using spring jdbcTemplate to implement the added test:

Technical framework: Springboot +jdbcTemplate+junit+ HuTool, the principle of the program is to connect to their own test database, and then write the same amount of data in the same environment, to analyze the insert time to integrate its efficiency, in order to achieve the most real effect, All data is randomly generated, such as names, mailboxes and addresses.

`package com.wyq.mysqldemo; import cn.hutool.core.collection.CollectionUtil; import com.wyq.mysqldemo.databaseobject.UserKeyAuto; import com.wyq.mysqldemo.databaseobject.UserKeyRandom; import com.wyq.mysqldemo.databaseobject.UserKeyUUID; import com.wyq.mysqldemo.diffkeytest.AutoKeyTableService; import com.wyq.mysqldemo.diffkeytest.RandomKeyTableService; import com.wyq.mysqldemo.diffkeytest.UUIDKeyTableService; import com.wyq.mysqldemo.util.JdbcTemplateService; import org.junit.jupiter.api.Test; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.boot.test.context.SpringBootTest; import org.springframework.util.StopWatch; import java.util.List; @SpringBootTest class MysqlDemoApplicationTests { @Autowired private JdbcTemplateService jdbcTemplateService; @Autowired private AutoKeyTableService autoKeyTableService; @Autowired private UUIDKeyTableService uuidKeyTableService; @Autowired private RandomKeyTableService randomKeyTableService; @test void testDBTime() {StopWatch StopWatch = new StopWatch(" SQL execution time "); /** * auto_increment key task */ Final String insertSql = "INSERT INTO user_key_auto(user_id,user_name,sex,address,city,email,state) VALUES(? ,? ,? ,? ,? ,? ,?) "; List<UserKeyAuto> insertData = autoKeyTableService.getInsertData(); Stopwatch. start(" Automatic key table generation task starts "); long start1 = System.currentTimeMillis(); if (CollectionUtil.isNotEmpty(insertData)) { boolean insertResult = jdbcTemplateService.insert(insertSql, insertData, false); System.out.println(insertResult); } long end1 = System.currentTimeMillis(); System.out.println("auto key cost :" + (end1-start1)); stopwatch.stop(); / * * * uudID key * / final String insertSql2 = "INSERT INTO user_uuid (id, user_id, user_name, sex, address, city, email, state) VALUES(? ,? ,? ,? ,? ,? ,? ,?) "; List<UserKeyUUID> insertData2 = uuidKeyTableService.getInsertData(); Stopwatch. start("UUID key table task starts "); long begin = System.currentTimeMillis(); if (CollectionUtil.isNotEmpty(insertData)) { boolean insertResult = jdbcTemplateService.insert(insertSql2, insertData2, true); System.out.println(insertResult); } long over = System.currentTimeMillis(); System.out.println("UUID key time :" + (over-begin)); stopwatch.stop(); Key */ final String insertSql3 = "INSERT INTO user_random_key(id,user_id,user_name,sex,address,city,email,state) VALUES(? ,? ,? ,? ,? ,? ,? ,?) "; List<UserKeyRandom> insertData3 = randomKeyTableService.getInsertData(); Stopwatch. start(" Random long key table task starts "); Long start = System.currentTimeMillis(); if (CollectionUtil.isNotEmpty(insertData)) { boolean insertResult = jdbcTemplateService.insert(insertSql3, insertData3, true); System.out.println(insertResult); } Long end = System.currentTimeMillis(); System.out.println(" random key task cost time :" + (end-start)); stopwatch.stop(); String result = stopwatch.prettyPrint(); System.out.println(result); } `Copy the code

1.3. Program write result

User_key_auto Result:

User_random_key writes the result:

Result of writing user_UUID table:

1.4. Efficiency test results

When the existing data amount is 130W: Let’s test again to insert 10W of data and see what happens:

It can be seen that when the amount of data is about 100W, the insertion efficiency of UUID is bottom, and 130W of data is added in the later order, and the time of UUDI drops straight again.

The efficiency ranking of the total time consumption can be: auto_KEY >random_key> UUID. The efficiency of UUID is the lowest. In the case of a large amount of data, the efficiency drops linearly. So why does this happen? With that in mind, let’s explore this question:

2. Compare index structure with uUID and increment ID

2.1. Internal structure using autoincrement ID

The value of the increment primary key is sequential, so Innodb stores each record after another record. When the page’s maximum fill factor is reached (InnoDB’s default maximum fill factor is 15/16 of the page size, leaving 1/16 space for later modifications) :

(1) The next record will be written to the new page. Once the data is loaded in this order, the primary key page will be filled with almost sequential records, increasing the maximum fill rate of the page, and no page waste

(2) The newly inserted row must be one row below the original maximum data row. Mysql can locate and address quickly, and there is no additional cost to calculate the position of the new row

(3) Reduced page splitting and fragmentation

2.2. Internal index structure using uUID

Because uids are random with respect to sequential increment ids, the new row value does not have to be larger than the previous primary key value, so InnoDB cannot always insert the new row at the end of the index. Instead, innoDB needs to find a new place for the new row to allocate new space.

This process requires a lot of additional operations, and the disordered data will lead to scattered data distribution, which will lead to the following problems:

(1) The target page is likely to have been flushed to disk and removed from the cache, or has not yet been loaded into the cache. Innodb has to find and read the target page from disk into memory before inserting, which leads to a lot of random IO

(2) Since the writes are out of order, InnoDB has to split pages frequently to allocate space for new rows. Page splitting causes a large amount of data to be moved, and at least three pages need to be changed in an insert

(3) Due to frequent page splitting, pages can become sparse and irregularly filled, eventually leading to data fragmentation

After loading random values (UUID and snowflake ID) into the cluster index (innoDB’s default index type), it is sometimes necessary to do an OPTIMEIZE TABLE to reconstruct the TABLE and optimize the page population, which will take some time.

Conclusion: When using InnoDB, insert in incrementing order of primary keys as much as possible, and insert rows with monotonically increasing values of cluster keys as much as possible

2.3. Disadvantages of using autoincrement ID

So there’s no harm in using an auto-increasing ID at all? No, auto-increment id also has the following problems:

① Once others climb your database, you can get your business growth information according to the database increase ID, it is easy to analyze your business situation

(2) For high concurrent load, innoDB will cause obvious lock contention when pressing the primary key to insert. The upper bound of the primary key will become the hot spot of contention, because all inserts happen here, and concurrent inserts will cause gap lock contention

③ The Auto_Increment lock mechanism causes snatch of the autoincrement lock, which has a certain performance penalty

Add: Auto_increment lock contention problem, if you want to improve the need to tune innodb_autoinc_LOCK_mode configuration

Third, summary

In this paper, we first put forward the problem at the beginning, from table building to using jdbcTemplate to test the performance of different ID generation strategies in the data insertion of large amount of data. Then, we analyzed the different ID mechanisms in mysql index structure, as well as the advantages and disadvantages, and explained the performance loss of UUID and random non-repeating ID in the data insertion. Explained the problem in detail.

In the actual development or according to the official recommendation of mysql, it is best to use self-increasing ID. Mysql is broad and profound, and there are many internal points worth optimizing that we need to learn.