Why do 39 | on the primary key is not straight?

In article 4, we talked about auto-increment primary keys, which are more compact because they allow primary key indexes to keep their insertion order as much as possible and avoid page splitting.

But in practice auto-increment primary keys do not guarantee continuous increment.

In today’s article, we will take a look at this question, and see when the increment primary key will appear “empty”? For illustration purposes, we create a table T where ID is the auto-increment primary key field and C is the unique index.

CREATE TABLE `t` (
`id` int(11) NOTNULLAUTO_INCREMENT,
`c` int(11) DEFAULTNULL,
`d` int(11) DEFAULTNULL,
PRIMARY KEY (`id`),
UNIQUE KEY `c` (`c`)
) ENGINE=InnoDB;
Copy the code

Where is the increment stored?

Insert into T values(null, 1, 1); Insert a row of data and execute the showcreate table command to see something like the following:

AUTO_INCREMENT=2 indicates that id=2 will be generated if auto-increment is required for the next data insert.

In fact, this output can lead to the misconception that the customization is stored in the table structure definition. In fact, the structure definition of the table is stored in a file with the suffix.frm, but it is not self-added.

Different engines have different save strategies for auto-increment.

  • The MyISAM engine’s self-increment is saved in data files.
  • InnoDB engine’s self-increment is actually saved in memory, and after MySQL 8.0, it has the ability of “self-increment persistence”, that is, “if restart occurs, the table self-increment can be restored to the value before MySQL restart”. The specific situation is as follows:
  1. In MySQL 5.7 and earlier, increments are stored in memory and not persisted. After each restart, when the table is opened for the first time, Max (id) is found and Max (id)+1 is used as the current value of the table.
  2. In MySQL 8.0, auto-increment changes are recorded in the redo log. During a restart, the redo log is used to restore the original values.

Now that you understand MySQL’s save strategy for self-increment, let’s look at the self-increment modification mechanism.

Self-value-added modification mechanism

In MySQL, if the field id is defined as the AUTO_INCREMENT, when inserting a row data, since the value-added behavior is as follows:

  1. Alter TABLE AUTO_INCREMENT increment (int id, int id, int id); alter table AUTO_INCREMENT (INT id, int id, int id, int id);
  2. If the id field specifies a value when inserting data, the value specified in the statement is used.

Depending on the relationship between the value to be inserted and the size of the current increment, the result of the increment change will be different. Suppose, at some point, the value to be inserted is X, and the current increment is Y.

  1. If X
  2. If X is greater than or equal to Y, you need to change the current auto-increment to the new auto-increment.

The new increment algorithm is to start with auto_increment_offset and incrementally increment until the first value greater than X is found as the new increment.

Where, auto_increment_offset and auto_increment_increment are two system parameters used to indicate the initial value and the step size of the increment respectively. The default value is 1.

When both auto_increment_offset and auto_increment_increment are 1, the new increment logic is simple:

  1. If the value to be inserted is greater than or equal to the current value, the new value to be inserted is +1.
  2. Otherwise, the self-appreciation remains unchanged.

This brings us to the problem we mentioned at the beginning of this article. When both parameters are set to 1, the auto-increment primary key ID is not guaranteed to be continuous.

The timing of self-value-added modifications

To answer this question, we need to look at the timing of self-value-added modifications. Insert (1,1,1); insert (1,1,1);

insert into t values(null, 1, 1);

The flow of this statement is as follows:

  1. The executor calls the InnoDB engine interface to write a line that is passed with the value (0,1,1);
  2. InnoDB finds that the user does not specify the value of the increment id, and obtains the current increment 2 of table T.
  3. Change the value of the passed row to (2,1,1);
  4. Alter table autoincrement to 3;
  5. Duplicate KeyError is reported because c=1 already exists. The Duplicate KeyError statement returns.

The corresponding execution flow is as follows:

As you can see, the table’s increment is changed to 3 before the actual insert operation is performed. Insert (id=2); insert (id=2); insert (c);

So, after that, when you insert a new row, you get an increment id of 3. In other words, there is an autoincrement primary key discontinuity. The complete demo is shown below.

As you can see, this sequence of operations recreates a scene where the primary key ID is incremented discontinuous (no rows with ID =2).

  • Unique key conflicts are the first cause of self-added primary key ids being discontinuous.
  • Similarly, transaction rollback results in a similar phenomenon, which is the second reason.

The following statement sequence can construct a discontinuous increment ID.

Insert into t values (null, 1, 1); begin; Insert into t values (null, 2, 2); rollback; Insert into t values (null, 2, 2); // insert rows (3,2,2)Copy the code

Why doesn’t MySQL change the auto-increment of table T back when a unique key conflict or rollback occurs?

Alter table t (id=2) alter table T (id=2) alter table T (id=2)

In fact, MySQL is designed to improve performance. Next, I will analyze this design idea with you and see why self-value-added can not be retreated.

Assume that there are two parallel transactions. When applying for increment, to avoid the two transactions applying for the same increment ID, they must be locked and applied sequentially.

  1. If transaction A applies to id=2 and transaction B applies to ID =3, then the auto-increment of table T is 4, and then the execution continues.
  2. Transaction B committed correctly, but transaction A had A unique key conflict.
  3. If transaction A is allowed to roll back the increment ID of table T, that is, to change the current increment of table T back to 2, then we have A situation where there are already rows in the table with ID =3 and the current increment ID is 2.
  4. Next, other transactions that continue will apply to id=2, and then to id=3. At this point, an insert statement error “primary key conflict” occurs.

To resolve this primary key conflict, there are two methods:

  1. Before applying for an ID, check whether the ID already exists in the table. If it exists, the ID is skipped. However, the cost of this approach is high. The primary key index tree is used to determine whether the id exists.
  2. To extend the lock scope of the increment ID, a transaction must wait until it completes and commits before the next transaction can apply for the increment ID. The problem with this method is that the granularity of the lock is too large, and the system concurrency is greatly reduced.

As you can see, both approaches cause performance problems. The culprit for all this trouble is the premise that allows auto-increment ID rollback.

As a result, InnoDB abandoned this design and did not roll back the increment ID if the statement failed. Because of this, only the increment ID is guaranteed to be increasing, but not continuous.

Optimization of auto-increment lock

As you can see, the increment ID lock is not a transaction lock, but is released immediately after each transaction is applied to allow other transactions to apply again.

Well, prior to MySQL 5.1, this was not the case.

As of MySQL 5.0, the scope of autolocks is statement level. That is, if a statement applies for a table increment lock, the lock will not be released until the statement completes execution. It obviously affects concurrency.

MySQL 5.1.22 introduced a new policy with the innodb_autoinc_lock_mode parameter. The default value is 1.

  1. If the value of this parameter is set to 0, the previous policy of MySQL 5.0 is adopted, that is, the lock is released after the statement is executed.
  2. When this parameter is set to 1:
  • Normal INSERT statement, autolock is released immediately after the request;
  • Similar to the insert… Select batch insert data statement, autoincrement lock or wait for the end of the statement before being released;
  1. When the value of this parameter is set to 2, all requests for auto-increment primary keys release the lock on request.

You must have two questions: why is the default setting for insert… Select to use statement level lock? Why is the default value of this parameter not 2? The answer is, again, data consistency.

Let’s take a look at the scene:

In this example, I insert 4 rows into table T1, then create a table T2 with the same structure, and then both sessions perform the insert into table T2 simultaneously.

You can imagine a situation like this if session B is granted an autoincrement lock and then immediately releases it:

  • Session B first inserts two records (1,1,1) and (2,2,2);
  • Session A then increments the id to get id=3 and inserts (3,5,5);
  • Session B then continues, inserting two records (4,3,3) and (5,4,4).

You might say, well, it doesn’t matter, because session B’s semantics don’t require that all rows in table T2 have the same data as session A.

Yes, the logic of the data is correct. However, if our current binlog_format=statement, you can imagine what will happen to the binlog?

Since both sessions execute insert commands at the same time, there are only two cases in which the update log for table T2 is recorded: either session A is recorded first or session B is recorded first.

But either way, this binlog is going to be executed from the library, or it’s going to be used to restore the temporary instance, the standby library and the temporary instance, so session B is going to be executed, and the result is going to be continuous ids. At this point, data inconsistencies occur in the library.

What are the reasons for the inconsistencies?

Insert statement from session B; This discontinuous id cannot be executed sequentially in statement binlog format.

To solve this problem, there are two approaches:

  1. One idea is to have the original library batch insert data statement, fixed to generate continuous ID values. Therefore, the auto-increment lock is not released until the end of the statement execution to achieve this purpose.
  2. Another way to do this is to record all inserts in the binlog, instead of relying on auto-increment primary keys to generate them when they are executed in the standby database. In this case, innodb_autoinc_lock_mode is set to 2 and binlog_format is set to row.

So in production, especially with insert… Innodb_autoinc_lock_mode =2 and binlog_format=row.

In this way, concurrency can be improved without data consistency problems. Note that when I say batch insert, the statement type is insert… Select and replace… Select and Load data statements.

However, if a normal INSERT statement contains multiple values, even if innodb_autoINC_LOCK_mode is set to 1, the lock is not released until the statement is complete. When applying for an id, you can calculate exactly how many ids are needed, and then apply for the id once. After applying for the ID, you can release the lock.

In other words, the statement that inserts data in batches needs to be set this way because “I do not know how many ids to apply for in advance”.

Since you don’t know in advance how many increment ids to apply for, the straightforward idea is to apply for one if you need one. But if a SELECT… To insert 100,000 rows of data, the insert statement would apply 100,000 times. Obviously, in the case of mass data insertion, the policy of applying for self-increasing ID not only slows down, but also affects the performance of concurrent insertion.

Therefore, MySQL has a policy of applying for increment ids in batches for statements that insert data in batches:

  1. When the statement is executed, the first time you apply for a self-added ID, one id is allocated.
  2. When one id is used up, the statement applies for a second increment and allocates two ids.
  3. When two ids are used up, four ids will be allocated on the third request.
  4. In this way, the same statement applies for the self-added ID, and the number of self-added ids is twice as many as the last one.

For example, let’s look at the following statement sequence:

Insert into t values(null, 1,1); Insert into t values(null, 2,2); Insert into t values(null, 3,3); Insert into t values(null, 4,4); create table t2 like t; insert into t2(c,d) select c,d from t; Insert into t2 values(null, 5,5);Copy the code

Insert… Select actually inserts 4 rows into table T2. Id =1; id=2 and ID =3; id=4 and ID =7;

Since this statement actually uses only four ids, id=5 through ID =7 is wasted. Insert into t2values(null, 5,5) then insert into t2values(null, 5,5).

This is the third reason why primary key ids are incremented and discontinuous.