background

In this article, you can see why developers need to know about distributed locks. In this article, you can see why developers need to know about distributed locks. After the above experience, I thought I had no problem with deadlocks, but one sunny afternoon a deadlock was reported, but this time it was not as easy as EXPECTED.

At the beginning of problems now

One afternoon, the system raised an alarm and threw an exception:


Innodb Status = Innodb Status = Innodb Status = Innodb Status = Innodb

SHOW ENGINE INNODB STATUSCopy the code

The deadlock information is as follows, and the SQL information is simply processed:

------------------------LATEST DETECTED DEADLOCK------------------------2019-02-22 15:10:56 0x7eec2f468700*** (1) TRANSACTION:TRANSACTION 2660206487, ACTIVE 0 sec starting index readmysql tables in use 1, locked 1LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)MySQL thread id 31261312, OS thread handle 139554322093824, Query ID 11624975750 10.23.134.92 erp_crm__6f73 updating/* ID :3637ba36*/UPDATE tenant_config SET open_card_point = 0 where tenant_id = 123*** (1) WAITING FOR THIS LOCK TO BE GRANTED:RECORD LOCKS space id 1322 page no 534 n bits 960 index  uidx_tenant of table `erp_crm_member_plan`.`tenant_config` trx id 2660206487 lock_mode X locks rec but not gap waiting*** (2) TRANSACTION:TRANSACTION 2660206486, ACTIVE 0 sec starting index readmysql tables in use 1, locked 13 lock struct(s), heap size 1136, 2 row lock(s)MySQL thread id 31261311, OS thread handle 139552870532864, Query ID 11624975758 10.23.134.92 erp_crm__6f73 updating/* ID :3637ba36*/UPDATE tenant_config SET open_card_point = 0 where tenant_id = 123*** (2) HOLDS THE LOCK(S):RECORD LOCKS space id 1322 page no 534 n bits 960 index uidx_tenant of table `erp_crm_member_plan`.`tenant_config` trx id 2660206486 lock mode S*** (2) WAITING FOR THIS LOCK TO BE GRANTED:RECORD LOCKS space id 1322 page no 534 n bits 960 index uidx_tenant of table `erp_crm_member_plan`.`tenant_config` trx id 2660206486 lock_mode X locks rec but not gap waiting*** WE ROLL BACK TRANSACTION (1)------------Copy the code

Transaction 1 needs to obtain the X lock (row lock) on the Uidx_tenant index when executing the Update statement. Transaction 2 needs to obtain the X lock (row lock) on the uidx_Tenant condition when executing the Update statement. Then a deadlock occurs, rolling back transaction 1. Confused, I thought back to the conditions necessary for a deadlock to occur:

  1. The mutex.

  2. Request and hold conditions.

  3. No deprivation of conditions.

  4. Loop wait. From the log, it can be seen that transaction 1 and transaction 2 are locked for the same row, which is a little different from the previous circular lock contention, which cannot satisfy the circular wait condition. After being reminded by my colleague, since the troubleshooting cannot be done from the deadlock log, the troubleshooting can only be done from the business code and business log. The logic of this code is as follows:

public int saveTenantConfig(PoiContext poiContext, TenantConfigDO tenantConfig) { try { return tenantConfigMapper.saveTenantConfig(poiContext.getTenantId(), poiContext.getPoiId(), tenantConfig); } catch (DuplicateKeyException e) {logger. warn("[saveTenantConfig] primary key conflict, update this record. context:{}, config:{}", poiContext, tenantConfig); return tenantConfigMapper.updateTenantConfig(poiContext.getTenantId(), tenantConfig); }}Copy the code

This code is meant to save a configuration file and update it if a unique index conflict occurs. Of course, this code may not be very standard, but it can be used

insert into ... on duplicate key update Copy the code

The same effect can be achieved, but deadlocks can actually occur even with this. After looking at the code, my colleague sent me the current business log,


You can see that there are three simultaneous logs, indicating that a unique index conflict occurred in the updated statement, and then a deadlock occurred. At this point, the answer finally becomes a little clearer.

Now look at our table structure as follows (simplified):

CREATE TABLE `tenant_config` (  `id` bigint(21) NOT NULL AUTO_INCREMENT,  `tenant_id` int(11) NOT NULL,  `open_card_point` int(11) DEFAULT NULL,  PRIMARY KEY (`id`),  UNIQUE KEY `uidx_tenant` (`tenant_id`)) ENGINE=InnoDB  DEFAULT CHARSET=utf8mb4 ROW_FORMAT=COMPACTCopy the code

Our tenant_ID is used for the unique index, and our insert and update WHERE conditions are based on the unique index.

UPDATE tenant_config SET       open_card_point =  0       where tenant_id = 123Copy the code

At this point, it is felt that the lock on the unique index is related to the insertion, and we will proceed to the next step of in-depth analysis.

in-depth

Three transactions enter the update statement at the same time. The following table shows how all transactions enter the update statement:


Tip :S locks are shared locks, and X locks are mutually exclusive locks. In general, X locks and S locks are mutually exclusive, and S locks and S locks are not mutually exclusive.

We can see from the above flow that the key to this deadlock is to acquire the S lock. Why do we need to acquire the S lock when we insert it again? Because we need to check for unique indexes, right? In RR isolation, if you want to read, then it is currently read, then you actually need to add an S lock. This is where the unique key is found, and an update is blocked by the two transaction S locks on each other, resulting in the circular wait condition above.


Note: In MVCC, the difference between the current read and snapshot read is that the current read requires a lock (shared lock or mutex lock) to obtain the latest data, while the snapshot read reads the snapshot at the start of the transaction, which is implemented by using undo log.

This is the cause of the whole deadlock, which can also occur if three inserts are performed at the same time, and if the first transaction is rolled back, the other two transactions will also be deadlocked.

The solution

The core problem here is the need to kill the S lock, and there are three possible solutions:

  • Reduce the RR isolation level to the RC isolation level. Here the RC isolation level is read by snapshot so that the S lock is not applied.

  • Insert select * for update with X lock.

  • Distributed locks can be added in advance, using Redis, ZK, etc. Distributed locks can be referenced in this article. Talk about distributed locking

The first approach is not practical because the isolation level cannot be easily changed. The third method is more troublesome. So the second method is what we finally decided on.

conclusion

With all that said, let’s conclude with a little. When troubleshooting deadlock problems, deadlock logs sometimes fail to solve the problem. You need to analyze the entire business log, code, and table structure to get the correct result. Of course, there are some basic database locks that you can check out my other article on why developers need to know about distributed locks.

Finally, this article was included in the Jgrow-Case Study, a comprehensive and excellent Java learning route co-built by the community. If you want to participate in the maintenance of open source projects, you can co-build it. Making the address is: https://github.com/javagrowing/JGrowing trouble yo give a little star.

If you find this article helpful to you, your attention and forwarding will be my biggest support.