Preface

Interface idempotence is a common problem for developers, regardless of language. This article shares some very practical ways to solve this kind of problem; I have used most of them in real projects, and I hope they serve as a useful reference for anyone who needs them.

I don’t know if you’ve encountered any of these scenarios:

  1. Sometimes we fill in a form and accidentally hit the save button twice in quick succession, and end up with two duplicate records that differ only in their ids.

  2. We often introduce retry mechanisms in projects to deal with interface timeouts. When the first call times out, the caller does not receive the result in time, even though the operation may in fact have succeeded. To avoid returning a wrong answer (simply returning failure is not an option here), the request is retried several times, which can also produce duplicate data.

  3. MQ consumers sometimes receive the same message more than once (for example, when an acknowledgement is lost and the broker redelivers the message), which can also produce duplicate data if not handled properly.


Yes, these are all idempotency problems.

Interface idempotency means that one request or many requests for the same operation produce a consistent result, and the repeated requests cause no additional side effects.

This type of problem typically occurs in two kinds of interface operations:

  • Insert operations, where multiple requests may result in duplicate data.

  • Update operations. A plain assignment such as update user set status=1 where id=1 is naturally idempotent, but a calculation such as update user set status=status+1 where id=1 is not: every repeated request changes the data again.

So how do we keep interfaces idempotent? This article will tell you the answer.

1. Select before insert

Typically, in an interface that saves data, to prevent duplicates we first select by a field such as name or code before the insert: if the data already exists, we update it; if it does not, we insert it.

This scheme is probably the one we use most often to prevent duplicate data. However, it does not work on its own in concurrent scenarios; there it must be combined with one of the other schemes, otherwise duplicate data will still be produced. I mention it here to save you from this pitfall.
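For reference, the check-then-act logic typically looks something like the minimal sketch below (assuming Spring's JdbcTemplate and a hypothetical order table with a code column; as noted, this version alone is not safe under concurrency):

import java.math.BigDecimal;
import org.springframework.jdbc.core.JdbcTemplate;

// Minimal sketch of "select before insert": check for an existing record first,
// then update or insert. Table and column names are illustrative assumptions.
// Note: this check-then-act sequence alone is NOT safe under concurrency.
public class SelectBeforeInsertExample {

    private final JdbcTemplate jdbcTemplate;

    public SelectBeforeInsertExample(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public void saveOrder(String code, BigDecimal amount) {
        Integer count = jdbcTemplate.queryForObject(
                "select count(*) from `order` where code = ?", Integer.class, code);
        if (count != null && count > 0) {
            // The record already exists -> update it
            jdbcTemplate.update("update `order` set amount = ? where code = ?", amount, code);
        } else {
            // The record does not exist -> insert it
            jdbcTemplate.update("insert into `order` (code, amount) values (?, ?)", code, amount);
        }
    }
}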

2. Add a pessimistic lock

Take a payment scenario: user A's account balance is 150 yuan and A wants to transfer out 100 yuan, so under normal circumstances A's remaining balance would be 50 yuan. In general, the SQL looks like this:

update user set amount=amount-100 where id=123;

If the same request occurs multiple times, user A’s balance may turn negative. In this case, user A is probably going to cry. At the same time, the system developers are probably crying, because this is a serious system bug.

To solve this problem, a pessimistic lock can be added: it locks user A's row of data so that only one request at a time can acquire the lock and update the data, while the remaining requests wait.

Typically, a single row of data is locked with the following SQL:

select * from user where id=123 for update;

The specific steps are as follows:

  1. Multiple requests query the user information by id at the same time.

  2. Check whether the balance is less than 100. If it is, return "insufficient balance".

  3. If the balance is sufficient, query the user information again with for update and try to acquire the lock.

  4. Only the first request obtains the row lock; the other requests wait for their turn to acquire it.

  5. After the first request acquires the lock, check again whether the balance is less than 100; if it is sufficient, perform the update.

  6. If the balance is now insufficient, this is a repeated request, and success is returned directly.

It is important to note that if you are using a MySQL database, the storage engine must be InnoDB, since it is the engine that supports transactions. In addition, the id field must be a primary key or a unique index; otherwise the statement will lock the whole table instead of a single row.
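Putting the steps together, a minimal Java sketch of this flow might look like the following (assuming Spring's @Transactional and JdbcTemplate; the class must be managed as a Spring bean for the transaction annotation to take effect, and the method and table names are illustrative):

import java.math.BigDecimal;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.transaction.annotation.Transactional;

// Minimal sketch of the pessimistic-lock flow. The select ... for update and the
// subsequent update must run inside the same transaction.
public class PessimisticLockExample {

    private final JdbcTemplate jdbcTemplate;

    public PessimisticLockExample(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Transactional
    public void transferOut(long userId, BigDecimal amount) {
        // Lock user A's row; concurrent requests for the same id wait here.
        BigDecimal balance = jdbcTemplate.queryForObject(
                "select amount from user where id = ? for update", BigDecimal.class, userId);
        if (balance == null || balance.compareTo(amount) < 0) {
            // Balance is insufficient: treat it as a repeated request and simply return.
            return;
        }
        // Balance is sufficient: deduct the amount while still holding the row lock.
        jdbcTemplate.update("update user set amount = amount - ? where id = ?", amount, userId);
    }
}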

Pessimistic locking needs to lock the row inside a single transaction. If the transaction takes a long time, a large number of requests will be left waiting, which hurts interface performance.

In addition, it is hard to guarantee that every request gets the same return value, so this approach is not really suitable for idempotent design; it is better suited to anti-replay (duplicate-prevention) scenarios.

By the way, there is a difference between anti-replay design and idempotent design. Anti-replay design mainly aims to avoid producing duplicate data and does not place many requirements on what the interface returns. Idempotent design, in addition to avoiding duplicate data, also requires that every repeated request return the same result.

3. Add an optimistic lock

Since pessimistic locks have performance problems, we can use an optimistic lock instead to improve interface performance. This requires adding an extra column to the table, such as a timestamp or version field; the version field is used as the example here.

Query the data before updating it:

select id,amount,version from user where id=123;

Suppose the query returns version = 1. Then use both id and version as conditions to update the data:

update user set amount=amount+100,version=version+1
where id=123 and version=1;

If the number of rows affected by the update operation is greater than 0, the update is successful; if it is equal to 0, the update has not changed the data.

The first request sees version equal to 1, so its update succeeds and version becomes 2. If a concurrent or repeated request then comes in and executes the same SQL:

 update user set amount=amount+100,version=version+1
where id=123 and version=1;

This update does not actually change any data: the affected-row count is 0, because version has already changed to 2 and the version=1 condition in the where clause can no longer be satisfied. However, to keep the interface idempotent, the interface can simply return success, because the fact that the version value has already changed means an earlier request must have succeeded, and the later requests are repeats.

The specific steps are as follows:

  1. Query the user information by id, including the version field.

  2. Update the user information using the id and the queried version value as the where conditions, setting version to version+1.

  3. Check the number of rows affected. If 1 row is affected, this is the first request, and the other data operations can be performed.

  4. If 0 rows are affected, this is a repeated request, and success is returned directly.
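A minimal Java sketch of these optimistic-lock steps (assuming JdbcTemplate and the user table from the example above) could look like this:

import java.math.BigDecimal;
import java.util.Map;
import org.springframework.jdbc.core.JdbcTemplate;

// Minimal sketch of optimistic locking with a version column.
public class OptimisticLockExample {

    private final JdbcTemplate jdbcTemplate;

    public OptimisticLockExample(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public boolean addAmount(long userId, BigDecimal amount) {
        // 1. Read the current data together with its version.
        Map<String, Object> row = jdbcTemplate.queryForMap(
                "select id, amount, version from user where id = ?", userId);
        long version = ((Number) row.get("version")).longValue();

        // 2. Update with id + version in the where clause and bump version by 1.
        int affected = jdbcTemplate.update(
                "update user set amount = amount + ?, version = version + 1"
                        + " where id = ? and version = ?",
                amount, userId, version);

        // 3. affected == 1 means this was the first request; affected == 0 means a
        //    repeated request, and the caller can still return success to stay idempotent.
        return affected == 1;
    }
}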

4. Add a unique index

In most cases, adding a unique index to a table to prevent duplicate data is a very simple and effective solution.

alter table `order` add UNIQUE KEY `un_code` (`code`);

After the unique index is added, the first request can insert the data successfully. When a repeated request tries to insert the same data, however, MySQL throws a Duplicate entry '002' for key 'order.un_code' error, indicating a unique index conflict.

Throwing the exception does not corrupt the data, so no bad data is produced. But to keep the interface idempotent, we need to catch the exception and return success.

In a plain Java (JDBC) program you need to catch the MySQLIntegrityConstraintViolationException thrown by the MySQL driver; if you use the Spring framework, you need to catch Spring's DuplicateKeyException instead.

The specific steps are as follows:

  1. The user initiates a request through the browser, and the server collects data.

  2. Insert the data into MySQL.

  3. Determine if the execution was successful, and if so, operate on additional data (and possibly other business logic).

  4. If the execution fails, catch the unique index conflict exception and return success.
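A minimal Java sketch of steps 2 to 4, assuming the Spring framework (which translates the MySQL duplicate-key error into a DuplicateKeyException):

import org.springframework.dao.DuplicateKeyException;
import org.springframework.jdbc.core.JdbcTemplate;

// Minimal sketch: insert the data and swallow the unique-index conflict so that
// a repeated request still returns success. Table and column names are illustrative.
public class UniqueIndexExample {

    private final JdbcTemplate jdbcTemplate;

    public UniqueIndexExample(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public boolean saveOrder(String code) {
        try {
            jdbcTemplate.update("insert into `order` (code) values (?)", code);
            // First request: continue with other data operations / business logic here.
            return true;
        } catch (DuplicateKeyException e) {
            // The un_code unique index conflicted: this is a repeated request.
            return true; // still return success to keep the interface idempotent
        }
    }
}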

5. Build an anti-duplicate table

Sometimes a table does not forbid duplicate data in every scenario; only certain specific scenarios must not produce duplicates. In that case, adding a unique index directly to the business table is obviously not appropriate.

For this situation, we can solve the problem by building a separate anti-duplicate table.

The table only needs two fields: an id and a column with a unique index. The unique value can be built from the identifiers of several business fields, such as name and code, for example susan_0001.

The specific steps are as follows:

  1. The user initiates a request through the browser, and the server collects data.

  2. Insert the data into the MySQL anti-duplicate table.

  3. Check whether the insert succeeded, and if so, perform the other MySQL data operations (and possibly other business logic).

  4. If the execution fails, catch the unique index conflict exception and return success.

It is important to note that the anti-duplicate table and the business table must be in the same database and the operations must be in the same transaction.
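A minimal Java sketch of this flow, assuming Spring plus JdbcTemplate and a hypothetical anti_duplicate table whose code column carries the unique index (both inserts run in one transaction, so the anti-duplicate record is rolled back if the business insert fails):

import org.springframework.dao.DuplicateKeyException;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.transaction.annotation.Transactional;

// Minimal sketch of the anti-duplicate table approach. Table names are illustrative.
public class AntiDuplicateTableExample {

    private final JdbcTemplate jdbcTemplate;

    public AntiDuplicateTableExample(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Transactional
    public boolean save(String code, String data) {
        try {
            // 1. Insert into the anti-duplicate table first; a conflict means a repeat.
            jdbcTemplate.update("insert into anti_duplicate (code) values (?)", code);
        } catch (DuplicateKeyException e) {
            // Repeated request: return success without touching the business table.
            return true;
        }
        // 2. First request: perform the real business insert in the same transaction.
        jdbcTemplate.update("insert into business_order (code, data) values (?, ?)", code, data);
        return true;
    }
}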

6. Use a state machine

Much of the time a business table has a status field. For example, an order table may have statuses such as 1 - placed, 2 - paid, 3 - completed, 4 - cancelled, and so on. If these status values follow a pattern and the business flow always moves from a smaller value to a larger one, we can use the status transition itself to guarantee the idempotency of the interface.

Suppose the order with id=123 currently has the status paid and should now become completed:

update `order` set status=3 where id=123 and status=2;

On the first request, the order status is paid (value 2), so the update statement changes the data normally, the number of affected rows is 1, and the order status becomes 3.

When the same request arrives again and executes the same SQL, the order status is already 3, so the status=2 condition no longer matches any row and the number of affected rows is 0; the data is not actually updated. To keep the interface idempotent, when the number of affected rows is 0 the interface can simply return success.

The specific steps are as follows:

  1. The user initiates a request through the browser, and the server collects the data.

  2. Update the record to the next status, using the id and the current status as the where conditions.

  3. Check the number of rows affected. If 1 row is affected, the operation succeeded, and the other data operations can be performed.

  4. If 0 rows are affected, this is a repeated request, and success is returned directly.

The main thing to note is that this scheme only applies in the special case where the table being updated has a status field and the operation happens to be a status transition; it is not suitable for all scenarios.
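A minimal Java sketch of the state-machine update (assuming JdbcTemplate and the order statuses used above, 2 = paid and 3 = completed):

import org.springframework.jdbc.core.JdbcTemplate;

// Minimal sketch of a state-machine style idempotent update.
public class OrderStateMachineExample {

    private final JdbcTemplate jdbcTemplate;

    public OrderStateMachineExample(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public boolean complete(long orderId) {
        // Only rows still in the "paid" state (status = 2) are updated.
        int affected = jdbcTemplate.update(
                "update `order` set status = 3 where id = ? and status = 2", orderId);
        if (affected == 1) {
            // First request: follow-up business logic can run here.
            return true;
        }
        // affected == 0: the order already moved past "paid", so this is a repeat.
        return true; // still return success to keep the interface idempotent
    }
}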

7. Add a distributed lock

In fact, the unique index and anti-duplicate table approaches introduced above essentially rely on the database as a distributed lock, which is itself a form of distributed lock. However, because database-based distributed locks do not perform very well, we can use Redis or ZooKeeper instead.

Given that many companies now use distributed configuration centers such as Apollo or Nacos instead of ZooKeeper, we will use Redis as the example for introducing distributed locks.

There are three main ways to implement distributed locks in Redis:

  1. The setnx command

  2. The set command (with the NX and EX options)

  3. The Redisson framework

Each scheme has its own advantages and disadvantages; the specific implementation details are beyond the scope of this article.

The specific steps are as follows:

  1. The user initiates a request through the browser, and the server collects the data and generates an order number (code) as the unique business field.

  2. Use the Redis set command to write the order code into Redis, with an expiration time.

  3. Check whether the set succeeded. If it did, this is the first request, and the data operations are performed.

  4. If the set failed, this is a repeated request, and success is returned directly.

It is important to note that a distributed lock must have a reasonable expiration time. If the expiration time is too short, it cannot effectively prevent repeated requests; if it is too long, Redis storage space may be wasted. The value should be chosen according to the actual business situation.
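As an illustration, here is a minimal sketch with Spring Data Redis, relying on the set command's "only set if absent" (NX) semantics; the key prefix and the 30-minute expiration are assumptions for the example:

import java.time.Duration;
import org.springframework.data.redis.core.StringRedisTemplate;

// Minimal sketch of duplicate-request protection via Redis "set if absent".
public class RedisLockExample {

    private final StringRedisTemplate redisTemplate;

    public RedisLockExample(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    public boolean tryHandle(String orderCode) {
        // SET key value NX EX: succeeds only if the key does not exist yet.
        Boolean firstRequest = redisTemplate.opsForValue()
                .setIfAbsent("idempotent:order:" + orderCode, "1", Duration.ofMinutes(30));
        if (Boolean.TRUE.equals(firstRequest)) {
            // First request: go on with the real data operations.
            return true;
        }
        // The key already exists: repeated request, return success directly.
        return true;
    }
}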

8. Use a token

Besides the options above, there is one last option: using a token. This scheme is a bit different from all the previous ones in that it requires two requests to complete a single business operation:

  1. The first request obtains a token.

  2. The second request carries the token and completes the business operation.

The first step is obtaining the token; the second step is performing the actual business operation. The specific steps are as follows:

  1. When a user accesses the page, the browser automatically initiates a request to obtain the token.

  2. The server generates the token, saves it in Redis, and returns it to the browser.

  3. When a user initiates a request through a browser, the user carries the token.

  4. Check whether the token exists in Redis. If it exists, this is the first request: delete the token (so later repeats will not find it) and perform the subsequent data operations.

  5. If the token does not exist, this is a repeated request, and success is returned directly.

  6. Tokens in Redis are automatically deleted after the expiration date.

The above scheme is for idempotent design.

For an anti-replay design rather than an idempotent one, the flow changes slightly: a repeated request does not have to return success.

It is important to note that tokens must be globally unique.
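To make the two steps concrete, here is a minimal sketch with Spring Data Redis; consuming the token on first use is what prevents a second request within the expiration window from being processed again (the key prefix and TTL are illustrative assumptions):

import java.time.Duration;
import java.util.UUID;
import org.springframework.data.redis.core.StringRedisTemplate;

// Minimal sketch of the token scheme: issue a globally unique token, then consume it once.
public class TokenExample {

    private final StringRedisTemplate redisTemplate;

    public TokenExample(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    // First request: generate a token, save it in Redis and return it to the browser.
    public String issueToken() {
        String token = UUID.randomUUID().toString();
        redisTemplate.opsForValue().set("idempotent:token:" + token, "1", Duration.ofMinutes(10));
        return token;
    }

    // Second request: the browser carries the token; only the caller that successfully
    // deletes the key performs the business operation, later repeats just return success.
    public boolean consumeToken(String token) {
        Boolean existed = redisTemplate.delete("idempotent:token:" + token);
        return Boolean.TRUE.equals(existed); // true -> first request, false -> repeat
    }
}

In both the idempotent and the anti-replay variants, the key point is that a token can only be consumed once.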

