I remember when I was looking for a job at the beginning of the year, the interviewer always asked me if I had worked on distributed system. As a developer, distributed system has become standard, and distributed transaction has become standard with distributed system. How to solve the distributed problem is one of the most asked questions, because there is no sufficient preparation, so the answer is always unable to touch the head, so I take this opportunity, through consulting information and combining my own work experience, summed up some solutions to solve the distributed transaction. I summarized five solutions, namely XA solution, TCC solution, local message table, reliable message final consistency solution and maximum effort notification solution, and mainly expanded the advantages and disadvantages and main principles of the solution.Copy the code

A. XA

XA solution, the core of the two-phase commit, has a transaction manager concept, responsible for coordinating the transactions of multiple databases (resource managers), the transaction manager first ask each database are you ready? If each database replies ok, then the transaction is formally committed and the operation is performed on each database; If either of the databases answers no, then the transaction is rolled back. For example, our company will organize group construction, and then there will be a person in charge of group construction. Generally, the person in charge of group construction will ask the people in the team the week before, and say, guys, we will go skiing and barbecue next Saturday, shall we go? At this time, the leader of the group construction starts to wait for everyone's answer. If everyone says OK, then we can decide to go to the group construction together. If any of the people at this stage say, "I'm not going," then the organizer cancels the event. This is called an XA transaction, a two-phase commit. There is the concept of a transaction manager that coordinates transactions between multiple databases (resource managers). The transaction manager asks each database are you ready? If each database replies ok, the transaction is formally committed, the operation is performed on each database, and if any database replies no, the transaction is rolled back. This distributed transaction scheme is more suitable for single block applications, across multiple distributed transactions, and because it relies heavily on the database level to handle complex transactions, the efficiency is very low, and it is definitely not suitable for high concurrency scenarios. Generally speaking, if there is an operation within a system that spans multiple libraries, it is not compliant. Now microservices, a large system divided into hundreds of services, dozens of services. In general, our rules and specifications require that each service function operates on its own corresponding database. If you want to operate other corresponding library services, are not allowed to direct other service of library, violations against the micro service architecture specification, you literally crossing random access, hundreds of services, all the broken, the administration of such a service can't can't control, data correction by others, often their own libraries written by others. If you want to operate on someone else's service library, you must do so by calling another service's interface, never allowing cross-access to someone else's database.Copy the code

2. TCC scheme

The full name of TCC is try-confirm-cancel, which can be understood as a try-confirm-cancel operation, and can also be referred to as a two-stage compensation operation. In the first stage, Try is only reserved resource, and in the second stage, the service provider should be clearly told whether you want this resource or not, and Confirm/Cancel in the second stage should be answered. It is used to clean up the effects of the first phase, so it is called a compensating transaction. For example, when we are shopping online, we click the payment button on the page to call the payment service. Then we need to implement the following three steps in the background: First, modify the order state, namely the try stage: Order service: modify the order state to payment; Account service: The account balance stays the same, the available balance is reduced by 1, then the number 1 is frozen in a separate field; Inventory service: Keep the number of inventory unchanged, subtract 1 from the marketable inventory, and then freeze the number 1 in a separate field; Order service: Changes the status of the order to payment completed; Account service: the account balance becomes (the current value minus the value of the frozen field), the available balance remains unchanged (it has been reduced in the Try stage), and the frozen field is cleared to 0; Inventory service: Inventory becomes (current value minus the value of the frozen field), salable inventory remains unchanged (the Try stage has been reduced), and the frozen field is cleared to 0; Order service: Changes the status of the order to unpaid account Service: the account balance remains unchanged, the available balance becomes (the current value plus the value of the frozen field), and the frozen field clears 0. Inventory service: inventory remains unchanged, saleable inventory becomes (current value plus the value of the frozen field), and the frozen field clears 0. In order to achieve the effect of transactions, either together succeed or together fail, this is the TCC distributed transaction scheme. To be honest, this scheme is rarely used, and I use it less, because the transaction rollback is actually heavily dependent on your own code to roll back and compensate, resulting in huge compensation code. However, there are also scenarios of use. Generally speaking, TCC will be used for scenarios related to money, dealing with money, payment and transaction, to strictly ensure distributed transactions, either all successful, or all automatic rollback, strictly ensure the correctness of funds, to ensure that there will be no problems in funds. But it's best if you have a short execution time for each phase. But to be honest, generally try not to do this, write your own rollback logic, or compensation logic, the code is very complex, and that business code is difficult to maintain.Copy the code

Local message table

This way of thinking, is the foreign ebay to come out of such a set of ideas, later through alipay and other companies preaching, widely used in the industry. The basic design idea is to split remote distributed transactions into a series of local transactions. Regardless of performance or design elegance, this can be done with tables in a relational database. A inserts A data into the message table when the system operates in A local transaction. 2. System A then sends the message to MQ. 3. After receiving the message, the system inserts a data into its local message table in a transaction and performs other business operations at the same time. If the message has been processed, the transaction will be rolled back to ensure that the message will not be processed repeatedly. 4. After the successful execution, system B will update the status of its local message table and that of system A. 5. If system B fails to process the message table, the message table status will not be updated. At this time, system A will periodically scan its message table, and if there are unprocessed messages, it will send them to MQ again for B to process again. 6. This scheme guarantees ultimate consistency. Even if B fails, A will continue to resend messages until B succeeds. Take A classic example of inter-bank transfer, the first step is to deduct 1W from account A, ensure that the voucher message is inserted into the message table through local transaction, and the second step is to inform the other party that 1W is added to the bank account. Which begs the question, how do you notify them? Two ways are usually adopted: 1. MQ with high timeliness is adopted. The other party subscribs to messages and listens, and automatically triggers events when there is a message; 2. Use periodic polling scan to check the data in the message table. Both approaches have their pros and cons, and relying solely on MQ can cause notification failures. And too frequent timed polling, efficiency is not optimal (90% of useless). So, we usually use a combination of the two. Now that we've solved the notification problem, we have a new problem. In case this message is repeated consumption, add money to the user account, that is not very serious consequences? If you think about it, we can actually record consumption behavior through a "consumption status table" as well. Before performing the "add" operation, check whether the message (providing identification) has been consumed, and then update the "consumption status table" through local transaction control. In this way, the problem of double consumption is avoided. The appeal approach is a classic implementation that basically avoids distributed transactions and achieves "final consistency." However, the throughput and performance of relational databases are bottlenecks, and frequent read and write messages will cause stress to the database. Therefore, in true high concurrency scenarios, this solution will also have bottlenecks and limitations.Copy the code

Fourth, reliable message final consistency scheme

The idea is to do away with local message tables altogether and implement transactions directly based on MQ. Alibaba's RocketMQ, for example, supports message transactions. System A will send A Prepared message to MQ. If the prepared message fails to be sent, the operation will be cancelled. 2. If the message is sent successfully, then the local transaction is executed, telling MQ to send the confirmation of cancellation if it succeeds, and telling MQ to rollback the message if it fails. 3. If an acknowledgement message is sent, system B receives the acknowledgement message and executes the local transaction. Mq will automatically poll all prepared messages to call back to your interface and ask if this message is a local transaction failure. Should you retry or roll back any unconfirmed messages? Generally you can check the database here to see if the previous local transaction was executed, and if it was rolled back, then roll back here as well. This is to avoid the possibility that the local transaction execution succeeded but the confirmation message failed to be sent; 5. In this scenario, what if the transaction of system B fails? The system will automatically retry until the rollback succeeds. If the rollback fails, the system can roll back important capital services. For example, after the local rollback of system B, system A can be notified to roll back as well. Or send an alarm by manual manual rollback and compensation; Reliable message ultimate consistency is the consistency of messages passing from producer to consumer through messaging middleware. RocketMQ addresses two main features: 1. Atomicity of local transactions and message delivery. 2. Reliability of transaction participants receiving messages. The final consistency transaction of reliable message is suitable for the scenario with long execution cycle and low real-time requirement. With the introduction of message mechanism, synchronous transaction operation becomes asynchronous operation based on message execution, avoiding the influence of synchronous blocking operation in distributed transaction, and realizing the decoupling of the two services. This plan is still quite appropriate, at present domestic Internet company is mostly used so.Copy the code

5, best efforts to inform the scheme

System A sends A message to MQ after the local transaction is completed. There will be a Max effort notification service dedicated to consuming MQ, which will consume MQ and write it to the database, or put it on a memory queue, and then call the interface of system B; If system B succeeds, it is OK; If the execution of system B fails, the service will periodically try to call system B again, repeated N times, finally failed to give up. The maximum effort notification scheme is implemented in a relatively simple way, which is in essence periodically proofread. It is suitable for occasions where the time requirement of data consistency is not too high. In fact, it is not considered as a distributed transaction scheme, but as a cross-platform data processing scheme.Copy the code

A small summary

Each solution has its own advantages and disadvantages, can't say that a certain solution is absolutely good, or absolute difference, only in specific love scene, to see the advantages and disadvantages, solution but personally think that, if the order after insertion to invoke inventory service update inventory, inventory data without money so sensitive, can use reliable sources eventual consistency, For general distributed transaction scenarios, TCC can be used. I only listed several common centralized solutions, hoping to make a reference to the majority of readers, and on this basis to expand. The 5th issue of [🏆 technology project |] talk about distributed those things...  (https://juejin.cn/post/6872367966512644103)Copy the code