Preface

From CAP and BASE we have learned about the problems of distributed systems and the general theory behind their solutions, but what are the specific protocols and schemes that implement them?

  • Distributed consistency
  • Distributed consensus algorithm
    • Paxos, Raft, Zab
  • Distributed transaction consistency
  • Implementation scheme of Distributed Transaction Consistency (XA mode and AT Mode)
    • Two-phase commit
    • Three-phase commit
    • Flexible transaction TCC
    • AT mode
    • Event notification


1 Distributed Consistency

  • What is distributed consistency? Distributed consistency is mostly about keeping the state of data replicas consistent across multiple services, rather than the consistency (data constraints) of a relational database.

2 Distributed Consensus Algorithms

Paxos algorithm

  • The Paxos algorithm is a highly fault-tolerant consistency algorithm based on message passing. It is widely recognized as one of the most effective algorithms for solving distributed consistency problems
  • A popular way to understand the Paxos algorithm
    • Suppose ten people want to travel, either to Chengdu or to Lhasa. To agree on a single destination, a simple approach would be to open a WeChat group chat and let everyone vote, majority wins. Paxos, however, assumes the platform itself is unreliable: what if WeChat goes down? Paxos is designed for fault tolerance, so in this analogy people communicate by texting each other directly
    • Three other people are asked to act as intermediaries (they could also be chosen from among the ten, and there need not be exactly three); the ten people text the intermediaries, and the intermediaries do not have to communicate with each other
  • Application phase: each text message carries a sending time, and an intermediary grants communication rights only to the sender of the latest message, and only to one person at a time. Everyone frantically texts the intermediaries, hoping to win the right to communicate
  • Communication phase: once a proposer has obtained communication rights from more than half of the intermediaries, it suggests a destination (say, Chengdu) to them. There are three possible outcomes:
    • A: more than half of the intermediaries agree, and everyone goes to Chengdu
    • B: at least one intermediary has already settled on a destination (not necessarily Chengdu; perhaps Lhasa, agreed with another proposer). The proposer must then adopt the most recently settled destination and check whether more than half agree; if not, it retries with that destination next time
    • C: the proposer loses its communication rights and goes back to texting…
  • The consistency Paxos provides, which concerns redundant replicas, is not the same thing as the consistency of ACID in a relational database
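The analogy above maps directly onto the two phases of single-decree Paxos. Below is a minimal sketch, assuming an in-memory toy model (the `Acceptor`/`propose` names are illustrative, not from any real library): the numbered prepare requests play the role of the text-message timestamps, and the intermediaries are the acceptors.

```python
# Minimal single-decree Paxos sketch: proposers send numbered "prepare"
# requests (the text-message timestamps); acceptors (the intermediaries)
# promise only to the highest proposal number they have seen.

class Acceptor:
    def __init__(self):
        self.promised = 0      # highest proposal number promised so far
        self.accepted = None   # (number, value) of the last accepted proposal

    def prepare(self, n):
        """Phase 1: promise n if it is newer than any promise so far."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        """Phase 2: accept unless a newer prepare was promised meanwhile."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n, value):
    """Try to get `value` chosen with proposal number `n`."""
    majority = len(acceptors) // 2 + 1
    # Phase 1: gather promises (communication rights) from a majority.
    promises = [a.prepare(n) for a in acceptors]
    granted = [acc for ok, acc in promises if ok]
    if len(granted) < majority:
        return None  # lost the communication right; retry with a higher n
    # Case B: if any acceptor already accepted a value, adopt the one
    # with the highest proposal number instead of our own.
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior)[1]
    # Phase 2: ask the acceptors to accept the (possibly adopted) value.
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, 1, "Chengdu"))  # chosen: Chengdu
print(propose(acceptors, 2, "Lhasa"))    # must adopt Chengdu, not Lhasa
```

Note how the second proposer, despite holding a higher proposal number, is forced to adopt the already-accepted destination: that is exactly case B in the analogy, and it is what makes the chosen value stable.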

Raft algorithm

  • Because Paxos is hard to understand, it is also hard to implement, so a newer consensus algorithm was designed. Raft has three roles:
    • Leader: handles all client interactions, log replication, and so on. There is only one Leader at a time
    • Follower: similar to a voter, completely passive
    • Candidate: can be elected as the new Leader

The election stage

  • Initially every server is a Follower, each with a built-in random countdown. When its countdown expires, a Follower becomes a Candidate and asks the other Followers to vote for it
  • There are three possible outcomes:
    • A: more than half of the Followers vote for it, and it becomes the new Leader
    • B: a competing Candidate already has more than half of the votes, so it gives up and becomes that Candidate's Follower
    • C: the vote is split among competitors. The Candidates launch a new election in the next election cycle, again with built-in random countdowns; the first one to finish its countdown usually collects a majority of Followers and becomes Leader (note: a Follower that has already voted for someone else in the current round cannot vote again)
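The three outcomes above can be sketched as a toy simulation. This is a simplified model, not a real Raft implementation (the `election_round`/`elect_leader` names and the single-candidate retry shortcut are assumptions of the sketch): a split vote simply triggers a new round, where randomized countdowns usually leave one candidate standing.

```python
import random

def election_round(num_servers, candidates, rng):
    """One Raft election round: each candidate votes for itself; each
    follower votes for whichever candidate's request arrives first
    (simulated by a random choice). A strict majority wins."""
    votes = {c: 1 for c in candidates}           # self-votes
    followers = [s for s in range(num_servers) if s not in candidates]
    for _ in followers:
        votes[rng.choice(candidates)] += 1       # one vote per follower
    majority = num_servers // 2 + 1
    winners = [c for c, v in votes.items() if v >= majority]
    return winners[0] if winners else None       # None = split vote (case C)

def elect_leader(num_servers, candidates, seed=0):
    """Retry until a leader emerges: after a split vote, randomized
    countdowns mean one candidate usually starts the next round alone."""
    rng = random.Random(seed)
    leader = election_round(num_servers, candidates, rng)
    while leader is None:
        sole = min(candidates, key=lambda c: rng.random())  # fastest countdown
        leader = election_round(num_servers, [sole], rng)
    return leader

print(elect_leader(5, [0, 1]))   # one of the two candidates wins
```

With an odd cluster size and two candidates a majority always emerges in the first round; the retry path models the even-split case, which is why real Raft randomizes its election timeouts.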

Log Replication Phase

  • 1: After a Leader has been elected, a client issues a request to append a log entry, say the entry “Hello”
  • 2: The Leader instructs the Followers to append this new entry to their own logs
  • 3: Once a majority of Follower servers have written the entry to their disk files, they confirm that the append succeeded and reply Committed OK
  • 4: On the next heartbeat, the Leader notifies all Followers to advance their committed entries
  • If a network partition or communication failure occurs during this process, so that the Leader can no longer reach a majority of Followers, the Followers on the majority side elect a new Leader and continue serving clients. When the network recovers, the old Leader, now in the minority, becomes a Follower of the new Leader, and the entries it accepted during the failure are rolled back
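Steps 1–4 can be condensed into a small sketch. This is a toy model under stated assumptions (the `Server`/`replicate` names and the `reachable` predicate are illustrative): an entry is committed only once a majority of the cluster, leader included, has stored it.

```python
# Sketch of Raft log replication: the leader appends an entry, replicates
# it to reachable followers, and commits only on a majority of acks.

class Server:
    def __init__(self):
        self.log = []
        self.commit_index = -1   # index of the last committed entry

def replicate(leader, followers, entry, reachable):
    """Append `entry` on the leader and every reachable follower; mark it
    committed only if a majority of the whole cluster stored it."""
    cluster_size = len(followers) + 1
    leader.log.append(entry)
    acks = 1                                 # the leader's own copy
    for f in followers:
        if reachable(f):
            f.log.append(entry)
            acks += 1
    if acks > cluster_size // 2:
        leader.commit_index = len(leader.log) - 1
        for f in followers:                  # next heartbeat: spread commit
            if reachable(f):
                f.commit_index = leader.commit_index
        return True
    return False   # partitioned minority: the entry stays uncommitted

leader = Server()
followers = [Server() for _ in range(4)]
ok = replicate(leader, followers, "Hello", reachable=lambda f: True)
print(ok, leader.commit_index)   # True 0
```

The `False` branch is the partition scenario from the last bullet: a leader cut off from the majority can append entries locally but can never commit them, which is why they are safely rolled back after recovery.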

Zab algorithm

ZXID

The protocol's transaction number, ZXID, is designed as a 64-bit number

  • The lower 32 bits are a simple monotonically increasing counter, incremented by one for each client transaction request
  • The upper 32 bits represent the Leader cycle number (epoch). Whenever a new Leader server is elected, it fetches the largest ZXID from its local transaction log, reads the epoch value out of it, and adds 1 to obtain the new epoch; the lower 32-bit counter is then reset to 0
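The 64-bit layout described above is just bit packing, and can be sketched with a few shift operations (the function names here are illustrative, not ZooKeeper API):

```python
# The ZXID layout: epoch in the high 32 bits, a per-epoch counter in the
# low 32 bits. A higher epoch always outranks any counter value.

def make_zxid(epoch, counter):
    return (epoch << 32) | (counter & 0xFFFFFFFF)

def epoch_of(zxid):
    return zxid >> 32

def counter_of(zxid):
    return zxid & 0xFFFFFFFF

def next_epoch_zxid(last_zxid):
    """On a new Leader election: bump the epoch, reset the counter to 0."""
    return make_zxid(epoch_of(last_zxid) + 1, 0)

z = make_zxid(epoch=3, counter=41)
print(epoch_of(z), counter_of(z))   # 3 41
print(epoch_of(next_epoch_zxid(z)), counter_of(next_epoch_zxid(z)))   # 4 0
```

Because the epoch occupies the high bits, plain integer comparison of ZXIDs automatically orders any transaction from a newer Leader cycle after every transaction of an older one.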

Crash Recovery Mode (Election)

  • When the cluster starts up or the Leader disconnects, any node may initiate a Leader election, and the other nodes in the cluster vote
  • Node B decides whether node A can be the Leader and, if so, switches its own vote to A. The comparison is: election epoch(A) > epoch(B), or, if equal, zxid(A) > zxid(B), or, if both equal, sid(A) > sid(B)
  • The sid is a manually configured server ID

Message broadcast mode

  • The Leader converts the client's request into a Proposal
  • The Leader maintains a FIFO queue for each Follower and sends the Proposal through that queue
  • If the Leader receives ACK feedback from more than half of the Followers
  • The Leader sends a Commit to all the Followers

Some of the details

  • After receiving a client request, the Leader encapsulates it into a transaction and assigns the transaction a globally increasing unique ID, the transaction ID (ZXID). The ZAB protocol must guarantee transaction ordering, so every transaction is sorted and processed according to its ZXID
  • There is also a message queue between the Leader and each Follower, which decouples them and removes blocking from their synchronization
  • To ensure that all processes in a ZooKeeper cluster execute in a consistent order, only the Leader server may accept write requests. Even if a Follower server receives a write request from a client, it forwards the request to the Leader server for processing

3 Distributed transaction consistency

  • For distributed consistency versus distributed transaction consistency, I prefer this distinction:
  • A: distributed consistency solves the problem of keeping data distributed across multiple services consistent in state (multiple replicas staying consistent)
  • B: distributed transaction consistency is more like relational database consistency: it constrains the relationships between data in distributed services (for example, the state of data a in service A and data b in service B must maintain a fixed mapping)

The difference between distributed consensus algorithm and distributed consistency

  • Consensus algorithms are designed to solve distributed consistency; they can be applied to distributed transaction consistency, but they are not well suited to it

4 Implementation scheme of Distributed Transaction Consistency (XA mode and AT mode)

  • XA mode pre-commits data (the pre-committed data cannot be accessed by other transactions) and rolls the pre-committed data back in the event of a failure
  • In AT mode, data is committed immediately, but a lock prevents it from being accessed by other transactions; if a failure occurs, the data is repaired with a compensating rollback. Compared with XA mode, AT mode is better suited to distributed transactions because it reduces blocking wait time

Two-phase Commit (strong consistency)(XA mode)

The two-phase commit protocol (2PC) is a commonly used distributed transaction solution. The commit process of a transaction is divided into two phases: a preparation phase and a commit phase

Processing flow

Stage 1: Preparation stage

  • The coordinator sends the transaction content to all participants, asks whether the transaction can be committed, and waits for all participants to respond
  • Each participant executes the transaction, recording undo and redo information in its transaction log (but does not commit)
  • If execution succeeds, the participant replies yes to the coordinator (it can commit); if execution fails, it replies no (it cannot commit)

Phase 2: Commit phase

  • If the coordinator receives a failure message or a timeout from any participant, it sends a rollback message to every participant; otherwise it sends a commit message
  • Participants perform the commit or rollback according to the coordinator's instruction and release the lock resources held during the transaction
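The two phases can be sketched as follows. This is a toy model of the control flow only (the class and function names are illustrative; real 2PC also needs logging and timeout handling): a single no vote in the prepare phase rolls everyone back.

```python
# 2PC sketch: commit only if every participant votes yes in phase 1.

class Participant:
    def __init__(self, can_commit):
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Execute the transaction, write undo/redo logs, but do NOT commit.
        self.state = "prepared" if self.can_commit else "failed"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"

def two_phase_commit(participants):
    # Phase 1: ask everyone; from here on participants block holding locks.
    all_yes = all(p.prepare() for p in participants)
    # Phase 2: one no (or a timeout, not modeled) rolls everyone back.
    for p in participants:
        p.commit() if all_yes else p.rollback()
    return all_yes
```

The blocking disadvantage listed below is visible even in this sketch: between `prepare()` and the phase-2 instruction, every participant sits locked, waiting on a coordinator that might never answer.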

Disadvantages of the 2PC solution:

  • Performance problem: all participants are synchronously blocked during the commit phase, tying up system resources and easily creating a performance bottleneck
  • Reliability problem: the coordinator is a single point of failure; if it goes down, the participants remain blocked holding their locks
  • Data consistency problem: during the commit phase, a local network problem may cause some participants to receive the commit message while others do not, leaving the nodes' data inconsistent

Three-phase Commit (strong consistency)(XA mode)

The three-phase commit protocol (3PC) is an improved version of the two-phase commit protocol. Unlike 2PC, it introduces a timeout mechanism, on both the coordinator and the participants

Processing flow

Phase 1: canCommit

  • The coordinator asks the participants whether they can commit; a participant returns yes if it can (without executing the transaction) or no if it cannot:
  • The coordinator issues a canCommit request containing the transaction content to all participants, asks whether the transaction can be committed, and waits for all participants to respond
  • On receiving a canCommit request, a participant replies yes and enters the prepared state if it believes the transaction can be executed; otherwise it replies no

Phase 2: preCommit

  • Based on the participants' responses in the phase-1 canCommit round, the coordinator decides whether the preCommit operation can proceed. There are two cases:
  • Case 1: all participants replied yes in phase 1, so the participants pre-execute the transaction
    • The coordinator enters the preparation phase by issuing a preCommit request to all participants
    • On receiving a preCommit request, a participant executes the transaction, recording undo and redo information in its transaction log (but does not commit)
    • Each participant replies with an ACK (or no) to the coordinator and waits for the final instruction
  • Case 2: any participant replied no in phase 1, or the coordinator timed out waiting for some participant's response; the transaction is interrupted
    • The coordinator issues abort requests to all participants
    • A participant interrupts the transaction whether it receives an abort request from the coordinator or times out waiting for the coordinator's request

Stage 3: Do Commit

  • The actual transaction commit happens in this phase, and there are three cases:
  • Case 1: all participants replied with an ACK in phase 2, so the actual commit proceeds
    • If the coordinator is still working, it issues a doCommit request to all participants, who, on receiving it, formally commit the transaction and release the resources held for its duration
    • Each participant replies with a completion ACK; the coordinator considers the commit finished once it has received the ACKs from all participants
  • Case 2: any participant replied no in phase 2, or the coordinator timed out waiting for some participant's response; the transaction is interrupted
    • If the coordinator is still working, it issues abort requests to all participants, who use the undo information recorded in the preCommit phase to roll back and release the resources held for the transaction's duration
    • Each participant replies with a completion ACK; once the coordinator has received the ACKs from all participants, the transaction interruption is complete
  • Case 3: the network between the coordinator and a participant fails
    • If a participant times out waiting for the coordinator's doCommit or abort request, it proceeds to commit the transaction
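The key behavioral difference from 2PC is the participant's phase-3 default, which can be sketched in a few lines (a toy decision function, not a full protocol; the name and the string states are illustrative):

```python
# 3PC participant decision in the doCommit phase: unlike 2PC, a timeout
# with no word from the coordinator defaults to COMMIT (case 3 above).
# That default is both the availability win and the inconsistency risk.

def participant_phase3(instruction):
    """`instruction` is 'do_commit', 'abort', or None (coordinator lost)."""
    if instruction == "abort":
        return "rolled_back"
    # Explicit doCommit, or a timeout: having already passed canCommit
    # and preCommit, the participant assumes the group meant to commit.
    return "committed"

print(participant_phase3("do_commit"))  # committed
print(participant_phase3(None))         # committed  <- inconsistency window
print(participant_phase3("abort"))      # rolled_back
```

The `None` branch is exactly the disadvantage listed below: if the coordinator had actually decided to abort but could not deliver the message, the unreachable participant commits anyway.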

The advantages and disadvantages

  • Advantage: in the second phase, the coordinator or a participant interrupts the transaction after its wait times out
  • Advantage: in the third phase, the coordinator's single point of failure is mitigated, because participants will still commit the transaction if the coordinator fails (which is also a disadvantage)
  • Disadvantage: data inconsistency is still possible. In the third phase, if the coordinator requests an abort but cannot communicate with a participant, that participant will go ahead and commit, leaving the data inconsistent

Flexible Transaction TCC (an implementation of the XA pattern at the service level)

  • Try phase: check and reserve resources. In the withholding example, all Try does is check that the account has a sufficient available balance and then freeze part of the account's funds. After Try executes, the account balance is still 100 yuan, but 30 yuan of it is frozen and cannot be used by other transactions
  • Confirm phase: deduct the funds frozen in the Try phase. After Confirm executes, the 30 yuan frozen in the first phase is deducted and the balance of account A becomes 70 yuan
  • Cancel phase: for rollback, the Cancel method releases the 30 yuan frozen in the Try phase, returning the account to its original state with the full 100 yuan available
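The withholding example above can be sketched directly (a minimal in-memory model; the `Account` class and method names are illustrative, and real TCC also needs idempotency and an empty-rollback guard):

```python
# TCC sketch: Try freezes funds, Confirm deducts them, Cancel releases.

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.frozen = 0

    def try_freeze(self, amount):
        """Try: check available balance and reserve, without deducting."""
        if self.balance - self.frozen < amount:
            return False
        self.frozen += amount
        return True

    def confirm(self, amount):
        """Confirm: release the reservation and actually deduct."""
        self.frozen -= amount
        self.balance -= amount

    def cancel(self, amount):
        """Cancel: rollback is simply unfreezing the reserved funds."""
        self.frozen -= amount

a = Account(100)
a.try_freeze(30)
print(a.balance, a.frozen)   # 100 30  (balance intact, 30 unusable)
a.confirm(30)
print(a.balance, a.frozen)   # 70 0
```

Unlike 2PC, nothing here holds a database lock between the phases: the "lock" is the business-level `frozen` field, which is why TCC is called a flexible transaction.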

AT mode (Alibaba Distributed Framework SEATA)

Phase 1: commit

  • In the first phase, Seata intercepts the “business SQL”, parses its semantics, and finds the business data it will update. Before the update, it saves that data as a “before image”; it then executes the “business SQL” to update the data, saves the updated data as an “after image”, and generates a row lock. All of these operations are done within a single database transaction, which guarantees the atomicity of the phase-1 operations

Phase 2: commit or rollback

  • For a phase-2 commit, the Seata framework only needs to delete the snapshot data and row locks saved in phase 1 to finish cleaning up, because the “business SQL” was already committed to the database in phase 1

  • For a phase-2 rollback, Seata must undo the “business SQL” executed in phase 1 to restore the business data
  • The rollback restores the business data from the “before image”. Before writing it back, however, Seata checks for dirty writes by comparing the current business data in the database with the “after image”: if the two are identical, there is no dirty write and the business data can be restored
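The dirty-write check described above can be sketched as follows. This is a toy model of the comparison, not Seata's actual implementation (the `at_rollback` name and the dict-as-database are assumptions):

```python
import copy

# Sketch of AT-mode rollback: restore from the before image, but only
# after verifying the current row still matches the after image
# (i.e. no other transaction dirty-wrote the row in the meantime).

def at_rollback(db, key, before_image, after_image):
    """Return True if the row was safely restored, False on dirty write."""
    if db[key] != after_image:
        return False   # another transaction modified the row
    db[key] = copy.deepcopy(before_image)
    return True

db = {"order:1": {"status": "PAID"}}
before = {"status": "NEW"}
after = {"status": "PAID"}
print(at_rollback(db, "order:1", before, after))   # True
print(db["order:1"])                               # {'status': 'NEW'}
```

The `False` branch is why AT mode still needs its global row lock: blindly writing the before image over a row another transaction has since changed would silently destroy that transaction's update.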

Event Notification (transaction message)

Synchronous notification

  • The habitual first thought is a synchronous call, which is the simplest scheme to implement. But with a third-party system involved it is unreliable: internal processing may time out, the network may drop, and accidents happen easily. Waiting for the interface to return is also a blocking process, which hurts system performance

Asynchronous callback notification

  • Unlike synchronous notification, the processing interface here is an asynchronous callback, which avoids the problems of blocking on and timing out a response
  • A retry mechanism is still needed, because the callback itself can fail with an error

The message queue

  • Message queues decouple services and solve the error-retry problem
  • The interface must be idempotent, because a message can be delivered repeatedly or redelivered after an error
  • A consistency problem in normal message processing: the sender's business logic succeeds -> MQ stores the message successfully -> but the MQ call times out -> the ACK confirmation fails -> the sender's local transaction rolls back, even though MQ actually processed the message successfully
  • If the processing produces a result, it can likewise be sent back through the message queue
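The idempotency requirement above comes from at-least-once delivery, and the standard sketch is to deduplicate on a message ID (an in-memory toy; in practice the processed-ID set would live in a database or cache, and the names here are illustrative):

```python
# Idempotent consumer sketch: apply each message exactly once, no matter
# how many times MQ redelivers it after a lost ACK.

processed = set()          # in production: a dedup table keyed by msg_id
balance = {"total": 0}

def handle(msg_id, amount):
    """Return True if the message was applied, False if it was a duplicate."""
    if msg_id in processed:
        return False       # duplicate delivery: ignore
    processed.add(msg_id)
    balance["total"] += amount
    return True

handle("m1", 30)
handle("m1", 30)           # redelivered after a lost ACK: no double charge
print(balance["total"])    # 30
```

This is exactly the failure chain from the bullet above seen from the receiving side: when the sender's ACK is lost and the message is sent again, the duplicate is absorbed instead of applied twice.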

Transaction state table + message queue scheme

  • The core idea of the local-message final-consistency scheme is to record a message row in the DB while executing the business operation; recording the message data and recording the business data must happen in the same transaction
  • After the message row is recorded, a scheduled task polls the DB for messages in the waiting-for-delivery state and delivers them to MQ. Delivery may fail during this process, in which case a retry mechanism keeps resending until a successful ACK arrives from MQ, after which the message status is updated or the message is cleared
  • The consuming interface must also be idempotent
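The scheme above is often called the outbox pattern, and its loop can be sketched as follows (an in-memory toy: the lists stand in for the DB message table and the broker, and all names are illustrative):

```python
# Transaction-state-table sketch: the business write and the outbox row
# share one local transaction; a poller delivers PENDING messages to MQ
# and marks them SENT only after the broker's ACK.

outbox = []   # stands in for the DB message table
mq = []       # stands in for the broker

def do_business(order_id):
    # In reality: business row + outbox row in the SAME DB transaction.
    outbox.append({"id": order_id, "status": "PENDING"})

def poll_and_deliver(broker_up):
    """Scheduled task: push PENDING messages; mark SENT only on ACK."""
    for msg in outbox:
        if msg["status"] != "PENDING":
            continue
        if broker_up:                 # delivery and ACK succeed
            mq.append(msg["id"])
            msg["status"] = "SENT"
        # else: leave it PENDING; the next poll retries

do_business("o1")
poll_and_deliver(broker_up=False)     # delivery fails, row stays PENDING
poll_and_deliver(broker_up=True)      # retry succeeds
print(outbox[0]["status"], mq)        # SENT ['o1']
```

Because a message may be delivered, ACK-lost, and delivered again by a later poll, this scheme guarantees at-least-once delivery, which is why the consuming interface must be idempotent as the last bullet says.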

Corrections are welcome
