Welcome to search and follow the wechat official account of the same name [Coder’s Technical Road]. The background provides historical articles to be compiled and downloaded in PDF. Welcome to collect and discuss


The background,

As the open source long transaction solution of Ali, Saga involves the generation and concatenation of global transaction IDS, and the stability and global uniqueness of transaction IDS need to be guaranteed.

Second, the principle of

Twitter’s open-source Snowflake algorithm.

Saga’s globally unique ID generation algorithm is also derived from Snowflake. As one of the most popular distributed ID generation algorithms, in fact, in countless open source code have seen his figure. Sharding- JDBC, for example, is also used to generate distributed ids when working with the database and table.

As can be seen from the figure above, the snowflake algorithm is composed of four parts: sign bit + 41-bit timestamp + 10-bit machine code + 12-bit serial number.

What are the benefits of this combination? This will be seen from the specific use of distributed ID requirements:

  • Globally unique: In a distributed deployment environment, the ids of different machines on the same machine cannot be the same.

  • Data security: Consider using discontinuous ids to hide the production status if it involves requests such as order number class IDS.

Snowflake uses a timestamp + machine code to ensure that different machines have different IDS; Use timestamp + serial number to ensure that the ID on the same machine is unique.

We can also see from its combination that the guarantee of uniqueness of the algorithm is strongly dependent on the machine clock, and once the clock is turned back, the function will be abnormal.

Snowflake itself puts more emphasis on the fact that there are no dependencies, and that it’s all local. Of course, in addition to the non-dependent strategy, there are semi-dependent and purely dependent approaches. In some business scenarios, increment (or at least trend increment) is required to determine the sequence of services. This requires the use of additional components, such as the mysql batch number cache strategy. We’ll talk about that in the last part of the application.

Three, source code analysis

First, you need to define the constants required for the ID combination, such as the number of bits per segment

/* Start time (2020-05-03) */
private final long twepoch = 1607529600000L;

/* The number of bits in machine code = 10 */
private final long workerIdBits = 10L;

/* Maximum machine code supported = 1023, which is equivalent to ~(-1l << 10L) */ 
private final long maxWorkerId = -1L ^ (-1L << workerIdBits);

/* The number of bits of the serial number */
private final long sequenceBits = 12L;

/* Machine code needs to be */ to the left of the serial number
private final long workerIdShift = sequenceBits;

/* The start position of the timestamp is */ to the left of the serial number and machine code
private final long timestampLeftShift = sequenceBits + workerIdBits;

/* Maximum number of serial numbers */
private final long sequenceMask = -1L ^ (-1L << sequenceBits);
Copy the code

Then, define the meaning fields for the three bit segments

/* Machine code (0 ~ 1023) */
private long workerId;

/* Serial number (0 to 4095) */
private long sequence = 0L;

/* Last timestamp */
private long lastTimestamp = -1L;
Copy the code

Next comes the core of producing distributed ids

  • Build machine code. Machine code is related to the IP address of the machine where the service is located, and generally does not change after initialization:
InetAddress address;
try {
    // Obtain the local IP address
    address = InetAddress.getLocalHost();
} catch (final UnknownHostException e) {
    throw new IllegalStateException("Cannot get LocalHost InetAddress, please check your network!",e);
}
byte[] ipAddressByteArray = address.getAddress();
// Machine code requires 10 bits in total, so take the last 2 bits from the second to last segment + all 8 bits from the first to last segment
return ((ipAddressByteArray[ipAddressByteArray.length - 2] & 0B11) << Byte.SIZE) + (ipAddressByteArray[ipAddressByteArray.length - 1] & 0xFF);
Copy the code
  • Get the next ID
// The current timestamp
long timestamp = System.currentTimeMillis()
// If the current timestamp is < the last timestamp recorded, a clock rollback may occur and an abnormal interrupt may occur
if (timestamp < lastTimestamp) {
    throw new RuntimeException(String.format(
        "clock moved backwards. Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
}
// If current timestamp == last timestamp recorded
if (lastTimestamp == timestamp) {
    // Calculate the next sequence number, where &4095 is used as a loop because by 4096 it becomes 0
    sequence = (sequence + 1) & sequenceMask;
    // If the next sequence number ==0, the sequence number in the same timestamp is used up, set the next timestamp as the last timestamp
    if (sequence == 0) {
        // This method loops internally until the current time > the last timetimestamp = tilNextMillis(lastTimestamp); }}else {
    sequence = 0L;
}
lastTimestamp = timestamp;
/ / timestamp and specify the time difference left 12 + 10 | | machine code left 12 serial number
return ((timestamp - twepoch) << timestampLeftShift) | (workerId << workerIdShift) | sequence;
Copy the code

Fourth, using

Because the distributed IDS generated here in Saga are only to ensure that the IDS used to concatenate distributed transactions are unique. So, using a no-dependency strategy is perfectly adequate. This method can be used directly if there is a demand for such serial business in the business.

However, a large part of business scenarios require distributed IDS to meet specific business requirements, such as incremental messages, sorting messages, and ID increment to ensure efficiency when b-Tree index is used for storage.

Meituan’s Leaf-Snowflake program adopts the semi-dependent method and the method of relying on ZK to issue the number:

(source: tech.meituan.com/2017/04/21/…).

Ali uses the order number internally, because it involves LDC logical unit deployment, it involves flexible deployment in the big push, it involves data version, business labeling, etc., the implementation scheme is a little more stringent, but the principle is also an extension of Snowflake, from the original three components to multiple components. For example, the LDC route value is denoted by the fixed location and fixed length bit, the elasticity is denoted by another N bit, and the concurrency is denoted by N bit. Therefore, it is basically an all-dependent strategy.

As long as you understand the internal logic of realization, you can combine their own business to make suitable transformation and optimization.

Copyright notice: This article was originally written by Coder’s Path to Technology. Please contact the author for reprint.