preface

In the Internet business system, there are various ids involved, such as payment ID and refund ID in the payment system. So what are the general solutions for generating ids? Especially in complex distributed system business scenarios, it is important that we adopt the right solution for ourselves. Below we list one by one, not necessarily all suitable, these solutions are for your reference, may be useful to you.

The body of the

Distributed ID features

  • Uniqueness: Ensure that the generated ID is unique across the network.
  • Sequential increment: Ensure that the generated IDS are increments by a certain number for a user or business.
  • High availability: Ensure that ids are correctly generated at all times.
  • With time: ID contains the time, a glance at the past to know which day of the transaction.

Distributed ID generation scheme

1. UUID

The core idea of the algorithm is to combine the machine’s network card, local time, a random number to generate UUID.

  • Advantages: Local generation, simple generation, good performance, no high availability risk
  • Disadvantages: Long, redundant storage, unordered, unreadable, low query efficiency

2. The ID of the database is automatically added

Use the database id increment strategy, such as MySQL auto_INCREMENT. And you can use the two databases to set the asynchronous length, generate the policy of not repeating the ID to achieve high availability.

  • Advantages: The ids generated by the database are absolutely ordered, and the implementation of high availability is simple
  • Disadvantages: Requires independent deployment of database instances, high cost, and performance bottlenecks

3. Generate ids in batches

Multiple ids are generated in batches at a time. Each time, the database needs to be accessed, the database needs to be changed to the maximum ID value, and the current value and maximum ID value are recorded in memory.

  • Advantages: Avoids the stress of having to access the database every time an ID is generated, improving performance
  • Disadvantages: It is a local generation policy and has a single point of failure. Ids are discontinuous due to service restart

4. Redis generates an ID

All command operations in Redis are single-threaded and provide self-incrementing atom commands like incr and increby, so the generated IDS are guaranteed to be uniquely ordered.

  • Advantages: Not dependent on the database, flexible and convenient, and performance is better than the database; Numeric IDS are naturally sorted, which can be useful for paging or sorting results.

  • Disadvantages: If the system does not have Redis, it needs to introduce new components, increasing the system complexity; It takes a lot of coding and configuration.

Given the single-node performance bottleneck, Redis clusters can be used to achieve higher throughput. Suppose there are 5 Redis in a cluster. You can initialize each Redis with a value of 1, 2, 3, 4, 5, and step size of 5. The ids generated by each Redis are:

A: 1, 6, 11, 16, 21 B: 2, 7, 12, 17, 22 C: 3, 8, 13, 18, 23 D: 4, 9, 14, 19, 24 E: 5, 10, 15, 20, 25Copy the code

Random load to which machine to determine, the future is difficult to modify. The step size and initial value must be determined in advance. Using A Redis cluster can also be a way to troubleshoot a single point of failure.

In addition, it is a good idea to use Redis to generate daily serial numbers starting from 0. For example, order number = date + date increment. You can generate a Key in Redis every day and use INCR to accumulate it.

5. Twitter’s Snowflake algorithm

Twitter uses ZooKeeper to implement a global ID-generated service Snowflake: github.com/twitter/sno…

As you can see above, Twitter’s Snowflake algorithm consists of the following parts:

  • 1 bit sign bit:

Since the long type is signed in Java, the highest bit is the sign bit, the positive number is 0, and the negative number is 1, and the ids used in the actual system are generally positive numbers, the highest bit is 0.

  • 41-bit timestamp (in milliseconds) :

It is important to note that the 41-bit timestamp here does not store the current timestamp, but stores the difference between the timestamp (current timestamp – start timestamp). The start timestamp here is usually the timestamp used by the ID generator and is specified by the program. So a 41-bit millisecond timestamp can be used at most (1 << 41)/(1000x60x60x24x365) = 69 years.

  • 10-bit data machine bits:

It includes five data identification bits and five machine identification bits. The 10 bits determine that a maximum of 1 << 10 = 1024 s nodes can be deployed in a distributed system. Beyond that, the generated ids are likely to conflict.

  • Sequence in 12 bit milliseconds:

This 12-bit count supports the generation of up to 1 << 12 = 4096 ids per millisecond (same machine, same time) per node

It adds up to exactly 64 bits. It’s a Long.

  • Advantages: High performance, low latency, in time order, generally do not cause ID collision
  • Cons: Independent development and deployment, machine clock dependent

Simple implementation

public class IdWorker {
    /**
     * 起始时间戳 2017-04-01
     */
    private final long epoch = 1491004800000L;
    /** * Number of digits in the machine ID */
    private final long workerIdBits = 5L;
    /** * The number of bits of the data ID */
    private final long dataCenterIdBits = 5L;
    /** * the maximum machine ID supported. The result is 31 */
    private final long maxWorkerId = ~(-1L << workerIdBits);
    /** * The maximum data ID supported. The result is 31 */
    private final long maxDataCenterId = ~(-1 << dataCenterIdBits);
    /** * The number of bits in the id */
    private final long sequenceBits = 12L;
    /** * Move machine ID 12 bits to left */
    private final long workerIdShift = sequenceBits;
    /** * Move the data ID 17(12+5) bits to the left */
    private final long dataCenterIdShift = sequenceBits + workerIdBits;
    /** * Shift the timestamp to the left by 22(12+5+5) bits */
    private final long timestampShift = sequenceBits + workerIdBits + dataCenterIdBits;
    /** * Generates the mask of the sequence, which is 4095 (0b111111111111= 0xFFf =4095) */
    private final long sequenceMask = ~(-1L << sequenceBits);
    /** * Data ID (0 to 31) */
    private long dataCenterId;
    /** * Machine ID (0 ~ 31) */
    private long workerId;
    /** * Sequence in milliseconds (0 ~ 4095) */
    private long sequence;
    /** * The last time the ID was generated */
    private long lastTimestamp = -1L;

    public IdWorker(long dataCenterId, long workerId) {
        if (dataCenterId > maxDataCenterId || dataCenterId < 0) {
            throw new IllegalArgumentException(String.format("dataCenterId can't be greater than %d or less than 0", maxDataCenterId));
        }
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        this.dataCenterId = dataCenterId;
        this.workerId = workerId;
    }

    /** * get the next ID (this method is thread safe) *@return snowflakeId
     */
    public synchronized long nextId(a) {
        long timestamp = timeGen();
        // If the current time is less than the timestamp generated by the last ID, the system clock has gone back, and this time should throw an exception
        if (timestamp < lastTimestamp) {
            throw new RuntimeException(String.format("Clock moved backwards. Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }
        // If they are generated at the same time, the sequence in milliseconds is performed
        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & sequenceMask;
            // Sequence overflow in milliseconds
            if (sequence == 0) {
                // Block until the next millisecond to get the new timestamptimestamp = nextMillis(lastTimestamp); }}else {// The timestamp changes and the sequence resets within milliseconds
            sequence = 0L;
        }
        lastTimestamp = timestamp;
        // Shift and put together a 64-bit ID by bitwise or operation
        return ((timestamp - epoch) << timestampShift) |
                (dataCenterId << dataCenterIdShift) |
                (workerId << workerIdShift) |
                sequence;
    }

    /** * Returns the current time in milliseconds@returnCurrent time (ms) */
    protected long timeGen(a) {
        return System.currentTimeMillis();
    }

    /** * block until the next millisecond until a new timestamp is obtained *@paramLastTimestamp Timestamp of the last ID generation *@returnCurrent timestamp */
    protected long nextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = lastTimestamp;
        }
        returntimestamp; }}Copy the code

6. Baidu UidGenerator

UidGenerator is baidu’s open source distributed ID generator, based on the snowflake algorithm implementation, looks like it is ok. However, domestic open source project maintainability is really worried.

Specific can refer to the official website: github.com/baidu/uid-g…

7. Meituan Leaf

Leaf is an open source distributed ID generator of Meituan, which can ensure global uniqueness, increasing trend, increasing monotonicity and information security. It also mentions the comparison of several distributed schemes, but it also needs to rely on middleware such as relational database and Zookeeper.

Specific can refer to the official website: tech.meituan.com/MT_Leaf.htm…

summary

This article shares several common solutions for global ID generation services, and compares the pros and cons and application scenarios. In practical work, we can combine their own business and system architecture system for reasonable selection.

Welcome to scan the code to pay attention to the public number: one technology Stack

This account will continue to share backend technology essentials, including virtual machine fundamentals, multi-threaded programming, high-performance frameworks, asynchronous, cache and messaging middleware, distributed and microservices, architecture learning and advanced learning materials and articles.