Make writing a habit together! This is the 9th day of my participation in the “Gold Digging Day New Plan · April More text Challenge”. Click here for more details.

1. Snowflake

Twitter’s Snowflake algorithm is an algorithm that incrementally increases ids in distributed systems. Ids can be generated in chronological order and can be globally unique. Twitter needs a Snowflake algorithm:

performance

  • Each process has at least 10K ids per second
  • Response rate 2ms(including network latency)

coordinate

For high availability within and across data centers, the machines that generate ids do not need cluster coordination. This means that there is no need for each service to communicate and coordinate.

Direct sequencing

Sort without loading the entire object ID (timestamp)

compact

The generated ids need to be compact, in other words, the length of the ids needs to be appropriate to fulfill the business needs.

High availability

The ID generation service must be highly available, for example, storage service

Tips: Instructions for Twitter’s Snowflake algorithm github.com/twitter-arc…

1.1 Data structure of snowflake algorithm

The snowflake algorithm produces an ID that is eight bytes 64-bit, the length of the long integer long.

  • The first bit represents the sign bit, and the generated ID is all positive so the highest bit is 0

  • Timestamp (41bit), millisecond level timestamp. But the actual timestamps used during development use the difference of the timestamps. This difference = the current timestamp – the fixed timestamp set by the developer, so the 41-bit timestamp will last 69 years

    (1L << 41)/(1000L * 60 * 60 * 24 * 365) calculated almost 69 years

  • Machine ID(10bit). A total of 1024 machines can be configured. If multiple machine rooms are configured on 10bit, the machine room and machine can be combined

  • Serial number (12bit), each machine 1ms can generate 4096(if a machine generates more than 4096 in a millisecond needs to be protected)

1.2 System clock Dependency

NTP should be used to keep the system clock accurate. Snowflake protects against the effects of non-monotonic clocks, which run backwards. If your clock is running fast and NTP tells it to repeat for a few milliseconds, Snowflake will refuse to generate the ID until some time after the last time we did. Run in a mode where NTP does not rewind the clock.

If the time is called back, the generated ID is likely to be duplicated.

2. Java implementation of snowflake algorithm

/ * * *@author mxsm
 * @date 2022/4/9 21:17
 * @Since1.0.0 * /
public class SnowflakeGenerator {

    private static final long FIXED_TIMESTAMP = 1649491204306L;

    private int machineId;

    private int sequenceNumber = 0;

    // The last time the ID was generated
    private volatile long lastTimestamp = -1L;


    public SnowflakeGenerator(int machineId) {
        this.machineId = machineId;
    }

    public synchronized long nextId(a) {

        // Get the current time
        long currentTimestamp = System.currentTimeMillis();

        // Generate ids in the same millisecond
        if(currentTimestamp == lastTimestamp){
            sequenceNumber += 1;
            // Process more than 4096 items per second
            if(sequenceNumber > 4096) {while (currentTimestamp <= lastTimestamp){
                    currentTimestamp = System.currentTimeMillis();
                }
                sequenceNumber = 0; }}else {
            // Reset the serial number
            sequenceNumber = 0;
        }
        lastTimestamp = currentTimestamp;

        return ( (currentTimestamp - FIXED_TIMESTAMP) << 22) | (machineId << 12) | sequenceNumber; }}Copy the code

Tips: Code address github.com/mxsm/distri…

The above code is a simple implementation.

3. The advantages and disadvantages

Advantages:

  • There is no coordination between ID generation services and services. Work alone.
  • Local generation in ID generation service is not as efficient as network consumption, high performance and high availability: the generation does not depend on the database, completely in memory generation
  • High throughput: Generate millions of autoincrement ids per second

Disadvantages:

  • Depending on the consistency with the system time, if the system time is called back or changed, ID conflicts or duplicate may occur.

Tips: Instead of deploying the Snowflake algorithm on a separate production server at my company, generators are integrated directly into the code for each project. The machine ID is the value of the IP address mod 32. Therefore, there may be a small probability of repetition under high concurrency. Within permissible limits

4. To summarize

The service cluster of Snowflake algorithm has no coordination and synchronization among services. It can be said to be a high availability distributed ID available cluster composed of single machines. The Snowflake algorithm relies on time stamps. If the timestamp is dialed back, the ID may be duplicated. In general, snowflake algorithm is better than Redis to achieve UUID and MySQL.

I am ant back elephant, the article is helpful to you like to pay attention to me, the article has incorrect place please give correct comments ~ thank you