preface

Before I put forward a question in the public number, how to ensure the uniqueness of the distributed system ID, today I will give you the answer, if there is any supplement or question can go to my public number [6 Xi Xuan] message, see it will reply as soon as possible.

System unique ID is a problem we often encounter when designing a system, and we often struggle with this problem. There are many ways to generate ids, adapting to different scenarios, requirements, and performance requirements. So some of the more complex systems have multiple policies for ID generation. Here are some common ID generation strategies.

The body of the

A database grows its own sequence or field

The most common way. Use the database, the whole database is unique.

Advantages:

1) Simple, convenient code, acceptable performance; 2) Numeric ID natural sorting, for paging or need to sort the result is very helpful.

Disadvantages:

1) Different database syntax and implementation is different, database migration or multiple database versions support when need to deal with; 2) Only one master library can be generated in the case of a single database or read/write separation or one master with many slaves. Risk of single point of failure; 3) It is difficult to extend when the performance is not up to the requirements; 4) If you encounter multiple systems that need to merge or involve data migration, it will be quite painful; 5) There will be trouble when dividing tables and libraries.

Optimization scheme:

1) For the Master library single point, if there are multiple Master libraries, then each Master library set the starting number is different, the step size is the same, can be the number of masters. For example: Master1 generates 1,4,7,10; Master2 generates 2,5,8,11; Master3 generates 3,6,9,12. This effectively generates unique ids in the cluster and greatly reduces the load of ID generation to the database.

UUID

Common way. It can be either database or procedural generation, generally globally unique.

Advantages:

1) Simple, convenient code. 2) The performance of ID generation is very good, and there is basically no performance problem. 3) Unique in the world, it can handle data migration, system data merge, or database change.

Disadvantages:

1) Without sorting, trend increasing cannot be guaranteed; 2) UUID is usually stored in a string, so the query efficiency is low; 3) The storage space is relatively large. If it is a massive database, the storage capacity needs to be considered. 4) Large amount of data is transmitted; 5) Unreadable.

UUID variant,

1) To resolve unreadable UUID, use UUID to Int64.

/// <summary>
// get a unique sequence of numbers according to the GUID
/// </summary>
public static long GuidToInt64(a)
{
    byte[] bytes = Guid.NewGuid().ToByteArray();
    return BitConverter.ToInt64(bytes, 0);
}
Copy the code

2) To solve the problem of unordered UUID, NHibernate provides Comb algorithm (Combined GUID/TIMESTAMP) in its primary key generation method. Keep 10 bytes of the GUID and use another 6 bytes to represent the time (DateTime) at which the GUID was generated.

/// <summary> 
/// Generate a new <see cref="Guid"/> using the comb algorithm. 
/// </summary> 
private Guid GenerateComb(a)
{
    byte[] guidArray = Guid.NewGuid().ToByteArray();
 
    DateTime baseDate = new DateTime(1900.1.1);
    DateTime now = DateTime.Now;
 
    // Get the days and milliseconds which will be used to build    
    //the byte string    
    TimeSpan days = new TimeSpan(now.Ticks - baseDate.Ticks);
    TimeSpan msecs = now.TimeOfDay;
 
    // Convert to a byte array        
    // Note that SQL Server is accurate to 1/300th of a    
    // millisecond so we split by 3.333333
    byte[] daysArray = BitConverter.GetBytes(days.Days);
    byte[] msecsArray = BitConverter.GetBytes((long)
      (msecs.TotalMilliseconds / 3.333333));
 
    // Reverse the bytes to match SQL Servers ordering    
    Array.Reverse(daysArray);
    Array.Reverse(msecsArray);
 
    // Copy the bytes into the guid    
    Array.Copy(daysArray, daysArray.Length - 2, guidArray,
      guidArray.Length - 6.2);
    Array.Copy(msecsArray, msecsArray.Length - 4, guidArray,
      guidArray.Length - 4.4);
 
    return new Guid(guidArray);
}
Copy the code

For comparison, the first three strings are the result of COMB’s algorithm, the last 12 strings are in chronological order (3 UUID generated uniformly in milliseconds), and if generated again at a later time, the 12 strings will be larger than shown. The next three are directly generated GUids.

If you want to put the chronological order first, you can change the position of the 12 strings after generation, or you can change the last two array.copy of the algorithm class.

Redis generated ID

When using a database to generate ids is not performance enough, we can try using Redis to generate ids. This depends on Redis being single-threaded, so it is possible to generate globally unique ids. This can be done using Redis’s atomic operations INCR and INCRBY.

Redis clustering can be used for higher throughput. Suppose there are five Redis in a cluster. You can initialize each Redis with a value of 1, 2, 3, 4, 5, and then step size of 5. The ID generated by each Redis is:

A: 1,6,11,16,21

B: 2,7,12,17,22

C: 3,8,13,18,23

D: 4,9,14,19,24

E: 5,10,15,20,25

This, arbitrary load to which machine is determined, difficult to modify in the future. However, 3-5 servers can basically meet the requirements, and can obtain different ids. However, step sizes and initial values must be required in advance, and Redis clustering can also prevent single points of failure.

In addition, it is good to use Redis to generate daily serial numbers starting from 0. For example, order number = date + daily growth number. You can generate a Key every day in Redis and use INCR to accumulate.

Advantages:

1) Independent of database, flexible and convenient, and better performance than database. 2) Numeric ID natural sorting, for paging or need to sort the result is very helpful.

Disadvantages:

1) If there is no Redis in the system, new components need to be introduced to increase the complexity of the system. 2) The workload of coding and configuration is relatively large.

Twitter’s Snowflake algorithm

Snowflake is Twitter’s open source distributed ID generation algorithm that results in a long ID. The idea is to use 41bits as the number of milliseconds, 10 bits as the machine ID (5 bits for the data center, 5 bits for the machine ID), 12 bits as the serial number within milliseconds (meaning each node can generate 4096 ids per millisecond), and finally a symbol bit, always 0. Specific implementation code can refer to https://github.com/twitter/snowflake.

The C# code is as follows:

/// <summary> /// From: https://github.com/twitter/snowflake /// An object that generates IDs. /// This is broken into a separate class in case /// we ever want to support multiple worker threads /// per process /// </summary> public class IdWorker { private long workerId; private long datacenterId; private long sequence = 0L; private static long twepoch = 1288834974657L; private static long workerIdBits = 5L; private static long datacenterIdBits = 5L; private static long maxWorkerId = -1L ^ (-1L << (int)workerIdBits); private static long maxDatacenterId = -1L ^ (-1L << (int)datacenterIdBits); private static long sequenceBits = 12L; private long workerIdShift = sequenceBits; private long datacenterIdShift = sequenceBits + workerIdBits; private long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits; private long sequenceMask = -1L ^ (-1L << (int)sequenceBits); private long lastTimestamp = -1L; private static object syncRoot = new object(); public IdWorker(long workerId, long datacenterId) { // sanity check for workerId if (workerId > maxWorkerId || workerId < 0) { throw new ArgumentException(string.Format("worker Id can't be greater than %d or less than 0", maxWorkerId)); } if (datacenterId > maxDatacenterId || datacenterId < 0) { throw new ArgumentException(string.Format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId)); } this.workerId = workerId; this.datacenterId = datacenterId; } public long nextId() { lock (syncRoot) { long timestamp = timeGen(); if (timestamp < lastTimestamp) { throw new ApplicationException(string.Format("Clock moved backwards. Refusing to generate id for %d milliseconds", lastTimestamp - timestamp)); } if (lastTimestamp == timestamp) { sequence = (sequence + 1) & sequenceMask; if (sequence == 0) { timestamp = tilNextMillis(lastTimestamp); } } else { sequence = 0L; } lastTimestamp = timestamp; return ((timestamp - twepoch) << (int)timestampLeftShift) | (datacenterId << (int)datacenterIdShift) | (workerId << (int)workerIdShift) | sequence; } } protected long tilNextMillis(long lastTimestamp) { long timestamp = timeGen(); while (timestamp <= lastTimestamp) { timestamp = timeGen(); } return timestamp; } protected long timeGen() { return (long)(DateTime.UtcNow - new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc)).TotalMilliseconds; }}Copy the code

The test code is as follows:

private static void TestIdWorker()
        {
            HashSet<long> set = new HashSet<long>();
            IdWorker idWorker1 = new IdWorker(0, 0);
            IdWorker idWorker2 = new IdWorker(1, 0);
            Thread t1 = new Thread(() => DoTestIdWoker(idWorker1, set));
            Thread t2 = new Thread(() => DoTestIdWoker(idWorker2, set));
            t1.IsBackground = true;
            t2.IsBackground = true;

            t1.Start();
            t2.Start();
            try
            {
                Thread.Sleep(30000);
                t1.Abort();
                t2.Abort();
            }
            catch (Exception e)
            {
            }

            Console.WriteLine("done");
        }

        private static void DoTestIdWoker(IdWorker idWorker, HashSet<long> set)
        {
            while (true)
            {
                long id = idWorker.nextId();
                if (!set.Add(id))
                {
                    Console.WriteLine("duplicate:" + id);
                }

                Thread.Sleep(1);
            }
        }
Copy the code

The Snowflake algorithm can be modified according to the needs of its own project. Such as estimating the number of data centers in the future, the number of machines per data center, and the number of concurrent applications in a single millisecond to adjust the number of bits needed in the algorithm.

Advantages:

1) Independent of database, flexible and convenient, and better performance than database. 2) THE ID is incremented by time on a single machine.

Disadvantages:

1) It is incrementing on a single machine, but due to the distributed environment, it is impossible for clocks on each machine to be completely synchronized, and sometimes it may not be globally incrementing.

Generate unique ids using ZooKeeper

Zookeeper mainly generates serial numbers based on its ZNode data version. It can generate 32-bit or 64-bit data version numbers. The client can use this version number as the unique serial number. Zookeeper is rarely used to generate unique ids. This is mainly due to the need to rely on ZooKeeper and use multi-step APIS. In the case of high competition, you need to consider using distributed locks. Therefore, performance is not ideal in a distributed environment with high concurrency.

MongoDB的ObjectId

MongoDB’s ObjectId algorithm is similar to snowflake’s. It is designed to be lightweight and can be easily generated by different machines in the same way that is globally unique. MongoDB was designed from the start as a distributed database, and handling multiple nodes was a core requirement. Make it much easier to generate in a sharding environment.

The format is as follows:

The first four bytes are the timestamp from the standard era in seconds. The timestamp, combined with the subsequent five bytes, provides second-level uniqueness. Because the timestamp comes first, this means the ObjectId will be arranged roughly in the order in which they are inserted. This is useful for some purposes, such as using it as an index to improve efficiency. These four bytes also imply when the document was created. Most client libraries will expose a method to get this information from ObjectId. The next three bytes are the unique identifier of the host. Usually the hash value of the machine hostname. This ensures that different hosts generate different objectiDs without conflicts. To ensure that the ObjectId is unique from multiple concurrent processes on the same machine, the next two bytes come from the process identifier (PID) that generated the ObjectId. The first nine bytes ensure that the ObjectId generated by different processes on different machines in the same second is unique. The last three bytes are an automatically incrementing counter that ensures that the ObjectId generated by the same process will be different in one second. Each process can have a maximum of 2563 (16 777 216) different objectiDs in one second.

The source code can be downloaded from the official MongoDB website.

By the way

There is a problem? Can you leave me a message or chat privately? Just give it a thumbs up

Of course, you can also go to my official account “6 Xi Xuan”,

Reply to “Learn” and receive a copy of the Video tutorial for Advanced Architects for Java Engineers

Answer “interview”, can obtain:

MySQL brain Map MySQL brain map

The sunrise hin I am trained programmers, PHP, Android and hardware are done, but in the end or choose to focus on Java, so have what questions to ask the public for discussion (emotional pouring technology can ha ha ha), see words will reply as soon as possible, hope can with everyone common learning progress, about the server architecture, Java core knowledge analysis, career, interview summary and other articles will be pushed irregularly output, welcome to pay attention to ~~~