Preface

Interviewer: Young man, do you remember me? I’m the interviewer who interviewed you last time.

I thought to myself: good grief, how could I not remember? I'm not senile yet. Last time I drew so many diagrams and hammered at the computer for over an hour; my head is still full of you.

Me: Yes, I do. Hello, I'm very glad to continue discussing technical issues with you in this second interview.

Was it really okay to say that against my conscience? Well, just this once; a face-to-face interview is hard to come by.

The interviewer took a quick look at my resume. It had probably been more than a week since he last read it; I guess he had forgotten it.

Interviewer: I see on your resume that you have a deep understanding of distributed systems and have worked on distributed projects. That's good. Do you know some methods for generating distributed IDs in distributed projects?

Me: I do. The main methods for generating distributed IDs are as follows:

  1. Database auto-increment ID.
  2. Database horizontal split, with different initial values and the same step size.
  3. Apply for auto-increment IDs in batches.
  4. UUID generation.
  5. The Redis way.
  6. Snowflake algorithm.
  7. Baidu UidGenerator algorithm.
  8. Meituan Leaf algorithm.

Interviewer: Oh, that’s great. Can you give me your analysis and understanding of each of these ways?

I thought to myself: uh oh, this could get embarrassing. That's a lot of schemes, and I only really know the main ones. How could I have an in-depth understanding of every single one? Did rattling off so many just dig a hole for myself?

Ah well, every debt comes due eventually. Should I just admit I only know some of them? For the ones I don't know well, I'll skim roughly over them.

Database auto-increment ID

Me: Uh-huh, ok. When creating a table, you can mark the primary key as auto_increment.

Me: But although using the database auto-increment ID is simple, it brings problems: once there is more than one database instance, IDs can be duplicated; the increment only works on a single node; and every generated ID costs one database access, so the pressure on the DB is high and there is essentially no concurrency performance.

Interviewer: Hmm.

I saw that the interviewer was listening to me, occasionally touching his sparse hair and forehead. His deep eyes revealed a calmness that might be the charm of a mature architect: Effective Java, Java Concurrency in Practice, Refactoring: Improving the Design of Existing Code…… I struck while the iron was hot and continued with the following answer.

Database horizontal split, with different initial values and the same step size

Me: In view of the problems with database auto-increment IDs above (duplication and poor performance), here comes a clustered distributed ID scheme: split the database horizontally, set different initial values with the same step size, and apply for auto-increment IDs in batches.

Me: "Database horizontal split with different initial values and the same step size" means that in a DB cluster environment, the database is divided horizontally, and each database is then given a "different initial value" and the "same step size", so that ID duplication is avoided.

Interviewer: Sorry to interrupt, young man, but can you draw a picture? I don’t understand what you mean.

What could I do? There was no way around it. I could only take pen and paper out of my trouser pocket and quickly draw a picture.

Me: Let's assume there are three databases, and set an initial value for each. The initial value and step size can be set with the following SQL:

set @@auto_increment_offset = 1;    -- set the initial value (1, 2, 3 on the three databases)
set @@auto_increment_increment = 3; -- set the step size

Me: The initial values of the three databases are set to 1, 2 and 3 respectively. The step size is generally set to the number of databases; here there are 3 databases, so the step size is also 3.
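To make the idea concrete, here is a minimal, self-contained sketch (class and method names are mine, not from any framework) that simulates three databases with initial values 1, 2, 3 and a shared step of 3, and checks that their ID streams never collide:

```java
import java.util.HashSet;
import java.util.Set;

public class SteppedIdDemo {
    // Each "database" advances its own counter by the shared step size.
    static long nextId(long current, long step) {
        return current + step;
    }

    public static void main(String[] args) {
        long step = 3;              // step size = number of databases
        long[] current = {1, 2, 3}; // a different initial value per database
        Set<Long> seen = new HashSet<>();
        for (int round = 0; round < 1000; round++) {
            for (int db = 0; db < current.length; db++) {
                if (!seen.add(current[db])) {
                    throw new AssertionError("duplicate ID: " + current[db]);
                }
                current[db] = nextId(current[db], step);
            }
        }
        System.out.println("3000 IDs generated, no duplicates");
    }
}
```

Database i only ever produces values congruent to its initial value modulo the step size, which is why the streams can never overlap.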

Interviewer: What if you have to expand again?

Me: Yes, well, expansion is a disadvantage of this approach. Setting the step size to the number of databases, as I described, assumes no later expansion. If you are sure you will expand later, you can set a larger step size at design time and keep some initial values "reserved" for the new databases.

Me: In short, this scheme has its own pros and cons. The disadvantage is that "you may later face the dilemma of having no initial values left to assign, and since it is still a database underneath, its ability to withstand high concurrency is limited".

Me: Its advantage is that it can be regarded as solving the "single point of DB failure" problem.

Interviewer: Hmm.

Apply for auto-increment IDs in batches

Me: The "apply for auto-increment IDs in batches" scheme solves the problem of running out of IDs to assign. The principle is to hand each database a batch of ID values to consume; when the batch is used up, it comes back to apply for another.

This time I consciously took pen and paper out of my trouser pocket and drew the following picture. History is always so strikingly similar.

Me: At the design stage, we can create a table with an "initial value" field and a "step size" field. Every time we want to apply for a batch of IDs, we apply through this table; after each application, "initial value = last initial value + step size".

Me: This way, the initial value always records the maximum ID handed out so far, which avoids duplicate IDs, and each application yields a whole batch of IDs to use, so the number of database accesses is greatly reduced.

Me: But this scheme still has its own shortcoming: it still cannot withstand truly high concurrency.
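The batch-application idea can be sketched as follows. This is a single-process simulation with names of my own invention; fetchSegment() stands in for the database transaction that performs "initial value = last initial value + step size":

```java
public class SegmentAllocator {
    private long maxId;       // the "initial value" field: start of the next unissued batch
    private final long step;  // the "step size" field: how many IDs one application grants
    private long cursor;      // next ID to hand out from the current segment
    private long segmentEnd;  // exclusive end of the current segment

    public SegmentAllocator(long initialValue, long step) {
        this.maxId = initialValue;
        this.step = step;
    }

    // In a real system this would be one DB transaction, e.g.:
    //   UPDATE id_segment SET max_id = max_id + step WHERE biz_tag = ?
    private void fetchSegment() {
        cursor = maxId;
        maxId += step;        // initial value = last initial value + step size
        segmentEnd = maxId;
    }

    public synchronized long nextId() {
        if (cursor >= segmentEnd) {
            fetchSegment();   // batch used up: go back and apply for another
        }
        return cursor++;
    }
}
```

With an initial value of 1 and a step of 1000, the database is touched once per thousand IDs instead of once per ID.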

UUID generation

Me: The fourth way is "UUID generation". The core idea of a UUID is to combine the "machine's network card, local time, and a random number" to generate a distributed ID.

Me: With UUIDs you just call UUID.randomUUID().toString() to generate one. This way is convenient and simple: generation is local, with no network overhead.

Me: But simple things bring their own problems. UUIDs are not storage-friendly: 16 bytes, 128 bits, usually rendered as a 36-character string, which is unsuitable for many scenarios.

Me: Moreover, the unordered string a UUID produces is inefficient to query, carries no business meaning, and has no auto-increment property, so UUIDs are generally not used as distributed IDs.
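A quick sketch of the properties just described (the wrapper class is mine, for illustration only):

```java
import java.util.UUID;

public class UuidDemo {
    static String newId() {
        // Generates a version-4 (random) UUID as a 36-character string
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String id = newId();
        // 128 bits rendered as 32 hex digits plus 4 hyphens: 36 characters
        System.out.println(id + " (length " + id.length() + ")");
    }
}
```

Note that Java's UUID.randomUUID() produces a version-4 (random) UUID; the timestamp-plus-MAC-address variant is version 1 and is not what this method returns.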

Interviewer: Well, do you know the ways a UUID can be generated? It's okay if you don't; this is just an extension.

Me: One can be generated from the "current timestamp and the machine's MAC address", which ensures the resulting UUID is globally unique.

Interviewer: Yeah, that’s ok.

The Redis way

Me: To solve the problem that the purely relational-database schemes above cannot withstand high concurrency, we can generate distributed IDs with Redis.

Me: Redis itself provides commands such as INCR and INCRBY whose atomicity is guaranteed, and the generated IDs are ordered.

Me: Redis operates in memory, so performance is high; it does not depend on the database; and the data is naturally ordered, which helps with paging and sorting.

Me: But this solution also has its own disadvantages: introducing middleware means you have to implement the code yourself, which increases both workload and complexity.

Me: If you use Redis, you need to consider persistence. There are two kinds: "RDB" and "AOF". RDB is snapshot-based, so data written since the last snapshot can be lost.

Me: "AOF can be configured to persist once per second, so lost data stays within a second"; IDs incremented in the second after the last sync may not have been persisted.
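For reference, the once-per-second AOF persistence I described corresponds to directives like these in redis.conf (the trade-off comments are mine):

```conf
appendonly yes        # enable AOF persistence
appendfsync everysec  # fsync once per second: at most about one second of increments can be lost
# appendfsync always  # fsync on every write: safest for IDs, but much slower
```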

Me: Still, this approach is much better than the relational-database methods of generating distributed IDs above.

Me: If the data volume is large, restarting Redis takes a long time, so we can use Redis cluster mode.

Interviewer: Can you hand-write the Redis utility class that generates distributed ids?

I broke down inside. Handwriting code is what I fear most. Utility classes like this are basically written once at the start of a project and reused forever after; nobody memorizes them. Asking for them handwritten was really asking too much of me.

Me: Handwriting probably won't work, because I can't remember some of the APIs. Utility classes are mostly written at the start of a project; I haven't looked at them since and never set out to memorize them.

Me: May I use your computer? I should be able to type out these utility classes on a computer.

Interviewer: Sure, here is a computer for you. Consider it part of the test.

Me: Ok, thanks.

As time goes by……..

After a few minutes of typing, and with a lot of effort, I finally got it out. I couldn't remember many of the APIs and had to look them up slowly. I wrote the two main ways of generating distributed IDs.

The first uses the RedisAtomicLong atomic class, generating the ID via CAS operations.

@Service
public class RedisSequenceFactory {
    @Autowired
    RedisTemplate<String, String> redisTemplate;

    public void setSeq(String key, int value, Date expireTime) {
        RedisAtomicLong counter = new RedisAtomicLong(key, redisTemplate.getConnectionFactory());
        counter.set(value);
        counter.expireAt(expireTime);
    }

    public void setSeq(String key, int value, long timeout, TimeUnit unit) {
        RedisAtomicLong counter = new RedisAtomicLong(key, redisTemplate.getConnectionFactory());
        counter.set(value);
        counter.expire(timeout, unit);
    }

    public long generate(String key) {
        RedisAtomicLong counter = new RedisAtomicLong(key, redisTemplate.getConnectionFactory());
        return counter.incrementAndGet();
    }

    public long incr(String key, Date expireTime) {
        RedisAtomicLong counter = new RedisAtomicLong(key, redisTemplate.getConnectionFactory());
        counter.expireAt(expireTime);
        return counter.incrementAndGet();
    }

    public long incr(String key, int increment) {
        RedisAtomicLong counter = new RedisAtomicLong(key, redisTemplate.getConnectionFactory());
        return counter.addAndGet(increment);
    }

    public long incr(String key, int increment, Date expireTime) {
        RedisAtomicLong counter = new RedisAtomicLong(key, redisTemplate.getConnectionFactory());
        counter.expireAt(expireTime);
        return counter.addAndGet(increment);
    }
}

The second uses redisTemplate.opsForHash().increment() with a UUID-based fallback to generate the ID.

public Long getSeq(String key, String hashKey, Long delta) throws BusinessException {
    try {
        if (null == delta) {
            delta = 1L;
        }
        return redisTemplate.opsForHash().increment(key, hashKey, delta);
    } catch (Exception e) {
        // Fall back to a UUID-derived number if Redis is down
        int first = new Random(10).nextInt(8) + 1;
        int randNo = UUID.randomUUID().toString().hashCode();
        if (randNo < 0) {
            randNo = -randNo;
        }
        return Long.valueOf(first + String.format("%016d", randNo));
    }
}

I moved the computer back to the interviewer, who quickly scanned my code and said something.

Interviewer: Young man, not writing comments is a bad habit.

Me: Oh, thanks for the reminder. Sorry, I'll pay attention next time.

Snowflake algorithm

Me: The sixth way is the "snowflake algorithm", which is also a popular method for generating distributed IDs.

By then I knew another drawing was indispensable, so I drew on the table. The interviewer watched me curiously, understood what I was doing, and waited patiently.

Me: It uses a 64-bit long as the ID type and divides the 64 bits into several segments, as shown below.

I handed him my drawing, then explained it to him.

Me: The first bit is the sign bit. In Java, the long type is signed, and IDs need to be positive, so the first bit is always 0.

Me: The next 41 bits are the timestamp, in milliseconds. Note that this is not the current timestamp itself but a difference ("current time - start time").

Me: The start time here generally refers to the start time of the ID generator, which is specified by our program.

Me: The next 10 bits consist of a 5-bit datacenterId and a 5-bit workerId, which together can identify up to 1024 nodes (1 << 10 = 1024).

Me: The last 12 bits are the sequence number; a 12-bit counter supports 4096 IDs per node per millisecond (1 << 12 = 4096).
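The bit budget above can be sanity-checked with a couple of shifts (the class is my own illustration, not part of any library):

```java
public class SnowflakeLayout {
    // 1 sign bit + 41 timestamp bits + 5 datacenter bits + 5 worker bits + 12 sequence bits = 64
    static final long MAX_NODES = 1L << (5 + 5);   // datacenterId + workerId
    static final long MAX_SEQ_PER_MS = 1L << 12;   // sequence numbers per node per millisecond

    public static void main(String[] args) {
        System.out.println(MAX_NODES + " nodes, " + MAX_SEQ_PER_MS + " IDs per node per millisecond");
        // prints: 1024 nodes, 4096 IDs per node per millisecond
    }
}
```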

Me: The snowflake algorithm uses the data center ID and machine ID to tell nodes apart, so it produces no duplicate IDs. Generation is local, consumes no network, and is highly efficient; according to published figures, it can generate about 260,000 IDs per second.

Me: But the snowflake algorithm also has its own disadvantage: its calculation depends on time, so if the system clock is rolled back, duplicate IDs can be produced.

Interviewer: Do you have a good solution for the duplicate IDs produced by clock rollback?

Me: In the snowflake implementation, if the current time is earlier than the timestamp of the last generated ID, an exception is thrown. You can also turn off the clock synchronization that causes rollback.

Me: If the rollback window is short, you can also wait until it passes and then generate the ID.

Interviewer: Can you type out a snowflake algorithm for me? Here, take the keyboard.

Me:…

Me: Ok.

As time goes by……

After a few minutes I finally got the snowflake algorithm typed out too. I really wanted to die. Why is one interview this hard?

/**
 * Snowflake algorithm
 * @author Lidu
 */
public class SnowflakeIdWorker {
    /** Start epoch */
    private final long twepoch = 1530051700000L;
    /** Number of bits for the machine ID */
    private final long workerIdBits = 5L;
    /** Number of bits for the data center ID */
    private final long datacenterIdBits = 5L;
    /** Max machine ID: 31 */
    private final long maxWorkerId = -1L ^ (-1L << workerIdBits);
    /** Max data center ID: 31 */
    private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);
    /** Number of bits for the sequence */
    private final long sequenceBits = 12L;
    /** Machine ID is shifted left by 12 bits */
    private final long workerIdShift = sequenceBits;
    /** Data center ID is shifted left by 17 bits */
    private final long datacenterIdShift = sequenceBits + workerIdBits;
    /** Timestamp is shifted left by 22 bits */
    private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
    /** Sequence mask */
    private final long sequenceMask = -1L ^ (-1L << sequenceBits);
    /** Machine ID (0~31) */
    private long workerId;
    /** Data center ID (0~31) */
    private long datacenterId;
    /** Sequence within the millisecond (0~4095) */
    private long sequence = 0L;
    /** Timestamp of the last generated ID */
    private long lastTimestamp = -1L;

    /**
     * Constructor
     * @param workerId     worker ID (0~31)
     * @param datacenterId data center ID (0~31)
     */
    public SnowflakeIdWorker(long workerId, long datacenterId) {
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        this.workerId = workerId;
        this.datacenterId = datacenterId;
    }

    /**
     * Get the next ID (this method is thread safe)
     * @return snowflake ID
     */
    public synchronized long nextId() {
        long timestamp = getCurrentTime();
        // If the current time is less than the last generated timestamp, the system clock has been rolled back
        if (timestamp < lastTimestamp) {
            throw new RuntimeException("Clock moved backwards by " + (lastTimestamp - timestamp) + " ms");
        }
        // Same millisecond: advance the in-millisecond sequence
        if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & sequenceMask;
            // Sequence overflow within the millisecond
            if (sequence == 0) {
                // Block until a new millisecond arrives
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            // Timestamp changed: reset the in-millisecond sequence
            sequence = 0L;
        }
        // Record the timestamp of this ID
        lastTimestamp = timestamp;
        // Shift each part into place and OR them together into a 64-bit ID
        return ((timestamp - twepoch) << timestampLeftShift) // timestamp
                | (datacenterId << datacenterIdShift)        // data center ID
                | (workerId << workerIdShift)                // machine ID
                | sequence;                                  // sequence number
    }

    /**
     * Block until a new timestamp is obtained
     * @param lastTimestamp timestamp of the last generated ID
     * @return current timestamp
     */
    protected long tilNextMillis(long lastTimestamp) {
        long timestamp = getCurrentTime();
        // Spin until the current time passes the last timestamp (i.e. the clock catches up)
        while (timestamp <= lastTimestamp) {
            timestamp = getCurrentTime();
        }
        return timestamp;
    }

    /**
     * Current time in milliseconds
     */
    protected long getCurrentTime() {
        return System.currentTimeMillis();
    }
}

To make a good impression on the interviewer, I wrote comments this time so he couldn't needle me about it again. When I finished typing, I slid the computer back to him. He looked it over quickly, nodded and smiled.

Interviewer: Well, your foundation is solid. You must have prepared a lot before the interview and read plenty of interview material.

I thought: what do you mean, prepared for the interview? I've been preparing all along. Ever since I started working I've been summarizing knowledge points and building my own knowledge system. But to humor him, I could only say yes.

Me: Hmm, yes, I prepared for quite a long time; fairly thoroughly.

Interviewer: Well, the last two algorithms, do you know them in depth?

Leaf and UidGenerator

Me: I honestly haven't studied the last two in depth. From what I've read online, Meituan's Leaf algorithm needs to rely on a database and ZooKeeper, and it can guarantee globally unique, monotonically increasing IDs.

Me: Baidu's UidGenerator algorithm is implemented on the basis of the snowflake algorithm and also relies on a database. Unlike standard snowflake, "UidGenerator supports customizing the number of bits for the timestamp, data center ID, machine ID, and sequence number".

Interviewer: Uh-huh, ok. That's all for today's interview, young man. See you next time.

I walked out feeling rather pleased with myself……