Introduction to the

In the last article we talked about short url services (portal: How to design a short url service) ❤ ️ involves a lot of points, which is the core of the global id, so today we will have a chat to global change id how to design to efficient and safe ❤ ️ as well as the advantages and disadvantages of each method, problems and solutions ❤ ️ patience to finish with the interviewer can blow dry, full water to the line Are you ready 🐒

The body of the

Why global ID

meaning

A global ID is a string or number that will never be repeated by a business. Use mysql increment or Java i++. Of course!! Using mysql for increased global id premise is no depots table using Java i++ must ensure that your service will not restart and is a standalone service (no 🤔)

implementation

We have to have something that guarantees uniqueness and order even in distributed, separate libraries and tables. Right? Aye? Is that bad luck? What’s the first thing that comes to mind? Redis bai, also can how ~ ~ ~, NONO, or there are a lot of implementation methods, the following we discuss a few commonly used implementation methods on the market

UUID

Go straight to code

System.out.println(UUID.randomUUID());
Copy the code

The result: 9796A02E-ec79-47AB-bd10-d5a2D27CE650 Is simple enough to generate a number + English mystery code, but such a string for business, what significance? It doesn’t make any sense. You don’t usually choose to use it as a distributed global ID. What is a UUID

redis

We can use the atomic operation of Redis to implement global ID INCR/Incrby redis performance is very high, if there is a performance bottleneck on a single machine, we can use Redis cluster (coDIS is recommended), multiple clusters can use step (incrby cluster number) to avoid id duplication, like we have 3 clusters

The cluster Id sequence
A One, four, seven, 10, 13
B Two, five, eight, 11, 14
C Three, six, nine, 12, 15

Interviewer: What if Redis goes down or restarts? Me: Yes, redis is based on memory, even if there is a persistent mechanism RDB, AOF, when Redis down or restart will inevitably lose data, god how to do so big bug ah? Why don’t you just give up? ❌ pretend to think 🤔 after…… Me: Because Redis is based on memory, even if there is a persistence mechanism RDB, AOF, when Redis down or restart will inevitably lose data? If the global ID key is lost, then the ID needs to start from scratch or generate duplicate IDS. In this case, we can use a separate Redis instance to make the global ID. AOF is used to enhance the use of redis, save every record of the redis operation, so that when restarting, data will not be lost. The disadvantage is that it takes too long to recover data. Interviewer: Ok, let’s continue talking ~ ~

mysql

To use mysql as a global ID, there are two ways: 🈶️

  1. On the id
  2. Id field

On the id

Use a single table, with the ID increment to achieve the effect of global ID

CREATE TABLE t_sequence_1 (
    id bigint(20) unsigned NOT NULL auto_increment, 
    value char(10) NOT NULL default '',
    PRIMARY KEY (id)
);
Copy the code

To generate an ID, write a piece of data into it and return the ID

insert into t_sequence_1(value)value("xxx");
Copy the code

This is simple, but deadly. Mysql itself is a performance bottleneck when the service QPS is high, resulting in a large amount of data in this table, which is therefore risky and not recommended

Id field

Use a single table that has only one record, add the ID field and return it when the ID is generated

CREATE TABLE `t_sequence` (
  `name` varchar(50) CHARACTER SET utf8mb4 NOT NULL DEFAULT '',
  `id` bigint(30) NOT NULL DEFAULT '0',
  PRIMARY KEY (`name`)
) 
Copy the code

Use the name field to differentiate services, such as the current business is tingURL, first give an initial value

insert t_sequence value ("tinyurl",1);
Copy the code

Check the ID field before adding the ID. Ensure the concurrency control ⭐️⭐️⭐ ⭐️⭐️

Pessimistic lock implementation:  step1. SELECT id from t_sequence where name = 'tinyurl' for update step2. UPDATE t_sequenceset id = id + #{num} where name = 'tinyurl' step3. Return the value of ID +num as the global ID for the service side to use. Num is the step size, indicating the id interval size. The value can be 1 or >1. Of course, you can also use optimistic lock to achieve, but the business needs to do spin operation, the gain is not worth the loss optimistic lock implementation:  step1. SELECT id from t_sequence where name = 'tinyurl' step2. UPDATE t_sequenceset id = id + #{num} where name = 'tinyurl' and id = #{id} (step1 query id), where num is the step size, indicating the id interval size, can be 1, or >1, use step3 for specific services. If the number of update entries is 1, id+num is returned. If the number of update entries is 0, the update conflict occurs and the process starts from step1Copy the code

Interviewer: You speak well and concurrency control is down, but have you considered the I/O overhead? And the maximum capacity? I (heart) : just wait for you to ask this question, I have to pretend to think about it before answering you this question, so as to appear that I am good enough, hee hee HEE I am really a small smart 🧐 pretend to think for a number of seconds: In view of the I/O overhead, we can achieve an id generator, constantly generate a thread id, guarantee uniqueness and order, and then the generated id into the queue (first in first out, ensure order), to obtain a work thread id take out directly from the head of this queue, when this is not just to remove the worker threads I/O process, direct reading memory, The speed is greatly increased. Of course, we can also set a rule, such as when the queue length is less than a certain threshold to regenerate the ID. The following is the id generator. This method is also suitable for redis id generation. It is also the strategy for generating global ID of short URL service

Private static final int STEP = 50; private static final int STEP = 50; // create a queue of fixed length to store the pre-generated id, Private static final ArrayBlockingQueue<Long> SEQUENCE_QUEUE = new ArrayBlockingQueue<>(step*5); @postconstruct public void init() {new Thread() -> {while (true) {try {// If the queue size is smaller than step/2 (SEQUENCE_QUEUE.size() < (STEP / 2)) { genId(); log.info("queueSsize(" + SEQUENCE_QUEUE.size() + ")"); } } catch (Exception e) { ThreadUtil.sleep(50); } ThreadUtil.sleep(50); } }).start(); } /** * generate id */ private void genId() {// obtain id from database int seqMax = sequencerepository.getid (STEP); for (long i = seqMax - STEP + 1; i <= seqMax; i++) { try { SEQUENCE_QUEUE.put(i); } catch (InterruptedException e) { log.error("tinyurl_error: SEQUENCE_QUEUE put fail", e); }} public int getId(int step) {sequencemapper.getid (); // updateId sequencemapper.updateid (step); // return id return id + step; } getId SQL: SELECT ID from T_sequence WHERE name = 'tinyURL' for update updateId SQL: UPDATE t_sequence set id = id + #{step} where name = 'tinyurl'Copy the code

Here’s how to get the ID

/** * Get an ID from the pregenerated ID queue header ** @return * @throws Exception */ public long newId() throws Exception {return SEQUENCE_QUEUE.take(); }Copy the code

In the case of “run out of ids”, the generated ID exceeds the storage size. For example, the maximum storage size of long is 9223372036854775807, which is the same as the maximum storage size of Java long. We can sigh that our business is so hot 😎 when the global ID reaches this upper limit, we should consider whether our global ID is not reasonable use? (for example, if you want to generate ids like this in a logging system, that is absolutely not possible), should you consider the horizontal separation of business? This is not a problem that can be solved at the system level (unless the ID is changed to decimal, which is costly for the business side as well), and it is time to consider business splitting. Interviewer: Nice guy. We came into the office to talk about everything

Mysql number segment mode

This mode is also widely used. The principle is to obtain ids in batches from the database. Each time, a number segment is read from the DB

CREATE TABLE t_step_id (id BIGINT(20) NOT NULL, type int(10) NOT NULL COMMENT 'business type ', Max_id bigint(20) NOT NULL COMMENT 'iD ', step int(20) NOT NULL COMMENT' id', Version int(20) NOT NULL COMMENT 'upgrade ', PRIMARY KEY (' id')) insert int t_step_id value(1,1,5000,5000,1); If the service provider needs to use ids, it first reads the 5000 ids from 1 to 5000 from the memory and extracts a minimum value, such as 996, and then removes 996 from the memory. If there is no value in the memory, it needs to apply for a number segment from mysql again. The SQL is as follows:  update t_step_id set max_id = #{max_id+step}, Version = version + 1 where version = # {version} and type = 1 where version = # {version} and type = 1 Because support multi-service, prevent concurrency only ~ ~Copy the code

This way to avoid multiple access to the database, reduce the NUMBER of I/O times, increase efficiency, but also has the disadvantage, once the number segment of the node to apply for a restart, then will lose this section of ID, causing the ID void discontinuous, but harmless 🤔.

SnowFlake (Twitter)

Go straight to the schematic

  • The first bit is the identification part. In Java, because the highest bit of long is the sign bit, the positive number is 0, and the negative number is 1. Generally, the generated ID is a positive number, so it is fixed to 0.
  • The timestamp part is 41bit, this is the millisecond level of time, the general implementation does not store the current timestamp, but the difference of the timestamp (current time – fixed start time), so that the generated ID starts from a smaller value; The 41-bit timestamp will last 69 years, (1L << 41)/(1000L * 60 * 60 * 24 * 365) = 69 years
  • The number of working machine ids is 10 bits. This parameter is flexible. For example, you can use the first five digits as the id of the data center equipment room and the last five digits as the id of the single machine room to deploy 1024 nodes.
  • The serial number is 12 bits, and 4096 ids can be generated for the same node in the same millisecond

The following code is copied, do not spray 🤣🤣🤣

package snowflake; Public class SnowflakeIdWorker {/** * Start time (2015-01-01), you can customize */ private final Long twepoch = 1420041600000L; Private final Long workerIdBits = 5L; /** * private final Long datacenterIdBits = 5L; */ Private final Long maxWorkerId = -1l ^ (-1l << workerIdBits); */ Private final Long maxWorkerId = -1l ^ (-1l << workerIdBits);  Private final Long maxDatacenterId = -1L ^ (-1L << datacenterIdBits); private Final Long maxDatacenterId = -1L ^ (-1L << datacenterIdBits); */ private final Long sequenceBits = 12L; */ private final Long workerIdShift = sequenceBits; */ Private final Long datacenterIdShift = sequenceBits + workerIdBits; */ Private final Long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits; Private final Long sequenceMask = -1l ^ (-1L << sequenceBits); private final Long sequenceMask = -1l ^ (-1l << sequenceBits); /** * private long workerId; /** * datacenterId (0~31) */ private long datacenterId; /** * private long sequence = 0L; /** * Private long lastTimestamp = -1l; /** * constructor ** @param workerId work ID (0 to 31) * @param datacenterId datacenterId (0 to 31) */ public SnowflakeIdWorker(long) workerId, long datacenterId) { if (workerId > maxWorkerId || workerId < 0) { throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId)); } if (datacenterId > maxDatacenterId || datacenterId < 0) { throw new IllegalArgumentException(String.format("datacenter  Id can't be greater than %d or less than 0", maxDatacenterId)); } this.workerId = workerId; this.datacenterId = datacenterId; } /** * obtain the nextId(this method is thread-safe) ** @return SnowflakeId */ public synchronized long nextId() {long timestamp = System.currentTimeMillis(); // If the current time is less than the last ID generated timestamp, If (timestamp < lastTimestamp) {throw new RuntimeException(string. format("Clock moved backwards. Refusing to generate id for %d milliseconds", lastTimestamp - timestamp)); If (lastTimestamp == timestamp) {sequence = (sequence + 1) &sequencemask; If (sequence == 0) {timestamp = tilNextMillis(lastTimestamp); Else {sequence = 0L; } // The last time the ID was generated lastTimestamp = timestamp; / / shift and through or operations up to 64 - bit ID return ((timestamp - twepoch) < < timestampLeftShift) / / | (datacenterId < < datacenterIdShift) / / | (workerId << workerIdShift) // | sequence; } /** * blocks until the next millisecond, * * @param lastTimestamp Last generated ID * @return current timestamp */ protected long tilNextMillis(long lastTimestamp) {long  timestamp = System.currentTimeMillis(); while (timestamp <= lastTimestamp) { timestamp = System.currentTimeMillis(); } return timestamp; } public static void main(String[] args) {SnowflakeIdWorker idWorker = new SnowflakeIdWorker(0, 0); for (int i = 0; i < 100; i++) { long id = idWorker.nextId(); System.out.println(id); }}}Copy the code

This algorithm is highly efficient, has no I/O overhead, and does not depend on any middleware, but has a fatal disadvantage, is dependent on the server time, if the server time rollback, may cause id duplication

conclusion

These are just a few common global ID solutions, but there are many open source solutions for example

  • TinyID
  • Uidgenerator (Baidu)
  • Leaf (Meituan)

We are interested in their own understanding, in fact, is the encapsulation of the above methods + optimization, all changes from its original article, completely based on personal understanding of the code out of the article, if there are incorrect places I hope you actively pointed out, become better ~ ~

⭐️⭐ ⭐ ⭐ ⭐️ ️ : How to design an efficient and accurate ES full and incremental service? I will discuss this problem based on my own experience. I hope THAT BLOG is a Java programmer who is trying to become stronger