Foreword

The design of the MyKit-Serial framework is based on Li Yanpeng's open-source Vesta framework, which it completely rewrites. Drawing on the idea of the SnowFlake algorithm, it has been comprehensively upgraded and optimized on that basis. It supports an embedded mode (Jar package), an RPC mode (Dubbo, Motan, SOFA, SpringCloud, SpringCloud Alibaba, and other mainstream RPC frameworks), and a RESTful API mode (SpringBoot and Netty), as well as both the maximum peak and the minimum granularity modes.

Open-source repositories:

GitHub: github.com/sunshinelyz…

Gitee: gitee.com/binghe001/m…

Why not use database auto-increment fields?

If a business system uses database auto-increment fields, ID generation depends entirely on the database, which makes database migration, capacity expansion, data cleaning, sharding (splitting databases and tables), and similar operations very troublesome.

When databases and tables are sharded, one approach is to keep IDs unique across databases by adjusting the step size of the auto-increment field or the database sequence. However, this is still a solution that depends heavily on the database, with many limitations, and it is tied to the database type. Adding a database instance or migrating the business to a different type of database would be quite cumbersome.
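To make the step-size approach concrete, here is a minimal in-memory simulation. It assumes N database instances where instance k starts at k and increments by N, similar in spirit to MySQL's auto_increment_offset and auto_increment_increment settings; the class name is hypothetical. It shows why the IDs stay unique and why the step is baked into every instance.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical simulation of "step size" auto-increment across N database instances.
// Instance k (1-based offset) hands out k, k + N, k + 2N, ...
class StepSizeSequence {
    private final long step; // number of instances; fixed when the cluster is set up
    private long next;       // next value this instance will hand out

    StepSizeSequence(long offset, long step) {
        if (step < 1 || offset < 1 || offset > step) {
            throw new IllegalArgumentException("offset must be in [1, step]");
        }
        this.step = step;
        this.next = offset;
    }

    long nextId() {
        long id = next;
        next += step;
        return id;
    }

    public static void main(String[] args) {
        int instances = 3;
        StepSizeSequence[] dbs = new StepSizeSequence[instances];
        for (int i = 0; i < instances; i++) {
            dbs[i] = new StepSizeSequence(i + 1, instances);
        }
        Set<Long> seen = new HashSet<>();
        for (int round = 0; round < 5; round++) {
            for (StepSizeSequence db : dbs) {
                if (!seen.add(db.nextId())) {
                    throw new IllegalStateException("duplicate ID");
                }
            }
        }
        // Prints 15: no collisions as long as every instance agrees on step = 3.
        // Adding a fourth instance means changing the step on all existing ones.
        System.out.println(seen.size());
    }
}
```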

Why not use UUID?

UUIDs can guarantee that IDs are unique, but they cannot provide many other properties that business systems need, such as rough time ordering, reversibility, and manufacturability. In addition, UUID generation uses full time data and performs relatively poorly, and UUIDs are long and take up a lot of space, which indirectly degrades database performance. More importantly, UUIDs are not ordered, so writes to a B+ tree index become largely random (consecutive IDs would produce at least partly sequential writes). Because new keys cannot simply be appended in order, each write becomes an insert that reads the whole B+ tree node into memory, inserts the record, and writes the whole node back to disk. When records are large, this degrades performance significantly. Therefore, UUIDs are not recommended.
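A quick check with the standard java.util.UUID class illustrates the size and ordering points: a UUID is 128 bits (36 characters in its canonical string form), twice the width of a 64-bit long, and randomly generated (version 4) UUIDs carry no time order. The class and method names below are illustrative only.

```java
import java.util.UUID;

// Compares random (version 4) UUIDs with a 64-bit long ID.
class UuidVsLong {
    // Canonical textual form: 32 hex digits plus 4 hyphens.
    static int uuidStringLength() {
        return UUID.randomUUID().toString().length();
    }

    // Counts how often a freshly generated UUID sorts before the previous one.
    // For time-ordered IDs this would be 0; for random UUIDs it is about n / 2.
    static int outOfOrderCount(int n) {
        UUID prev = UUID.randomUUID();
        int outOfOrder = 0;
        for (int i = 0; i < n; i++) {
            UUID cur = UUID.randomUUID();
            if (cur.toString().compareTo(prev.toString()) < 0) {
                outOfOrder++;
            }
            prev = cur;
        }
        return outOfOrder;
    }

    public static void main(String[] args) {
        System.out.println("UUID string length: " + uuidStringLength());
        System.out.println("long bits: " + Long.SIZE);
        System.out.println("out-of-order UUIDs in 1000: " + outOfOrderCount(1000));
    }
}
```

Every out-of-order value corresponds to an index write that lands somewhere in the middle of the B+ tree rather than at its right-hand edge.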

Questions to consider

Given the limitations of database auto-increment IDs and UUIDs, we need to work out how to design a distributed, globally unique serial number (distributed ID) service. The following factors need to be considered.

Globally unique

A pessimistic strategy for guaranteeing global uniqueness in a distributed system is to use locks or distributed locks, but locking significantly degrades performance.

Therefore, we can borrow from Twitter's SnowFlake algorithm: take advantage of the ordering of time and use an auto-incrementing sequence within each unit of time to achieve global uniqueness.

Rough ordering

The biggest problem with UUIDs is that they are unordered. Every business wants ordered IDs, but achieving complete ordering in a distributed system requires aggregating data and using locks or distributed locks. There are two main compromises, second-level ordering and millisecond-level ordering, each with its own trade-offs. We support both and let the service choose one of them through configuration.

Reversibility

Once generated, an ID can carry a lot of information. When troubleshooting online issues, the ID is usually the first thing we see. If we can tell from the ID when and where it was generated, such a reversible (decodable) ID helps a great deal.

If the ID contains a timestamp that can be decoded, the storage layer can also save the space taken by traditional timestamp fields, so this design kills two birds with one stone.

Manufacturability

No matter how highly available a system is, it can never be guaranteed to be free of problems. When something goes wrong, it is handled manually; when data is polluted, it is cleaned. But if a database auto-increment field is used, by the time the data is handled or cleaned by hand, its IDs may already have been taken by later business records. How, then, do we restore the system to the time window in which the problem occurred?

Therefore, the distributed global serial number (distributed ID) service we use must be replicable, recoverable, and manufacturable.

High performance

Whatever the business (orders, products, and so on), inserting a new record is a core function with very high performance requirements. ID generation depends on network I/O and CPU performance, and the CPU is generally not the bottleneck. From experience, a single machine should reach 10,000 TPS.

High availability

First, the distributed global serial number (distributed ID) service must be a peer-to-peer cluster: if one machine fails, requests must be forwarded to other machines, and a retry mechanism is also necessary. Finally, if the remote service goes down, we need a local fault-tolerance scheme, and a dependency on a local library serves as the last line of defense for high availability.

In other words, we support the RPC, embedded, and REST publishing modes. If one mode is unavailable, we can fall back to another, and if Zookeeper is unavailable, we can fall back to a locally pre-configured machine ID. This maximizes the availability of the service.
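The fallback chain described above can be sketched as trying each source of a machine ID in priority order. This is a minimal illustration, not mykit-serial's code: the class and method names are hypothetical, and the Zookeeper lookup is modelled as a plain LongSupplier that may throw.

```java
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch: resolve a machine ID by trying sources in priority order,
// e.g. a Zookeeper-assigned ID first, then a locally pre-configured ID.
class MachineIdResolver {
    static long resolve(List<LongSupplier> sources) {
        RuntimeException last = null;
        for (LongSupplier source : sources) {
            try {
                return source.getAsLong();
            } catch (RuntimeException e) {
                last = e; // remember the failure and fall through to the next source
            }
        }
        throw new IllegalStateException("no machine ID source available", last);
    }

    public static void main(String[] args) {
        LongSupplier zookeeper = () -> {
            throw new IllegalStateException("Zookeeper unavailable"); // simulated outage
        };
        LongSupplier preconfigured = () -> 7L; // hypothetical value from local configuration
        System.out.println(resolve(List.of(zookeeper, preconfigured))); // prints 7
    }
}
```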

Scalability

As a distributed system, one thing that can never be ignored is the continuous growth of the business. The absolute capacity a system can sustain is not the only measure of it, because the business keeps growing. The system design must therefore consider not only the absolute capacity it can sustain but also how fast the business grows. Whether the system can scale horizontally fast enough to keep up with business growth is another important measure.

Design and Implementation

Overall architecture design

The overall architecture of MyKit-Serial is shown below.

The modules of the MyKit-Serial framework are as follows:

  • mykit-bean: provides unified bean class encapsulation and the constants used throughout the framework.
  • mykit-common: encapsulates the common utility classes for the entire framework.
  • mykit-config: provides global configuration.
  • mykit-core: the core implementation module of the framework.
  • mykit-db: stores the database scripts.
  • mykit-interface: the core abstract interfaces of the framework.
  • mykit-service: core functionality implemented on Spring.
  • mykit-rpc: exposes the service over RPC (supporting Dubbo, Motan, SOFA, SpringCloud, SpringCloud Alibaba, and other mainstream RPC frameworks).
  • mykit-server: currently implements the Dubbo mode; it will later be migrated into the mykit-rpc module.
  • mykit-rest: REST service implemented with SpringBoot.
  • mykit-rest_netty: REST service implemented with Netty.
  • mykit-test: the framework's test module, the quickest way to learn how to use mykit-serial.

Publishing modes

Depending on how clients ultimately use the service, there are three publishing modes: embedded, RPC, and REST.

  1. Embedded publishing mode: for Java clients only. It provides a local Jar package, an embedded native service that requires a machine ID configured in advance (or a unique machine ID dynamically assigned by Zookeeper when the service starts), but it does not depend on the central server.

  2. RPC publishing mode: for Java clients only. It provides a client Jar package; Java programs call the service as if calling a local API, but they depend on the central distributed serial number (distributed ID) generation server.

  3. REST publishing mode: the central server exposes the service through a RESTful API for non-Java clients.

The publishing mode is recorded in every generated global serial number.

Serial number type

Depending on how many bits are given to the time and to the sequence number, there are two types: maximum peak and minimum granularity.

1. Maximum peak type: second-level ordering; the time in seconds occupies 30 bits and the sequence number 20 bits.

| Field | Version | Type | Generation mode | Time (seconds) | Sequence number | Machine ID |
|---|---|---|---|---|---|---|
| Bits | 63 | 62 | 60-61 | 30-59 | 10-29 | 0-9 |

2. Minimum granularity type: millisecond-level ordering; the time in milliseconds occupies 40 bits and the sequence number 10 bits.

| Field | Version | Type | Generation mode | Time (milliseconds) | Sequence number | Machine ID |
|---|---|---|---|---|---|---|
| Bits | 63 | 62 | 60-61 | 20-59 | 10-19 | 0-9 |
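Because every field sits at a fixed bit offset, making and decoding a serial number reduces to shifts and masks. The sketch below assumes the bit positions listed above for both types (machine ID in the lowest bits, version in bit 63); the class and method names are illustrative only and are not the mykit-serial API.

```java
// Hypothetical bit-packing sketch for the two layouts described above.
// Both share: machine ID bits 0-9, generation mode bits 60-61, type bit 62, version bit 63.
class SerialLayout {
    static final int MACHINE_BITS = 10;
    static final int SEQ_START = MACHINE_BITS; // the sequence always starts at bit 10
    static final int GEN_METHOD_START = 60;
    static final int TYPE_START = 62;
    static final int VERSION_START = 63;

    // type 0 = maximum peak (30-bit seconds, 20-bit sequence)
    // type 1 = minimum granularity (40-bit milliseconds, 10-bit sequence)
    static int seqBits(long type)   { return type == 0 ? 20 : 10; }
    static int timeBits(long type)  { return type == 0 ? 30 : 40; }
    static int timeStart(long type) { return SEQ_START + seqBits(type); }

    static long mask(int bits) { return (1L << bits) - 1; }

    static long pack(long version, long type, long genMethod, long time, long seq, long machine) {
        return ((version & 1L) << VERSION_START)
             | ((type & 1L) << TYPE_START)
             | ((genMethod & 3L) << GEN_METHOD_START)
             | ((time & mask(timeBits(type))) << timeStart(type))
             | ((seq & mask(seqBits(type))) << SEQ_START)
             | (machine & mask(MACHINE_BITS));
    }

    // Returns {version, type, genMethod, time, seq, machine}.
    static long[] unpack(long id) {
        long type = (id >>> TYPE_START) & 1L; // the type decides where time and sequence sit
        return new long[] {
            (id >>> VERSION_START) & 1L,
            type,
            (id >>> GEN_METHOD_START) & 3L,
            (id >>> timeStart(type)) & mask(timeBits(type)),
            (id >>> SEQ_START) & mask(seqBits(type)),
            id & mask(MACHINE_BITS)
        };
    }
}
```

Since the time field sits above the sequence and the machine ID, IDs with the same version, type, and generation mode sort by time first, which is where the rough ordering comes from.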

The maximum peak type can withstand higher peak loads, but its rough ordering is coarser (second-level). The minimum granularity type orders IDs more finely (millisecond-level), but its theoretical peak is limited to 1024 IDs per millisecond; if more requests arrive within the same millisecond, they must wait until the next millisecond.
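The wait-for-the-next-millisecond behaviour of the minimum granularity type can be sketched as a small generator. This is an illustration under stated assumptions, not the framework's source: the clock is injected as a LongSupplier returning milliseconds since some custom epoch (so the logic can be tested), the generation mode, type, and version bits are left at 0, the state is guarded by a ReentrantLock, and the generator refuses to produce IDs when the clock moves backwards.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;

// Hypothetical minimum-granularity generator: 40-bit ms time, 10-bit sequence, 10-bit machine ID.
// Generation mode, type and version bits are left at 0 for brevity.
class MillisSerialGenerator {
    private static final long SEQ_MASK = (1L << 10) - 1; // 1023
    private final ReentrantLock lock = new ReentrantLock();
    private final LongSupplier clock; // milliseconds since the chosen epoch (assumed < 2^40)
    private final long machine;
    private long lastTime = -1;
    private long seq = 0;

    MillisSerialGenerator(long machine, LongSupplier clock) {
        this.machine = machine & 0x3FF;
        this.clock = clock;
    }

    long next() {
        lock.lock();
        try {
            long now = clock.getAsLong();
            if (now < lastTime) {
                throw new IllegalStateException("clock moved backwards, refusing to generate");
            }
            if (now == lastTime) {
                seq = (seq + 1) & SEQ_MASK;
                if (seq == 0) { // all 1024 sequence values used in this millisecond
                    while (now <= lastTime) {
                        now = clock.getAsLong(); // spin until the next millisecond
                    }
                }
            } else {
                seq = 0; // a new millisecond starts a new sequence
            }
            lastTime = now;
            return (now << 20) | (seq << 10) | machine;
        } finally {
            lock.unlock();
        }
    }
}
```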

The serial number (distributed ID) type is specified in the configuration, and switching between the two types requires restarting the service.

Data structure

1. Sequence number

Maximum peak type

20 bits, so in theory an average of 2^20 = 1,048,576 IDs can be generated per second, on the order of a million. If the system's network I/O and CPU are powerful enough, it can withstand peaks of about a million IDs per second.

Minimum granularity type

10 bits, for a total of 2^10 = 1024 sequence numbers per millisecond, i.e., at most 1,000-odd IDs per millisecond. Its theoretical peak is lower than that of the maximum peak type.
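The throughput figures follow directly from the sequence bit widths; a short calculation makes the comparison explicit. The class name is illustrative.

```java
// Theoretical ID capacity per time unit, derived only from the sequence bit widths.
class SequenceCapacity {
    static long maxPeakPerSecond()        { return 1L << 20; } // 20-bit sequence per second
    static long minGranularityPerMilli()  { return 1L << 10; } // 10-bit sequence per millisecond
    static long minGranularityPerSecond() { return minGranularityPerMilli() * 1000; }

    public static void main(String[] args) {
        System.out.println(maxPeakPerSecond());        // 1048576
        System.out.println(minGranularityPerMilli());  // 1024
        System.out.println(minGranularityPerSecond()); // 1024000
    }
}
```

Averaged over a second, the two types are close (1,048,576 versus 1,024,000). The real difference is burst tolerance: the maximum peak type can hand out its whole per-second budget within a single millisecond, while the minimum granularity type is capped at 1024 IDs in any one millisecond.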

2. Time in seconds / milliseconds

Maximum peak type

30 bits, representing time in seconds: 2^30/60/60/24/365 ≈ 34, i.e., more than 30 years of use.

Minimum granularity type

40 bits, representing time in milliseconds: 2^40/1000/60/60/24/365 ≈ 34, so it can also be used for more than 30 years.
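The lifetime arithmetic can be checked the same way (365-day years, as in the formulas above; the class name is illustrative). Both fields come out at a little over 34 years.

```java
// How many years each time field lasts before it overflows (365-day years).
class TimeFieldLifetime {
    static final double SECONDS_PER_YEAR = 60.0 * 60 * 24 * 365;

    static double maxPeakYears()        { return (1L << 30) / SECONDS_PER_YEAR; }          // 30-bit seconds
    static double minGranularityYears() { return (1L << 40) / 1000.0 / SECONDS_PER_YEAR; } // 40-bit milliseconds

    public static void main(String[] args) {
        System.out.printf("maximum peak: %.2f years%n", maxPeakYears());
        System.out.printf("minimum granularity: %.2f years%n", minGranularityYears());
    }
}
```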

3. Machine ID

10 bits, 2^10 = 1024, so at most 1,024 servers. The RPC (central server) and REST publishing modes generally do not need many machines: at the designed 10,000 TPS per machine, 10 servers deliver 100,000 TPS, which meets most business requirements.

However, because the embedded publishing mode can be used inside business services, the demand for machine IDs is much greater, so up to 1024 servers are supported.

4. Generation mode

Two bits distinguish the three publishing modes: embedded, RPC, and REST.

00: embedded publishing mode; 01: RPC publishing mode; 10: REST publishing mode; 11: reserved

5. Serial number type

One bit distinguishes the two ID types: maximum peak and minimum granularity.

0: maximum peak; 1: minimum granularity
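The generation-mode and type codes map naturally onto small Java enums. This is a sketch with hypothetical names; the numeric codes 0, 1, and 2 (with 3 reserved) are the two-bit generation-mode values and 0/1 the type values listed above, read at the bit positions from the layout tables.

```java
// Hypothetical enums for the 2-bit generation mode and the 1-bit serial number type.
enum GenMethod {
    EMBEDDED(0), RPC(1), REST(2); // code 3 is reserved

    final int code;
    GenMethod(int code) { this.code = code; }

    static GenMethod fromCode(long code) {
        for (GenMethod m : values()) {
            if (m.code == code) return m;
        }
        throw new IllegalArgumentException("reserved or unknown generation mode: " + code);
    }
}

enum SerialType {
    MAX_PEAK(0), MIN_GRANULARITY(1);

    final int code;
    SerialType(int code) { this.code = code; }

    static SerialType fromCode(long code) {
        return code == 0 ? MAX_PEAK : MIN_GRANULARITY;
    }
}

class FieldCodes {
    // Generation mode lives in bits 60-61, the type in bit 62.
    static GenMethod genMethodOf(long id) { return GenMethod.fromCode((id >>> 60) & 3L); }
    static SerialType typeOf(long id)     { return SerialType.fromCode((id >>> 62) & 1L); }
}
```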

6. Version

Used as a temporary solution for extension or capacity expansion.

0: the default value. Bit 63 is the sign bit of a 64-bit long, so keeping it 0 keeps the ID positive and prevents it from being truncated when it is converted to an integer and back to a string. 1: indicates extension or capacity expansion.

It can serve as a 30-year extension: when IDs are nearly used up after 30 years, this bit can be added to the second or millisecond time field to give the system a migration window. Extending the time field by one bit doubles its range, which buys roughly another 30 years.

Concurrent processing

In the RPC (central server) and REST publishing modes, ID generation involves network I/O and CPU work. Generating the ID itself is essentially an in-memory operation with no I/O, so network I/O is the system's bottleneck.

Because network I/O, not CPU speed, is the bottleneck, the ID generation service is multithreaded. For the contended state in ID generation (the time and the sequence), several mutual-exclusion implementations are provided:

  1. ReentrantLock from the java.util.concurrent package. This is the default implementation and a compromise between performance and stability.
  2. The traditional synchronized keyword. Performance is slightly lower; enable it by passing the JVM parameter -Dmykit.serial.sync.lock.impl.key=true.
  3. CAS. This implementation performs very well, but it drives CPU load up under high concurrency; enable it by passing the JVM parameter -Dmykit.serial.atomic.impl.key=true.
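For comparison with the lock-based approaches, a CAS-based variant can keep the last timestamp and sequence in a single AtomicLong and retry with compareAndSet instead of blocking. This is a minimal sketch for the maximum peak layout (30-bit seconds, 20-bit sequence, 10-bit machine ID) with a hypothetical class name and an injected clock; it is not the framework's implementation. The busy retry loop is exactly what drives CPU usage up under heavy contention.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical lock-free generator for the maximum peak layout.
// State word: high bits = last second, low 20 bits = last sequence value.
class CasSerialGenerator {
    private static final int SEQ_BITS = 20;
    private static final long SEQ_MASK = (1L << SEQ_BITS) - 1;
    private final AtomicLong state = new AtomicLong(-1L << SEQ_BITS); // last second = -1, seq = 0
    private final LongSupplier clock; // seconds since the chosen epoch (assumed < 2^30)
    private final long machine;

    CasSerialGenerator(long machine, LongSupplier clock) {
        this.machine = machine & 0x3FF;
        this.clock = clock;
    }

    long next() {
        while (true) {
            long current = state.get();
            long lastSecond = current >> SEQ_BITS; // arithmetic shift keeps the initial -1
            long lastSeq = current & SEQ_MASK;
            long now = clock.getAsLong();
            if (now < lastSecond) {
                throw new IllegalStateException("clock moved backwards, refusing to generate");
            }
            long seq;
            if (now == lastSecond) {
                if (lastSeq == SEQ_MASK) { // sequence exhausted: retry until the clock advances
                    Thread.onSpinWait();
                    continue;
                }
                seq = lastSeq + 1;
            } else {
                seq = 0;
            }
            if (state.compareAndSet(current, (now << SEQ_BITS) | seq)) {
                return (now << 30) | (seq << 10) | machine;
            }
            // another thread updated the state first; loop and retry
        }
    }
}
```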

Machine ID allocation

We split the machine ID space into two ranges: one serves the RPC and REST publishing modes, and the other serves the embedded publishing mode.

0-923: embedded publishing mode, pre-configured (or assigned by Zookeeper), supporting at most 924 embedded servers.
924-1023: RPC (central server) and REST publishing modes, supporting at most 100 servers, i.e., at most 100 × 10,000 = 1,000,000 TPS.

If the actual mix of embedded, RPC, and REST deployments does not match this ratio, the boundary between the two ranges can be adjusted dynamically to suit.

In addition, vertical services are inherently isolated from one another, so each of them can use up to 1024 servers.
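The split can be modelled as a single configurable boundary in the 10-bit machine ID space. The sketch below is hypothetical (class and method names are illustrative) and uses 924 as the default boundary described above.

```java
// Hypothetical partition of the 10-bit machine ID space.
// IDs below `boundary` serve the embedded mode; the rest serve the RPC and REST modes.
class MachineIdRanges {
    static final int MAX_MACHINE_ID = 1023; // 10 bits
    private final int boundary;             // first ID of the central (RPC/REST) range

    MachineIdRanges(int boundary) {
        if (boundary < 0 || boundary > MAX_MACHINE_ID + 1) {
            throw new IllegalArgumentException("boundary out of range: " + boundary);
        }
        this.boundary = boundary;
    }

    boolean isEmbedded(int machineId) { check(machineId); return machineId < boundary; }
    boolean isCentral(int machineId)  { check(machineId); return machineId >= boundary; }

    int embeddedCapacity() { return boundary; }
    int centralCapacity()  { return MAX_MACHINE_ID + 1 - boundary; }

    private static void check(int machineId) {
        if (machineId < 0 || machineId > MAX_MACHINE_ID) {
            throw new IllegalArgumentException("machine ID must be 0-1023: " + machineId);
        }
    }

    public static void main(String[] args) {
        MachineIdRanges ranges = new MachineIdRanges(924); // default split
        System.out.println(ranges.embeddedCapacity());           // 924
        System.out.println(ranges.centralCapacity());            // 100
        System.out.println(ranges.centralCapacity() * 10_000L);  // 1000000 TPS at 10,000/s per server
    }
}
```

Moving the boundary (for example to 800) shifts capacity between the two ranges without touching the ID layout.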

Integration with Zookeeper

In the embedded publishing mode, the service connects to the Zookeeper cluster at startup, and Zookeeper assigns it a machine ID in the 0-923 range. If all IDs in the 0-923 range are in use, Zookeeper assigns an ID greater than 923.

If you do not want to use a machine ID assigned by Zookeeper, we also provide a default pre-configured machine ID: each service that uses the distributed global serial number (distributed ID) service then needs a pre-configured default machine ID.
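The allocation rule (lowest free ID in 0-923 first, and only then IDs above 923) can be sketched without Zookeeper by modelling the registered IDs as an in-memory set. In a real deployment this bookkeeping is coordinated through Zookeeper; the class below is a hypothetical stand-in for illustration only.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for the Zookeeper-side allocation rule:
// hand out the lowest free ID in 0-923, and only then IDs in 924-1023.
class MachineIdAllocator {
    static final int EMBEDDED_MAX = 923;
    static final int MACHINE_MAX = 1023;
    private final Set<Integer> used = new HashSet<>();

    synchronized int allocate() {
        for (int id = 0; id <= EMBEDDED_MAX; id++) {
            if (used.add(id)) return id; // preferred embedded range
        }
        for (int id = EMBEDDED_MAX + 1; id <= MACHINE_MAX; id++) {
            if (used.add(id)) return id; // embedded range exhausted: overflow above 923
        }
        throw new IllegalStateException("all 1024 machine IDs are in use");
    }

    // Called when a service shuts down so its ID can be reused.
    synchronized void release(int id) {
        used.remove(id);
    }
}
```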

Time synchronization

When using mykit-serial to generate distributed global serial numbers (distributed IDs), we must make sure the server time is correct. To do so, we can use a Linux crontab job to periodically synchronize the server clock with the pool.ntp.org virtual cluster of time servers (more than 3,000 servers worldwide), for example with:

ntpdate -u pool.ntp.org

Performance

The final performance verification should confirm that each server reaches more than 10,000 TPS.

RESTful API documentation

Generate a distributed global serial number

  • Description: generates a globally unique serial number based on the system time and returns it in the response body.
  • Path: /genSerialNumber
  • Parameters: N/A
  • Required parameters: N/A
  • Example: http://localhost:8080/genSerialNumber
  • Result: 3456526092514361344

Decode a global serial number

  • Description: decodes (reverses) the generated serialNumber and returns its fields as a JSON string in the response body.
  • Path: /expSerialNumber
  • Parameters: serialNumber=?
  • Required parameters: serialNumber
  • Example: http://localhost:8080/expSerialNumber?serialNumber=3456526092514361344
  • Result: {"genMethod":2,"machine":1022,"seq":0,"time":12758739,"type":0,"version":0}

Time translation

  • Description: converts a time value stored as a long integer into a readable date.
  • Path: /transtime
  • Parameters: time=?
  • Required parameters: time
  • Example: http://localhost:8080/transtime?time=12758739
  • Result: Thu May 28 16:05:39 CST 2015

Make a global serial number

  • Description: makes a distributed global serial number from the given elements.
  • Path: /makeSerialNumber
  • Parameters: genMethod=?&machine=?&seq=?&time=?&type=?&version=?
  • Required parameters: time, seq
  • Example: http://localhost:8080/makeSerialNumber?genMethod=2&machine=1022&seq=0&time=12758739&type=0&version=0
  • Result: 3456526092514361344
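A Java client for these REST endpoints needs nothing beyond the JDK's java.net.http.HttpClient (Java 11+). This sketch assumes the service listens on localhost:8080 as in the examples, that every endpoint answers a plain GET, and that /genSerialNumber returns the number as plain text; the helper class is hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for the REST endpoints documented above.
class SerialRestClient {
    private final HttpClient http = HttpClient.newHttpClient();
    private final String baseUrl; // e.g. "http://localhost:8080"

    SerialRestClient(String baseUrl) { this.baseUrl = baseUrl; }

    URI genUri()                  { return URI.create(baseUrl + "/genSerialNumber"); }
    URI expUri(long serialNumber) { return URI.create(baseUrl + "/expSerialNumber?serialNumber=" + serialNumber); }
    URI transTimeUri(long time)   { return URI.create(baseUrl + "/transtime?time=" + time); }

    URI makeUri(long genMethod, long machine, long seq, long time, long type, long version) {
        return URI.create(baseUrl + "/makeSerialNumber?genMethod=" + genMethod + "&machine=" + machine
                + "&seq=" + seq + "&time=" + time + "&type=" + type + "&version=" + version);
    }

    // Sends a GET request and returns the response body as text.
    String get(URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Assumes the body is the bare number, as in the example result.
    long genSerialNumber() throws IOException, InterruptedException {
        return Long.parseLong(get(genUri()).trim());
    }
}
```

With a running server, new SerialRestClient("http://localhost:8080").genSerialNumber() would fetch one serial number, and get(expUri(...)) would return the decoded JSON string.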

Java API documentation

Generate a global serial number

  • Description: generates a globally unique distributed serial number (distributed ID) based on the system time and returns it.
  • Class: SerialNumberService
  • Method: genSerialNumber
  • Parameters: N/A
  • Return type: long
  • Example: long serialNumber = serialNumberService.genSerialNumber();

Decode a global serial number

  • Description: decodes the generated distributed serial number (distributed ID) and returns its fields as a SerialNumber object.
  • Class: SerialNumberService
  • Method: expSerialNumber
  • Parameter: long serialNumber
  • Return type: SerialNumber
  • Example: SerialNumber serialNumber = serialNumberService.expSerialNumber(3456526092514361344L);

Time translation

  • Description: converts a time value stored as a long integer into a readable date.
  • Class: SerialNumberService
  • Method: transTime
  • Parameter: long time
  • Return type: Date
  • Example: Date date = serialNumberService.transTime(12758739);
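Translating a time value only requires adding it to the framework's epoch. The source does not state that epoch, but the documented example (12758739 → Thu May 28 16:05:39 CST 2015) is consistent with an epoch of 2015-01-01 00:00:00 in UTC+8 with seconds as the unit (maximum peak type). The sketch below assumes exactly that, uses java.time, and is not the framework's implementation; the class name is hypothetical.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.Date;

// Hypothetical transTime: converts a time field back to a Date.
// Assumed epoch: 2015-01-01T00:00:00+08:00 (inferred from the documented example).
class TimeTranslator {
    static final Instant EPOCH =
            OffsetDateTime.of(2015, 1, 1, 0, 0, 0, 0, ZoneOffset.ofHours(8)).toInstant();

    // Maximum peak type: the time field counts seconds since EPOCH.
    static Date fromSeconds(long time) { return Date.from(EPOCH.plusSeconds(time)); }

    // Minimum granularity type: the time field counts milliseconds since EPOCH.
    static Date fromMillis(long time) { return Date.from(EPOCH.plusMillis(time)); }
}
```

Date.toString() renders the result in the JVM's default time zone, so the "CST" in the example output corresponds to a server running on China Standard Time.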

Make a global serial number (1)

  • Description: makes a distributed serial number from the given elements.
  • Class: SerialNumberService
  • Method: makeSerialNumber
  • Parameters: long time, long seq
  • Return type: long
  • Example: long serialNumber = serialNumberService.makeSerialNumber(12758739, 0);

Make a global serial number (2)

  • Description: makes a distributed serial number from the given elements.
  • Class: SerialNumberService
  • Method: makeSerialNumber
  • Parameters: long machine, long time, long seq
  • Return type: long
  • Example: long serialNumber = serialNumberService.makeSerialNumber(1, 12758739, 0);

Make a global serial number (3)

  • Description: makes a distributed serial number from the given elements.
  • Class: SerialNumberService
  • Method: makeSerialNumber
  • Parameters: long genMethod, long machine, long time, long seq
  • Return type: long
  • Example: long serialNumber = serialNumberService.makeSerialNumber(0, 1, 12758739, 0);

Make a global serial number (4)

  • Description: makes a distributed serial number from the given elements.
  • Class: SerialNumberService
  • Method: makeSerialNumber
  • Parameters: long type, long genMethod, long machine, long time, long seq
  • Return type: long
  • Example: long serialNumber = serialNumberService.makeSerialNumber(0, 2, 1, 12758739, 0);

Make a global serial number (5)

  • Description: makes a distributed serial number from the given elements.
  • Class: SerialNumberService
  • Method: makeSerialNumber
  • Parameters: long version, long type, long genMethod, long machine, long time, long seq
  • Return type: long
  • Example: long serialNumber = serialNumberService.makeSerialNumber(0, 0, 2, 1, 12758739, 0);

FAQ

1. Does adjusting the clock affect ID generation?

If the clock is set back without restarting the machine, mykit-serial throws an exception and refuses to generate IDs. If the machine is restarted and the clock is set forward, IDs are generated normally after the adjustment, and no IDs are generated for the skipped period.

2. What is the impact of restarting the machine with the clock set back or forward?

If the machine is restarted with the clock set back, mykit-serial may reuse time values it has already issued and thus generate duplicate IDs; the system administrator must make sure this does not happen. If the machine is restarted with the clock set forward, IDs are generated normally after the adjustment, and no IDs are generated for the skipped period.

3. Does the leap-second adjustment every four years affect ID generation?

Atomic clocks and electronic clocks drift apart by 1 second every four years, meaning the electronic clock falls 1 second behind the atomic clock. The network clock therefore synchronizes once every four years, but local Windows and Linux machines do not synchronize their time automatically. Because the clock is moved forward by one second, ID generation is not affected after the adjustment; no IDs are generated during that one second.

OK, that's all for today. I'm Glacier, see you next time!
