Moment For Technology

Sharding-JDBC (database and table sharding middleware) source code analysis: distributed primary key

Posted on Nov. 27, 2023, 11:14 p.m. by Shayak Krishnan
Category: Back-end Tags: Back-end, Database, Algorithm, Source code


This article is based on the official Sharding-JDBC 1.5.0 release.

1. Overview

This article shares the Sharding-JDBC distributed primary key implementation.

The official document "Distributed primary key" covers the feature and its usage very thoroughly; I strongly recommend reading it first. Its motivation for implementing distributed primary keys is quoted below:

In traditional database software development, automatic primary key generation is a basic requirement, and every major database supports it, for example through MySQL's auto-increment keys. Once data is split across databases and tables, however, it becomes very tricky for MySQL to generate globally unique IDs across different tables. Because the auto-increment keys of the different physical tables behind one logical table are unaware of each other, duplicate IDs will be generated. We could avoid duplicates by constraining each table's key generation rules, but that requires extra operational effort and makes the framework less extensible.

Many third-party solutions solve this problem well, for example generating non-repeating keys with a specific algorithm such as UUID, or introducing an ID generation service. But precisely because of this diversity, Sharding-JDBC would limit its own development if it depended heavily on any single solution.

For these reasons, Sharding-JDBC ultimately exposes generated IDs through the JDBC interface and separates out the underlying ID generation implementation.


2. KeyGenerator

KeyGenerator is the primary key generator interface. Implementation classes provide primary key generation to callers by implementing the #generateKey() method.
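As a rough illustration of this contract, the sketch below declares an interface with the same single method plus a toy implementation. The interface is redeclared locally only so the snippet compiles on its own, and InMemoryCounterKeyGenerator is a hypothetical class invented for this example; it is not part of Sharding-JDBC.

```java
// Local redeclaration of the interface shape, so the sketch is self-contained.
interface KeyGenerator {

    // Returns the next primary key.
    Number generateKey();
}

// Hypothetical implementation for illustration only; not part of Sharding-JDBC.
// Keys are unique only within one JVM, so this is NOT a distributed key generator.
final class InMemoryCounterKeyGenerator implements KeyGenerator {

    private long next = 1L;

    @Override
    public synchronized Number generateKey() {
        return next++;
    }
}

final class KeyGeneratorDemo {

    public static void main(final String[] args) {
        KeyGenerator keyGenerator = new InMemoryCounterKeyGenerator();
        System.out.println(keyGenerator.generateKey());
        System.out.println(keyGenerator.generateKey());
    }
}
```

Because callers depend only on the interface, the generation strategy can be swapped without touching the code that consumes the keys.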

2.1 DefaultKeyGenerator

DefaultKeyGenerator is the default primary key generator. It uses the Twitter Snowflake algorithm to generate 64-bit Long numbers. The distributed primary key of MyCAT, another database middleware from China, is also based on this algorithm, and the ID generation services of many large Chinese Internet companies are built on it with some modifications. So DefaultKeyGenerator has an impeccable pedigree. If you are interested in distributed primary keys, check out "Talk about IDS", a collection I have put together.

Ahem, that was a little off topic. Each number is made up of four parts, from high bits to low bits (left to right):

Bits  Name                   Description
1     Sign bit               Always 0
41    Timestamp              2^41 ms / (1000 × 60 × 60 × 24 × 365) ≈ 69.7 years
10    Worker process number  Supports at most 1024 processes
12    Sequence number        Increments from 0 within each millisecond; supports 4096 numbers per millisecond

  • Each worker process can generate up to 4,096,000 numbers per second. Pretty impressive, isn't it?
    import com.google.common.base.Preconditions;
    import lombok.Setter;
    import lombok.extern.slf4j.Slf4j;

    import java.text.SimpleDateFormat;
    import java.util.Calendar;
    import java.util.Date;

    @Slf4j
    public final class DefaultKeyGenerator implements KeyGenerator {

        /** Start of the timestamp part: 2016-11-01 00:00:00.000. */
        public static final long EPOCH;

        /** Number of bits used by the sequence number. */
        private static final long SEQUENCE_BITS = 12L;

        /** Number of bits used by the worker process number. */
        private static final long WORKER_ID_BITS = 10L;

        /** Mask that keeps the sequence number within 12 bits. */
        private static final long SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1;

        /** Left shift applied to the worker process number. */
        private static final long WORKER_ID_LEFT_SHIFT_BITS = SEQUENCE_BITS;

        /** Left shift applied to the timestamp. */
        private static final long TIMESTAMP_LEFT_SHIFT_BITS = WORKER_ID_LEFT_SHIFT_BITS + WORKER_ID_BITS;

        /** Upper bound (exclusive) of the worker process number. */
        private static final long WORKER_ID_MAX_VALUE = 1L << WORKER_ID_BITS;

        @Setter
        private static TimeService timeService = new TimeService();

        /** Worker process number. */
        private static long workerId;

        static {
            Calendar calendar = Calendar.getInstance();
            calendar.set(2016, Calendar.NOVEMBER, 1);
            calendar.set(Calendar.HOUR_OF_DAY, 0);
            calendar.set(Calendar.MINUTE, 0);
            calendar.set(Calendar.SECOND, 0);
            calendar.set(Calendar.MILLISECOND, 0);
            EPOCH = calendar.getTimeInMillis();
        }

        /** Sequence number within the last millisecond. */
        private long sequence;

        /** Timestamp of the last generated number. */
        private long lastTime;

        /**
         * Sets the worker process number.
         *
         * @param workerId worker process number
         */
        public static void setWorkerId(final long workerId) {
            Preconditions.checkArgument(workerId >= 0L && workerId < WORKER_ID_MAX_VALUE);
            DefaultKeyGenerator.workerId = workerId;
        }

        /**
         * Generates an ID.
         *
         * @return a Long ID
         */
        @Override
        public synchronized Number generateKey() {
            // Ensure the current time is not earlier than the last one; a clock rollback could produce duplicate IDs
            long currentMillis = timeService.getCurrentMillis();
            Preconditions.checkState(lastTime <= currentMillis,
                    "Clock is moving backwards, last time is %d milliseconds, current time is %d milliseconds", lastTime, currentMillis);
            // Get the sequence number
            if (lastTime == currentMillis) {
                if (0L == (sequence = ++sequence & SEQUENCE_MASK)) {
                    // Sequence exhausted in this millisecond: wait for the next one
                    currentMillis = waitUntilNextTime(currentMillis);
                }
            } else {
                sequence = 0;
            }
            // Record the last timestamp
            lastTime = currentMillis;
            if (log.isDebugEnabled()) {
                log.debug("{}-{}-{}", new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS").format(new Date(lastTime)), workerId, sequence);
            }
            // Assemble the number
            return ((currentMillis - EPOCH) << TIMESTAMP_LEFT_SHIFT_BITS) | (workerId << WORKER_ID_LEFT_SHIFT_BITS) | sequence;
        }

        /** Spins until the clock moves past lastTime, then returns the new time. */
        private long waitUntilNextTime(final long lastTime) {
            long time = timeService.getCurrentMillis();
            while (time <= lastTime) {
                time = timeService.getCurrentMillis();
            }
            return time;
        }
    }
  • EPOCH = calendar.getTimeInMillis(); computes the millisecond timestamp of midnight on November 1, 2016, which serves as the starting point of the timestamp part.
  • #generateKey() implementation logic:
    1. Verify that the timestamp of the last generated number is less than or equal to the current time. This guards against the server clock being rolled back (for example by clock synchronization), which could produce duplicate numbers.
    2. Get the sequence number. When the increment within the current millisecond reaches its maximum, call #waitUntilNextTime() to obtain the next millisecond.
    3. Record the timestamp of the last generated number, which is used to detect clock rollback.
    4. Assemble the number with bit operations.
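To make the final bit-operation step concrete, here is a minimal sketch that packs the three fields into one long and unpacks them again. The class and method names (SnowflakeLayoutDemo, compose, elapsedMillisOf, workerIdOf, sequenceOf) are hypothetical helpers for this illustration, not Sharding-JDBC APIs; only the bit widths come from the layout described above.

```java
// Minimal sketch of the 1 + 41 + 10 + 12 bit layout.
// All names here are hypothetical; only the bit widths come from the article.
final class SnowflakeLayoutDemo {

    static final long SEQUENCE_BITS = 12L;
    static final long WORKER_ID_BITS = 10L;
    static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;
    static final long WORKER_ID_MASK = (1L << WORKER_ID_BITS) - 1;
    static final long WORKER_ID_SHIFT = SEQUENCE_BITS;
    static final long TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_ID_BITS;

    // Packs the fields; elapsedMillis is the time elapsed since the custom epoch.
    static long compose(final long elapsedMillis, final long workerId, final long sequence) {
        return (elapsedMillis << TIMESTAMP_SHIFT) | (workerId << WORKER_ID_SHIFT) | sequence;
    }

    // Extracts the timestamp part (milliseconds since the custom epoch).
    static long elapsedMillisOf(final long id) {
        return id >>> TIMESTAMP_SHIFT;
    }

    // Extracts the worker process number.
    static long workerIdOf(final long id) {
        return (id >>> WORKER_ID_SHIFT) & WORKER_ID_MASK;
    }

    // Extracts the sequence number.
    static long sequenceOf(final long id) {
        return id & SEQUENCE_MASK;
    }

    public static void main(final String[] args) {
        long id = compose(1000L, 364L, 7L);
        System.out.println("id = " + id);
        System.out.println("elapsed ms = " + elapsedMillisOf(id));
        System.out.println("worker id = " + workerIdOf(id));
        System.out.println("sequence = " + sequenceOf(id));
        // Upper bound per worker: 4096 sequence values per millisecond, 1000 ms per second.
        System.out.println("max ids per second = " + (SEQUENCE_MASK + 1) * 1000L);
    }
}
```

Because the timestamp occupies the highest bits below the sign bit, numbers generated by one worker increase over time, and the sign bit stays 0 as long as the elapsed time fits in 41 bits.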

Overall, this implementation of the Twitter Snowflake algorithm is fairly simple and easy to understand. The harder part is how to assign worker process numbers:

  1. What if there are more than 1024 worker processes?
  2. How do we ensure that worker process numbers are globally unique?

The first problem can be solved by extracting distributed primary key generation into a standalone ID generation service. That is beyond the scope of this article; if you are interested, search for it online.

The second problem can be solved with middleware that provides distributed configuration, such as Zookeeper, Consul, or Etcd. Sharding-JDBC also provides approaches that do not depend on these services, so let's look at them one by one.

2.2 HostNameKeyGenerator

Gets the worker process number from the numeric suffix of the machine's host name. This approach is recommended if your online machines follow a unified naming convention. For example, if the machine's HostName is dangdang-db-sharding-dev-01 (company name - department name - service name - environment name - number), the trailing number 01 is extracted as the workerId.

    // HostNameKeyGenerator.java
    static void initWorkerId() {
        InetAddress address;
        Long workerId;
        try {
            address = InetAddress.getLocalHost();
        } catch (final UnknownHostException e) {
            throw new IllegalStateException("Cannot get LocalHost InetAddress, please check your network!");
        }
        String hostName = address.getHostName();
        try {
            workerId = Long.valueOf(hostName.replace(hostName.replaceAll("\\d+$", ""), ""));
        } catch (final NumberFormatException e) {
            throw new IllegalArgumentException(String.format("Wrong hostname:%s, hostname must be end with number!", hostName));
        }
        DefaultKeyGenerator.setWorkerId(workerId);
    }
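The suffix extraction can be tried in isolation, without touching the network. The sketch below is an assumption-laden stand-in: HostNameWorkerIdDemo and parseWorkerId are hypothetical names, and it scans for trailing digits by hand instead of using the regex-based replace from the source above.

```java
// Hypothetical helper for illustration; not part of Sharding-JDBC.
final class HostNameWorkerIdDemo {

    // Extracts the trailing decimal digits of a host name as the worker ID.
    static long parseWorkerId(final String hostName) {
        int start = hostName.length();
        // Walk backwards over the trailing digits.
        while (start > 0 && Character.isDigit(hostName.charAt(start - 1))) {
            start--;
        }
        if (start == hostName.length()) {
            throw new IllegalArgumentException("Host name must end with a number: " + hostName);
        }
        return Long.parseLong(hostName.substring(start));
    }

    public static void main(final String[] args) {
        System.out.println(parseWorkerId("dangdang-db-sharding-dev-01"));
    }
}
```

Note that leading zeros are dropped, so host names ending in 01 and 1 map to the same worker ID, and the parsed value still has to fall in the range 0 to 1023 to pass the check in DefaultKeyGenerator.setWorkerId().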

2.3 IPKeyGenerator

Gets the worker process number from the machine's IP address. This approach is recommended if the last 10 bits of the binary IP address are unique across your online machines. For example, if the machine's IP address is 192.168.1.108, its binary representation is 11000000 10101000 00000001 01101100; the last 10 bits, 01 01101100, equal 364 in decimal, so the worker process number is set to 364.

    // IPKeyGenerator.java
    static void initWorkerId() {
        InetAddress address;
        try {
            address = InetAddress.getLocalHost();
        } catch (final UnknownHostException e) {
            throw new IllegalStateException("Cannot get LocalHost InetAddress, please check your network!");
        }
        byte[] ipAddressByteArray = address.getAddress();
        // Low 2 bits of the second-to-last byte, followed by all 8 bits of the last byte
        DefaultKeyGenerator.setWorkerId((long) (((ipAddressByteArray[ipAddressByteArray.length - 2] & 0B11) << Byte.SIZE)
                + (ipAddressByteArray[ipAddressByteArray.length - 1] & 0xFF)));
    }
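The bit arithmetic can be checked on a fixed address. In the sketch below, IpWorkerIdDemo and lastTenBits are hypothetical names; the address is passed in as raw bytes rather than looked up from the local host, so the result does not depend on the machine running it.

```java
// Hypothetical helper for illustration; not part of Sharding-JDBC.
final class IpWorkerIdDemo {

    // Combines the low 2 bits of the second-to-last byte with all 8 bits of the last byte.
    static long lastTenBits(final byte[] ip) {
        int high = ip[ip.length - 2] & 0b11;
        int low = ip[ip.length - 1] & 0xFF;
        return (high << Byte.SIZE) | low;
    }

    public static void main(final String[] args) {
        byte[] first = {(byte) 192, (byte) 168, 1, 108};
        byte[] second = {(byte) 192, (byte) 168, 5, 108};
        // Both addresses share the same last 10 bits, so they map to the same worker ID.
        System.out.println(lastTenBits(first));
        System.out.println(lastTenBits(second));
    }
}
```

The second address shows why the uniqueness precondition matters: 192.168.1.108 and 192.168.5.108 differ only above the last 10 bits, so they collide.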

2.4 IPSectionKeyGenerator

Contributed by DogFc as an improvement on IPKeyGenerator.

After looking at how IPKeyGenerator generates the worker process number, I feel that relying on the last 10 bits of the server IP address is rather restrictive (especially for IPv6). The optimization idea is as follows: because the worker process number has at most 2^10 values, any number we generate is usable as long as it is less than 1024.

  1. For IPv4: the maximum IP is 255.255.255.255, and (255+255+255+255) < 1024. Therefore, a workerId can be generated by summing the IP segment values, without being restricted to particular IP bits.
  2. For IPv6: the maximum IP is FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF. To ensure that the sum is less than 1024, the idea is to add up only the lowest six bits of each segment: 2^6 = 64, and 64 x 8 = 512 < 1024. To some extent, this also avoids duplicate workerIds.

To use this method of generating worker process numbers from the IP, you must make sure that the sums of the IP segments do not collide.

    // IPSectionKeyGenerator.java
    static void initWorkerId() {
        InetAddress address;
        try {
            address = InetAddress.getLocalHost();
        } catch (final UnknownHostException e) {
            throw new IllegalStateException("Cannot get LocalHost InetAddress, please check your network!");
        }
        byte[] ipAddressByteArray = address.getAddress();
        long workerId = 0L;
        // IPV4: sum all four bytes
        if (ipAddressByteArray.length == 4) {
            for (byte byteNum : ipAddressByteArray) {
                workerId += byteNum & 0xFF;
            }
        // IPV6: sum the low 6 bits of each of the 16 bytes
        } else if (ipAddressByteArray.length == 16) {
            for (byte byteNum : ipAddressByteArray) {
                workerId += byteNum & 0B111111;
            }
        } else {
            throw new IllegalStateException("Bad LocalHost InetAddress, please check your network!");
        }
        DefaultKeyGenerator.setWorkerId(workerId);
    }
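The summation rule can be exercised on fixed addresses as well. In the sketch below, IpSectionWorkerIdDemo and sumSections are hypothetical names, and addresses are passed in as raw bytes. Note that the source code sums over all 16 bytes of an IPv6 address, each masked to 6 bits, so the largest possible sum is 16 x 63 = 1008, which is still below 1024.

```java
// Hypothetical helper for illustration; not part of Sharding-JDBC.
final class IpSectionWorkerIdDemo {

    // Sums whole bytes for IPv4, or the low 6 bits of each byte for IPv6.
    static long sumSections(final byte[] ip) {
        final int mask;
        if (ip.length == 4) {
            mask = 0xFF;
        } else if (ip.length == 16) {
            mask = 0b111111;
        } else {
            throw new IllegalArgumentException("Expected 4 or 16 bytes, got " + ip.length);
        }
        long workerId = 0L;
        for (byte each : ip) {
            workerId += each & mask;
        }
        return workerId;
    }

    public static void main(final String[] args) {
        System.out.println(sumSections(new byte[] {(byte) 192, (byte) 168, 1, 108}));
        // Two different addresses whose sections add up to the same value.
        System.out.println(sumSections(new byte[] {10, 0, 0, 1}));
        System.out.println(sumSections(new byte[] {10, 0, 1, 0}));
    }
}
```

The last two calls illustrate the caveat about collisions: 10.0.0.1 and 10.0.1.0 are distinct addresses with the same sum, so this scheme only works when the address plan rules such pairs out.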

666. Easter egg

There is no Easter egg. Ho ho ho.

Fellow readers, feel free to share this article with your friends.

Thanks for reading. If you found the technical content useful, please also follow my public account.
