This article was originally shared by the Rongyun technical team and published on the official account “Rongyun Global Internet Communication Cloud” under the title “How to Generate Unique IDs in Distributed Scenarios”. Some changes were made when the IM network site included it.

1. Introduction

For IM applications, message IDs (or serial numbers) are one of those things that look insignificant but matter a great deal.

Message IDs run through every part of the technical logic of an IM system, for example:

1) Guarantee the order of chat messages;

2) Deduplication in the QoS delivery guarantee mechanism for chat messages;

3) Accurate search and matching of specific chat messages;

4) Chat message read or unread processing;

5) Receipt of delivery of chat messages;

6) Pull markers for read-diffusion of group chat messages;

7) …

However, IM systems are highly individualized (there are no unified design standards or approaches), and this includes the algorithm for generating chat message IDs: each product has its own ideas and considerations.

Common message ID generation strategies are:

1) UUID: simple and direct, and it guarantees uniqueness well, but the ID is a bit long for anyone fussy about technical tidiness;

2) A timestamp as a long integer: the laziest option. It can work in low-throughput scenarios, but it risks duplicates and cannot guarantee uniqueness in a distributed setting;

3) Twitter’s open-source Snowflake algorithm: also a good choice for distributed, high-concurrency situations;

4) Sequential IDs generated within a separate ID space for each user: for example, WeChat’s message serial number generation strategy is very good.
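The trade-off between the first two strategies can be seen in a small Python sketch. The helper names `timestamp_ids` and `uuid_ids` are made up for illustration, and the exact number of distinct timestamps will vary from machine to machine.

```python
import time
import uuid

def timestamp_ids(count):
    """Return `count` IDs read from the millisecond clock in a tight loop."""
    return [int(time.time() * 1000) for _ in range(count)]

def uuid_ids(count):
    """Return `count` random UUIDs as 36-character strings."""
    return [str(uuid.uuid4()) for _ in range(count)]

stamps = timestamp_ids(10_000)
uuids = uuid_ids(10_000)

# Many calls land in the same millisecond, so distinct values are far fewer than calls.
print(len(set(stamps)), "distinct timestamp IDs out of", len(stamps))
# UUIDs stay distinct, at the cost of 36 characters per ID.
print(len(set(uuids)), "distinct UUIDs, each", len(uuids[0]), "characters long")
```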

In a sense, the message ID generation strategy determines how hard it is to implement certain features at the IM application layer. A good strategy makes IM product development smoother, and a poor one makes it harder.

This article shares the chat message ID generation algorithm and strategy used in Rongyun’s instant messaging cloud product. A 19-character (19-byte) ID can carry the timestamp, message type, session ID, and serial number: a small ID with big uses, and well worth a look!

Disclaimer: this article was shared by Rongyun’s official technical team and is intended only for technical exchange and study; do not use it for illegal purposes. If the article involves trade secrets, please let me know!

Special note: this article is shared solely for technical research and learning. I received no benefits for it, so it is not an advertisement and I am not acting on anyone’s behalf. If anything is inappropriate, please let me know!

(This article is simultaneously published at: www.52im.net/thread-2747…)

2. Related articles

  • WeChat Technology Sharing: Practice of Generating Serial Numbers for WeChat’s Massive IM Chat Messages (Algorithm Principles)
  • WeChat Technology Sharing: Practice of Generating Serial Numbers for WeChat’s Massive IM Chat Messages (Disaster Recovery Solution)

3. Technical background

For a distributed IM system, the ID of each message must be globally unique within the cluster and ordered by generation time. Generating unique IDs for message data quickly and efficiently is a key factor affecting system throughput.

So how does Rongyun generate globally unique message IDs?

First, we need to be clear about the core requirements for ID generation:

1) Globally unique;

2) Ordered.

4. Design ideas

Rongyun uses an 80-bit unique ID for message data.

Every 5 bits are base-32 encoded into one character. The characters are drawn from the digits 2 to 9 and the letters A to Z. The digits 0 and 1 are removed (leaving 8 digits) and the letters O and I are removed (leaving 24 letters), so exactly 32 characters are available, just enough for base-32 encoding.

In this way, the 80 bits are converted into 16 characters. The 16 characters are divided into four groups joined by three separators (-), giving a 19-character unique ID of the form “BD8U-FCOJ-LDC5-L789”. This design keeps the generated IDs ordered and easy to read.
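The character count and ID length can be checked with a short Python sketch. The ordering of `ALPHABET` (digits first, then letters) is an assumption, since the text only states which characters are excluded; `formatted_length` is a made-up helper name.

```python
import string

# Assumed ordering: digits first, then letters. Only the excluded characters come from the text.
ALPHABET = "".join(
    c for c in string.digits + string.ascii_uppercase if c not in "01OI"
)

ID_BITS = 80
BITS_PER_CHAR = 5
GROUP_SIZE = 4

def formatted_length(id_bits=ID_BITS):
    """Length of the printable ID: one character per 5 bits plus '-' between groups of 4."""
    chars = id_bits // BITS_PER_CHAR
    separators = chars // GROUP_SIZE - 1
    return chars + separators

print(ALPHABET)
print(len(ALPHABET))        # number of usable symbols
print(formatted_length())   # characters in the final ID, separators included
```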

As shown in the figure above, the 80 bits are divided into four segments.

1) Segment 1, 42 bits: stores the timestamp, which can run up to the year 2109, enough for developers for now. The timestamp is placed in the high bits so that the generated unique IDs are ordered by time, a condition that message IDs must satisfy.

2) Segment 2, 12 bits: stores the spin ID. The timestamp has millisecond precision, and it is quite normal for an IM system serving hundreds of millions of users to generate multiple messages in the same millisecond. The spin ID numbers the messages that fall in the same millisecond. Twelve bits means a single host can distinguish at most 4096 (2^12) messages in the same millisecond.

3) Segment 3, 4 bits: identifies the session type. Four bits can identify at most 16 session types, enough to cover common ones such as single chat, group chat, system messages, chat rooms, customer service, and public accounts.

4) Segment 4, 22 bits: the session ID, for example the group ID in a group chat or the chat room ID in a chat room. Combined with the session type in segment 3, it uniquely identifies a session. Other ID generation algorithms (such as Snowflake) reserve two segments to identify the data center number and the host number. We do not do this; we use these two segments to identify the session instead. ID generation can then be built directly into the business services without caring which host a service runs on, so services can scale out and in statelessly.
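The four segments can be sketched as bit-field packing in Python. This is an illustration of the layout described above, not Rongyun's code; `pack`, `unpack`, and the sample values are assumptions, and the timestamp is assumed to count milliseconds from the Unix epoch.

```python
from datetime import datetime, timezone

TIMESTAMP_BITS, SPIN_BITS, TYPE_BITS, SESSION_BITS = 42, 12, 4, 22

def pack(timestamp_ms, spin_id, session_type, session_id_int):
    """Pack the four fields into one 80-bit integer, timestamp in the highest bits."""
    value = timestamp_ms & ((1 << TIMESTAMP_BITS) - 1)
    value = (value << SPIN_BITS) | (spin_id & ((1 << SPIN_BITS) - 1))
    value = (value << TYPE_BITS) | (session_type & ((1 << TYPE_BITS) - 1))
    value = (value << SESSION_BITS) | (session_id_int & ((1 << SESSION_BITS) - 1))
    return value

def unpack(value):
    """Split an 80-bit integer back into (timestamp_ms, spin_id, session_type, session_id_int)."""
    session_id_int = value & ((1 << SESSION_BITS) - 1)
    value >>= SESSION_BITS
    session_type = value & ((1 << TYPE_BITS) - 1)
    value >>= TYPE_BITS
    spin_id = value & ((1 << SPIN_BITS) - 1)
    value >>= SPIN_BITS
    return value, spin_id, session_type, session_id_int

# Last instant that a 42-bit millisecond counter can represent.
last = datetime.fromtimestamp(((1 << TIMESTAMP_BITS) - 1) / 1000, tz=timezone.utc)
print(TIMESTAMP_BITS + SPIN_BITS + TYPE_BITS + SESSION_BITS)  # total width in bits
print(last.year)

earlier = pack(1_700_000_000_000, 4095, 1, 123456)
later = pack(1_700_000_000_001, 0, 1, 123456)
print(earlier < later)   # a later millisecond always yields a larger ID
print(unpack(earlier))
```

Because the timestamp occupies the highest bits, comparing two IDs as integers compares their timestamps first, which is what makes the IDs time-ordered.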

5. Code implementation

The message ID has a total of 80 bits, and we divide the calculation into two parts: the high 64 bits (denoted as highBits) and the low 16 bits (denoted as lowBits).

The specific code implementation process is roughly as follows.

1) Get the current system timestamp and assign it to highBits, the high 64 bits of the message ID.
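A minimal Python sketch of this step, assuming a millisecond Unix timestamp; `current_millis` and `high_bits` are illustrative names.

```python
import time

def current_millis():
    """Current Unix time in milliseconds."""
    return time.time_ns() // 1_000_000

high_bits = current_millis()
# The value must fit in the 42 bits reserved for the timestamp.
print(high_bits, high_bits.bit_length())
```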

2) Get a spin ID, shift highBits left by 12 bits, and splice the spin ID into the low 12 bits.

The spin ID is an incrementing number in the range 0 to 4095; once it reaches 4095 it starts again from 0.
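One way to sketch the spin ID and the splice in Python is shown below. The `SpinId` class and `append_spin` function are assumptions made for illustration; a lock is used so the counter stays correct when several threads request IDs.

```python
import threading

SPIN_BITS = 12
SPIN_MASK = (1 << SPIN_BITS) - 1  # 0xFFF

class SpinId:
    """Thread-safe counter that cycles through 0..4095."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = 0

    def next(self):
        with self._lock:
            value = self._next
            self._next = (self._next + 1) & SPIN_MASK  # wrap from 4095 back to 0
            return value

def append_spin(high_bits, spin_id):
    """Shift highBits left by 12 bits and put the spin ID in the low 12 bits."""
    return (high_bits << SPIN_BITS) | (spin_id & SPIN_MASK)

spin = SpinId()
values = [spin.next() for _ in range(4097)]
print(values[0], values[4095], values[4096])   # first value, last before wrap, first after wrap
print(hex(append_spin(0x1, 0xABC)))
```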

3) Shift highBits left by 4 bits and splice the session type into the low 4 bits.
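A Python sketch of this step follows. The numeric codes in `SESSION_TYPES` are hypothetical; the article names the kinds of sessions but not their values.

```python
TYPE_BITS = 4
TYPE_MASK = (1 << TYPE_BITS) - 1  # 0xF

# Hypothetical numbering, for illustration only.
SESSION_TYPES = {"single_chat": 1, "group_chat": 2, "chat_room": 3}

def append_session_type(high_bits, session_type):
    """Shift highBits left by 4 bits and put the session type in the low 4 bits."""
    return (high_bits << TYPE_BITS) | (session_type & TYPE_MASK)

print(hex(append_session_type(0x1ABC, SESSION_TYPES["group_chat"])))
```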

4) Take the low 22 bits of the session ID’s hash value as sessionIdInt.
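A Python sketch of this step is below. The article does not name the hash function, so CRC32 from the standard library is used here purely as a stand-in; Python's built-in `hash()` is avoided because string hashes are randomized per process. The function name and sample session ID are made up.

```python
import zlib

SESSION_BITS = 22
SESSION_MASK = (1 << SESSION_BITS) - 1  # 0x3FFFFF

def to_session_id_int(session_id):
    """Keep the low 22 bits of a stable hash of the session ID string."""
    digest = zlib.crc32(session_id.encode("utf-8"))  # assumed hash, not from the article
    return digest & SESSION_MASK

value = to_session_id_int("group-10086")
print(value, value.bit_length())
```

Since only 22 bits of the hash are kept, different sessions can map to the same value; the field narrows down the session rather than naming it exactly.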

5) Shift highBits left by 6 bits and splice the high 6 bits of sessionIdInt into the low 6 bits of highBits.
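A Python sketch of this step, with an illustrative function name and sample value. Python integers are unbounded, so the result is masked to 64 bits to mimic a fixed-width highBits.

```python
def append_session_high(high_bits, session_id_int):
    """Shift highBits left by 6 bits and add the top 6 bits of the 22-bit session value."""
    top6 = (session_id_int >> 16) & 0x3F
    return ((high_bits << 6) | top6) & 0xFFFFFFFFFFFFFFFF  # keep within 64 bits

sample_session = 0x2ABCDE  # example 22-bit value
print(hex(append_session_high(0x1, sample_session)))
```

After this step highBits holds 42 + 12 + 4 + 6 = 64 bits.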

6) Take the low 16 bits of sessionIdInt as lowBits.
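A Python sketch of this step; `low_bits_of` and the sample value are illustrative.

```python
def low_bits_of(session_id_int):
    """Return the low 16 bits of the 22-bit session value."""
    return session_id_int & 0xFFFF

sample_session = 0x2ABCDE  # example 22-bit value
low_bits = low_bits_of(sample_session)
top6 = (sample_session >> 16) & 0x3F

print(hex(low_bits))
# The 6 high bits and the 16 low bits together rebuild the 22-bit value.
print(((top6 << 16) | low_bits) == sample_session)
```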

7) Concatenate highBits and lowBits to obtain the 80-bit message ID, then base-32 encode it to obtain the unique message ID string.
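The concatenation can be sketched in Python as follows; `combine` and the sample values are assumptions for illustration.

```python
def combine(high_bits, low_bits):
    """Concatenate 64 high bits and 16 low bits into one 80-bit integer."""
    return ((high_bits & 0xFFFFFFFFFFFFFFFF) << 16) | (low_bits & 0xFFFF)

message_id = combine(0x0123456789ABCDEF, 0xBCDE)
print(message_id.bit_length() <= 80)
# The same value as 10 raw bytes, most significant byte first.
print(message_id.to_bytes(10, "big").hex())
```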

Encoding rule: from left to right, every 5 bits are converted to an integer, and that integer is used as an index into the table of 32 characters to find the corresponding character.
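The encoding rule can be sketched in Python as below. The order of characters in `ALPHABET` is an assumption (digits 2 to 9 followed by the letters without I and O), so strings produced here need not match IDs produced by Rongyun; `encode` and `decode` are illustrative names.

```python
ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZ"  # assumed table order

def encode(message_id):
    """Encode an 80-bit integer as 16 characters in four dash-separated groups."""
    chars = []
    for shift in range(75, -1, -5):           # leftmost 5 bits first
        index = (message_id >> shift) & 0x1F  # 5-bit value used as the table index
        chars.append(ALPHABET[index])
    text = "".join(chars)
    return "-".join(text[i:i + 4] for i in range(0, 16, 4))

def decode(text):
    """Reverse of encode: rebuild the 80-bit integer from the ID string."""
    value = 0
    for ch in text.replace("-", ""):
        value = (value << 5) | ALPHABET.index(ch)
    return value

print(encode(0))
print(encode((1 << 80) - 1))
print(decode(encode(0x0123456789ABCDEFBCDE)) == 0x0123456789ABCDEFBCDE)
```

With this table order the characters appear in ascending ASCII order, so comparing two encoded strings gives the same result as comparing the underlying 80-bit numbers.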

6. Practical application

PS: If the algorithm ideas and code implementation in the two sections above feel a little abstract, you can look directly at the message IDs actually generated in Rongyun’s IM products.

For example, here are three single-chat message IDs sampled from Rongyun’s demo product, sent by the same user within a short period of time:

BD8U-DG1U-5UI5-L789

BD8U-DU6D-2205-L789

BD8U-FCOJ-LDC5-L789

For example, you can log in directly to Rongyun’s web product at web.sealtalk.im to observe message ID generation on the browser side.

7. Precautions

Points to note about the ID generation method used by Rongyun:

1) Make sure spin ID generation is thread-safe;

2) Avoid generating the same ID under concurrency;

3) The algorithm depends heavily on the system time. If the system clock is set back to an earlier value, duplicate IDs may be generated.
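One possible way to guard against these three issues is sketched below in Python. This is not Rongyun's implementation: the `IdClock` class is hypothetical, and unlike the free-running counter described earlier it restarts the spin ID at 0 on every new millisecond and reuses the last seen timestamp if the clock steps backwards.

```python
import threading
import time

class IdClock:
    """Hands out (timestamp_ms, spin_id) pairs that never repeat within one process."""

    def __init__(self, now_ms=lambda: time.time_ns() // 1_000_000):
        self._now_ms = now_ms          # injectable clock, handy for testing
        self._lock = threading.Lock()  # one lock covers both the clock read and the counter
        self._last_ms = -1
        self._spin = 0

    def next(self):
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                # Clock moved backwards: keep using the last timestamp seen.
                now = self._last_ms
            if now == self._last_ms:
                self._spin = (self._spin + 1) & 0xFFF
                if self._spin == 0:
                    # All 4096 values used in this millisecond: move to the next one.
                    now = self._last_ms + 1
            else:
                self._spin = 0
            self._last_ms = now
            return now, self._spin

ticks = iter([1000, 1000, 999, 1001])  # simulated clock that steps backwards once
clock = IdClock(now_ms=lambda: next(ticks))
pairs = [clock.next() for _ in range(4)]
print(pairs)
```

The pairs come out strictly increasing even though the simulated clock goes from 1000 back to 999, so IDs built from them stay unique and ordered within the process.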

