Brief introduction to the details of postGRE – SQL UUID generation method

preface

I am curious about gen_random_uuid() when I am writing business SQL recently. I am curious about whether it has strong consistency in the case of high concurrency (that is, ensure primary key uniqueness). I have researched a wave and found many interesting things to discuss. So I wrote this article to talk about it.

The premise condition

I suspect that many readers have no idea what postgreSql is, and that it is difficult to remember the name at first, so people in the industry refer to it as: Post-gres-s-q-l, which I prefer to call PG-SQL, has several prominent features in this data.

  1. Fully open source, community-maintained, and community-active.

  2. NoSQL: JSON, JSONB, XML, HStore native support, external data wrapper for NoSQL databases

  3. Since mysql was acquired by Oracle, the original authors of mysql have also been involved, and we can see a lot of shadow of mysql, such as OLTP and MVCC implementation.

Unfortunately, postgresQL is not as popular as mysql, although it seems to be better than mysql, and mysql seems to have a lot of history problems, but it is still the mainstream database. The topic of today is not about pG-SQL, so let’s focus on postGRE-SQL generating UUID methods.

Purpose of the article

For this article, we will discuss the following topics:

  • Gen_random_uuid()How did you get there?

PostgreSQL 13: Added the built-in function Gen_random_uuid() to generate UUID data, in other words, the previous version required manual installation

  • uuid_generate_v4()Is it possible to repeat?

The answer is yes, even SQL’s original gen_randowm_uUID method can be repeated, but there are some “special conditions”, and we’ll explore why.

  • Compare the implementation differences between gen_randowm_uUID () and uuid_generate_V4 functions.

The main difference is the way to generate random numbers, other work is basically the same.

1. Gen_random_uuid()How did you get there?

If the postGRE-SQL version is used before 13, the following question will be raised:

function gen_random_uuid() does not exist
Copy the code

If you want to be able to use this method, you need to use the following command, using pgcrypto:

CREATE EXTENSION pgcrypto;
Copy the code

The following is the postgresqL-SQL 12 version.

# SELECT gen_random_uuid();
ERROR:  function gen_random_uuid() does not exist
LINE 1: select gen_random_uuid();
               ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
# CREATE EXTENSION pgcrypto;
CREATE EXTENSION
# SELECT gen_random_uuid();
           gen_random_uuid            
--------------------------------------
 19a12b49-a57a4 -f1e-8e66- 152.be08e6165
(1 row)
Copy the code

Note that version 13 does not add this library, but a UUID function that relies on the OSSP library. Gen_random_uuid_v4 () is a function called gen_random_uuid_v4().

1.1 gen_random_uuid_V4 () Official introduction

Function has what function need not say more, the following is about the official introduction:

  We create a uuid_t object just once per session and re-use it for all  
  operations in this module.  OSSP UUID caches the system MAC address and  
  other state in this object.  Reusing the object has a number of benefits:  
 saving the cycles needed to fetch the system MAC address over and over,  
 reducing the amount of entropy we draw from /dev/urandom, and providing a  
 positive guarantee that successive generated V1-style UUIDs don't collide. (On a machine fast enough to generate multiple UUIDs per microsecond, or whatever the system's wall-clock resolution is, we'd otherwise risk collisions whenever random initialization of the uuid_t's clock sequence  
  value chanced to produce duplicates.) 
Copy the code

If you don’t understand, here’s what Google Translate says:

We just create one uuid_t object per session and reuse it for everyone in this module. OSSP UUID Caches the system MAC address and other states in this object. Reusing objects has many benefits: 1. Saves the cycle of getting the system'S MAC address over and over again, 2. Reduces the amount of entropy we extract from /dev/urandom and provides a positive guarantee that continuously generated V1 style UUID does not conflict. (Generate multiple UUids per microsecond on a fast enough machine, or whatever the clock resolution of the system is, otherwise we run the risk of creating duplicates whenever a clock sequence of uuid_t is randomly initialized with conflicting values.)Copy the code

After translation, I found that it is still difficult to say, and I can’t understand it. So HERE I find another article. Here is a brief description of the content and meaning of this article.

Ispostgres’ uuid_generate_V4 securely random?

Question: The general question is that he uses postgresQL V4 UUID () to generate a key token for an Access token, and asks if it is thread safe (uUID is unique), and if it needs to be unique on the application side, which fits the topic of this article.

Here are some personal answers after analysis (everyone has different understanding ability and may not be completely correct) :

  1. First, uuid_generate_V4 relies on the uID-ossp library, and version 13 of Postgres relies on this implementation for its Uuid.

  2. OSSP (version 1.6.2) source code indicates that this code uses **/dev/urandom** on Unix-like systems (CryptGenRandom () on Windows), PRNG with poor reliability based on current time, process ID, and C library rand() function, However, the use of xor for these three results can greatly avoid duplication, and the use of /dev/urandom is sufficient to ensure strong randomness.

  3. But if /dev/urandom fails for some reason (for example, the process is running out of available file descriptors at the time), the library falls back to using only weak PRNG without warning, which is scary, and increases the randomness and likelihood of concurrent use of the same clock node.

In summary, although uID-OSSP can guarantee strong uniqueness under normal circumstances, it is possible to degenerate into weak uniqueness. In the worst case, using machine hour to generate Uuid will result in duplicate UUID. It is advisable not to rely on the strong randomness of postgresQL-generated UUID, but to explicitly use strong randomness sources on the application side.

Some readers will ask what is PRNG? Pseudo Random Number Generator (PRNG), also known as Deterministic Random Bit Generator (DRBG), [1] is an algorithm that generates a sequence of numbers, whose properties are similar to those of random number sequences. The sequence generated by PRNG is not truly random, so it is entirely determined by an initial value called the PRNG’s random seed (but this seed may contain truly random numbers). Although near-true random sequences can be generated by hardware random number generators, pseudo-random number generators are also important in practice because of their speed and reproducibility. [2].

Aside: A lot of strategy games actually use pseudo-random number algorithms.

1.2 summary

With the introduction above, we have a general understanding of the basic implementation of uuid_generate_V4 (). Let’s make a brief summary.

Situation:

  1. Duplicates occur if the machine produces more than one UUID in the same microsecond, and **/dev/urandom** fails.

  2. If HAVE_UUID_OSSP is not defined, you need to call uuid_GENERate_time or UUid_GENERate_random of the OS to generate a UUID.

  3. In most cases, if you only need a random number generator, the official recommendation is to use PGcrypto’s gen_random_uuid().

2. Contrastgen_random_uuid()uuid_generate_v4()

Yes, this is also a reference to the article, and compared many sources found the following answer is clear:

PostgreSQL generates UUids in two different ways: gen_random_uUID and uuid_generate_v4s

The first is the direct difference between the two functions:

  • The gen_random_uuid() extension provides PGcrypto

  • Uuid_generate_v4 () extension provides uuID-OSSP

The fundamental conclusion drawn from this reference is:

  • Uuid_generate_v4 () uses arc4random to determine the random part.

  • Gen_random_uuid () uses Fortuna instead of the implementation.

The difference between these two algorithms will not be expanded here. If there are conditions, you can take a look at the following wiki link introduction. If you can’t access it, you can also access the Internet by yourself.

Refer to the article: en.wikipedia.org/wiki/RC4#RC…

conclusion

In fact, this article is to tell you that the uUID generation method of the database needs to be combined with the actual business to ensure the strong uniqueness of THE UUID. If necessary, it is strongly recommended not to rely on the implementation of the database, especially in the case of a very high concurrency, it is very unreliable.

Finally, if you find any mistakes, you are welcome to correct them.

Further reading

1. How to create a random string that fits a PostgreSQL session ID?

  • www.itranslater.com/qa/details/…