Introduction to the

Argon2 is a key derivation function that was chosen as the winner of the Cryptography Hashing Competition in July 2015. It was designed by Alex Biryukov, Daniel Dinu, and Dmitry Khovratovich of the University of Luxembourg, Argon2 implementations are usually released under the Creative Commons CC0 License (i.e., public domain) or Apache License 2.0, and are available in three related versions, Argon2d, Argon2i, and Argon2id.

This article will discuss the principles and use of Argon2.

Key derivation function Key Derivation Function

In cryptography, a key derivation function (KDF) is a cryptographic hash function that uses a pseudo-random function to derive one or more keys from a secret value such as a master key, password, or password. KDF can be used to stretch a key into a longer key, or to obtain a key in a desired format, such as converting the result of diffie-Hellman key exchange into a symmetric key for AES.

Password Hashing Competition

Although cryptography is the study of password, but its encryption algorithm is the more open the better, only open to review the algorithm, only through thorough research, can let the algorithm be used and spread in the industry.

The most famous cryptographic algorithm competition is certainly the one held by NIST in 2001 to specify the standard AES algorithm, the purpose of the competition is to find the latest encryption algorithm to replace the old DES algorithm. In this competition, many excellent algorithms emerged, Including CAST-256, CRYPTON, DEAL, DFC, E2, FROG, HPC, LOKI97, MAGENTA, MARS, RC6, Rijndael, SAFER+, Serpent, and Twofish. Finally Rijndael algorithm was selected as the final IMPLEMENTATION of AES algorithm.

PHC is also one of these algorithms contests, and unlike NIST, it is an unofficial contest organized by cryptographers. It was launched by Jean-Philippe Aumasson in the fall of 2012.

In the first quarter of 2013, a call for submissions was issued and 24 submissions were received by the closing date of March 31, 2014. In December 2014, a shortlist of nine was decided. In July 2015, Argon2 was announced as the winner.

Argon2 algorithm

The Argon2 is a simple design designed to achieve the highest memory fill rate and efficient utilization of multiple cells, while also providing defense against Tradeoff Attacks (by leveraging the processor’s cache and memory).

The Argon2 comes in three variants. Argon2i, Argon2d and Argon2id. Argon2d is faster and uses data-dependent memory access, which makes it highly resistant to GGPU hacking attacks and suitable for applications without the threat of Side-channel Timing attacks (such as cryptocurrencies).

Argon2i uses data-independent memory access, which is preferred for password hashing and password-based key derivation algorithms, and is characterized by slower speeds because it runs more processing logic on memory to prevent tradeoff attacks.

Argon2id is a hybrid of Argon2i and Argon2d that combines data-dependent and data-independent memory access to protect against both Side-channel Timing attacks and GPU hacking attacks.

Argon2 input parameter

Argon2 has two types of input parameters: primary inputs and secondary inputs.

Primary inputs include the messages P and nonce S for password and salt to encrypt.

The length of P is 0 to 232-1 bytes, and the length of S is 8 to 232-1 bytes (16 bytes are recommended for password hash).

They are called primary inputs because these two parameters must be entered.

The remaining parameters are called secondary inputs, which include:

  • The degree of parallelism, p, indicates how many independent computing chains can run simultaneously, ranging from 1 to 224-1.
  • The Tag length τ ranges from 4 to 232-1 bytes. ‘
  • Memory size m, in megabytes, from 8P to 232-1.
  • The number of iterators, t, improves the running speed. The value ranges from 1 to 232-1.
  • Version number v, one byte, 0x13.
  • The safe value K is 0 to 232-1 bytes long.
  • Additional data X, 0 to 232-1 bytes in length.
  • Type of Argon2. 0 indicates Argon2d, 1 indicates Argon2i, and 2 indicates Argon2id.

These inputs can be represented by the following code:

Inputs: password (P): Bytes (0.. 232-1) Password (or message) to be hashed salt (S): Bytes (8.. 232-1) Salt (16 bytes recommended for password hashing) parallelism (p): Number (1.. 224-1) Degree of parallelism (i.e. number of threads) tagLength (T): Number (4.. 232-1) Desired number of returned bytes memorySizeKB (m): Number (8p.. 232-1) Amount of memory (in kibibytes) to use iterations (t): Number (1.. 232-1) Number of iterations to perform version (v): Number (0x13) The current version is 0x13 (19 decimal) key (K): Bytes (0.. 232-1) Optional key (Errata: PDF says 0.. 32 bytes, RFC says 0.. 232 bytes) associatedData (X): Bytes (0.. 232-1) Optional arbitrary extra data hashType (y): Number (0=Argon2d, 1=Argon2i, 2=Argon2id) Output: tag: Bytes (tagLength) The resulting generated bytes, tagLength bytes longCopy the code

Processing flow

Let’s take a look at the algorithm flow of non-parallel Argon2:

Non-parallel Argon2 is the simplest.

In the figure above, G represents a compression function that takes two 1024 bytes of input and one 1024 bytes of output.

I represents the number of steps performed, and φ(I) above is the input, taken from memory space.

As a memory-hard algorithm, one important job is to build initial memory. Next, let’s look at how to build the initial memory space.

First, we need to build H0, which is a 64-byte block value, from which we can build more blocks. The formula for calculating H0 is as follows:

Tau H0 = H (p, m, t, v, y, ⟩ ⟨ p, p, ⟨ S ⟩, S, ⟩ ⟨ K, K, ⟨ ⟩ X, X)

It is the H function of the input parameter we mentioned earlier. H0 has a size of 64 bytes.

Take a look at H0 code generation:

Generate initial 64-byte block H0. All the input parameters are concatenated and input as a source of additional entropy. Errata: RFC says H0 is 64-bits; PDF says H0 is 64-bytes. Errata: RFC says the Hash is H^, The PDF says it's ℋ (but doesn't document what ℋ is). It's actually Blake2b. Variable length items are prepended with Buffer ← Parallelism ∥ tagLength ∥ memorySizeKB ∥ Iterations ∥ Version ∥ HashType ∥ Length(password) ∥ password ∥ Length(salt) ∥ salt ∥ Length(key) ∥ key ∥ Length(associatedData) ∥ AssociatedData H0 ← Blake2b(buffer, 64) // Default Hash size of Blake2b is 64-bytesCopy the code

For input parameter parallelism p, the memory needs to be divided into a memory matrix B[I][j], which is a matrix of P rows.

Calculate the value of matrix B:

Where H ‘is a variable-length hash algorithm based on H.

Let’s give an implementation of this algorithm:

Function Hash(message, digestSize) Inputs: message: Bytes (0.. 232-1) Message to be hashed digestSize: Integer (1.. 232) Desired number of bytes to be returned Output: digest: Bytes (digestSize) The resulting generated bytes, digestSize bytes long Hash is a variable-length hash function, built using Blake2b, capable of generating digests up to 232 bytes. If the requested digestSize is 64-bytes or lower, Then we use Blake2b directly if (digestSize <= 64) then return Blake2b(digestSize ∥ message, digestSize) //concatenate 32-bit little endian digestSize with the message bytes For desired hashes over 64-bytes (e.g. 1024 bytes for Argon2 blocks), we use Blake2b to generate twice the number of needed 64-byte blocks, and then only use 32-bytes from each block Calculate the number of whole blocks (knowing we're only going to use 32-bytes from each) r ← Ceil(digestSize/32)-1; Generate r whole blocks. Initial block is generated from message V1 ← Blake2b(digestSize ∥ message, 64); Subsequent blocks are generated from previous blocks for I ← 2 to r do Vi ← Blake2b(vi-1, 64) Generate the final (possibly partial) block partialBytesNeeded ← digestSize -- 32*r; Vr + 1 please Blake2b (Vr, partialBytesNeeded) Concatenate the first 32-bytes of each block Vi (except the possibly partial last block, Which we take the whole thing) Let Ai represent the lower 32-bytes of block Vi return A1 ∥ A2 ∥... ∥ Ar ∥ Vr + 1Copy the code

If we have more than one iteration, that is to say t > 1, we calculate the B of the next iteration as follows:


B t [ i ] [ 0 ] = G ( B t 1 [ i ] [ q 1 ] . B [ i ] [ j ] ) The radius B t 1 [ i ] [ 0 ] B^{t}[i][0]=G\left(B^{t-1}[i][q-1], B\left[i^{\prime}\right]\left[j^{\prime}\right]\right) \oplus B^{t-1}[i][0]


B t [ i ] [ j ] = G ( B t [ i ] [ j 1 ] . B [ i ] [ j ] ) The radius B t 1 [ i ] [ j ] B^{t}[i][j]=G\left(B^{t}[i][j-1], B\left[i^{\prime}\right]\left[j^{\prime}\right]\right) \oplus B^{t-1}[i][j]

After finally traversing T times, we get the final B:


B final  = B T [ 0 ] [ q 1 ] The radius B T [ 1 ] [ q 1 ] The radius The radius B T [ p 1 ] [ q 1 ] B_{\text {final }}=B^{T}[0][q-1] \oplus B^{T}[1][q-1] \oplus \cdots \oplus B^{T}[p-1][q-1]

The final output is:


T a g please H ( B final  ) \mathrm{Tag} \leftarrow H^{\prime}\left(B_{\text {final }}\right)

This logic can also be expressed in code:

Calculate number of 1 KB blocks by rounding down memorySizeKB to the nearest multiple of 4*parallelism kibibytes BlockCount please Floor (memorySizeKB, 4* Parallelism) Allocate two-dimensional array of 1 KiB blocks (Parallelism rows X columnCount columns) columnCount ← blockCount / parallelism; //In the RFC, columnCount is referred to as q Compute the first and second block (i.e. column zero and one ) of each lane (i.e. row) For I ← 0 to parallelism-1 do for each row Bi[0] ← Hash(H0 ∥ 0 ∥ I, //Generate a 1024-byte digest Bi[1] ← Hash(H0 ∥ 1 ∥ I, Generate a 1024-byte digest Compute Remaining columns of each lane for I ← 0 to ParallelISM-1 do //for each row For J ← 2 to columncount-1 do //for each subsequent column // I 'and J' indexes Depend if it's Argon2i, Argon2d, Or Argon2id (See section 3.4) I ', j '← GetBlockIndexes(I, j) //the GetBlockIndexes function is not defined Bi[j] = G(Bi[j-1], Bi '[J']) // The G hash function is not defined Further passes when iterations > 1 for nIteration ← 2 to iterations do for I ← 0 to ParallelISM -1 do for each row for J ← 0 to columnCount-1 do //for each subsequent column // I 'and J' indexes Depend if it's Argon2i, Argon2d, or Argon2id (See Section 3.4) I ', j '← GetBlockIndexes(I, J) if j == 0 then Bi[0] = Bi[0] xor G(Bi[1], Bi '[j]) else Bi[j] = Bi[j] xor G(Bi[j-1], Bi '[j']) Compute final block C as the XOR of the last column of each row C ← B0[columnCount-1] for I ← 1 to Parallelism -1 do C ← C xor Bi[columnCount-1] Compute output tag return Hash(C, tagLength)Copy the code

This article is available at www.flydean.com/40-argon2/

The most popular interpretation, the most profound dry goods, the most concise tutorial, many tips you didn’t know waiting for you to discover!

Welcome to pay attention to my public number: “procedures those things”, understand technology, more understand you!