One, foreword

For some images, articles, or user home pages, you need to construct urls to provide externally. When a URL is published, it is usually domain name/path/resource ID. The path is optional. For example, it may be domain name/resource ID when a short link is generated.

For example, nugget URL format: juejin.cn/user/299912… Juejin. Cn/post / 684490… Jane books URL: www.jianshu.com/u/11d3f06af… www.jianshu.com/p/3df395d8a…

How is the resource ID section at the end of these links constructed? It’s impossible to know for sure, but it’s good to guess.

The resource ID of the nugget, hexadecimal word code, 32 bytes, possibly UUID (minus delimiters) or MD5. Both UUID and MD5 are random, so we don’t worry about finding the rule. As the value range is 128bit, the probability of conflict is very small.

Simple book resource ID, hexadecimal code, 12 bytes, that is 48 bits, the value range of more than 200 trillion, enough to allocate, readable. However, the value range of 48bit is not random like the VALUE of UUID, otherwise it is prone to conflict. Therefore, the original ID is guessed by the segment ID or the increment ID. The increment ID is more likely because the algorithm like Snowflake requires a longer number of significant bits. But if the ID is incremented, it will show regularity (the preceding characters remain the same), and the other person can make several requests consecutively, calculate the incremented step, and then exhaust all of its resources. So, after getting the growing ID, there might be one encryption before the hexadecimal code.

So, how do you do encryption?

2. Encrypt the ID

The encryption result should meet the following points: 1. It is difficult to predict the original ID; 2. 2. Map the encrypted ID to the original ID one by one to avoid duplication; 3. Encrypt the integer ID(64-bit, with valid bits <=64). The length of the integer ID remains unchanged after encryption.

  • Point 1: MD5, SHA, AES, DES, etc.
  • Point 2: MD5 and other abstract algorithms are not satisfied, different text may calculate the same hash; For this to work, the function needs to be invertible.
  • The third point: AES, DES also do not meet. AES encryption results in the shortest 16 bytes. DES encrypts the original text less than 8 bytes, and the ciphertext is 8 bytes. To encrypt the original text of 8 bytes, the ciphertext is 16 bytes.

To sum up, we need a function that encrypts and decrypts Long values without “bloating” the ciphertext. Might as well go to see how AES is done, copy a homework.

Three, AES algorithm

Using AES128 as an example, look at the AES encryption steps. Part of the text and text from the article “code algorithm detailed explanation – AES” and “AES algorithm description and C language implementation”, encroachment and deletion.

3.1 Overall Process

The core operations of AES128 are encryption and decryption of 16-byte (128bit) contents. If the original text is longer than 16 bytes, it is grouped into groups of 16 bytes. If the last group is less than 16 bytes, it is completed to 16 bytes.

AES128 goes through 10 rounds of operations, with rounds 1-9 identical and round 10 slightly different. Each round of operation involves four sub-operations, such as byte substitution, row shift, column obfuscation and round key addition. Each suboperation has an inverse operation, and the original text can be decrypted by inverse operation in reverse order.

3.2 Byte Replacement (SubBytes)

Byte substitution requires an S-box, which has two arrays named SBOX and INV_SBOX. If y = SBOX[x], x = INV_SBOX[y]; X = INV_SBOX[SBOX[x]].

For example, a simple 2-order S-box:

    int[] s_box = new int[] {2.0.3.1};
    int[] inv_s_box = new int[] {1.3.0.2};
    for (int x = 0; x < s_box.length; x++) {
        System.out.print(inv_s_box[s_box[x]] + "");
    }
    System.out.println();
    for (int x = 0; x < s_box.length; x++) {
        System.out.print(s_box[inv_s_box[x]] + "");
    }
Copy the code

Output:

0, 1, 2, 3, 0, 1, 2, 3

AES S box is order 8 (8 x 8), just take a byte range (0-255), how to construct S box will not expand this article. Byte substitution operation:

	for (int j = 0; j < 16; j++) {
		state[j] = SBOX[state[j]];
	}
Copy the code

Reverse byte substitution:

	for (int j = 0; j < 16; j++) {
		state[j] = INV_S_BOX[state[j]];
	}
Copy the code

Byte substitution operations provide algorithmic obfuscation. As with code obfuscation, the original text is readable, but after obfuscation it becomes unreadable. However, the pattern does not change after the confusion. For example, the original method foo() becomes a() after the confusion, then all calls to foo() are a(). That is, there are still statistical characteristics after byte substitution.

3.3 ShiftRows

Row displacement is simple: take 16 bytes as a matrix of 4 rows and 4 columns, where 1, 2, and 3 rows are shifted 1, 2, and 3 bytes to the left (or to the right) respectively.

3.4 MixColumns

Column obfuscation is the most complex of all suboperations. As with row displacement, divide 16 bytes into 4 rows and 4 columns; The difference is that column obfuscation is performed on the four bytes of each column (and left multiplied by a 4×4 matrix). The decryption is also left multiplied by the matrix, and the decryption matrix is the inverse of the encryption matrix.

The operation code for a column is as follows:

static inline uint8_t mul2(uint8_t a) {
	return (a & 0x80)? ((a <<1) ^ 0x1b) : (a << 1);
}

/* * Left multiply permutation matrix * [d0] [02 03 01 01] [b0] * [d1] = [01 02 03 01]. [b1] * [d2] [01 01 02 03] [b2] * [d3] [03 01 01 02] [b3] * /
void multiply(uint8_t *b, uint8_t *d) {
	uint8_t t = b[0] ^ b[1] ^ b[2] ^ b[3];
	// d0 = (b0 << 1) + (b1 << 1) + b1 + b2 + b3 = ((b0 + b1) << 1) - b0 + t
	d[0] = mul2(b[0] ^ b[1]) ^ b[0] ^ t;
	d[1] = mul2(b[1] ^ b[2]) ^ b[1] ^ t;
	d[2] = mul2(b[2] ^ b[3]) ^ b[2] ^ t;
	d[3] = mul2(b[3] ^ b[0]) ^ b[3] ^ t;
}
Copy the code

Matrix operation requires addition operation and multiplication operation. The direct integer addition and multiplication of the computer may overflow and thus lose information irreversibly. So AES introduces “Galois domain computing”, interested readers can read the article “Galois domain computing and C language implementation” to learn about it.

Row displacement and column obfuscation together provide the diffusion of the algorithm. The purpose of diffusion is to make a single number in the plaintext affect multiple numbers in the ciphertext so that the statistical characteristics of the plaintext disappear in the ciphertext. Ideally, an avalanche effect is achieved:

Even the smallest change in the input (for example, reversing one bit) can cause a drastic change in the output (for example, reversing half of the bits in the output).

If only the column obfuscation operation is performed, the final effect is that the four groups [0,3], [4,7], [8,11], [12,15] are expanded separately; The line shift is added to achieve the diffusion effect of [0, 15] bytes (if one byte of plaintext changes, all 16 bytes of ciphertext will change randomly). Of course, you have to go through multiple rounds of computation to get the avalanche effect, and it’s not enough to just do one round of computation.

3.5 Adding round Key (AddRoundKey)

The round key addition is the simplest of the four word operations. Specifically, 16 bytes are xOR to the round key. The round key is calculated from the original key, and AES128 does a total of 11 rounds of key addition. Combined with the key operation, the algorithm provides the confidentiality. If there is no key operation, just like the door is not locked, and then the strong security door is useless.

4. Integer encryption scheme

AES is a typical SP cipher, the SP cipher network structure is very clear, S is called confusion layer (nonlinear layer), mainly plays a role of confusion, P is called diffusion layer, mainly plays a role of diffusion. In the AES algorithm, 1, use the S-box to do byte replacement operation to achieve the obfuscation effect; 2. By doing matrix operation (MixColumns) in Galois domain, the 4 bytes in the grouping are diffused; 3. At the same time, the subgroups of MixColumns are interlaced with row displacement, so as to realize the diffusion of the whole group; 4, finally mixed with the “key”, to achieve the confidentiality of the algorithm.

It’s worth noting that AES128’s group size is 16 bytes and can be used as a 4×4 matrix; The group size of an AES256 is 32 bytes and can be used as a 4×8 matrix. MixColumns operation can be viewed as a 4×4 matrix left multiplied by a 4xN matrix. For an 8-byte integer, we can also treat it as a 4×2 matrix for MixColumns.

So, we can use the structure and operation of AES to encrypt a value of type Long:

public class LongEncoder {
    private static final int ROUND = 8;

    private static final byte[] S_BOX = {
            99.124.119.123, -14.107.111, -59. };private static final byte[] KEY = {
            -14.40.52, -119, -126, -47.74.73. };private static byte mul2(byte a) {
        return (byte) (((a & 0x80) != 0)? ((a <<1) ^ 0x1b) : (a << 1));
    }

    public static void mix_column(byte[] s, int i) {
        byte t = (byte) (s[i] ^ s[1 + i] ^ s[2 + i] ^ s[3 + i]);
        byte b0 = (byte) (mul2((byte) (s[i] ^ s[1 + i])) ^ s[i] ^ t);
        byte b1 = (byte) (mul2((byte) (s[1 + i] ^ s[2 + i])) ^ s[1 + i] ^ t);
        byte b2 = (byte) (mul2((byte) (s[2 + i] ^ s[3 + i])) ^ s[2 + i] ^ t);
        byte b3 = (byte) (mul2((byte) (s[3 + i] ^ s[i])) ^ s[3 + i] ^ t);

        s[i] = b0;
        s[1 + i] = b1;
        s[2 + i] = b2;
        s[3 + i] = b3;
    }

    private static void shift_rows(byte[] state) {
        byte t1 = state[7];
        byte t0 = state[6];
        state[7] = state[5];
        state[6] = state[4];
        state[5] = state[3];
        state[4] = state[2];
        state[3] = state[1];
        state[2] = state[0];
        state[1] = t1;
        state[0] = t0;
    }

    public static long encode64(long value) {
        byte[] state = long2Bytes(value);
        for (int i = 0; i < ROUND; i++) {
            for (int j = 0; j < 8; j++) {
                int m = ((i << 3) + j);
                // AddRoundKey and SubBytes
                state[j] = S_BOX[(state[j] ^ KEY[m]) & 0xFF];
            }
            shift_rows(state);
            mix_column(state, 0);
            mix_column(state, 4);
        }
        for (int j = 0; j < 8; j++) {
            state[j] ^= KEY[(ROUND << 3) + j];
        }
        returnbytes2Long(state); }... }Copy the code

In fact, the length (in bytes) of the input and output does not have to be a multiple of 4. For example, for 6 bytes of input, you can do this:

    public static long encode48(long value) {
        byte[] state = long2Bytes(value);
        for (int i = 0; i < ROUND; i++) {
            for (int j = 0; j < 6; j++) {
                int m = ((i << 3) + j);
                // AddRoundKey and SubBytes
                state[j] = S_BOX[(state[j] ^ KEY[m]) & 0xFF];
            }
            // ShiftRows is not needed for 48bit input
            // MixColumns for [0,3], [2,5] can spread the entire input
            mix_column(state, 0);
            mix_column(state, 2);
        }
        for (int j = 0; j < 6; j++) {
            state[j] ^= KEY[(ROUND << 3) + j];
        }
        // The output Long, with the highest two bytes unchanged
        // So if the input is less than 2^48, the output is also less than 2^48 array
        return bytes2Long(state);
    }
Copy the code

After the encrypted value is converted to the base, the ID in the form of a string can be output:

    private static final byte[] HEX_DIGITS = {
            '0'.'1'.'2'.'3'.'4'.'5'.'6'.'7'.'8'.'9'.'a'.'b'.'c'.'d'.'e'.'f'};

    /** * Long values less than 2^48 are converted to a hexadecimal string *@paramN An integer of the type long *@returnA 12-byte string (hexadecimal) */
    public static String long48ToHex(long n) {
        if((n >>> 48) > 0) {throw new IllegalArgumentException(n + " is bigger than 2^48");
        }
        byte[] buf = new byte[12];
        for (int i = 5; i >= 0; i--) {
            int b = (int) n;
            int index = i << 1;
            buf[index] = HEX_DIGITS[(b >> 4) & 0xF];
            buf[index + 1] = HEX_DIGITS[b & 0xF];
            n = n >>> 8;
        }
        return new String(buf);
    }
Copy the code

At this point, we are done encrypting and encoding the integer ID. As for generating integer ids, there are many “ID generation schemes” on the Internet, which we will not expand here.

Five, the summary

Encrypting Long values can be used not only to construct the ID of a resource, but also for other purposes. For example, in my earlier article, “Talking about Unique Device ids,” I mentioned the need to compute another ID(as the UDID) from the primary key ID and return it to the client. The method in this paper can meet the requirements.

The complete code has been uploaded to Github at github.com/No89757/Lon…

[1] Block cipher [2] Galohua domain operation and C language implementation [3] Code algorithm detail — AES [4] AES algorithm description and C language implementation [5] Avalanche effect [6] Talk about unique device ID