background

The coding principle of Base64 is more explained online, but the decoding principle is less explained, and the internal implementation principle is not analyzed. Want to thoroughly understand the base64 encoding and decoding principle, please patiently read this article, you will gain.

It deals with algorithms and logical operation concepts

In the process of exploring the coding and decoding principles of Base64, we first need to understand the concepts of algorithms and logical operations that will be used below, so as to truly understand the coding and decoding principles of Base64, experience the subtlety of the algorithm, and even get unexpected harvest in the process of thinking. Those who do not know the Base64 encoding table and ascII encoding table can go directly to the end of the article.

Short division

A short division operation divides a divisor by a prime number that can be divisible by it, and so on, until the quotient is prime. By short division, a decimal number can be continuously divided by 2 to get multiple remainders. Finally, the remainder is permutations from bottom to top to get the binary number. Let’s take the ascII code 110 corresponding to the character N as an example.

110/2 = 55... 0 55/2 = 27... 1 27/2 = 13... 1 13/2 = 6... 1 6/2 = 3... 0 3/2 = 1... 1 1/2 = 0... 1Copy the code

By permutation and combination of remainder from bottom to top, the ascII code 110 to binary corresponding to character N is 1101110. Since a byte corresponds to 8 bits (bits), 0 needs to be added forward to complement 8 bits to get 01101110. The other characters are equally valid.

Sum by weight

The sum is expanded by weight, 8-bit binary numbers from right to left in ascending order from 0 to 7, base * base number, summing from left to right, resulting in the corresponding decimal number. Let’s take the binary number 01101110 to base 10 as an example:

01101110 = 0 * 20 + 1 * 21 + 1 * 22 + 1 * 23 + 0 * 24 + 1 * 25 + 1 * 26 + 0 * 27

A concept

In a binary number system, each 0 or 1 is a bit, also called a storage unit. A bit is the smallest unit of data storage. Eight bits are called a Byte.

Shift operator

In programming, the shift operator is a type of bit-operation operator. Shift operators can shift numbers on a binary basis. According to the direction of translation and the rules for filling numbers, there are three types: <<(left move), >>(right move with sign) and >>>(unsigned right move). We operate on positive numbers again in base64 encoding and decoding, so we use only the <<(left shift) and >>(signed right shift) operators.

  1. Left shift operation: it is to move the operand of a binary bit to the left according to the specified number of moving bits.
  2. Right shift operation: it is to move the operand of a binary bit to the right according to the specified number of moving bits. The moved bits are discarded, and the left bits are filled with 0, or sign bits, depending on the machine. In machines that use complement as the machine number, the sign bit for positive numbers is 0 and the sign bit for negative numbers is 1.

We use plain English to describe the left shift, there are 8 seats, there are 8 people sitting, in the case of 8 seats not moving, now I ask these 8 people to move 2 seats to the left, so the two people on the far left stand up, there are no seats to sit on, and the two seats on the far right open up. The shift operation is equivalent to the person standing up out, leaving the empty space to fill the 0.

    / / shift to the left
    01101000 << 2 -> 101000(left side removed is discarded) ->10100000All seats on the right will be filled0)
    / / moves to the right
    01101000 >> 2 -> 011010(The right side is removed and discarded) ->00011010All seats on the left will be filled0)
Copy the code

And operation, or operation

And operation, or operation is a basic logical operation in the computer.

  1. And operation: the symbol is &. Operation rule: The result is “1” only when the two digits are “1” at the same time; otherwise, it is 0
  2. Or operation: the symbol is |. Operation rule: if one of the two digits is “1”, the result is “1”, otherwise, it is 0

What is Base64 encoding

Base64 encoding is to divide the string into 4 6-bit bytes (6-bit valid bytes, the two leftmost bytes are always 0, which are actually 8-bit bytes) in every 3 8-bit byte subsequences, and then search the encoding index table of Base64 for the obtained subsequences. An encoding in which the corresponding characters are concatenated into a new string.

The splitting process of every three 8-bit byte subsequences into four 6-bit bytes is shown in the figure below:

Why is base64 encoded 3/4 times larger

Since the largest common multiple of 6 and 8 is 24, three eight-bit bytes can be split into four six-bit bytes, three8 = 64. In a computer, since it takes eight memory units to store a byte, we need to add two zeros to the six bits to make up eight bits. As shown below:Obviously, 32 storage units are needed, which is 3/4 times the original 24. Now you can see why base64 is encoded to be four thirds the size.

Why base64?

This is because there are 2 to the power of 6 binary digits (00000000-00111111), which are 64 binary digits between 0 and 63.

Isn’t it said that a byte is represented in 8-bit binary, why not 2 to the eighth?

Since the first two bits of the 8-bit binary we get are always zeros, and there are only six truly significant bits, the number of binary numbers we can get is only 2 to the sixth.

What are the 64 Base64 characters?

The encoding index table of Base64, the characters are “A-Z, A-Z, 0-9, +, /” 64 printable characters to represent the 64 binary numbers (00000000-00111111). namely

    let base64EncodeChars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
Copy the code

Coding principle

Let’s think about it for a second, what can we do to split 3 bytes into 4 bytes? What is the difference between your idea of implementation and mine, and what kind of spark will we touch?

The flow chart

Train of thought

Analyze the mapping: ABC -> xyzi. We analyze the process by adding indexes from high to low

  • X: (add two zeros before it) The first six digits of a => 00A7a6a5a4a3a2
  • Y: the last two digits of a + the first four digits of B => 00A1a0b7b6b5b4
  • Z: (add two zeros before it) The last four digits of B + the first two digits of C => 00B3b2b1b0c7c6
  • I: (add two zeros before it) The last six digits of C => 00C5c4c3c2c1c0

Through the above mapping, we can easily get the following implementation ideas:

  1. Converts the ascII encoding of a character to an 8-bit binary number
  2. Do the following for every three 8-bit binary numbers
    • Shift the first number right by 2 bits to get the first 6-bit significant binary number
    • Shift the first number & 0x3 by 4 bits to the left to get the first and second significant bits of the second 6-bit binary number, shift the second number & 0xf0 by 4 bits to the right to get the last four significant bits of the second 6-bit significant bit binary number, and get the second 6-bit significant bit binary number
    • Shift the second number & 0xf by 2 bits to the left to get the first four significant bits of the third 6-bit significant binary number, shift the third number & 0xC0 by 6 bits to the right to get the last two significant bits of the third 6-bit significant binary number, and take and get the third 6-bit significant binary number
    • Add the third number & 0x3f to get the fourth 6-bit significant binary number
  3. Converts the obtained 6-bit valid binary number to decimal and looks for the corresponding Base64 character

Let’s take the HAO string as an example and look at the base64 encoding process. Let’s do the above conversion through code logic analysis.

Code implementation

// Enter a string
let str = 'hao'
// base64 Is a character string
let base64EncodeChars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
// Define binary number of input and output bytes
let char1, char2, char3, out1, out2, out3, out4, out
// Convert the ascII encoding of a character to an 8-bit binary number
char1 = str.charCodeAt(0) & 0xff // 104  01101000
char2 = str.charCodeAt(1) & 0xff // 97  01100001
char3 = str.charCodeAt(2) & 0xff // 111  01101111
// Outputs a 6-bit valid binary number of bytes
6out1 = char1 >> 2 / / 26, 011010
out2 = (char1 & 0x3) < <4 | (char2 & 0xf0) > >4 / / 6, 000110
out3 = (char2 & 0xf) < <2 | (char3 & 0xc0) > >6 / / 5, 000101
out4 = char3 & 0x3f / / 47, 101111

out = base64EncodeChars[out1] + base64EncodeChars[out2] + base64EncodeChars[out3] + base64EncodeChars[out4] // aGFv
Copy the code

Algorithm analysis

  1. out1: char1 >> 2
    01101000 - > 00011010Copy the code
  2. out2 = (char1 & 0x3) << 4 | (char2 & 0xf0) >> 4
    // And operation 01101000 01100001 00000011 11110000 -------- -------- 00000000 01100000 // After the shift operation 00000000 00000110 // or operation 00000000-00000110-00000110Copy the code

Same with the third character and the fourth character

Tidy up the above code to extend the multi-character string

// Enter a string
let str = 'haohaohao'
// base64 Is a character string
let base64EncodeChars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

// Get the length of the string
let len = str.length
// The current character index
let index = 0
// Outputs a string
let out = ' '
while(index < len) {
    // Define binary number of input and output bytes
    let char1, char2, char3, out1, out2, out3, out4
    // Convert the ascII encoding of a character to an 8-bit binary number
    char1 = str.charCodeAt(index++) & 0xff // 104  01101000
    char2 = str.charCodeAt(index++) & 0xff // 97  01100001
    char3 = str.charCodeAt(index++) & 0xff // 111  01101111
    // Outputs a 6-bit valid binary number of bytes
    out1 = char1 >> 2 / / 26, 011010
    out2 = (char1 & 0x3) < <4 | (char2 & 0xf0) > >4 / / 6, 000110
    out3 = (char2 & 0xf) < <2 | (char3 & 0xc0) > >6 / / 5, 000101
    out4 = char3 & 0x3f / / 47, 101111

    out = out + base64EncodeChars[out1] + base64EncodeChars[out2] + base64EncodeChars[out3] + base64EncodeChars[out4] // aGFv
}
Copy the code

Special handling is required when the original string length is not an integer multiple of 3

. char1 = str.charCodeAt(index++) &0xff // 104  01101000
    if (index == len) {
        out2 = (char1 & 0x3) < <4
        out = out + base64EncodeChars[out1] + base64EncodeChars[out2] + '= ='
        return out
    }
    char2 = str.charCodeAt(index++) & 0xff // 97  01100001
    if (index == len) {
        out1 = char1 >> 2 / / 26, 011010
        out2 = (char1 & 0x3) < <4 | (char2 & 0xf0) > >4 / / 6, 000110
        out3 = (char2 & 0xf) < <2
        out = out + base64EncodeChars[out1] + base64EncodeChars[out2] + base64EncodeChars[out3] + '='
        return out
    }
    ...

Copy the code

All the code

function base64Encode(str) {
    // base64 Is a character string
    let base64EncodeChars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

    // Get the length of the string
    let len = str.length
    // The current character index
    let index = 0
    // Outputs a string
    let out = ' '
    while(index < len) {
        // Define binary number of input and output bytes
        let char1, char2, char3, out1, out2, out3, out4
        // Convert the ascII encoding of a character to an 8-bit binary number
        char1 = str.charCodeAt(index++) & 0xff
        out1 = char1 >> 2
        if (index == len) {
            out2 = (char1 & 0x3) < <4
            out = out + base64EncodeChars[out1] + base64EncodeChars[out2] + '= ='
            return out
        }
        char2 = str.charCodeAt(index++) & 0xff
        out2 = (char1 & 0x3) < <4 | (char2 & 0xf0) > >4 
        if (index == len) {
            out3 = (char2 & 0xf) < <2
            out = out + base64EncodeChars[out1] + base64EncodeChars[out2] + base64EncodeChars[out3] + '='
            return out
        }
        char3 = str.charCodeAt(index++) & 0xff
        // Outputs a 6-bit valid binary number of bytes
        out3 = (char2 & 0xf) < <2 | (char3 & 0xc0) > >6
        out4 = char3 & 0x3f

        out = out + base64EncodeChars[out1] + base64EncodeChars[out2] + base64EncodeChars[out3] + base64EncodeChars[out4]
    }
    return out
}
base64Encode('haohao') // aGFvaGFv
base64Encode('haoha') // aGFvaGE=
base64Encode('haoh') // aGFvaA==
Copy the code

The decoding principle

By backward derivation, the binary number of every 4 6-bit significant bits is merged into 3 8-bit binary numbers, which are mapped to corresponding characters according to THE ascII encoding and then concatenated strings

Train of thought

Analyze the mapping

  • A: The last six digits of x + y the third and fourth digits => x5x4x3x2x1x0y5y4
  • B: last four digits of y + z The third, fourth, fifth, and sixth digits => y3y2y1y0z5z4z3z2
  • C: The last two digits of Z + the last six digits of I => Z1z0i5i4i3i2i1i0
  1. Converts the index of a character’s Base64 character set to a 6-bit valid binary number
  2. Do the following for every four 6-bit significant binary numbers
    1. The first binary is shifted two bits to the left to get the first six bits of the new binary, and the second binary is shifted four bits to the right after &0x30, or the operation to get the first new binary
    2. The second binary number & 0xf is left shifted by 4 bits, and the third binary number & 0x3c is right shifted by 2 bits, or the operation yields the second new binary number
    3. The second binary number &0x3 is shifted 6 bits to the left, and the fourth binary number or operation yields the second new binary number
  3. Concatenate string after mapping to corresponding character according to ascII encoding

Code implementation

// base64 Is a character string
let str = 'aGFv'
// Base64 character set
let base64CharsArr = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'.split(' ')
// Get the index value
let char1 = base64CharsArr.findIndex(char= > char==str[0]) & 0xff / / 26, 011010
let char2 = base64CharsArr.findIndex(char= > char==str[1]) & 0xff / / 6, 000110
let char3 = base64CharsArr.findIndex(char= > char==str[2]) & 0xff / / 5, 000101
let char4 = base64CharsArr.findIndex(char= > char==str[3]) & 0xff / / 47, 101111
let out1, out2, out3, out
/ / operation
out1 = char1 << 2 | (char2 & 0x30) > >4
out2 = (char2 & 0xf) < <4 | (char3 & 0x3c) > >2
out3 = (char3 & 0x3) < <6 | char4
console.log(out1, out2, out3)
out = String.fromCharCode(out1) + String.fromCharCode(out2) + String.fromCharCode(out3)
Copy the code

When a ‘=’ is used to fill the position

function base64decode(str) {
    // Base64 character set
    let base64CharsArr = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'.split(' ')
    let char1 = base64CharsArr.findIndex(char= > char==str[0])
    let char2 = base64CharsArr.findIndex(char= > char==str[1])
    let out1, out2, out3, out
    if (char1 == -1 || char2 == -1) return out
    char1 = char1 & 0xff
    char2 = char2 & 0xff
    let char3 = base64CharsArr.findIndex(char= > char==str[2])
    // If the third bit is not in the base64 table, only the first string is concatenated
    if (char3 == -1) {
        out1 = char1 << 2 | (char2 & 0x30) > >4
        out = String.fromCharCode(out1)
        return out
    }
    let char4 = base64CharsArr.findIndex(char= > char==str[3])
    // If the third bit is not in the base64 table, only the first and second strings are concatenated
    if (char4 == -1) {
        out1 = char1 << 2 | (char2 & 0x30) > >4
        out2 = (char2 & 0xf) < <4 | (char3 & 0x3c) > >2
        out = String.fromCharCode(out1) + String.fromCharCode(out2)
        return out
    }
    / / operation
    out1 = char1 << 2 | (char2 & 0x30) > >4
    out2 = (char2 & 0xf) < <4 | (char3 & 0x3c) > >2
    out3 = (char3 & 0x3) < <6 | char4
    console.log(out1, out2, out3)
    out = String.fromCharCode(out1) + String.fromCharCode(out2) + String.fromCharCode(out3)
    return out
}
Copy the code

Decoded the entire string, cleaned up the code

function base64decode(str) {
    // Base64 character set
    let base64CharsArr = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'.split(' ')
    let i = 0
    let len = str.length
    let out = ' '
    while(i < len) {
        let char1 = base64CharsArr.findIndex(char= > char==str[i])
        i++
        let char2 = base64CharsArr.findIndex(char= > char==str[i])
        i++
        let out1, out2, out3
        if (char1 == -1 || char2 == -1) return out
        char1 = char1 & 0xff
        char2 = char2 & 0xff
        let char3 = base64CharsArr.findIndex(char= > char==str[i])
        i++
        // If the third bit is not in the base64 table, only the first string is concatenated
        out1 = char1 << 2 | (char2 & 0x30) > >4
        if (char3 == -1) {
            out = out + String.fromCharCode(out1)
            return out
        }
        let char4 = base64CharsArr.findIndex(char= > char==str[i])
        i++
        // If the third bit is not in the base64 table, only the first and second strings are concatenated
        out2 = (char2 & 0xf) < <4 | (char3 & 0x3c) > >2
        if (char4 == -1) {
            out = out + String.fromCharCode(out1) + String.fromCharCode(out2)
            return out
        }
        / / operation
        out3 = (char3 & 0x3) < <6 | char4
        console.log(out1, out2, out3)
        out = out + String.fromCharCode(out1) + String.fromCharCode(out2) + String.fromCharCode(out3)
    }
    return out
}
base64decode('aGFvaGFv') // haohao
base64decode('aGFvaGE=') // haoha
base64decode('aGFvaA==') // haoh
Copy the code

The core of the decoding above is the mapping between characters and base64 character set index. The method of mapping Base64 character index using AccII encoding index has been seen on the Internet

let base64DecodeChars = [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1.62, -1, -1, -1.63.52.53.54.55.56.57.58.59.60.61, -1, -1, -1, -1, -1, -1, -1.0.1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21.22.23.24.25, -1, -1, -1, -1, -1, -1.26.27.28.29.30.31.32.33.34.35.36.37.38.39.40.41.42.43.44.45.46.47.48.49.50.51, -1, -1, -1, -1, -1]
// 
let char1 = 'hao'.charCodeAt(0) // h -> 104
base64DecodeChars[char1] // 33 -> base64 encoding table h
Copy the code

Thus, base64 decodechars control accII encoding table index storage is base64 encoding table of the corresponding character index.

conclusion

Base64 encoding may seem strange, because most encoding involves converting characters to binary, and the conversion from binary to character is called decoding. The Base64 concept is exactly the opposite, from binary to character called encoding, from character to binary called decoding. Base64 is a data encoding method, which can be used for simple encryption. We can change the Base64 encoding mapping order to form our own unique encryption algorithm for encryption and decryption.

Code table