1. What is Base64?

Base64 is a representation of binary data based on 64 printable characters: A to Z (26), a to z (26), 0-9 (10), and + and / (2). Together with =, which is used only as a padding suffix, that makes 65 characters, as shown below:

let _keyStr = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/='


Purpose: Base64 is often used to represent, transmit, and store binary data in contexts that handle text, including MIME e-mail and some complex data in XML. In MIME e-mail, Base64 encodes a binary byte sequence as text made up of ASCII characters, which prevents content errors caused by mishandling of non-printable characters during transmission.

Note: ASCII characters are the Unicode characters with codes 0 to 127. Byte values from 128 to 255 are outside ASCII, so text-only channels may not handle them reliably.

2. Base64 principle

Excluding the padding character =, Base64 has 64 characters (2^6), so each character can represent one binary value between 000000 and 111111, six bits in total. A byte has eight bits, and the least common multiple of 6 and 8 is 24, so three bytes of data can be represented by exactly four Base64 characters.
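
One consequence of the 3-to-4 ratio is that the length of the encoded text can be computed up front. A minimal sketch, assuming the output is always padded with =; the name base64Length is hypothetical and not part of any library:

```javascript
// Hypothetical helper: length of the padded Base64 text for a given number of input bytes.
function base64Length(byteCount) {
    // Every group of up to 3 bytes becomes exactly 4 characters once padding is applied.
    return Math.ceil(byteCount / 3) * 4;
}

console.log(base64Length(3));  // 4
console.log(base64Length(11)); // 16
```

For instance, the 11 bytes of 'hello world' need 16 characters, the last of which is padding.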


Worked example

We use the word hi to demonstrate: h has ASCII code 104, binary 01101000, and i has ASCII code 105, binary 01101001. When the total number of bytes is not divisible by 3, zero bits are appended until it is, and any resulting 6-bit group that consists entirely of padding (000000) is written as = in the Base64 output, as shown in the figure. Encoding hi therefore yields aGk=.
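
The grouping for this example can be reproduced with a few lines of string manipulation. This is only an illustrative sketch working on binary strings, not how a real encoder would be written; the variable names are my own:

```javascript
const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// 'h' = 104 and 'i' = 105, each written as 8 binary digits.
const bits = [104, 105].map((code) => code.toString(2).padStart(8, '0')).join('');
// Pad with zero bits up to 24 bits (3 bytes).
const padded = bits.padEnd(24, '0');
// Split into four 6-bit groups.
const groups = padded.match(/.{6}/g);

const encoded = groups
    .map((group, index) => {
        // A group that starts at or beyond the end of the real data is pure padding.
        const isPadding = index * 6 >= bits.length;
        return isPadding ? '=' : alphabet[parseInt(group, 2)];
    })
    .join('');

console.log(groups.join(' ')); // 011010 000110 100100 000000
console.log(encoded);          // aGk=
```

Note that the third group (100100) contains two padding bits but still maps to a normal character, k; only the fourth group, which is padding from start to finish, becomes =.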


3. Implementing Base64 encoding and decoding

The window object provides two methods, btoa() and atob(), for encoding and decoding. This article walks you through implementing the same functionality in JS step by step.

Before implementing them, we need a few building blocks.

  • Get the ASCII code of a character: String.charCodeAt(index)
  • Get the character at a given Base64 index: String.charAt(index)

If the three ASCII codes are chr1, chr2, and chr3, how do we obtain the corresponding Base64 indexes (enc1, enc2, enc3, enc4)? That is where bitwise operations come in.

  • >> shifts the bits to the right and fills with 0 on the left, e.g. 104 >> 2 is 01101000 => 00011010
  • & (AND): each result bit is 1 only if the corresponding bits of both operands are 1, otherwise 0, e.g. 104 & 3 is 01101000 & 00000011 => 00000000
  • | (OR): each result bit is 1 if at least one of the corresponding bits of the two operands is 1, otherwise 0, e.g. 01101000 | 00000011 => 01101011
  • << shifts the bits to the left and fills with 0 on the right, e.g. 3 << 4 is 00000011 => 00110000
  • A right shift extracts the leading bits of a byte, while an AND with a mask extracts the trailing bits: 104 & 3 takes the last two bits and 104 & 15 takes the last four bits
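
These operators are easy to check in a console. A small sketch using the same value, 104, plus the left shift that the index formulas rely on; toBin is a hypothetical formatting helper:

```javascript
// Hypothetical helper: format a number as 8 binary digits.
const toBin = (n) => n.toString(2).padStart(8, '0');

console.log(toBin(104 >> 2)); // 00011010  (right shift drops the last two bits)
console.log(toBin(104 & 3));  // 00000000  (mask keeps the last two bits)
console.log(toBin(104 & 15)); // 00001000  (mask keeps the last four bits)
console.log(toBin(104 | 3));  // 01101011  (OR sets the last two bits)
console.log(toBin(3 << 4));   // 00110000  (left shift moves bits toward the front)
```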

By combining these bitwise operations, we can compute the index of each Base64 character:

  • enc1 = chr1 >> 2: the first 6 bits of chr1, obtained by shifting right by two
  • enc2 = ((chr1 & 3) << 4) | (chr2 >> 4): the last 2 bits of chr1 followed by the first 4 bits of chr2
  • enc3 = ((chr2 & 15) << 2) | (chr3 >> 6): the last 4 bits of chr2 followed by the first 2 bits of chr3
  • enc4 = chr3 & 63: the last 6 bits of chr3
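
To see the four formulas at work, here they are applied to one complete three-byte group. The sample input 'Man' is my own choice, not taken from the text above:

```javascript
const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// ASCII codes of 'M', 'a', 'n': 77, 97, 110
const [chr1, chr2, chr3] = [...'Man'].map((ch) => ch.charCodeAt(0));

const enc1 = chr1 >> 2;                         // first 6 bits of chr1
const enc2 = ((chr1 & 3) << 4) | (chr2 >> 4);   // last 2 bits of chr1 + first 4 bits of chr2
const enc3 = ((chr2 & 15) << 2) | (chr3 >> 6);  // last 4 bits of chr2 + first 2 bits of chr3
const enc4 = chr3 & 63;                         // last 6 bits of chr3

const indexes = [enc1, enc2, enc3, enc4];
console.log(indexes.join(', '));                      // 19, 22, 5, 46
console.log(indexes.map((i) => alphabet[i]).join('')); // TWFu
```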

Encoding and decoding Base64 is really just converting between groups of 3 bytes and groups of 4 Base64 characters. We define two methods: encode() and decode().

// Base64 characters, 65 in total ('=' at index 64 is the padding character)
let _keyStr =
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=';

// Encode
function encode(input) {
    let output = '',
        i = 0,
        chr1,
        chr2,
        chr3,
        enc1,
        enc2,
        enc3,
        enc4;
    while (i < input.length) {
        // Read the codes of the next three characters (NaN once past the end of the input)
        chr1 = input.charCodeAt(i++);
        chr2 = input.charCodeAt(i++);
        chr3 = input.charCodeAt(i++);
        // Convert the three bytes into four Base64 indexes
        // First 6 bits of chr1 -> index of Base64 character 1
        enc1 = chr1 >> 2;
        // Last 2 bits of chr1 + first 4 bits of chr2 -> index of Base64 character 2
        enc2 = ((chr1 & 3) << 4) | (chr2 >> 4);
        // Last 4 bits of chr2 + first 2 bits of chr3 -> index of Base64 character 3
        enc3 = ((chr2 & 15) << 2) | (chr3 >> 6);
        // Last 6 bits of chr3 -> index of Base64 character 4
        enc4 = chr3 & 63;

        // Padding: the bit operations above treat a missing byte (NaN) as 0;
        // a group made up only of padding gets index 64, the '=' character
        if (Number.isNaN(chr2)) {
            enc3 = enc4 = 64;
        } else if (Number.isNaN(chr3)) {
            enc4 = 64;
        }
        output =
            output +
            _keyStr.charAt(enc1) +
            _keyStr.charAt(enc2) +
            _keyStr.charAt(enc3) +
            _keyStr.charAt(enc4);
    }
    return output;
}

// Decode
function decode(input) {
    let output = '',
        i = 0,
        chr1,
        chr2,
        chr3,
        enc1,
        enc2,
        enc3,
        enc4;
    while (i < input.length) {
        // Look up the index of each of the next four Base64 characters
        enc1 = _keyStr.indexOf(input.charAt(i++));
        enc2 = _keyStr.indexOf(input.charAt(i++));
        enc3 = _keyStr.indexOf(input.charAt(i++));
        enc4 = _keyStr.indexOf(input.charAt(i++));
        // All 6 bits of enc1 + first 2 bits of enc2 -> 8 bits, i.e. 1 byte
        chr1 = (enc1 << 2) | (enc2 >> 4);
        // Last 4 bits of enc2 + first 4 bits of enc3 -> 8 bits, i.e. 1 byte
        chr2 = ((enc2 & 15) << 4) | (enc3 >> 2);
        // Last 2 bits of enc3 + all 6 bits of enc4 -> 8 bits, i.e. 1 byte
        chr3 = ((enc3 & 3) << 6) | enc4;

        output = output + String.fromCharCode(chr1);
        // Index 64 is the '=' padding character; only append bytes that are not padding
        if (enc3 != 64) {
            output = output + String.fromCharCode(chr2);
        }
        if (enc4 != 64) {
            output = output + String.fromCharCode(chr3);
        }
    }
    return output;
}

console.log(encode('hello world')); // aGVsbG8gd29ybGQ=
console.log(encode('hello world') === btoa('hello world')); // true
console.log(decode('aGVsbG8gd29ybGQ=')); // hello world
console.log(decode('aGVsbG8gd29ybGQ=') === atob('aGVsbG8gd29ybGQ=')); // true


4. Problems and optimization

These methods do not work when a character is not ASCII, or more precisely when its Unicode value is greater than 255. The same is true of atob() and btoa() on the window object.
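
A quick way to see why large values break the encoder: the index formulas assume each code fits in 8 bits. A minimal sketch, using a sample character with code 20320:

```javascript
const keyStr = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=';

const code = '你'.charCodeAt(0); // 20320, far above 255
const index = code >> 2;          // the formula for enc1

console.log(index);                       // 5080, but valid indexes are 0 to 64
console.log(keyStr.charAt(index) === ''); // true: charAt() returns '' for an out-of-range index
```

The first output character is simply lost, so the result can no longer be decoded back to the input.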

The Unicode values of the two characters in 你好 ("hello") are 20320 and 22909, both far greater than 255. Can a number like 20320 be converted by some rule into several numbers between 0 and 255, with decoding following the same rule in reverse? Let's try.

charCodeAt() returns the Unicode value of the character at the specified position. The return value is an integer between 0 and 65535, that is 2^16 - 1, which takes 16 bits, whereas an ordinary single-byte character is 8 bits. At first glance, then, each input character could be represented by 1 to 2 8-bit characters.

There is a catch, however: total bits = 8 bits multiplied by the number of characters, and the decoder has to know that number. Storing it uses up bits in the output, which leaves too few for 16 bits of data, so 1 to 2 characters are not enough and we increase the range to 1 to 3 characters.

When decoding, check the first byte. If it is greater than or equal to 11100000 (224), it starts a 3-character sequence. If it is greater than or equal to 11000000 (192) and less than 11100000 (224), it starts a 2-character sequence. Otherwise it stands alone as 1 character.
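
The rule for the first byte can be written down directly. A minimal sketch; sequenceLength is a hypothetical name used only for illustration:

```javascript
// Hypothetical helper: how many transformed characters belong to the
// sequence that starts with the given first byte.
function sequenceLength(firstByte) {
    if (firstByte >= 0b11100000) return 3; // 224 and above
    if (firstByte >= 0b11000000) return 2; // 192 to 223
    return 1;                              // everything else
}

console.log(sequenceLength(228)); // 3
console.log(sequenceLength(195)); // 2
console.log(sequenceLength(104)); // 1
```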

function encodeTransform(input) {
    let output = '';
    for (let n = 0; n < input.length; n++) {
        // Unicode value of the character at position n, an integer between 0 and 65535
        const c = input.charCodeAt(n);
        if (c < 128) {
            // 0 to 7 bits
            // A value below 128 is an ASCII code and is output as is
            output += String.fromCharCode(c);
        } else if (c > 127 && c < 2048) {
            // 8 to 11 bits
            // Prefix the leading bits with '110': a value from 192 up to, but not including, 224
            output += String.fromCharCode((c >> 6) | 192);
            // Prefix the last 6 bits with '10': a value from 128 up to, but not including, 192
            output += String.fromCharCode((c & 63) | 128);
        } else {
            // 12 to 16 bits, since charCodeAt() returns at most 16 bits
            // Prefix the leading 4 bits with '1110': a value of 224 or more, below 255
            output += String.fromCharCode((c >> 12) | 224);
            // Prefix the middle 6 bits with '10': a value from 128 up to, but not including, 192
            output += String.fromCharCode(((c >> 6) & 63) | 128);
            // Prefix the last 6 bits with '10': a value from 128 up to, but not including, 192
            output += String.fromCharCode((c & 63) | 128);
        }
    }
    return output;
}
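
As a worked example, here is the three-character branch applied by hand to 你 (20320). The cross-check against TextEncoder is my own addition: TextEncoder is a standard API that produces UTF-8, and for this character the byte layout described above yields the same three values.

```javascript
const c = '你'.charCodeAt(0);        // 20320 = 0100 1111 0110 0000

const b1 = (c >> 12) | 224;          // 1110 0100 = 228
const b2 = ((c >> 6) & 63) | 128;    // 1011 1101 = 189
const b3 = (c & 63) | 128;           // 1010 0000 = 160
const bytes = [b1, b2, b3];

// Cross-check with the platform's UTF-8 encoder (assumes TextEncoder is available).
const expected = Array.from(new TextEncoder().encode('你'));

console.log(bytes.join(', '));    // 228, 189, 160
console.log(expected.join(', ')); // 228, 189, 160
```

Each of the three values is now at most 255, so the Base64 index formulas can handle them.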


Decoding is similar: a few boundary checks and bit operations.

function decodeTransform(input) {
    let output = '',
        i = 0,
        c = 0,
        c1 = 0,
        c2 = 0;
    while (i < input.length) {
        c = input.charCodeAt(i);
        if (c < 128) {
            // 1 character: plain ASCII
            output += String.fromCharCode(c);
            i++;
        } else if (c > 191 && c < 224) {
            // 2 characters: last 5 bits of c + last 6 bits of c1
            c1 = input.charCodeAt(i + 1);
            output += String.fromCharCode(((c & 31) << 6) | (c1 & 63));
            i += 2;
        } else {
            // 3 characters: last 4 bits of c + last 6 bits of c1 + last 6 bits of c2
            c1 = input.charCodeAt(i + 1);
            c2 = input.charCodeAt(i + 2);
            output += String.fromCharCode(
                ((c & 15) << 12) | ((c1 & 63) << 6) | (c2 & 63)
            );
            i += 3;
        }
    }
    return output;
}
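
For comparison, the same round trip can be sketched with standard platform APIs instead of the hand-written transform functions above. This swaps in TextEncoder and TextDecoder for the byte conversion and btoa() and atob() for Base64; it assumes an environment where all four exist (modern browsers, recent Node.js), and the function names encodeUnicode and decodeUnicode are hypothetical:

```javascript
// Hypothetical helper: text -> UTF-8 bytes -> Base64
function encodeUnicode(text) {
    const bytes = new TextEncoder().encode(text);  // every value is between 0 and 255
    // One character per byte; spreading is fine for short inputs like this one
    const binary = String.fromCharCode(...bytes);
    return btoa(binary);
}

// Hypothetical helper: Base64 -> UTF-8 bytes -> text
function decodeUnicode(base64) {
    const binary = atob(base64);
    const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
    return new TextDecoder().decode(bytes);
}

console.log(encodeUnicode('你好'));      // 5L2g5aW9
console.log(decodeUnicode('5L2g5aW9')); // 你好
```

Writing the bit operations by hand, as this article does, is still the better way to understand what happens to each byte.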


Here is the complete code, please click to see!

5. Summary

The origin of this article: a friend asked me to write a Base64 conversion page for him, and I used btoa() and atob() without thinking. Later, when he used it, he found that Chinese text could not be encoded and errors were reported. A little embarrassed, I went looking online for a Base64 conversion library and studied it carefully to understand how it works. It turned out to be quite interesting. It involves a lot of bitwise operations, which take some thought to understand, but working through them is rewarding!

If this article is of any help to you, please feel free to give a thumbs up or follow my official account: Xiao PI Ka