How to convert UTF-8 to Base64 encoding (Implementation principle of Base64 encoding)

Computers store everything as binary. UTF-8 and Base64 are two different ways of encoding that data. This post looks at how the conversion to Base64 works and what its encoding principle is.

– The difference between UTF-8 and Base64 encoding

A Chinese character in UTF-8 usually takes 3 bytes of 8 bits each, 3 * 8 = 24 bits in total.

In Base64, the same Chinese character takes four characters (four bytes). Each one carries only six bits of data, with two zeros added in front to fill it out to eight bits, so 4 * 6 = 24 bits of data.
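
The size difference can be checked directly in Node. This is a minimal sketch that assumes the character 晓 (the bytes e6 99 93) as sample input; the variable names are only illustrative.

```javascript
// Compare the size of one Chinese character in UTF-8 and in Base64.
const ch = "晓";

const utf8Bytes = Buffer.from(ch, "utf8");       // the UTF-8 bytes of the character
const base64Text = utf8Bytes.toString("base64"); // the same bytes as Base64 text

// Bytes and total bits in UTF-8
console.log(utf8Bytes.length, utf8Bytes.length * 8);   // 3 24
// Characters and data bits in Base64 (6 data bits per character)
console.log(base64Text.length, base64Text.length * 6); // 4 24
```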

– How do I convert UTF-8 encoding to Base64 encoding

For example, use the Node implementation to convert the words “Xiaolu XL” to Base64 encoding

Base64 encoding mapping table

Implementation steps:

> Buffer.from("Xiao Lou xl")
<Buffer e6 99 93 e9 9c b2 78 6c>

// xiao e6 99 93
//
// x 78
// l 6c

// Node is a hexadecimal system, which needs to be converted to base 2 first
// Turn "xiao" first
> (0xe6).toString(2)
'11100110'
> (0x99).toString(2)
'10011001'
> (0x93).toString(2)
'10010011'

// Concatenate the binary strings
// '11100110' + '10011001' + '10010011'

// Split the concatenated binary string into 6-bit groups
// '111001', '101001', '100110', '010011'

// Pad each group with two leading zeros to make 8 bits
// '00111001', '00101001', '00100110', '00010011'

// Convert binary to decimal

> (0b00111001).toString(10)
'57'
> (0b00101001).toString(10)
'41'
> (0b00100110).toString(10)
'38'
> (0b00010011).toString(10)
'19'

// Look up each value in the Base64 table

57 -> 5
41 -> p
38 -> m
19 -> T
// Concatenating the results gives '5pmT'
// Verify that the result is correct

> Buffer.from("晓").toString('base64')
'5pmT'
// The remaining characters are left as practice: just follow the steps above one by one

– Steps summary:

Get the bytes from Node (shown in hexadecimal)
Convert each byte to binary
Concatenate the binary strings, then split the result into 6-bit groups (4 groups for every 3 bytes)
Pad each group with leading zeros to 8 bits
Convert each binary value to decimal
Look up each decimal value in the Base64 table
Concatenate the resulting characters

– Now, let’s talk about how many characters a Base64 table can hold

Each Base64 character carries 6 bits. The largest value 6 bits can represent is 0b111111 = 63 and the smallest is 0b000000 = 0, so there are 64 values in total. The Base64 mapping table above assigns them to the uppercase and lowercase letters, the digits, and + and /, 64 characters in all:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789
+ /
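
As a quick check, the table can be built in index order and used for the lookups from the worked example. This is a small sketch; `base64Table` and `lookedUp` are illustrative names.

```javascript
// Build the 64-character table in index order: A-Z, a-z, 0-9, then + and /
const upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const base64Table = upper + upper.toLowerCase() + "0123456789" + "+/";

console.log(base64Table.length); // 64

// Look up the four decimal values from the worked example
const lookedUp = [57, 41, 38, 19].map((index) => base64Table[index]).join("");
console.log(lookedUp); // 5pmT
```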

Base64 uses only 6 bits to map to its 64 characters. Compared with the original UTF-8 bytes, each 8-bit byte of Base64 output carries only 6 significant bits, wasting 2 bits
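
The cost of those wasted bits can be estimated with a short sketch. It assumes standard padded Base64, where every 3 input bytes become 4 output characters; `base64Length` is an illustrative helper, not a Node API.

```javascript
// Every 3 input bytes become 4 output characters; a partial final group
// is still padded out to 4 characters with '='.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Compare the formula with what Node actually produces for zero-filled buffers
for (const n of [3, 8, 300]) {
  const actual = Buffer.alloc(n).toString("base64").length;
  console.log(n, base64Length(n), actual);
}
```

For 300 bytes this gives 400 characters, so the encoded form is about one third larger than the input.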

Why use Base64, an approach that wastes bytes?

(The following is taken from the reference links at the end of this post)

A byte can take 256 different values. ASCII covers only 0 to 127, and the values from 128 to 255, like the ASCII control characters, are not printable characters. When data is exchanged over a network, for example from point A to point B, it often passes through multiple routing devices. Because different devices handle characters differently, non-printable characters may be processed incorrectly, which makes transmission unreliable. So the data is Base64 encoded first to make all of it printable, which greatly lowers the chance of errors. Certificates, especially root certificates, are generally Base64 encoded because many people download them from the web. E-mail attachments are also commonly Base64 encoded, because attachment data usually contains non-printable characters. The original purpose of Base64 is to represent data containing non-printable bytes as a printable string that is easy to copy and paste.

Examples of Base64 usage scenarios

Suppose one XML document contains another XML document as data. Writing the inner XML directly is obviously inappropriate, so it is more convenient to encode it first. Characters in XML are generally printable characters (within 0-127), but non-printable bytes may appear, for example because of Chinese characters. Writing those bytes directly into the data of the outer XML does not work, so what do you do? You can Base64 encode the data, store it in the XML, and decode it when reading. Alternatively, you can write the byte values into the XML, separated by spaces or another delimiter, but this wastes more space and is harder to store.

Another example is the key-value fields in the HTTP protocol. They have to be URL-encoded, otherwise an equals sign will cause the fields to be parsed incorrectly and whitespace will cause parsing of the HTTP request to fail, because the request line is separated by spaces, as in POST /guowuxin/hehe HTTP/1.1. Similarly, some text protocols do not support transmitting non-printable characters, and only printable characters with codes greater than 32 can be used to carry information.
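
Both scenarios can be sketched in a few lines of Node. The element names (`envelope`, `payload`, `note`) and the sample values are hypothetical and chosen only for illustration; `Buffer` and `encodeURIComponent` are the built-in APIs.

```javascript
// Scenario 1: embed one XML document inside another as Base64 text
const innerXml = '<note lang="zh">晓露</note>';
const payload = Buffer.from(innerXml, "utf8").toString("base64");
const outerXml =
  `<envelope><payload encoding="base64">${payload}</payload></envelope>`;

// The receiver extracts the text and decodes it back to the original XML
const match = outerXml.match(/<payload encoding="base64">([^<]*)<\/payload>/);
const decodedXml = Buffer.from(match[1], "base64").toString("utf8");
console.log(decodedXml === innerXml); // true

// Scenario 2: URL-encode a value so '=' and spaces cannot break parsing
const query = "q=" + encodeURIComponent("a=b c");
console.log(query); // q=a%3Db%20c
```

Because the Base64 alphabet contains no `<`, `>`, or `&`, the encoded payload cannot be mistaken for markup in the outer document.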

Scenarios using Base64 in front-end work.

Small image files can be converted to Base64 encoding. Advantages:

It reduces the number of HTTP requests the page makes, which improves front-end performance
It avoids cross-domain problems

Such as:
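
A small image can be inlined into the page as a data URL. The sketch below is illustrative only: `toDataUrl` is a hypothetical helper, and an inline SVG string stands in for the bytes of a small image file, which in practice would be read from disk.

```javascript
// Hypothetical helper: wrap raw image bytes in a data URL
function toDataUrl(bytes, mimeType) {
  return `data:${mimeType};base64,${bytes.toString("base64")}`;
}

// Stand-in for the contents of a small image file
const svg =
  '<svg xmlns="http://www.w3.org/2000/svg" width="8" height="8">' +
  '<rect width="8" height="8" fill="red"/></svg>';

const dataUrl = toDataUrl(Buffer.from(svg, "utf8"), "image/svg+xml");

// The data URL goes where a normal image URL would go, so no extra request is made
const imgTag = `<img src="${dataUrl}" alt="red square">`;
console.log(imgTag);
```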

Base64 is only suitable for smaller images because it takes up more space

IE7 and earlier browsers do not support data URLs
If the image is large, the encoded string is very long and takes up more space than the image itself, so the loss outweighs the gain
Generally, it applies to images smaller than 3 KB. If large images are encoded into HTML/CSS, the page size increases significantly, which noticeably slows down page loading.

Reference links: www.zhihu.com/question/36…

blog.csdn.net/lishuai_it_…