Introduction

Files on a computer fall into two broad kinds: text files, which a person can read directly, and binary files, which cannot be read by eye. A binary file generally shows up as garbled characters when opened in a text editor, and binary and text files are often stored and transmitted in different ways. Is there a way to convert a binary file into text for transmission or storage? The answer is yes.

This is the Base64 encoding that we’re going to talk about today.

Base64 and its encoding principle

Base64 is a way of encoding binary data as text. Binary data consists of zeros and ones; its usual unit is the byte, which is eight bits, and each bit is either a zero or a one.

There are many text encodings. The earliest and simplest is ASCII, short for American Standard Code for Information Interchange, which covers the English letters, digits, common punctuation, and a set of control characters.

ASCII codes range from 0x00 to 0x7F, or 0 to 127 in decimal: 128 characters in total, which fits exactly in 7 bits.

The ASCII encoding contains 33 control characters and 95 printable characters, as follows:

Hex   Dec  Binary   Meaning                          Hex   Dec  Binary   Char
0x00  0    0000000  NUL (null)                       0x40  64   1000000  @
0x01  1    0000001  SOH (start of heading)           0x41  65   1000001  A
0x02  2    0000010  STX (start of text)              0x42  66   1000010  B
0x03  3    0000011  ETX (end of text)                0x43  67   1000011  C
0x04  4    0000100  EOT (end of transmission)        0x44  68   1000100  D
0x05  5    0000101  ENQ (enquiry)                    0x45  69   1000101  E
0x06  6    0000110  ACK (acknowledge)                0x46  70   1000110  F
0x07  7    0000111  BEL (bell)                       0x47  71   1000111  G
0x08  8    0001000  BS (backspace)                   0x48  72   1001000  H
0x09  9    0001001  HT (horizontal tab)              0x49  73   1001001  I
0x0A  10   0001010  LF (line feed)                   0x4A  74   1001010  J
0x0B  11   0001011  VT (vertical tab)                0x4B  75   1001011  K
0x0C  12   0001100  FF (form feed)                   0x4C  76   1001100  L
0x0D  13   0001101  CR (carriage return)             0x4D  77   1001101  M
0x0E  14   0001110  SO (shift out)                   0x4E  78   1001110  N
0x0F  15   0001111  SI (shift in)                    0x4F  79   1001111  O
0x10  16   0010000  DLE (data link escape)           0x50  80   1010000  P
0x11  17   0010001  DC1 (device control 1)           0x51  81   1010001  Q
0x12  18   0010010  DC2 (device control 2)           0x52  82   1010010  R
0x13  19   0010011  DC3 (device control 3)           0x53  83   1010011  S
0x14  20   0010100  DC4 (device control 4)           0x54  84   1010100  T
0x15  21   0010101  NAK (negative acknowledge)       0x55  85   1010101  U
0x16  22   0010110  SYN (synchronous idle)           0x56  86   1010110  V
0x17  23   0010111  ETB (end of transmission block)  0x57  87   1010111  W
0x18  24   0011000  CAN (cancel)                     0x58  88   1011000  X
0x19  25   0011001  EM (end of medium)               0x59  89   1011001  Y
0x1A  26   0011010  SUB (substitute)                 0x5A  90   1011010  Z
0x1B  27   0011011  ESC (escape)                     0x5B  91   1011011  [
0x1C  28   0011100  FS (file separator)              0x5C  92   1011100  \
0x1D  29   0011101  GS (group separator)             0x5D  93   1011101  ]
0x1E  30   0011110  RS (record separator)            0x5E  94   1011110  ^
0x1F  31   0011111  US (unit separator)              0x5F  95   1011111  _
0x20  32   0100000  (space)                          0x60  96   1100000  `
0x21  33   0100001  !                                0x61  97   1100001  a
0x22  34   0100010  "                                0x62  98   1100010  b
0x23  35   0100011  #                                0x63  99   1100011  c
0x24  36   0100100  $                                0x64  100  1100100  d
0x25  37   0100101  %                                0x65  101  1100101  e
0x26  38   0100110  &                                0x66  102  1100110  f
0x27  39   0100111  '                                0x67  103  1100111  g
0x28  40   0101000  (                                0x68  104  1101000  h
0x29  41   0101001  )                                0x69  105  1101001  i
0x2A  42   0101010  *                                0x6A  106  1101010  j
0x2B  43   0101011  +                                0x6B  107  1101011  k
0x2C  44   0101100  ,                                0x6C  108  1101100  l
0x2D  45   0101101  -                                0x6D  109  1101101  m
0x2E  46   0101110  .                                0x6E  110  1101110  n
0x2F  47   0101111  /                                0x6F  111  1101111  o
0x30  48   0110000  0                                0x70  112  1110000  p
0x31  49   0110001  1                                0x71  113  1110001  q
0x32  50   0110010  2                                0x72  114  1110010  r
0x33  51   0110011  3                                0x73  115  1110011  s
0x34  52   0110100  4                                0x74  116  1110100  t
0x35  53   0110101  5                                0x75  117  1110101  u
0x36  54   0110110  6                                0x76  118  1110110  v
0x37  55   0110111  7                                0x77  119  1110111  w
0x38  56   0111000  8                                0x78  120  1111000  x
0x39  57   0111001  9                                0x79  121  1111001  y
0x3A  58   0111010  :                                0x7A  122  1111010  z
0x3B  59   0111011  ;                                0x7B  123  1111011  {
0x3C  60   0111100  <                                0x7C  124  1111100  |
0x3D  61   0111101  =                                0x7D  125  1111101  }
0x3E  62   0111110  >                                0x7E  126  1111110  ~
0x3F  63   0111111  ?                                0x7F  127  1111111  DEL (delete)
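
The character counts quoted before the table can be checked with a few lines of Python. This is only a sketch: it assumes that codes 0 to 31 plus 127 are the control characters and that 32 to 126 are the printable ones, which is how the table is laid out.

```python
# Classify the 128 ASCII codes into control and printable characters.
control = [c for c in range(128) if c < 0x20 or c == 0x7F]
printable = [c for c in range(128) if 0x20 <= c <= 0x7E]

print(len(control), len(printable))  # 33 95

# Rebuild one table row: hex, decimal, 7-bit binary, character.
code = ord("A")
row = f"0x{code:02X} {code} {code:07b} {chr(code)}"
print(row)  # 0x41 65 1000001 A
```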

Base64 maps binary data, taken 8 bits to a byte, onto 64 characters chosen from ASCII; that is the meaning of the 64 in Base64. Why choose ASCII? Because ASCII is the earliest encoding and almost every computer application fully supports it, so the content is not altered in transit, which makes it a safe choice.

Base64 itself comes in several variants. In MIME, for example, Base64 uses A-Z, a-z, and 0-9, 62 characters in total, plus two further characters to make up the 64.

Sixty-four characters can be represented in 6 bits, whereas binary data is normally handled 8 bits (one byte) at a time. So how are 8-bit bytes expressed in 6-bit Base64 units?

Quite simply: concatenate three 8-bit bytes into 24 bits, which can be represented by exactly four 6-bit Base64 characters.
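
The regrouping of 3 × 8 bits into 4 × 6 bits can be sketched in Python with integer shifts. The function name split_into_sextets and the sample bytes are made up for illustration; the result is a list of 6-bit values, not yet characters.

```python
def split_into_sextets(three_bytes: bytes) -> list[int]:
    """Split exactly three bytes (24 bits) into four 6-bit values."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Concatenate the three bytes into one 24-bit integer.
    combined = int.from_bytes(three_bytes, "big")
    # Extract the four 6-bit groups, most significant first.
    return [(combined >> shift) & 0b111111 for shift in (18, 12, 6, 0)]


# 11111111 00000000 10101010 -> 111111 110000 000010 101010
print(split_into_sextets(b"\xff\x00\xaa"))  # [63, 48, 2, 42]
```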

Why convert binary data at all? Because some Internet transport protocols support only certain character sets. Email attachments are a common example: SMTP was originally designed to carry 7-bit ASCII characters, so a file has to be encoded before it can be sent.

Another use of Base64 is embedding image data directly in the HTML of a web page, so the image is displayed without a separate file.

Base64 is convenient, but each output character carries only 6 bits of data while occupying a full 8-bit byte, so the encoded result is about one third larger than the original. No data is lost; the cost is purely in size.
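
One common way to embed an image is a data URI in the src attribute of an img tag. The sketch below uses Python's standard base64 module; the helper name to_data_uri is invented for this example, and the sample bytes are only the 8-byte PNG signature rather than a complete image.

```python
import base64


def to_data_uri(data: bytes, mime_type: str = "image/png") -> str:
    """Wrap raw bytes in a base64 data URI (hypothetical helper)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder payload: the PNG file signature only, not a displayable image.
png_signature = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(png_signature)
print(f'<img src="{uri}">')
# <img src="data:image/png;base64,iVBORw0KGgo=">
```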
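
The size cost is easy to measure. The sketch below encodes 3000 random bytes with Python's standard base64 module; the input length is an arbitrary choice for the example. Decoding returns the original bytes, so nothing is lost, but the output is one third longer.

```python
import base64
import os

raw = os.urandom(3000)           # 3000 arbitrary bytes
encoded = base64.b64encode(raw)  # 4 output characters per 3 input bytes

print(len(raw), len(encoded))    # 3000 4000

# The round trip is lossless: decoding gives back the original bytes.
decoded = base64.b64decode(encoded)
print(decoded == raw)            # True
```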

Base64 variants

Base64 is simply a mapping from bit patterns to characters, so more than one mapping is possible. Let’s look at the common variants. In general the first 62 characters are the same; the variants differ in the last two characters and in the padding character (which is mandatory in some protocols and omitted in others).

The following table lists common Base64 variants:

Encoding                                                 Index 62  Index 63  Padding
RFC 1421: Base64 for Privacy-Enhanced Mail (deprecated)  +         /         = mandatory
RFC 2045: Base64 transfer encoding for MIME              +         /         = mandatory
RFC 2152: Base64 for UTF-7                               +         /         none
RFC 3501: Base64 encoding for IMAP mailbox names         +         ,         none
RFC 4648: base64 (standard)                              +         /         = optional
RFC 4648: base64url (URL- and filename-safe standard)    -         _         = optional
RFC 4880: Radix-64 for OpenPGP                           +         /         = mandatory
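
The two RFC 4648 alphabets can be compared directly with Python's standard base64 module. In this sketch the input bytes are chosen only because they encode to the characters at index 62 and 63; the last lines show one way to handle output whose optional padding has been stripped.

```python
import base64

# 11111011 11111111 -> 111110 111111 111100 -> indexes 62, 63, 60
data = b"\xfb\xff"

standard = base64.b64encode(data)          # RFC 4648 base64
url_safe = base64.urlsafe_b64encode(data)  # RFC 4648 base64url

print(standard)  # b'+/8='
print(url_safe)  # b'-_8='

# If the padding was dropped, restore it before decoding.
unpadded = url_safe.rstrip(b"=")
restored = unpadded + b"=" * (-len(unpadded) % 4)
print(base64.urlsafe_b64decode(restored) == data)  # True
```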

Base64 encoding details

The previous section covered the basic principle and the common variants of Base64. But how exactly is the mapping done?

This section uses the standard Base64 defined in RFC 4648 as the example.

RFC 4648 uses + and / as the characters at index 62 and 63, and = as the padding character.

Here is the RFC 4648 mapping table:

Index Binary Char Index Binary Char Index Binary Char Index Binary Char
0 000000 A 16 010000 Q 32 100000 g 48 110000 w
1 000001 B 17 010001 R 33 100001 h 49 110001 x
2 000010 C 18 010010 S 34 100010 i 50 110010 y
3 000011 D 19 010011 T 35 100011 j 51 110011 z
4 000100 E 20 010100 U 36 100100 k 52 110100 0
5 000101 F 21 010101 V 37 100101 l 53 110101 1
6 000110 G 22 010110 W 38 100110 m 54 110110 2
7 000111 H 23 010111 X 39 100111 n 55 110111 3
8 001000 I 24 011000 Y 40 101000 o 56 111000 4
9 001001 J 25 011001 Z 41 101001 p 57 111001 5
10 001010 K 26 011010 a 42 101010 q 58 111010 6
11 001011 L 27 011011 b 43 101011 r 59 111011 7
12 001100 M 28 011100 c 44 101100 s 60 111100 8
13 001101 N 29 011101 d 45 101101 t 61 111101 9
14 001110 O 30 011110 e 46 101110 u 62 111110 +
15 001111 P 31 011111 f 47 101111 v 63 111111 /
Padding character: =
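
The mapping table above follows a regular pattern, so it can be rebuilt rather than typed out. The sketch below assembles the standard alphabet from Python's string module; the name ALPHABET is just a label chosen for this example.

```python
import string

# Index 0-25: A-Z, 26-51: a-z, 52-61: 0-9, 62: +, 63: /
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

print(len(ALPHABET))  # 64

# Print a few rows in the same form as the table: index, 6-bit binary, character.
for index in (0, 26, 52, 62, 63):
    print(index, f"{index:06b}", ALPHABET[index])
```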

Let’s walk through the Base64 encoding process using the word Man as an example.

The letters of Man have the ASCII codes 77, 97, and 110, which in binary are 01001101, 01100001, and 01101110.

Concatenating the three gives 010011010110000101101110, 24 bits in total. Split into four 6-bit groups (010011, 010110, 000101, 101110), these are the indexes 19, 22, 5, and 46, and looking each one up in the table above gives the Base64 encoding of Man: TWFu.
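
The same walk-through can be reproduced step by step in Python. This is an illustrative sketch that works on strings of 0s and 1s for readability, not for speed; the variable names are chosen for this example, and the final line compares the result with the standard base64 module.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

word = b"Man"

# Step 1: write each byte as 8 bits and concatenate.
bits = "".join(f"{byte:08b}" for byte in word)
print(bits)  # 010011010110000101101110

# Step 2: cut the 24 bits into four 6-bit groups.
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
print(groups)  # ['010011', '010110', '000101', '101110']

# Step 3: look up each group's value in the alphabet.
encoded = "".join(ALPHABET[int(group, 2)] for group in groups)
print(encoded)  # TWFu

print(encoded == base64.b64encode(word).decode("ascii"))  # True
```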

In the example above, Man is exactly three characters, or 24 bits, so it maps cleanly onto four Base64 characters. What if we only have Ma?

As before, the binary values of M and a are 01001101 and 01100001, which together give 0100110101100001.

That is only 16 bits. Since each Base64 character covers 6 bits, three characters (18 bits) are needed, so the two missing bits are filled with 0:

0100110101100001 + 00 = 010011010110000100

010011010110000100 converts to TWE. Because Base64 output is produced in groups of 4 characters, the last position is filled with =, so Ma becomes TWE= after Base64 encoding.
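
Putting the zero-fill and the = padding together gives a complete, if naive, encoder. The sketch below is written for clarity rather than performance, and encode_base64 is a name chosen for this example; it is checked against Python's standard base64 module for inputs of one, two, and three bytes.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_base64(data: bytes) -> str:
    """Encode bytes with the standard Base64 alphabet and = padding."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Zero-fill so the bit string length is a multiple of 6.
    bits += "0" * (-len(bits) % 6)
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Pad with = so the output length is a multiple of 4.
    chars.append("=" * (-len(chars) % 4))
    return "".join(chars)


for sample in (b"Man", b"Ma", b"M"):
    assert encode_base64(sample) == base64.b64encode(sample).decode("ascii")
    print(sample, encode_base64(sample))
# b'Man' TWFu
# b'Ma' TWE=
# b'M' TQ==
```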

Conclusion

That covers the basic idea of Base64 and its conversion rules. The scheme is very simple: express the data in binary, regroup it into 6-bit units with padding where needed, and look each unit up in the conversion table.

This article is available at www.flydean.com/18-base64-e…
