Introduction
Files on a computer fall into two broad kinds: text files that a person can read directly, and binary files that cannot be read by eye. A binary file generally shows up as garbled characters when opened in a text editor, and binary and text files are stored and transmitted in different ways. Is there a way to convert binary files into text for transmission or storage? The answer is yes.
That way is Base64 encoding, the subject of this article.
Base64 and its encoding principle
Base64 is a way of converting binary data into text. Binary data is a sequence of zeros and ones, usually handled in bytes of eight bits, where each bit is either 0 or 1.
There are many text encodings. The earliest and simplest is ASCII, short for American Standard Code for Information Interchange, which mainly covers the Latin letters used in English, digits, and common punctuation.
ASCII codes range from 0x00 to 0x7F (0 to 127 in decimal), 128 characters in total, which fits exactly in 7 bits.
ASCII contains 33 control characters and 95 printable characters, as follows:
Hex | Decimal | Binary | Meaning | Hex | Decimal | Binary | Meaning |
---|---|---|---|---|---|---|---|
0x00 | 0 | 0 | NUL null | 0x40 | 64 | 1000000 | @ |
0x01 | 1 | 1 | SOH start of heading | 0x41 | 65 | 1000001 | A |
0x02 | 2 | 10 | STX start of text | 0x42 | 66 | 1000010 | B |
0x03 | 3 | 11 | ETX end of text | 0x43 | 67 | 1000011 | C |
0x04 | 4 | 100 | EOT end of transmission | 0x44 | 68 | 1000100 | D |
0x05 | 5 | 101 | ENQ enquiry | 0x45 | 69 | 1000101 | E |
0x06 | 6 | 110 | ACK acknowledge | 0x46 | 70 | 1000110 | F |
0x07 | 7 | 111 | BEL bell | 0x47 | 71 | 1000111 | G |
0x08 | 8 | 1000 | BS backspace | 0x48 | 72 | 1001000 | H |
0x09 | 9 | 1001 | HT horizontal tab | 0x49 | 73 | 1001001 | I |
0x0A | 10 | 1010 | LF line feed | 0x4A | 74 | 1001010 | J |
0x0B | 11 | 1011 | VT vertical tab | 0x4B | 75 | 1001011 | K |
0x0C | 12 | 1100 | FF form feed | 0x4C | 76 | 1001100 | L |
0x0D | 13 | 1101 | CR carriage return | 0x4D | 77 | 1001101 | M |
0x0E | 14 | 1110 | SO shift out | 0x4E | 78 | 1001110 | N |
0x0F | 15 | 1111 | SI shift in | 0x4F | 79 | 1001111 | O |
0x10 | 16 | 10000 | DLE data link escape | 0x50 | 80 | 1010000 | P |
0x11 | 17 | 10001 | DC1 device control 1 | 0x51 | 81 | 1010001 | Q |
0x12 | 18 | 10010 | DC2 device control 2 | 0x52 | 82 | 1010010 | R |
0x13 | 19 | 10011 | DC3 device control 3 | 0x53 | 83 | 1010011 | S |
0x14 | 20 | 10100 | DC4 device control 4 | 0x54 | 84 | 1010100 | T |
0x15 | 21 | 10101 | NAK negative acknowledge | 0x55 | 85 | 1010101 | U |
0x16 | 22 | 10110 | SYN synchronous idle | 0x56 | 86 | 1010110 | V |
0x17 | 23 | 10111 | ETB end of transmission block | 0x57 | 87 | 1010111 | W |
0x18 | 24 | 11000 | CAN cancel | 0x58 | 88 | 1011000 | X |
0x19 | 25 | 11001 | EM end of medium | 0x59 | 89 | 1011001 | Y |
0x1A | 26 | 11010 | SUB substitute | 0x5A | 90 | 1011010 | Z |
0x1B | 27 | 11011 | ESC escape | 0x5B | 91 | 1011011 | [ |
0x1C | 28 | 11100 | FS file separator | 0x5C | 92 | 1011100 | \ |
0x1D | 29 | 11101 | GS group separator | 0x5D | 93 | 1011101 | ] |
0x1E | 30 | 11110 | RS record separator | 0x5E | 94 | 1011110 | ^ |
0x1F | 31 | 11111 | US unit separator | 0x5F | 95 | 1011111 | _ |
0x20 | 32 | 100000 | (space) | 0x60 | 96 | 1100000 | ` |
0x21 | 33 | 100001 | ! | 0x61 | 97 | 1100001 | a |
0x22 | 34 | 100010 | " | 0x62 | 98 | 1100010 | b |
0x23 | 35 | 100011 | # | 0x63 | 99 | 1100011 | c |
0x24 | 36 | 100100 | $ | 0x64 | 100 | 1100100 | d |
0x25 | 37 | 100101 | % | 0x65 | 101 | 1100101 | e |
0x26 | 38 | 100110 | & | 0x66 | 102 | 1100110 | f |
0x27 | 39 | 100111 | ' | 0x67 | 103 | 1100111 | g |
0x28 | 40 | 101000 | ( | 0x68 | 104 | 1101000 | h |
0x29 | 41 | 101001 | ) | 0x69 | 105 | 1101001 | i |
0x2A | 42 | 101010 | * | 0x6A | 106 | 1101010 | j |
0x2B | 43 | 101011 | + | 0x6B | 107 | 1101011 | k |
0x2C | 44 | 101100 | , | 0x6C | 108 | 1101100 | l |
0x2D | 45 | 101101 | - | 0x6D | 109 | 1101101 | m |
0x2E | 46 | 101110 | . | 0x6E | 110 | 1101110 | n |
0x2F | 47 | 101111 | / | 0x6F | 111 | 1101111 | o |
0x30 | 48 | 110000 | 0 | 0x70 | 112 | 1110000 | p |
0x31 | 49 | 110001 | 1 | 0x71 | 113 | 1110001 | q |
0x32 | 50 | 110010 | 2 | 0x72 | 114 | 1110010 | r |
0x33 | 51 | 110011 | 3 | 0x73 | 115 | 1110011 | s |
0x34 | 52 | 110100 | 4 | 0x74 | 116 | 1110100 | t |
0x35 | 53 | 110101 | 5 | 0x75 | 117 | 1110101 | u |
0x36 | 54 | 110110 | 6 | 0x76 | 118 | 1110110 | v |
0x37 | 55 | 110111 | 7 | 0x77 | 119 | 1110111 | w |
0x38 | 56 | 111000 | 8 | 0x78 | 120 | 1111000 | x |
0x39 | 57 | 111001 | 9 | 0x79 | 121 | 1111001 | y |
0x3A | 58 | 111010 | : | 0x7A | 122 | 1111010 | z |
0x3B | 59 | 111011 | ; | 0x7B | 123 | 1111011 | { |
0x3C | 60 | 111100 | < | 0x7C | 124 | 1111100 | | |
0x3D | 61 | 111101 | = | 0x7D | 125 | 1111101 | } |
0x3E | 62 | 111110 | > | 0x7E | 126 | 1111110 | ~ |
0x3F | 63 | 111111 | ? | 0x7F | 127 | 1111111 | DEL delete |
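The rows of the table above can be reproduced in a few lines of Python. This is only a sketch; the helper name `ascii_row` is made up for illustration. It also checks the counts quoted earlier, treating 0x00 to 0x1F plus DEL (0x7F) as the control characters.

```python
def ascii_row(ch: str) -> tuple:
    """Return the hex, decimal, and binary forms of one ASCII character."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not an ASCII character")
    return f"0x{code:02X}", code, f"{code:b}"

# Control characters are 0x00-0x1F plus DEL (0x7F); the rest are printable.
control = [c for c in range(128) if c < 0x20 or c == 0x7F]
printable = [c for c in range(128) if 0x20 <= c < 0x7F]

print(ascii_row("A"))
print(len(control), len(printable))
```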
Base64 maps binary data onto 64 characters chosen from ASCII, which is where the 64 in the name comes from. Why choose ASCII characters? ASCII is the earliest character encoding and almost every computer application supports it fully, so the encoded data passes through transmission without being altered along the way.
Base64 itself comes in several variants. In MIME, for example, Base64 uses A-Z, a-z, and 0-9, 62 characters in total, plus two further characters to make up the 64.
Indexing 64 characters takes 6 bits, whereas binary data is normally handled in bytes of 8 bits. So how can 8-bit bytes be represented in 6-bit units?
Quite simply: concatenate three 8-bit bytes into 24 bits, which is exactly four 6-bit Base64 characters.
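The regrouping of three bytes into four 6-bit values can be sketched with shifts and masks. The function name `split_into_sextets` is an illustrative choice, not part of any library.

```python
def split_into_sextets(b1: int, b2: int, b3: int) -> list:
    """Pack three 8-bit values into 24 bits, then cut them into four 6-bit values."""
    block = (b1 << 16) | (b2 << 8) | b3
    # Shift each 6-bit group down to the low bits and mask off the rest.
    return [(block >> shift) & 0b111111 for shift in (18, 12, 6, 0)]

print(split_into_sextets(0xFF, 0x00, 0xFF))
```

Each value in the result lies between 0 and 63, so it can serve directly as an index into a 64-character alphabet.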
Why convert binary at all? Some Internet transport protocols support only certain character sets. Email attachments are a common example: SMTP was originally designed to carry 7-bit ASCII characters, so files have to be encoded before they are sent.
Another use of Base64 is to embed image data directly in an HTML page, so that the page can display the image without a separate file.
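One common way to embed an image is a `data:` URI. The sketch below assumes a PNG and uses only the 8-byte PNG signature as placeholder input, so the resulting tag shows the shape of the markup rather than a displayable picture.

```python
import base64

# Placeholder bytes: only the PNG file signature, not a complete image.
png_bytes = b"\x89PNG\r\n\x1a\n"

encoded = base64.b64encode(png_bytes).decode("ascii")
data_uri = f"data:image/png;base64,{encoded}"
img_tag = f'<img src="{data_uri}" alt="inline image">'
print(img_tag)
```

In practice `png_bytes` would be the full contents of an image file read in binary mode.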
Base64 is useful, but it has a cost: each output character carries only 6 bits of data while occupying a full 8-bit byte, so the encoded output is about one third larger than the original binary.
Base64 variants
Base64 is simply a mapping from bit patterns to characters, so more than one mapping is possible. Let’s look at the variants of Base64 encoding. In general the first 62 characters are the same; the variants differ in the last two characters and in the padding character (which is mandatory in some protocols and omitted in others).
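The size increase is easy to measure. In this sketch the helper `encoded_length` is an illustrative name; it applies the rule that every 3 input bytes (or final partial group) become 4 output characters, and compares the prediction with Python's `base64` module.

```python
import base64
import math

def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)

for n in (1, 2, 3, 300):
    actual = len(base64.b64encode(bytes(n)))
    print(n, encoded_length(n), actual)
```

For 300 input bytes the output is 400 characters, a growth of one third.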
The following table is a common Base64 encoding variant:
Encoding | Character for value 62 | Character for value 63 | Padding |
---|---|---|---|
RFC 1421: Base64 for Privacy-Enhanced Mail (deprecated) | + | / | = mandatory |
RFC 2045: Base64 transfer encoding for MIME | + | / | = mandatory |
RFC 2152: Base64 for UTF-7 | + | / | No |
RFC 3501: Base64 encoding for IMAP mailbox names | + | , | No |
RFC 4648: base64 (standard) | + | / | = optional |
RFC 4648: base64url (URL- and filename-safe standard) | - | _ | = optional |
RFC 4880: Radix-64 for OpenPGP | + | / | = mandatory |
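Two of these variants, standard base64 and base64url from RFC 4648, are available in Python's `base64` module. The sketch below picks input bytes so that values 62 and 63 both occur, which makes the difference between the alphabets visible. Stripping and restoring the padding by hand is an assumption about how a padding-free protocol might be handled, not something the module does for you.

```python
import base64

data = b"\xfb\xff"  # chosen so that values 62 and 63 both appear

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")
print(standard, url_safe)

# Where padding is optional it can be stripped, then restored before decoding.
unpadded = url_safe.rstrip("=")
restored = unpadded + "=" * (-len(unpadded) % 4)
assert base64.urlsafe_b64decode(restored) == data
```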
Base64 encoding details
In the last section we covered the basic principles and common variants of Base64 encoding. But how exactly does the mapping work?
This section uses RFC 4648, the standard form of Base64, as the example.
RFC 4648 uses + and / as the characters for values 62 and 63, and = as the padding character.
Let’s look at the mapping table for RFC 4648:
Index | Binary | Char | Index | Binary | Char | Index | Binary | Char | Index | Binary | Char |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 000000 | A | 16 | 010000 | Q | 32 | 100000 | g | 48 | 110000 | w |
1 | 000001 | B | 17 | 010001 | R | 33 | 100001 | h | 49 | 110001 | x |
2 | 000010 | C | 18 | 010010 | S | 34 | 100010 | i | 50 | 110010 | y |
3 | 000011 | D | 19 | 010011 | T | 35 | 100011 | j | 51 | 110011 | z |
4 | 000100 | E | 20 | 010100 | U | 36 | 100100 | k | 52 | 110100 | 0 |
5 | 000101 | F | 21 | 010101 | V | 37 | 100101 | l | 53 | 110101 | 1 |
6 | 000110 | G | 22 | 010110 | W | 38 | 100110 | m | 54 | 110110 | 2 |
7 | 000111 | H | 23 | 010111 | X | 39 | 100111 | n | 55 | 110111 | 3 |
8 | 001000 | I | 24 | 011000 | Y | 40 | 101000 | o | 56 | 111000 | 4 |
9 | 001001 | J | 25 | 011001 | Z | 41 | 101001 | p | 57 | 111001 | 5 |
10 | 001010 | K | 26 | 011010 | a | 42 | 101010 | q | 58 | 111010 | 6 |
11 | 001011 | L | 27 | 011011 | b | 43 | 101011 | r | 59 | 111011 | 7 |
12 | 001100 | M | 28 | 011100 | c | 44 | 101100 | s | 60 | 111100 | 8 |
13 | 001101 | N | 29 | 011101 | d | 45 | 101101 | t | 61 | 111101 | 9 |
14 | 001110 | O | 30 | 011110 | e | 46 | 101110 | u | 62 | 111110 | + |
15 | 001111 | P | 31 | 011111 | f | 47 | 101111 | v | 63 | 111111 | / |
Padding | = |
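The mapping table does not need to be typed out by hand. As a sketch, the 64-character alphabet can be assembled from the `string` module constants; the name `ALPHABET` is an illustrative choice.

```python
import string

# A-Z, a-z, 0-9, then "+" and "/" for values 62 and 63.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Print a few rows in the same form as the table: index, binary, character.
for i in (0, 26, 52, 62, 63):
    print(i, f"{i:06b}", ALPHABET[i])
```

Indexing the string converts a 6-bit value to its character, and `ALPHABET.index(ch)` performs the reverse lookup needed for decoding.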
Let’s walk through the Base64 encoding process using the word Man as an example.
The characters M, a, and n have ASCII codes 77, 97, and 110, which in binary are 01001101, 01100001, and 01101110.
Concatenated, these give 010011010110000101101110, 24 bits in total. Split into 6-bit groups, that is 010011, 010110, 000101, and 101110, or 19, 22, 5, and 46. Looking these values up in the table above gives T, W, F, and u, so Man becomes TWFu after Base64 encoding.
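The same walk-through can be followed step by step in Python, using bit strings so that each intermediate value matches the prose. Note the capital M: ASCII 77 is the uppercase letter. The variable names are illustrative, and the final line checks the result against the standard library.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

word = "Man"
# Step 1: write each character as 8 bits and concatenate.
bits = "".join(f"{ord(ch):08b}" for ch in word)
# Step 2: cut the 24 bits into four 6-bit groups.
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
# Step 3: look each group up in the alphabet.
encoded = "".join(ALPHABET[int(g, 2)] for g in groups)

print(bits)
print(groups)
print(encoded)
assert encoded == base64.b64encode(word.encode("ascii")).decode("ascii")
```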
In the example above, Man is exactly three characters, or 24 bits, so it maps cleanly onto four Base64 characters. What if we only have Ma?
As before, the binary values of M and a are 01001101 and 01100001, which together are 0100110101100001.
That is only 16 bits. One Base64 character covers 6 bits, so three characters are needed, covering 18 bits. The input is two bits short, so it is padded with zeros:
0100110101100001 + 00 = 010011010110000100
010011010110000100 converts to TWE in Base64. Since Base64 output is produced in groups of 4 characters, the last position is filled with =, which means Ma becomes TWE= after Base64 encoding.
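The padding rules generalize to input of any length. The function below is a minimal sketch written for clarity rather than speed; `b64encode_sketch` is a made-up name, and real code would normally call `base64.b64encode` instead.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64encode_sketch(data: bytes) -> str:
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zero bits up to a multiple of 6.
    bits += "0" * (-len(bits) % 6)
    chars = "".join(ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))
    # Pad with "=" up to a multiple of 4 characters.
    return chars + "=" * (-len(chars) % 4)

for sample in (b"Man", b"Ma", b"M"):
    print(sample, b64encode_sketch(sample))
    assert b64encode_sketch(sample) == base64.b64encode(sample).decode("ascii")
```

A single leftover byte needs four zero bits and two = characters, while two leftover bytes need two zero bits and one =.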
Conclusion
That covers what Base64 is and how the conversion works. The scheme is simple: turn the data into bits, split the bits into 6-bit groups, look each group up in the conversion table, and pad the result.
This article is available at www.flydean.com/18-base64-e…