What is Base64?

Base64 is specified in RFC 2045.

Let’s start with a quiz question found online:

When transferring data over a network, it is often necessary to convert binary data into a printable string. A common scheme uses a set of 64 printable characters and is therefore called Base64. Given a char array of length 12, its Base64 representation requires at least ____ chars; if the char array has length 20, ____ chars are required.


  • What is Base64?
  • Why Base64?
  • What are printable characters?
  • What are ASCII characters?

Let’s work through these questions in reverse order, starting with what ASCII is.

ASCII

The following is an excerpt from Baidu Encyclopedia (baike.baidu.com/item/ASCII/…):

ASCII (American Standard Code for Information Interchange) is a computer coding system based on the Latin alphabet used to display modern English and other Western European languages. It is the most common information exchange standard and is equivalent to the international standard ISO/IEC 646.

It is a standard that specifies the binary values of the characters commonly used in computer systems. The ASCII table defines 128 characters in total (codes 0 to 127, which fit in 7 bits). The ranges are described below, and the full ASCII code reference table appears later in this article.

  • 0 to 31 and 127 (33 in total) are control characters or communication characters (the rest are displayable characters). Control characters include LF (line feed), CR (carriage return), FF (form feed), DEL (delete), BS (backspace), and BEL (bell); communication characters include SOH (start of heading), EOT (end of transmission), and ACK (acknowledgement). ASCII values 8, 9, 10, and 13 correspond to backspace, tab, line feed, and carriage return, respectively. They have no graphical representation of their own, but they affect how text is displayed, depending on the application [1].

  • 32 to 126 (95 characters in total) are printable characters (32 is the space). 48 to 57 are the ten Arabic numerals 0 to 9.

  • 65 to 90 are the 26 uppercase letters, 97 to 122 are the 26 lowercase letters, and the rest are punctuation marks and operator symbols.

Therefore, the 95 characters from 32 to 126 in the ASCII table are the printable characters, and these are the characters that can be transmitted safely over text-based network protocols. This answers the question of what a printable character is.
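The ranges above are easy to check. The following is a minimal sketch in Python; the variable names are only illustrative.

```python
# Classify the 128 ASCII codes using the ranges described above.
control_codes = [code for code in range(128) if code <= 31 or code == 127]
printable_codes = [code for code in range(128) if 32 <= code <= 126]

print(len(control_codes))    # 33
print(len(printable_codes))  # 95

# Digits and letters sit at the code ranges listed above.
digits = "".join(chr(code) for code in range(48, 58))
uppercase = "".join(chr(code) for code in range(65, 91))
lowercase = "".join(chr(code) for code in range(97, 123))
print(digits)     # 0123456789
print(uppercase)  # ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(lowercase)  # abcdefghijklmnopqrstuvwxyz
```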

Why Base64?

Anyone familiar with web development knows that an HTTP/1.x message is made up of simple lines of text: the start line and headers are plain text, not binary. You have probably seen complaints about why the HTTP protocol uses text instead of a binary format.

We also transmit data over HTTP, and HTTP 1.1 can carry binary data. But how do we transmit binary streams over HTTP 0.9 or over an ASCII-only text protocol such as SMTP or POP3? The binary data has to be transcoded into text before it is sent, and Base64 is one such transcoding method.
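To make the problem concrete, here is a small sketch using Python's standard `base64` module. The sample bytes in `raw` are made up for illustration: they cannot be treated as ASCII text, while their Base64 form consists only of printable ASCII characters.

```python
import base64

# Arbitrary binary data (made-up sample); bytes >= 0x80 are not valid ASCII.
raw = bytes([0x00, 0xFF, 0x10, 0x80, 0x7F])

try:
    raw.decode("ascii")
    raw_is_ascii = True
except UnicodeDecodeError:
    raw_is_ascii = False

encoded = base64.b64encode(raw)
# Every byte of the Base64 output is a printable ASCII character (32 to 126).
encoded_is_printable = all(32 <= b <= 126 for b in encoded)

print(raw_is_ascii)                      # False
print(encoded_is_printable)              # True
print(base64.b64decode(encoded) == raw)  # True
```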

What is Base64?

Base64 is one of the most common encodings for transmitting 8-bit bytes over a network. It represents binary data using 64 printable characters. See RFC 2045 to RFC 2049 for the detailed MIME specifications.

Base64 encoding turns binary data into characters, so it can be used, for example, to pass long identity information in an HTTP environment.

See the Base64 alphabet mapping table at the end of this article. We use 64 basic characters to represent binary data. Each character has an index, and the largest index is 63. In binary, 63 is 00111111, so every index fits in 6 bits. If we regroup the bits of the original 8-bit bytes into 6-bit values and write down the character at each resulting index, we obtain a printable representation of the data we want to encode.
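As a quick illustration, the alphabet can be built from Python's `string` constants. This is only a sketch, and the name `BASE64_ALPHABET` is chosen here for illustration.

```python
import string

# The 64-character alphabet: A-Z, a-z, 0-9, then "+" and "/".
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)

print(len(BASE64_ALPHABET))  # 64
print((63).bit_length())     # 6
print(format(63, "08b"))     # 00111111
print(BASE64_ALPHABET[0], BASE64_ALPHABET[27], BASE64_ALPHABET[63])  # A b /
```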

Conversion rules:

  1. Turn every three 8-bit bytes into four 6-bit groups. (Think about why three bytes: 24 is the least common multiple of 8 and 6.)
  2. Add a newline every 76 characters.
  3. The final terminator is also processed.
  4. If fewer than three bytes remain at the end, pad the missing bits with zeros and write “=” for each 6-bit group that contains only padding.
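The rules above can be sketched as a small encoder. This is an illustrative implementation, not a reference one: the function name `base64_encode` and the `line_length` parameter are hypothetical, and the use of CRLF as the line break is an assumption based on MIME. The result is compared against Python's standard `base64` module.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def base64_encode(data: bytes, line_length: int = 0) -> str:
    """Encode bytes as Base64; wrap lines only if line_length > 0."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1 or 2 bytes short
        # Rule 4: pad the last chunk with zero bytes to get a full 24 bits.
        value = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Rule 1: split the 24 bits into four 6-bit indexes.
        indexes = [(value >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
        chars = [ALPHABET[idx] for idx in indexes]
        # Groups that contain only padding bits are written as "=".
        if missing:
            chars[4 - missing:] = "=" * missing
        out.append("".join(chars))
    text = "".join(out)
    # Rule 2: optionally break the output into lines (assumed CRLF, as in MIME).
    if line_length > 0:
        text = "\r\n".join(
            text[i:i + line_length] for i in range(0, len(text), line_length)
        )
    return text

sample = b"any binary data"
print(base64_encode(sample) == base64.b64encode(sample).decode("ascii"))  # True
```

Line wrapping is left off by default so that the output can be compared directly with `base64.b64encode`, which does not insert line breaks.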

That may sound a bit abstract, so take my initials, lsy, as an example:

  • The binary ASCII codes of lsy are 01101100 01110011 01111001
  • Regroup the three 8-bit bytes into four 6-bit groups: 011011 000111 001101 111001
  • Pad each 6-bit group with two leading zeros to make a byte: 00011011 00000111 00001101 00111001
  • Read off the Base64 indexes: 27, 7, 13, 57
  • Look up the Base64 characters: bHN5
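The same steps can be traced in Python. This is a minimal sketch that mirrors the bullets one by one using bit strings; the variable names are illustrative.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

text = "lsy"
# Step 1: each character as an 8-bit ASCII code.
bytes_as_bits = [format(ord(ch), "08b") for ch in text]
# Step 2: join the 24 bits and cut them into 6-bit groups.
bit_stream = "".join(bytes_as_bits)
groups = [bit_stream[i:i + 6] for i in range(0, len(bit_stream), 6)]
# Step 3: read each group as a number (same as padding it with two leading zeros).
indexes = [int(group, 2) for group in groups]
# Step 4: look each index up in the alphabet.
encoded = "".join(ALPHABET[idx] for idx in indexes)

print(bytes_as_bits)  # ['01101100', '01110011', '01111001']
print(groups)         # ['011011', '000111', '001101', '111001']
print(indexes)        # [27, 7, 13, 57]
print(encoded)        # bHN5
```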

What if the value to be encoded is the character 1?

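  • The binary ASCII code of the character 1 is 00110001
  • Regroup into four 6-bit groups, filling with zero bits because fewer than three bytes are available: 001100 010000 000000 000000 (the last two groups contain no data)
  • Pad each 6-bit group with two leading zeros: 00001100 00010000 00000000 00000000
  • Read off the Base64 indexes of the groups that carry data: 12, 16
  • Look up the Base64 characters and add padding: MQ== (the last two groups are pure padding, so each is written as =)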
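The padding step can be sketched the same way. This is an illustrative trace for a single input byte, assuming the standard alphabet; the modulo expressions compute how many zero bits and how many "=" characters are needed.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

data = b"1"
bits = "".join(format(byte, "08b") for byte in data)
# Pad with zero bits until the length is a multiple of 6.
bits += "0" * (-len(bits) % 6)
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
indexes = [int(group, 2) for group in groups]
# One "=" for each 6-bit group missing from the final block of four.
padding = "=" * (-len(groups) % 4)
encoded = "".join(ALPHABET[idx] for idx in indexes) + padding

print(groups)   # ['001100', '010000']
print(indexes)  # [12, 16]
print(encoded)  # MQ==
```

With two leftover bytes the same arithmetic yields three data groups and a single "=".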

We can check the result against any Base64 encoding tool; the output should be identical.
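One such tool is Python's standard `base64` module. The short check below compares its output with the two hand-computed results; the `expected` and `results` names are illustrative.

```python
import base64

# Expected values are the two hand-computed results from the text.
expected = {"lsy": "bHN5", "1": "MQ=="}
results = {}
for text in expected:
    results[text] = base64.b64encode(text.encode("ascii")).decode("ascii")
    print(text, "->", results[text], "matches:", results[text] == expected[text])

# Decoding reverses the process and returns the original bytes.
print(base64.b64decode("bHN5"))  # b'lsy'
```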

Now let’s go back to the original question. Have you worked it out? A char occupies 1 byte, that is, 8 bits. The largest positive integer a signed char can store is 0111 1111, that is, 127.

The answer is at the end of the article.

ASCII code reference table

| Bin (binary) | Oct (octal) | Dec (decimal) | Hex (hexadecimal) | Abbreviation/character | Description |
|---|---|---|---|---|---|
| 0000 0000 | 00 | 0 | 0x00 | NUL (null) | Null character |
| 0000 0001 | 01 | 1 | 0x01 | SOH (start of heading) | Start of heading |
| 0000 0010 | 02 | 2 | 0x02 | STX (start of text) | Start of text |
| 0000 0011 | 03 | 3 | 0x03 | ETX (end of text) | End of text |
| 0000 0100 | 04 | 4 | 0x04 | EOT (end of transmission) | End of transmission |
| 0000 0101 | 05 | 5 | 0x05 | ENQ (enquiry) | Enquiry |
| 0000 0110 | 06 | 6 | 0x06 | ACK (acknowledge) | Acknowledgement |
| 0000 0111 | 07 | 7 | 0x07 | BEL (bell) | Bell |
| 0000 1000 | 010 | 8 | 0x08 | BS (backspace) | Backspace |
| 0000 1001 | 011 | 9 | 0x09 | HT (horizontal tab) | Horizontal tab |
| 0000 1010 | 012 | 10 | 0x0A | LF (NL line feed, new line) | Line feed |
| 0000 1011 | 013 | 11 | 0x0B | VT (vertical tab) | Vertical tab |
| 0000 1100 | 014 | 12 | 0x0C | FF (NP form feed, new page) | Form feed |
| 0000 1101 | 015 | 13 | 0x0D | CR (carriage return) | Carriage return |
| 0000 1110 | 016 | 14 | 0x0E | SO (shift out) | Shift out |
| 0000 1111 | 017 | 15 | 0x0F | SI (shift in) | Shift in |
| 0001 0000 | 020 | 16 | 0x10 | DLE (data link escape) | Data link escape |
| 0001 0001 | 021 | 17 | 0x11 | DC1 (device control 1) | Device control 1 |
| 0001 0010 | 022 | 18 | 0x12 | DC2 (device control 2) | Device control 2 |
| 0001 0011 | 023 | 19 | 0x13 | DC3 (device control 3) | Device control 3 |
| 0001 0100 | 024 | 20 | 0x14 | DC4 (device control 4) | Device control 4 |
| 0001 0101 | 025 | 21 | 0x15 | NAK (negative acknowledge) | Negative acknowledgement |
| 0001 0110 | 026 | 22 | 0x16 | SYN (synchronous idle) | Synchronous idle |
| 0001 0111 | 027 | 23 | 0x17 | ETB (end of trans. block) | End of transmission block |
| 0001 1000 | 030 | 24 | 0x18 | CAN (cancel) | Cancel |
| 0001 1001 | 031 | 25 | 0x19 | EM (end of medium) | End of medium |
| 0001 1010 | 032 | 26 | 0x1A | SUB (substitute) | Substitute |
| 0001 1011 | 033 | 27 | 0x1B | ESC (escape) | Escape |
| 0001 1100 | 034 | 28 | 0x1C | FS (file separator) | File separator |
| 0001 1101 | 035 | 29 | 0x1D | GS (group separator) | Group separator |
| 0001 1110 | 036 | 30 | 0x1E | RS (record separator) | Record separator |
| 0001 1111 | 037 | 31 | 0x1F | US (unit separator) | Unit separator |
| 0010 0000 | 040 | 32 | 0x20 | (space) | Space |
| 0010 0001 | 041 | 33 | 0x21 | ! | Exclamation mark |
| 0010 0010 | 042 | 34 | 0x22 | " | Double quotation mark |
| 0010 0011 | 043 | 35 | 0x23 | # | Number sign |
| 0010 0100 | 044 | 36 | 0x24 | $ | Dollar sign |
| 0010 0101 | 045 | 37 | 0x25 | % | Percent sign |
| 0010 0110 | 046 | 38 | 0x26 | & | Ampersand |
| 0010 0111 | 047 | 39 | 0x27 | ' | Single quotation mark (apostrophe) |
| 0010 1000 | 050 | 40 | 0x28 | ( | Opening parenthesis |
| 0010 1001 | 051 | 41 | 0x29 | ) | Closing parenthesis |
| 0010 1010 | 052 | 42 | 0x2A | * | Asterisk |
| 0010 1011 | 053 | 43 | 0x2B | + | Plus sign |
| 0010 1100 | 054 | 44 | 0x2C | , | Comma |
| 0010 1101 | 055 | 45 | 0x2D | - | Minus sign/hyphen |
| 0010 1110 | 056 | 46 | 0x2E | . | Period |
| 0010 1111 | 057 | 47 | 0x2F | / | Slash |
| 0011 0000 | 060 | 48 | 0x30 | 0 | Digit 0 |
| 0011 0001 | 061 | 49 | 0x31 | 1 | Digit 1 |
| 0011 0010 | 062 | 50 | 0x32 | 2 | Digit 2 |
| 0011 0011 | 063 | 51 | 0x33 | 3 | Digit 3 |
| 0011 0100 | 064 | 52 | 0x34 | 4 | Digit 4 |
| 0011 0101 | 065 | 53 | 0x35 | 5 | Digit 5 |
| 0011 0110 | 066 | 54 | 0x36 | 6 | Digit 6 |
| 0011 0111 | 067 | 55 | 0x37 | 7 | Digit 7 |
| 0011 1000 | 070 | 56 | 0x38 | 8 | Digit 8 |
| 0011 1001 | 071 | 57 | 0x39 | 9 | Digit 9 |
| 0011 1010 | 072 | 58 | 0x3A | : | Colon |
| 0011 1011 | 073 | 59 | 0x3B | ; | Semicolon |
| 0011 1100 | 074 | 60 | 0x3C | < | Less-than sign |
| 0011 1101 | 075 | 61 | 0x3D | = | Equals sign |
| 0011 1110 | 076 | 62 | 0x3E | > | Greater-than sign |
| 0011 1111 | 077 | 63 | 0x3F | ? | Question mark |
| 0100 0000 | 0100 | 64 | 0x40 | @ | At sign |
| 0100 0001 | 0101 | 65 | 0x41 | A | Uppercase letter A |
| 0100 0010 | 0102 | 66 | 0x42 | B | Uppercase letter B |
| 0100 0011 | 0103 | 67 | 0x43 | C | Uppercase letter C |
| 0100 0100 | 0104 | 68 | 0x44 | D | Uppercase letter D |
| 0100 0101 | 0105 | 69 | 0x45 | E | Uppercase letter E |
| 0100 0110 | 0106 | 70 | 0x46 | F | Uppercase letter F |
| 0100 0111 | 0107 | 71 | 0x47 | G | Uppercase letter G |
| 0100 1000 | 0110 | 72 | 0x48 | H | Uppercase letter H |
| 0100 1001 | 0111 | 73 | 0x49 | I | Uppercase letter I |
| 0100 1010 | 0112 | 74 | 0x4A | J | Uppercase letter J |
| 0100 1011 | 0113 | 75 | 0x4B | K | Uppercase letter K |
| 0100 1100 | 0114 | 76 | 0x4C | L | Uppercase letter L |
| 0100 1101 | 0115 | 77 | 0x4D | M | Uppercase letter M |
| 0100 1110 | 0116 | 78 | 0x4E | N | Uppercase letter N |
| 0100 1111 | 0117 | 79 | 0x4F | O | Uppercase letter O |
| 0101 0000 | 0120 | 80 | 0x50 | P | Uppercase letter P |
| 0101 0001 | 0121 | 81 | 0x51 | Q | Uppercase letter Q |
| 0101 0010 | 0122 | 82 | 0x52 | R | Uppercase letter R |
| 0101 0011 | 0123 | 83 | 0x53 | S | Uppercase letter S |
| 0101 0100 | 0124 | 84 | 0x54 | T | Uppercase letter T |
| 0101 0101 | 0125 | 85 | 0x55 | U | Uppercase letter U |
| 0101 0110 | 0126 | 86 | 0x56 | V | Uppercase letter V |
| 0101 0111 | 0127 | 87 | 0x57 | W | Uppercase letter W |
| 0101 1000 | 0130 | 88 | 0x58 | X | Uppercase letter X |
| 0101 1001 | 0131 | 89 | 0x59 | Y | Uppercase letter Y |
| 0101 1010 | 0132 | 90 | 0x5A | Z | Uppercase letter Z |
| 0101 1011 | 0133 | 91 | 0x5B | [ | Opening square bracket |
| 0101 1100 | 0134 | 92 | 0x5C | \ | Backslash |
| 0101 1101 | 0135 | 93 | 0x5D | ] | Closing square bracket |
| 0101 1110 | 0136 | 94 | 0x5E | ^ | Caret |
| 0101 1111 | 0137 | 95 | 0x5F | _ | Underscore |
| 0110 0000 | 0140 | 96 | 0x60 | ` | Grave accent (backtick) |
| 0110 0001 | 0141 | 97 | 0x61 | a | Lowercase letter a |
| 0110 0010 | 0142 | 98 | 0x62 | b | Lowercase letter b |
| 0110 0011 | 0143 | 99 | 0x63 | c | Lowercase letter c |
| 0110 0100 | 0144 | 100 | 0x64 | d | Lowercase letter d |
| 0110 0101 | 0145 | 101 | 0x65 | e | Lowercase letter e |
| 0110 0110 | 0146 | 102 | 0x66 | f | Lowercase letter f |
| 0110 0111 | 0147 | 103 | 0x67 | g | Lowercase letter g |
| 0110 1000 | 0150 | 104 | 0x68 | h | Lowercase letter h |
| 0110 1001 | 0151 | 105 | 0x69 | i | Lowercase letter i |
| 0110 1010 | 0152 | 106 | 0x6A | j | Lowercase letter j |
| 0110 1011 | 0153 | 107 | 0x6B | k | Lowercase letter k |
| 0110 1100 | 0154 | 108 | 0x6C | l | Lowercase letter l |
| 0110 1101 | 0155 | 109 | 0x6D | m | Lowercase letter m |
| 0110 1110 | 0156 | 110 | 0x6E | n | Lowercase letter n |
| 0110 1111 | 0157 | 111 | 0x6F | o | Lowercase letter o |
| 0111 0000 | 0160 | 112 | 0x70 | p | Lowercase letter p |
| 0111 0001 | 0161 | 113 | 0x71 | q | Lowercase letter q |
| 0111 0010 | 0162 | 114 | 0x72 | r | Lowercase letter r |
| 0111 0011 | 0163 | 115 | 0x73 | s | Lowercase letter s |
| 0111 0100 | 0164 | 116 | 0x74 | t | Lowercase letter t |
| 0111 0101 | 0165 | 117 | 0x75 | u | Lowercase letter u |
| 0111 0110 | 0166 | 118 | 0x76 | v | Lowercase letter v |
| 0111 0111 | 0167 | 119 | 0x77 | w | Lowercase letter w |
| 0111 1000 | 0170 | 120 | 0x78 | x | Lowercase letter x |
| 0111 1001 | 0171 | 121 | 0x79 | y | Lowercase letter y |
| 0111 1010 | 0172 | 122 | 0x7A | z | Lowercase letter z |
| 0111 1011 | 0173 | 123 | 0x7B | { | Opening curly brace |
| 0111 1100 | 0174 | 124 | 0x7C | \| | Vertical bar |
| 0111 1101 | 0175 | 125 | 0x7D | } | Closing curly brace |
| 0111 1110 | 0176 | 126 | 0x7E | ~ | Tilde |
| 0111 1111 | 0177 | 127 | 0x7F | DEL (delete) | Delete |
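Any row of the reference table can be reproduced with Python's number formatting. The helper below is a small sketch; the function name `ascii_row` is made up for illustration, and the leading 0 on octal values follows the notation used in the table.

```python
def ascii_row(code: int) -> tuple:
    """Return (binary, octal, decimal, hex) strings for one ASCII code."""
    binary = format(code, "08b")
    binary = binary[:4] + " " + binary[4:]    # group as two 4-bit halves
    octal = "0" + format(code, "o")           # leading 0 marks octal
    hexadecimal = "0x" + format(code, "02X")
    return binary, octal, str(code), hexadecimal

print(ascii_row(ord("A")))  # ('0100 0001', '0101', '65', '0x41')
print(ascii_row(10))        # ('0000 1010', '012', '10', '0x0A')
```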

The Base64 Alphabet Mapping Table

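| Index | Character | Index | Character | Index | Character | Index | Character |
|---|---|---|---|---|---|---|---|
| 0 | A | 17 | R | 34 | i | 51 | z |
| 1 | B | 18 | S | 35 | j | 52 | 0 |
| 2 | C | 19 | T | 36 | k | 53 | 1 |
| 3 | D | 20 | U | 37 | l | 54 | 2 |
| 4 | E | 21 | V | 38 | m | 55 | 3 |
| 5 | F | 22 | W | 39 | n | 56 | 4 |
| 6 | G | 23 | X | 40 | o | 57 | 5 |
| 7 | H | 24 | Y | 41 | p | 58 | 6 |
| 8 | I | 25 | Z | 42 | q | 59 | 7 |
| 9 | J | 26 | a | 43 | r | 60 | 8 |
| 10 | K | 27 | b | 44 | s | 61 | 9 |
| 11 | L | 28 | c | 45 | t | 62 | + |
| 12 | M | 29 | d | 46 | u | 63 | / |
| 13 | N | 30 | e | 47 | v | | |
| 14 | O | 31 | f | 48 | w | | |
| 15 | P | 32 | g | 49 | x | | |
| 16 | Q | 33 | h | 50 | y | | |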

Answer to the question

A char occupies one byte, that is, 8 bits, so the original data is 12 * 8 = 96 bits long. Because 12 is a multiple of 3, no padding is needed, and regrouping into 6-bit groups gives 96 / 6 = 16 chars. If the length is 20, then 20 is not a multiple of 3, so the input is first padded to the next multiple of 3, which is 21; 21 * 8 / 6 = 28 chars, the last of which is the padding character =.
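The calculation generalizes to any input length: round the byte count up to a whole number of 3-byte groups and multiply by 4. Here is a minimal sketch (the function name `base64_length` is hypothetical, and line breaks are ignored), checked against Python's standard `base64` module.

```python
import base64

def base64_length(num_bytes: int) -> int:
    """Length of the padded Base64 output, ignoring line breaks."""
    full_groups = (num_bytes + 2) // 3  # round up to whole 3-byte groups
    return full_groups * 4

for n in (12, 20):
    # Compare the formula with the length of an actual encoding of n zero bytes.
    print(n, base64_length(n), len(base64.b64encode(bytes(n))))
# 12 16 16
# 20 28 28
```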