Why is there Base64 encoding?

To allow other languages in email and to support pictures, audio, video, and other multimedia content, the mail transmission protocol was extended with a standard called Multipurpose Internet Mail Extensions (MIME).

Because MIME has to carry this content over a text-based protocol, the transmitted object must be transcoded first; without transcoding, the data cannot be sent in a message directly. One important reason is that the mail transfer protocol reserves a symbol to mark the end of the text (SMTP ends the message data with a line containing only a period), so sending raw binary data directly could trigger a false end-of-text, and the receiver would not get the complete content of the message.

The two most commonly used transcoding methods in MIME are Quoted-Printable, which is mostly used for data made up largely of 7-bit characters, and Base64, which is mostly used for binary data.

How does Base64 encoding work?

Base64 encoding maps binary data onto 64 predefined characters. None of these 64 characters is a control character, so mail systems and similar text protocols can recognize and transmit them correctly.

The code table assigns an index to each of the 64 characters as follows:

Index    Character
0-25     A-Z
26-51    a-z
52-61    0-9
62       +
63       /

How is binary data converted into characters in the code table above?

Data in a computer is usually handled in bytes (normally 8 bits each). We split the data to be transcoded into groups of 3 bytes, that is, 24 bits per group. Each 24-bit group is divided into 4 parts of 6 bits, and each 6-bit part becomes a new byte whose two high bits are filled with 0. For n groups of input, this yields n groups of 4 bytes.

Since each byte after the split carries only 6 meaningful bits, it can take only 64 distinct values (0 to 63). Each value is therefore used as an index into the code table above, and the resulting characters are what gets transmitted.
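The split described above can be sketched in a few lines of Go. This is a minimal illustration, not the implementation developed later in this article: the function name splitGroup and the string constant alphabet are assumptions made for this sketch.

```go
package main

import "fmt"

// alphabet is the standard Base64 code table, written as a string for brevity.
const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

// splitGroup splits one 3-byte group (24 bits) into four 6-bit indexes.
// The name is hypothetical, chosen for this sketch.
func splitGroup(g [3]byte) [4]byte {
	return [4]byte{
		g[0] >> 2,                      // top 6 bits of byte 0
		(g[0]&0b00000011)<<4 | g[1]>>4, // low 2 bits of byte 0 + top 4 bits of byte 1
		(g[1]&0b00001111)<<2 | g[2]>>6, // low 4 bits of byte 1 + top 2 bits of byte 2
		g[2] & 0b00111111,              // low 6 bits of byte 2
	}
}

func main() {
	idx := splitGroup([3]byte{'h', 'e', 'l'})
	for _, v := range idx {
		fmt.Printf("%06b -> index %2d -> %c\n", v, v, alphabet[v])
	}
}
```

For the bytes h, e, l the four indexes are 26, 6, 21 and 44, which the code table maps to a, G, V and s.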

You might ask: why not just split the data into 4 bytes and send those? Why convert through the code table as well? Because even after the split, the values 0 to 63 still include control characters (0 to 31) and characters with special meaning to the mail transfer protocol, such as the period (ASCII 46), which SMTP uses on a line by itself to mark the end of the message data. Such bytes do not meet the requirement; once they are mapped to the characters of the code table above, they can be transmitted and recognized correctly.

You might also ask what happens when the length of the data is not a multiple of 3. In that case we first pad the data with zero bytes (binary 00000000) and record how many bytes were added. After converting as described above, the trailing characters that correspond to the padding are replaced with the = sign: if 1 byte was missing, the last character is replaced with =, and if 2 bytes were missing, the last two characters are replaced with ==. The shortfall is never more than 2 bytes; working out why is left as an exercise.

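Take the string hello as an example. Its five bytes do not fill two complete 3-byte groups, so one zero byte is added, and the encoded result is aGVsbG8=, ending in a single = sign.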
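To follow the hello example bit by bit, the sketch below encodes the slow but readable way, through a string of 0s and 1s, and prints every 6-bit group with its index and character. The function encodeViaBits and the constant alphabet are hypothetical names for this illustration; it is meant for tracing the steps, not as an efficient encoder.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// alphabet is the standard Base64 code table, written as a string for brevity.
const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

// encodeViaBits pads input with zero bytes to a multiple of 3, cuts the bit
// string into 6-bit groups, and prints each group as it is converted.
func encodeViaBits(input []byte) string {
	miss := (3 - len(input)%3) % 3
	// Copy the input before padding so the caller's slice is left untouched.
	padded := append(append([]byte{}, input...), make([]byte, miss)...)

	var bits strings.Builder
	for _, b := range padded {
		fmt.Fprintf(&bits, "%08b", b)
	}
	s := bits.String()

	n := len(s) / 6
	out := make([]byte, 0, n)
	for i := 0; i < n; i++ {
		group := s[i*6 : i*6+6]
		if i >= n-miss {
			// Groups that consist only of padding become '='.
			fmt.Printf("%s -> padding  -> =\n", group)
			out = append(out, '=')
			continue
		}
		idx, _ := strconv.ParseUint(group, 2, 8) // group is always six binary digits
		fmt.Printf("%s -> index %2d -> %c\n", group, idx, alphabet[idx])
		out = append(out, alphabet[idx])
	}
	return string(out)
}

func main() {
	fmt.Println(encodeViaBits([]byte("hello"))) // last line printed: aGVsbG8=
}
```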




Once the encoding principle is understood, the decoding principle follows easily: each step is simply reversed.
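As a rough sketch of the reverse direction, the following Go code looks up the code table index of each of four characters and merges the four 6-bit values back into three bytes. The names mergeGroup and alphabet are assumptions made for this illustration, and padding is deliberately left out to keep the example short.

```go
package main

import (
	"fmt"
	"strings"
)

// alphabet is the standard Base64 code table, written as a string for brevity.
const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

// mergeGroup reverses the split: four 6-bit indexes become three bytes.
// Shifting a byte left discards the bits that overflow, which is exactly
// what is needed here because those bits belong to the previous output byte.
func mergeGroup(idx [4]byte) [3]byte {
	return [3]byte{
		idx[0]<<2 | idx[1]>>4, // 6 bits of index 0 + top 2 bits of index 1
		idx[1]<<4 | idx[2]>>2, // low 4 bits of index 1 + top 4 bits of index 2
		idx[2]<<6 | idx[3],    // low 2 bits of index 2 + 6 bits of index 3
	}
}

func main() {
	var idx [4]byte
	for i, c := range "aGVs" {
		// The position of a character in the table is its index.
		idx[i] = byte(strings.IndexRune(alphabet, c))
	}
	out := mergeGroup(idx)
	fmt.Println(string(out[:])) // hel
}
```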

How do you do that in Go?

For encoding, we first define the code table; the numbers in the table are the ASCII codes of the corresponding characters.

var table = []byte{
	65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
	97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122,
	48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
	43, 47,
}

Written as characters, the table is equivalent to the following (shown for illustration only):

var table = []string{
	"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z",
	"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
	"0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
	"+", "/",
}

We also define a variable holding the = character that stands in for missing bytes; 61 is the ASCII code of =:

var byteEq byte = 61

The encoding function:

func Encode(bytes []byte) string {
	bytesLen := len(bytes)
	// Count the missing bytes, i.e. how many bytes short the last 3-byte group is.
	bytesMiss := 0
	if bytesLen%3 != 0 {
		bytesMiss = 3 - bytesLen%3
	}
	// Pad the missing bytes with zero bytes.
	// Note that append may write into spare capacity of the caller's slice.
	for i := 0; i < bytesMiss; i++ {
		bytes = append(bytes, 0b00000000)
	}
	bytesLen += bytesMiss
	// newGroupIndex holds the four code table indexes of the current group.
	newGroupIndex := make([]byte, 4)
	groupCount := bytesLen / 3
	newBytes := make([]byte, groupCount*4)
	// Process the data group by group.
	for i := 0; i < groupCount; i++ {
		// The three original bytes of the current group.
		oriGroup := bytes[i*3 : i*3+3]
		// Compute the four 6-bit indexes.
		newGroupIndex[0] = oriGroup[0] >> 2
		newGroupIndex[1] = (oriGroup[0]&0b00000011)<<4 | (oriGroup[1]&0b11110000)>>4
		newGroupIndex[2] = (oriGroup[1]&0b00001111)<<2 | (oriGroup[2]&0b11000000)>>6
		newGroupIndex[3] = oriGroup[2] & 0b00111111
		// Look up the corresponding characters in the code table.
		newBytes[i*4] = table[newGroupIndex[0]]
		newBytes[i*4+1] = table[newGroupIndex[1]]
		newBytes[i*4+2] = table[newGroupIndex[2]]
		newBytes[i*4+3] = table[newGroupIndex[3]]
	}
	// Replace the characters that correspond to the padding with =.
	for i := 1; i <= bytesMiss; i++ {
		newBytes[len(newBytes)-i] = byteEq
	}
	return string(newBytes)
}

For completeness, we also implement decoding.

First we define the reverse lookup for the code table: the key is a character's ASCII code and the value is its index in the table:

var tableMap = map[byte]byte{
	65: 0, 66: 1, 67: 2, 68: 3, 69: 4, 70: 5, 71: 6, 72: 7,
	73: 8, 74: 9, 75: 10, 76: 11, 77: 12, 78: 13, 79: 14, 80: 15,
	81: 16, 82: 17, 83: 18, 84: 19, 85: 20, 86: 21, 87: 22, 88: 23,
	89: 24, 90: 25,
	97: 26, 98: 27, 99: 28, 100: 29, 101: 30, 102: 31, 103: 32, 104: 33,
	105: 34, 106: 35, 107: 36, 108: 37, 109: 38, 110: 39, 111: 40, 112: 41,
	113: 42, 114: 43, 115: 44, 116: 45, 117: 46, 118: 47, 119: 48, 120: 49,
	121: 50, 122: 51,
	48: 52, 49: 53, 50: 54, 51: 55, 52: 56, 53: 57, 54: 58, 55: 59,
	56: 60, 57: 61,
	43: 62, 47: 63,
}

Written with characters as keys, it is equivalent to the following (shown for illustration only):

var tableMap = map[string]byte{
	"A": 0, "B": 1, "C": 2, "D": 3, "E": 4, "F": 5, "G": 6, "H": 7,
	"I": 8, "J": 9, "K": 10, "L": 11, "M": 12, "N": 13, "O": 14, "P": 15,
	"Q": 16, "R": 17, "S": 18, "T": 19, "U": 20, "V": 21, "W": 22, "X": 23,
	"Y": 24, "Z": 25,
	"a": 26, "b": 27, "c": 28, "d": 29, "e": 30, "f": 31, "g": 32, "h": 33,
	"i": 34, "j": 35, "k": 36, "l": 37, "m": 38, "n": 39, "o": 40, "p": 41,
	"q": 42, "r": 43, "s": 44, "t": 45, "u": 46, "v": 47, "w": 48, "x": 49,
	"y": 50, "z": 51,
	"0": 52, "1": 53, "2": 54, "3": 55, "4": 56, "5": 57, "6": 58, "7": 59,
	"8": 60, "9": 61,
	"+": 62, "/": 63,
}

The decoding function is as follows:

// Decode needs the standard "errors" package imported at the top of the file.
// It converts the input slice to table indexes in place, so the caller's data is modified.
func Decode(bytes []byte) (string, error) {
	// If the length is not divisible by 4, this is not valid Base64 data.
	if len(bytes)%4 > 0 {
		return "", errors.New("not valid base64 code")
	}
	byteMiss := 0
	var ok bool
	// The = sign may appear only at the end; missLock is set once any other character is seen.
	var missLock bool
	for i := len(bytes) - 1; i >= 0; i-- {
		if bytes[i] == byteEq {
			if missLock {
				return "", errors.New("only the end could have =")
			}
			byteMiss += 1
			bytes[i] = 0
		} else {
			missLock = true
			// Replace the character with its index in the code table.
			bytes[i], ok = tableMap[bytes[i]]
			if !ok {
				return "", errors.New("not a valid base64 byte")
			}
		}
	}
	// The number of missing bytes cannot exceed 2.
	if byteMiss > 2 {
		return "", errors.New("byte miss more than 2")
	}
	// newGroup holds the three decoded bytes of the current group.
	newGroup := make([]byte, 3)
	groupCount := len(bytes) / 4
	newBytes := make([]byte, groupCount*3)
	for i := 0; i < groupCount; i++ {
		// The four 6-bit indexes of the current group.
		oriGroup := bytes[i*4 : i*4+4]
		// Merge the four 6-bit values back into three bytes.
		newGroup[0] = oriGroup[0]<<2 | (oriGroup[1]&0b00110000)>>4
		newGroup[1] = (oriGroup[1]&0b00001111)<<4 | (oriGroup[2]&0b00111100)>>2
		newGroup[2] = (oriGroup[2]&0b00000011)<<6 | oriGroup[3]
		newBytes[3*i] = newGroup[0]
		newBytes[3*i+1] = newGroup[1]
		newBytes[3*i+2] = newGroup[2]
	}
	// If bytes were missing, drop the padding bytes from the end.
	if byteMiss > 0 {
		newBytes = newBytes[0 : len(newBytes)-byteMiss]
	}
	return string(newBytes), nil
}

With that, we have implemented a Base64 codec package in Go; calling its Encode and Decode functions is all it takes to encode and decode Base64 data.