I. The origin of Base64 encoding

Why does Base64 encoding exist? Some network transmission channels do not support every byte value. Traditional email, for example, only supports printable characters, so bytes such as ASCII control characters cannot be sent through it. This is a serious limitation: the bytes in the binary stream of an image will not all be printable characters, so the image cannot be transmitted as-is. The best solution is an extension scheme that supports binary transfer without changing the traditional protocol. If unprintable bytes can be represented with printable characters, the problem is solved. Base64 encoding is exactly that: a representation of binary data based on 64 printable characters.

II. Base64 encoding principles

Most images can be converted to Base64 and embedded as data: URIs (data:image/...). This is especially useful when saving a Canvas as an image. Modern browsers support Base64 encoding and decoding natively through btoa and atob.
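
As a rough illustration of the data: URI form, here is a minimal sketch. The helper name toDataUri and the tiny SVG image are assumptions made for this example; in a browser, a Canvas can produce a similar string directly through canvas.toDataURL().

```javascript
// Hypothetical helper (not from the original article): wrap a binary string
// in a Base64 data: URI with the given MIME type.
function toDataUri(mimeType, binaryString) {
  return "data:" + mimeType + ";base64," + btoa(binaryString);
}

// A tiny SVG image, used here only because it is short and ASCII-only.
const svg = '<svg xmlns="http://www.w3.org/2000/svg" width="8" height="8"/>';
const uri = toDataUri("image/svg+xml", svg);

console.log(uri.startsWith("data:image/svg+xml;base64,")); // true
// Decoding the part after the comma gives back the original markup.
console.log(atob(uri.split(",")[1]) === svg); // true
```

A string like this can be assigned to the src attribute of an img element.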

In the Base64 index table, the characters are “A-Z, a-z, 0-9, +, /”, 64 printable characters in total. Each character's value is its index in the table, which is fixed by the Base64 standard and cannot be changed. 64 values fit in 6 bits, but a byte has 8 bits, so two bits are wasted whenever a Base64 character is stored as a byte; some space has to be sacrificed. The point to understand is that a Base64 character takes up eight bits but carries only six bits of data: viewed as an index value, the right six bits are valid and the left two are always zero.
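
The index table can be written down as a single 64-character string, where a character's position is its index. The sketch below assumes the standard alphabet; the names BASE64_ALPHABET, indexToChar and charToIndex are made up for illustration.

```javascript
// Standard Base64 alphabet: 0-25 -> A-Z, 26-51 -> a-z, 52-61 -> 0-9, 62 -> '+', 63 -> '/'.
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
  "abcdefghijklmnopqrstuvwxyz" +
  "0123456789+/";

// Look up the character for a 6-bit index (0..63).
function indexToChar(index) {
  if (!Number.isInteger(index) || index < 0 || index > 63) {
    throw new RangeError("Base64 index must be an integer from 0 to 63");
  }
  return BASE64_ALPHABET.charAt(index);
}

// Look up the 6-bit index for a character; returns -1 if it is not in the alphabet.
function charToIndex(ch) {
  return BASE64_ALPHABET.indexOf(ch);
}

console.log(BASE64_ALPHABET.length); // 64
console.log(indexToChar(0), indexToChar(26), indexToChar(63)); // A a /
console.log(charToIndex("T")); // 19
```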

So how do you use six valid bits to represent the eight bits of an ordinary byte? The least common multiple of 8 and 6 is 24, which means that three ordinary bytes can be represented by four Base64 characters with the same number of significant bits (3 × 8 = 4 × 6 = 24). You could also say that two Base64 characters can represent one ordinary byte, but the least-common-multiple scheme wastes the least space. Take “Man” as an example: it is three characters with a total of 24 valid bits, so four Base64 characters are needed to hold those 24 bits. Each group of six bits is converted into an index value, which is then looked up in the Base64 character table. The Base64 encoding of “Man” is “TWFu”. You may have noticed the rule here: the smallest unit of conversion is three bytes, so a string is converted three bytes at a time, and each group of three bytes becomes four Base64 characters. That’s pretty much all you need to know.
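
The “Man” example can be reproduced with a few bit operations. This is only a sketch of the three-bytes-to-four-characters step; the names B64_CHARS and encodeTriplet are assumptions for illustration.

```javascript
const B64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode exactly three bytes (each 0..255) as four Base64 characters.
function encodeTriplet(b1, b2, b3) {
  // Join the three bytes into one 24-bit number.
  const bits = (b1 << 16) | (b2 << 8) | b3;
  // Slice it into four 6-bit groups, most significant group first.
  const i1 = (bits >> 18) & 63;
  const i2 = (bits >> 12) & 63;
  const i3 = (bits >> 6) & 63;
  const i4 = bits & 63;
  return B64_CHARS[i1] + B64_CHARS[i2] + B64_CHARS[i3] + B64_CHARS[i4];
}

// "Man" is the bytes 77, 97, 110 in ASCII.
const [m, a, n] = Array.from("Man", (ch) => ch.charCodeAt(0));
console.log(encodeTriplet(m, a, n)); // TWFu
```

The four indexes for “Man” are 19, 22, 5 and 46, which map to T, W, F and u in the table.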

But what happens at the end of the conversion when fewer than three bytes are left? In that case, two Base64 characters represent one byte, or three Base64 characters represent two bytes. With a single byte such as “A”, the second Base64 character receives only two bits of data, and its last four bits are filled with zeros, so the Base64 characters for “A” are “QQ”. As mentioned above, Base64 output comes in units of four characters, so these two characters are followed by two “=” signs, giving “QQ==”. In fact, the “=” is not strictly needed for decoding; the reason for using it is probably so that concatenating several encoded Base64 strings does not cause confusion. It follows that a Base64 string can only end with one or two “=” signs, and “=” never appears in the middle. The two-byte string “BC” is encoded the same way, giving “QkM=”.
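
The padding rule can be sketched as a small encoder over an array of byte values. The names B64_TABLE, encodeBytes and ascii are assumptions for illustration, and the input is assumed to contain only integers from 0 to 255.

```javascript
const B64_TABLE =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode an array of byte values (0..255) to a padded Base64 string.
function encodeBytes(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i; // 1, 2, or at least 3 bytes left
    const b1 = bytes[i];
    const b2 = remaining > 1 ? bytes[i + 1] : 0; // missing bytes contribute zero bits
    const b3 = remaining > 2 ? bytes[i + 2] : 0;
    out += B64_TABLE[b1 >> 2];
    out += B64_TABLE[((b1 & 3) << 4) | (b2 >> 4)];
    // With only one byte left, the third and fourth characters are padding.
    out += remaining > 1 ? B64_TABLE[((b2 & 15) << 2) | (b3 >> 6)] : "=";
    // With only two bytes left, the fourth character is padding.
    out += remaining > 2 ? B64_TABLE[b3 & 63] : "=";
  }
  return out;
}

// Convert an ASCII string to its byte values.
const ascii = (s) => Array.from(s, (ch) => ch.charCodeAt(0));

console.log(encodeBytes(ascii("A")));   // QQ==
console.log(encodeBytes(ascii("BC")));  // QkM=
console.log(encodeBytes(ascii("Man"))); // TWFu
```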

III. Concrete implementation

Let’s see how to do Base64 encoding and decoding in JavaScript:

var str = 'javascript';

window.btoa(str)
// Encoded result: "amF2YXNjcmlwdA=="

window.atob("amF2YXNjcmlwdA==")
// Decoded result: "javascript"

Note that btoa can only encode strings whose characters all fit in a single byte (the Latin-1 range), so it has limitations for other data, Unicode text in particular.

var str = "China，中国"
window.btoa(str)

Uncaught DOMException: Failed to execute ‘btoa’ on ‘Window’: The string to be encoded contains characters outside of the Latin1 range.

Clearly this is not enough. To support Chinese characters, combine btoa and atob with window.encodeURIComponent and window.decodeURIComponent:

var str = "China，中国";

window.btoa(window.encodeURIComponent(str))
// "Q2hpbmElRUYlQkMlOEMlRTQlQjglQUQlRTUlOUIlQkQ="

window.decodeURIComponent(window.atob('Q2hpbmElRUYlQkMlOEMlRTQlQjglQUQlRTUlOUIlQkQ='))
// "China，中国"
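
The approach above Base64-encodes the percent-escaped text rather than the raw bytes. As an alternative sketch, assuming an environment that provides TextEncoder and TextDecoder (current browsers and recent Node.js versions do), the UTF-8 bytes can be encoded directly. The function names utf8ToBase64 and base64ToUtf8 are made up for this example.

```javascript
// Encode any JavaScript string as the Base64 of its UTF-8 bytes.
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b); // one character per byte, so btoa accepts it
  }
  return btoa(binary);
}

// Reverse the process: Base64 -> bytes -> UTF-8 text.
function base64ToUtf8(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

const encoded = utf8ToBase64("China，中国");
console.log(base64ToUtf8(encoded) === "China，中国"); // true
```

The result is shorter than the percent-escaped variant, but the two forms are not interchangeable: each must be decoded with its matching decoder.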

function Base64() {
 
	// private property
	var _keyStr = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
 
	// public method for encoding
	this.encode = function (input) {
		var output = "";
		var chr1, chr2, chr3, enc1, enc2, enc3, enc4;
		var i = 0;
		input = _utf8_encode(input);
		while (i < input.length) {
			chr1 = input.charCodeAt(i++);
			chr2 = input.charCodeAt(i++);
			chr3 = input.charCodeAt(i++);
			enc1 = chr1 >> 2;
			enc2 = ((chr1 & 3) << 4) | (chr2 >> 4);
			enc3 = ((chr2 & 15) << 2) | (chr3 >> 6);
			enc4 = chr3 & 63;
			if (isNaN(chr2)) {
				enc3 = enc4 = 64;
			} else if (isNaN(chr3)) {
				enc4 = 64;
			}
			output = output +
			_keyStr.charAt(enc1) + _keyStr.charAt(enc2) +
			_keyStr.charAt(enc3) + _keyStr.charAt(enc4);
		}
		return output;
	}
 
	// public method for decoding
	this.decode = function (input) {
		var output = "";
		var chr1, chr2, chr3;
		var enc1, enc2, enc3, enc4;
		var i = 0;
		input = input.replace(/[^A-Za-z0-9\+\/\=]/g, "");
		while (i < input.length) {
			enc1 = _keyStr.indexOf(input.charAt(i++));
			enc2 = _keyStr.indexOf(input.charAt(i++));
			enc3 = _keyStr.indexOf(input.charAt(i++));
			enc4 = _keyStr.indexOf(input.charAt(i++));
			chr1 = (enc1 << 2) | (enc2 >> 4);
			chr2 = ((enc2 & 15) << 4) | (enc3 >> 2);
			chr3 = ((enc3 & 3) << 6) | enc4;
			output = output + String.fromCharCode(chr1);
			// Index 64 is the "=" padding character: skip bytes that were never encoded
			if (enc3 != 64) {
				output = output + String.fromCharCode(chr2);
			}
			if (enc4 != 64) {
				output = output + String.fromCharCode(chr3);
			}
		}
		output = _utf8_decode(output);
		return output;
	}
 
	// private method for UTF-8 encoding
	var _utf8_encode = function (string) {
		string = string.replace(/\r\n/g,"\n");
		var utftext = "";
		for (var n = 0; n < string.length; n++) {
			var c = string.charCodeAt(n);
			if (c < 128) {
				utftext += String.fromCharCode(c);
			} else if((c > 127) && (c < 2048)) {
				utftext += String.fromCharCode((c >> 6) | 192);
				utftext += String.fromCharCode((c & 63) | 128);
			} else {
				utftext += String.fromCharCode((c >> 12) | 224);
				utftext += String.fromCharCode(((c >> 6) & 63) | 128);
				utftext += String.fromCharCode((c & 63) | 128);
			}
		}
		return utftext;
	}
 
	// private method for UTF-8 decoding
	var _utf8_decode = function (utftext) {
		var string = "";
		var i = 0;
		var c = 0, c2 = 0, c3 = 0;
		while ( i < utftext.length ) {
			c = utftext.charCodeAt(i);
			if (c < 128) {
				string += String.fromCharCode(c);
				i++;
			} else if((c > 191) && (c < 224)) {
				c2 = utftext.charCodeAt(i+1);
				string += String.fromCharCode(((c & 31) << 6) | (c2 & 63));
				i += 2;
			} else {
				c2 = utftext.charCodeAt(i+1);
				c3 = utftext.charCodeAt(i+2);
				string += String.fromCharCode(((c & 15) << 12) | ((c2 & 63) << 6) | (c3 & 63));
				i += 3;
			}
		}
		return string;
	}
}

IV. Summary

Calling Base64 an “encoding” may seem strange. With most character encodings, converting characters to binary is called encoding, and converting binary back to characters is called decoding. In Base64 the terms are reversed: converting binary to characters is called encoding, and converting characters back to binary is called decoding.

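Base64 encoding is used to transmit, store, and represent binary data. It is sometimes used as a form of “encryption” too, but this is very weak: the content simply cannot be read at a glance, and anyone can decode it. The order of the characters in the Base64 table can also be customized to make the output harder to recognize, though that is still obfuscation rather than real encryption.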

Base64 encoding goes from binary to characters, so the result depends on the bytes being encoded. For example, the same Chinese text is converted to different bytes by different character encodings, so the resulting Base64 strings also differ. The Base64 encoding of “上网” is “5LiK572R” when the text is encoded as UTF-8, and “yc/N+A==” when it is encoded as GB2312.
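
The difference can be checked with a short sketch. TextEncoder only produces UTF-8, so the GB2312 bytes for “上网” are written out by hand here; those byte values (C9 CF CD F8) and the helper name bytesToBase64 are assumptions of this example.

```javascript
// Base64-encode an array (or typed array) of byte values using the built-in btoa.
function bytesToBase64(bytes) {
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b);
  }
  return btoa(binary);
}

// UTF-8 bytes of the text: E4 B8 8A E7 BD 91.
const utf8Bytes = new TextEncoder().encode("上网");
// Assumed GB2312 bytes of the same text, hard-coded for illustration.
const gb2312Bytes = [0xc9, 0xcf, 0xcd, 0xf8];

console.log(bytesToBase64(utf8Bytes));   // 5LiK572R
console.log(bytesToBase64(gb2312Bytes)); // yc/N+A==
```

Whoever decodes the Base64 string therefore also needs to know which character encoding produced the bytes.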