While learning about source maps recently, I became deeply interested in how the mappings field is encoded. After consulting various materials, I decided to implement my own Base64, VLQ, and Base64 VLQ encoders.

This article does not explain source maps themselves; it focuses only on the principle and implementation of the three encoding functions, so some background knowledge of source maps is assumed. If you need to catch up first, these articles are strongly recommended:

  • A 10,000-word read: this article is enough about Sourcemap
  • Amazing, I didn’t expect a Source map to involve so many blind spots

Base64 encoding

Concept

Base64 encoding uses the 64 characters [0-9A-Za-z+/], plus '=' as a padding character, for 65 characters in total as its basic character set. Input bytes are taken in groups of three, and each group is converted into four Base64 characters.

Why these 64 basic characters?

We know that all data in a computer is stored as bytes. Complex data (such as a Chinese character) may take several bytes, while simple data (such as a single English letter) takes one. The byte is the basic unit of storage in memory, and each byte is an 8-bit binary number (for example 00000000).

A byte can hold values from 0 to 255. ASCII covers 0 to 127; values from 128 to 255 fall outside the ASCII range and have no visible ASCII character. Not all ASCII characters are visible either: only 95 of them (32 to 126) are, and the rest are invisible.

When a byte stream containing invisible bytes is transmitted over a network, for example from computer A to computer B, it often passes through several routing devices. Because different devices (especially older routers) handle byte streams in slightly different ways, those invisible bytes may be handled incorrectly, which is bad for transmission. So the data is Base64-encoded first, turning everything into visible bytes, that is, visible characters that ASCII can represent, to ensure reliable transmission. Base64 output consists of 0 to 9, a to z, A to Z, + and /, exactly 64 characters, all of which are among the 95 visible characters in the ASCII range.

Source: www.zhihu.com/question/36…

Although there are 95 visible characters, 2^7 = 128 > 95 > 2^6 = 64. Using 7 bits to represent 95 characters would waste part of the range, while using 6 bits leaves some visible characters unused, which is acceptable. That is why Base64 uses 0 to 9, a to z, A to Z, + and /, exactly 64 characters.

Note ⚠️: this does not mean a Base64 character is stored in 6 bits. Each Base64 character still occupies one byte (8 bits); it is the character's index in the table (0 to 63) that fits in 6 bits, so when that index is written as 8 bits the first two are always 00. Personally, I guess that is why only 64 characters are used: with more than 64, the first two bits could no longer both be 00.

So 65 characters are actually used, but '=' only serves as padding and is not counted among the 64 valid characters.
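
A quick check of the 6-bit point, as a minimal sketch: every index into the 64-character table fits in 6 bits, so the two high bits of its 8-bit form are always 00. The helper name toByteString is made up for this illustration and is not part of any library.

```javascript
const base64Chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper: render a table index as an 8-bit binary string.
const toByteString = (index) => index.toString(2).padStart(8, '0');

// Every index 0..63 needs at most 6 bits, so the top two bits are 00.
const allStartWith00 = [...base64Chars].every((_, index) => toByteString(index).startsWith('00'));

console.log(toByteString(0));  // 00000000 (index of 'A')
console.log(toByteString(63)); // 00111111 (index of '/')
console.log(allStartWith00);   // true
console.log(toByteString(64)); // 01000000, a 65th index would need a 7th bit
```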

Why convert 3 original bytes into 4 Base64 characters?

Each Base64 character carries 6 bits, while each original byte has 8. The least common multiple of 6 and 8 is 24, so four Base64 characters (4 × 6 bits) represent exactly three original bytes (3 × 8 bits).

This is why a Base64-encoded file is about 33% larger than the original: every 3 bytes become 4, so it cannot help but grow.
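
The size growth can be checked with a little arithmetic. This is only a sketch; base64Length is a name invented here, and it counts the '=' padding that the following sections describe.

```javascript
// Hypothetical helper: length of the Base64 output for an input of byteLength bytes.
// Every group of up to 3 bytes becomes 4 characters (including any '=' padding).
const base64Length = (byteLength) => Math.ceil(byteLength / 3) * 4;

console.log(base64Length(3));    // 4
console.log(base64Length(4));    // 8, the single leftover byte still costs 4 characters
console.log(base64Length(3000)); // 4000
console.log(base64Length(3000) / 3000); // roughly 1.33, i.e. about 33% larger
```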

What if the length of the string is not a multiple of 3? I’ll answer it for you later.

JavaScript built-in methods

In JavaScript, there are two functions that handle decoding and encoding base64 strings, respectively:

  • atob()
  • btoa()

The atob() function decodes a Base64-encoded string. Conversely, the btoa() function creates a Base64-encoded ASCII string from a binary string. The examples later in this article can be verified with both.
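
A short usage sketch of the two built-ins. It assumes a runtime that exposes btoa and atob as globals (browsers do, and so do recent versions of Node.js).

```javascript
// btoa: binary string -> Base64; atob: Base64 -> binary string.
const encoded = btoa('ABC');
const decoded = atob(encoded);

console.log(encoded);    // QUJD
console.log(decoded);    // ABC
console.log(btoa('AB')); // QUI=
console.log(btoa('A'));  // QQ==
```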

Coding principle

Let me take the string “ABC” as an example.

The conversion process is mainly divided into three steps:

  1. Convert the three original characters to 8-bit binary, giving 24 bits.
  2. Split the 24 bits into four 6-bit groups.
  3. Convert each 6-bit group to a number and look up the corresponding character in the Base64 character set.

Some articles say that each 6-bit group should be extended to a full 8-bit byte by putting 00 in front of it. In JS the parsed value is the same with or without the 00, so I did not add it here. The benefit of working with only 6 bits will be explained later.
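
The three steps for “ABC” can be traced by hand like this. It is a minimal sketch rather than the full implementation; the variable names are only for this walkthrough.

```javascript
const base64Chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Step 1: each character becomes an 8-bit binary string, 24 bits in total.
const bits = [...'ABC'].map((ch) => ch.charCodeAt(0).toString(2).padStart(8, '0')).join('');
console.log(bits); // 010000010100001001000011

// Step 2: cut the 24 bits into four 6-bit groups.
const groups = bits.match(/[01]{6}/g);
console.log(groups); // [ '010000', '010100', '001001', '000011' ]

// Step 3: each group is an index into the Base64 character set.
const indexes = groups.map((group) => parseInt(group, 2));
console.log(indexes); // [ 16, 20, 9, 3 ]
const walkthroughResult = indexes.map((i) => base64Chars[i]).join('');
console.log(walkthroughResult); // QUJD

// Putting 00 in front of a group does not change the parsed value.
console.log(parseInt('010000', 2) === parseInt('00010000', 2)); // true
```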

If the string length is less than 3

Case 1: Length 1, using “A” as an example

The one original character (8 bits) is split into two 6-bit groups, with four zeros appended to the last group to fill it. This gives two Base64 characters, followed by two “=” padding characters.

Case 2: Length 2, using “AB” as an example

The two original characters (16 bits) are split into three 6-bit groups, with two zeros appended to the last group. This gives three Base64 characters, followed by one “=” padding character.
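
Both cases can be reproduced with a small sketch. encodeTail is a hypothetical helper written for this illustration; it only handles a final group of 1 or 2 characters whose code points are below 256.

```javascript
const base64Chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper: encode a final group of 1 or 2 single-byte characters.
const encodeTail = (tail) => {
	const bits = [...tail].map((ch) => ch.charCodeAt(0).toString(2).padStart(8, '0')).join('');
	// 8 bits are zero-filled to 12 (two groups); 16 bits are zero-filled to 18 (three groups).
	const padded = bits.padEnd(Math.ceil(bits.length / 6) * 6, '0');
	const chars = padded
		.match(/[01]{6}/g)
		.map((group) => base64Chars[parseInt(group, 2)])
		.join('');
	// One '=' for every character missing from the group of three.
	return chars + '='.repeat(3 - tail.length);
};

console.log(encodeTail('A'));  // QQ==
console.log(encodeTail('AB')); // QUI=
```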

Transformation process

Code implementation

const base64Chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

const ascii2base64 = (str) => {
	let baseStr = ''
	// Cut the original string into groups of up to 3 characters. 'ABCD' => 'ABC' 'D'
	const strArr = str.match(/.{1,3}/gs)
	if (!strArr) return baseStr

	strArr.forEach((group) => {
		// Split the group into single characters. 'ABC' => 'A' 'B' 'C'
		const chars = group.split('')
		const len = group.length

		// Convert each character to 8-bit binary. 'A' => 01000001
		const charsCodes = chars.map((char) => extraASCIICode(char))
		console.log('charsCodes:', charsCodes)

		// Split the 8-bit binaries into 6-bit binaries, filling the last one with 0 if needed
		const base64Codes = split8To6(charsCodes)
		console.log('base64Codes:', base64Codes)

		// Convert each 6-bit binary to a number and append the matching character from the Base64 character set
		base64Codes.forEach((code) => {
			baseStr += base64Chars[parseInt(code, 2)]
		})
		if (len < 3) {
			baseStr += len === 1 ? '==' : '=' // Pad the final group when it is shorter than 3
		}
	})

	return baseStr
}

// Only handles characters whose code point fits in one byte (below 256)
const extraASCIICode = (char) => {
	let binary8 = char.codePointAt(0).toString(2) // 'A' => 65 => 1000001
	while (binary8.length < 8) {
		binary8 = `0${binary8}` // toString(2) drops leading zeros, so fill up to 8 bits
	}
	return binary8
}

const split8To6 = (charsCodes) => {
	// Join the 8-bit binaries (24 bits for a full group) and cut them into 6-bit binaries
	const binary6s = charsCodes.join('').match(/[01]{1,6}/g)
	if (!binary6s) return []

	const len = charsCodes.length
	switch (len) {
		case 1:
			binary6s[1] = `${binary6s[1]}0000` // Case 1: length 1, the last group has only 2 bits
			break

		case 2:
			binary6s[2] = `${binary6s[2]}00` // Case 2: length 2, the last group has only 4 bits
			break
	}

	return binary6s
}

console.log(ascii2base64('ABC')) // QUJD
console.log(ascii2base64('AB')) // QUI=
console.log(ascii2base64('AC')) // QUM=

Base64 to ASCII

Decoding simply reverses the process above. It is not shown in full here; you can implement it yourself if you are interested.
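
For readers who want a starting point, here is one possible sketch of the reverse direction. The name base642ascii is made up here, and the function assumes valid Base64 input whose decoded bytes are all single-byte characters; it does no error handling.

```javascript
const base64Chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical counterpart to an ASCII-to-Base64 encoder; assumes valid input.
const base642ascii = (baseStr) => {
	// Drop the '=' padding, then turn every character back into its 6-bit index.
	const bits = [...baseStr.replace(/=+$/, '')]
		.map((ch) => base64Chars.indexOf(ch).toString(2).padStart(6, '0'))
		.join('');
	// Regroup into full 8-bit units; the 2 or 4 leftover fill zeros are ignored.
	const bytes = bits.match(/[01]{8}/g) || [];
	return bytes.map((byte) => String.fromCharCode(parseInt(byte, 2))).join('');
};

console.log(base642ascii('QUJD')); // ABC
console.log(base642ascii('QUI=')); // AB
console.log(base642ascii('QQ==')); // A
```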

VLQ encoding

Concept

VLQ, short for variable-length quantity, is a way of encoding arbitrarily large numeric values compactly by using a variable number of binary digits.

Coding principle

There are five main steps in transformation:

  1. Convert the number to binary, giving n bits.
  2. Check whether n is one less than a multiple of 7 (6, 13, 20, ...). If not, pad with leading zeros until it is.
  3. Split the padded binary into units: 6 bits for the first unit and 7 bits for each of the rest.
  4. Prepend one continuation bit to each unit to indicate whether the number ends there: 1 means more units follow, 0 means this is the last unit.
  5. Append one sign bit to the first unit: 1 for negative, 0 for positive.

Code implementation

const num2vlq = (num) => {
	// convert the absolute value to binary; the sign is stored in its own bit below
	let binary = Math.abs(num).toString(2);
	while (binary.length % 7 < 6) {
		binary = `0${binary}`; // pad with leading zeros
	}

	let binaryArr = [];
	binaryArr[0] = binary.substring(0, 6); // the first unit takes 6 bits
	const othersBinary = binary.substring(6).match(/[01]{7}/g) || []; // the other units take 7 bits each
	binaryArr.push(...othersBinary);
	// prepend the continuation bit to every unit: 0 for the last unit, 1 otherwise
	binaryArr = binaryArr.map((item, i, arr) => `${arr.length - 1 === i ? 0 : 1}${item}`);
	// append the sign bit to the first unit
	binaryArr[0] = `${binaryArr[0]}${num >= 0 ? 0 : 1}`;

	return binaryArr;
};

console.log(num2vlq(255)); // [ '10000010', '01111111' ]
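
To make the unit layout easier to check, here is a hedged sketch of the reverse direction. vlq2num is a hypothetical name; it assumes its input is an array of 8-bit strings laid out exactly as in the five steps above (continuation bit first, sign bit at the end of the first unit).

```javascript
// Hypothetical inverse of the encoder described above.
const vlq2num = (units) => {
	// The last bit of the first unit is the sign: 1 is negative, 0 is positive.
	const sign = units[0].slice(-1) === '1' ? -1 : 1;
	// Drop the leading continuation bit of every unit, and the sign bit of the first.
	const payload = units
		.map((unit, i) => (i === 0 ? unit.slice(1, -1) : unit.slice(1)))
		.join('');
	return sign * parseInt(payload, 2);
};

console.log(vlq2num(['10000010', '01111111'])); // 255
console.log(vlq2num(['00000101']));             // -2
```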

Base64 VLQ encoding

Concept

After a value is converted to VLQ encoding, it is transcoded with Base64, so that the value can be represented in Base64 characters.

Coding principle

Note ⚠️: each Base64 character represents 6 bits, so to save a conversion step the VLQ units are made 6 bits wide instead of 8. That is, when padding, the length is extended to one less than a multiple of 5, and when splitting, the first unit takes 4 bits and each of the rest takes 5.

There are two main steps in the transformation:

  1. Convert the value to VLQ encoding with 6-bit units.

  2. Convert each 6-bit VLQ unit to its Base64 character.

Complete conversion process

Code implementation

const base64Chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

const base64VLQ = (num) => {
	// convert the absolute value to binary; the sign is stored in its own bit below
	let binary = Math.abs(num).toString(2);
	while (binary.length % 5 < 4) {
		binary = `0${binary}`; // pad with leading zeros
	}

	let binaryArr = [];
	binaryArr[0] = binary.substring(0, 4); // the first unit takes 4 bits
	const othersBinary = binary.substring(4).match(/[01]{5}/g) || []; // the other units take 5 bits each
	binaryArr.push(...othersBinary);
	// prepend the continuation bit to every unit: 0 for the last unit, 1 otherwise
	binaryArr = binaryArr.map((item, i, arr) => `${arr.length - 1 === i ? 0 : 1}${item}`);
	// append the sign bit to the first unit
	binaryArr[0] = `${binaryArr[0]}${num >= 0 ? 0 : 1}`;

	// convert each 6-bit unit to its Base64 character
	let baseStr = '';
	binaryArr.forEach((code) => {
		baseStr += base64Chars[parseInt(code, 2)];
	});
	return baseStr;
};
console.log(base64VLQ(255)); // uf
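
For comparison, here is a hedged sketch of the ordering that the mappings field of a source map uses, as far as I understand the format: the sign goes into the lowest bit of the value, and the 5-bit groups are emitted least significant first. Its output therefore differs from the base64VLQ function above, which emits the most significant group first. The name sourceMapVLQ is invented for this example, and the sketch assumes the value fits in a 32-bit integer.

```javascript
const base64Chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical name; assumes |num| is small enough for 32-bit bitwise operations.
const sourceMapVLQ = (num) => {
	// The sign goes into the lowest bit: 1 for negative, 0 for positive.
	let value = num < 0 ? (-num << 1) | 1 : num << 1;
	let result = '';
	do {
		let digit = value & 0b11111; // take the lowest 5 bits
		value >>>= 5;
		if (value > 0) digit |= 0b100000; // continuation bit: more groups follow
		result += base64Chars[digit];
	} while (value > 0);
	return result;
};

console.log(sourceMapVLQ(16));  // gB
console.log(sourceMapVLQ(255)); // +P
console.log(sourceMapVLQ(-1));  // D
```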

References

  • A 10,000-word read: this article is enough about Sourcemap
  • Amazing, I didn’t expect a Source map to involve so many blind spots
  • Why use Base64 encoding, and what are the situational requirements?
  • Base64 (MDN)
  • Base64 notes