Summary of Buffer

Before ES6 introduced TypedArray, JavaScript had no mechanism for reading or manipulating binary data streams. Node.js added the Buffer class to its API so that binary data could be handled in scenarios such as TCP streams and file system operations. Buffer is a global object, so it does not need to be imported. A Buffer's size is fixed when it is created and cannot be changed afterwards.

Create a Buffer

Before Node.js v6.0.0, Buffer instances were created by calling the Buffer constructor with the new keyword, which returned different kinds of Buffers depending on the arguments provided. That form has since been deprecated, and there are several alternative ways to create a new Buffer.

1. Buffer.alloc and Buffer.allocUnsafe

Buffer.alloc and Buffer.allocUnsafe are called the same way: the argument is the length of the Buffer in bytes, given as a number.

Creating Buffers with Buffer.alloc and Buffer.allocUnsafe

// Buffer.alloc creates a zero-filled Buffer
let buf1 = Buffer.alloc(6);
// Buffer.allocUnsafe creates an uninitialized Buffer
let buf2 = Buffer.allocUnsafe(6);

console.log(buf1); // <Buffer 00 00 00 00 00 00>
console.log(buf2); // e.g. <Buffer 00 e7 8f a0 00 00> (contents vary from run to run)

Where they differ is initialization. The memory of a Buffer created with Buffer.alloc is initialized, with every byte set to 00. Buffer.allocUnsafe, on the other hand, returns an uninitialized Buffer: its memory is handed over as is, still holding whatever bytes were previously stored there.

Because it skips initialization, allocUnsafe allocates memory very quickly, but the memory it returns may contain old and potentially sensitive data. It has a clear performance advantage and is also genuinely unsafe, so use it with caution.
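A common way to keep most of allocUnsafe's speed without its risk is to overwrite the memory before anything reads it. The sketch below uses a size of 6 bytes purely for illustration; after fill(0), the Buffer holds the same bytes as one made with Buffer.alloc.

```javascript
// Fast, but the contents are whatever was left in that memory
const unsafeBuf = Buffer.allocUnsafe(6);

// Overwrite every byte before use; now it is safe to read
unsafeBuf.fill(0);

// Zero-initialized from the start
const safeBuf = Buffer.alloc(6);

console.log(unsafeBuf.equals(safeBuf)); // true
```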

2. Buffer.from

Buffer.from accepts three kinds of input:

  • A string, with an optional second argument giving the character encoding, such as ASCII, UTF-8 or Base64.
  • An array of numbers; each item of the array becomes one byte of the Buffer (bytes are displayed in hexadecimal).
  • Another Buffer, whose bytes are copied into the newly returned Buffer.

Passing in a string and a character encoding:

let buf = Buffer.from("hello", "utf8");

console.log(buf); // <Buffer 68 65 6c 6c 6f>

Passing in an array:

Array members as decimal numbers

let buf = Buffer.from([1, 2, 3]);

console.log(buf); // <Buffer 01 02 03>

Array members as hexadecimal numbers

let buf = Buffer.from([0xe4, 0xbd, 0xa0, 0xe5, 0xa5, 0xbd]);

console.log(buf); // <Buffer e4 bd a0 e5 a5 bd>
console.log(buf.toString("utf8")); // 你好 ("hello")

Buffer does not support the GB2312 encoding; the default encoding in Node.js is UTF-8. A Chinese character takes two bytes in GB2312 but three bytes in UTF-8, so the Buffer for 你好 ("hello") above consists of six bytes.
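The difference between characters and bytes can be checked directly: a JavaScript string's length counts UTF-16 code units, while Buffer.byteLength reports how many bytes the string takes in a given encoding. A minimal sketch:

```javascript
const greetingText = "你好"; // "hello" in Chinese

// String length counts UTF-16 code units: one per character here
console.log(greetingText.length); // 2

// Buffer.byteLength counts the bytes the string occupies in the given encoding
console.log(Buffer.byteLength(greetingText, "utf8")); // 6

// Each character becomes three bytes in UTF-8
console.log(Buffer.from(greetingText, "utf8")); // <Buffer e4 bd a0 e5 a5 bd>
```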

Array members as numeric strings

let buf = Buffer.from(["1", "2", "3"]);

console.log(buf); // <Buffer 01 02 03>

Array members are converted to numbers: a numeric string such as "1" becomes the number 1, while a value that does not convert to a number (a non-numeric string, most objects, and so on) becomes 00. Numbers outside the range 0 to 255 are truncated to fit into a single byte.
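The sketch below feeds a mixed array to Buffer.from to show the conversion: numeric strings become numbers, values that do not convert to a number become 0, fractions lose their fractional part, and numbers outside 0 to 255 keep only their low 8 bits (so -1 becomes ff). The input values are arbitrary examples.

```javascript
// Each entry is converted to a number and then reduced to one byte (0 to 255)
const mixed = Buffer.from([256, 257, -1, "a", "1", 1.5]);

// 256 -> 00, 257 -> 01, -1 -> ff, "a" -> 00, "1" -> 01, 1.5 -> 01
console.log(mixed); // <Buffer 00 01 ff 00 01 01>
```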

A Buffer can be converted back to a string with the toString method, which takes the encoding as an argument. The default encoding is UTF-8.
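toString also accepts other encodings, plus optional start and end byte offsets for decoding only part of a Buffer. A short sketch using the same "hello" bytes:

```javascript
const greeting = Buffer.from("hello", "utf8");

console.log(greeting.toString());         // hello (UTF-8 is the default)
console.log(greeting.toString("hex"));    // 68656c6c6f
console.log(greeting.toString("base64")); // aGVsbG8=

// Decode only bytes 1 up to (but not including) 3
console.log(greeting.toString("utf8", 1, 3)); // el
```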

Passing in a Buffer:

let buf1 = Buffer.from("hello", "utf8");

let buf2 = Buffer.from(buf1);

console.log(buf1); // <Buffer 68 65 6c 6c 6f>
console.log(buf2); // <Buffer 68 65 6c 6c 6f>
console.log(buf1 === buf2); // false
console.log(buf1[0] === buf2[0]); // true

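When a Buffer is passed in, a new Buffer is created and each of its bytes is copied into it. The two Buffers are therefore different objects with the same contents, and a later change to one of them does not affect the other.

A Buffer is still a reference type, and memory can be shared in other situations. Buffer's slice method, for example, returns a new Buffer that points at the same memory as the original, so a change made through one is visible through the other. This resembles a shallow copy of a "two-dimensional array", where the outer array is copied but the inner arrays are shared: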

Buffers compared with a two-dimensional array

let arr1 = [1, 2, [3]];
let arr2 = arr1.slice();
arr2[2][0] = 5;
console.log(arr1); // [1, 2, [5]]
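Whether Buffer.from(buffer) copies or shares memory can be checked directly. The sketch below compares identity with === and contents with buf.equals, then writes to the copy:

```javascript
const source = Buffer.from("hello", "utf8");
const duplicate = Buffer.from(source); // new memory, same bytes

// Different objects, identical contents
console.log(source === duplicate);     // false
console.log(source.equals(duplicate)); // true

// Writing to the copy leaves the original untouched
duplicate[0] = 0x6a; // "j"
console.log(source.toString());    // hello
console.log(duplicate.toString()); // jello
```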

Common methods of Buffer

1. fill

The fill method fills a Buffer with data and accepts three parameters:

  • value: the data to fill with.
  • start: the position at which filling starts (called offset in the Node.js documentation); defaults to 0.
  • end: the position at which filling stops; defaults to the length of the Buffer.
let buf = Buffer.alloc(3);

buf.fill(1);
console.log(buf); // <Buffer 01 01 01>

let buf = Buffer.alloc(6);

buf.fill(1, 2, 4);
console.log(buf); // <Buffer 00 00 01 01 00 00>

As the code above shows, the filled range includes the start position but excludes the end position. The value passed to fill can also span several bytes: starting at start, those bytes are written repeatedly until end is reached, and if the remaining space is too small for a full repetition, the pattern is simply cut off. If the end position given to fill is greater than the length of the Buffer, a RangeError is thrown.

let buf = Buffer.alloc(6);

buf.fill("abc", 1, 5);
console.log(buf); // <Buffer 00 61 62 63 61 00>

let buf = Buffer.alloc(3);

buf.fill("abc", 4, 8); // throws a RangeError: end (8) exceeds the length (3)
console.log(buf); // never reached
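Because an out-of-range end throws rather than being clamped, code that computes fill ranges at runtime may want to catch the error. A minimal sketch; the variable names are illustrative:

```javascript
const small = Buffer.alloc(3);
let rangeErrorCaught = false;

try {
    small.fill("abc", 4, 8); // end (8) lies past the end of the 3-byte Buffer
} catch (err) {
    rangeErrorCaught = err instanceof RangeError;
}
console.log(rangeErrorCaught); // true

// Keeping end within the Buffer avoids the error; the pattern is cut off where space runs out
small.fill("abc", 1);
console.log(small); // <Buffer 00 61 62>
```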

2. slice

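Buffer's slice method takes the same arguments as the slice method of arrays: a start index, an optional end index, and negative indexes that count from the end. There is one important difference, though: instead of copying, it returns a new Buffer that shares memory with the original. Recent Node.js versions document buf.slice as deprecated in favour of buf.subarray, which behaves the same way.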

let buf = Buffer.from("hello", "utf8");

let a = buf.slice(0, 2);
let b = buf.slice(2);
let c = buf.slice(-2);

console.log(a.toString()); // he
console.log(b.toString()); // llo
console.log(c.toString()); // lo
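Buffer's slice differs from Array's slice in one important way: it does not copy the bytes, so the returned Buffer and the original share memory. The sketch below shows a write through the slice appearing in the original, and one way (assuming an independent copy is what you want) to get a separate piece instead:

```javascript
const word = Buffer.from("hello", "utf8");

// slice returns a view: head and word share the same memory
const head = word.slice(0, 2);
head[0] = 0x48; // "H"
console.log(word.toString()); // Hello

// To get an independent piece, copy the bytes explicitly
const headCopy = Buffer.from(word.subarray(0, 2));
headCopy[0] = 0x68; // "h"
console.log(word.toString());     // Hello (unchanged)
console.log(headCopy.toString()); // he
```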

3. indexOf

Buffer's indexOf works much like indexOf on arrays and strings. The first argument is the value to search for, and the second is the position at which to start searching. A multi-byte value only matches where all of its bytes appear consecutively in the Buffer; indexOf returns the index of the first of those bytes, or -1 if there is no match.

let buf = Buffer.from("你*好*吗", "utf8"); // 你好吗 ("how are you?") with * between the characters

console.log(buf); // <Buffer e4 bd a0 2a e5 a5 bd 2a e5 90 97>
console.log(buf.indexOf("*")); // 3
console.log(buf.indexOf("*", 4)); // 7
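Besides strings, indexOf accepts a Buffer or a single byte value (an integer) as the search value, and lastIndexOf and includes follow the same matching rules. A sketch using the same 你*好*吗 bytes:

```javascript
const marked = Buffer.from("你*好*吗", "utf8");

// A single byte value can be searched for directly (0x2a is "*")
console.log(marked.indexOf(0x2a)); // 3

// Multi-byte values must match all of their bytes consecutively
console.log(marked.indexOf("好"));              // 4
console.log(marked.indexOf(Buffer.from("吗"))); // 8

// lastIndexOf searches from the end; includes is a boolean shorthand
console.log(marked.lastIndexOf("*")); // 7
console.log(marked.includes("?"));    // false
```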

4. copy

The copy method copies bytes from one Buffer into another and takes four arguments:

  • target: the Buffer to copy into
  • targetStart: the position in target at which writing starts (default 0)
  • sourceStart: the position in the source Buffer at which copying starts (default 0)
  • sourceEnd: the position in the source Buffer at which copying stops, exclusive (default: the length of the source Buffer)
The target Buffer is long enough

let targetBuf = Buffer.alloc(6);
let sourceBuf = Buffer.from("你好", "utf8");

// Copy 你好 into targetBuf
sourceBuf.copy(targetBuf, 0, 0, 6);
console.log(targetBuf.toString()); // 你好

The target Buffer is too short

let targetBuf = Buffer.alloc(3);
let sourceBuf = Buffer.from("你好", "utf8");

sourceBuf.copy(targetBuf, 0, 0, 6);
console.log(targetBuf.toString()); // 你

In the second example, copy is asked to copy the whole 6-byte source Buffer, but the target Buffer is only 3 bytes long, so only the first 3 bytes, the character 你, are copied. The remaining bytes are dropped; the target Buffer does not grow.
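copy also returns the number of bytes it actually wrote, which makes this kind of silent truncation easy to detect. A minimal sketch of the second example with the return value checked:

```javascript
const target = Buffer.alloc(3);
const sourceBytes = Buffer.from("你好", "utf8"); // 6 bytes

// copy returns how many bytes were actually written
const written = sourceBytes.copy(target, 0, 0, 6);

console.log(written);           // 3
console.log(target.toString()); // 你
```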

Unlike arrays, Buffers cannot be resized by assigning to length or to an index past the end. Once created, a Buffer keeps the same length.

Arrays vs. Buffers: changing the length

// Array
let arr = [1, 2, 3];
arr[3] = 4;
console.log(arr); // [1, 2, 3, 4]

arr.length = 5;
console.log(arr); // [1, 2, 3, 4, <1 empty item>]

// Buffer
let buf = Buffer.alloc(3);
buf[3] = 0x00; // out of range: silently ignored
console.log(buf); // <Buffer 00 00 00>
buf.length = 5; // ignored (throws a TypeError in strict mode)
console.log(buf); // <Buffer 00 00 00>
console.log(buf.length); // 3

As you can see from the above code, arrays can be changed by length and index, but similar operations in Buffer do not take effect.
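Since a Buffer cannot grow in place, "growing" one means allocating a larger Buffer and copying the old bytes into it. The helper below, growBuffer, is a hypothetical name used only for this sketch:

```javascript
// Hypothetical helper: returns a new, larger Buffer containing the old bytes
function growBuffer(buf, newLength) {
    const bigger = Buffer.alloc(newLength); // extra bytes are zero-filled
    buf.copy(bigger, 0);                    // copy the existing contents to the front
    return bigger;
}

const small3 = Buffer.from([1, 2, 3]);
const grown = growBuffer(small3, 5);

console.log(grown);         // <Buffer 01 02 03 00 00>
console.log(small3.length); // 3 (the original is unchanged)
```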

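A simplified implementation of the copy method:

Simulating the copy method

// A simplified version of buf.copy, for illustration only
Buffer.prototype.myCopy = function (target, targetStart = 0, sourceStart = 0, sourceEnd = this.length) {
    // Copy byte by byte; writes past the end of target are ignored, as with any typed array
    for (let i = 0; i < sourceEnd - sourceStart; i++) {
        target[targetStart + i] = this[sourceStart + i];
    }
};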

5. Buffer.concat

Like arrays, Buffers have a concat method for joining several Buffers together. Unlike Array's concat, however, it is not an instance method but a static method, called as Buffer.concat, and it takes different parameters.

Buffer.concat takes two arguments and returns a new Buffer:

  • The first argument is an array whose members are Buffers.
  • The second, optional argument is the length of the new Buffer. It defaults to the sum of the lengths of the Buffers in the array.

Buffer.concat joins the Buffers in the array, stores the result in a new Buffer and returns it. If the second argument is smaller than the combined length, the result is truncated to that many bytes.

let buf1 = Buffer.from("你", "utf8");
let buf2 = Buffer.from("好", "utf8");

let result1 = Buffer.concat([buf1, buf2]);
let result2 = Buffer.concat([buf1, buf2], 3);

console.log(result1); // <Buffer e4 bd a0 e5 a5 bd>
console.log(result1.toString()); // 你好
console.log(result2); // <Buffer e4 bd a0>
console.log(result2.toString()); // 你
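Buffer.concat copies every byte of its inputs, so calling it after each incoming chunk (buf = Buffer.concat([buf, chunk])) copies the earlier bytes again and again. A common pattern is to collect chunks in an array and concatenate once at the end. In the sketch below the chunks are made by hand to stand in for a stream:

```javascript
// Hypothetical input: pieces of one message, as they might arrive from a stream
const pieces = ["你", "好", "!"].map((s) => Buffer.from(s, "utf8"));

const chunks = [];
let received = 0;
for (const piece of pieces) {
    chunks.push(piece);       // keep a reference only; nothing is copied yet
    received += piece.length; // track the byte count as chunks arrive
}

// One concat at the end copies every byte exactly once
const whole = Buffer.concat(chunks, received);

console.log(received);         // 7
console.log(whole.toString()); // 你好!
```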

A simplified implementation of Buffer.concat:

Simulating Buffer.concat

Buffer.myConcat = function (bufferList, len) {
    // Length of the new Buffer: the given length, or the sum of all lengths
    if (len === undefined) {
        len = bufferList.reduce((prev, next) => prev + next.length, 0);
    }
    let newBuf = Buffer.alloc(len); // create the new Buffer
    let index = 0;
    // Copy each Buffer into place with myCopy; bytes past len are dropped
    bufferList.forEach(buf => {
        buf.myCopy(newBuf, index, 0, buf.length);
        index += buf.length;
    });
    return newBuf;
};

6. Buffer.isBuffer

Buffer.isBuffer is used to determine whether an object is a Buffer and returns a Boolean value.

let obj = {};
let buf = Buffer.alloc(6);

console.log(Buffer.isBuffer(obj)); // false
console.log(Buffer.isBuffer(buf)); // true
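Buffer is a subclass of Uint8Array, which is why instanceof Uint8Array is true for every Buffer. The reverse does not hold, and Buffer.isBuffer tells the two apart:

```javascript
const plainBytes = new Uint8Array([1, 2, 3]);
const nodeBuf = Buffer.from(plainBytes); // copies the bytes into a Buffer

// Every Buffer is a Uint8Array ...
console.log(nodeBuf instanceof Uint8Array); // true

// ... but a plain Uint8Array is not a Buffer
console.log(Buffer.isBuffer(plainBytes)); // false
console.log(Buffer.isBuffer(nodeBuf));    // true
```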

Implementing split

Strings have a commonly used split method that divides a string into parts at a separator and returns them in an array. Buffer has no split method, but the same need comes up with binary data, so let's implement one for Buffer.

The Buffer split method takes a separator, which may be one or more bytes long, and returns an array in which each part is stored as a separate Buffer.

Implementing split for Buffer

Buffer.prototype.split = function (sep) {
    let len = Buffer.from(sep).length; // number of bytes in the separator
    let result = []; // the array to return
    let start = 0; // position to start searching from
    let offset = 0; // position of the separator that was found
    // Loop until no more separators are found
    while ((offset = this.indexOf(sep, start)) !== -1) {
        result.push(this.slice(start, offset));
        start = offset + len;
    }
    // Add whatever follows the last separator
    result.push(this.slice(start));
    return result;
};

Verifying the split method:

Testing split

let buf = Buffer.from("哈登爱篮球爱夜店", "utf8"); // "Harden loves basketball, loves nightclubs"
let bufs = buf.split("爱"); // split on 爱 ("love")
console.log(bufs);
// [ <Buffer e5 93 88 e7 99 bb>,
//   <Buffer e7 af ae e7 90 83>,
//   <Buffer e5 a4 9c e5 ba 97> ]
let newBufs = bufs.map(buf => buf.toString());
console.log(newBufs); // [ '哈登', '篮球', '夜店' ] (Harden, basketball, nightclub)
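One edge case in the split method above: an empty separator matches at the current position every time, so start never advances and the loop does not terminate. The sketch below is a standalone variant (splitBuffer is a hypothetical name) that rejects an empty separator and avoids modifying Buffer.prototype:

```javascript
// Hypothetical standalone variant of split that does not patch Buffer.prototype
function splitBuffer(buf, sep) {
    const sepBuf = Buffer.from(sep);
    // An empty separator would match at every position and never advance
    if (sepBuf.length === 0) {
        throw new RangeError("separator must not be empty");
    }
    const parts = [];
    let start = 0;
    let offset;
    while ((offset = buf.indexOf(sepBuf, start)) !== -1) {
        parts.push(buf.slice(start, offset));
        start = offset + sepBuf.length;
    }
    parts.push(buf.slice(start)); // the part after the last separator
    return parts;
}

const csv = Buffer.from("a,b,,c", "utf8");
console.log(splitBuffer(csv, ",").map((part) => part.toString())); // [ 'a', 'b', '', 'c' ]
```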

Buffer encoding conversion

We know that the default encoding in Node.js is UTF-8 and that Buffer does not support GB2312. Suppose we have a GB2312-encoded TXT file containing 你好 ("hello") and read it with Node.js. Because UTF-8 and GB2312 use different numbers of bytes per Chinese character, the content cannot be decoded correctly as UTF-8 and comes out garbled.

// Import dependencies
const fs = require("fs");
const path = require("path");

let buf = Buffer.from("你好", "utf8");
let result = fs.readFileSync(path.resolve(__dirname, "a.txt"));

console.log(buf); // <Buffer e4 bd a0 e5 a5 bd>
console.log(buf.toString()); // 你好
console.log(result); // <Buffer c4 e3 ba c3>
console.log(result.toString()); // garbled: U+FFFD replacement characters (�)

To decode such content correctly in Node.js, we can use the iconv-lite module, which encodes and decodes Buffers according to a specified encoding.

iconv-lite is a third-party module, so it has to be installed before use. Run the following command to install it:

npm install iconv-lite

To correctly read the contents of files in other encoding formats, the above code should be changed to:

// Import dependencies
const fs = require("fs");
const path = require("path");
const iconvLite = require("iconv-lite");

let result = fs.readFileSync(path.resolve(__dirname, "a.txt"));

console.log(iconvLite.decode(result, "gb2312")); // 你好

Remove the BOM header

Instead of converting the encoding at read time, the GB2312 TXT file above can also be re-saved as UTF-8, either with the editor's save-as option or by changing its encoding manually. The file can then be read without any encoding conversion, but a new problem may appear.

Producing a BOM

// Import dependencies
const fs = require("fs");
const path = require("path");

let buf = Buffer.from("你好", "utf8");
let result = fs.readFileSync(path.resolve(__dirname, "a.txt"));

console.log(buf); // <Buffer e4 bd a0 e5 a5 bd>
console.log(result); // <Buffer ef bb bf e4 bd a0 e5 a5 bd>

Running the code above after changing the file's encoding by hand shows that the Buffer read from the file has three more bytes than expected. Depending on the editor, saving a file in a new encoding can add these extra bytes at the start of the file; they are called the BOM (byte order mark).

The BOM marks which Unicode encoding a text file uses. It is itself a Unicode character, U+FEFF, placed at the very start of the file; in UTF-8 it is encoded as the three bytes EF BB BF.
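Both forms of the BOM can be seen from Node.js: encoding U+FEFF as UTF-8 produces the bytes EF BB BF, and decoding those bytes with toString keeps the BOM as the first character of the string. That is why a BOM check needs one branch for Buffers and one for strings:

```javascript
// U+FEFF encoded as UTF-8 is the three bytes EF BB BF
console.log(Buffer.from("\ufeff", "utf8")); // <Buffer ef bb bf>

// toString does not strip the BOM: it stays as the first character
const withBom = Buffer.from([0xef, 0xbb, 0xbf, 0x61]); // BOM followed by "a"
const decoded = withBom.toString("utf8");

console.log(decoded.length);                     // 2
console.log(decoded.charCodeAt(0).toString(16)); // feff
```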

Although the BOM marks the file's encoding, it is not part of the file's content, so it can cause problems: a file may not be read correctly after its encoding has been changed, and when several files are merged, BOMs end up mixed into the middle of the combined content. For this reason, BOM removal appears in Node.js's handling of source files, in the Buffer encoding conversion module iconv-lite, and in webpack when it compiles project files.

In order for the above code to correctly read and parse the contents of the file whose encoding has been manually changed, we also need to remove the BOM header.

Removing the BOM

function BOMStrip(result) {
    if (Buffer.isBuffer(result)) {
        // Buffer input: a UTF-8 BOM is the three bytes EF BB BF
        if (result[0] === 0xef && result[1] === 0xbb && result[2] === 0xbf) {
            return result.slice(3);
        }
    } else {
        // String input: the BOM decodes to the single character U+FEFF
        if (result.charCodeAt(0) === 0xfeff) {
            return result.slice(1);
        }
    }
    // No BOM found: return the input unchanged
    return result;
}

Using BOMStrip on the file read above:

Testing BOM removal

// Import dependencies
const fs = require("fs");
const path = require("path");

// Two ways to read the file: as a Buffer, or as a string
let result1 = fs.readFileSync(path.resolve(__dirname, "a.txt"));
let result2 = fs.readFileSync(path.resolve(__dirname, "a.txt"), "utf8");

console.log(BOMStrip(result1).toString()); // 你好
console.log(BOMStrip(result2)); // 你好

Caching incomplete Buffers

Producing garbled characters

let buf = Buffer.from("你好", "utf8");

let a = buf.slice(0, 2);
let b = buf.slice(2, 6);

console.log(a.toString()); // �
console.log(b.toString()); // �好

In UTF-8 a Chinese character takes three bytes. If slice cuts a Buffer holding Chinese text at a position that is not a multiple of 3, the pieces cannot be decoded correctly and garbled characters appear. The string_decoder module can cache the bytes that do not yet form a complete character. string_decoder is a core module, so it does not need to be installed.

Caching incomplete bytes

// Import dependencies
const { StringDecoder } = require("string_decoder");

let buf = Buffer.from("你好", "utf8");

let a = buf.slice(0, 2);
let b = buf.slice(2, 6);

// Create a StringDecoder instance (UTF-8 by default)
let sd = new StringDecoder();

console.log(sd.write(a)); // "" (the two bytes are cached, so an empty line is printed)
console.log(sd.write(b)); // 你好

With string_decoder, bytes that cannot yet form a complete character are not decoded: write caches them and returns an empty string. Once the following bytes arrive and the character can be decoded correctly, the cached bytes are joined with them and returned as text.
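In a real stream the bytes can arrive in pieces of any size. The sketch below assumes the worst case, one byte per chunk, and also calls end(), which flushes anything still cached once the input is finished:

```javascript
const { StringDecoder } = require("string_decoder");

const decoder = new StringDecoder("utf8");
const encoded = Buffer.from("你好", "utf8");

// Feed the bytes one at a time, as a stream might deliver them
let text = "";
for (const byte of encoded) {
    text += decoder.write(Buffer.from([byte])); // "" until a character is complete
}
text += decoder.end(); // flush whatever is still cached (nothing here)

console.log(text); // 你好
```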

The original source: https://www.pandashen.com