(5) Ten questions about "Buffer and the story of garbled text"

1. Why Buffer objects?

In Node, applications have to handle network protocols, talk to databases, process images, receive uploaded files, and so on. Working with network streams and files also means handling large amounts of binary data. JavaScript's own strings fall far short of these requirements, which is why the Buffer object exists.

Buffer is widely used in file I/O and network I/O, and its performance matters most in network transmission. Inside an application we normally manipulate strings, but as soon as they are sent over the network they must be converted to Buffers so they can travel as binary data. In Web applications, string-to-Buffer conversion happens all the time, so making that conversion more efficient can greatly improve network throughput.

2. Can you tell me what you know about Buffer objects?

Mm-hmm. Okay.

A Buffer is an Array-like object, but it is primarily used to manipulate bytes. I'll look at it from two angles: module structure and object structure.

Module structure

Buffer is a typical module combining JavaScript and C++, which implements performance-related parts in C++ and non-performance-related parts in JavaScript, as shown in the figure.

In "Memory Control" (4), we mentioned that the memory occupied by a Buffer is not allocated by V8; it is off-heap memory. Because V8 garbage collection has a performance cost, it makes sense to manage frequently used objects with a more efficient, dedicated allocation and reclamation strategy. Because Buffer is so common, Node loads it when the process starts and places it on the global object. Buffer can therefore be used directly, without calling require().

Buffer object structure

A Buffer object is similar to an array. Its elements are bytes, displayed as two-digit hexadecimal numbers, with values from 0 to 255. The sample code looks like this:

As you can see from the above example, strings in different encodings take up different numbers of elements: in UTF-8, a Chinese character takes up three elements, while letters and half-width punctuation take up one element each.

Buffer is heavily influenced by the Array type. You can read its length property to get the length, and you can access elements by subscript. Constructing one is similar too, as follows:

The above code allocates a 100-byte Buffer object. The elements of the newly initialized Buffer can be accessed by subscript as follows:

The odd result is that the element value is a random value from 0 to 255 (on Node versions where the allocation is not zero-filled). Similarly, we can assign to an element by subscript:
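A minimal sketch of allocation and subscript access, assuming a current Node version. `Buffer.alloc()` returns zero-filled memory, while `Buffer.allocUnsafe()` skips initialization and is the call that can show the arbitrary values described above:

```javascript
// Zero-filled 100-byte Buffer.
const buf = Buffer.alloc(100);
console.log(buf.length); // 100
console.log(buf[10]);    // 0

// Uninitialized 100-byte Buffer: contents are whatever was in memory.
const raw = Buffer.allocUnsafe(100);
console.log(raw[10]);    // some value between 0 and 255

// Elements can be assigned by subscript, just like an array.
buf[10] = 100;
console.log(buf[10]);    // 100
```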

3. Wow, it turns out that Buffer objects are so interesting that they can be used like arrays. I wonder what happens if an element is assigned a fractional or out-of-range value instead of an integer from 0 to 255.

If the value assigned to an element is less than 0, 256 is added to it repeatedly until the result is an integer between 0 and 255. If the value is greater than 255, 256 is subtracted repeatedly until the result falls between 0 and 255. If it is a decimal, the fractional part is discarded and only the integer part is kept.
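The three rules can be checked with a short sketch (the values are arbitrary examples):

```javascript
const buf = Buffer.alloc(3);

buf[0] = -100;   // below 0: -100 + 256 = 156
buf[1] = 300;    // above 255: 300 - 256 = 44
buf[2] = 3.1415; // fractional part dropped: 3

console.log(buf[0], buf[1], buf[2]); // 156 44 3
```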

4. Buffer objects look like strings. Are they interchangeable?

Yes, they can be converted to each other.

String to Buffer

String-to-Buffer conversion is done primarily through the constructor:

A Buffer created this way can store only one encoding. If the encoding parameter is not passed, the string is encoded and stored as UTF-8 by default.
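A minimal sketch of the conversion, assuming `Buffer.from(string, encoding)` as the current replacement for the constructor form; the sample text is illustrative:

```javascript
const text = '字符串';

// No encoding argument: UTF-8 is used by default.
const utf8 = Buffer.from(text);

// The same text stored as UTF-16LE takes a different number of bytes.
const utf16 = Buffer.from(text, 'utf16le');

console.log(utf8.length);  // 9 (3 bytes per character)
console.log(utf16.length); // 6 (2 bytes per character)
```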

Buffer to string

Converting a Buffer to a string is also very simple. The Buffer object's toString() method does the conversion, as follows:

Conveniently, you can pass encoding (UTF-8 by default), start, and end to convert the whole Buffer or only part of it. If a Buffer has been written with several encodings, you need to specify the matching encoding for each part to convert it back correctly.

5. Buffer should be common in input streams. Can you tell me how to use it?

In real usage, Buffers are typically transmitted in segments. Here is a common piece of sample code for reading from an input stream:

Code like this is common in English-language tutorials as a demonstration of stream reading; the chunk object received in the data event is a Buffer. Beginners tend to think of a Buffer as a string, so nothing about the example above seems unusual.
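A sketch of that common pattern. The temporary file and the `readAsString` helper are hypothetical, added only so the example can run on its own:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical sample file, created here only to make the sketch runnable.
const file = path.join(os.tmpdir(), 'buffer-demo-ascii.txt');
fs.writeFileSync(file, 'hello buffer');

function readAsString(filePath) {
  return new Promise((resolve, reject) => {
    const rs = fs.createReadStream(filePath);
    let data = '';
    rs.on('data', (chunk) => {
      // chunk is a Buffer; += converts it to a string implicitly.
      data += chunk;
    });
    rs.on('end', () => resolve(data));
    rs.on('error', reject);
  });
}

readAsString(file).then((data) => console.log(data));
```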

6. Sometimes I read data like this and then print it out. Sometimes I get garbled characters.

Problems arise when wide-byte encoding is present in the input stream. If you see garbled characters on a website developed through Node, this is probably where the problem originated.

Characters represented by more than one byte are called wide characters. Unicode is only one kind of wide-character encoding; wide characters are not necessarily Unicode.

The root of the problem is this line of code:

Hidden in this line is a toString() operation, so it is equivalent to the following:
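As a sketch, assuming the line in question is the usual string concatenation inside the data handler, the implicit and explicit forms produce the same result:

```javascript
// Stand-in for a chunk delivered by a data event.
const chunk = Buffer.from('abc');

// Implicit form: += calls chunk.toString() behind the scenes.
let implicit = '';
implicit += chunk;

// Explicit, equivalent form.
let explicit = '';
explicit = explicit.toString() + chunk.toString();

console.log(implicit === explicit); // true
```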

It is worth noting that such tutorials usually assume an English context, where this toString() causes no problems. For wide-byte Chinese text, however, it can. To reproduce the problem, let's simulate a similar scenario and limit the file readable stream to 11 bytes per read, as follows:
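A sketch of the reproduction, assuming the per-read limit is set through the `highWaterMark` option of `fs.createReadStream()`. The temporary file and the `readWithLimit` helper are hypothetical:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Test data: every character, punctuation included, is 3 bytes in UTF-8.
const poem = '床前明月光，疑是地上霜；举头望明月，低头思故乡。';
const file = path.join(os.tmpdir(), 'buffer-demo-poem.txt');
fs.writeFileSync(file, poem);

function readWithLimit(filePath, bytesPerRead) {
  return new Promise((resolve, reject) => {
    // highWaterMark caps how many bytes each read delivers.
    const rs = fs.createReadStream(filePath, { highWaterMark: bytesPerRead });
    let data = '';
    rs.on('data', (chunk) => {
      data += chunk; // each 11-byte chunk is decoded on its own
    });
    rs.on('end', () => resolve(data));
    rs.on('error', reject);
  });
}

// The printed text contains replacement characters where chunks split a character.
readWithLimit(file, 11).then((data) => console.log(data));
```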

The test data for this code is Li Bai's poem 静夜思 ("Silent Night Thoughts"). Executing the program yields the following output:

7. Why are the four characters 月 ("moon"), 是 ("is"), 望 ("gaze") and 低 ("lower") not output normally, but 3 garbled characters instead?

The cause lies in how the file readable stream reads the data: one Buffer per read, each decoded separately.

The original Buffer of this poem should be stored as:

Since we limit the length of each Buffer object to 11, the readable stream needs 7 reads to finish, resulting in the following Buffer objects being printed in sequence:

The buf.toString() call mentioned above uses UTF-8 by default, and a Chinese character takes up three bytes in UTF-8. Therefore, when the first Buffer object is output, only 3 characters can be displayed; the remaining 2 bytes of that Buffer (e6 9c) are shown as garbled characters. The first byte of the second Buffer object cannot form a character either, so it is also shown as garbled. This is why some of the text cannot be displayed properly.

In this example we set the limit to 11, but with a Buffer of any length a wide-byte string can still be truncated. The larger the Buffer, the less often truncation happens, but the problem cannot be ignored.
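The byte arithmetic can be checked without a stream. This sketch cuts the poem's bytes into 11-byte slices by hand, which is an approximation of what the readable stream delivers:

```javascript
const poem = '床前明月光，疑是地上霜；举头望明月，低头思故乡。';
const whole = Buffer.from(poem); // 24 characters * 3 bytes = 72 bytes

// Cut the bytes into 11-byte slices, as the limited stream would.
const chunks = [];
for (let i = 0; i < whole.length; i += 11) {
  chunks.push(whole.subarray(i, i + 11));
}

console.log(chunks.length); // 7

// Print each slice as hex next to its standalone UTF-8 decoding.
for (const chunk of chunks) {
  console.log(chunk.toString('hex'), JSON.stringify(chunk.toString()));
}

// The first slice ends with e6 9c, the first two of the three bytes of 月.
console.log(chunks[0].subarray(9).toString('hex')); // e69c
```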

8. I see! So if I limit the Buffer size to 12, there is no problem. But counting bytes every time is a hassle. Is there an easier way?

Yes, let’s not forget that there is also a method for setting encoding for readable streams, setEncoding(), as shown in the following example:

The effect of this method is that the data event no longer passes a Buffer object but an already decoded string. We therefore improve the earlier poem-reading program by adding a setEncoding() call:
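A sketch of the improved program, again with a hypothetical temporary file and helper name. The only functional change from the 11-byte reproduction is the `setEncoding('utf8')` call:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const poem = '床前明月光，疑是地上霜；举头望明月，低头思故乡。';
const file = path.join(os.tmpdir(), 'buffer-demo-poem-encoded.txt');
fs.writeFileSync(file, poem);

function readDecoded(filePath, bytesPerRead) {
  return new Promise((resolve, reject) => {
    const rs = fs.createReadStream(filePath, { highWaterMark: bytesPerRead });
    // From here on, data events deliver strings instead of Buffers.
    rs.setEncoding('utf8');
    let data = '';
    rs.on('data', (chunk) => {
      data += chunk; // chunk is already a correctly decoded string
    });
    rs.on('end', () => resolve(data));
    rs.on('error', reject);
  });
}

readDecoded(file, 11).then((data) => console.log(data));
```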

Re-execute the program to get output:

9. Wow, this is really exciting. How does Node implement this output?

In fact, when setEncoding() is called, the readable stream object sets up a decoder object internally. For each data event, the Buffer is decoded into a string by the decoder object and then passed to the caller. Once the encoding is set, the data event no longer receives the original Buffer object.

10. But even with a decoder set, it is still only transcoding. How can that fix the problem of wide-byte strings being truncated?

The decoder object is an instance of StringDecoder from the string_decoder module.

Take a look at the following code:

I wrote the first two Buffer objects mentioned earlier into the decoder. The odd thing is that the character 月 ("moon") is no longer transcoded in two broken parts as before. Once a StringDecoder is given the encoding, it knows how many bytes each character occupies in UTF-8 (three for these wide-byte characters), so the first write() outputs only the first nine bytes, decoded, and keeps the first two bytes of 月 inside the StringDecoder instance. On the second write(), those two retained bytes are combined with the following 11 bytes, and again only a whole multiple of 3 bytes is decoded. The garbled-text problem is solved through this intermediate buffering.
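A minimal sketch of that experiment with `StringDecoder`. The two Buffers are rebuilt here by slicing the first 22 bytes of the poem, which is assumed to match the first two chunks of the earlier 11-byte reads:

```javascript
const { StringDecoder } = require('string_decoder');

const decoder = new StringDecoder('utf8');

// First line and a half of the poem: 11 characters, 33 bytes.
const whole = Buffer.from('床前明月光，疑是地上霜');
const buf1 = whole.subarray(0, 11);  // 床前明 plus 2 bytes of 月
const buf2 = whole.subarray(11, 22); // last byte of 月, 光，疑, first byte of 是

// Nine bytes are decoded; the trailing e6 9c stays inside the decoder.
const first = decoder.write(buf1);
console.log(first);  // 床前明

// The retained bytes are joined with buf2; the dangling first byte of 是 is kept back.
const second = decoder.write(buf2);
console.log(second); // 月光，疑
```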
