
Buffers and streams in Node can be confusing to front-end engineers who are new to Node, because neither exists in the browser. On the back end, however, buffers and streams are everywhere. In computing, a buffer is a region of memory that holds intermediate data so it can be read efficiently, and a stream is an analogy for the flow of data. Both typically operate at the byte level. This article introduces these two modules in detail and then moves on to the file module, to give readers a clearer picture of how the pieces fit together.


Binary Buffer

On the front end we mostly work at the string level and rarely touch low-level concerns such as bytes and number bases. For everyday needs this is enough, and JavaScript in the browser was not designed for that kind of work anyway. On the back end, however, working with files, network protocols, images, and video is routine, and files and network streams in particular deal in binary data. To let JavaScript handle binary data, Node provides the Buffer class, which operates on raw bytes.

// Create a Buffer of length 10, filled with the byte value 30
const buf1 = Buffer.alloc(10, 30)
console.log(buf1) // <Buffer 1e 1e 1e 1e 1e 1e 1e 1e 1e 1e>

// String to Buffer
const buf2 = Buffer.from('javascript')
console.log(buf2) // <Buffer 6a 61 76 61 73 63 72 69 70 74>

// Buffer back to string
console.log(buf2.toString()) // javascript
console.log(buf2.toString('hex')) // 6a617661736372697074

A Buffer is similar to an array of integers: it has indices, a length property, and slice and copy operations, and many of its APIs mirror those of arrays. Unlike an array, though, a Buffer's size is fixed when it is created and cannot be resized. A Buffer works with bytes; each byte is eight bits, written as two hexadecimal digits, so each element is an integer from 0 to 255.
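A quick sketch of those array-like operations (note that, unlike Array#slice, Buffer#slice returns a view over the same memory rather than a copy):

const buf = Buffer.from('node');
console.log(buf.length);       // 4 -- fixed at creation; a Buffer cannot grow
console.log(buf[0]);           // 110 -- the byte value of 'n', an integer from 0 to 255
buf[0] = 78;                   // bytes can be overwritten in place ('N')
console.log(buf.slice(0, 2).toString()); // 'No' -- a view sharing the same memory
console.log(buf.toString());   // 'Node'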

As the example shows, a Buffer can be converted to a string, with a configurable character encoding. Buffer is for handling the binary data of file I/O and network I/O; String is for presentation. When moving data through file or network I/O, pass it along directly as a Buffer whenever possible: skipping the conversion greatly improves throughput. Manipulating the contents, however, is still much faster on strings than on Buffers.

Buffer memory allocation and performance optimization

Buffer is a typical hybrid of JavaScript and C++: the performance-critical parts are implemented in C++, while JavaScript provides the bridging and the public interface. The memory a Buffer occupies is not allocated by V8; it lives outside the V8 heap and is requested at the C++ layer. It is worth noting how allocation works: Node pre-allocates an internal 8 KB pool and uses it as the dividing line between small and large objects. A small Buffer is carved out of the current pool's remaining space, and when that space runs out a new 8 KB pool is allocated; a large Buffer gets its own C++-level allocation directly. Consequently, requesting one large block of memory is much faster than requesting many small ones. (In current Node versions the pool is used by allocators such as Buffer.allocUnsafe() and Buffer.from(); Buffer.alloc() always returns freshly zero-filled memory.)
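A small sketch of the difference, using the documented Buffer.poolSize property (8 KB by default):

// The pre-allocated internal pool used for fast small allocations
console.log(Buffer.poolSize); // 8192

// allocUnsafe may hand out a slice of the shared pool for small requests;
// the memory is not zero-filled, so fill or overwrite it before use
const small = Buffer.allocUnsafe(100);
small.fill(0);

// Requests bigger than the pool get their own allocation at the C++ level
const big = Buffer.allocUnsafe(16 * 1024);

// alloc always returns zero-filled memory and bypasses the pool
const safe = Buffer.alloc(100);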

Stream

As mentioned above, a stream describes data by analogy with flowing water: the data moving through file I/O and network I/O can be called a stream. A stream is a model that uniformly describes all common kinds of input and output, an abstraction over sequentially read and written byte sequences. Data flowing from end A to end B is not the same as data flowing from B to A, so a stream has a direction. When A sends data to B, from B's point of view it is input, and the object that represents it is a readable stream; from A's point of view it is output, represented by a writable stream. Streams that can be both read and written, such as TCP connections and sockets, are called Duplex streams. A Duplex stream that can modify or transform the data as it passes through is called a Transform stream.

In Node, the data in these streams is held as Buffer objects. Readable and writable streams store data in an internal cache, waiting to be consumed. Duplex and Transform streams each maintain two separate caches, one for reading and one for writing, which keeps the data flowing efficiently while letting the two sides operate independently without affecting each other.
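The size of that internal cache is governed by the highWaterMark option; a minimal sketch of how a readable stream signals that its cache is full:

const { Readable } = require('stream');

// highWaterMark caps the internal buffer (16 KB by default for byte streams)
const readable = new Readable({
  highWaterMark: 1024,
  read() {} // chunks are pushed manually below
});

console.log(readable.push(Buffer.alloc(512)));  // true  -- still below the mark
console.log(readable.push(Buffer.alloc(1024))); // false -- the cache is now full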

In Node, all four stream types are instances of EventEmitter and emit close and error events. A readable stream emits a data event when data arrives, and a writable stream emits a finish event once all data has been flushed to the underlying system. Duplex and Transform implement both the Readable and Writable interfaces and their events.

Of note is the writable stream's drain event, which signals that the cached data has been drained. Why does this event exist? A writable stream's internal cache is capped by its highWaterMark; when the amount of buffered data exceeds it, write() returns false to tell the producer to stop writing. Once the cache has been flushed, the drain event fires to say it is safe to resume, which prevents the cache from growing without bound:

var fs = require('fs');

var rs = fs.createReadStream(src); // src and dst are placeholder paths
var ws = fs.createWriteStream(dst);

rs.on('data', function (chunk) {
    // If the cache is full, write() returns false: pause the reader
    if (ws.write(chunk) === false) {
        rs.pause();
    }
});

rs.on('end', function () {
    ws.end();
});

// Once the cache drains, it is safe to resume reading
ws.on('drain', function () {
    rs.resume();
});

Some common stream classifications:

  • Writable streams: HTTP requests (on the client), HTTP responses (on the server), fs write streams, zlib streams, crypto streams, TCP sockets, child process stdin, process.stdout, process.stderr
  • Readable streams: HTTP responses (on the client), HTTP requests (on the server), fs read streams, zlib streams, crypto streams, TCP sockets, child process stdout and stderr, process.stdin
  • Duplex streams: TCP sockets, zlib streams, crypto streams
  • Transform streams: zlib streams, crypto streams (see the sketch after this list)
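As a minimal sketch of the Transform category, here is a stream that upper-cases whatever passes through it:

const { Transform } = require('stream');

// A Transform stream is readable and writable at once: transform() receives
// each written chunk, may change it, and pushes the result to the readable side
const upperCase = new Transform({
  transform(chunk, encoding, callback) {
    this.push(chunk.toString().toUpperCase());
    callback();
  }
});

process.stdin.pipe(upperCase).pipe(process.stdout);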

Any discussion of streams inevitably brings up the concept of the pipe, and the analogy is equally vivid: for water to flow from one end to the other, it needs a pipe as its channel or medium. The same goes for streams, where data travels between the two ends through a pipe, as in Node:

const fs = require('fs');
const zlib = require('zlib');

// Pipe all data from readable into a file named 'file.txt'
const readable = getReadableStreamSomehow();           // placeholder readable stream
const writable = getWritableStreamSomehow('file.txt'); // placeholder writable stream
// All data in readable will be passed to 'file.txt'
readable.pipe(writable);

// Pipes can also be chained, e.g. to gzip a file:
const r = fs.createReadStream('file.txt');
const z = zlib.createGzip();
const w = fs.createWriteStream('file.txt.gz');
r.pipe(z).pipe(w);

Note that only readable streams have the pipe method; writable streams can only serve as destinations.

A pipe is more than a channel: it also regulates the flow inside it, balancing reading against writing so that neither side overwhelms the other (the backpressure handling we wrote by hand above). Internally, pipe listens for the readable stream's data and end events, which is what makes fast, incremental responses possible:

// A file download without streams: the callback waits until the server
// has read the whole file into memory before sending anything to the browser
var http = require('http');
var fs = require('fs');
var server = http.createServer(function (req, res) {
    fs.readFile(__dirname + '/data.txt', function (err, data) {
        res.end(data);
    });
});
server.listen(8888);

// With pipe, data reaches the client as soon as the connection is
// established, without waiting for the server to buffer all of data.txt
var http = require('http');
var fs = require('fs');
var server = http.createServer(function (req, res) {
    var stream = fs.createReadStream(__dirname + '/data.txt');
    stream.pipe(res);
});
server.listen(8888);

Using pipe therefore solves the buffering problem in the first example.
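On newer Node versions (10 and later), stream.pipeline does the same wiring as pipe but also forwards errors and destroys all the streams if any of them fails; a sketch of the gzip example rewritten with it:

const fs = require('fs');
const zlib = require('zlib');
const { pipeline } = require('stream');

pipeline(
  fs.createReadStream('file.txt'),
  zlib.createGzip(),
  fs.createWriteStream('file.txt.gz'),
  (err) => {
    // Called once, with the first error from any stream, or null on success
    if (err) console.error('pipeline failed:', err);
  }
);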

The fs file module

The fs file module is a higher-level module that builds on lower-level modules such as EventEmitter, Stream, and path. It provides operations on files, including reading, writing, renaming, deleting, and traversing directories, as well as links on POSIX file systems. Unlike most other Node modules, every operation in the fs module comes in both asynchronous and synchronous versions. The fs module consists mainly of the following parts:

  • A wrapper around the underlying POSIX file system calls, corresponding to the operating system's native file operations
  • File streams, fs.createReadStream and fs.createWriteStream, which build on Stream
  • Synchronous file operations, such as fs.readFileSync and fs.writeFileSync
  • Asynchronous file operations, such as fs.readFile and fs.writeFile (see the sketch after this list)
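For example, the same read in its two flavors (a minimal sketch; 'config.json' is a placeholder path):

const fs = require('fs');

// Synchronous: blocks the event loop until the whole file has been read
const dataSync = fs.readFileSync('config.json', 'utf8');
console.log(dataSync);

// Asynchronous: returns immediately; the callback fires when the read completes
fs.readFile('config.json', 'utf8', (err, data) => {
  if (err) throw err;
  console.log(data);
});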


Main fs operations

Read and write operations:

const fs = require('fs'); // Import the fs module

/* Reading a file */

// Using a stream
const read = fs.createReadStream('sam.js', { encoding: 'utf8' });
read.on('data', (str) => {
    console.log(str);
});

// Using readFile
fs.readFile('test.txt', {}, function (err, data) {
    if (err) {
        throw err;
    }
    console.log(data);
});

// open + read
fs.open('test.txt', 'r', (err, fd) => {
    fs.fstat(fd, (err, stat) => {
        var len = stat.size; // Check the file length
        var buf = Buffer.alloc(len); // new Buffer(len) is deprecated
        fs.read(fd, buf, 0, len, 0, (err, bytesRead, buf) => {
            console.log(buf.toString('utf8'));
            fs.close(fd, () => {});
        });
    });
});

/* The write-side APIs take a similar form */

There are three ways to read/write a file, so what are the differences?

  • createReadStream/createWriteStream turn a file's contents into a stream of data chunks, represented by a ReadStream/WriteStream object; the main purpose is to get the data into stream form, readable and easy to operate on as a flow
  • readFile/writeFile treat the file's contents as a whole: Node.js allocates a cache for the entire content and reads or writes it in one go, and can do nothing else with the data in the meantime, so with large files the cache may "overflow"
  • read/write move the contents piece by piece, continuously reading or writing small chunks of the file through a cache, giving fine-grained control over the process

The synchronous APIs divide the same way. Of the three, readFile is the most commonly used; streams are preferable for large files; and read provides the most detailed, low-level control, used together with open.
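The write side takes the same three forms; a sketch with placeholder file names (each snippet stands alone):

const fs = require('fs');

// Stream: write the data piece by piece
const ws = fs.createWriteStream('out1.txt');
ws.write('first chunk');
ws.end('last chunk');

// writeFile: write the whole content in one shot
fs.writeFile('out2.txt', 'all the content', (err) => {
  if (err) throw err;
});

// open + write: low-level control over offset, length, and position
fs.open('out3.txt', 'w', (err, fd) => {
  if (err) throw err;
  const buf = Buffer.from('hello');
  fs.write(fd, buf, 0, buf.length, 0, (err) => {
    if (err) throw err;
    fs.close(fd, () => {});
  });
});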

Get the status of the file:

fs.stat('eda.txt', (err, stat) => {
  if (err)
    throw err
  console.log(stat)
})
/* Stats {
  dev: 16777220,
  mode: 33279,
  nlink: 1,
  uid: 501,
  gid: 20,
  rdev: 0,
  blksize: 4194304,
  ino: 4298136825,
  size: 0,
  blocks: 0,
  atimeMs: 1510317983760.94,   // when the file data was last accessed
  mtimeMs: 1510317983760.94,   // when the file data was last modified
  ctimeMs: 1510317983777.8538, // when the file status was last changed
  birthtimeMs: 1509537398000,
  atime: 2017-11-10T12:46:23.761Z,
  mtime: 2017-11-10T12:46:23.761Z,
  ctime: 2017-11-10T12:46:23.778Z,
  birthtime: 2017-11-01T11:56:38.000Z } */
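Besides the raw fields, the Stats object exposes helper methods for checking what kind of entry the path is:

fs.stat('eda.txt', (err, stat) => {
  if (err)
    throw err
  console.log(stat.isFile())      // true for a regular file
  console.log(stat.isDirectory()) // false here
  console.log(stat.size)          // size in bytes
})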

Watching files:

const FSWatcher = fs.watch('eda.txt', (eventType, filename) => {
    console.log(`${eventType}`)
})
FSWatcher.on('change', (eventType, filename) => {
    console.log(`${filename}`)
})
// Both the callback passed to watch and the returned FSWatcher instance
// are bound to the change event

fs.watchFile('message.text', (curr, prev) => {
  console.log(`the current mtime is: ${curr.mtime}`);
  console.log(`the previous mtime was: ${prev.mtime}`);
})

There are two ways to watch a file:

  • watch calls the underlying operating-system API to monitor the file; it is faster and more reliable
  • watchFile works by repeatedly polling fs.stat (the file statistics) for changes to the watched file; it is slower and less reliable, and its callback receives fs.Stats instances

So prefer watch wherever possible; watchFile suits scenarios where you need the extra file statistics.

Other

Create, delete, copy, move, rename, check files, modify permissions…
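A quick sketch of a few of them (placeholder paths; in real code these callbacks would be sequenced rather than fired all at once):

const fs = require('fs');

fs.mkdir('tmp', (err) => { if (err) throw err; });                 // create a directory
fs.rename('old.txt', 'new.txt', (err) => { if (err) throw err; }); // move / rename
fs.copyFile('a.txt', 'b.txt', (err) => { if (err) throw err; });   // copy (Node 8.5+)
fs.chmod('a.txt', 0o644, (err) => { if (err) throw err; });        // modify permissions
fs.access('a.txt', fs.constants.R_OK, (err) => {                   // check readability
  console.log(err ? 'not readable' : 'readable');
});
fs.unlink('b.txt', (err) => { if (err) throw err; });              // delete a file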

Conclusion

Moving from Buffer to Stream and on to the fs file module gives us a clearer view of this whole area of knowledge, and a deeper understanding of how front-end automation tools such as webpack and gulp build their workflows on exactly these mechanisms. Learning anything else works the same way: knowing the ins and outs, why things exist, and how they connect ties the pieces of knowledge together, makes them make sense, and leaves you equally capable wherever you need to apply them.
