
In How to implement concurrent upload of large files in JavaScript? and How to implement parallel downloading of large files in JavaScript?, Bob explained how to use async-pool to optimize the transfer of large files. This article introduces several schemes for transferring large files over HTTP. Before we get into the details, let's use the fs module of Node.js to generate a "large" file.

const fs = require("fs");

const writeStream = fs.createWriteStream(__dirname + "/big-file.txt");
for (let i = 0; i <= 1e5; i++) {
  writeStream.write(`${i} I am Po brother, welcome to follow the full-stack xiu xian road\n`, "utf8");
}

writeStream.end();

After the above code runs successfully, a 5.5MB text file will be generated in the current directory; it will serve as the "material" for the scenarios below. With the preparation done, let's look at the first solution: data compression.

1. Data Compression

When using HTTP to transfer large files, we can consider compressing them. Typically, a browser's requests carry Accept and Accept-* headers, which tell the server which file types, compression formats, and languages it supports.

accept: */*
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9

Gzip typically achieves compression ratios in excess of 60%, while br is designed specifically for HTML; its compression efficiency and performance are even better than gzip's, increasing compression density by a further 20%.

The accept-encoding field in the request headers above tells the server which content encodings (usually compression algorithms) the client understands. Through content negotiation, the server picks one of the methods the client supports and informs the client of its choice via the Content-Encoding response header.

cache-control: max-age=2592000
content-encoding: gzip
content-type: application/x-javascript

The above response headers tell the browser that the returned JS script has been processed with the gzip compression algorithm. Note, however, that gzip and similar algorithms usually achieve a good compression ratio only on text files. Multimedia data such as images, audio, and video is already highly compressed, so compressing it again with gzip yields little benefit and may even make the data larger.

With the Accept-Encoding and Content-Encoding fields in mind, let's compare responses with gzip disabled and enabled.

1.1 Gzip is Not Enabled

const fs = require("fs");
const http = require("http");
const util = require("util");
const readFile = util.promisify(fs.readFile);

const server = http.createServer(async (req, res) => {
  res.writeHead(200, {
    "Content-Type": "text/plain; charset=utf-8",
  });
  const buffer = await readFile(__dirname + "/big-file.txt");
  res.write(buffer);
  res.end();
});

server.listen(3000, () => {
  console.log("app starting at port 3000");
});

1.2 Gzip Enabled

const fs = require("fs");
const zlib = require("zlib");
const http = require("http");
const util = require("util");
const readFile = util.promisify(fs.readFile);
const gzip = util.promisify(zlib.gzip);

const server = http.createServer(async (req, res) => {
  res.writeHead(200, {
    "Content-Type": "text/plain; charset=utf-8",
    "Content-Encoding": "gzip",
  });
  const buffer = await readFile(__dirname + "/big-file.txt");
  const gzipData = await gzip(buffer);
  res.write(gzipData);
  res.end();
});

server.listen(3000, () => {
  console.log("app starting at port 3000");
});

Comparing the two responses, we can see that when transferring the 5.5MB big-file.txt, enabling gzip compresses it to 256kB, which greatly speeds up the transfer. In real-world projects, we can use nginx or koa-static to enable gzip compression. Next, let's look at another scheme: chunked transfer encoding.

2. Chunked Transfer Encoding

Chunked transfer encoding is mainly used when a large amount of data needs to be transferred but the length of the response cannot be known until the request has been fully processed, for example when generating a large HTML table from a database query, or when transferring a large number of images.

To use chunked transfer encoding, set the Transfer-Encoding field in the response header to chunked, optionally combined with a compression coding:

Transfer-Encoding: chunked
Transfer-Encoding: gzip, chunked

A Transfer-Encoding value of chunked in the response headers indicates that the body is sent as a series of chunks. Note that Transfer-Encoding and Content-Length are mutually exclusive: the two fields cannot appear in the same response message. Now let's look at the encoding rules for chunked transfer:

  • Each chunk consists of two parts: the chunk length and the chunk data.
  • The chunk length is a hexadecimal number and ends with \r\n.
  • The chunk data immediately follows the length and also ends with \r\n, but the data itself must not contain \r\n.
  • The terminating chunk is a regular chunk whose length is 0, i.e. 0\r\n\r\n; it marks the end of the body.
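
The rules above can be sketched in a few lines (encodeChunk is a name made up here for illustration, not a Node.js API):

```javascript
// Encode one piece of data as an HTTP/1.1 chunk:
// hex byte length + \r\n + data + \r\n.
function encodeChunk(data) {
  // Note: the chunk length counts bytes, so use Buffer.byteLength
  // rather than the string length (they differ for multi-byte UTF-8).
  return `${Buffer.byteLength(data).toString(16)}\r\n${data}\r\n`;
}

// The terminating chunk is a zero-length chunk.
const lastChunk = "0\r\n\r\n";

// encodeChunk("Hello") produces "5" + CRLF + "Hello" + CRLF
console.log(JSON.stringify(encodeChunk("Hello")));
```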

Having learned the rules of chunked transfer encoding, Bob will use the first 100 lines of big-file.txt to demonstrate how it can be implemented.

2.1 Data Blocks

const fs = require("fs");

const buffer = fs.readFileSync(__dirname + "/big-file.txt");
// Take the first 100 lines and group them into chunks of 10 lines each
const lines = buffer.toString("utf-8").split("\n").slice(0, 100);
const chunks = chunk(lines, 10);

function chunk(arr, len) {
  let chunks = [],
    i = 0,
    n = arr.length;
  while (i < n) {
    chunks.push(arr.slice(i, (i += len)));
  }
  return chunks;
}

2.2 Block transmission

// http-chunk-server.js
const fs = require("fs");
const http = require("http");

// omit data chunking code
http
  .createServer(async function (req, res) {
    res.writeHead(200, {
      "Content-Type": "text/plain; charset=utf-8",
      "Transfer-Encoding": "chunked",
      "Access-Control-Allow-Origin": "*",
    });
    for (let index = 0; index < chunks.length; index++) {
      setTimeout(() => {
        let content = chunks[index].join("&");
        // The chunk length is the byte length, written in hexadecimal
        res.write(`${Buffer.byteLength(content).toString(16)}\r\n${content}\r\n`);
      }, index * 1000);
    }
    setTimeout(() => {
      res.end(); // sends the terminating chunk 0\r\n\r\n
    }, chunks.length * 1000);
  })
  .listen(3000, () => {
    console.log("app starting at port 3000");
  });

After starting the server with the node http-chunk-server.js command, visit http://localhost:3000/ in your browser and you will see the following output:

The above image shows the content returned by the first data chunk. When all data chunks have been transferred, the server sends the terminating chunk 0\r\n\r\n to the client. In addition, we can use the Response object of the Fetch API to read the returned chunks as a stream: create a reader with response.body.getReader(), then call reader.read() repeatedly to read the data.
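
A sketch of that client-side reading loop (the helper name readChunks is made up here; the URL is the demo server above). The loop works on any ReadableStream, which is exactly what a browser exposes as response.body:

```javascript
// Read a streamed (chunked) response piece by piece.
// In a browser you would obtain the stream like this:
//   const response = await fetch("http://localhost:3000/");
//   await readChunks(response.body, (text) => console.log(text));
async function readChunks(stream, onChunk) {
  const reader = stream.getReader();
  const decoder = new TextDecoder("utf-8");
  while (true) {
    const { done, value } = await reader.read();
    if (done) break; // the server has sent the terminating chunk
    // `value` is a Uint8Array; decode it incrementally
    onChunk(decoder.decode(value, { stream: true }));
  }
}
```

Note that the browser removes the chunk-length framing for you: each value delivered by reader.read() is already raw payload bytes.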

2.3 Streaming Transmission

In practice, when returning a large file from Node.js to the client, it is best to respond with a file stream; this avoids loading the whole file into memory at once. A concrete implementation looks like this:

const fs = require("fs");
const zlib = require("zlib");
const http = require("http");

http
  .createServer((req, res) => {
    res.writeHead(200, {
      "Content-Type": "text/plain; charset=utf-8",
      "Content-Encoding": "gzip",
    });
    fs.createReadStream(__dirname + "/big-file.txt")
      .setEncoding("utf-8")
      .pipe(zlib.createGzip())
      .pipe(res);
  })
  .listen(3000, () => {
    console.log("app starting at port 3000");
  });

When file data is returned as a stream, the Transfer-Encoding response header has the value chunked, indicating that the data is sent in a series of chunks.

Connection: keep-alive
Content-Encoding: gzip
Content-Type: text/plain; charset=utf-8
Date: Sun, 06 Jun 2021 01:02:09 GMT
Transfer-Encoding: chunked

If you're interested in Node.js streams, read the semlinker/node-deep repository on GitHub to learn more about their fundamentals.

Project address: github.com/semlinker/n…

3. Range Requests

HTTP range requests allow the server to send only a portion of an HTTP message to the client. They are useful when transferring large media files or when implementing resumable (breakpoint-continuation) downloads. If a response carries an Accept-Ranges header (with a value other than "none"), the server supports range requests.

A Range header can request multiple ranges at once, and the server returns them as a multipart document. If the server returns a range, it uses the 206 Partial Content status code. If the requested range is invalid, the server returns 416 Range Not Satisfiable, indicating a client error. The server is also allowed to ignore the Range header and return the entire file with a 200 status code.

3.1 Range Syntax

Range: <unit>=<range-start>-
Range: <unit>=<range-start>-<range-end>
Range: <unit>=<range-start>-<range-end>, <range-start>-<range-end>
Range: <unit>=<range-start>-<range-end>, <range-start>-<range-end>, <range-start>-<range-end>
  • <unit>: the unit in which ranges are specified, usually bytes.
  • <range-start>: an integer indicating the start position of the range, in the given unit.
  • <range-end>: an integer indicating the end position of the range, in the given unit. This value is optional; if absent, the range extends to the end of the document.
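
As a sketch of how a server might apply these rules for the common single-range, bytes case (parseRange is a hypothetical helper written for this article, not what koa-range actually does; suffix ranges like bytes=-500 are ignored for brevity):

```javascript
// Decide how to answer a single-range "bytes=<start>-<end>" request
// against a resource of `size` bytes.
function parseRange(header, size) {
  const m = /^bytes=(\d*)-(\d*)$/.exec(header || "");
  if (!m) return { status: 200 }; // no usable Range: send the whole file
  const start = m[1] === "" ? NaN : Number(m[1]);
  // An absent <range-end> means "to the end of the document"
  const end = m[2] === "" ? size - 1 : Math.min(Number(m[2]), size - 1);
  if (Number.isNaN(start) || start > end || start >= size) {
    return { status: 416, contentRange: `bytes */${size}` };
  }
  return {
    status: 206,
    start,
    end,
    contentRange: `bytes ${start}-${end}/${size}`,
  };
}
```

For example, against the 5243-byte big-file.txt used later, "bytes=0-100" yields status 206 with Content-Range: bytes 0-100/5243, matching the response headers shown in section 3.2.4.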

With the Range syntax covered, let's look at an example in action:

3.1.1 Single range
$ curl http://i.imgur.com/z4d4kWk.jpg -i -H "Range: bytes=0-1023"
3.1.2 Multiple ranges
$ curl http://www.example.com -i -H "Range: bytes=0-50, 100-150"

3.2 Range Request Example

3.2.1 Server Code
// http/range/koa-range-server.js
const Koa = require("koa");
const cors = require("@koa/cors");
const serve = require("koa-static");
const range = require('koa-range');

const app = new Koa();

// Register middleware
app.use(cors()); // Register CORS middleware
app.use(range); // Register scope request middleware
app.use(serve(".")); // Register static resource middleware

app.listen(3000, () => {
  console.log("app starting at port 3000");
});
3.2.2 Client code
<!DOCTYPE html>
<html lang="zh-cn">
  <head>
    <meta charset="UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Example of a large file range request</title>
  </head>
  <body>
    <h3>Example of a large file range request</h3>
    <div id="msgList"></div>
    <script>
      const msgList = document.querySelector("#msgList");
      function getBinaryContent(url, start, end, responseType = "arraybuffer") {
        return new Promise((resolve, reject) => {
          try {
            let xhr = new XMLHttpRequest();
            xhr.open("GET", url, true);
            xhr.setRequestHeader("range", `bytes=${start}-${end}`);
            xhr.responseType = responseType;
            xhr.onload = function () {
              resolve(xhr.response);
            };
            xhr.send();
          } catch (err) {
            reject(new Error(err));
          }
        });
      }

      getBinaryContent(
        "http://localhost:3000/big-file.txt",
        0,
        100,
        "text"
      ).then((text) => {
        msgList.append(`${text}`);
      });
    </script>
  </body>
</html>

After starting the server with the node koa-range-server.js command, visit http://localhost:3000/index.html in your browser and you will see the following output:

The corresponding HTTP request header and response header (containing only part of the header information) in this example are as follows:

3.2.3 HTTP Request Headers
GET /big-file.txt HTTP/1.1
Host: localhost:3000
Connection: keep-alive
Referer: http://localhost:3000/index.html
Accept-Encoding: identity
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,id;q=0.7
Range: bytes=0-100
3.2.4 HTTP Response Headers
HTTP/1.1 206 Partial Content
vary: Origin
Accept-Ranges: bytes
Last-Modified: Sun, 06 Jun 2021 01:40:19 GMT
Cache-Control: max-age=0
Content-Type: text/plain; charset=utf-8
Date: Sun, 06 Jun 2021 03:01:01 GMT
Connection: keep-alive
Content-Range: bytes 0-100/5243
Content-Length: 101

That's it for range requests. If you want to see how they are used in practice, read How to implement parallel downloading of large files in JavaScript.

4. Summary

In this article, Bob introduced three schemes for transferring large files over HTTP; I hope this knowledge will help you in your future work. In practice, be aware of the difference between Transfer-Encoding and Content-Encoding: Transfer-Encoding is automatically decoded after transmission to restore the original data, while Content-Encoding must be decoded by the application itself.

If you know of other solutions or have any suggestions for this article, feel free to leave a comment below.

5. Reference resources

  • Perspective HTTP protocol
  • MDN – HTTP request range
  • MDN – Accept-Encoding