preface

Generally, when writing business code, we mostly use GET and POST requests to call interfaces. GET requests are relatively easy and almost error-free. For POST, the common form submission and JSON submission are also relatively easy — but what about file upload? That step may scare you, because you may not be familiar with it, and the browser's Network panel doesn't record it in much detail, so it becomes a thorn in our side: we're never sure whether the bug is in how we wrote the file upload, or on the back end. Of course, we are usually modest and look for the cause in our own code first — but is that the truth? It could be the back end; it could be a problem with what it accepts, and you end up switching request libraries to try — axios, request, fetch, and so on. So how do we avoid this? We need to be familiar enough with this area that we don't write code by guessing. If that sounds right to you, then you'll come out of this article with enough confidence that you'll no longer question yourself or code by trial and error.

This article is long and may take some time to read, so please be patient. I'll go from the top down: all the examples start with what you're already familiar with and then work their way deeper, from how the file is sent on the request side to how it is parsed on the receiving side.

Background knowledge

What is multipart/form-data?

multipart/form-data was originally proposed in RFC 1867: Form-based File Upload in HTML.

Since file-upload is a feature that will benefit many applications, this proposes an extension to HTML to allow information providers to express file upload requests uniformly, and a MIME compatible representation for file upload responses.


The bottom line is that the existing specification did not cover this need, so it had to be extended.

Why use multipart/form-data for file upload?

The encoding type application/x-www-form-urlencoded is inefficient for sending large quantities of binary data or text containing non-ASCII characters. Thus, a new media type, multipart/form-data, is proposed as a way of efficiently sending the values associated with a filled-out form from client to server.

The RFC 1867 document also explains why a new type was added instead of reusing the old application/x-www-form-urlencoded: it is not suitable for transmitting large binary data or data containing non-ASCII characters. Normally, when we use that type, the form data is URL-encoded and then sent to the back end — and binary data obviously can't be encoded that way efficiently. So multipart/form-data was born, designed specifically to transfer files efficiently.

Maybe you have a question: couldn't application/json be used instead?

In fact, you can do whatever you want, but multipart/form-data is better once you take everything into account. We know a file is binary, while application/json is transmitted as text, so in a sense we could convert the file into a text form such as Base64. But once you do that, the back end has to do special parsing of whatever you transmit, and Base64 inflates the payload by roughly a third (every 3 bytes become 4 characters), so for files of tens or hundreds of MB it is noticeably slower than sending the binary directly.

Here's an analogy: if you are in China and want to go to America, multipart/form-data is the plane and application/json is the high-speed train. There is no high-speed rail line between China and America; if you insist on taking the train, you can build one at great cost (extra parsing of your text on the back end), but there is a cheaper way — take the plane (use multipart/form-data) to America (to transfer the file). Which would you choose? (If you have the time and money to build the railway, I apologize for interrupting you.)

What is the multipart/form-data format?

From RFC 1867: Form-based File Upload in HTML, section 6 (Example):

Content-type: multipart/form-data, boundary=AaB03x

--AaB03x
content-disposition: form-data; name="field1"
Joe Blow
--AaB03x
content-disposition: form-data; name="pics"; filename="file1.txt"
Content-Type: text/plain

... contents of file1.txt ...
--AaB03x--

A boundary is used to separate the parts of the request. As the name "delimiter" suggests, it splits the body: there may be many files and many fields, and without it we cannot determine exactly where one part ends and the next begins. If a part is a file, you also need to know the file's name and its type — you have to tell the back end what you are uploading. Is it a .txt text file? This information has to be provided so the receiver can make sense of it; we'll see later what happens if it isn't declared.

OK, with that out of the way, let's move on to the main topic. Faced with File, FormData, Blob, Base64 and ArrayBuffer, what exactly do we do? And file upload isn't only a front-end concern: the server can also upload files (for example, uploading static resources to OSS object storage in the cloud), and there the data may arrive as a Buffer, a Stream, Base64, and so on. Feeling overwhelmed? No hurry. Precisely because uploading isn't only a front-end job, I'll call the uploading side the request side and the other side the receiving side. I'll walk through the various ways of uploading on the request side, how the receiving side parses our files, and finally our killer debugging tool — Wireshark. The outline: first upload files from the browser, then upload files from the server, and then look at how the files are parsed.

The request side

The browser side

File

First, let’s write down the simplest form submission.

<form action="http://localhost:7787/files" method="POST">
	<input name="file" type="file" id="file">
	<input type="submit" value="Submit">
</form>

We select a file and submit, and find that the back end responds that the file does not exist.

Don't worry — those of you who are familiar with this probably know why right away. Shh, bear with me if you do.

We open the console and check Preserve Log for request tracking, because the form submission triggers a page navigation.

We can see that the file field in the form data only carries the file's name, not its actual content. Now look at the request header.

The default application/x-www-form-urlencoded cannot upload files.

We add the enctype attribute and try again.

<form action="http://localhost:7787/files" enctype="multipart/form-data" method="POST">
  <input name="file" type="file" id="file">
  <input type="submit" value="Submit">
</form>

A simple form upload is as easy as that — but you do have to remember the enctype and the format required for file upload.

FormData

I've quickly written the following examples using FormData.

<input type="file" id="file">
<button id="submit">upload</button>
<script src="https://cdn.bootcss.com/axios/0.19.2/axios.min.js"></script>
<script>
submit.onclick = () => {
    const file = document.getElementById('file').files[0];
    var form = new FormData();
    form.append('file', file);

    // type 1
    axios.post('http://localhost:7787/files', form).then(res => {
        console.log(res.data);
    })
    // type 2
    fetch('http://localhost:7787/files', {
        method: 'POST',
        body: form
    }).then(res => res.json()).then(res => { console.log(res); });
    // type 3
    var xhr = new XMLHttpRequest();
    xhr.open('POST', 'http://localhost:7787/files', true);
    xhr.onload = function () {
        console.log(xhr.responseText);
    };
    xhr.send(form);
}
</script>

All of these approaches work. There are so many request libraries — a casual search on npm will turn up hundreds of them.

So knowing how to call a particular request library isn't the goal. The only goal is to know what the request headers and request body of a file upload look like.
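For reference, what the browser actually puts on the wire for the FormData example above looks roughly like this (a sketch — the boundary value is generated by the browser, and this one is made up):

POST /files HTTP/1.1
Host: localhost:7787
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryAbC123xYz

------WebKitFormBoundaryAbC123xYz
Content-Disposition: form-data; name="file"; filename="1.png"
Content-Type: image/png

... binary contents of 1.png ...
------WebKitFormBoundaryAbC123xYz--

Notice that we never set Content-Type ourselves: when we pass a FormData body, the browser generates the boundary and fills in the header for us.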

Blob

A Blob object represents a file-like object of immutable, raw data. A Blob does not necessarily represent data in a JavaScript-native format. The File interface is based on Blob, inheriting its functionality and extending it to support files on the user's system.

So if we run into a file in Blob form, don't be afraid — we can use either of the following two ways:

1. Upload the Blob directly

const json = { hello: "world" };
const blob = new Blob([JSON.stringify(json, null, 2)], { type: 'application/json' });
    
const form = new FormData();
form.append('file', blob, '1.json');
axios.post('http://localhost:7787/files', form);

2. Wrap it in a File object again (File compatibility may be slightly worse: caniuse.com/#search=Fil…)

const json = { hello: "world" };
const blob = new Blob([JSON.stringify(json, null, 2)], { type: 'application/json' });

const form = new FormData();
const file = new File([blob], '1.json');
form.append('file', file);
axios.post('http://localhost:7787/files', form)

ArrayBuffer

ArrayBuffer objects are used to represent generic, fixed-length buffers of raw binary data.

It’s less used, but it’s the closest thing to a file stream.

In the browser, each byte can be written out as a decimal number. I prepared a tiny image in advance; the array below is its raw bytes.

const bufferArray = [137, 80, 78, 71, 13, 10, 26, 10, 0, 0, 0, 13, 73, 72, 68, 82, 0, 0, 0, 1, 0, 0, 0, 1, 1, 3, 0, 0, 0, 37, 219, 86, 202, 0, 0, 0, 6, 80, 76, 84, 69, 0, 0, 255, 128, 128, 128, 76, 108, 191, 213, 0, 0, 0, 9, 112, 72, 89, 115, 0, 0, 14, 196, 0, 0, 14, 196, 1, 149, 43, 14, 27, 0, 0, 0, 10, 73, 68, 65, 84, 8, 153, 99, 96, 0, 0, 0, 2, 0, 1, 244, 113, 100, 166, 0, 0, 0, 0, 73, 69, 78, 68, 174, 66, 96, 130];
const array = Uint8Array.from(bufferArray);
const blob = new Blob([array], { type: 'image/png' });
const form = new FormData();
form.append('file', blob, '1.png');
axios.post('http://localhost:7787/files', form)

Note that in new Blob([typedArray.buffer], { type: 'xxx' }) the first argument is wrapped in an array; inside it is the typed array's buffer (passing the typed array itself also works).
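To make that gotcha concrete, here is a minimal sketch (the variable names are mine):

const array = Uint8Array.from([137, 80, 78, 71]);       // the first bytes of a PNG
const right = new Blob([array], { type: 'image/png' }); // 4 bytes of binary data
const wrong = new Blob(array, { type: 'image/png' });   // no wrapping []: each element is stringified
console.log(right.size, wrong.size);                    // 4 vs 9 ("137" + "80" + "78" + "71")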

Base64

const base64 = 'iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAABlBMVEUAAP+AgIBMbL/VAAAACXBIWXMAAA7EAAAOxAGVKw4bAAAACklEQVQImWNgAAAAAgAB9HFkpgAAAABJRU5ErkJggg==';
const byteCharacters = atob(base64);
const byteNumbers = new Array(byteCharacters.length);
for (let i = 0; i < byteCharacters.length; i++) {
	byteNumbers[i] = byteCharacters.charCodeAt(i);
}
const array = Uint8Array.from(byteNumbers);
const blob = new Blob([array], {type: 'image/png'});
const form = new FormData();
form.append('file', blob, '1.png');
axios.post('http://localhost:7787/files', form);

For how Base64 conversion works and why, see these two articles: Base64 Principles, and Native Base64 encoding and decoding with JS in the browser.

summary

Browser-side file uploads boil down to one routine: the core idea is always to construct a File (or Blob) object, then check the Content-Type of the request and whether any information is missing from the request body. The conversions between these binary data types are summarized in the following table.

shanyue.tech/post/binary…
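In addition to that table, here is a minimal sketch of the common conversions, assuming file comes from an <input type="file"> (blob.arrayBuffer() needs a modern browser; older ones can use FileReader.readAsArrayBuffer instead):

async function convert(file) {
  // File / Blob -> ArrayBuffer
  const arrayBuffer = await file.arrayBuffer();

  // ArrayBuffer -> Blob (remember the wrapping array)
  const blob = new Blob([arrayBuffer], { type: file.type });

  // Blob -> File
  const copy = new File([blob], file.name, { type: blob.type });

  // File / Blob -> Base64 (as a data URL)
  const dataUrl = await new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result); // "data:image/png;base64,iVBOR..."
    reader.onerror = reject;
    reader.readAsDataURL(copy);
  });

  console.log(arrayBuffer.byteLength, copy.name, dataUrl.slice(0, 30));
}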

The server side

Now that we’re done with the browser side, let’s move on to the server side, which, unlike the browser, has two challenges.

1. The server has no native FormData object, so it can't assemble the body into binary for us the way the browser does.

2. There is no visual Network debugger on the server.

Buffer

Request

Let's start with the simplest example and go step by step. The request library's documentation can be found at github.com/request/req…

// request-error.js
const fs = require('fs');
const path = require('path');
const request = require('request');
const stream = fs.readFileSync(path.join(__dirname, '../1.png'));
request.post({
    url: 'http://localhost:7787/files',
    formData: {
        file: stream,
    }
}, (err, res, body) => {
    console.log(body);
})

An error comes back. As mentioned above, when something goes wrong in the browser we can use the Network panel — but what about the server? This is where we take out our sharp tool: Wireshark.

Open Wireshark (if you don't have it or aren't sure how to use it, see this tutorial: blog.csdn.net/u013613428/…).

Set the filter tcp.port == 7787, which is our back end's port.

Run the file above: node request-error.js

Let’s find the HTTP request that we sent. The mess in the middle is the contents of our files.

POST /files HTTP/1.1
host: localhost:7787
content-type: multipart/form-data; boundary=--------------------------437240798074408070374415
content-length: 305
Connection: close

----------------------------437240798074408070374415
Content-Disposition: form-data; name="file"
Content-Type: application/octet-stream

.PNG........IHDR.............%.V.....PLTE......Ll.....pHYs..........+...IDAT..c`.......qd.....IEND.B`.
----------------------------437240798074408070374415--

Looking at the packet above, we can spot the problem: the file part carries no filename, and its Content-Type is application/octet-stream rather than the actual image type.

That's because fs.readFileSync(path.join(__dirname, '../1.png')) returns a Buffer, which looks like the following and contains no file-related information — just the binary data.

<Buffer 01 02>

So we need to specify the file's name and type ourselves, and thankfully request gives us that option:

key: {
    value:  fs.createReadStream('/dev/urandom'),
    options: {
      filename: 'topsecret.jpg',
      contentType: 'image/jpeg'
    }
}

Since options can be specified, the correct code looks like this (unimportant code omitted):

// ...
request.post({
    url: 'http://localhost:7787/files',
    formData: {
        file: {
            value: stream,
            options: {
                filename: '1.png'
            }
        }
    }
});

From the packet capture we can see that the key to file upload is, once again, the specification. Most problems can be investigated by comparing what you send against the specification's template and checking whether the request is constructed correctly.

Form-data

Let's dig a little deeper and look at the source code of request to see how it implements this data transfer on the Node side.

Opening the source code, we can easily find the formData-related part: github.com/request/req…

It delegates to the form-data library, so let's take a look at form-data first.

const path = require('path');
const FormData = require('form-data');
const fs = require('fs');
const http = require('http');

const form = new FormData();
form.append('file', fs.readFileSync(path.join(__dirname, '../1.png')), {
    filename: '1.png',
    contentType: 'image/jpeg'
});

const request = http.request({
    method: 'post',
    host: 'localhost',
    port: '7787',
    path: '/files',
    headers: form.getHeaders()
});

form.pipe(request);

request.on('response', function(res) {
    console.log(res.statusCode);
});

Native Node

Having looked at form-data, you may feel the encapsulation is still too high-level, so let's manually construct a multipart/form-data request against the specification. First, let's review the specification again.

Content-type: multipart/form-data, boundary=AaB03x

--AaB03x
content-disposition: form-data; name="field1"
Joe Blow
--AaB03x
content-disposition: form-data; name="pics"; filename="file1.txt"
Content-Type: text/plain

... contents of file1.txt ...
--AaB03x--

Following the format above, here is a multipart/form-data request written in native Node.

It is divided into four parts
  • Construct request Header

  • Construct the content header

  • Write content

  • Write the end delimiter

const path = require('path');
const fs = require('fs');
const http = require('http');

// Define a delimiter and make sure it is unique
const boundaryKey = '--------------------------461591080941622511336662';

const request = http.request({
    method: 'post',
    host: 'localhost',
    port: '7787',
    path: '/files',
    headers: {
        'Content-Type': 'multipart/form-data; boundary=' + boundaryKey, // declare the delimiter in the request header
        'Connection': 'keep-alive'
    }
});

// Write the content header
request.write(
    `--${boundaryKey}\r\nContent-Disposition: form-data; name="file"; filename="1.png"\r\nContent-Type: image/jpeg\r\n\r\n`
);

// Write the content
const fileStream = fs.createReadStream(path.join(__dirname, '../1.png'));
fileStream.pipe(request, { end: false });
fileStream.on('end', function () {
    // Write the closing delimiter
    request.end('\r\n--' + boundaryKey + '--' + '\r\n');
});

request.on('response', function(res) {
    console.log(res.statusCode);
});

With that, file upload from the server side is done.

Stream, Base64

Since these two are just conversions to and from Buffer and are relatively simple, I won't go over them in detail. Consider them homework — if you're interested, you can contribute these two examples to my sample code repository.

// base64 to buffer
const b64string = /* whatever */;
const buf = Buffer.from(b64string, 'base64');
// stream to buffer
function streamToBuffer(stream) {
  return new Promise((resolve, reject) => {
    const buffers = [];
    stream.on('error', reject);
    stream.on('data', (data) => buffers.push(data));
    stream.on('end', () => resolve(Buffer.concat(buffers)));
  });
}
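For completeness, the reverse directions are also short (a sketch; Readable.from needs Node 12+):

// buffer to base64
const base64String = buf.toString('base64');

// buffer to stream: wrap the buffer in an array so it is emitted as a single chunk
const { Readable } = require('stream');
const readable = Readable.from([buf]);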

summary

Since the server has no native FormData object like the browser does, the core idea on the server side is to construct the upload format yourself (headers, filename, and so on) and then write the buffer. And don't forget to verify with Wireshark.

The receiving end

This section is about the Node side. If you're used to koa-body, you may not know what happens during the whole process — perhaps the only thing you're clear about is ctx.request.files. And if ctx.request.files didn't exist, you'd have no idea what koa-body did or how the file stream was parsed.

Once again, it comes down to the specification: the request side constructs the request according to the specification, so the receiving side naturally parses it according to the specification.

Koa-body

const koaBody = require('koa-body');

app.use(koaBody({ multipart: true }));

Let's take a look at the most commonly used option, koa-body. It's very simple to use — just a few lines and we can enjoy file upload, simple and happy. (Other libraries follow the same idea: trace the problem back to its source.) Read on with this question in mind: why can it parse out the file?

To get to the bottom of it, of course, we should open the koa-body source code. It's only 211 lines: github.com/dlau/koa-bo… It's easy to see that it actually uses the formidable library to parse files, and the parsed files object is then attached to the context as ctx.request.files. (So don't just memorize ctx.request.files — keep an eye on the documentation, because what koa-body exposes as ctx.request.files today might move tomorrow.)
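For context, a minimal receiver built on koa-body might look like the sketch below. The field name file matches the examples above, and the exact shape of the parsed file object (name/path vs. newer property names) depends on your koa-body/formidable version:

const Koa = require('koa');
const koaBody = require('koa-body');

const app = new Koa();
app.use(koaBody({ multipart: true }));

app.use(async (ctx) => {
  if (ctx.path === '/files' && ctx.method === 'POST') {
    // parsed for us by formidable under the hood
    const file = ctx.request.files.file;
    ctx.body = { uploaded: true, fields: Object.keys(ctx.request.files) };
  }
});

app.listen(7787);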

So after reading koa-body, we reach the conclusion that its core is formidable.

Formidable

So let's dig a little deeper and see what formidable does. First, its directory structure:

.
├── lib
│   ├── file.js
│   ├── incoming_form.js
│   ├── index.js
│   ├── json_parser.js
│   ├── multipart_parser.js
│   ├── octet_parser.js
│   └── querystring_parser.js

Looking at this list, we can roughly work out the relationships.

index.js
|
incoming_form.js
|
type
?
|
1.json_parser
2.multipart_parser
3.octet_parser
4.querystring_parser

Because source code analysis is tedious, I'll only pick out the important parts. Since we're analyzing file upload, we only care about multipart_parser.js.

github.com/node-formid…

// ...
MultipartParser.prototype.write = function(buffer) {
  console.log(buffer);
  var self = this,
      i = 0,
      len = buffer.length,
      prevIndex = this.index,
      index = this.index,
      state = this.state,
// ...

Let’s print out its buffer.

<Buffer 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 34 36 31 35 39 31 30 38 30 39 34 31 36 32 32 35 31 31 33 33 36 36 36 ... > 144
<Buffer 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 00 00 00 01 00 00 00 01 01 03 00 00 00 25 db 56 ca 00 00 00 06 50 4c 54 45 00 00 ff 80 80 80 4c 6c bf ... > 106
<Buffer 0d 0a 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 34 36 31 35 39 31 30 38 30 39 34 31 36 32 32 35 31 31 33 33 36 ... >

Now compare this with the packets captured by Wireshark.

I've marked the separator in red; it corresponds to what formidable receives. So formidable essentially splits the large buffer on the boundary and then processes the pieces in a loop.

A note for anyone unfamiliar with the view above: the left side is the binary stream, with each pair of hex digits representing one byte (1 byte = 8 bits). The 2d above is hexadecimal; in binary it is 0010 1101, and in ASCII it is the character '-'. The right side is the ASCII rendering of those bytes — but not every byte maps to a printable ASCII character, so the unprintable ones show up as dots.

You can check it against an ASCII table.
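If you want to reproduce that hex/ASCII mapping yourself, a quick sketch in Node:

const boundary = Buffer.from('--AaB03x');
console.log(boundary.toString('hex'));   // 2d2d416142303378 -- 0x2d is '-'
console.log(boundary.toString('ascii')); // --AaB03x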

Let me summarize formidable's file-handling process.

Native Node

OK, now that we understand the parsing process, let's write our own.

const fs = require('fs');
const http = require('http');
const querystring = require('querystring');

const server = http.createServer((req, res) => {
  if (req.url === "/files" && req.method.toLowerCase() === "post") {
    parseFile(req, res);
  }
});

function parseFile(req, res) {
  req.setEncoding("binary");
  let body = "";
  // The boundary string
  let boundary = req.headers['content-type']
    .split('; ')[1]
    .replace("boundary=", "");

  req.on("data", function(chunk) {
    body += chunk;
  });
  req.on("end", function() {
    // Split the body on the boundary
    const list = body.split(boundary);
    let contentType = '';
    let fileName = '';
    for (let i = 0; i < list.length; i++) {
      if (list[i].includes('Content-Disposition')) {
        const data = list[i].split('\r\n');
        for (let j = 0; j < data.length; j++) {
          // Pull the name and type out of the part headers
          if (data[j].includes('Content-Disposition')) {
            const info = data[j].split(':')[1].split('; ');
            fileName = info[info.length - 1].split('=')[1].replace(/"/g, '');
            console.log(fileName);
          }
          if (data[j].includes('Content-Type')) {
            contentType = data[j];
            console.log(data[j].split(':')[1]);
          }
        }
      }
    }
    // Strip off the part headers: \r\n\r\n takes up 4 characters
    const start = body.toString().indexOf(contentType) + contentType.length + 4;
    const startBinary = body.toString().substring(start);
    // Strip off the trailing delimiter and the \r\n before it
    const end = startBinary.indexOf("--" + boundary + "--") - 2;
    const binary = startBinary.substring(0, end);
    const bufferData = Buffer.from(binary, "binary");
    fs.writeFile(fileName, bufferData, function(err) {
      res.end("success");
    });
  });
}

server.listen(7787);


conclusion

I believe that with the introduction above, you'll no longer be afraid of file upload — the whole process should be much clearer. And if anything is still unclear... come find me.

To review our key points again:

If the problem is on the request side: in the browser, open the Network panel and check whether the format (request headers, request body) is correct; if the data shown there is not detailed enough, open Wireshark and check the format against the specification.

If the problem is on the receiving end, there are two possibilities. Either the request side is missing information (refer to the server-side request problems above), or the request body content is wrong. If the request body was constructed by the request side itself, check whether the binary stream in the body is correct (for example, when constructing the Blob I initially left out the wrapping [], which produced an incorrect body).

In fact, all of the above comes down to one word: specification. The whole ecosystem spreads out around it. See my blog for more.

Further reading

shark-cleaner: a Node CLI garbage-cleaning tool (deep-cleans development junk)

Node + NAPI C++ extension: an LRU eviction algorithm

The whole process of developing a Node command-line toy: a nice statistics tool

Follow me

Hi, I'm Qiufeng, author of the open source projects webchat (1528), shark-cleaner (19), and google-translate-open-api (46), among others. If you're interested in what I've summarized here or in my open source projects, you're welcome to add me on WeChat (QR code below) and discuss together.
