Related articles in this series:

– 01 Simple drag-and-drop uploads and progress bars

– 02 Binary-level format verification

V1.4: Upload large file slices – File hash calculation

Before we do that, we need to know why we want to hash the file.

In the first chapter we built the most basic version of file upload, where the back end stores the file under the name sent by the front end. But two different files can have the same file name, and in that case whichever one is stored later overwrites the other. A hash can be understood as a file's fingerprint: different files must have different hashes. If the back end stores files by their hash instead, files with the same name no longer overwrite each other. That is one reason to hash the file.

In addition, other features can be built on top of it, such as instant upload: before uploading, the front end sends the hash to the back end to check whether the file already exists there. If it does, the file does not need to be uploaded again, and the front end simply tells the user that the upload completed instantly.
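As a rough sketch, that check might look like this (the /verify endpoint and its response shape are only placeholders for illustration, not an API defined in this series):

// hypothetical pre-upload check: ask the back end whether a file with this hash already exists
const verifyBeforeUpload = async (hash, filename) => {
  const res = await fetch("/verify", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ hash, filename })
  });
  const { exists } = await res.json();
  // if it already exists, skip the upload and tell the user it finished instantly
  return exists;
};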

With that background in place, we are ready to hash the file.

If the file is not too large, computing the hash directly is not much of a problem. But once the file starts to get big, hashing the whole thing in one go can easily make the browser stutter or even freeze outright, so we usually slice the file first and then compute the hash incrementally over the slices.

File slicing

This part is not difficult; it can be done with the file's slice method:

export const CHUNK_SIZE = 1 * 1024 * 1024; // 1 MB per slice

export const createFileChunks = (file) => {
  const chunks = [];
  const size = CHUNK_SIZE;
  let cur = 0;
  while (cur < file.size) {
    chunks.push({
      index: cur,
      fileChunk: file.slice(cur, cur + size)
    });
    cur += size;
  }
  return chunks;
};
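As a quick usage sketch (the input element and the log line are just for illustration):

// assuming an <input type="file" id="file-input"> somewhere on the page
document.querySelector("#file-input").addEventListener("change", (e) => {
  const file = e.target.files[0];
  const chunks = createFileChunks(file);
  console.log(`split "${file.name}" into ${chunks.length} chunks of up to ${CHUNK_SIZE} bytes`);
});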

After slicing, we can hash the file incrementally. A handy library for this is spark-md5.

However, when the file is very large there will be a great many slices, and kicking off that many tasks at once also causes the browser to lag. For this problem there are usually two solutions:

  • Web Worker
  • window.requestIdleCallback()

The former can be thought of as multithreaded JavaScript: the main thread keeps doing rendering and other work while the hash calculation is handed off to another thread, so the main thread is not affected. The latter is time slicing, inspired by React Fiber: the idea is to let the browser do the work only when it is idle.

requestIdleCallback will schedule work when there is free time at the end of a frame, or when the user is inactive.
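Its basic shape is a callback that receives an IdleDeadline object (the timeout option below is optional):

window.requestIdleCallback(
  (deadline) => {
    // deadline.timeRemaining(): how many milliseconds of idle time are left in this frame
    // deadline.didTimeout: true if the callback ran because the timeout expired
    console.log(deadline.timeRemaining(), deadline.didTimeout);
  },
  { timeout: 1000 }
);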

The next step is to compute the hash in these two ways.

Web Worker

A Worker script cannot access node_modules directly, so we first copy spark-md5.min.js into public, and then create hash.js under public as well for the calculation that follows:
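Assuming a standard scaffold with a public directory, the layout is roughly:

public/
├── spark-md5.min.js   // copied from node_modules/spark-md5/spark-md5.min.js
└── hash.js            // the worker script written below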

We can then create the Worker via new Worker("/hash.js") and interact with it via postMessage and onmessage:

const calculateByWorker = async (chunks) => {
  return new Promise(resolve => {
    const worker = new Worker("/hash.js");
    worker.postMessage({ chunks });
    worker.onmessage = (e) => {
      const { hash } = e.data;
      // resolve once the worker sends back the final hash
      if (hash) {
        resolve(hash);
      }
    };
  });
};

In hash.js, you communicate in the same way.

The key points here are:

  • For each chunk, we read it with a FileReader
  • Use spark.append() to incrementally calculate the read content
  • After the calculation is complete, end with spark.end()
  • Of course, don’t forget to post the hash back to the main thread with postMessage

To sum up, the code is as follows:

// hash.js
self.importScripts("spark-md5.min.js");

self.onmessage = e => {
  const { chunks } = e.data;
  const spark = new self.SparkMD5.ArrayBuffer();
  let count = 0;

  // read one chunk, append it to spark, then move on to the next one
  const loadNext = index => {
    const reader = new FileReader();
    reader.readAsArrayBuffer(chunks[index].fileChunk);
    reader.onload = e => {
      count++;
      spark.append(e.target.result);

      if (count === chunks.length) {
        // every chunk has been appended: send the final hash back
        self.postMessage({
          hash: spark.end()
        });
      } else {
        loadNext(count);
      }
    };
  };

  loadNext(0);
};

The hash of the file can then be computed successfully.
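Putting the main-thread pieces together might look like this (handleUpload is just an illustrative name, not code from this series):

const handleUpload = async (file) => {
  const chunks = createFileChunks(file);
  const hash = await calculateByWorker(chunks);
  console.log("file hash:", hash);
  // the hash can now be sent to the back end, e.g. for the instant-upload check
};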

There are also some minor optimizations that can be made, such as a progress bar for calculating the hash.

So where does progress come from?

We can update the progress each time a slice has been appended and post it back to the main thread; once the whole hash has been computed, set the progress to 100:

self.importScripts("spark-md5.min.js");

self.onmessage = e => {
  const { chunks } = e.data;
  const spark = new self.SparkMD5.ArrayBuffer();
  let progress = 0;
  let count = 0;

  const loadNext = index => {
    const reader = new FileReader();
    reader.readAsArrayBuffer(chunks[index].fileChunk);
    reader.onload = e => {
      count++;
      spark.append(e.target.result);

      if (count === chunks.length) {
        self.postMessage({
          progress: 100,
          hash: spark.end()
        });
      } else {
        progress += 100 / chunks.length;
        self.postMessage({ progress });
        loadNext(count);
      }
    };
  };

  loadNext(0);
};
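On the main-thread side, calculateByWorker needs a corresponding tweak so the progress messages are not ignored. A minimal sketch, using a progressRef like the one the requestIdleCallback version uses later in this article:

const calculateByWorker = (chunks, progressRef) => {
  return new Promise(resolve => {
    const worker = new Worker("/hash.js");
    worker.postMessage({ chunks });
    worker.onmessage = (e) => {
      const { progress, hash } = e.data;
      if (progress !== undefined) {
        // update the progress bar as slices are appended
        progressRef.value = Number(progress.toFixed(2));
      }
      if (hash) {
        resolve(hash);
      }
    };
  });
};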

With this, the progress of the hash calculation can be displayed while it runs.

requestIdleCallback

As mentioned earlier, the principle of requestIdleCallback is to let the browser perform tasks when it is idle, so the key points are as follows:

  • Start with requestIdleCallback
  • Do the work while there is idle time left and there are still tasks remaining
  • If there are still tasks left, schedule the next requestIdleCallback

The overall structure then looks like this:

const calculateByIdle = (chunks) => {
  let count = 0;

  const workLoop = async (deadline) => {
    while (count < chunks.length && deadline.timeRemaining() > 1) {
      /* do something */
      count++;
    }
    // schedule the next idle callback only while there is still work left
    if (count < chunks.length) {
      window.requestIdleCallback(workLoop);
    }
  };

  window.requestIdleCallback(workLoop);
};

Inside the while loop, we obviously need a check:

  • If the calculation is complete, the hash is returned
  • If not, proceed with the calculation

Here we first implement a small helper that reads a chunk and appends it to the SparkMD5 instance (the spark instance is passed in explicitly, since it will be created inside calculateByIdle):

// read a chunk and append its contents to the given SparkMD5 instance
const appendToSpark = (spark, chunk) => {
  return new Promise(resolve => {
    const reader = new FileReader();
    reader.readAsArrayBuffer(chunk);
    reader.onload = (e) => {
      spark.append(e.target.result);
      resolve();
    };
  });
};

So in the while it would look like this:

while (count < chunks.length && deadline.timeRemaining() > 1) {
  await appendToSpark(spark, chunks[count].fileChunk);
  count++;

  if (count < chunks.length) {
    progressRef.value = Number(((100 * count) / chunks.length).toFixed(2));
  } else {
    progressRef.value = 100;
    return spark.end();
  }
}

Since this whole process is asynchronous, calculateByIdle itself is wrapped in a Promise:

import SparkMD5 from "spark-md5";

const calculateByIdle = async (chunks) => {
  return new Promise(resolve => {
    const spark = new SparkMD5.ArrayBuffer();
    let count = 0;

    const workLoop = async (deadline) => {
      while (count < chunks.length && deadline.timeRemaining() > 1) {
        await appendToSpark(spark, chunks[count].fileChunk);
        count++;

        if (count >= chunks.length) {
          resolve(spark.end());
        }
      }
      // keep scheduling idle callbacks until every chunk has been processed
      if (count < chunks.length) {
        window.requestIdleCallback(workLoop);
      }
    };

    window.requestIdleCallback(workLoop);
  });
};

Similarly, we can add progress to it:

const calculateByIdle = async (chunks, progressRef) => {
  return new Promise(resolve => {
    const spark = new SparkMD5.ArrayBuffer();
    let count = 0;

    const workLoop = async (deadline) => {
      while (count < chunks.length && deadline.timeRemaining() > 1) {
        await appendToSpark(spark, chunks[count].fileChunk);
        count++;

        if (count < chunks.length) {
          progressRef.value = Number(((100 * count) / chunks.length).toFixed(2));
        } else {
          progressRef.value = 100;
          resolve(spark.end());
        }
      }
      if (count < chunks.length) {
        window.requestIdleCallback(workLoop);
      }
    };

    window.requestIdleCallback(workLoop);
  });
};
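As a sketch of how the progressRef parameter could be wired up (assuming a Vue 3 setup, since the article uses progressRef.value; the names here are illustrative):

import { ref } from "vue";

const hashProgress = ref(0); // bound to a progress bar in the template

const handleIdleHash = async (file) => {
  const chunks = createFileChunks(file);
  // hashProgress.value goes from 0 to 100 as the chunks are appended
  const hash = await calculateByIdle(chunks, hashProgress);
  console.log("file hash:", hash);
};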

With this, the progress bar works the same way as in the Worker version.

Conclusion

Now that we have done enough preparation for large file uploads, the next article will show how to upload the file slices to the back end.

So let’s call it a day. See you in the next article!