During development we received this feedback: uploading files larger than 100 MB on the website often failed, and retrying after a failure meant another long wait, which is painful for users who need to upload large files. So how can we upload quickly, and, when an upload fails, resend only what is missing and pick up where we left off? Here's the answer.

Tip: this article reads best alongside the demo source code.

The overall approach

The first step is to investigate and compare optimization solutions against the project background. File upload failure is a commonplace problem. A common solution is to slice a large file into many small files, upload them in parallel through the upload interface, and have the server merge all the fragments once every request has arrived. When one fragment fails to upload, only that fragment needs to be re-uploaded. This shortens the user's waiting time and relieves server pressure. This is fragmented (sharded) file upload.

Large file upload

So how do we implement fragmented upload of a large file?

The flow chart is as follows:

It can be achieved by the following steps:

1. Hash the file with MD5

The MD5 is the unique identifier of a file; you can use it to query a file's upload status.

spark-md5 is used to generate the file's MD5 from its content. Note that a large file needs to be read in fragments: append each chunk that is read to the spark-md5 hash computation until the whole file has been read, then pass the final hash to the callback. A progress bar for file reading can be added here as needed.

The implementation method is as follows:

// Read the file in fragments and append each fragment to the spark-md5 hash
md5File (file) {
  return new Promise((resolve, reject) => {
    let blobSlice =
      File.prototype.slice ||
      File.prototype.mozSlice ||
      File.prototype.webkitSlice
    let chunkSize = file.size / 100
    let chunks = 100
    let currentChunk = 0
    let spark = new SparkMD5.ArrayBuffer()
    let fileReader = new FileReader()
    fileReader.onload = function (e) {
      console.log('read chunk nr', currentChunk + 1, 'of', chunks)
      spark.append(e.target.result) // Append array buffer
      currentChunk++
      if (currentChunk < chunks) {
        loadNext()
      } else {
        console.log('finished loading')
        let result = spark.end() // Compute the final hash of the whole file
        resolve(result)
      }
    }
    fileReader.onerror = function (err) {
      console.warn('oops, something went wrong.')
      reject(err)
    }
    function loadNext () {
      let start = currentChunk * chunkSize
      let end =
        start + chunkSize >= file.size ? file.size : start + chunkSize
      fileReader.readAsArrayBuffer(blobSlice.call(file, start, end))
    }
    loadNext()
  })
}
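
With the hash in hand, the front end can query the file's upload status, along these lines (a usage sketch inside an async method):

// Usage sketch: compute the MD5 first, then use it for the status query below
const fileMd5Value = await this.md5File(file)
console.log('file MD5:', fileMd5Value)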

2. Query the file status

After the front end gets the file's MD5, it asks the backend whether a folder named after that MD5 exists. If it does, the files inside it are listed to obtain the list of already-uploaded slices; if it does not, the uploaded slice list is empty.

// Query the file's upload status by its MD5
checkFileMD5 (file, fileName, fileMd5Value, onError) {
  const fileSize = file.size
  const { chunkSize, uploadProgress } = this
  this.chunks = Math.ceil(fileSize / chunkSize)
  return new Promise(async (resolve, reject) => {
    const params = {
      fileName: fileName,
      fileMd5Value: fileMd5Value,
    }
    const { ok, data } = await services.checkFile(params)
    if (ok) {
      this.hasUploaded = data.chunkList.length
      uploadProgress(file)
      resolve(data)
    } else {
      reject(ok)
      onError()
    }
  })
}
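
The backend side of this check is not shown in the article; below is a minimal sketch, assuming a Koa-style handler, fs-extra, and the uploadDir/&lt;md5&gt;/&lt;index&gt; folder layout used later in the article (all names are illustrative):

// Minimal sketch of the check endpoint (assumes Koa-style ctx, fs-extra,
// and fragments stored as uploadDir/<md5>/<index>)
const fse = require('fs-extra')
const path = require('path')

async function checkFile (ctx) {
  const { fileMd5Value } = ctx.query
  const chunkDir = path.join(uploadDir, fileMd5Value)
  // If the MD5 folder exists, its file names are the indexes of the uploaded slices
  const chunkList = (await fse.pathExists(chunkDir))
    ? await fse.readdir(chunkDir)
    : []
  ctx.body = { ok: true, data: { chunkList } }
}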

3. File fragments

The core of file upload optimization is file fragmentation. The slice method of the Blob object can cut a file into pieces, and since File inherits from Blob, File objects also have a slice method. For example:
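
// File inherits slice from Blob: cut out the byte range [start, end)
const blob = file.slice(0, 1024 * 1024) // the first 1 MB of the file
console.log(blob instanceof Blob) // true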

chunkSize defines the size of each fragment, and the number of fragments, chunks, is derived from fileSize and chunkSize. A for loop together with file.slice() cuts the file into fragments numbered from 0. Comparing these numbers against the uploaded slice list yields all slices that still need uploading, which are pushed into requestList.

async checkAndUploadChunk (file, fileMd5Value, chunkList) {
  let { chunks } = this
  const requestList = []
  for (let i = 0; i < chunks; i++) {
    // If the fragment already exists, there is no need to upload it again
    let exist = chunkList.indexOf(i + '') > -1
    if (!exist) {
      // call via this.upload so the method keeps its this binding
      requestList.push(this.upload(i, fileMd5Value, file))
    }
  }
  console.log({ requestList })
  const result =
    requestList.length > 0
      ? await Promise.all(requestList)
        .then(result => {
          console.log({ result })
          return result.every(i => i.ok)
        })
        .catch(err => {
          return err
        })
      : true
  console.log({ result })
  return result === true
}

4. Upload fragments

Call Promise.all to upload all slices concurrently, passing the slice index, the slice data, and the file's MD5 to the backend.

After receiving an upload request, the backend first checks whether the folder named after the MD5 exists and creates it if it does not. Then fs-extra's rename method moves the slice from its temporary path into the slice folder.
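
That handler is not shown in the article; below is a minimal sketch, assuming koa-body (formidable v1, whose temporary files expose a .path property) has already parsed the multipart form, with field names matching the FormData built in the front-end code below:

// Minimal sketch of the fragment-receiving endpoint (assumes Koa-style ctx,
// koa-body/formidable v1 parsing, and fields named as in the front-end FormData)
const fse = require('fs-extra')
const path = require('path')

async function uploadChunk (ctx) {
  const { index, fileMd5Value } = ctx.request.body
  const chunk = ctx.request.files.data // the 'data' field carries the slice
  const chunkDir = path.join(uploadDir, fileMd5Value)
  await fse.ensureDir(chunkDir) // create the MD5 folder if it does not exist yet
  // Move the slice from its temporary path into the MD5 folder, named by its index
  await fse.rename(chunk.path, path.join(chunkDir, index))
  ctx.body = { ok: true }
}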

When all fragments have been uploaded successfully, the server is notified to merge them; if any fragment fails, "Upload failed" is shown. On a re-upload, the file's upload state is looked up by its MD5: slices the server already has under that MD5 have been uploaded and need not be sent again, while slices the server cannot find still need uploading. The user only has to upload this missing part to complete the whole file. This is resumable (breakpoint-continued) upload.

// Upload one fragment
upload (i, fileMd5Value, file) {
  const { uploadProgress, chunks } = this
  return new Promise((resolve, reject) => {
    let { chunkSize } = this
    // Construct a form; FormData is new in HTML5
    let end = (i + 1) * chunkSize >= file.size ? file.size : (i + 1) * chunkSize
    let form = new FormData()
    form.append('data', file.slice(i * chunkSize, end)) // slice out one fragment of the file
    form.append('total', chunks) // total number of fragments
    form.append('index', i) // index of the current fragment
    form.append('fileMd5Value', fileMd5Value)
    services
      .uploadLarge(form)
      .then(data => {
        if (data.ok) {
          this.hasUploaded++
          uploadProgress(file)
        }
        console.log({ data })
        resolve(data)
      })
      .catch(err => {
        reject(err)
      })
  })
}

5. Upload progress

Although uploading fragments in batches is much faster than uploading one large file in a single request, there is still a noticeable loading period, so an upload progress prompt should be added to show the progress in real time.

Native JavaScript's XMLHttpRequest provides a progress event that reports the uploaded size and the total size. The project wraps ajax with axios, so you can add an onUploadProgress callback to the config to listen for upload progress.

const config = {
  onUploadProgress: progressEvent => {
    var complete = (progressEvent.loaded / progressEvent.total * 100 | 0) + '%'
  }
}
services.uploadChunk(form, config)
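
The uploadProgress callback referenced in the snippets above is not spelled out in the article. One possible implementation derives the overall percentage from the fragment counters maintained earlier; it is written as an arrow property so it can be destructured from this, as the snippets do, without losing its binding:

// One possible uploadProgress: overall progress = uploaded fragments / total fragments
uploadProgress = (file) => {
  const percent = Math.min(100, Math.round((this.hasUploaded / this.chunks) * 100))
  console.log(`${file.name}: ${percent}% uploaded`)
}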

6. Merge fragments

After all file fragments are uploaded, the front end notifies the server to merge the slices. On receiving the request, the server locates the folder named after the file's MD5 in the upload path. As noted above, fragments are named by their index, and the fragment upload interface is asynchronous, so there is no guarantee that the server received the slices in request order. Therefore, the fragment files are sorted by file name before merging, and then concatenated with concat-files to reproduce the file the user uploaded. At this point the large file upload is complete.

Node side code:

// Merge the fragments on the server
exports.merge = {
  validate: {
    query: {
      fileName: Joi.string()
        .trim()
        .required()
        .description('file name'),
      md5: Joi.string()
        .trim()
        .required()
        .description('file md5'),
      size: Joi.string()
        .trim()
        .required()
        .description('File size'),
    },
  },
  permission: {
    roles: ['user'],
  },
  async handler (ctx) {
    const { fileName, md5, size } = ctx.request.query
    let { name, base: filename, ext } = path.parse(fileName)
    const newFileName = randomFilename(name, ext)
    await mergeFiles(path.join(uploadDir, md5), uploadDir, newFileName, size)
      .then(async () => {
        const file = {
          key: newFileName,
          name: filename,
          mime_type: mime.getType(`${uploadDir}/${newFileName}`),
          ext,
          path: `${uploadDir}/${newFileName}`,
          provider: 'oss',
          size,
          owner: ctx.state.user.id,
        }
        const key = encodeURIComponent(file.key)
          .replace(/%/g, ' ')
          .slice(-100)
        file.url = await uploadLocalFileToOss(file.path, key)
        file.url = getFileUrl(file)
        const f = await File.create(omit(file, 'path'))
        const files = []
        files.push(f)
        ctx.body = invokeMap(files, 'toJSON')
      })
      .catch(() => {
        throw Boom.badData('Merge of large file fragments failed, please try again later ~')})}},Copy the code

Conclusion

This article has described some methods for optimizing large file upload, which can be summarized in four points:

  1. Blob.slice slices the file, and the slices are uploaded concurrently; once all slices are uploaded, the server is notified to merge them, implementing fragmented upload of large files.
  2. The progress event of the native XMLHttpRequest (exposed through axios's onUploadProgress) monitors the upload progress of each slice, so the file's upload progress is known in real time.
  3. spark-md5 computes the MD5 from the file content, yielding the file's unique identifier, which is bound to the file's upload status.
  4. Before uploading fragments, the list of already-uploaded slices is queried by the file's MD5, and only the slices not yet uploaded are sent, implementing resumable upload (a condensed driver combining these four points is sketched below).
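
Putting the four points together, a condensed driver might look like the following sketch. Method names follow the snippets above; services.mergeChunks and onError are assumed, illustrative names:

// Condensed end-to-end sketch; services.mergeChunks and onError are assumed names
async uploadFile (file) {
  const fileMd5Value = await this.md5File(file)  // hash the file
  const { chunkList } = await this.checkFileMD5( // query the uploaded slice list
    file, file.name, fileMd5Value, this.onError)
  const done = await this.checkAndUploadChunk(file, fileMd5Value, chunkList) // upload missing slices
  if (done) {
    // all fragments are on the server: ask it to merge them (see the merge handler above)
    await services.mergeChunks({ fileName: file.name, md5: fileMd5Value, size: file.size })
  }
}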

The demo source code will help you get the features above working quickly. I hope this article has been helpful, and thank you for reading ❤️


Welcome to the Aotu Lab blog: AOtu.io

Or follow the Aotu Lab WeChat public account (AOTULabs), which publishes articles from time to time: