Fragment upload and resumable (breakpoint) upload should be familiar terms to anyone who works with file uploads. I hope this summary helps or inspires readers doing related work.

When a file is very large, the upload takes a long time. What happens if the network fluctuates, or drops entirely, while that long-lived connection is open? If anything goes wrong at any point during the transfer, everything uploaded so far is lost and the upload has to start over.

Fragment upload means splitting the file to be uploaded into multiple data blocks (called parts) of a given size and uploading them separately. Once all parts are uploaded, the server merges them back into the original file. Fragment upload avoids having to restart the upload from the beginning when the network is poor, and it also allows different blocks to be sent concurrently from multiple threads, which improves throughput and shortens the upload time.

1. Background

After a sudden increase in the number of users, the system needed to better meet the customization needs of different user groups. The service gradually rolled out user-defined layout and configuration on the consumer-facing (C-end) side, which caused a surge in I/O for reading configuration data.

To optimize this scenario, user-defined configurations are managed statically. In other words, each configuration is generated as a static file. Generating the static files exposed a difficult problem: the files are large enough that uploading them to the file server takes too long, and the performance of the whole business scenario degrades.

2. Generate a configuration file

Generating a file involves three main elements:

  • The file name

  • The file content

  • File storage format

File content and storage format are easy to understand and handle. On the content side, earlier articles covered the encryption methods commonly used in microservices:

  • Microservice architecture | What are the common encryption methods? (1)

  • Microservice architecture | What are the common encryption methods? (2)

They are mentioned here as a supplementary note: consider them if you want to encrypt the file content. The configuration information in this article's scenario is not highly confidential, however, so encryption is not covered further here.

File names are usually built from a file summary plus a timestamp, depending on the business scenario. Such naming conventions are prone to file name conflicts, however, which cause unnecessary trouble later.

I therefore handle the file name specially. If you have worked with front-end routing, the idea will be familiar: the file name can be replaced by a hash value generated from the content.

Spring has provided a way to compute digests since version 3.0.

DigestUtils#md

Returns the hexadecimal string representation of the MD5 digest of the given bytes.

md5DigestAsHex source

    /**
     * Return a hexadecimal string representation of the MD5 digest of the given bytes.
     * @param bytes the bytes to calculate the digest over
     * @return a hexadecimal digest string
     */
    public static String md5DigestAsHex(byte[] bytes) {
        return digestAsHexString(MD5_ALGORITHM_NAME, bytes);
    }
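The same idea can be tried without Spring on the classpath. The following is a minimal sketch that uses the JDK's `MessageDigest` instead of `DigestUtils`; the class name `ContentHashName`, the method `fileNameFor`, and the `.json` suffix are assumptions made for illustration, not part of the original service.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a file name from the MD5 digest of the content.
class ContentHashName {

    /** Hex-encodes the MD5 digest of the given bytes (lowercase, 32 characters). */
    static String md5Hex(byte[] bytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(bytes)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every compliant JDK ships MD5, so this should not happen in practice.
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Builds "<md5-of-content><suffix>", encoding the content as UTF-8 (an assumption). */
    static String fileNameFor(String content, String suffix) {
        return md5Hex(content.getBytes(StandardCharsets.UTF_8)) + suffix;
    }
}
```

For example, `ContentHashName.fileNameFor("hello", ".json")` yields `5d41402abc4b2a76b9719d911017c592.json`. Identical content always maps to the same name, which is what makes the name usable as a change detector.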

With the file name, content, and suffix (storage format) determined, the file can be generated:

    public static void generateFile(String destDirPath, String fileName, String content) throws FileZipException {
        File targetFile = new File(destDirPath + File.separator + fileName);
        // Make sure the parent directory exists
        if (!targetFile.getParentFile().exists()) {
            if (!targetFile.getParentFile().mkdirs()) {
                throw new FileZipException(" path is not found ");
            }
        }
        // Write the content using the configured file encoding
        try (PrintWriter writer = new PrintWriter(
                new BufferedWriter(new OutputStreamWriter(new FileOutputStream(targetFile), ENCODING)))) {
            writer.write(content);
        } catch (Exception e) {
            throw new FileZipException("create file error", e);
        }
    }

The advantage of naming files by their content is clear: it greatly reduces the need to compare contents before generating a new file. If a file with the same name already exists, the content has not changed, and no further file update is needed. This matters most when the file content is large.
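The "same name means unchanged content" check can be sketched as follows. This is an illustration under assumptions, not the original service code: `StaticConfigWriter` and `writeIfChanged` are hypothetical names, the content is assumed to be UTF-8 text, and the digest comes from the JDK rather than Spring.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: writes a config file only when its content hash is new.
class StaticConfigWriter {

    static String md5Hex(byte[] bytes) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("MD5").digest(bytes)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /**
     * Writes the content to "<dir>/<md5><suffix>".
     * Returns true if a file was written, false if a file with that name
     * already existed (the content is unchanged, so nothing needs updating).
     */
    static boolean writeIfChanged(Path dir, String content, String suffix) {
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        Path target = dir.resolve(md5Hex(bytes) + suffix);
        if (Files.exists(target)) {
            return false; // same name => same content => skip generation and upload
        }
        try {
            Files.createDirectories(dir);
            Files.write(target, bytes);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `writeIfChanged` twice with the same content writes the file once; the second call returns `false`, so the caller can skip the upload step as well.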

Fragment upload of attachments

As described at the start of this article, fragment upload splits the whole file into multiple data blocks (called parts) of a given size and uploads them separately. Once all parts are uploaded, the server merges them into the original file. This avoids restarting the upload from the beginning when the network is poor, and it allows different blocks to be sent concurrently from multiple threads, which improves throughput and shortens the upload time.

Fragment upload applies to the following scenarios:

  • Poor network environment: When an upload fails, the failed part can be retried independently without re-uploading the other parts.

  • Resumable upload: After a pause, the upload can resume from the last part that was uploaded.

  • Accelerated upload: When the local files to be uploaded to OSS are large, multiple parts can be uploaded in parallel to speed up the upload.

  • Streaming upload: The upload can start before the final size of the file is known. This scenario is common in video surveillance and similar industries.

  • Large files: When a file is large, it is uploaded in fragments by default.
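The first and third scenarios (independent retry of a failed part, and parallel upload) can be sketched together. The code below is only an illustration: `PartSender` is a hypothetical stand-in for whatever transport actually sends a part, and the thread count and retry limit are parameters chosen by the caller rather than values from the original system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelPartUploader {

    /** Hypothetical transport: sends one part and throws on failure. */
    interface PartSender {
        void send(int index, byte[] data) throws Exception;
    }

    /** Uploads all parts concurrently; each part is retried on its own. */
    static void uploadAll(List<byte[]> parts, PartSender sender, int threads, int maxAttempts) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (int i = 0; i < parts.size(); i++) {
                final int index = i;
                Callable<Void> task = () -> {
                    sendWithRetry(sender, index, parts.get(index), maxAttempts);
                    return null;
                };
                futures.add(pool.submit(task));
            }
            for (Future<Void> future : futures) {
                future.get(); // rethrows if a part failed on every attempt
            }
        } catch (Exception e) {
            throw new IllegalStateException("fragment upload failed", e);
        } finally {
            pool.shutdownNow();
        }
    }

    /** Retries a single part without touching the other parts. */
    static void sendWithRetry(PartSender sender, int index, byte[] data, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                sender.send(index, data);
                return;
            } catch (Exception e) {
                last = e; // remember the failure and try this part again
            }
        }
        throw last;
    }
}
```

Because the retry loop is per part, a transient failure on one part costs only that part's bytes again, not the whole file.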

The whole fragment upload process is as follows:

  • Split the file to be uploaded into data blocks of the same size according to a segmentation rule.

  • Initialize a fragment upload task, which returns the unique identifier of this fragment upload.

  • Send each data block according to a chosen strategy (serial or parallel).

  • After sending, the server checks whether all the data has been uploaded. If it has, the server merges the data blocks to obtain the original file.
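The four steps above can be sketched end to end against an in-memory stand-in for the server. Everything here is hypothetical: `FakeServer`, `initUpload`, `uploadPart`, and `complete` are illustrative names rather than a real storage API, and the sketch uses a fixed part size with a shorter last part instead of a particular production rule.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

class FragmentUploadFlow {

    /** Hypothetical in-memory stand-in for the upload server. */
    static class FakeServer {
        private final Map<String, TreeMap<Integer, byte[]>> uploads = new HashMap<>();

        /** Step 2: create an upload task and return its unique identifier. */
        String initUpload() {
            String uploadId = UUID.randomUUID().toString();
            uploads.put(uploadId, new TreeMap<>());
            return uploadId;
        }

        /** Step 3 (server side): store one part under its sequence number. */
        void uploadPart(String uploadId, int index, byte[] data) {
            uploads.get(uploadId).put(index, data);
        }

        /** Step 4: verify that every part arrived, then merge in index order. */
        byte[] complete(String uploadId, int totalParts) {
            TreeMap<Integer, byte[]> parts = uploads.remove(uploadId);
            if (parts.size() != totalParts) {
                throw new IllegalStateException("missing parts: got " + parts.size());
            }
            ByteArrayOutputStream merged = new ByteArrayOutputStream();
            for (byte[] part : parts.values()) {
                merged.write(part, 0, part.length);
            }
            return merged.toByteArray();
        }
    }

    /** Client side: split, initialize, send serially, then ask the server to merge. */
    static byte[] upload(FakeServer server, byte[] file, int partSize) {
        int totalParts = (file.length + partSize - 1) / partSize; // step 1: ceil division
        String uploadId = server.initUpload();
        for (int i = 0; i < totalParts; i++) {
            int from = i * partSize;
            int to = Math.min(from + partSize, file.length);
            server.uploadPart(uploadId, i, Arrays.copyOfRange(file, from, to));
        }
        return server.complete(uploadId, totalParts);
    }
}
```

Keying the parts by sequence number is what lets the server merge correctly even if parts arrive out of order, which becomes relevant as soon as the sending strategy is parallel.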

Define the fragment size rule

By default, a file is forcibly fragmented once its size reaches 20MB.

    /**
     * Force fragment file size (20MB)
     */
    long FORCE_SLICE_FILE_SIZE = 20L * 1024 * 1024;

To facilitate debugging, the fragment file threshold is set to 1KB

Define the fragment upload object

As with the file fragments numbered in red in the figure above, the basic attributes of the fragment upload object include the attachment file name, the original file size, the MD5 value of the original file, the total number of fragments, the size of each fragment, the size of the current fragment, and the sequence number of the current fragment.

These attributes are chosen to make later steps, such as splitting the file sensibly and merging the fragments, easier to implement. Additional attributes can be defined as the business scenario requires.
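A plain data holder with those attributes might look like the sketch below. The class and field names are assumptions chosen for readability (the article's own object is only visible through calls such as `getFdTotalSlices()`), and the two derived helpers are additions for illustration.

```java
// Hypothetical value object carrying the attributes listed above.
class SliceBytesSketch {
    final String fileName;    // attachment file name
    final long fileSize;      // original file size in bytes
    final String fileMd5;     // MD5 value of the original file
    final long totalSlices;   // total number of fragments
    final long eachSize;      // nominal size of each fragment
    final long currentSize;   // size of the current fragment (the last one may be smaller)
    final int currentIndex;   // zero-based sequence number of the current fragment

    SliceBytesSketch(String fileName, long fileSize, String fileMd5,
                     long totalSlices, long eachSize, long currentSize, int currentIndex) {
        this.fileName = fileName;
        this.fileSize = fileSize;
        this.fileMd5 = fileMd5;
        this.totalSlices = totalSlices;
        this.eachSize = eachSize;
        this.currentSize = currentSize;
        this.currentIndex = currentIndex;
    }

    /** Byte offset of this fragment within the original file. */
    long offset() {
        return currentIndex * eachSize;
    }

    /** True when this is the final fragment, i.e. the merge can be triggered after it. */
    boolean isLast() {
        return currentIndex == totalSlices - 1;
    }
}
```

Carrying the file MD5 and the total count on every fragment lets the receiving side group fragments of the same file and detect when the set is complete.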

Total number of fragments

long totalSlices = fileSize % forceSliceSize == 0 ? 
    fileSize / forceSliceSize : fileSize / forceSliceSize + 1;

Size per fragment

long eachSize = fileSize % totalSlices == 0 ? 
    fileSize / totalSlices : fileSize / totalSlices + 1;

MD5 value of the original file

MD5Util.hex(file)
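The internals of `MD5Util.hex` are not given in this article. As a rough sketch of what such a helper could do, the code below streams a file through the JDK's `MessageDigest` so that a large file never has to be loaded into memory at once; `FileMd5` and its 8KB buffer size are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: computes the MD5 of a file by streaming its bytes.
class FileMd5 {

    static String hex(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read); // feed the digest chunk by chunk
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}
```

The server can recompute the same digest over the merged file and compare it with the value sent by the client to confirm that the merge reproduced the original.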

For example:

The current attachment size is 3382KB, and the fragment size limit is 1024KB.

By the calculation above, there are four fragments, and each fragment is at most 846KB.
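The arithmetic in this example can be checked with a small sketch that wraps the two formulas in methods. `SliceMath` and `lastSliceSize` are names introduced here for illustration; the formulas themselves are the ones given earlier, applied to sizes expressed in KB.

```java
// Hypothetical wrapper around the two formulas from this section.
class SliceMath {

    /** Number of fragments: fileSize / limit, rounded up. */
    static long totalSlices(long fileSize, long forceSliceSize) {
        return fileSize % forceSliceSize == 0
                ? fileSize / forceSliceSize
                : fileSize / forceSliceSize + 1;
    }

    /** Nominal size per fragment: fileSize / totalSlices, rounded up. */
    static long eachSize(long fileSize, long totalSlices) {
        return fileSize % totalSlices == 0
                ? fileSize / totalSlices
                : fileSize / totalSlices + 1;
    }

    /** Size of the final fragment, which absorbs the rounding remainder. */
    static long lastSliceSize(long fileSize, long totalSlices) {
        return fileSize - eachSize(fileSize, totalSlices) * (totalSlices - 1);
    }
}
```

With a file size of 3382 and a limit of 1024, `totalSlices` returns 4 and `eachSize` returns 846. The first three fragments are 846KB each and the last is 3382 - 3 × 846 = 844KB, so the split is nearly even rather than three full 1024KB fragments followed by a small remainder.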

Read the data bytes of each fragment

Track the current byte offset and loop over the four fragments, reading the data bytes of each one:

    try (InputStream inputStream = new FileInputStream(uploadVO.getFile())) {
        for (int i = 0; i < sliceBytesVO.getFdTotalSlices(); i++) {
            // Read the bytes of fragment i into sliceBytesVO
            this.readSliceBytes(i, inputStream, sliceBytesVO);
            // Call the fragment upload API
            String result = sliceApiCallFunction.apply(sliceBytesVO);
            if (StringUtils.isEmpty(result)) {
                continue;
            }
            return result;
        }
    } catch (IOException e) {
        throw e;
    }
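The body of the read step is not shown in this article. One way it might work is sketched below: read up to the per-fragment size from the stream for each fragment and let the last fragment come out shorter. `SliceReader`, `readSlice`, and `readSlices` are hypothetical names, and the sketch returns the fragments as a list instead of filling a value object.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SliceReader {

    /** Reads up to eachSize bytes; returns fewer only when the stream ends. */
    static byte[] readSlice(InputStream in, int eachSize) throws IOException {
        byte[] buffer = new byte[eachSize];
        int filled = 0;
        while (filled < eachSize) {
            // A single read() may return fewer bytes than requested, so keep looping.
            int n = in.read(buffer, filled, eachSize - filled);
            if (n == -1) {
                break; // end of stream: this is the (shorter) last fragment
            }
            filled += n;
        }
        return filled == eachSize ? buffer : Arrays.copyOf(buffer, filled);
    }

    /** Reads totalSlices fragments in order from a single stream. */
    static List<byte[]> readSlices(InputStream in, long totalSlices, int eachSize) {
        List<byte[]> slices = new ArrayList<>();
        try {
            for (long i = 0; i < totalSlices; i++) {
                slices.add(readSlice(in, eachSize));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return slices;
    }
}
```

Reading sequentially from one stream means the byte offset is tracked implicitly by the stream position. A parallel sender would need an explicit offset per fragment instead, for example the fragment index multiplied by the fragment size.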

3. Summary

Fragment upload splits the file to be uploaded into multiple data blocks (called parts) of a given size and uploads them separately.

Handling large files with fragmentation comes down to deciding three things:

  • File fragment granularity

  • How fragments are read

  • How shards are stored

This article covered two parts of uploading large files: how to detect content changes by comparison, and how to fragment the file, including how to set the fragment threshold sensibly and how to read and mark each fragment. I hope it helps or inspires readers doing related work. How fragments are stored, marked, and merged will be explained in more detail in a later article.

Original article: Microservice architecture | How to solve fragment upload of large attachments?

[Code Architecture] focuses on sharing techniques for system architecture, high availability, high performance, and high concurrency.