What I learned from a multithreaded resumable download


This article has been included in Github.com/niumoo/Java… and on the Program Ape Alang blog, along with many other knowledge points and article series.

Thanks for stopping by. With nothing to do over the weekend, I remembered a question my colleague John once asked me: “Have you ever played with resumable downloads?” Thinking it over, resumable downloads are everywhere, yet I had never really thought about the details. This article is the result of that reflection. Thanks to John for giving me an easy topic to write about. Below, I implement a simple multithreaded resumable download in plain Java, with no third-party dependencies.

What does this article cover? Let's start by listing a few questions.

How does resuming a download work?
How do you ensure the file stays consistent when an interrupted transfer is resumed?
How can multiple threads download the same file?
If network bandwidth is fixed, why can a multithreaded download be faster?

How much do you know about multithreaded resumable downloads? Take a moment to think about the questions above; the sections below answer all four, one by one. Most services now run online and download scenarios are becoming rarer, but that doesn't stop us from exploring the principles.

How resumable downloads work

To understand how resumable downloads are implemented, you need to understand the HTTP protocol. HTTP is one of the most widely used network protocols on the Internet, and it transfers data on top of TCP/IP. The secret of resuming a download is hidden in HTTP itself.

An HTTP exchange has request headers and response headers, and both include Range-related fields: Range in the request, Accept-Ranges and Content-Range in the response. The test below uses the download link of the Baidu Netdisk PC client.

Use cURL to inspect the response headers. If you want to know more about cURL, see my earlier post on cURL.

$ curl -i http://wppkg.baidupcs.com/issue/netdisk/yunguanjia/BaiduYunGuanjia_7.0.1.1.exe
HTTP/1.1 200 OK
Server: jSP3/2.0.14
Date: Sat, 25 Jul 2020 13:41:55 GMT
Content-Type: application/x-msdownload
Content-Length: 65804256
Connection: keep-alive
ETag: dcd0bfef7d90dbb3de50a26b875143fc
Last-Modified: Tue, 07 Jul 2020 13:19:46 GMT
Expires: Sat, 25 Jul 2020 14:05:19 GMT
Age: 257796
Accept-Ranges: bytes
Cache-Control: max-age=259200
Content-Disposition: attachment; filename="BaiduYunGuanjia_7.0.0.1.exe"
x-bs-client-ip: MTgwLjc2LjIyLjU0
x-bs-file-size: 65804256
x-bs-request-id: MTAuMTM0LjM0LjU2Ojg2NDM6NDM4MTUzMTE4NTU3ODc5MTIxNzoyMDIwLTA3LTA3IDIyOjAxOjE1
x-bs-meta-crc32: 3545941535
Content-MD5: dcd0bfef7d90dbb3de50a26b875143fc
superfile: 2
Ohc-Response-Time: 1 0 0 0 0 0
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, PUT, POST, DELETE, OPTIONS, HEAD
Ohc-Cache-HIT: bj2pbs54 [2], bjbgpcache54 [4]

As you can see, the Baidu PC client download returns a lot of response headers; we only need to focus on a few of them.

Content-Length: 65804256 // size of the requested file, in bytes
Accept-Ranges: bytes // range requests are supported, in units of bytes; "none" would mean they are not supported
Last-Modified: Tue, 07 Jul 2020 13:19:46 GMT // last modification time of the file on the server
x-bs-meta-crc32: 3545941535 // CRC32 checksum of the file
ETag: dcd0bfef7d90dbb3de50a26b875143fc // entity tag, can be used to check whether the file has changed

Not every download can be resumed. A server advertises support for range requests with the Accept-Ranges: bytes response header, and only then can a client rely on resuming. Given that, how do you resume? The client sends a Range header in its request naming the byte interval it wants; the server replies with 206 Partial Content and a Content-Range response header describing which part of the file the body contains.

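The Range request header uses one of the following formats:

Range: <unit>=<range-start>-<range-end> // a closed interval, both ends inclusive
Range: <unit>=<range-start>- // from range-start to the end of the file
Range: <unit>=-<suffix-length> // the last suffix-length units of the file

The Content-Range response header uses one of the following formats:

Content-Range: <unit> <range-start>-<range-end>/<size> // size is the total length of the file
Content-Range: <unit> <range-start>-<range-end>/* // total length unknown
Content-Range: <unit> */<size> // sent with 416 Range Not Satisfiable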

For example:

From byte offset 10 (offsets count from 0) to the end of the file: Range: bytes=10-

From byte offset 10 through byte offset 100, inclusive (91 bytes): Range: bytes=10-100
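To make the two formats concrete, here is a minimal sketch that builds a Range request value and parses a Content-Range response value such as bytes 10-100/65804256. The class RangeHeaders and its methods are hypothetical helpers, not part of the original downloader, and the parser only handles the bytes unit.

```java
// Hypothetical helpers for illustration; not part of the original downloader.
final class RangeHeaders {

    /** Builds a Range request value; end < 0 means "to the end of the file". */
    static String rangeValue(long start, long end) {
        return end < 0 ? "bytes=" + start + "-" : "bytes=" + start + "-" + end;
    }

    /**
     * Parses "bytes <start>-<end>/<size>" into {start, end, size}.
     * A total size of "*" (unknown) is returned as -1.
     */
    static long[] parseContentRange(String value) {
        String v = value.trim();
        if (!v.startsWith("bytes ")) {
            throw new IllegalArgumentException("unsupported unit: " + value);
        }
        String[] rangeAndSize = v.substring("bytes ".length()).split("/");
        if (rangeAndSize.length != 2 || rangeAndSize[0].equals("*")) {
            // "bytes */<size>" accompanies 416 and carries no satisfied range
            throw new IllegalArgumentException("no satisfied range: " + value);
        }
        String[] bounds = rangeAndSize[0].split("-");
        if (bounds.length != 2) {
            throw new IllegalArgumentException("malformed range: " + value);
        }
        long start = Long.parseLong(bounds[0].trim());
        long end = Long.parseLong(bounds[1].trim()); // inclusive end
        String sizeText = rangeAndSize[1].trim();
        long size = sizeText.equals("*") ? -1 : Long.parseLong(sizeText);
        return new long[] {start, end, size};
    }

    public static void main(String[] args) {
        System.out.println(rangeValue(10, -1));  // bytes=10-
        System.out.println(rangeValue(10, 100)); // bytes=10-100
        long[] r = parseContentRange("bytes 10-100/65804256");
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " length=" + (r[1] - r[0] + 1));
    }
}
```

Because both ends of a byte range are inclusive, the length of bytes 10-100 is end - start + 1 = 91, which is easy to get off by one when splitting a file into parts.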

That is how resuming a download works, and as you can already see, the start and end of a range also make segmented downloads possible.

How do you ensure file consistency?

File consistency has two aspects: the download phase and the write phase.

Since we are writing a downloader that supports resuming, how can we be sure the file on the server has not changed since our last, partial download? We can tell from several values in the response headers.

Last-Modified: Tue, 07 Jul 2020 13:19:46 GMT // time the file on the server was last modified
ETag: dcd0bfef7d90dbb3de50a26b875143fc // entity tag, can be used to check whether the file has changed
x-bs-meta-crc32: 3545941535 // CRC32, can be used to check whether the file has changed

Both Last-Modified and ETag can be used to check whether a file has been updated. Under HTTP, a new ETag is generated when the file changes, so it works much like a fingerprint of the file. Last-Modified is only the last modification time, which sometimes cannot prove whether the file's contents have actually changed.
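A minimal sketch of how a downloader might apply these checks before resuming, assuming it saved the ETag and Last-Modified values from its first response. It prefers ETag and falls back to Last-Modified. It also adds the standard If-Range request header: when the validator still matches, the server sends only the requested range (206), otherwise it sends the whole file (200), so a changed file is not silently mixed with stale bytes. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helpers for illustration; names are not from the original project.
final class ResumeCheck {

    /** True if the remote file still appears to be the one we partially downloaded. */
    static boolean sameRemoteFile(String savedETag, String currentETag,
                                  String savedLastModified, String currentLastModified) {
        if (savedETag != null && currentETag != null) {
            return savedETag.equals(currentETag); // ETag is the stronger signal
        }
        if (savedLastModified != null && currentLastModified != null) {
            return savedLastModified.equals(currentLastModified); // weaker fallback
        }
        return false; // no validator available: safest to restart from scratch
    }

    /**
     * Request headers for resuming from byte offset 'downloaded'.
     * If-Range must carry a strong validator, so weak ETags (W/...) are not sent.
     */
    static Map<String, String> resumeHeaders(long downloaded, String savedETag) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Range", "bytes=" + downloaded + "-");
        if (savedETag != null && !savedETag.startsWith("W/")) {
            headers.put("If-Range", savedETag);
        }
        return headers;
    }

    public static void main(String[] args) {
        String etag = "dcd0bfef7d90dbb3de50a26b875143fc";
        System.out.println(sameRemoteFile(etag, etag, null, null)); // true
        System.out.println(resumeHeaders(1024, etag));
    }
}
```

Each entry of the returned map would be passed to HttpURLConnection.setRequestProperty before connecting.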

That covers consistency in the download phase, but what about the write phase? Whether single-threaded or multithreaded, resuming means data must be written at a specific position in the file rather than simply appended. Is there a good way to do this in Java?

Yes: the RandomAccessFile class. Unlike the other stream classes, it lets you choose a read/write mode when opening the file and use seek to move the file pointer to any position, which makes it ideal for writing resumed downloads.

For example, write abc at position 0 of test.txt and ddd at position 100:

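import java.io.RandomAccessFile;

try (RandomAccessFile rw = new RandomAccessFile("test.txt", "rw")) { // "rw" is read/write mode
    rw.seek(0);           // move the file pointer to offset 0
    rw.writeChars("abc"); // writeChars writes each char as 2 bytes (UTF-16)
    rw.seek(100);         // jump to offset 100; the file grows once we write past its end
    rw.writeChars("ddd");
}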

Resumed writing depends on this: to continue, you only need to move the file pointer to the position where writing left off.

The seek method has other uses too. You can jump straight to a known position for fast lookups, and different positions in the same file can be read and written concurrently. Note that one RandomAccessFile instance has a single file pointer, so concurrent writers should each open their own instance.
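A minimal sketch of concurrent writes at different positions, assuming each thread opens its own RandomAccessFile so the threads never share a file pointer. The class and method names are hypothetical, and each chunk is assumed to be at most chunkSize bytes so the slots do not overlap.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ParallelSeekWrite {

    /** Thread i writes chunks.get(i) at offset i * chunkSize using its own file handle. */
    static void writeChunksInParallel(Path file, List<String> chunks, int chunkSize)
            throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) {
            final long offset = (long) i * chunkSize;
            final byte[] data = chunks.get(i).getBytes(StandardCharsets.US_ASCII);
            Thread t = new Thread(() -> {
                // A separate instance per thread means a separate file pointer per thread
                try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
                    raf.seek(offset); // jump to this chunk's slot
                    raf.write(data);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // wait for every writer before the file is read
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("seek-demo", ".bin");
        writeChunksInParallel(file, Arrays.asList("AAAA", "BBBB", "CCCC"), 4);
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.US_ASCII)); // AAAABBBBCCCC
        Files.delete(file);
    }
}
```

Whatever order the threads finish in, every chunk lands at its own offset, which is exactly the property a multithreaded downloader needs.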

How do you implement a multithreaded download?

In a multithreaded download, each thread downloads one part of the file, and the parts are then assembled into the complete file. Not a single byte may go wrong along the way, or the assembled file will be corrupt and, for an executable, will not run. So how do you download just part of a file? With the Range header described in the previous section; you only need to calculate the byte range each part should download.

For example, if the second part covers bytes 10 through 100, its request carries: Range: bytes=10-100.
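One way to calculate those ranges is sketched below: split the Content-Length into N inclusive intervals and spread any remainder over the first parts. The class and method names are hypothetical, and this is only one reasonable splitting scheme.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeSplitter {

    /** Splits [0, contentLength) into at most 'parts' inclusive ranges {start, end}. */
    static List<long[]> split(long contentLength, int parts) {
        if (contentLength <= 0 || parts <= 0) {
            throw new IllegalArgumentException("contentLength and parts must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        long base = contentLength / parts;
        long remainder = contentLength % parts;
        long start = 0;
        for (int i = 0; i < parts && start < contentLength; i++) {
            long size = base + (i < remainder ? 1 : 0); // first 'remainder' parts get one extra byte
            long end = start + size - 1;                // Range end is inclusive
            ranges.add(new long[] {start, end});
            start = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Content-Length from the Baidu example, split across 4 threads
        for (long[] r : split(65804256L, 4)) {
            System.out.println("Range: bytes=" + r[0] + "-" + r[1]);
        }
    }
}
```

For the 65804256-byte example, this prints four ranges from bytes=0-16451063 up to bytes=49353192-65804255; the last end is always Content-Length - 1.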

If network bandwidth is fixed, why can a multithreaded download be faster?

This is a more interesting question. The maximum network speed is fixed: if your carrier gives you 100 Mbps, no matter how you use it, the maximum is 100 / 8 = 12.5 MB/s. If that is the bottleneck, why can multithreaded downloads be faster? In theory, a single-threaded download can reach the maximum speed. In practice, the network is often congested and a single connection rarely reaches the ideal maximum. So multithreading only speeds things up when the network is not running smoothly; otherwise a single thread is enough. Either way, the upper limit is always the network bandwidth.

So why does multithreading help? HTTP transfers data over TCP, so to answer this you need to understand TCP's congestion control. Congestion control is the set of TCP algorithms that avoid overloading the network, and it is based on additive increase/multiplicative decrease (AIMD).

In simple terms, when a TCP connection starts transmitting, the sender probes the available bandwidth. It starts with a small congestion window, and each acknowledged segment grows the window by one segment, so the window roughly doubles every round trip (1, 2, 4, 8, ... segments) until packet loss occurs. This is called slow start. Once the slow start threshold (ssthresh) is reached, TCP switches to congestion avoidance, a linear growth phase that adds only about one segment per round trip. In my view, the exponential growth of slow start is not slow at all; that is just what it is called.

When packet loss occurs, meaning congestion has been detected, the sender cuts its window multiplicatively, for example by half. On a timeout, the slow start threshold is set to half of the congestion window at the time of the timeout, the congestion window drops to 1 MSS, and the connection returns to slow start. This is where multiple threads help: each thread has its own TCP connection, so a loss only knocks back the connection that suffered it. Meanwhile the other connections may still be in slow start or already near their peak, so the combined speed drops less sharply and overall throughput tends to be better than with a single thread.
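To make the two phases concrete, here is a toy model of one connection's congestion window, counted in segments per round trip. It is a simplification under stated assumptions: loss is detected only by timeout, there is no fast retransmit or fast recovery, and slow start is capped exactly at ssthresh. The class and method names are hypothetical; real TCP stacks behave in more nuanced ways.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Toy per-RTT model of slow start, congestion avoidance and timeout (not real TCP). */
final class CwndToy {

    static List<Integer> simulate(int rounds, int initialSsthresh, Set<Integer> lossRounds) {
        List<Integer> history = new ArrayList<>();
        int cwnd = 1;
        int ssthresh = initialSsthresh;
        for (int rtt = 0; rtt < rounds; rtt++) {
            history.add(cwnd); // window used during this round trip
            if (lossRounds.contains(rtt)) {
                ssthresh = Math.max(cwnd / 2, 2); // multiplicative decrease of the threshold
                cwnd = 1;                         // timeout: back to slow start
            } else if (cwnd < ssthresh) {
                cwnd = Math.min(cwnd * 2, ssthresh); // slow start: doubles each round trip
            } else {
                cwnd += 1; // congestion avoidance: additive increase
            }
        }
        return history;
    }

    public static void main(String[] args) {
        // ssthresh starts at 16 segments; a timeout happens in round 8
        System.out.println(simulate(12, 16, Collections.singleton(8)));
        // prints [1, 2, 4, 8, 16, 17, 18, 19, 20, 1, 2, 4]
    }
}
```

The sharp drop from 20 to 1 is what a single-connection download suffers on every timeout; with several connections, only one of them pays that price at a time.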

Multithreaded resumable download: code implementation

Based on the principles above, you should now have a concrete idea of the implementation. Use multiple threads, each requesting its part of the file with the Range header and saving it to a temporary file, then use RandomAccessFile to combine the downloaded parts into a single file. To resume, read the current size of each temporary file and adjust its Range to continue from there.
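The resume step can be sketched as follows, assuming each part has its own temporary file holding the bytes already received for that part. The helper name and the one-temp-file-per-part layout are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: the one-temp-file-per-part layout is an assumption.
final class PartResume {

    /**
     * Given a part's inclusive byte range and its temp file, returns the Range value
     * still to request, or null if the part is already complete.
     */
    static String remainingRange(long partStart, long partEnd, Path tempFile) throws IOException {
        long alreadyDownloaded = Files.exists(tempFile) ? Files.size(tempFile) : 0;
        long partLength = partEnd - partStart + 1; // inclusive range
        if (alreadyDownloaded >= partLength) {
            return null; // nothing left to fetch for this part
        }
        return "bytes=" + (partStart + alreadyDownloaded) + "-" + partEnd;
    }

    public static void main(String[] args) throws IOException {
        Path part = Files.createTempFile("part-1", ".tmp");
        Files.write(part, new byte[100]); // pretend 100 bytes arrived before the interruption
        System.out.println(remainingRange(1000, 1999, part)); // bytes=1100-1999
        Files.delete(part);
    }
}
```

Using the temp file's size as the resume point assumes every byte in it was written completely; a more careful downloader might also verify the tail or re-fetch a small overlap.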

There is not much code. Here are the core parts; the complete code is in the GitHub repository linked at the end of the article.

Use the Range request header to request a specific interval of the file.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

URL httpUrl = new URL(url);
HttpURLConnection httpConnection = (HttpURLConnection) httpUrl.openConnection();
httpConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36");
// Request bytes start..end (inclusive); a server that honors it replies 206 Partial Content
httpConnection.setRequestProperty("Range", "bytes=" + start + "-" + end);
InputStream inputStream = httpConnection.getInputStream();
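Before reading the body, it is worth checking whether the server actually honored the Range header. The sketch below maps the status codes that matter here to an action; the enum and its names are hypothetical. With HttpURLConnection, call getResponseCode() before getInputStream(), because getInputStream() throws an IOException for error statuses such as 416.

```java
// Hypothetical helper; the enum and its names are illustrative.
final class RangeResponse {

    enum Action { APPEND, RESTART, NOTHING_TO_FETCH, FAIL }

    static Action decide(int statusCode) {
        switch (statusCode) {
            case 206: return Action.APPEND;           // Partial Content: Range was honored
            case 200: return Action.RESTART;          // full body: Range ignored or If-Range failed
            case 416: return Action.NOTHING_TO_FETCH; // Range Not Satisfiable, e.g. start >= file size
            default:  return Action.FAIL;
        }
    }

    public static void main(String[] args) {
        System.out.println(decide(206)); // APPEND
        System.out.println(decide(200)); // RESTART
    }
}
```

A server that ignores Range replies 200 with the whole file; appending that body at the resume offset would corrupt the download, which is why the sketch restarts in that case. A 416 usually means the local copy is already complete, or the saved offset is wrong, so it deserves a size check rather than a blind retry.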

Get the file's ETag.

// getHeaderField returns null if the server sent no ETag header
String eTag = httpConnection.getHeaderField("ETag");
if (eTag != null) {
    System.out.println(eTag);
}

Use RandomAccessFile to continue writing to the file.

import java.io.RandomAccessFile;

try (RandomAccessFile oSavedFile = new RandomAccessFile(httpFileName, "rw")) {
    oSavedFile.seek(localFileContentLength); // move the write pointer to the end of what was already downloaded
    byte[] buffer = new byte[1024 * 10];
    int len;
    while ((len = inputStream.read(buffer)) != -1) {
        oSavedFile.write(buffer, 0, len);
    }
}
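The loop above continues a single file. To combine the per-thread temporary files into the final file, a minimal sketch could copy each part to its starting offset with seek. It assumes the parts and their offsets (the start of each part's Range) are supplied in matching order; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not taken from the original repository.
final class PartMerger {

    /** Writes parts.get(i) into target starting at offsets.get(i). */
    static void merge(Path target, List<Path> parts, List<Long> offsets) throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(target.toFile(), "rw")) {
            out.setLength(0); // start from an empty target so no stale bytes remain
            byte[] buffer = new byte[1024 * 10];
            for (int i = 0; i < parts.size(); i++) {
                out.seek(offsets.get(i)); // each part begins at its range start
                try (InputStream in = Files.newInputStream(parts.get(i))) {
                    int len;
                    while ((len = in.read(buffer)) != -1) {
                        out.write(buffer, 0, len);
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path p1 = Files.createTempFile("p1", ".part");
        Path p2 = Files.createTempFile("p2", ".part");
        Files.write(p1, "Hello, ".getBytes(StandardCharsets.US_ASCII));
        Files.write(p2, "world".getBytes(StandardCharsets.US_ASCII));
        Path target = Files.createTempFile("merged", ".bin");
        merge(target, Arrays.asList(p1, p2), Arrays.asList(0L, 7L));
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.US_ASCII)); // Hello, world
    }
}
```

Because each part is placed by offset rather than by simple concatenation, the order in which parts finish downloading does not matter.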

Testing resumption: stop the program after it has downloaded part of the file, then start it again, and the download continues from where it stopped.

The full code has been uploaded to github.com/niumoo/down… .


Hello world :) I'm Aaron, a tech tool guy on the front line. I keep these articles updated; you can follow the WeChat official account “Program Ape Alang” or the “Unread Code” blog and grow together.
