1. A brief introduction

Resumable transfer (also called breakpoint continuation) means that an upload or download task (a file or an archive) is deliberately split into several parts, each handled by its own thread. If the network fails, the transfer can continue from the parts already completed and handle only the unfinished ones, instead of starting again from the very beginning. This saves time and speeds up the transfer.
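The splitting described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names split_into_parts and remaining are made up for this example, and byte offsets are inclusive and start at 0.

```python
def split_into_parts(total_size, part_count):
    """Return inclusive (start, end) byte offsets that cover 0..total_size-1."""
    if total_size <= 0 or part_count <= 0:
        return []
    part_count = min(part_count, total_size)  # never create empty parts
    base, extra = divmod(total_size, part_count)
    parts = []
    start = 0
    for i in range(part_count):
        # The first `extra` parts get one additional byte each.
        length = base + (1 if i < extra else 0)
        parts.append((start, start + length - 1))
        start += length
    return parts


def remaining(part, bytes_done):
    """Return the still-missing (start, end) of a part, or None if it is complete."""
    start, end = part
    resume_at = start + bytes_done
    return (resume_at, end) if resume_at <= end else None


parts = split_into_parts(22400, 4)
print(parts)                        # four parts of 5600 bytes each
print(remaining(parts[1], 1000))    # what is left of the second part
```

After an interruption, only the ranges returned by remaining need to be requested again.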

2. Uses of resumable transfer

Uploading or downloading a file can sometimes take several hours. If the connection drops, an HTTP/FTP server or download client without resume support has to retransmit from the beginning. A good HTTP/FTP server or download client supports resuming, so the user can continue the transfer from the point where it was interrupted. This greatly reduces the user’s frustration.

Common upload/download software that supports resumable transfer includes QQ Xuanfeng, Xunlei (Thunder), FlashGet, eDonkey, Ku6, Tudou, Youku, Baidu Video, Sina Video, Tencent Video, and Baidu Cloud.

On Linux/Unix, the FTP client lftp is commonly used and supports resuming FTP transfers.

3. Range & Content-Range

HTTP/1.1 (RFC 2616) introduced support for retrieving only part of a file, which makes parallel downloads and resumable transfer possible. It does this with two headers: Range, sent by the client in the request, and Content-Range, returned by the server in the response.

Range

Used in the request header to specify the positions of the first and last bytes to return. General format:

Range: (unit)=(first byte pos)-[last byte pos]

Examples of the Range header:

Range: bytes=0-499 (bytes 0 through 499)
Range: bytes=500-999 (bytes 500 through 999)
Range: bytes=-500 (the last 500 bytes)
Range: bytes=500- (from byte 500 to the end of the file)
Range: bytes=0-0,-1 (the first byte and the last byte)
Range: bytes=500-600,601-999 (several ranges in one request)
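To make the forms above concrete, here is a minimal Python sketch of how a server might turn a single-range header value into actual byte offsets for a file of known size. The function name resolve_range is hypothetical, and multi-range values (with commas) are deliberately not handled.

```python
import re

# Matches "bytes=first-last", where either side may be empty.
_RANGE_RE = re.compile(r"bytes=(\d*)-(\d*)", re.ASCII)


def resolve_range(header_value, size):
    """Map a single-range value to inclusive (start, end) offsets.

    Returns None when the value is malformed, uses several ranges,
    or cannot be satisfied for a file of `size` bytes.
    """
    match = _RANGE_RE.fullmatch(header_value.strip())
    if match is None:
        return None
    first, last = match.group(1), match.group(2)
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the file.
        count = int(last)
        if count == 0 or size == 0:
            return None
        return (max(size - count, 0), size - 1)
    start = int(first)
    end = int(last) if last else size - 1  # "bytes=N-": up to the end of the file
    if start >= size or start > end:
        return None
    return (start, min(end, size - 1))


for value in ("bytes=0-499", "bytes=-500", "bytes=500-"):
    print(value, resolve_range(value, 22400))
```

Returning None here stands in for whatever the server does with an unusable range, such as ignoring it or rejecting the request.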

Content-Range

Used in the response header. After a request made with Range, the server returns the range it is sending and the total file size in the Content-Range header. General format:

Content-Range: (unit) (first byte pos)-(last byte pos)/(entity length)

For example:

Content-Range: bytes 0-499/22400

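Here 0-499 is the range of data sent in this response, and 22400 is the total file size. The status line of the response also differs: a request served in full returns HTTP/1.1 200 OK, while a request served as a partial range returns HTTP/1.1 206 Partial Content.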
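On the client side, the Content-Range value has to be taken apart again. The following Python sketch shows one way to do that. The name parse_content_range is an assumption for this example, and only the byte-range form shown above is handled, plus "*" for an unknown total length.

```python
import re

_CONTENT_RANGE_RE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)", re.ASCII)


def parse_content_range(value):
    """Return (first, last, total); total is None when the server sends '*'."""
    match = _CONTENT_RANGE_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError("unsupported Content-Range: %r" % value)
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    if first > last or (total is not None and last >= total):
        raise ValueError("inconsistent Content-Range: %r" % value)
    return first, last, total


first, last, total = parse_content_range("bytes 0-499/22400")
print(last - first + 1, "bytes received out of", total)
```

A download client can compare the parsed offsets with the range it asked for before appending the data to the local file.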

4. Enhanced verification

In practice, the file behind a URL may have changed on the server by the time the client sends a resume request. The resumed data would then not match the part already downloaded. How can this be solved? Clearly, a way to identify the exact version of the file is needed.

RFC 2616 defines mechanisms for this. One is Last-Modified, which records when a file was last modified, so that a change can be detected when the transfer resumes. RFC 2616 also defines the ETag header, which carries a unique identifier for the file.

Last-Modified

If-Modified-Since and Last-Modified are both HTTP headers that carry the time a page was last modified. The difference is direction: Last-Modified is sent from the server to the client, while If-Modified-Since is sent from the client to the server. The client sends back the Last-Modified timestamp it received from the server in the If-Modified-Since header, so the server can check whether the client's copy is still current. If it is not, the server returns the new content. If it is, the server returns 304 to tell the client that its locally cached page is up to date. The client then loads the page from the local cache, which greatly reduces the data sent over the network and the load on the server.
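The server-side check can be sketched in Python as follows. This is a simplified illustration: the function name status_for_conditional_get is hypothetical, both values are assumed to be HTTP dates in GMT, and an unparseable date is treated as if the header were absent.

```python
from email.utils import parsedate_to_datetime


def status_for_conditional_get(last_modified, if_modified_since):
    """Return 304 when the client's cached copy is still current, otherwise 200."""
    if if_modified_since is None:
        return 200  # unconditional request: send the content
    try:
        client_time = parsedate_to_datetime(if_modified_since)
        server_time = parsedate_to_datetime(last_modified)
        unchanged = server_time <= client_time
    except (TypeError, ValueError):
        return 200  # date could not be parsed or compared: ignore the condition
    return 304 if unchanged else 200


print(status_for_conditional_get(
    "Fri, 22 Feb 2013 03:45:02 GMT",
    "Fri, 22 Feb 2013 03:45:02 GMT",
))
```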

ETag

Entity tags (ETags) were designed to solve problems that Last-Modified cannot:

  1. Some files are rewritten periodically without their content changing (only the modification time changes). In that case we do not want the client to treat the file as modified and GET it again.
  2. Some files are modified very frequently, for example several times within one second. If-Modified-Since can only check at one-second granularity, so such changes cannot be detected (the UNIX mtime record is likewise only accurate to the second).
  3. Some servers cannot determine exactly when a file was last modified.

To this end, HTTP/1.1 introduced the ETag. An ETag is simply a tag associated with a file. It can be a version tag such as v1.0.0, or a value such as 627-4d648041f6b80. The HTTP/1.1 standard does not specify what an ETag contains or how it is generated, only that the value must be enclosed in double quotes ("").
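Since the standard leaves the content of the tag open, the following Python sketch shows just one possible scheme, chosen here as an assumption for illustration: the file size and the modification time in microseconds, both written in hexadecimal. The function name make_etag is made up for this example. The sample value used in this article happens to fit this scheme, since 1575 bytes is 627 in hexadecimal.

```python
def make_etag(size_bytes, mtime_microseconds):
    """Build a quoted tag of the form "<size in hex>-<mtime in hex>"."""
    return '"{:x}-{:x}"'.format(size_bytes, mtime_microseconds)


# 1575 bytes, last modified Fri, 22 Feb 2013 03:45:02 GMT
# (1361504702 seconds since the epoch, expressed in microseconds).
print(make_etag(1575, 1361504702 * 1000000))  # prints "627-4d648041f6b80"
```

A tag built this way changes whenever the size or the modification time changes, so it does not solve the first problem in the list above. A hash of the content would, at a higher cost.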

If-Range

Used to determine whether the entity has changed. If it has not changed, the server sends only the part the client is missing. Otherwise it sends the whole entity. General format:

If-Range: Etag | HTTP-Date

That is, If-Range can carry either the ETag value or the Last-Modified value returned by the server. When the server provides no ETag but does provide Last-Modified, the Last-Modified value can be used as the value of the If-Range field.

For example:

If-Range: "627-4d648041f6b80"
If-Range: Fri, 22 Feb 2013 03:45:02 GMT

If-Range must be used together with Range. If the request contains no Range header, If-Range is ignored. If the server does not support If-Range, Range is ignored as well.

If the ETag in the request matches the ETag of the target content on the server, the response status code is 206. If the target content on the server has changed, the response status code is 200.

Other HTTP headers used for verification: If-Match/If-None-Match and If-Modified-Since/If-Unmodified-Since.

How it works

The ETag is generated by the server. The client sends it back in If-Range so that the server can verify whether the resource has been modified. The process for requesting a file is as follows:

First request:

  1. The client sends an HTTP GET request for a file.
  2. The server processes the request and returns the file content along with the corresponding headers, including the ETag (for example "627-4d648041f6b80"), assuming the server supports ETag generation and has it enabled. The status code is 200.

Second request (resuming the transfer):

  1. The client sends an HTTP GET request for the file together with If-Range, whose value is the ETag the server returned for the first request ("627-4d648041f6b80").
  2. The server checks whether the received ETag matches the ETag it computes for the current file. If they match, the response status code is 206. Otherwise the status code is 200.
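The decision the server makes on the second request can be sketched in Python as follows. This is a simplified illustration, not a complete implementation: the function name resume_status is hypothetical, the Range value is assumed to be satisfiable, and validators are compared as exact strings.

```python
def resume_status(range_header, if_range, current_etag, current_last_modified):
    """Return 206 when a partial response is appropriate, otherwise 200."""
    if not range_header:
        return 200  # no Range header: If-Range is ignored, send the whole entity
    if if_range is None:
        return 206  # plain range request, nothing to validate
    validator = if_range.strip()
    if validator.startswith('"'):
        # A quoted value is an ETag.
        matches = validator == current_etag
    else:
        # Anything else is treated as an HTTP date and must match exactly.
        matches = validator == current_last_modified
    return 206 if matches else 200


etag = '"627-4d648041f6b80"'
modified = "Fri, 22 Feb 2013 03:45:02 GMT"
print(resume_status("bytes=500-", etag, etag, modified))          # file unchanged
print(resume_status("bytes=500-", etag, '"700-abc"', modified))   # file changed
```

When the result is 200, the client has to discard the part it already holds and accept the full file.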

5. Check whether the server supports resumable transfer

Using curl:

HTTPS (--insecure skips certificate verification):
curl --insecure -i --range 0-9 https://www.baidu.com/img/bdlogo.gif
HTTP:
curl -i --range 0-9 http://www.baidu.com/img/bdlogo.gif

Sample response:

HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Cache-Control: max-age=315360000
Content-Length: 10
Content-Range: bytes 0-9/1575
Content-Type: image/gif
Date: Tue, 13 Jul 2021 09:30:53 GMT
Etag: "627-4d648041f6b80"
Expires: Fri, 11 Jul 2031 09:30:53 GMT
Last-Modified: Fri, 22 Feb 2013 03:45:02 GMT
P3p: CP=" OTI DSP COR IVA OUR IND COM "
Server: Apache
Set-Cookie: BAIDUID=27C43B595BFEF5EFD51E1C63053ECB99:FG=1; expires=Wed, 13-Jul-22 09:30:53 GMT; max-age=31536000; path=/; domain=.baidu.com; version=1

If the response contains Content-Range, the server supports resumable transfer. Some servers also return Accept-Ranges. A value of Accept-Ranges: bytes indicates that the server supports requests for byte ranges.
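The same check can be written as a small Python helper that inspects the status code and headers of a response to a ranged request. The name supports_resume is an assumption for this sketch, and the headers are passed in as a plain dictionary so that the example does not need network access.

```python
def supports_resume(status_code, headers):
    """Guess from one ranged response whether the server honors byte ranges."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if status_code == 206 and "content-range" in lowered:
        return True
    return lowered.get("accept-ranges", "").strip().lower() == "bytes"


sample = {
    "Accept-Ranges": "bytes",
    "Content-Length": "10",
    "Content-Range": "bytes 0-9/1575",
}
print(supports_resume(206, sample))
```

A server that ignores the Range header typically answers with 200 and the full content, in which case this helper returns False unless Accept-Ranges: bytes is present.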
