
This section describes HTTP packets, including request packets and response packets, and concepts related to packet headers, request methods, cookies, and a complete HTTP request process.

An earlier article covered concepts related to the HTTP protocol. This one introduces HTTP packets: request packets, response packets, packet headers, request methods, and cookies.

The information exchanged over the HTTP protocol is called HTTP packets (messages). Packets sent by the requester (client) are called request packets, and packets sent by the responder (server) are called response packets. An HTTP packet is itself text made up of multiple lines, with CR+LF as the line break.

An HTTP packet is divided into a header and a body, separated by the first blank line (CR+LF). The body is optional and is not always present.

1 Request Packet

HTTP specifies that requests are sent by the client and that the server answers each request with a response. In other words, communication always starts with the client; the server does not send a response until it has received a request.

A request packet has the following format:

    Request first line
    Request header
    Request blank line
    Request body

The request first line and the request headers together make up the HTTP request header. Everything the browser sends to the server follows this format; the server cannot read a request that does not.

A GET request packet is as follows:

    GET /hello/index.jsp HTTP/1.1
    Host: localhost
    User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: zh-cn,zh;q=0.5
    Accept-Encoding: gzip, deflate
    Accept-Charset: GB2312,utf-8;q=0.7,*;q=0.7
    Connection: keep-alive
    Cookie: JSESSIONID=369766FDF6220F7803433C0B2DE36D98

A POST request packet is as follows:

    POST /hello/index.jsp HTTP/1.1
    Accept: image/gif, image/jpeg, image/pjpeg, image/pjpeg, application/msword, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/x-ms-application, application/x-ms-xbap, application/vnd.ms-xpsdocument, application/xaml+xml, */*
    Referer: http://localhost:8080/hello/index.jsp
    Accept-Language: zh-cn,en-US;q=0.5
    User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; InfoPath.2; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
    Content-Type: application/x-www-form-urlencoded
    Accept-Encoding: gzip, deflate
    Host: localhost:8080
    Content-Length: 13
    Connection: Keep-Alive
    Cache-Control: no-cache
    Cookie: JSESSIONID=E365D980343B9307023A1D271CC48E7D

    keyword=hello

The parts of a request packet are described below:

  1. Request first line: the first line of the packet.
    1. GET /hello/index?num=8 HTTP/1.1. Gives the request method, the address (URL) of the requested resource, and the protocol and version. In this case: a GET request for the server path /hello/index using HTTP/1.1. The request parameter num=8 is appended to the URL of the GET request.
  2. Request headers: key-value pairs written as "Name: value", with a colon and a space between the name and the value. A single value can itself carry several items.
    1. Host: localhost. The address of the server (host) that receives the request, given as IP:port or as a domain name. Here it is localhost, the local machine.
    2. Connection: keep-alive. Connection-related options. Here the connection is kept open for a period of time so that later requests can reuse it; how long depends on the server configuration (3000 ms in this example).
    3. Accept: application/json, text/plain, */*. The document types the client can accept.
    4. User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36. Information about the browser and operating system. Some sites read this header to display the user's system and browser versions.
    5. Referer: www.baidu.com/. Which page is the request from, for example…
    6. Accept-Encoding: gzip, deflate, br. The compression formats the client supports. The server may compress data before sending it over the network.
    7. Accept-Language: zh-CN,zh;q=0.9. The languages the client currently supports, which can be set in the browser's Tools > Options section.
    8. Accept-Charset: GB2312,utf-8;q=0.7,*;q=0.7. The character encodings the client supports.
    9. Cookie: PSTM=1594605101; JSESSIONID=369766FDF622xx. Information stored on the client, usually used to support the server's session mechanism, for example keeping a user's login state so that moving between pages does not require logging in again. Cookies are covered in detail later.
    10. Content-Type: application/json; charset=UTF-8. The media type (data type) of the body in a request or response. It tells the server how to process the request data and tells the client (typically a browser) how to parse the response data, for example displaying an image or parsing and rendering HTML. Here, application/json; charset=UTF-8 means the data is exchanged as JSON, a common format in projects that separate the front end from the back end.
    11. Content-Length: 13. The length of the request body, here 13 bytes.
  3. Request blank line (CR+LF): separates the request headers from the request body.
  4. Request body: a POST request can carry a body, which holds the data being submitted. A GET request normally has no body; its data is appended to the URL instead.
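
The four parts of a request packet can be picked apart with a few lines of Python. This is only a minimal sketch, assuming a well-formed request held entirely in memory; the function name parse_request is made up for illustration, and a real server would also handle repeated headers and malformed input.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into (method, target, version, headers, body)."""
    # The first blank line (CR+LF CR+LF) separates the header part from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request first line: method, requested resource, protocol/version.
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


raw = (b"POST /hello/index.jsp HTTP/1.1\r\n"
       b"Host: localhost:8080\r\n"
       b"Content-Type: application/x-www-form-urlencoded\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b"keyword=hello")
method, target, version, headers, body = parse_request(raw)
```

Note that the Content-Length value (13) matches the byte length of the body keyword=hello.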

1.1 Referer request header

The Referer header is a useful request header. It can be used for traffic statistics and for hotlinking prevention.

Statistics: Our company's website has advertised on Baidu, but we do not know whether the advertising affects our page views. We can analyze the Referer of each request: if a large share of Referers point to Baidu, users are finding our website through Baidu.

Hotlinking prevention: Our website offers a download link, and other websites have copied its address. For example, the index.html page of my website has a link that downloads JDK 7.0, and another site links directly to the same file. Visitors who click the link on that site download JDK 7.0 from my website without ever viewing our pages, so our ads are never seen while our resources are still consumed. The Referer can be used to stop this. Before serving the download, we check the Referer: if the request comes from our own website, the download is allowed; if not, the visitor is first redirected to our website, and the download is allowed from there.
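
The Referer check described above might look roughly like the following sketch. The host names in ALLOWED_HOSTS and the function name download_action are assumptions for illustration only. Keep in mind that the Referer header can be missing or forged, so this discourages casual hotlinking rather than enforcing security.

```python
from urllib.parse import urlsplit

# Hypothetical list of hosts that belong to "our" website.
ALLOWED_HOSTS = {"localhost", "www.example.com"}


def download_action(referer):
    """Return "allow" if the request came from our own pages, else "redirect"."""
    if not referer:
        # No Referer at all: treat it like an external request.
        return "redirect"
    host = urlsplit(referer).hostname
    return "allow" if host in ALLOWED_HOSTS else "redirect"
```

A request with Referer http://localhost:8080/hello/index.jsp is allowed, while one coming from https://www.baidu.com/ is redirected to our own page first.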

1.2 Differences between GET and POST submission

From w3schools (www.w3school.com.cn/tags/html_r…

| | GET | POST |
|---|---|---|
| Back button / refresh | Harmless | The data will be resubmitted (the browser should warn the user that the data is about to be resubmitted). |
| Bookmarks | Can be bookmarked | Cannot be bookmarked |
| Cache | Can be cached | Cannot be cached |
| Encoding type | application/x-www-form-urlencoded | application/x-www-form-urlencoded or multipart/form-data. Use multipart encoding for binary data. |
| History | Parameters remain in the browser history. | Parameters are not saved in the browser history. |
| Limit on data length | Yes. When data is sent, the GET method adds it to the URL, and the length of a URL is limited (maximum URL length is 2048 characters). | No limit. |
| Restrictions on data type | Only ASCII characters are allowed. | No restriction. Binary data is also allowed. |
| Security | GET is less secure than POST because the data being sent is part of the URL. Never use GET when sending passwords or other sensitive information. | POST is more secure than GET because parameters are not saved in the browser history or in web server logs. |
| Visibility | The data is visible to everyone in the URL. | The data does not appear in the URL. |
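
To make the difference concrete, here is a small sketch that puts the same form data into a GET request (in the URL) and into a POST request (in the body). The helper names build_get and build_post and the fixed Host value are assumptions for illustration; only the standard library function urllib.parse.urlencode is used.

```python
from urllib.parse import urlencode


def build_get(path, params):
    """GET: the parameters are appended to the URL and there is no body."""
    return f"GET {path}?{urlencode(params)} HTTP/1.1\r\nHost: localhost\r\n\r\n"


def build_post(path, params):
    """POST: the parameters travel in the request body."""
    body = urlencode(params)
    return (f"POST {path} HTTP/1.1\r\n"
            "Host: localhost\r\n"
            "Content-Type: application/x-www-form-urlencoded\r\n"
            f"Content-Length: {len(body.encode('ascii'))}\r\n"
            "\r\n"
            f"{body}")


get_request = build_get("/hello/index", {"num": 8})
post_request = build_post("/hello/index.jsp", {"keyword": "hello"})
```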

2 Response Packet

The server that receives a request returns the result of processing it as a response. A response packet has the following format:

    Response first line
    Response header
    Response blank line
    Response body

The response first line and the response headers together make up the HTTP response header.

The response is what the server sends to the browser, and the browser renders the page based on it.

  1. Response first line:
    1. HTTP/1.1 200 OK. Contains the response protocol, the status code, and the reason phrase. In this case the protocol is HTTP/1.1, the status code is 200 (the request succeeded), and OK is the reason phrase that explains the status code.
  2. Response headers:
    1. Server: Apache-Coyote/1.1. Name and version information about the server.
    2. Content-Type: text/html; charset=UTF-8. The response body is HTML text encoded in UTF-8. application/json is common in projects that separate the front end from the back end.
    3. Content-Length: 724. The response body is 724 bytes long.
    4. Set-Cookie: JSESSIONID=C97E2B4C55xx; Path=/hello. A cookie sent to the client.
    5. Date: Wed, 25 Sep 2012 04:15:03 GMT. The time of the response in GMT, which may differ from local time, for example by 8 hours in the UTC+8 time zone.
  3. Response blank line: separates the response headers from the response body.
  4. Response body: the data returned to the client (browser). In this case an HTML page is returned, as the header Content-Type: text/html; charset=UTF-8 indicates.

Here is a typical response packet whose body is an HTML page:

    HTTP/1.1 200 OK
    Server: Apache-Coyote/1.1
    Content-Type: text/html;charset=UTF-8
    Content-Length: 724
    Set-Cookie: JSESSIONID=C97E2B4C55553EAB46079A4F263435A4; Path=/hello
    Date: Wed, 25 Sep 2012 04:15:03 GMT

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
      <head>
        <base href="http://localhost:8080/hello/">

        <title>My JSP 'index.jsp' starting page</title>
        <meta http-equiv="pragma" content="no-cache">
        <meta http-equiv="cache-control" content="no-cache">
        <meta http-equiv="expires" content="0">
        <meta http-equiv="keywords" content="keyword1,keyword2,keyword3">
        <meta http-equiv="description" content="This is my page">
        <!-- <link rel="stylesheet" type="text/css" href="styles.css"> -->
      </head>

      <body>
        <form action="" method="post">Key words:<input type="text" name="keyword"/>
          <input type="submit" value="Submit"/>
        </form>
      </body>
    </html>

2.1 Response status code

From the w3c: www.w3cschool.cn/http/g9prxf…

When a visitor opens a web page, the browser sends a request to the server that hosts the page. Before the browser receives and displays the page, the server answers the request with a response header that contains the HTTP status code.

The HTTP status code consists of three decimal digits. The first decimal digit defines the type of the status code. The last two digits have no classification function. There are five types of HTTP status codes:

| Class | Description |
|---|---|
| 1xx | Informational: the server has received the request and the requester should continue the operation |
| 2xx | Success: the operation was successfully received and processed |
| 3xx | Redirection: further action is required to complete the request |
| 4xx | Client error: the request contains a syntax error or cannot be completed |
| 5xx | Server error: an error occurred while the server was processing the request |
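
Because only the first digit determines the class, classification reduces to an integer division. A minimal sketch (the names STATUS_CLASSES and status_class are chosen here for illustration):

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Map a three-digit HTTP status code to its class via the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]
```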

2.1.1 Common Response codes

  1. 200: The request was processed normally
  2. 204: The request was processed successfully, but there is no content to return
  3. 206: The client requested only part of a resource, and the server performed the GET on that part only. The response packet contains the part of the resource specified by Content-Range.
  4. 301: Permanent redirect
  5. 302: Temporary redirect
  6. 303: Similar to 302, except that the client is expected to fetch the redirect target URI with the GET method.
  7. 304: Returned for a conditional request when the condition is not met (the resource has not changed). Despite being a 3xx code, it has nothing to do with redirection.
  8. 307: Temporary redirect, similar to 302, except that the request method must not change (a POST stays a POST)
  9. 400: The request packet has a syntax error and the server cannot understand it
  10. 401: The request requires authentication
  11. 403: Access to the requested resource is denied
  12. 404: The server cannot find the requested resource
  13. 500: Internal server error
  14. 503: The server is temporarily unable to handle the request, for example because it is overloaded or under maintenance

2.1.2 List of response status codes

| Status code | English name | Description |
|---|---|---|
| 100 | Continue | The client should continue with its request |
| 101 | Switching Protocols | The server switches protocols at the client's request. It can only switch to a more advanced protocol, for example a newer version of HTTP |
| 200 | OK | The request succeeded. Generally used for GET and POST requests |
| 201 | Created | The request succeeded and a new resource was created |
| 202 | Accepted | The request has been accepted, but processing is not complete |
| 203 | Non-Authoritative Information | The request succeeded, but the returned meta information comes from a copy rather than from the origin server |
| 204 | No Content | The server processed the request successfully but returned no content. The browser keeps displaying the current document without updating the page |
| 205 | Reset Content | The server processed the request successfully, and the user agent (for example, a browser) should reset the document view. This code can be used to clear the browser's form fields |
| 206 | Partial Content | The server successfully processed a partial GET request |
| 300 | Multiple Choices | The requested resource is available at multiple locations; the response may return a list of resource characteristics and addresses for the user agent (for example, a browser) to choose from |
| 301 | Moved Permanently | The requested resource has been permanently moved to a new URI. The response includes the new URI, and the browser is automatically directed to it. All future requests should use the new URI |
| 302 | Found | Temporary move. Similar to 301, but the resource has moved only temporarily. Clients should continue to use the original URI |
| 303 | See Other | See another address. Similar to 302, but the client should retrieve the new URI with GET |
| 304 | Not Modified | The requested resource has not been modified, so the server returns no resource body. Clients usually cache resources they have accessed and send a header stating that they only want the resource if it was modified after a given date |
| 305 | Use Proxy | The requested resource must be accessed through a proxy |
| 306 | Unused | A deprecated HTTP status code |
| 307 | Temporary Redirect | Temporary redirect. Similar to 302, but the request method must not change when following the redirect |
| 400 | Bad Request | The client's request has a syntax error that the server cannot understand |
| 401 | Unauthorized | The request requires user authentication |
| 402 | Payment Required | Reserved for future use |
| 403 | Forbidden | The server understood the request but refuses to carry it out |
| 404 | Not Found | The server could not find the resource (web page) requested by the client. Website designers can use this code to serve a custom "the resource you requested could not be found" page |
| 405 | Method Not Allowed | The method used in the client's request is not allowed |
| 406 | Not Acceptable | The server cannot produce a response that matches the content characteristics requested by the client |
| 407 | Proxy Authentication Required | Similar to 401, but the requester must authenticate with the proxy |
| 408 | Request Time-out | The server waited too long for the client to send the request and timed out |
| 409 | Conflict | The server encountered a conflict while processing the request; this code may be returned, for example, while handling a PUT request from the client |
| 410 | Gone | The resource requested by the client no longer exists. Unlike 404, 410 is used when the resource existed before and has been permanently deleted; the website designer can use 301 to point to the resource's new location |
| 411 | Length Required | The server cannot process a request sent without Content-Length |
| 412 | Precondition Failed | A precondition in the client's request was not met |
| 413 | Request Entity Too Large | The request was rejected because the request entity is too large for the server to process. The server may close the connection to stop the client from continuing the request. If the server is only temporarily unable to process it, the response includes a Retry-After header |
| 414 | Request-URI Too Large | The request URI (usually a URL) is too long for the server to process |
| 415 | Unsupported Media Type | The server cannot process the media format attached to the request |
| 416 | Requested range not satisfiable | The range requested by the client is invalid |
| 417 | Expectation Failed | The server cannot satisfy the Expect request header |
| 500 | Internal Server Error | The server had an internal error and could not complete the request |
| 501 | Not Implemented | The server does not support the requested function and could not complete the request |
| 502 | Bad Gateway | A server working as a gateway or proxy received an invalid response from the remote server while trying to carry out the request |
| 503 | Service Unavailable | The server is temporarily unable to process client requests because of overload or system maintenance. The length of the delay can be given in the server's Retry-After header |
| 504 | Gateway Time-out | A server acting as a gateway or proxy did not receive a response from the remote server in time |
| 505 | HTTP Version not supported | The server does not support the HTTP version used in the request and could not complete it |

2.2 The 304 response code

When the user requests index.html for the first time, the server adds a Last-Modified response header that gives the time index.html was last modified. The browser caches the content of index.html together with that last-modified time.

When the user requests index.html a second time, the request includes an If-Modified-Since header whose value is the Last-Modified value the server sent in the first response, that is, the time index.html was last modified. With this header the browser is telling the server: the copy of index.html cached here was last modified at this time; check whether the current index.html was last modified at the same time, and if so there is no need to send index.html again, because the cached copy will be displayed. The server compares the If-Modified-Since value with the current last-modified time of index.html. If they are the same, the server responds with 304, meaning index.html is unchanged from the copy the browser cached. Otherwise index.html has been changed, and the server responds with 200 and the new content.

  1. Request header
    1. If-Modified-Since: sends the last-modified time from the previous response for index.html back to the server.
  2. Response headers
    1. Last-Modified: the time the resource was last modified.
    2. 304 status code: if the If-Modified-Since time matches the current last-modified time of index.html, the server responds with 304 and no response body, indicating that the browser's cached copy is the latest version.
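
The server-side decision can be sketched as follows. This is an illustration under simple assumptions, not how any particular server implements it: the function name conditional_status is hypothetical, dates are handled with the standard library's email.utils helpers, and an unparseable header is simply ignored.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the browser's cached copy is still current, else 200."""
    if if_modified_since is None:
        return 200  # first request: no condition, send the full resource
    try:
        cached = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the condition
    if cached.tzinfo is None:
        cached = cached.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= cached:
        return 304  # not modified: respond without a body
    return 200


# Value the server would send as Last-Modified on the first response.
last_modified = datetime(2012, 9, 25, 4, 15, 3, tzinfo=timezone.utc)
header_value = format_datetime(last_modified, usegmt=True)
```

On the second request the browser echoes header_value back in If-Modified-Since; as long as the file is unchanged the function returns 304, and once the file's modification time moves forward it returns 200.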

2.3 Other response headers

Response headers that tell the browser not to cache:

    Expires: 0
    Pragma: no-cache
    Cache-Control: no-cache

The Refresh response header makes the browser reload automatically; here it tells the browser to request www.baidu.cn after 3 seconds:

    Refresh: 3; url=https://www.baidu.cn

3 Packet Header

HTTP request and response packets must contain an HTTP header. The header carries the information the client and the server need to process the request and the response. Most of it never has to be read by the user. Header fields tell the browser and the server things such as the size of the packet body, the language in use, and authentication information.

An HTTP header field consists of a field name and a field value, separated by a colon (:). A single header field can also carry several values, as follows:

Keep-Alive: timeout=15, max=100

What happens if the same HTTP header field appears more than once? The specification does not make this clear, and the result depends on the internal processing logic of each browser. Some browsers give priority to the first occurrence of the field, while others give priority to the last.
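
Splitting a header line into its name and its comma-separated values is straightforward for a field like Keep-Alive. The sketch below uses a hypothetical helper parse_field and assumes every item has the form key=value.

```python
def parse_field(line):
    """Split "Name: k1=v1, k2=v2" into the field name and a dict of items."""
    name, _, value = line.partition(":")
    items = {}
    for item in value.split(","):
        key, _, val = item.strip().partition("=")
        items[key] = val
    return name.strip(), items


name, items = parse_field("Keep-Alive: timeout=15, max=100")
```

This naive comma split does not suit every header: date values such as the expires attribute of Set-Cookie contain a comma themselves.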

3.1 Types of HTTP header fields

HTTP header fields are divided into the following four types according to the actual use.

3.1.1 General Header Fields

Headers that can be used in both request and response packets.

| Header field name | Description |
|---|---|
| Cache-Control | Controls caching behavior. A typical value is "max-age=3400", which means the cached copy is valid for at most 3400 seconds after the resource is returned |
| Connection | Hop-by-hop header management and connection management. If the value names another header field, a proxy server drops that field when forwarding. close and keep-alive indicate a short-lived and a persistent connection, respectively |
| Date | Date and time at which the packet was created |
| Pragma | Message directives. A legacy field from versions before HTTP/1.1, defined only for backward compatibility with HTTP/1.0 |
| Trailer | Lists the header fields that appear at the end of the packet |
| Transfer-Encoding | Specifies the transfer encoding applied to the packet body |
| Upgrade | Upgrade to another protocol |
| Via | Information about proxy servers. When a packet passes through a proxy or gateway, information about that server is appended to the Via field before forwarding, so that the path of the request and response packets can be traced |
| Warning | Error notification |

3.1.2 Request Header Fields

Headers used in request packets sent from the client to the server. They add supplementary information about the request, information about the client, and priorities for the response content.

| Header field name | Description |
|---|---|
| Accept | The media types the user agent can handle |
| Accept-Charset | Preferred character sets |
| Accept-Encoding | Preferred content encodings |
| Authorization | Web authentication information |
| Expect | Expects a specific behavior from the server |
| From | The email address of the user |
| Host | Host name or domain name of the server where the requested resource resides |
| If-Match | Compares entity tags (ETags) |
| If-Modified-Since | Compares the update time of the resource; appears in conditional GET requests |
| If-None-Match | Compares entity tags (the opposite of If-Match) |
| If-Range | Sends a byte-range request for the entity if the resource has not been updated |
| If-Unmodified-Since | Compares the update time of the resource (the opposite of If-Modified-Since) |
| Max-Forwards | Maximum number of hops |
| Proxy-Authorization | Client authentication information required by the proxy server |
| Range | Byte-range request for the entity |
| Referer | The original source of the URI in the request |
| TE | Preferred transfer codings |
| User-Agent | Information about the HTTP client program. For a browser, this is the browser's user agent string |

3.1.3 Response Header Fields

Headers used in response packets returned from the server to the client. They add supplementary information about the response and can also ask the client to supply additional information.

| Header field name | Description |
|---|---|
| Accept-Ranges | Whether byte-range requests are accepted |
| Age | Estimated time elapsed since the resource was created |
| ETag | Information for matching the resource |
| Location | Redirects the client to the specified URI |
| Proxy-Authenticate | Authentication information the proxy server requires of the client |
| Server | Installation information about the HTTP server |
| Vary | Information for managing caches on proxy servers |
| WWW-Authenticate | Authentication information the server requires of the client |

3.1.4 Entity Header Fields

Headers used for the entity part of request and response packets. They add entity-related information such as the update time of the resource content.

| Header field name | Description |
|---|---|
| Allow | The HTTP methods the resource supports |
| Content-Encoding | The encoding applied to the entity body |
| Content-Language | The natural language of the entity body |
| Content-Length | Size of the entity body (in bytes) |
| Content-MD5 | Message digest of the entity body |
| Content-Range | The position range of the entity body |
| Content-Type | The media type of the entity body |
| Expires | The expiration date and time of the entity body |
| Last-Modified | The date and time the resource was last modified |

3.2 Non-HTTP/1.1 header fields

The header fields used in HTTP communication are not limited to the 47 defined in RFC 2616. Header fields such as Cookie, Set-Cookie, and Content-Disposition, which are defined in other RFCs, are also used very frequently.

These non-standard header fields are summarized in RFC 4229, HTTP Header Field Registrations.

3.3 End-to-end headers and hop-by-hop headers

HTTP header fields are divided into two types, which define how caching and non-caching proxies treat them:

  1. End-to-end header: headers in this category are forwarded to the final recipient of the request or response. They must be stored in any response generated from a cache, and they must always be forwarded.
  2. Hop-by-hop header: headers in this category are valid for a single hop only and are not forwarded by caches or proxies. In HTTP/1.1 and later, a hop-by-hop header must be listed in the Connection header field in order to be used.

The following are the hop-by-hop header fields in HTTP/1.1. All header fields other than these eight are end-to-end headers.

Connection, Keep-Alive, Proxy-Authenticate, Proxy-Authorization, Trailer, TE, Transfer-Encoding, Upgrade
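
A proxy that follows this rule would strip the eight hop-by-hop fields, plus any field named in the Connection value, before forwarding. The following is a rough sketch of that filtering step only; the function name end_to_end_headers and the X-Debug header are made up for illustration.

```python
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "trailer", "te", "transfer-encoding", "upgrade",
}


def end_to_end_headers(headers):
    """Return the (name, value) pairs a proxy would forward to the next hop."""
    listed = set()
    for name, value in headers:
        if name.lower() == "connection":
            # Fields named in Connection are hop-by-hop for this connection too.
            listed.update(t.strip().lower() for t in value.split(",") if t.strip())
    drop = HOP_BY_HOP | listed
    return [(n, v) for n, v in headers if n.lower() not in drop]


incoming = [
    ("Host", "localhost"),
    ("Connection", "keep-alive, X-Debug"),  # X-Debug is a hypothetical field
    ("Keep-Alive", "timeout=15, max=100"),
    ("X-Debug", "1"),
    ("Accept", "*/*"),
]
forwarded = end_to_end_headers(incoming)
```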

3.4 Header fields for the Cookie service

Cookies, which manage state between the server and the client, are widely used on Web sites even though they are not part of RFC 2616, the HTTP/1.1 standard.

Cookies work by identifying users and managing state. To manage a user's state, a Web site temporarily writes some data to the user's computer through the Web browser. When the user visits the Web site again, the site can retrieve the cookie it issued earlier through the communication.

When a cookie is used, its validity period and information such as the sender's domain, path, and protocol can be checked. As a result, the data in a properly issued cookie is not leaked by attacks from other Web sites or attackers.

The header fields for the Cookie service are as follows:

| Header field name | Description | Header type |
|---|---|---|
| Set-Cookie | Cookie information used to start state management | Response header field |
| Cookie | Cookie information received by the server | Request header field |

3.4.1 Set-Cookie

A simple example is as follows:

Set-Cookie: status=enable; expires=Tue, 05 Jul 2011 07:26:31

When the server is about to start managing the client's state, it first notifies the client of various information. The table below lists the attributes of the Set-Cookie response header field.

| Attribute | Description |
|---|---|
| NAME=VALUE | The name and value assigned to the cookie (required) |
| expires=DATE | The validity period of the cookie (if not specified, the cookie lasts until the browser is closed, that is, it is an in-memory cookie). Once a cookie has been sent from the server to the client, the server has no way to delete it explicitly. However, the client's cookie can effectively be deleted by overwriting it with one that has already expired. As covered later with servlets, the Cookie.setMaxAge() method sets the lifetime of a cookie. |
| path=PATH | The directory on the server that the cookie applies to (if not specified, the default is the directory of the current URL resource). As covered later with servlets, the Cookie.setPath() method sets the cookie's path attribute. |
| domain=DOMAIN | The domain name the cookie applies to (if not specified, the default is the domain name of the server that created the cookie). As covered later with servlets, the Cookie.setDomain() method sets the cookie's domain attribute. |
| Secure | The cookie is sent only over HTTPS secure communication |
| HttpOnly | Prevents the cookie from being accessed by JavaScript |

3.4.2 Cookie

A simple example is as follows:

Cookie: status=enable

The Cookie header field tells the server that the client wants HTTP state management: the client includes in the request the cookies it previously received from the server. If several cookies were received, they can all be sent together.
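
Both directions of the cookie exchange can be tried out with Python's standard http.cookies module. This is only a sketch of the header values involved, not a full session implementation; the cookie names and the /hello path mirror the examples in this article.

```python
from http.cookies import SimpleCookie

# Server side: build the value of a Set-Cookie response header.
outgoing = SimpleCookie()
outgoing["status"] = "enable"
outgoing["status"]["path"] = "/hello"
outgoing["status"]["httponly"] = True
set_cookie_value = outgoing["status"].OutputString()

# Server side, on a later request: parse the Cookie request header
# that the browser sends back (several cookies separated by "; ").
incoming = SimpleCookie()
incoming.load("status=enable; JSESSIONID=369766FDF622xx")
cookies = {name: morsel.value for name, morsel in incoming.items()}
```

Only the NAME=VALUE pairs come back in the Cookie header; attributes such as path and HttpOnly stay in the browser and just control when the cookie is sent.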

2.4 HTTP methods

The HTTP method tells the server the intent of the request and appears in the request first line. The most common methods are GET, which retrieves data, and POST, which submits data. The HTTP standard defines several other request methods as well, and today's RESTful interfaces make full use of them.

HTTP/1.0 defines three core request methods: GET, POST, and HEAD.

HTTP/1.1 added OPTIONS, PUT, DELETE, TRACE, and CONNECT to the core set (PUT and DELETE had appeared only in an appendix of the HTTP/1.0 specification), and PATCH was added later by RFC 5789. LINK and UNLINK were dropped. HTTP/1.1 is what is mostly used today.

HTTP/1.0 and HTTP/1.1 support the following methods:

| Method | Description | Supported HTTP versions |
|---|---|---|
| GET | Retrieve a resource | 1.0, 1.1 |
| POST | Submit data; create a resource (can also modify one) | 1.0, 1.1 |
| PUT | Modify a resource | 1.0, 1.1 |
| DELETE | Delete a resource | 1.0, 1.1 |
| HEAD | Like GET, but only the packet header is returned, not the packet body | 1.0, 1.1 |
| OPTIONS | Ask which request methods the resource supports | 1.1 |
| TRACE | Trace the request path | 1.1 |
| CONNECT | Request a tunnel through a proxy | 1.1 |
| LINK | Establish a relationship with a resource | 1.0 |
| UNLINK | Remove a relationship | 1.0 |

LINK and UNLINK were dropped in HTTP/1.1 and are no longer supported.

2.5 Encoding to improve transmission efficiency

HTTP can transfer data as it is, or encode the data during transfer to improve the transmission rate. Encoding at transfer time makes it possible to handle large numbers of requests efficiently. However, the encoding has to be done by the computer, so it consumes more resources such as CPU.

2.5.1 Packet body and entity body

A packet (message) is the basic unit of HTTP communication. It consists of a sequence of 8-bit bytes (an octet sequence) and is transmitted over HTTP.

An entity is the payload transferred in a request or response. It consists of an entity header and an entity body. The entity header carries information about the entity, such as the update time of the resource content.

As for the relationship between an HTTP packet and an entity: the packet header normally includes the entity header (as described in the packet header section), and the packet body is normally the same as the entity body. The two differ only when a transfer encoding changes the content of the entity body.

2.5.2 Content encoding for compressed transmission

To reduce the size of an email, we often ZIP a file before attaching it. A facility in the HTTP protocol called content encoding does something similar.

Content encoding specifies the encoding format applied to the entity content, and the entity is transmitted in that compressed form. The client receives the content-encoded entity and is responsible for decoding it.

Common content encodings are as follows:

  1. gzip (GNU zip)
  2. compress (the standard UNIX compression)
  3. deflate (zlib)
  4. identity (no encoding)
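
As a rough illustration of content encoding, the following Python sketch compresses a response body with gzip on the "server" side and decodes it on the "client" side based on the Content-Encoding header. The body text, the header dictionary, and the decode_body helper are assumptions made up for this example; only gzip and identity are handled.

```python
import gzip

# Hypothetical response body; repetitive text compresses well.
body = ("<p>Hello, HTTP!</p>\n" * 200).encode("utf-8")

# Server side: compress the entity body and label it with Content-Encoding.
encoded_body = gzip.compress(body)
response_headers = {
    "Content-Type": "text/html; charset=utf-8",
    "Content-Encoding": "gzip",
    "Content-Length": str(len(encoded_body)),
}


def decode_body(headers, payload):
    """Client side: undo the content encoding named in the headers."""
    encoding = headers.get("Content-Encoding", "identity")
    if encoding == "gzip":
        return gzip.decompress(payload)
    if encoding == "identity":
        return payload  # not encoded, use as-is
    raise ValueError(f"unsupported content encoding: {encoding}")


decoded_body = decode_body(response_headers, encoded_body)
print(len(body), "bytes before encoding,", len(encoded_body), "bytes on the wire")
```

Note that Content-Length describes the encoded body, since that is what actually travels over the connection.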

2.5.3 Chunked transfer coding for split transmission

During HTTP communication, the browser cannot display the requested page until the whole encoded entity has been transferred. When large amounts of data are transferred, splitting the data into chunks lets the browser display the page progressively.

This ability to split the entity body into chunks is called chunked transfer coding (Chunked Transfer Coding).

Chunked transfer coding divides the entity body into parts (chunks). Each chunk is prefixed with its size in hexadecimal, and the end of the entity body is marked by a final chunk of "0(CR+LF)".

An entity body sent with chunked transfer coding is decoded by the receiving client, which restores the entity body to its form before encoding.

HTTP/1.1 has a mechanism called transfer coding (Transfer Coding), which lets data be transmitted in a given encoding during communication, but it is defined only for chunked transfer coding.
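
The chunk format described above can be sketched in a few lines of Python. The function names encode_chunked and decode_chunked are hypothetical, and the sketch ignores trailers (optional header fields after the last chunk) and only skips over chunk extensions, so treat it as an illustration rather than a complete implementation.

```python
def encode_chunked(body: bytes, chunk_size: int = 8) -> bytes:
    """Split body into chunks, each prefixed with its size in hexadecimal."""
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        out += f"{len(chunk):x}\r\n".encode("ascii")  # chunk size line
        out += chunk + b"\r\n"                        # chunk data + CRLF
    out += b"0\r\n\r\n"  # last chunk ("0" + CRLF), then the closing CRLF
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Restore the original entity body from chunked data."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size_text = data[pos:line_end].split(b";")[0].decode("ascii")
        size = int(size_text, 16)
        if size == 0:  # the zero-size chunk ends the body
            break
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)


wire = encode_chunked(b"Hello, chunked world!")
print(wire)
print(decode_chunked(wire))
```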

2.6 Multipart object collections for sending multiple kinds of data

When sending an email, we can write text in the body and add multiple attachments. This works because of the MIME (Multipurpose Internet Mail Extensions) mechanism, which lets mail handle different types of data such as text, pictures, and videos. For example, binary data such as a picture is encoded as an ASCII string, and MIME is used to label its data type. MIME extensions use a method called multipart object collections (Multipart) to hold multiple pieces of data of different types.

The HTTP protocol adopts multipart object collections as well, so a single packet body can contain multiple entities of different types. This is typically used when uploading images, text files, and so on.

The multi-part object collection contains the following objects:

  1. multipart/form-data. Used when uploading files through a Web form.

    Content-Type: multipart/form-data; boundary=AaB03x

    --AaB03x
    Content-Disposition: form-data; name="field1"

    Joe Blow
    --AaB03x
    Content-Disposition: form-data; name="pics"; filename="file1.txt"
    Content-Type: text/plain

    ...(file1.txt data)...
    --AaB03x--

  2. multipart/byteranges. Used when a response packet with status code 206 (Partial Content) contains multiple ranges of content.

    HTTP/1.1 206 Partial Content
    Date: Fri, 13 Jul 2012 02:45:26 GMT
    Last-Modified: Fri, 31 Aug 2007 02:02:20 GMT
    Content-Type: multipart/byteranges; boundary=THIS_STRING_SEPARATES

    --THIS_STRING_SEPARATES
    Content-Type: application/pdf
    Content-Range: bytes 500-999/8000

    ...(range specified data)...
    --THIS_STRING_SEPARATES
    Content-Type: application/pdf
    Content-Range: bytes 7000-7999/8000

    ...(range specified data)...
    --THIS_STRING_SEPARATES--

When a multipart object collection is used in an HTTP packet, the Content-Type header field must be added.

The boundary string separates the entities in the collection. Insert '--' before each boundary string that starts an entity (e.g. --AaB03x, --THIS_STRING_SEPARATES), and add '--' both before and after the boundary string that ends the multipart object collection (e.g. --AaB03x--, --THIS_STRING_SEPARATES--).

Each part of a multipart object collection can contain its own header fields. In addition, a part can itself contain a nested multipart object collection. For a more detailed explanation of multipart object collections, refer to RFC 2046.
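
To make the boundary rules concrete, here is a minimal Python sketch that assembles a multipart/form-data body like the first example above. The build_form_data function and its argument shapes are assumptions for illustration; it handles text content only and does not escape special characters in names or check that the boundary is absent from the data.

```python
def build_form_data(fields, files, boundary):
    """Build the Content-Type header and body of a multipart/form-data request.

    fields: {name: text_value}
    files:  {name: (filename, content_type, text_content)}
    """
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}",  # '--' + boundary starts each part
            f'Content-Disposition: form-data; name="{name}"',
            "",               # blank line separates part headers from part body
            value,
        ]
    for name, (filename, content_type, content) in files.items():
        lines += [
            f"--{boundary}",
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"',
            f"Content-Type: {content_type}",
            "",
            content,
        ]
    lines.append(f"--{boundary}--")  # '--' on both sides ends the collection
    body = "\r\n".join(lines) + "\r\n"
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return headers, body.encode("utf-8")


headers, body = build_form_data(
    {"field1": "Joe Blow"},
    {"pics": ("file1.txt", "text/plain", "...(file1.txt data)...")},
    "AaB03x",
)
print(headers["Content-Type"])
print(body.decode("utf-8"))
```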

2.7 Range requests for partial content

In the past, users did not have today's high-speed bandwidth, and downloading a large image or file was hard enough. If the network was interrupted during the download, you had to start over from scratch. Solving this requires a resumable mechanism: resuming means continuing a download from the point where it was interrupted.

To implement this, you need to specify which range of the entity to download. A request that specifies a range in this way is called a range request (Range Request).

For example, for a resource of 10,000 bytes, a range request lets you request only the bytes from position 5001 to 10,000.

When performing a range request, the Range header field specifies the byte range of the resource. Byte positions are zero-based offsets and both ends are inclusive; a last position beyond the end of the resource is treated as the end of the resource. The byte range is specified as follows.

  1. Bytes 5001 to 10000: Range: bytes=5001-10000
  2. Everything from byte 5001 onward: Range: bytes=5001-
  3. Multiple ranges, from the beginning to byte 3000 and from byte 5000 to 7000: Range: bytes=0-3000, 5000-7000 (note that bytes=-3000 means the last 3000 bytes, not the first)

For a range request, the server returns a response packet with the status code 206 Partial Content. For a range request with multiple ranges, the Content-Type header field of the response is set to multipart/byteranges.

If the server is unable to respond to the range request, the status code 200 OK and the full entity content are returned.
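
The following Python sketch shows one way a server might interpret the Range header. The parse_range and respond functions are hypothetical names, not part of any library. As a simplification, respond serves only single ranges with 206 and falls back to 200 with the full content in every other case, including multiple ranges, for which a real server would build a multipart/byteranges body.

```python
def parse_range(header: str, size: int):
    """Return a list of inclusive (first, last) byte offsets, or None if unusable."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or not spec:
        return None
    ranges = []
    for part in spec.split(","):
        first, sep, last = part.strip().partition("-")
        if not sep:
            return None
        if first == "":
            # Suffix form "-N": the last N bytes of the resource.
            if not last.isdigit() or int(last) == 0:
                return None
            start, end = max(size - int(last), 0), size - 1
        else:
            if not first.isdigit() or (last and not last.isdigit()):
                return None
            start = int(first)
            # An open or oversized end position is clamped to the last byte.
            end = min(int(last), size - 1) if last else size - 1
        if start > end:
            return None
        ranges.append((start, end))
    return ranges


def respond(resource: bytes, range_header=None):
    """Return (status, headers, body) for a possibly ranged request."""
    ranges = parse_range(range_header, len(resource)) if range_header else None
    if not ranges or len(ranges) != 1:
        return 200, {}, resource  # fall back to the full entity
    first, last = ranges[0]
    headers = {"Content-Range": f"bytes {first}-{last}/{len(resource)}"}
    return 206, headers, resource[first:last + 1]


print(parse_range("bytes=5001-10000", 10000))
print(respond(b"0123456789", "bytes=2-4"))
```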

2.8 Content negotiation returns the most appropriate content

The same Web site may have multiple pages with the same content. For example, the English and Chinese versions of a page have the same content but are written in different languages.

When a browser whose default language is English or Chinese accesses the same URI, the English or Chinese version of the page is displayed accordingly. This mechanism is called content negotiation (Content Negotiation).

Content negotiation means that the client and the server negotiate over the content of the response resource, and the server then provides the most suitable resource to the client. The negotiation is based on criteria such as the language, character set, and encoding of the response resource.

The following header fields contained in the request packet are the basis for the decision:

Accept, Accept-Charset, Accept-Encoding, Accept-Language, Content-Language
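
As a sketch of how a server could act on one of these fields, the Python code below parses an Accept-Language header with its quality values (q) and picks a page language. The function names and the matching rule (exact tag first, then the same primary language) are assumptions for illustration; real servers may use more elaborate matching.

```python
def parse_accept_language(header: str):
    """Return (language, q) pairs sorted by descending quality value."""
    prefs = []
    for item in header.split(","):
        lang, _, params = item.strip().partition(";")
        lang = lang.strip().lower()
        params = params.strip()
        q = 1.0  # a language without an explicit q has the highest weight
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if lang:
            prefs.append((lang, q))
    prefs.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep order
    return prefs


def choose_language(header: str, available, default):
    """Pick the best available language for the client's preferences."""
    for lang, q in parse_accept_language(header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if lang == "*":
            return available[0] if available else default
        for candidate in available:  # exact match, e.g. zh-cn == zh-CN
            if candidate.lower() == lang:
                return candidate
        for candidate in available:  # same primary language, e.g. en-us vs en
            if candidate.lower().split("-")[0] == lang.split("-")[0]:
                return candidate
    return default


print(choose_language("zh-cn,zh;q=0.5", ["en", "zh-CN"], "en"))
```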

3 A complete HTTP request

Enter www.baidu.com in your browser’s address box and press Enter:

  1. Domain name resolution
    1. The browser first searches its own DNS cache for an IP address that matches the domain name. This cache lasts only about a minute. (To view the browser's own cache in Chrome: chrome://net-internals/#dns)
    2. If nothing is found in the browser cache, or the cached entry has expired, the DNS cache of the operating system is searched.
    3. If nothing is found in the operating system cache, or the cached entry has expired, the local hosts file is read. Windows hosts file location: C:\Windows\system32\drivers\etc\
    4. If no matching entry is found in the hosts file, the browser makes a system call for DNS resolution, and the host sends a query to the local DNS server (usually the broadband carrier's server):
      1. The local DNS server checks its cache. If it has no cached entry for the domain name, it starts an iterative DNS resolution.
      2. The local DNS server sends an iterative query to the root DNS server for the IP address of the domain name. If the root DNS server knows it, it returns the IP address. Otherwise, it returns the IP address of the top-level DNS server for the com domain and lets the local DNS server query that server.
      3. The local DNS server sends an iterative query to the top-level DNS server for the IP address of the domain name. If the top-level DNS server knows it, it returns the IP address. Otherwise, it returns the IP address of the name server for the baidu.com domain and lets the local DNS server query that server.
      4. The local DNS server queries the authoritative name server for the baidu.com domain (typically the domain registrar's server, such as Wanwang) and obtains the IP address corresponding to www.baidu.com.
      5. The local DNS server returns the result to the operating system kernel and caches it.
      6. The operating system kernel returns the result to the browser.
      7. Finally, the browser obtains the IP address corresponding to www.baidu.com.
  2. After the browser obtains the IP address through domain name resolution, it initiates a TCP connection request to establish a connection. This is the three-way handshake, which we have covered before.
  3. Once the connection is established, the browser can send HTTP requests to the server, such as a request for the Baidu home page.
  4. The server receives the request, the backend processes it according to the path and parameters, and the result is returned to the browser, for example the complete HTML code of the page or JSON data.
  5. If the browser receives complete HTML code, then while parsing and rendering the page it finds the static resources the page refers to, such as JS, CSS, and images. Each of these is fetched with its own HTTP request, which goes through the main steps above.
  6. The browser renders the page based on the resources it receives (or updates the current page if the response is JSON data), and finally presents a complete page to the user.
  7. The TCP connection is closed with the four-way handshake (four waves), which we have also covered before.
    1. This depends on the Connection request header. If it is keep-alive, the server keeps the TCP connection open and does not disconnect immediately. If it is not keep-alive, or is close, the server actively closes the TCP connection after the response has been sent. Browsers using HTTP/1.1 use keep-alive by default, and the TCP connection is closed when the browser tab is closed.
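
The main steps above (name resolution, TCP connection, HTTP request, response, disconnect) can be sketched with Python's standard library. To keep the example runnable without Internet access, it talks to a throwaway local server instead of www.baidu.com; HelloHandler, fetch, and demo are hypothetical names, and the request sends Connection: close so that the server closes the connection after responding.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real Web server."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(host, port, path="/"):
    # Step 1: resolve the host name (for a real domain the OS resolver
    # consults its cache, the hosts file, and DNS).
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    # Step 2: connect() performs the TCP three-way handshake.
    with socket.socket(family, socktype, proto) as conn:
        conn.connect(address)
        # Step 3: send the request packet (first line, headers, blank line).
        request = (f"GET {path} HTTP/1.1\r\n"
                   f"Host: {host}\r\n"
                   "Connection: close\r\n"
                   "\r\n")
        conn.sendall(request.encode("ascii"))
        # Step 4: read the response until the server closes the connection.
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    # Leaving the with-block closes our side of the TCP connection.
    header, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    status_line = header.split(b"\r\n")[0].decode("ascii")
    return status_line, body


def demo():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("127.0.0.1", server.server_port, "/")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Rendering the page (steps 5 and 6) is the browser's job and is not covered by this sketch.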

References: Illustrated HTTP; w3c HTTP tutorial

If you have questions, or if something in the article is wrong, please leave a comment. Likes, bookmarks, and follows are also appreciated. I will keep updating this blog with more Java learning posts!