1. Introduction to HTTP

  1. HTTP stands for Hypertext Transfer Protocol. It is a data transfer protocol that specifies the rules for communication between a browser and a World Wide Web (WWW) server, and it transmits World Wide Web documents over the Internet.
  2. HTTP is an application-layer protocol in the TCP/IP model. It is usually carried over TCP; when it is carried over TLS or SSL, it is known as HTTPS.
  3. HTTP consists of requests and responses and follows a standard client-server model. HTTP is a stateless protocol.
  4. The default port for HTTP is 80, and the default port for HTTPS is 443.
  5. Web browsing is the primary application of HTTP, but HTTP is not limited to it: any two parties can communicate over HTTP as long as both follow the protocol. For example, commonly used software such as QQ and Xunlei (Thunder) also uses HTTP, alongside other protocols.
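As a small illustration of the default ports, the following Python sketch uses the standard urllib.parse module to work out which port a client would connect to: an explicit port in the URL wins, otherwise the scheme's default (80 for HTTP, 443 for HTTPS) applies. The function name effective_port and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit

# Default ports as stated in the text: 80 for HTTP, 443 for HTTPS.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the explicit port in `url`, or the scheme's default port."""
    parts = urlsplit(url)  # urlsplit lower-cases the scheme for us
    if parts.port is not None:  # an explicit ":port" overrides the default
        return parts.port
    try:
        return DEFAULT_PORTS[parts.scheme]
    except KeyError:
        raise ValueError(f"no default port known for scheme {parts.scheme!r}") from None


print(effective_port("http://www.example.com/"))       # 80
print(effective_port("https://www.example.com/"))      # 443
print(effective_port("http://www.example.com:8080/"))  # 8080
```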

2. Features of HTTP

  • Supports the client/server model, as well as Basic authentication and secure authentication.
  • Simple and fast: to request a service from the server, a client only needs to send the request method and path. Common request methods are GET, HEAD, and POST, and each method specifies a different type of interaction between the client and the server. Because HTTP is simple, HTTP server programs are small, so communication is fast.
  • Flexible: HTTP allows data objects of any type to be transferred. The type of the data being transferred is indicated by the Content-Type header.
  • Connectionless: "connectionless" means that each connection handles only one request. The server closes the connection after it has processed the client's request and received the client's acknowledgment, which saves transmission time. (This describes the early HTTP/1.0 behavior; HTTP/1.1 keeps connections open by default, as described in the next section.)
  • Stateless: HTTP is a stateless protocol, meaning that the protocol has no memory of previous transactions. If later processing needs earlier information, that information must be retransmitted, which can increase the amount of data sent with each request. On the other hand, the server can respond faster when it does not need previous information.
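Statelessness means any state the server needs must arrive with each request; a common carrier for it is the Cookie request header listed in section 7. The sketch below, assuming a hypothetical cookie named user and a hypothetical handle function, shows a handler whose reply depends only on the headers of the current request and keeps nothing between calls.

```python
from http.cookies import SimpleCookie


def handle(request_headers: dict) -> str:
    """A stateless handler: the reply depends only on this request's headers.

    The server stores nothing between calls, so the client must resend any
    state it needs (here, a hypothetical `user` cookie) on every request.
    """
    cookie = SimpleCookie()
    cookie.load(request_headers.get("Cookie", ""))  # parse "name=value; ..." pairs
    if "user" in cookie:
        return f"Hello, {cookie['user'].value}"
    return "Hello, stranger"


print(handle({"Cookie": "user=alice; theme=dark"}))  # Hello, alice
print(handle({}))  # Hello, stranger: the earlier request left no trace
```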

3. Workflow of the HTTP protocol

HTTP runs on top of TCP at the transport layer, and TCP is an end-to-end, connection-oriented protocol. "End-to-end" here can be understood as process-to-process communication. So before HTTP can transmit anything, a TCP connection must be established, which requires the so-called “three-way handshake”: the client sends a SYN segment, the server replies with SYN-ACK, and the client answers with ACK. Once the handshake completes, the TCP connection is established and HTTP messages can be transmitted. A related concept is the persistent connection: the TCP connection is not closed as soon as one request/response exchange completes, so it can be reused for later requests. This is the default behavior in HTTP/1.1 and is controlled with the Connection header.

  1. Establish a connection between the client and the server

  2. After the connection is established, the client issues a request to the server.

  3. When the server receives the request, it sends a response back to the client.

  4. After receiving the response, the client processes it further.
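The four steps can be traced end to end with Python's standard library. This is a minimal local sketch, not a production server: HelloHandler and round_trip are hypothetical names, and the server binds to an OS-chosen free port on 127.0.0.1 and serves exactly one request.

```python
import http.client
import http.server
import threading


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short text body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def round_trip():
    # Port 0 asks the OS for any free local port; the socket listens immediately.
    server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
    server.timeout = 5  # handle_request() gives up if no client arrives
    worker = threading.Thread(target=server.handle_request)  # serve one request
    worker.start()
    try:
        # 1. Establish the connection (the TCP handshake happens inside connect()).
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.connect()
        # 2. Send the request.
        conn.request("GET", "/")
        # 3. Receive the response.
        resp = conn.getresponse()
        # 4. Process the response.
        result = (resp.status, resp.getheader("Content-Type"), resp.read())
        conn.close()
        return result
    finally:
        worker.join()
        server.server_close()


print(round_trip())  # (200, 'text/plain', b'hello')
```

BaseHTTPRequestHandler speaks HTTP/1.0 by default, so in this sketch the server closes the connection after the single response.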

4. HTTP URL

HTTP uses Uniform Resource Identifiers (URIs) to transfer data and establish connections. A URL is a special type of URI that contains enough information to locate a resource.

A URL (Uniform Resource Locator) is an address used to identify a resource on the Internet. The following URL is used as an example to describe the components of a typical URL:

www.tencent.com:8080/news/index….

As the URL above shows, a complete URL consists of the following parts:

1. Protocol: The protocol part of the URL is “http:”, which indicates that the web page uses HTTP. Many protocols can be used on the Internet, such as HTTP and FTP; this example uses HTTP. The “//” after “http:” is a delimiter.

2. Domain name: The domain name of the URL is www.tencent.com. An IP address can also be used in place of a domain name.

3. Port: The port follows the domain name and is separated from it by a colon (:). The port is optional; if it is omitted, the default port is used (80 for HTTP).

4. Virtual directory: The virtual directory runs from the first slash (/) after the domain name to the last slash (/). It is also optional. In this example, the virtual directory is “/news/”.

5. File name: The file name runs from the last slash (/) after the domain name up to the “?”. If there is no “?”, it runs up to the “#”. If there is neither a “?” nor a “#”, it runs from the last “/” after the domain name to the end of the URL. In this example, the file name is “index.asp”. The file name is also optional; if it is omitted, the default file name is used.

6. Anchor: Everything from the “#” to the end of the URL is the anchor. In this example, the anchor is “name”. The anchor is also optional.

7. Parameters: The part between the “?” and the “#” is the parameter part, also called the search part or query part. In this example, it is “boardID=5&ID=24618&page=1”. Multiple parameters are separated by ampersands (&).

(Source: blog.csdn.net/ergouge/art…)
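The same decomposition can be done programmatically. The sketch below applies Python's urllib.parse to a hypothetical URL on example.com assembled from the parts described above; url_parts is a hypothetical helper that maps urlsplit's fields onto the seven parts.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL with the same structure as the parts described above.
URL = "http://www.example.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name"


def url_parts(url: str) -> dict:
    """Split a URL into protocol, domain, port, directory, file, anchor, parameters."""
    p = urlsplit(url)
    slash = p.path.rfind("/")  # the last "/" separates the directory from the file name
    return {
        "protocol": p.scheme,
        "domain": p.hostname,
        "port": p.port,  # None when the URL omits the port
        "virtual_directory": p.path[: slash + 1],
        "file_name": p.path[slash + 1 :],
        "anchor": p.fragment,
        "parameters": parse_qs(p.query),  # "a=1&b=2" -> {"a": ["1"], "b": ["2"]}
    }


for key, value in url_parts(URL).items():
    print(f"{key}: {value}")
```

Note that urlsplit calls the anchor the "fragment" and the parameter part the "query".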

5. Differences between URIs and URLs

5.1. A URI (Uniform Resource Identifier) uniquely identifies a resource.

Every resource available on the Web, such as an HTML document, an image, a video clip, or a program, is located by a URI. A URI consists of three parts:

  • The naming scheme used to access the resource
  • The name of the host where the resource is stored
  • The name of the resource itself, represented by a path, which emphasizes the resource

5.2. A URL (Uniform Resource Locator) is a specific kind of URI: it identifies a resource and also specifies how to locate it.

A URL is a string of characters used to describe information resources on the Internet. URLs are used in various WWW client and server programs, notably the well-known Mosaic browser. URLs describe all kinds of information resources, including files, server addresses, and directories, in a uniform format. A URL consists of three parts: (1) the protocol (or service type); (2) the IP address of the host where the resource resides, sometimes including the port number; and (3) the specific address of the resource on the host, such as the directory and file name.

A URN (Uniform Resource Name) identifies a resource by its name rather than by its location, for example a book identified by its ISBN in the form urn:isbn: followed by the ISBN.

A URI is an abstract, high-level concept that defines a uniform way of identifying resources, while URLs and URNs are concrete ways of identifying specific resources. Both URLs and URNs are URIs. Broadly speaking, every URL is a URI, but not every URI is a URL, because URIs also include a subclass, the Uniform Resource Name (URN), which names resources without specifying how to locate them. An ISBN-based name of the form urn:isbn: is an example of a URN.

In Java, an instance of the URI class can represent either an absolute or a relative URI, as long as it follows the URI syntax rules. An instance of the URL class, on the other hand, not only follows the syntax but also contains the information needed to locate the resource, so it cannot be relative. In the Java class library, the URI class has no methods for accessing the resource; its only job is parsing. In contrast, the URL class can open a stream to the resource (for example, with openStream()).
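Python has no direct counterpart to Java's URI/URL split, but its urllib.parse module behaves like the parse-only URI class: it never fetches anything. The sketch below, with hypothetical helper names, checks whether a reference is absolute, resolves a relative reference against an absolute base URL, and shows that a URN has no host part. Treating "has a host" as "is a locator" is a simplifying assumption for illustration, not a formal rule.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute(uri: str) -> bool:
    """An absolute URI starts with a scheme such as "http:" or "urn:"."""
    return bool(urlsplit(uri).scheme)


def names_a_host(uri: str) -> bool:
    """Rough heuristic (assumption): a URI with a host part is locator-like."""
    return bool(urlsplit(uri).netloc)


base = "http://www.example.com/news/index.asp"  # hypothetical absolute URL
relative = "../images/logo.png"                 # hypothetical relative reference
urn = "urn:isbn:096139210x"                     # illustrative URN

print(is_absolute(relative))                # False: relative, like a relative java.net.URI
print(urljoin(base, relative))              # http://www.example.com/images/logo.png
print(is_absolute(urn), names_a_host(urn))  # True False
print(names_a_host(base))                   # True
```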

6. HTTP request methods

Method | Description
--- | ---
GET | Requests the resource identified by the Request-URI.
POST | Appends new data to the resource identified by the Request-URI.
HEAD | Requests only the response headers for the resource identified by the Request-URI.
PUT | Asks the server to store a resource under the Request-URI.
DELETE | Asks the server to delete the resource identified by the Request-URI.
TRACE | Asks the server to echo back the request it received; mainly used for testing or diagnostics.
CONNECT | Reserved in HTTP/1.1 for proxies that can switch the connection to a tunnel.
OPTIONS | Asks which communication options are available for the server or for a resource.
PATCH | Applies partial modifications to a resource; defined in RFC 5789.
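To make the GET/HEAD distinction concrete, together with the 405 status and Allow header described later in this article, here is a minimal sketch of how a server might dispatch on the request method for a single resource. The dispatch function and the set of allowed methods are hypothetical; note that HTTP method names are case-sensitive.

```python
ALLOWED_METHODS = ("GET", "HEAD")  # hypothetical: this resource supports only these


def dispatch(method: str, body: bytes = b"<h1>hello</h1>"):
    """Return (status, headers, body) for a request to one hypothetical resource."""
    headers = {"Content-Type": "text/html", "Content-Length": str(len(body))}
    if method == "GET":
        return 200, headers, body
    if method == "HEAD":
        # Same status and headers as GET, but the body is omitted.
        return 200, headers, b""
    # Any other method: 405 Method Not Allowed, listing the permitted methods.
    return 405, {"Allow": ", ".join(ALLOWED_METHODS)}, b""


print(dispatch("HEAD"))
print(dispatch("DELETE"))
```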

7. HTTP request headers and response headers

7.1. HTTP request headers
Header | Description | Example
--- | --- | ---
Accept | Content types the client can accept | Accept: text/plain, text/html
Accept-Charset | Character encodings the browser can accept | Accept-Charset: iso-8859-5
Accept-Encoding | Content compression encodings the browser supports | Accept-Encoding: compress, gzip
Accept-Language | Languages the browser accepts | Accept-Language: en,zh
Accept-Ranges | Allows requesting one or more sub-ranges of a page entity (in practice sent by servers; see 7.2) | Accept-Ranges: bytes
Authorization | HTTP authentication credentials | Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
Cache-Control | Caching directives that requests and responses must follow | Cache-Control: no-cache
Connection | Whether a persistent connection is wanted (HTTP/1.1 uses persistent connections by default) | Connection: close
Cookie | When a request is sent, all cookies stored for the request's domain are sent to the web server | Cookie: $Version=1; Skin=new;
Content-Length | Length of the request body | Content-Length: 348
Content-Type | MIME type of the request body | Content-Type: application/x-www-form-urlencoded
Date | Date and time the request was sent | Date: Tue, 15 Nov 2010 08:12:31 GMT
Expect | Specific server behavior the client expects | Expect: 100-continue
From | Email address of the user making the request | From: user@example.com
Host | Domain name and port of the requested server | Host: www.zcmhi.com
If-Match | The request succeeds only if the given ETag matches the entity's current ETag | If-Match: “737060cd8c284d8af7ad3082f209582d”
If-Modified-Since | The request succeeds if the requested part has been modified after the specified time; otherwise 304 is returned | If-Modified-Since: Sat, 29 Oct 2010 19:43:31 GMT
If-None-Match | If the content has not changed, 304 is returned. The value is the ETag previously sent by the server, which the server compares with the current ETag to determine whether the content has changed | If-None-Match: “737060cd8c284d8af7ad3082f209582d”
If-Range | If the entity has not changed, the server sends only the parts the client is missing; otherwise it sends the whole entity. The value is also an ETag | If-Range: “737060cd8c284d8af7ad3082f209582d”
If-Unmodified-Since | The request succeeds only if the entity has not been modified after the specified time | If-Unmodified-Since: Sat, 29 Oct 2010 19:43:31 GMT
Max-Forwards | Limits the number of times the message can be forwarded through proxies and gateways | Max-Forwards: 10
Pragma | Implementation-specific directives | Pragma: no-cache
Proxy-Authorization | Credentials for authenticating with a proxy | Proxy-Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
Range | Requests only part of the entity, specifying the range | Range: bytes=500-999
Referer | Address of the previous web page, from which the currently requested page was reached | Referer: www.zcmhi.com/archives/71…
TE | Transfer encodings the client is willing to accept, and whether it accepts trailer fields | TE: trailers,deflate;q=0.5
Upgrade | Asks the server to switch to another protocol (if supported) | Upgrade: HTTP/2.0, SHTTP/1.3, IRC/6.9, RTA/X11
User-Agent | Information about the client software (user agent) that sends the request | User-Agent: Mozilla/5.0 (Linux; X11)
Via | Addresses of intermediate gateways or proxies and the protocols they used | Via: 1.0 fred, 1.1 nowhere.com (Apache/1.1)
Warning | Warning information about the message entity | Warning: 199 Miscellaneous warning
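An HTTP/1.1 request is plain text: a request line, header lines like the ones in the table above, an empty line, and an optional body, with every line ending in CRLF. The sketch below serializes such a message; build_request is a hypothetical helper, and it adds Content-Length only when there is a body (Host is always added because HTTP/1.1 requires it).

```python
def build_request(method, path, host, headers=None, body=b""):
    """Serialize an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]  # Host is mandatory in HTTP/1.1
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"  # CRLF ends each line; an empty line ends the headers
    return head.encode("ascii") + body


req = build_request(
    "POST", "/news/index.asp", "www.example.com",
    headers={"Content-Type": "application/x-www-form-urlencoded", "Accept": "text/html"},
    body=b"boardID=5&page=1",
)
print(req.decode("ascii"))
```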
7.2. HTTP response headers
Header | Description | Example
--- | --- | ---
Accept-Ranges | Whether the server supports range requests, and in which unit | Accept-Ranges: bytes
Age | Estimated time in seconds (non-negative) since the origin server generated the response, e.g. how long it has been in a proxy cache | Age: 12
Allow | Methods valid for the resource; if a method is not allowed, 405 is returned | Allow: GET, HEAD
Cache-Control | Tells all caching mechanisms whether and how the response may be cached | Cache-Control: no-cache
Content-Encoding | Compression encoding applied to the returned content | Content-Encoding: gzip
Content-Language | Language of the response body | Content-Language: en,zh
Content-Length | Length of the response body | Content-Length: 348
Content-Location | An alternative location for the returned resource | Content-Location: /index.htm
Content-MD5 | MD5 checksum of the returned resource | Content-MD5: Q2hlY2sgSW50ZWdyaXR5IQ==
Content-Range | Byte position of this part within the whole response body | Content-Range: bytes 21010-47021/47022
Content-Type | MIME type of the returned content | Content-Type: text/html; charset=utf-8
Date | Time the origin server sent the message | Date: Tue, 15 Nov 2010 08:12:31 GMT
ETag | Current value of the entity tag of the requested resource | ETag: “737060cd8c284d8af7ad3082f209582d”
Expires | Date and time after which the response expires | Expires: Thu, 01 Dec 2010 16:00:00 GMT
Last-Modified | Last modification time of the requested resource | Last-Modified: Tue, 15 Nov 2010 12:45:26 GMT
Location | Redirects the recipient to a URL other than the one requested, to complete the request or to identify a new resource | Location: www.zcmhi.com/archives/94…
Pragma | Implementation-specific directives that may apply to any recipient along the response chain | Pragma: no-cache
Proxy-Authenticate | Authentication scheme and parameters that apply to the proxy for this URL | Proxy-Authenticate: Basic
Refresh | Used for redirection or when a new resource has been created; here it redirects after 5 seconds (introduced by Netscape and supported by most browsers) | Refresh: 5; url=www.zcmhi.com/archives/94…
Retry-After | If the entity is temporarily unavailable, tells the client to retry after the specified time | Retry-After: 120
Server | Name of the web server software | Server: Apache/1.3.27 (Unix) (Red Hat/Linux)
Set-Cookie | Sets an HTTP cookie | Set-Cookie: UserID=JohnDoe; Max-Age=3600; Version=1
Trailer | Indicates which header fields are present in the trailer of a chunked transfer | Trailer: Max-Forwards
Transfer-Encoding | Transfer encoding applied to the message body | Transfer-Encoding: chunked
Vary | Tells downstream proxies whether to use a cached response or request it from the origin server | Vary: *
Via | Tells the client which proxies the response passed through | Via: 1.0 fred, 1.1 nowhere.com (Apache/1.1)
Warning | Warns about possible problems with the entity | Warning: 199 Miscellaneous warning
WWW-Authenticate | Authentication scheme the client should use to access the requested entity | WWW-Authenticate: Basic
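ETag, If-None-Match, and 304 Not Modified work together for cache validation. The following sketch shows the server side with a hypothetical conditional_get function: it returns 304 with no body when the client's If-None-Match value equals the current ETag. It deliberately simplifies the rules by comparing a single tag as an exact string, ignoring lists of tags and the “*” form.

```python
def conditional_get(request_headers: dict, current_etag: str, body: bytes):
    """Answer a GET using ETag validation (If-None-Match / 304 Not Modified)."""
    if request_headers.get("If-None-Match") == current_etag:
        # The client's cached copy is still valid: send the status only, no body.
        return 304, {"ETag": current_etag}, b""
    # First request, or the resource changed: send the full body and its ETag.
    return 200, {"ETag": current_etag, "Content-Length": str(len(body))}, body


etag = '"737060cd8c284d8af7ad3082f209582d"'
first = conditional_get({}, etag, b"page")  # no cached copy yet
second = conditional_get({"If-None-Match": first[1]["ETag"]}, etag, b"page")
print(first[0], second[0])  # 200 304
```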

8. HTTP status codes

8.1. Classification of HTTP status codes

An HTTP status code consists of three decimal digits. The first digit defines the class of the status code; the last two digits play no role in classification.

HTTP status codes are divided into five classes:
Class | Description
--- | ---
1xx | Informational: the server has received the request, and the requester should continue with the operation
2xx | Success: the request was received and processed successfully
3xx | Redirection: further action is needed to complete the request
4xx | Client error: the request contains a syntax error or cannot be fulfilled
5xx | Server error: the server encountered an error while processing the request
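Because only the first digit matters for classification, the class can be computed with integer division. A minimal sketch; status_class and the label strings are hypothetical.

```python
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code: int) -> str:
    """Classify a status code by its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return CLASSES[code // 100]  # e.g. 404 // 100 == 4


for code in (100, 200, 304, 404, 503):
    print(code, status_class(code))
```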
8.2. HTTP status codes in detail
Status code | Name | Description
--- | --- | ---
100 | Continue | The client should continue with its request.
101 | Switching Protocols | The server switches protocols at the client's request. It can only switch to a more advanced protocol, for example a newer version of HTTP.
200 | OK | The request succeeded. Typically used for GET and POST requests.
201 | Created | The request succeeded and a new resource was created.
202 | Accepted | The request has been accepted, but processing is not complete.
203 | Non-Authoritative Information | The request succeeded, but the returned meta-information comes from a copy rather than the origin server.
204 | No Content | The server processed the request successfully but returns no content. The browser keeps displaying the current document without refreshing the page.
205 | Reset Content | The server processed the request successfully, and the user agent (e.g., a browser) should reset the document view. Can be used to clear a browser's form fields.
206 | Partial Content | The server successfully processed part of a GET request.
300 | Multiple Choices | The requested resource has multiple locations; the server can return a list of resource characteristics and addresses for the user agent (e.g., a browser) to choose from.
301 | Moved Permanently | The resource has been permanently moved to a new URI, which is included in the response, and the browser automatically redirects to it. Future requests should use the new URI.
302 | Found | Similar to 301, but the resource has moved only temporarily. The client should continue to use the original URI.
303 | See Other | Similar to 301. The result of the request (typically a POST) can be found at another URI and should be retrieved with GET.
304 | Not Modified | The requested resource has not been modified, and the server returns no body. Clients typically cache resources they have accessed and send a header (such as If-Modified-Since) asking for the resource only if it has been modified after a given date.
305 | Use Proxy | The requested resource must be accessed through a proxy.
306 | Unused | No longer used; the code is reserved.
307 | Temporary Redirect | Similar to 302, but the client must not change the request method when following the redirect.
400 | Bad Request | The request has a syntax error, and the server cannot understand it.
401 | Unauthorized | The request requires user authentication.
402 | Payment Required | Reserved for future use.
403 | Forbidden | The server understands the request but refuses to fulfill it.
404 | Not Found | The server cannot find the requested resource (web page). Site designers can use this code to show a custom “the resource you requested could not be found” page.
405 | Method Not Allowed | The method in the request is not allowed for this resource.
406 | Not Acceptable | The server cannot produce a response matching the content characteristics the client requested.
407 | Proxy Authentication Required | Similar to 401, but the client must first authenticate with the proxy.
408 | Request Time-out | The server timed out waiting for the client to send the request.
409 | Conflict | A conflict occurred while the server processed the request; the server may return this code when handling a PUT request.
410 | Gone | The requested resource no longer exists. Unlike 404, 410 is used when a resource that previously existed has been permanently deleted; if the resource has moved, the site designer can indicate its new location with 301 instead.
411 | Length Required | The server refuses to process a request that has no Content-Length header.
412 | Precondition Failed | A precondition in the client's request evaluated to false.
413 | Request Entity Too Large | The request entity is too large for the server to process, so the request is rejected. To stop the client from continuing, the server may close the connection. If the condition is temporary, the server includes a Retry-After header.
414 | Request-URI Too Large | The request URI (usually a URL) is too long for the server to process.
415 | Unsupported Media Type | The server cannot process the media format of the request body.
416 | Requested range not satisfiable | The range requested by the client is invalid.
417 | Expectation Failed | The server cannot meet the requirements of the Expect request header.
500 | Internal Server Error | The server encountered an internal error and could not complete the request.
501 | Not Implemented | The server does not support the functionality required to complete the request.
502 | Bad Gateway | A server acting as a gateway or proxy received an invalid response from the upstream server.
503 | Service Unavailable | The server is temporarily unable to handle requests because of overload or maintenance. The length of the delay can be given in the Retry-After header.
504 | Gateway Time-out | A server acting as a gateway or proxy did not receive a timely response from the upstream server.
505 | HTTP Version not supported | The server does not support the HTTP version used in the request.
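Some of these codes carry instructions for the client: 503, and a temporary 413, may include a Retry-After header. The sketch below uses Python's http.HTTPStatus enumeration for the reason phrase and a hypothetical retry_delay helper that reads only the delta-seconds form of Retry-After; the HTTP-date form is ignored for brevity.

```python
from http import HTTPStatus

RETRYABLE = {413, 503}  # codes whose description above mentions Retry-After


def retry_delay(status: int, headers: dict):
    """Return seconds to wait before retrying, or None if no retry is suggested.

    Assumption: only the delta-seconds form ("120") is handled; an HTTP-date
    value yields None.
    """
    if status not in RETRYABLE:
        return None
    value = headers.get("Retry-After", "")
    return int(value) if value.isdigit() else None


print(HTTPStatus(503).phrase)                    # Service Unavailable
print(retry_delay(503, {"Retry-After": "120"}))  # 120
print(retry_delay(404, {}))                      # None
```

Python's reason phrases follow newer RFCs, so a few differ slightly from the names in this table (for example, 504 is "Gateway Timeout").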