HTTP overview

What is the HTTP protocol

The HyperText Transfer Protocol (HTTP) is an application-layer protocol in the network model. It is typically used to exchange HTTP messages (described below) between a server program and a client program in order to transfer HTML files, images, and videos. HTTP uses TCP as its underlying transport protocol. This means that HTTP does not transmit the content itself: it only encapsulates the content into HTTP messages and hands them to TCP for transmission. Because TCP provides reliable delivery, we can generally assume that an HTTP request sent by a client process reaches the server intact and, conversely, that the HTTP response returned by the server reaches the client that sent the request intact.
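The division of labor between HTTP and TCP can be sketched in Python. This is a minimal illustration rather than a full client: the helper names build_request and send_over_tcp are hypothetical, and port 80 and the Connection: close header line are assumptions chosen to keep the example short. HTTP's part is only to format the message as bytes; delivering those bytes is left to a TCP socket.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Format an HTTP/1.1 GET request as bytes; no data is sent here."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # assumption: ask the server to close after one response
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_over_tcp(host: str, request: bytes, port: int = 80) -> bytes:
    """Hand the formatted message to TCP and collect whatever comes back."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


request = build_request("www.baidu.com")
print(request.decode("ascii"))
```

Calling send_over_tcp("www.baidu.com", request) would perform the actual network exchange; it is left out of the demonstration so that the example runs without network access.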

HTTP is a stateless protocol

HTTP is a stateless protocol: when the server program sends a file to the client program, it does not store any state information about the client. This may seem contrary to common sense, because much of what we browse today is tied to an account (for example, checking your shopping cart on an e-commerce site). In the early days of the Internet, however, most pages were static and required no interaction with user information. As the Internet developed and the need for such interaction grew, developers added a field called a cookie to HTTP messages to solve this problem. HTTP is nevertheless still considered a stateless protocol, because returning the right data based on a cookie is the job of the back-end program and has nothing to do with HTTP itself.
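How a back-end program can build state on top of a stateless protocol is sketched below. This is an illustration only: the cookie name sid, the carts dictionary, and the handle_request function are hypothetical, and a real site would issue random, unguessable session identifiers. HTTP merely carries the Cookie and Set-Cookie header lines; remembering which cart belongs to which identifier is done entirely by the program.

```python
from http.cookies import SimpleCookie
from itertools import count

carts = {}           # hypothetical server-side storage: session id -> cart items
_next_id = count(1)  # assumption: a simple counter; real ids should be random


def handle_request(headers: dict, add_item: str = "") -> dict:
    """Return response headers, using a cookie to recognize a returning client."""
    cookie = SimpleCookie()
    cookie.load(headers.get("Cookie", ""))
    response_headers = {}
    if "sid" in cookie and cookie["sid"].value in carts:
        sid = cookie["sid"].value      # known client: reuse its stored state
    else:
        sid = str(next(_next_id))      # new client: create state and issue a cookie
        carts[sid] = []
        response_headers["Set-Cookie"] = f"sid={sid}"
    if add_item:
        carts[sid].append(add_item)
    return response_headers


first = handle_request({}, add_item="book")
second = handle_request({"Cookie": first["Set-Cookie"]}, add_item="pen")
print(first, second, carts)
```

The second call carries the cookie issued by the first, so both items end up in the same cart even though the two requests are otherwise independent.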

Non-persistent and persistent connections for HTTP

Non-persistent connections

Suppose we visit the Baidu home page: we need to fetch the HTML file of the home page and download the logo shown on it. With non-persistent connections, two TCP connections are created, one to transfer the HTML document and one to transfer the image. By default, a browser can open 5 to 10 parallel TCP connections. When the number of requested files exceeds this limit, the repeated TCP handshakes and teardowns cost time. In addition, frequently creating and destroying TCP connections costs the server some performance.

Persistent connections

A persistent connection reuses the TCP connection between the client and the server as much as possible. The TCP connection is not closed immediately after the current file has been transferred; instead, it is kept open for a configurable period of time so that later requests can use it. This solves the problems caused by non-persistent connections. However, an improperly chosen wait time causes problems of its own (for example, a wait that is too long leaves idle connections occupying server resources), so it should be set to a reasonable value.
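The time cost of the two approaches can be estimated with a simple model. This is a back-of-the-envelope sketch under stated assumptions: objects are fetched one after another (no parallel connections), a TCP handshake costs one round-trip time (RTT), each request and response costs one more RTT, and transmission time and connection teardown are ignored. The function names and the sample values are illustrative.

```python
def non_persistent_time(num_objects: int, rtt_ms: int) -> int:
    """Each object pays for a new TCP handshake plus one request/response."""
    return num_objects * (rtt_ms + rtt_ms)


def persistent_time(num_objects: int, rtt_ms: int) -> int:
    """One handshake is shared; each object only pays for its request/response."""
    return rtt_ms + num_objects * rtt_ms


objects, rtt = 10, 50  # assumed values: 10 objects, 50 ms round-trip time
print(non_persistent_time(objects, rtt))  # 1000
print(persistent_time(objects, rtt))      # 550
```

Under these assumptions the persistent connection saves one handshake per object after the first, which is where the difference of 450 ms comes from.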

The HTTP message

The request message

The structure of a typical HTTP request message is shown in the following figure. A request message consists of one request line, several header lines, and an entity body. Only the request line is required; the other two parts may be absent in some cases. On Linux, we can use the curl command to view HTTP request and response messages. Here we use the command curl -v http://www.baidu.com/ to view the HTTP request message sent when accessing Baidu. Part of the output is shown below.

The request line

The request line has three fields: the method field, the URL field, and the HTTP version field.
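As a small illustration, the three fields can be pulled apart by splitting the request line on spaces. The RequestLine type and the parse_request_line function are hypothetical names used only for this sketch.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str
    url: str
    version: str


def parse_request_line(line: str) -> RequestLine:
    """Split a request line into its three space-separated fields."""
    method, url, version = line.rstrip("\r\n").split(" ")
    return RequestLine(method, url, version)


print(parse_request_line("GET / HTTP/1.1\r\n"))
```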

The method field

HTTP methods include GET, POST, HEAD, PUT, and DELETE. The vast majority of HTTP requests use the GET method; it is also the method used when we type a web address into a browser to fetch the corresponding page. The POST method is used to submit information to the server, such as a login username and password; this information is placed in the entity body mentioned above and then transmitted to the server. The HEAD method is similar to GET, except that the response does not include the requested object (such as an HTML file), only the status line and the header lines. The PUT method is used to upload files to the server. The DELETE method is used to delete files on the server.
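The difference between these methods can be made concrete with a toy model of a server. This is a sketch under assumptions, not a real server: the files dictionary stands in for files on disk, the handle function is a hypothetical name, and the status codes chosen for PUT and DELETE (201, 204) and for unsupported methods (405) are one reasonable choice among several.

```python
files = {"/index.html": b"<html>hello</html>"}  # hypothetical server-side files


def handle(method: str, url: str, body: bytes = b"") -> tuple:
    """Return (status, headers, body) for a request to the toy server."""
    if method in ("GET", "HEAD"):
        if url not in files:
            return 404, {}, b""
        content = files[url]
        headers = {"Content-Length": str(len(content))}
        # HEAD returns the same status and headers as GET, but no object.
        return 200, headers, (content if method == "GET" else b"")
    if method == "PUT":
        created = url not in files
        files[url] = body  # upload: store the entity body at this URL
        return (201 if created else 204), {}, b""
    if method == "DELETE":
        if files.pop(url, None) is None:
            return 404, {}, b""
        return 204, {}, b""
    return 405, {}, b""  # method not handled by this sketch


print(handle("GET", "/index.html"))
print(handle("HEAD", "/index.html"))
```

The two printed results share the same status and Content-Length header, but only the GET result contains the HTML bytes.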

The URL field

This field gives the location of the requested file on the server. Because we are visiting the Baidu home page, which by default is the root, the URL field here is “/”. If we ran curl -v http://www.baidu.com/index.php instead, the URL field would be “/index.php”.
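The relationship between a full web address and the URL field can be checked with Python's standard urllib.parse module. The url_field helper is a hypothetical name for this sketch; it treats an empty path as the root.

```python
from urllib.parse import urlsplit


def url_field(address: str) -> str:
    """Return the path (plus query string, if any) that goes into the request line."""
    parts = urlsplit(address)
    path = parts.path or "/"  # an empty path means the root
    return f"{path}?{parts.query}" if parts.query else path


print(url_field("http://www.baidu.com/"))           # /
print(url_field("http://www.baidu.com/index.php"))  # /index.php
```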

The HTTP version

The version of the HTTP protocol used for the transfer, in this case HTTP/1.1.

The header lines

The header lines carry additional information about the request. Each header line ends with a carriage return and a line feed. Common header lines include Host, which gives the host name being requested, and User-Agent, which identifies the browser (and its version) that issued the request.
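Because each header line ends with a carriage return and a line feed, a block of header lines can be turned into name/value pairs with a few lines of Python. This is a minimal sketch: parse_header_lines is a hypothetical helper, the sample values (including the curl version string) are made up for illustration, and header names are kept exactly as written even though HTTP treats them as case-insensitive.

```python
def parse_header_lines(block: str) -> dict:
    """Turn CRLF-separated 'Name: value' lines into a dictionary."""
    headers = {}
    for line in block.split("\r\n"):
        if not line:  # skip the empty piece after the final CRLF
            continue
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers


raw = "Host: www.baidu.com\r\nUser-Agent: curl/8.0.1\r\nAccept: */*\r\n"
print(parse_header_lines(raw))
```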

The entity body

As mentioned above, the entity body carries the data submitted with the request, but not every HTTP request message has an entity body.
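The following sketch shows where the entity body sits in a request message: it comes after the blank line that ends the header lines. The host www.example.com, the /login path, and the form data are made-up values, and split_message is a hypothetical helper.

```python
def split_message(raw: bytes) -> tuple:
    """Separate the head (request line and header lines) from the entity body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head, body


form = b"username=alice&password=secret"  # made-up form data
post = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"Content-Length: " + str(len(form)).encode("ascii") + b"\r\n\r\n" + form
)
get = b"GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n"

print(split_message(post)[1])  # the submitted form data
print(split_message(get)[1])   # empty: this GET request has no entity body
```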

The response message

The structure of a typical HTTP response message is shown in the following figure. The structure of the response message is very similar to that of the request message, except that the request line is replaced by a status line. We can use the command curl -v http://www.baidu.com to view the response headers.

The status line

Version

This field has the same meaning as the HTTP version field of the request message.

Status code and phrase

Status codes and phrases correspond one to one. Together, they indicate the result of a request. Here are some common status codes and phrases and what they mean:

  • 200 OK: The request succeeded and the requested information is returned in the response message
  • 301 Moved Permanently: The requested object has been permanently moved, but not deleted. The object's current location on the server is given in the Location: XXXXXXX header line of the response message, so that the client can request it from there
  • 400 Bad Request: A generic error code indicating that the server could not understand the request
  • 404 Not Found: The requested content does not exist on the server
  • 505 HTTP Version Not Supported: The server does not support the HTTP version used in the request
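The status line and the code-to-phrase correspondence can be explored with Python's standard http.HTTPStatus enumeration. The parse_status_line function is a hypothetical helper written for this sketch; the phrase is split off last because it may itself contain spaces.

```python
from http import HTTPStatus


def parse_status_line(line: str) -> tuple:
    """Split a status line into version, numeric status code, and phrase."""
    version, code, phrase = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), phrase


version, code, phrase = parse_status_line("HTTP/1.1 404 Not Found\r\n")
print(version, code, phrase)

# The standard library records the standard phrase for each status code.
for c in (200, 301, 400, 404, 505):
    print(c, HTTPStatus(c).phrase)
```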

The header lines

The header lines of a response message serve basically the same purpose as those of a request message, so only the meanings of some common response header lines are introduced here.

  • Connection: The value close or keep-alive indicates whether HTTP uses a non-persistent connection or a persistent connection, as described above
  • Date: The time at which the server sent the response message
  • Content-Length: The length of the entity body in the response message, in bytes
  • Content-Type: The type of the object returned in the response message, such as an HTML document or another common document type
  • Last-Modified: The time at which the file was last modified on the server. We will come back to this at the end, when we talk about proxy servers
  • Server: The name and version of the server that generated the response
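To show how a client might interpret these header lines, the sketch below converts a few of them into Python values. All header values are made up for illustration, and interpret is a hypothetical helper. The dates are parsed with the standard email.utils.parsedate_to_datetime function, which understands the date format used in HTTP headers.

```python
from email.utils import parsedate_to_datetime

sample_headers = {  # made-up values for illustration
    "Connection": "keep-alive",
    "Date": "Tue, 05 Mar 2024 08:00:00 GMT",
    "Content-Length": "2381",
    "Content-Type": "text/html; charset=utf-8",
    "Last-Modified": "Mon, 04 Mar 2024 12:30:00 GMT",
}


def interpret(headers: dict) -> dict:
    """Convert raw header strings into Python values."""
    return {
        "persistent": headers.get("Connection", "").lower() == "keep-alive",
        "body_length": int(headers["Content-Length"]),
        "media_type": headers["Content-Type"].split(";")[0].strip(),
        "sent_at": parsedate_to_datetime(headers["Date"]),
        "last_modified": parsedate_to_datetime(headers["Last-Modified"]),
    }


info = interpret(sample_headers)
print(info["persistent"], info["body_length"], info["media_type"])
print(info["sent_at"] > info["last_modified"])
```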

Proxy servers

If every network request were sent directly to the origin server, the origin server would be placed under a heavy load. To address this, we can set up a proxy server between the client and the origin server (as shown in the figure below). Requests from the client are sent to the proxy server first; if the proxy server already holds the content the client wants, it returns that content directly without sending a request to the origin server, thereby reducing the load on the origin server. This may sound abstract, so let us walk through an example. Suppose I visit the Baidu home page for the first time and the proxy server has no copy of the relevant objects. The whole sequence is then as follows:

  1. The client sends a request to the proxy server, asking for the Baidu home page
  2. The proxy server checks whether it has a local copy of the Baidu home page, finds that it does not, and sends a request for the relevant objects to the origin server
  3. The origin server receives the request and returns the requested objects to the proxy server
  4. The proxy server receives the objects returned by the origin server, saves a local copy, and returns them to the client
  5. After receiving the data returned by the proxy server, the client displays the page in the browser

When I visit the Baidu home page again later, the whole process is much simpler, as shown below:

  1. The client sends a request to the proxy server, asking for the Baidu home page
  2. The proxy server checks whether it has a local copy of the Baidu home page, finds that it does, and returns the file to the client
  3. After receiving the data returned by the proxy server, the client displays the page in the browser
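The two sequences above can be sketched as a small in-memory model. Nothing here talks to a real network: the Origin class stands in for the server that holds the files, the CachingProxy class is a hypothetical proxy, and the page content is made up. The hit counter shows that the second visit never reaches the server.

```python
class Origin:
    """Hypothetical stand-in for the server that holds the files."""

    def __init__(self, files: dict):
        self.files = files
        self.hits = 0  # how many requests actually reached this server

    def fetch(self, url: str) -> bytes:
        self.hits += 1
        return self.files[url]


class CachingProxy:
    def __init__(self, origin: Origin):
        self.origin = origin
        self.cache = {}

    def get(self, url: str) -> bytes:
        if url in self.cache:             # later visits: answer from the local copy
            return self.cache[url]
        content = self.origin.fetch(url)  # first visit: ask the server
        self.cache[url] = content         # keep a local copy for next time
        return content


origin = Origin({"/": b"<html>Baidu home page</html>"})
proxy = CachingProxy(origin)
proxy.get("/")  # first visit: forwarded to the server
proxy.get("/")  # second visit: served from the proxy's copy
print(origin.hits)  # 1
```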

This raises a question: what happens if the file on the origin server has changed since the proxy server saved its copy? This is where the Last-Modified header line mentioned above comes in. When the proxy server receives a request, it sends a request to the origin server to check whether the Last-Modified time of its local copy matches the Last-Modified time on the origin server (in HTTP this is typically done with a conditional GET that carries an If-Modified-Since header line). If they match, the proxy server returns its local copy directly. If not, it downloads a fresh copy from the origin server, overwrites the local copy, and returns the object to the client.
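The freshness check described above can be sketched with a small in-memory model. This is an illustration under assumptions: the Origin and ValidatingProxy classes are hypothetical, timestamps are plain integers instead of HTTP dates, and the status codes follow the standard conditional GET convention in which the server answers 304 (Not Modified) when the copy is still current and 200 with the new content otherwise.

```python
class Origin:
    """Hypothetical server holding one file and its modification time."""

    def __init__(self, content: bytes, last_modified: int):
        self.content = content
        self.last_modified = last_modified  # assumption: a plain integer timestamp

    def conditional_get(self, if_modified_since=None) -> tuple:
        """Return (304, None, time) if unchanged, else (200, content, time)."""
        if if_modified_since is not None and self.last_modified <= if_modified_since:
            return 304, None, self.last_modified
        return 200, self.content, self.last_modified


class ValidatingProxy:
    def __init__(self, origin: Origin):
        self.origin = origin
        self.copy = None       # cached content
        self.copy_time = None  # Last-Modified value stored with the copy

    def get(self) -> bytes:
        status, content, modified = self.origin.conditional_get(self.copy_time)
        if status == 200:      # new or changed: overwrite the local copy
            self.copy, self.copy_time = content, modified
        return self.copy       # on 304 the local copy is still valid


origin = Origin(b"version 1", last_modified=100)
proxy = ValidatingProxy(origin)
print(proxy.get())  # b'version 1'
origin.content, origin.last_modified = b"version 2", 200
print(proxy.get())  # b'version 2'
```

In this sketch the proxy revalidates on every request. A real proxy would usually combine this check with an expiry time so that it does not contact the server each time.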