HTTP (Hypertext Transfer Protocol)

Hypertext Transfer Protocol (HTTP) is a simple request-response protocol that usually runs on top of TCP. It specifies what messages a client may send to a server and what responses it may get back. The headers of request and response messages are plain ASCII text; the message body has a MIME-like format.

GET and POST

  • The most obvious difference is that GET carries its parameters in the URL, while POST passes them in the request body (see the sketch after this list).
  • GET is harmless when the user navigates back in the browser, while POST causes the request to be resubmitted.
  • GET requests are actively cached by browsers, whereas POST requests are not unless configured manually.
  • GET requests can only be URL-encoded, while POST supports multiple encodings.
  • GET parameters live in the URL and are therefore subject to URL length limits, whereas POST parameters are not.
  • GET accepts only ASCII characters in its parameters, while POST has no such restriction.
  • GET is less secure than POST because its parameters are exposed directly in the URL, so it should not be used to pass sensitive information.
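
A minimal sketch of the first point, using the fetch API; the endpoint https://example.com/api/users is a placeholder, not a real service.

```typescript
// GET: parameters travel in the URL query string and the request has no body.
const query = new URLSearchParams({ name: "alice", page: "1" });
const getResponse = await fetch(`https://example.com/api/users?${query}`);

// POST: parameters travel in the request body; the URL carries none of them.
const postResponse = await fetch("https://example.com/api/users", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ name: "alice", password: "secret" }),
});
```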

The essential difference

The underlying layer of HTTP is TCP/IP, so GET and POST both ultimately travel over TCP connections, and at that level they can do the same things. Technically, nothing stops you from putting a request body in a GET request or parameters in the URL of a POST request.

So what’s the difference?

  • GET generates a TCP packet; POST generates two TCP packets.

  • From a caching perspective, GET requests are actively cached by the browser, leaving a history, whereas POST requests are not cached by default.

  • From an encoding perspective, GET can only be URL-encoded and can only accept ASCII characters, while POST has no such limitations.

  • From a parameter perspective, GET is generally placed in the URL and therefore is not secure, while POST is placed in the request body and is better suited for transmitting sensitive information.

  • From an idempotence point of view, GET is idempotent and POST is not. (Idempotent means that performing the same operation repeatedly produces the same result.)

Why GET generates one TCP packet while POST generates two

For a GET request, the browser sends the HTTP headers and data together, and the server responds with 200 (returning the data).

For a POST request, the browser first sends the headers, the server responds with 100 Continue, the browser then sends the data, and the server responds with 200 OK (returning the data).

In other words, GET only needs one trip to deliver the goods, whereas POST has to make two. On the first trip it goes to the server and says, "Hey, I'm going to deliver some goods later; please open the door and meet me." Then it goes back and delivers the goods.
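
A sketch of this two-step exchange using Node's built-in http module; example.com is a placeholder host, and note that many clients and servers skip the 100 Continue step entirely.

```typescript
import http from "node:http";

const req = http.request(
  {
    host: "example.com",
    method: "POST",
    path: "/upload",
    headers: { Expect: "100-continue", "Content-Type": "application/json" },
  },
  (res) => console.log("final status:", res.statusCode) // e.g. 200
);

// Trip 1 has sent only the headers; the body is written only after the
// server answers "100 Continue" (trip 2).
req.on("continue", () => {
  req.end(JSON.stringify({ payload: "the goods" }));
});
```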

Because POST takes two steps and costs a little more time, it looks as if GET is more efficient than POST. The Yahoo team therefore once recommended using GET instead of POST to optimize site performance. But this is a trap; step into it with caution. Why?

  1. GET and POST have their own semantics and should not be mixed casually.
  2. According to research, on a good network the time difference between sending one packet and sending two is basically negligible, while on a poor network the two-packet exchange has real advantages for verifying packet integrity.
  3. Not all browsers send the POST in two packets; Firefox, for example, sends it only once.

Basic optimizations for HTTP

There are two main factors that affect an HTTP network request: bandwidth and latency.

  • Bandwidth: back in the dial-up era bandwidth was a serious bottleneck, but network infrastructure has since improved bandwidth so much that it rarely limits speed any more; what remains is latency.
  • Latency:
    • HOL (head-of-line) blocking: browsers block requests for a number of reasons. A browser can keep only about four connections open to the same domain at a time (the exact limit varies by browser); once the limit is exceeded, subsequent requests are blocked.
    • DNS lookup: the browser needs the IP address of the target server before it can connect, and the system that resolves domain names into IP addresses is DNS. DNS results are usually cached to reduce this time.
    • Initial connection: HTTP runs on TCP, so the browser can start sending data only after the three-way handshake completes. When connections cannot be reused, every request pays for a handshake and TCP slow start; the handshake hurts most under high latency, and slow start hurts most for large file requests (see the sketch after this list).
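
For what it's worth, the browser's Resource Timing API exposes the DNS lookup and connection-setup phases described above; a minimal sketch to run in a page's script or console:

```typescript
const entries = performance.getEntriesByType("resource") as PerformanceResourceTiming[];

for (const e of entries) {
  const dnsMs = e.domainLookupEnd - e.domainLookupStart; // DNS lookup time
  const tcpMs = e.connectEnd - e.connectStart;           // TCP (and TLS) setup; 0 when the connection was reused
  console.log(`${e.name}: DNS ${dnsMs.toFixed(1)} ms, connect ${tcpMs.toFixed(1)} ms`);
}
```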

HTTP1.0

HTTP1.0, first widely used in 1996, was designed for simple web pages and requests. To keep things simple, HTTP1.0 specifies that the browser and the server maintain only a short-lived connection: every browser request opens a new TCP connection to the server, and the server closes that connection as soon as the request has been processed. The server does not track clients or remember past requests.

For example, a web page that references many images does not contain the image data itself, only the image URLs. When a browser opens such a page, it first requests the HTML file; as it parses the returned document and encounters each image tag, it issues another request to the URL given in the tag's SRC attribute to download the image data. Loading a page with many images therefore involves many requests and responses, each needing its own connection and transferring a single document or image, with each request completely independent of the next. Even when the files are small, establishing and closing a connection every time is relatively expensive and can seriously hurt the performance of both the client and the server. The same applies to pages that reference JavaScript files, CSS files, and so on.

Bandwidth and latency are also important factors in any network request. Now that network infrastructure has dramatically increased bandwidth, most of the remaining latency lies in response time. Against this background, the most common complaints about HTTP1.0 concern connection reuse and head-of-line blocking. It is important to understand both problems. The client establishes connections to the server per domain name: a desktop browser typically opens six to eight connections to a single domain, while mobile browsers are limited to four to six. More connections are not always better, since both the resource overhead and the overall latency grow. The inability to reuse a connection means a three-way handshake and TCP slow start for every request; the handshake hurts most under high latency, and slow start hurts most for large file requests. Head-of-line blocking means that bandwidth cannot be fully used and subsequent healthy requests are blocked.

HTTP head-of-line blocking

HTTP1.1 adds pipelining, which lets a client send the next request without waiting for the server's response to the previous one. The goal is to send multiple requests in parallel over a single TCP connection and improve network utilization. But it has one big drawback: the server must respond in the order the requests arrived. That is, the response to a later request must wait until the earlier responses have been sent, even if the later response is already ready. This is HTTP head-of-line blocking.


Resolving HTTP head-of-line blocking

1. Concurrent connections

Allowing multiple long-lived connections to a single domain name adds more task queues, so that the tasks of one queue do not block all the others. RFC 2616 limited a client to at most 2 concurrent connections, but current browsers allow far more; Chrome allows 6. Even so, the increased number of concurrent connections still falls short of people's performance expectations.

2. Domain sharding

Only six long-lived connections per domain name? Then just use a few more domain names.

For example, content1.sanyuan.com and content2.sanyuan.com.

A domain such as sanyuan.com can be split into many secondary domains that all point to the same server. The number of concurrent long-lived connections grows accordingly, which in practice works around the head-of-line blocking problem (a small sketch follows).
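
A sketch of the idea; the shard list and the hash below are purely illustrative and assume both subdomains serve the same content.

```typescript
const SHARDS = ["content1.sanyuan.com", "content2.sanyuan.com"];

// Map a resource path to one of the shards. Hashing the path keeps a given
// resource on the same shard, so browser caching stays effective.
function shardUrl(path: string): string {
  let hash = 0;
  for (const ch of path) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return `https://${SHARDS[hash % SHARDS.length]}${path}`;
}

console.log(shardUrl("/img/logo.png")); // e.g. https://content2.sanyuan.com/img/logo.png
```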

3. Use HTTP2.0

HTTP2.0 can be used to resolve the head-of-line blocking caused by pipelining in HTTP1.1. HTTP2.0 adds a binary framing layer, introducing the concepts of frames, messages, and streams. Each request/response becomes a message, a message can be split into multiple frames, and every frame is transmitted on a stream. A single TCP connection can carry multiple streams. Frames are reassembled into messages when they arrive, which avoids request-level head-of-line blocking.

Of course, HTTP2.0 still uses TCP underneath, so TCP-level head-of-line blocking can still occur.

HTTP1.1

To overcome this shortcoming of HTTP 1.0, HTTP 1.1 supports persistent connections (persistent connections are the default in HTTP/1.1, and they can also be pipelined). Multiple HTTP requests and responses can be sent over a single TCP connection, reducing the cost and latency of establishing and closing connections. The many requests and responses needed for a web page containing lots of images can all be transferred over one connection, although each separate page file still needs its own request and response. HTTP 1.1 also lets a client send the next request without waiting for the previous result to come back; the server, however, must return its responses in the order the requests were received, so that the client can match each response to its request. This also significantly reduces the time needed for the whole download.

In HTTP1.1, both the request and the response may carry a Connection header that tells the other side how the persistent connection should be handled. If a client speaking HTTP1.1 does not want a persistent connection, it sets Connection: close in the request; if the server does not want to keep the connection either, it sets Connection: close in the response. Whenever either side sends Connection: close, the TCP connection in use is torn down after the current request has been processed, and the client must open a new TCP connection for its next request.

HTTP 1.1 inherits the advantages of HTTP 1.0 while fixing its performance problems, and it improves and extends HTTP 1.0 with more request and response headers. For example, HTTP 1.0 does not support the Host request header, so a web browser cannot tell the server which website on that server it wants to visit; this makes it impossible to configure multiple virtual websites on one web server sharing a single IP address and port. With the Host request header added in HTTP 1.1, a browser can name the site it is visiting, so a single web server can host multiple virtual websites under different host names on the same IP address and port. Likewise, a Connection request header with the value keep-alive tells the server to keep the connection open after returning the result, while the value close tells it to close the connection after responding. HTTP 1.1 also provides request and response headers for mechanisms such as authentication, state management, and caching. Finally, HTTP/1.0 does not support resuming interrupted transfers: every file is transmitted from the beginning, that is, from byte 0. HTTP/1.1 adds the Range header; Range: bytes=XXXX- asks the server to start the transfer from byte XXXX of the file. This is what we usually call resumable downloads.
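
A minimal sketch of such a range request with the fetch API; the URL is a placeholder and the server has to support byte ranges.

```typescript
// Resume a download starting from byte 1024.
const resumed = await fetch("https://example.com/big-file.zip", {
  headers: { Range: "bytes=1024-" },
});

// A server that honors the range answers 206 Partial Content with a
// Content-Range header; one that ignores it answers 200 with the whole file.
console.log(resumed.status);                       // 206
console.log(resumed.headers.get("Content-Range")); // e.g. "bytes 1024-9999/10000"
```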

The main differences between HTTP1.0 and HTTP1.1 are:

  1. Cache handling: HTTP1.0 mainly uses If-Modified-Since and Expires in the headers as cache criteria. HTTP1.1 introduces more cache control policies, such as the entity tag (ETag), If-Unmodified-Since, If-Match, If-None-Match, and so on (see the sketch after this list).
  2. Bandwidth optimization and use of network connections: HTTP1.0 wastes some bandwidth, for example when the client needs only part of an object but the server sends the whole thing, and it does not support resuming interrupted transfers. HTTP1.1 introduces the Range request header, which lets a client request just part of a resource; the corresponding response code is 206 (Partial Content), making it easy for developers to make full use of bandwidth and connections.
  3. Error notification management: HTTP1.1 adds 24 new error status codes, such as 409 (Conflict), which means the request conflicts with the current state of the resource, and 410 (Gone), which means a resource has been permanently removed from the server.
  4. Host header processing: HTTP1.0 assumes each server is bound to a unique IP address, so the URL in a request message does not carry a host name. With the development of virtual hosting, however, a single physical server can host multiple virtual hosts (multi-homed web servers) that share one IP address. HTTP1.1 requires both request and response messages to support the Host header field, and a request message without one is rejected with 400 (Bad Request).
  5. Long connections: HTTP 1.1 supports persistent connections and pipelining, delivering multiple HTTP requests and responses over a single TCP connection and reducing the cost and latency of establishing and closing connections. Connection: keep-alive is enabled by default in HTTP1.1, which compensates for HTTP1.0 creating a new connection on every request.
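
A minimal sketch of the entity-tag revalidation mentioned in point 1, using the fetch API; the URL is a placeholder and the server is assumed to emit ETag headers.

```typescript
const first = await fetch("https://example.com/data.json");
const etag = first.headers.get("ETag");

// Revalidate later: if the resource is unchanged, the server answers
// 304 Not Modified with an empty body instead of resending the data.
const revalidated = await fetch("https://example.com/data.json", {
  headers: etag ? { "If-None-Match": etag } : {},
});
console.log(revalidated.status); // 304 if unchanged, otherwise 200 with a fresh body
```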

HTTP2.0

HTTP2.0 is the first update to the HTTP protocol since HTTP1.1 was released in 1999, and it is based on the SPDY protocol. While remaining fully semantically compatible with HTTP1.1, HTTP2.0 significantly improves web performance and further reduces network latency, achieving low latency and high throughput; for front-end developers it also means less manual optimization work. Its main features are the following:

  • Header compression
  • Multiplexing
  • Binary framing
  • Request priority
  • Server push

What is the SPDY protocol

SPDY is short for "speedy", meaning "faster". It is an application-layer protocol built on top of TCP, developed by Google. Its goal is to optimize the performance of the HTTP protocol, using compression, multiplexing, and prioritization to reduce page load times and improve security. The core idea of SPDY is to minimize the number of TCP connections. SPDY was not intended as a replacement for HTTP, but as an enhancement of it.

1. Header compression

In HTTP/1.1 and earlier, the message body is routinely compressed, with the Content-Encoding header describing the scheme, but have you ever thought about compressing the header fields themselves? When the header fields are numerous and complex, especially for GET requests whose messages are almost entirely headers, there is still plenty of room for optimization. HTTP/2 applies a dedicated compression algorithm, HPACK, to compress the header fields.

The HPACK algorithm is designed specifically for HTTP/2 and has two main advantages:

  • The first is that the server and the client each maintain an index table; header fields that have been used are stored in this table, and during transmission only the index (0, 1, 2, ...) needs to be sent. The other side looks the index up in its table, so header fields can be greatly simplified and reused (see the sketch below).

HTTP/2 does away with the concept of the start line. Instead, the request method, URI, and status code that used to live in the start line are converted into header fields, prefixed with ":" to distinguish them from other headers.

  • The second is Huffman encoding of integers and strings. Huffman coding builds an index of all the characters that appear and assigns the shortest codes to the characters that appear most often; what is transmitted is this sequence of codes, which can achieve a very high compression ratio.
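
A deliberately simplified sketch of the first idea only (not the real HPACK algorithm, which uses a static table, a dynamic table, and Huffman coding): once a header field has been indexed, later occurrences can be sent as a small integer.

```typescript
const table: string[] = [];                 // shared index table (simplified)
const indexOf = new Map<string, number>();

function encodeHeader(field: string): number | string {
  const idx = indexOf.get(field);
  if (idx !== undefined) return idx;        // already indexed: send just the index
  indexOf.set(field, table.length);
  table.push(field);
  return field;                             // first occurrence: send the literal and index it
}

console.log(encodeHeader(":method: GET")); // ":method: GET" (literal, now indexed)
console.log(encodeHeader(":method: GET")); // 0 (only the index on later requests)
```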

2. Multiplexing

It was said above that the HTTP head-of-line blocking problem can be eased with concurrent connections and domain sharding, but that does not really solve the problem at the level of HTTP itself; it only adds TCP connections to spread the risk. And there is a downside: multiple TCP connections compete for limited bandwidth, so truly high-priority requests are not necessarily handled first.

In HTTP1.x, developers also resort to tricks such as sprite images to reduce the number of requests to the same domain, because the browser limits how many requests it will send to one domain at a time; once the limit is hit, the remaining resources must wait for earlier requests to complete before they can be sent.

In HTTP2.0, thanks to the binary framing layer, requests and responses can be sent simultaneously over a shared TCP connection. HTTP messages are broken into independent frames without breaking the semantics of the message itself, sent interleaved, and reassembled at the other end based on stream identifiers and frame headers. This technique avoids the head-of-line blocking of older HTTP versions and greatly improves transmission performance.
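
A minimal sketch of multiplexing with Node's built-in http2 module; example.com is a placeholder and must be an HTTP/2-capable server.

```typescript
import http2 from "node:http2";

const client = http2.connect("https://example.com");
let pending = 2;

// Each request becomes its own stream, but both are carried over the same
// TCP connection, so neither has to wait for the other's response.
for (const path of ["/styles.css", "/app.js"]) {
  const stream = client.request({ ":path": path });
  stream.on("response", (headers) => console.log(path, headers[":status"]));
  stream.on("data", () => {});                                   // drain the body
  stream.on("end", () => { if (--pending === 0) client.close(); });
  stream.end();
}
```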

So what is binary framing? And how does HTTP/2 deal with the so-called head-of-line blocking?

3. Binary framing

First of all, HTTP/2 considers plaintext transmission too troublesome for machines and inconvenient for computers to parse, because text is ambiguous; for example, is a carriage return/line feed content or a delimiter? A state machine is needed internally to tell them apart, which is inefficient. HTTP/2 therefore simply converts all messages into a binary format, transmitting nothing but strings of 0s and 1s, which machines can parse easily.

The former Headers + Body message format is now split into binary frames: a HEADERS frame carries the header fields and a DATA frame carries the request body. After framing, the server no longer sees complete HTTP request messages but a pile of out-of-order binary frames. Because these frames are not ordered with respect to one another, there is no queue to wait in, and hence no HTTP head-of-line blocking.

Both communicating parties can send binary frames to each other. Such a bidirectional sequence of binary frames is called a stream. HTTP/2 uses streams to carry multiple frames of data over one TCP connection, which is exactly the concept of multiplexing.

You may have a question: if the frames arrive out of order, how is the out-of-order data handled?

Frames belonging to streams with different IDs are unordered relative to each other, but frames with the same stream ID must be transmitted in order. After the binary frames arrive, the peer reassembles all frames with the same stream ID into a complete request message or response message. Of course, binary frames also carry other fields that implement features such as prioritization and flow control.

How are binary frames designed in HTTP/2?

The frame structure

The structure of a frame transmitted in HTTP/2 is as follows:

Each frame is divided into a frame header and a frame body. The header begins with a three-byte frame length, which gives the length of the frame body.

Next comes a one-byte frame type. Frame types fall roughly into two groups: data frames, which carry HTTP message bodies, and control frames, which manage the transmission of streams.

The next byte holds the frame flags, eight flag bits such as END_HEADERS (the header block is complete) and END_STREAM (one side has finished sending data).

The last four bytes are the stream ID, or stream identifier, which lets the receiver pick the frames with the same ID out of the disordered binary frames and assemble them, in order, into request/response messages.
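
A small sketch that decodes this 9-byte frame header (as laid out in RFC 7540) from a raw byte buffer; error handling is omitted.

```typescript
function parseFrameHeader(buf: Uint8Array) {
  const length = (buf[0] << 16) | (buf[1] << 8) | buf[2];   // 3-byte body length
  const type = buf[3];                                       // frame type (0 = DATA, 1 = HEADERS, ...)
  const flags = buf[4];                                      // flag bits such as END_STREAM, END_HEADERS
  const streamId =                                           // 31-bit stream ID, top bit reserved
    ((buf[5] & 0x7f) << 24) | (buf[6] << 16) | (buf[7] << 8) | buf[8];
  return { length, type, flags, streamId };
}

// Example: a HEADERS frame with a 16-byte body, the END_HEADERS flag, stream 1.
console.log(parseFrameHeader(new Uint8Array([0, 0, 16, 1, 0x04, 0, 0, 0, 1])));
// { length: 16, type: 1, flags: 4, streamId: 1 }
```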

Stream state changes

As we saw in the previous section, a stream in HTTP/2 is actually a bidirectional sequence of binary frames. So how does the state of a stream change over the course of an HTTP/2 request and response?

HTTP/2 borrows the idea of TCP's state transitions and uses the frame flag bits to drive the state changes. Let's take a common request-response exchange as an example:

When the client sends a HEADERS frame, a stream ID is allocated; the client's side of the stream opens, and then the server's side opens as well. Once both sides are open, data frames and control frames can be exchanged in both directions.

When the client is about to finish, it sends a frame carrying the END_STREAM flag to the server and enters the half-closed state, in which it can only receive data and can no longer send it.

After receiving the END_STREAM flag the server also enters a half-closed state, but in the opposite direction: it can only send data, not receive it. The server then sends its own frame with the END_STREAM flag to the client, indicating that its data is complete, and both sides enter the closed state.

To start a new stream afterwards, the stream ID must keep increasing; once the upper limit is reached, a new TCP connection is opened and the count starts over. Because the stream ID field is 4 bytes long with the highest bit reserved, its range is 0 to 2 to the power of 31, roughly 2.1 billion.

Characteristics of streams

Having walked through how a stream's state changes, let's summarize the characteristics of stream transmission:

  • Concurrency. Unlike HTTP/1, multiple frames can be in flight at the same time on one HTTP/2 connection. This is the basis of multiplexing.
  • Monotonically increasing IDs. Stream IDs are not reusable; they increase sequentially, and when the upper limit is reached a new TCP connection is opened to start from the beginning.
  • Bidirectionality. Both the client and the server can create streams, and either side can be the sender or the receiver, without interfering with each other.
  • Configurable priority. Data frames can be assigned priorities so that the server handles important resources first, improving the user experience.

4. Server push

Another feature worth mentioning is HTTP/2's server push. In HTTP/2 the server is no longer a purely passive receiver of requests; it can also create streams and send messages to the client. For example, once a TCP connection is established and the browser requests an HTML file, the server can return not only the HTML but also the other resource files the HTML references, pushing them to the client before they are asked for and reducing the client's waiting time.
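
A minimal server-side sketch of this using Node's http2 module; the certificate paths are placeholders, and browsers only accept pushes over an encrypted HTTP/2 connection.

```typescript
import http2 from "node:http2";
import fs from "node:fs";

const server = http2.createSecureServer({
  key: fs.readFileSync("server.key"),   // placeholder certificate files
  cert: fs.readFileSync("server.crt"),
});

server.on("stream", (stream, headers) => {
  if (headers[":path"] === "/index.html") {
    // Push the stylesheet the HTML will reference before the browser asks for it.
    stream.pushStream({ ":path": "/style.css" }, (err, pushStream) => {
      if (err) return;
      pushStream.respond({ ":status": 200, "content-type": "text/css" });
      pushStream.end("body { margin: 0; }");
    });
    stream.respond({ ":status": 200, "content-type": "text/html" });
    stream.end('<link rel="stylesheet" href="/style.css">');
  }
});

server.listen(8443);
```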

5. Request priority

Because HTTP messages are divided into many individual frames, performance can be optimized further by tuning the interleaving and transmission order of those frames.

HTTPS

HTTP transmits everything in plaintext, completely transparent to anyone along the path, which is very insecure. How can security be improved further?

The answer is HTTPS, which is not a new protocol but HTTP with a layer of SSL/TLS underneath. In short, HTTPS = HTTP + SSL/TLS.

So what is SSL/TLS?

SSL, or Secure Sockets Layer, sits at the session layer (layer 5) of the OSI seven-layer model. SSL went through three major versions; when it was standardized it was renamed TLS (Transport Layer Security) and released as TLS 1.0, so strictly speaking TLS 1.0 = SSL 3.1.

The mainstream version today is TLS 1.2. The earlier TLS 1.0 and TLS 1.1 are considered insecure and will be phased out completely in the near future, so we focus on TLS 1.2. Of course, TLS 1.3, released in 2018, greatly optimized the TLS handshake process.
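
As a small aside, a sketch of refusing the deprecated TLS versions with Node's https module; the certificate paths are placeholders.

```typescript
import https from "node:https";
import fs from "node:fs";

const server = https.createServer({
  key: fs.readFileSync("server.key"),   // placeholder certificate files
  cert: fs.readFileSync("server.crt"),
  minVersion: "TLSv1.2",                // reject TLS 1.0 / 1.1 handshakes
  maxVersion: "TLSv1.3",
});

server.listen(443);
```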

Parts of this article are excerpted from a detailed explanation of HTTP on Juejin, link: juejin.cn/post/684490…