HTTP/1

Hypertext Transfer Protocol HTTP/0.9

HTTP/0.9, proposed in 1991 for academic communication, had a simple requirement: to transfer HTML hypertext content across networks. Hence the name Hypertext Transfer Protocol.

The following figure shows the HTTP/0.9 request flow:

Because the requirement was very simple, namely transferring small HTML files, HTTP/0.9's implementation had the following three characteristics:

  • Only a request line; there are no HTTP request headers and no request body.
  • The server returns no header information.
  • The content of the returned file is transmitted as an ASCII character stream, since it is an HTML file.
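For example, a complete HTTP/0.9 request is a single line (the path here is illustrative), and the server replies with the raw HTML content before closing the connection, with no status line and no headers:

GET /index.html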

HTTP/1.0

With the rapid development of the World Wide Web, HTTP/0.9 could no longer meet a core demand: supporting the download of multiple types of files. To let clients and servers communicate more deeply, HTTP/1.0 introduced request headers and response headers, both stored as key-value pairs.

To support multiple types of files, several issues need to be addressed:

  • The browser needs to know what type of data the server is returning;
  • The browser needs to know what form of compression the server used;
  • The browser needs to tell the server what language version of the page it wants;
  • The browser needs to know the character encoding of the file.

To address these problems, HTTP/1.0's scheme is to negotiate through the request and response headers. When a request is initiated, the HTTP request headers tell the server what type of file the browser expects, what forms of compression it accepts, what language it prefers, and which character encodings it supports.

The resulting request headers from the browser read as follows:

accept: text/html // expect the server to return an HTML file
accept-encoding: gzip, deflate, br // acceptable compression formats: gzip, deflate, or br
accept-charset: ISO-8859-1,utf-8 // acceptable file encodings: ISO-8859-1 or UTF-8
accept-language: zh-CN,zh // the preferred language of the page is Chinese

After receiving the browser's request headers, the server prepares the response data accordingly.

Sometimes, however, the browser asks for gzip compression but the server supports only br; the server then reports the compression it actually used through the content-encoding field of the response header. This means the browser must ultimately process the data according to the information in the response headers.

The following is the data of the response header:

content-encoding: br // the server compressed the response with br
content-type: text/html; charset=UTF-8 // the server is returning an HTML file encoded as UTF-8

Summary of HTTP/1.0 features

  1. Request headers and response headers were introduced.
  2. Status codes were introduced; the status code is reported to the browser in the response line.
  3. A cache mechanism was provided. To reduce server load, HTTP/1.0 allowed data that had already been downloaded to be cached.
  4. The user agent was added. Servers need to collect basic information about clients, such as the number of Windows and macOS users, so HTTP/1.0 request headers also include a user-agent field.

HTTP/1.1

Compared with HTTP/1.0, HTTP/1.1 improved on a number of issues:

1. Persistent connections

HTTP/1.0 used short connections: every HTTP exchange went through three phases, establishing a TCP connection, transferring the HTTP data, and closing the TCP connection, which undoubtedly added a great deal of needless overhead. As shown below:

To address this, HTTP/1.1 added persistent connections, which allow multiple HTTP requests to be transferred over a single TCP connection and keep that TCP connection open as long as neither the browser nor the server explicitly closes it.

The benefit: this reduces the extra burden on the server and shortens the overall time spent on HTTP requests.

Persistent connections are enabled by default in HTTP/1.1, so no HTTP header needs to be set specifically to use them. If you don't want a persistent connection, you can add Connection: close to the HTTP headers. By default, the browser can establish up to six persistent TCP connections for the same domain name.
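As a minimal sketch of connection reuse (using Python's standard http.client module; example.com is just a placeholder host), two requests can travel over one TCP connection like this:

import http.client

# One TCP connection, reused for several HTTP/1.1 requests.
conn = http.client.HTTPConnection("example.com")  # placeholder host

conn.request("GET", "/")           # first request
resp = conn.getresponse()
resp.read()                        # drain the body so the connection can be reused

conn.request("GET", "/style.css")  # second request, same TCP connection
resp = conn.getresponse()
resp.read()

conn.close()                       # explicitly end the persistent connection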

2. Immature HTTP pipelining

HTTP/1.1 also tried pipelining: sending a batch of requests over one connection without waiting for each response. But the server must still answer in request order, so if one request in the TCP channel is not answered in time, all subsequent requests are blocked. This is known as head-of-line blocking.

(For a variety of reasons, pipelining was eventually abandoned by the major browser vendors.)

3. Provide virtual host support

In HTTP/1.0, each domain name was bound to a unique IP address, so only one domain name could be supported by a server. However, with the development of virtual host technology, it is necessary to bind multiple virtual hosts on a physical host. Each virtual host has its own separate domain name, and these separate domain names all share the same IP address.

Therefore, HTTP/1.1 added a Host field to the request header to indicate the current domain name, so that the server can process requests differently based on the Host value.
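For example, a minimal HTTP/1.1 request must carry the Host field (www.example.com is a placeholder domain):

GET /index.html HTTP/1.1
Host: www.example.com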

4. Support for dynamically generated content

HTTP/1.0 was designed around a complete data size declared in the response header, such as Content-Length: 901, so that the browser could receive data according to that size. However, as server-side technology developed, the content of many pages came to be generated dynamically, so the final data size is unknown until transmission ends, which leaves the browser unable to tell when it has received all of the file's data.

HTTP/1.1 solves this problem by introducing chunked transfer encoding. The server divides the data into chunks of arbitrary size, prefixes each chunk with its own length, and finally sends a zero-length chunk to signal that the transmission is complete. This provides support for dynamic content.
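For example, a chunked response might look like this on the wire (the \r\n sequences are written out literally; each chunk is prefixed with its length in hexadecimal, and the zero-length chunk marks the end of the body):

HTTP/1.1 200 OK
Content-Type: text/html
Transfer-Encoding: chunked

7\r\n
Mozilla\r\n
9\r\n
Developer\r\n
0\r\n
\r\n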

5. Client cookies and security mechanisms

Summary of HTTP/1.1 features

  1. Persistent connections were added.
  2. The browser maintains at most six persistent TCP connections per domain name.
  3. CDNs are used to implement domain sharding.

In this figure, a CDN is introduced and six connections are maintained for each domain simultaneously, which greatly reduces the download time for the whole set of resources. A quick calculation: with a single persistent TCP connection, downloading 100 resources takes 100 * n * RTT (where n is the average number of RTTs per resource); with the technique above, the total time drops to 100 * n * RTT / (6 * number of CDN domains). By this reckoning, the page loads much faster.
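As an illustrative calculation (the numbers are assumed): with n = 1 and an RTT of 100 ms, a single connection needs 100 * 1 * 100 ms = 10 s, while two CDN domains with six connections each cut this to roughly 10 s / (6 * 2) ≈ 0.83 s.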

Three factors that affect HTTP/1.1 efficiency

Main problem with HTTP/1.1: Bandwidth utilization is not ideal

Bandwidth is the maximum number of bytes that can be sent or received per second. We call the maximum number of bytes per second that can be sent upstream bandwidth and the maximum number of bytes per second that can be received downstream bandwidth.

HTTP/1.1's bandwidth utilization is poor because it is hard to saturate the link. For example, with the often-quoted 100 Mbps of bandwidth, the actual maximum download speed is 12.5 MB/s (100 ÷ 8); yet when loading page resources over HTTP/1.1 we may only manage about 2.5 MB/s, and it is very hard to use up the full 12.5 MB/s.

This problem is mainly caused by the following three reasons.

1. TCP slow start

Once a TCP connection is established, it begins sending data. At first, TCP sends at a very slow rate, then speeds up until the sending rate reaches an ideal level. This process is called slow start.

Slow start is a TCP policy to reduce network congestion, and there is no way to change it.

Slow start causes performance problems because the key resources commonly used by a page, such as HTML, CSS, and JavaScript files, are usually not large. They are requested as soon as the TCP connection is established, while slow start is still in effect, so delivery takes much longer than normal, delaying the precious first render of the page.
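As a rough illustration, here is a sketch assuming an initial congestion window of 10 segments (per RFC 6928) that doubles every RTT; real TCP stacks are considerably more nuanced. Even a hypothetical 100 KB file costs several round trips before the window grows large enough:

MSS = 1460           # bytes per TCP segment (typical)
cwnd = 10            # initial congestion window, in segments (RFC 6928)
remaining = 100_000  # a hypothetical 100 KB resource
rtts = 0
while remaining > 0:
    remaining -= cwnd * MSS  # one round trip's worth of data
    cwnd *= 2                # slow start: the window doubles each RTT
    rtts += 1
print(rtts, "RTTs of slow start, on top of connection setup")  # prints 3 here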

2. Multiple TCP connections compete for bandwidth

Imagine the system establishing multiple TCP connections at the same time. When bandwidth is plentiful, each connection's send and receive rate slowly rises; when bandwidth is scarce, the TCP connections throttle their sending and receiving. For example, if a page has 200 files and uses 3 CDN domains, loading it requires 6 * 3 = 18 TCP connections to download the resources, and when bandwidth runs short during the download, each connection must dynamically slow its receive rate.

Some TCP connections download critical resources, such as CSS and JavaScript files, while others download ordinary resources, such as images and videos. But the connections cannot negotiate among themselves to let the critical resources download first, which may slow down exactly those critical resources.

3. Head-of-line blocking

We know that with HTTP/1.1's persistent connections, although multiple requests can share one TCP pipe, the pipe can process only one request at a time; the other requests are blocked until the current one completes. This means we cannot freely send requests and receive responses in one pipe whenever we like.

This is a serious problem, because many unpredictable factors can block a request. If one request is blocked for 5 seconds, every queued request behind it is also delayed by 5 seconds, and during that wait both bandwidth and CPU sit wasted.

While generating a page, the browser would very much like to receive data ahead of time so it can pre-process it: an image received early can be decoded early, and when the image is finally needed the processed result is immediately available, giving the user a sense of overall speed.

However, head-of-line blocking prevents such data from being requested in parallel, so it works directly against these browser optimizations.

HTTP/2

Of HTTP/1.1's main problems, slow start and bandwidth competition between TCP connections stem from TCP's own mechanisms, while head-of-line blocking stems from HTTP/1.1's own mechanism.

The idea behind HTTP/2 is to use just one long-lived TCP connection per domain to transfer data, so that downloading an entire page's resources needs only a single slow start and avoids multiple TCP connections competing for bandwidth.

The HTTP/2 solution can be summed up as: use only one long-lived TCP connection per domain and eliminate head-of-line blocking (at the HTTP level). Please refer to the following figure:

This figure shows HTTP/2's core, most important, and most disruptive mechanism: multiplexing. As you can see from the diagram, each request has a corresponding ID, such as stream1 for the index.html request and stream2 for the foo.css request. This way, the browser can send requests to the server at any time.

Multiplexing mechanism

Core feature: multiplexing, which sends multiple URL requests through one TCP connection and achieves parallel transfer of resources. Multiplexing is built on the binary framing layer.

As you can see from the figure, HTTP/2 adds a binary framing layer, so let's analyze the HTTP/2 request and response process in combination with the figure.

  • First, the browser prepares the request data: the request line, the request headers, and, for a POST method, the request body.
  • The binary framing layer converts this data into frames tagged with a request ID (stream ID), which are sent to the server through the protocol stack.
  • When the server receives the frames, it merges all frames with the same ID into one complete request message.
  • The server then processes the request and hands the response line, response headers, and response body to the binary framing layer in turn.
  • When the browser receives a response frame, it attributes the frame's data to the corresponding request based on the ID.

As the flow above shows, HTTP/2 multiplexing is implemented by introducing the binary framing layer.
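As a rough illustration of that framing layer (a minimal sketch, not how any particular browser implements it): every HTTP/2 frame begins with a fixed 9-byte header carrying a 24-bit payload length, an 8-bit type, 8-bit flags, and a 31-bit stream ID (RFC 7540), and it is this per-frame stream ID that lets frames from different requests be interleaved and reassembled:

import struct

def parse_frame_header(header: bytes):
    # Fixed 9-byte HTTP/2 frame header (RFC 7540, section 4.1):
    # 24-bit length, 8-bit type, 8-bit flags, 31-bit stream ID.
    len_hi, len_lo, ftype, flags, stream_id = struct.unpack(">BHBBI", header[:9])
    length = (len_hi << 16) | len_lo
    stream_id &= 0x7FFFFFFF  # the top bit of the stream ID field is reserved
    return length, ftype, flags, stream_id

# A HEADERS frame (type 0x1) for stream 1 with a 12-byte payload:
sample = b"\x00\x00\x0c\x01\x05\x00\x00\x00\x01"
print(parse_frame_header(sample))  # -> (12, 1, 5, 1)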

Summary of HTTP/2 features

  1. Multiplexing (via the binary framing layer), which achieves parallel transfer of resources; this is the core feature.
  2. Request priorities can be set.
  3. Server push.
  4. Header compression.

Multiplexing makes full use of bandwidth and largely avoids the problems caused by TCP slow start; combined with header compression, server push, and the other features, it greatly improves the transfer speed of page resources.

The defects of HTTP/2

1. TCP head-of-line blocking

Normal TCP data transfer process:

TCP packet loss status:

As shown in the figure, if a packet is lost in transit due to network failure or other reasons, the whole TCP connection pauses, waiting for the lost packet to be retransmitted.

In TCP transmission, this blocking caused by the loss of a single packet is called head-of-line blocking at the TCP level.

HTTP/2 multiplexing:

From this figure we know that in HTTP/2, multiple requests run inside one TCP pipeline; if packet loss occurs in any of the streams, every request in the TCP connection is blocked. This differs from HTTP/1.1, where the browser opens six TCP connections per domain: if one of them blocks, the other five can continue transmitting data.

Therefore, as the packet loss rate rises, HTTP/2's transmission efficiency gets worse and worse. Some test data showed that HTTP/1.1 outperforms HTTP/2 once the packet loss rate reaches 2%.

2. Delay of TCP connection establishment

First, the concept of network latency: Round Trip Time (RTT) is the time it takes for a packet to travel from the browser to the server and for the response to return from the server to the browser. RTT is an important indicator of network performance.

So how many RTTs does it take to establish a connection? Let's figure it out.

We know that HTTP/1 and HTTP/2 both use TCP for transport, and HTTPS additionally requires TLS for secure transmission. TLS has its own handshake process, adding one or two more round trips of delay.

  1. Establishing a TCP connection requires a three-way handshake with the server, consuming 1.5 RTT before any data is transmitted.
  2. TLS comes in versions such as TLS 1.2 and TLS 1.3; each takes a different amount of time to establish a connection, generally 1 to 2 RTT. We will discuss HTTPS in detail in the later security module.

In short, we spend 3 to 4 RTTs before transferring any data. If the browser and server are physically close, one RTT may be under 10 ms, for a total of 30 to 40 ms, which users may find acceptable. But if the servers are far apart, one RTT can exceed 100 ms; the whole handshake then takes 300 to 400 ms, and the user distinctly feels the "slowness".

3. TCP ossification

We know TCP has drawbacks such as head-of-line blocking and connection-setup latency, but we cannot fix them by improving TCP itself. That is extremely difficult, for two main reasons.

The first is the ossification of intermediate devices.

To understand middlebox ossification, we first need to know what intermediate devices are. The Internet is a mesh structure composed of many networks, and to keep it running normally, all kinds of devices are deployed throughout it; these are called intermediate devices (middleboxes).

There are many types of intermediate devices, including routers, firewalls, NATs, and switches, each with its own purpose. They often run infrequently upgraded software that depends on specific TCP features, and once deployed they are rarely updated.

So if we upgrade TCP on the client side, packets using the new protocol may pass through intermediate devices that do not understand their contents and simply discard them. This is middlebox ossification, and it is a major barrier to updating TCP.

Besides the intermediate devices, the operating system is the other cause of TCP ossification. TCP is implemented in the operating system kernel, so applications can only use it, not change it. Operating system updates often lag behind software updates, making it difficult to update TCP in the kernel freely.

4. The QUIC protocol

HTTP/2 has some serious TCP-related defects, but because of TCP ossification it is almost impossible to fix them by modifying TCP itself. One idea, then, is to bypass TCP and invent a new transport protocol besides TCP and UDP. But this hits the same wall as modifying TCP: intermediate devices recognize only TCP and UDP, and a brand-new protocol would likewise be poorly supported.

Therefore, HTTP/3 chooses a compromise: it builds on UDP, implementing TCP-like features such as multiple data streams and reliable transmission on top of it. This layer of functionality is called the QUIC protocol. For a comparison of the HTTP/2 and HTTP/3 protocol stacks, refer to the following figure (HTTP/2 and HTTP/3 stacks):

As you can see from the figure above, the QUIC protocol in HTTP/3 contains the following features.

  • It implements flow control and reliable transmission similar to TCP's. Although UDP does not provide reliable delivery, QUIC adds a layer on top of UDP that guarantees reliable data delivery, providing packet retransmission, congestion control, and other features found in TCP.

  • It integrates TLS encryption. QUIC currently uses TLS 1.3, which has several advantages over earlier versions such as TLS 1.2, the most important being a reduction in the number of RTTs spent on the handshake.

  • It implements HTTP/2's multiplexing. Unlike TCP, QUIC carries multiple independent logical data streams over the same physical connection (see the figure below). The streams are transmitted separately, which solves TCP's head-of-line blocking problem.

  • It implements fast handshakes. Because QUIC is UDP-based, it can establish a connection with 0-RTT or 1-RTT, meaning QUIC can start sending and receiving data as quickly as possible, which greatly improves the speed of the first page load.

QUIC protocol multiplexing:

In closing

The learning material comes from Geek Time: Li Bing's course "Browser Working Principle and Practice". Next, let's check in every day:

  • Day 01 Chrome architecture: Why 4 processes with only 1 page open?
  • Day 02 TCP: How to ensure that a page file can be delivered to the browser in its entirety?
  • Day 03 HTTP Request Flow: Why do many sites open quickly the second time?
  • Day 04 Navigation flow: What happens between entering the URL and presenting the page?
  • Day 05 Rendering Flow: How do HTML, CSS, and JavaScript become pages?
  • Day 06 JavaScript execution mechanisms in the browser
  • Day 07 How V8 works
  • Day 08 The page loop system in the browser
  • Day 09 Pages in the browser
  • Day 10 Learn the history of HTTP