preface

HTTP is the most important and most widely used protocol in the browser. It is not only the language browsers and servers use to communicate, but also a cornerstone of the Internet. As browsers continue to evolve and adapt to new technologies, the best way to learn HTTP is to see it through their eyes: HTTP/1, HTTP/2, HTTP/3.


The history of HTTP

HTTP is the main communication protocol between browser and server

In the 1960s, the U.S. Department of Defense's Advanced Research Projects Agency (ARPA) built the ARPA network, which is considered the origin of the Internet. In the 1970s, researchers distilled their experience with the ARPA network into the famous TCP/IP protocol suite. Its clean layered structure and stable performance led to its inclusion in the UNIX kernel in the mid-1980s, prompting ever more computers to connect to the network.

In 1989, Dr Tim Berners-Lee published a paper outlining the idea of a hyperlinked document system on the Internet. In the article, he identified three key technologies: URI, HTML, and HTTP.

HTTP / 0.9

HTTP (HyperText Transfer Protocol) was formally born in 1991; this earliest version was later labeled HTTP/0.9. The protocol's original purpose was to transfer hypertext content (HTML), and it supported only GET requests.

The protocol defines a communication pattern in which the client initiates a request and the server responds to it. The request message therefore contains only one line:

GET + The requested file path

Upon receiving the request, the server returns an HTML document encoded in an ASCII character stream.

Request: GET /index.html

Response: <html>Hello HTTP/0.9</html>

Process:

  • The client establishes a TCP connection with the server.
  • The client sends a GET request to the server, requesting data for the index.html page.
  • The server closes the TCP connection after sending the response.

Features: Simple, one request requires one connection.

HTTP/0.9 was simple, but it fully validated the feasibility of Web services.

  • First, it has only one command: GET.
  • It has no headers or other metadata describing the data, because requests at this point were very simple and there were not many data formats to describe.
  • After sending the content, the server closes the TCP connection. Note that a TCP connection is not the same thing as an HTTP request: a TCP connection can carry one HTTP request or many (HTTP/0.9 could not reuse connections, but HTTP/1.1 can, and HTTP/2 optimizes this further to improve transfer efficiency and server performance).
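The connect/request/read/close flow above can be sketched with a raw socket. A minimal sketch, assuming a hypothetical local server on port 8080 that still speaks the old one-line format (real servers no longer do):

```python
import socket

def build_request(path):
    # An HTTP/0.9 request is a single ASCII line: "GET <path>\r\n"
    # -- no version number, no headers.
    return f"GET {path}\r\n".encode("ascii")

def fetch(host, port, path):
    # One TCP connection per request: connect, send, then read until the
    # server closes the connection (there is no Content-Length yet).
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_request(path))
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
    return b"".join(chunks)

if __name__ == "__main__":
    print(fetch("localhost", 8080, "/index.html").decode())
```

Because the connection close itself marks the end of the response, the client simply reads until EOF.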

HTTP / 1.0

With the development of the Internet, HTTP/0.9 could no longer meet users' needs. Browsers wanted to use HTTP to transfer scripts, styles, images, audio, video, and other types of files. HTTP was therefore updated in 1996:

  • New methods such as HEAD and POST have been added
  • Added a response status code to mark possible error causes
  • The concept of protocol version number is introduced
  • The HTTP header concept was introduced to make HTTP more flexible in handling requests and responses
  • Data transmission is no longer limited to text

Request: the first line is the request command + version, followed by multiple header lines

GET / HTTP/1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)
Accept: */*

Response: Response header information + blank line (\r\n) + data section

HTTP/1.0 200 OK
Content-Type: text/plain
Content-Length: 2345
Expires: Thu, 05 Dec 2020 16:00:00 GMT
Last-Modified: Wed, 05 Aug 2020 15:55:28 GMT
Server: Apache 0.84

<html>
   <body>Hello World</body>
</html>
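The response layout (status line, headers, blank line, body) can be split apart mechanically. A minimal parsing sketch, not a full parser:

```python
def parse_response(raw: bytes):
    # Headers and body are separated by a blank line (\r\n\r\n).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    # Status line: "<version> <status code> <reason phrase>".
    version, status, reason = lines[0].split(" ", 2)
    # Each header line is a "Name: value" pair.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(": ")
        headers[name] = value
    return version, int(status), headers, body

raw = (b"HTTP/1.0 200 OK\r\n"
       b"Content-Type: text/plain\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")
version, status, headers, body = parse_response(raw)  # status == 200
```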

The main drawback of HTTP/1.0 is that, like HTTP/0.9, each TCP connection can carry only one HTTP request: once the server has sent the response, the connection is closed, and any later request must establish a new one. But the three-way handshake that sets up a TCP connection is relatively expensive, and data transfer is slow at the start of a connection because of TCP slow start and congestion avoidance. In the extreme case of many small, frequent requests, a connection has to be established and torn down for every tiny payload.

To mitigate this, version 1.0 used a non-standard Connection header field. If the client sent Connection: keep-alive, the server would keep the TCP connection open after sending the response, allowing the same connection to be reused. But because the field was not standard, implementations behaved inconsistently, so the problem was never fundamentally solved.

The core change in HTTP/1.0 is the addition of headers, whose contents are set as key-value pairs. The request uses the Accept field to tell the server what types of file the client accepts, and the response uses the Content-Type field to tell the browser what type of file is being returned. Headers solve more than file-type negotiation: they also enable caching, authentication, and many other features.

HTTP/1.0 is not a “standard”, but a reference document with no actual binding force.

HTTP / 1.1

With the rapid development of the Internet, HTTP/1.0 could no longer keep up. Its most fundamental problem was connection handling: every HTTP/1.0 exchange goes through three stages, establishing a connection, transferring data, and disconnecting. When a page references many external files, repeatedly setting up and tearing down connections adds a lot of network overhead.

To address the shortcomings of HTTP/1.0, HTTP/1.1, introduced in 1999, had the following features:

  • Persistent connections: a TCP connection is no longer closed by default and can be reused by multiple requests
  • Concurrent connections: multiple persistent connections may be opened to the same domain (alleviating the head-of-line blocking of a single persistent connection)
  • Pipelining: multiple requests can be sent on one TCP connection without waiting for each response (responses must come back in request order, so it is rarely used)
  • Added new methods such as PUT, DELETE, OPTIONS, and TRACE
  • Added caching fields such as If-Modified-Since and If-None-Match
  • Introduced the Range request header to support resumable downloads
  • Allowed responses to be sent in chunks, facilitating large file transfers
  • Made the Host header mandatory, making virtual hosting possible
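Chunked responses frame the body as a series of hex-length-prefixed chunks terminated by a zero-length chunk. A minimal decoder sketch (ignoring chunk extensions and trailers):

```python
def decode_chunked(data: bytes) -> bytes:
    # Each chunk: "<hex length>\r\n<payload>\r\n"; a 0-length chunk ends
    # the body, so the total size need not be known in advance.
    body, pos = [], 0
    while True:
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol], 16)
        if size == 0:
            break
        start = eol + 2
        body.append(data[start:start + size])
        pos = start + size + 2  # skip payload and its trailing \r\n
    return b"".join(body)

encoded = b"5\r\nHello\r\n6\r\n World\r\n0\r\n\r\n"
decoded = decode_chunked(encoded)  # b"Hello World"
```

This is what makes large or dynamically generated responses possible without a Content-Length header.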

HTTP pipelining

Pipelining means that within one TCP connection, multiple HTTP requests can be in flight at once: the client does not have to wait for the previous response before sending the next request. The server, however, must return responses in the same order the requests were received, so that the client can match each response to its request.
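On the wire, pipelining is just several requests written back-to-back on one connection before any response is read. A sketch that only builds the bytes (the host and paths are illustrative):

```python
def pipeline_requests(host: str, paths: list[str]) -> bytes:
    # Concatenate the requests; the server must answer them in this
    # exact order, which is the root of pipelining's HOL blocking.
    return b"".join(
        f"GET {p} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")
        for p in paths
    )

payload = pipeline_requests("example.com", ["/a.css", "/b.js"])
```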

As the web evolves, HTTP 1.1 still exposes some limitations:

  1. Although keep-alive allows some connections to be reused, domain sharding still forces multiple connections to be created, consuming resources and putting performance pressure on the server.
  2. Pipelining only partially resolved head-of-line blocking (HOLB). HTTP/1.1 tried to use pipelining to address it: the browser can issue multiple requests at once (same domain, same TCP connection), but responses must return in order, so if the first request is slow (such as processing a large image), subsequent responses wait behind it.
  3. Protocol overhead is high, with no compression scheme for headers. HTTP/1.1 headers can be large, raising transfer cost, and they barely change between requests, which particularly wastes traffic on mobile.

HTTP/1.1 reduces the performance cost of mass connection creation and teardown through persistent connections, but its concurrency is limited in two ways:

  • With persistent connections in HTTP/1.1, only one request can be processed at a time on a given connection; other requests are blocked until the current one completes. This is head-of-line blocking.
  • To reduce server pressure, browsers limit the number of HTTP connections per domain to 6 to 8. Domain sharding works around this limit by spreading resources across different domains (such as secondary subdomains), so the browser can open connections and make requests for each domain. This cleverly breaks the limit, but abusing the technique causes its own problems: every TCP connection costs a DNS query, a three-way handshake, and slow start, plus extra CPU and memory, and too many connections to the server can cause network and traffic congestion.

SPDY: an optimization of HTTP/1.x (an improved HTTP/1.1)

In 2009, Google proposed SPDY to optimize HTTP/1.x request latency and address its security problems, as follows:

  1. Reduced latency: to address HTTP's high latency, SPDY elegantly adopted multiplexing. Multiple request streams share one TCP connection, which addresses HOL blocking at the HTTP level, reduces latency, and improves bandwidth utilization.
  2. Request prioritization: a new problem with multiplexing is that important requests can be starved on the shared connection. SPDY lets each request be assigned a priority so important ones get responses first: for example, when loading a page, the HTML should arrive first, then the static resources and script files, so users see the page content as soon as possible.
  3. Header compression: HTTP/1.x headers are often redundant; choosing a suitable compression algorithm reduces packet size and count.
  4. Encrypted transport over HTTPS: greatly improves the reliability of transmitted data.
  5. Server push: the server can proactively push resource files to the client, which retains the right to decline them.

HTTP/2

HTTP/2, officially released in 2015, uses binary data instead of ASCII to improve transmission efficiency.

When sending a request, the client encapsulates the contents of each request into different numbered binary frames, which are then sent to the server simultaneously. When the server receives the data, it merges the frames with the same number into the complete request information. Similarly, the server returns the result and the client receives the result following the process of splitting and combining the frames.

With binary framing, the client needs only one connection per domain to satisfy all its communication needs. Using one connection to send multiple requests is called multiplexing, and each path is called a stream.

Features:

  • Binary protocol: in HTTP/1.1, headers are text and the body can be text or binary; in HTTP/2, both headers and body are binary and are collectively referred to as "frames"

  • Multiplexing: discarding the HTTP/1.1 pipeline, the client and server can send multiple requests and responses concurrently over the same TCP connection without ordering constraints. Because responses need not be processed sequentially, head-of-line blocking at the HTTP level is avoided.

  • Header compression: a dedicated algorithm (HPACK) compresses headers to reduce the amount of data transferred. The server and client each maintain a header table; every header field recorded in the table gets an index number, so subsequent messages need only send the index.

  • Server push: the server is allowed to proactively push data to the client

  • Data streams: because HTTP/2 frames are not sent sequentially, adjacent frames on the same TCP connection may belong to different requests or responses, so there must be a way to tell which exchange each frame belongs to. In HTTP/2, all the frames of a request or response form a stream, and each stream has a unique ID: streams initiated by the client have odd IDs, and streams initiated by the server (for server push) have even IDs. Every frame carries its stream ID, so server and client can sort frames into the right stream. Finally, the client can assign stream priorities; the higher the priority, the sooner the server responds.
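Every HTTP/2 frame begins with a fixed 9-byte header: a 24-bit payload length, an 8-bit type, an 8-bit flags field, and a reserved bit plus a 31-bit stream identifier. A sketch of packing and unpacking it:

```python
import struct

def pack_frame_header(length: int, ftype: int, flags: int, stream_id: int) -> bytes:
    # 24-bit length (high byte of a 32-bit pack is dropped),
    # then type, flags, and the 31-bit stream identifier.
    return struct.pack(">I", length)[1:] + struct.pack(
        ">BBI", ftype, flags, stream_id & 0x7FFFFFFF
    )

def unpack_frame_header(header: bytes):
    length = int.from_bytes(header[:3], "big")
    ftype, flags, sid = struct.unpack(">BBI", header[3:9])
    return length, ftype, flags, sid & 0x7FFFFFFF

# A DATA frame (type 0) with END_STREAM flag (0x1) on stream 5:
hdr = pack_frame_header(16384, 0, 0x1, 5)
```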

Disadvantages:

While HTTP/2 solves many of these problems, a similar problem still exists one level down, in TCP, the protocol on which the Web is still built. When a TCP packet is lost in transit, the receiver cannot process the packets that arrive after it until the sender retransmits the lost one. Because TCP knows nothing about the higher-level protocols it carries, a single lost packet blocks every in-flight HTTP/2 stream until the missing data is resent. The problem is especially acute on unreliable connections, which are common in the age of ubiquitous mobile devices.

HTTP/3

Because HTTP/2 uses binary framing multiplexing, only one TCP connection is usually used for transmission, and all subsequent data is blocked in the event of packet loss or network interruption.

The problems of HTTP/2 cannot be solved by the application layer alone, so new iterations of the protocol must update the transport layer. However, creating a new transport layer protocol is not easy. Transport protocols require support from hardware vendors and deployment by most network operators to become widespread.

Fortunately, there is another option. UDP is as widely supported as TCP, but it is simple enough to serve as the basis for custom protocols built on top of it. UDP packets are fire-and-forget: there is no handshake, no persistent connection, and no error correction. The main idea behind HTTP/3 is to abandon TCP in favor of QUIC (Quick UDP Internet Connections), a UDP-based protocol.

Unlike HTTP/2, which technically allows unencrypted communication, QUIC strictly requires encryption before a connection can be established. Encryption applies not only to the HTTP payload but to all data flowing through the connection, avoiding a host of security issues. Establishing the connection, negotiating the encryption parameters, and even sending the first batch of data are all consolidated into a single request/response cycle in QUIC, greatly reducing connection wait time. If the client has the cryptographic parameters cached locally, it can re-establish a connection to a known host with a simplified handshake.

To solve the transport-level head-of-line blocking problem, the data transmitted over a QUIC connection is divided into streams. A stream is a transient, independent "sub-connection" within a persistent QUIC connection. Each stream handles its own error correction and delivery guarantees but shares the connection-wide compression and encryption state. Each HTTP request runs on a separate stream, so a lost packet does not affect the data transfer of the other streams/requests.

Comparison between versions

| Protocol version | Core problem addressed | Solution |
| --- | --- | --- |
| 0.9 | HTML file transfer | Established the client-request/server-response communication flow |
| 1.0 | Transfer of different file types | Header fields |
| 1.1 | Cost of creating and closing TCP connections | Persistent, reusable connections |
| 2 | Limited concurrency | Binary framing |
| 3 | TCP blocking on packet loss | UDP-based transport (QUIC) |
| SPDY | HTTP/1.x request latency | Multiplexing |

TCP's three-way handshake 🤝 and four-way wave 🙋♂️

A complete HTTP exchange consists of a request and a response, so a connection channel must first be created over TCP

A TCP channel can carry multiple HTTP requests

Creating a TCP connection requires a three-way handshake to confirm that both sides can communicate, which avoids wasting network resources

Three-way handshake

First handshake: The client sends a SYN segment (seq = x) to the server and enters the SYN_SENT state, waiting for acknowledgment.

Second handshake: On receiving the SYN, the server acknowledges it (ack = x + 1) and sends a SYN of its own (seq = y); the server enters the SYN_RECV state.

Third handshake: On receiving the server's SYN+ACK, the client sends an ACK (ack = y + 1) to the server. Once it is sent, client and server enter the ESTABLISHED state, completing the three-way handshake.

The segments exchanged during the handshake carry no data. After the three handshakes, client and server begin transferring data. Ideally, once a TCP connection is established, it is maintained until either of the communicating parties actively closes it.
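The sequence/acknowledgment arithmetic above can be written out directly. A toy sketch of the bookkeeping, not a real TCP implementation:

```python
def three_way_handshake(x: int, y: int):
    """Given the client's initial sequence number x and the server's
    initial sequence number y, list each segment's flags, sequence
    number, and acknowledgment number."""
    return [
        ("SYN",     x,     None),   # client -> server
        ("SYN+ACK", y,     x + 1),  # server -> client
        ("ACK",     x + 1, y + 1),  # client -> server
    ]
```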

Four-way wave

Similar to the three-way handshake used to establish a connection, closing a TCP connection requires a "four-way wave".

First wave: The active closer sends a FIN to shut down data transfer from itself to the passive closer. In effect it says: I will not send you any more data (though if previously sent data has not been acknowledged, the active closer will still retransmit it). The active closer can still receive data at this point.

Second wave: On receiving the FIN, the passive closer sends an ACK acknowledging the received sequence number + 1 (like a SYN, a FIN occupies one sequence number).

Third wave: The passive closer sends its own FIN to close data transfer toward the active closer. In effect it says: my data has all been sent and I will not send you any more.

Fourth wave: On receiving the FIN, the active closer sends an ACK acknowledging the sequence number + 1. The four-way wave is now complete.

Differences between HTTP/1.0 and HTTP/1.1

A long connection

HTTP/1.1 supports persistent connections and pipelining, delivering multiple HTTP requests and responses over a single TCP connection and reducing the cost and latency of repeatedly establishing and closing connections; Connection: keep-alive is enabled by default. HTTP/1.0 uses short connections by default: browser and server maintain only a transient connection, each browser request opens a new TCP connection to the server, the server closes it as soon as the request completes, and the server tracks neither clients nor past requests. To request a persistent connection under HTTP/1.0, the client includes a Connection: keep-alive header field in the request, and the server includes the same header in the response if it is willing to keep the connection open.

Cache handling

HTTP/1.0 mainly used If-Modified-Since and Expires headers as caching criteria. HTTP/1.1 introduced more cache-control mechanisms, such as Entity Tags (ETag), If-Unmodified-Since, If-Match, and If-None-Match.

  • Expires: the browser may use its local cache until the specified expiration time, which indicates when the document should be considered stale; the value is in GMT. Example: Expires: Thu, 19 Nov 1981 08:52:00 GMT

  • Last-Modified: the time the requested object was last modified, used to determine whether the cache has expired; usually derived from the file's timestamp

  • Date: indicates the Date and time when the message is generated, that is, the current GMT time. For example, Date: Sun, 17 Mar 2013 08:12:54 GMT

  • If-Modified-Since: the modification time the client last saw for the resource, compared against Last-Modified on the server

  • Set-Cookie: used to send cookies to the client. Example: Set-Cookie: PHPSESSID=c0huq7pdkmm5gg6osoe3mgjmm3; path=/

  • Pragma: no-cache: the client uses this header field to state that the requested resource must not be served from a cache but must be fetched from the origin
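On the server side, a conditional request built on these fields boils down to a date comparison. A toy sketch (the helper name is hypothetical, not from any framework):

```python
from email.utils import parsedate_to_datetime

def should_send_304(request_headers: dict, last_modified: str) -> bool:
    # If the client's If-Modified-Since is at least as recent as the
    # resource's Last-Modified, answer 304 Not Modified with no body.
    ims = request_headers.get("If-Modified-Since")
    if ims is None:
        return False
    return parsedate_to_datetime(ims) >= parsedate_to_datetime(last_modified)

last_modified = "Wed, 05 Aug 2020 15:55:28 GMT"
fresh = should_send_304({"If-Modified-Since": last_modified}, last_modified)
```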

Bandwidth optimization

Under HTTP/1.0 there was some waste of bandwidth: for example, when the client needed only part of an object, the server still sent the whole thing, and resumable downloads were not supported. HTTP/1.1 introduced the Range request header, which allows requesting just part of a resource; the server answers with status 206 (Partial Content). This makes it easy for developers to make full use of bandwidth and connections.
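A sketch of the header arithmetic involved: building a Range request header and parsing the Content-Range header of a 206 response (toy helpers that handle only the simple `bytes a-b/total` form):

```python
def build_range(start: int, end: int) -> str:
    # Request bytes start..end inclusive, e.g. "Range: bytes=0-499".
    return f"bytes={start}-{end}"

def parse_content_range(value: str):
    # "bytes 0-499/1234" -> (first byte, last byte, total size)
    unit, _, rest = value.partition(" ")
    span, _, total = rest.partition("/")
    first, _, last = span.partition("-")
    return int(first), int(last), int(total)
```

A download manager resumes by issuing a new Range request starting at the byte after the last one it received.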

Error notification management (Status code)

HTTP/1.1 added 24 new status codes. For example, 409 (Conflict) indicates that the request conflicts with the current state of the resource, and 410 (Gone) indicates that a resource has been permanently removed from the server.

The Host header processing

HTTP/1.0 assumed each server was bound to a unique IP address, so the request line did not carry a hostname. But with the growth of virtual hosting technology, a single physical server can host multiple virtual hosts (multi-homed Web servers) that share one IP address. In HTTP/1.1, both request and response messages must support the Host header field, and a request without one is rejected with an error (400 Bad Request).
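The Host header is what lets one IP address serve many sites. A toy routing sketch (the site names and document roots are illustrative):

```python
# One IP address, several virtual hosts: the Host header decides
# which site's document root handles the request.
VHOSTS = {
    "blog.example.com": "/srv/blog",
    "shop.example.com": "/srv/shop",
}

def route(request_headers: dict):
    host = request_headers.get("Host")
    if host is None:
        return 400, None  # HTTP/1.1: a missing Host means 400 Bad Request
    root = VHOSTS.get(host.split(":")[0])  # strip an optional port
    return (200, root) if root else (404, None)
```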

Differences between HTTP/2 and SPDY

  1. HTTP2.0 supports plaintext HTTP transport, while SPDY enforces the use of HTTPS
  2. The HTTP2.0 header compression algorithm uses HPACK rather than DEFLATE, which is used by SPDY

Differences between HTTP/1.1 and HTTP/2

  • Binary framing: HTTP/1.x is a text-based protocol, which has natural drawbacks: text representations vary, and robust parsing must consider many scenarios, whereas binary is just combinations of 0 and 1. HTTP/2 therefore adopted a binary format, which is convenient to implement and robust.
  • Multiplexing: the connection is shared; each request carries an ID, so one connection can hold multiple requests, frames from different requests can be interleaved arbitrarily, and the receiver reassembles them by request ID.
  • Header compression: HTTP/2 uses an encoder to reduce the size of the headers that must be transferred. Both sides cache a table of header fields, avoiding repeated transmission of identical headers and reducing the data to transfer.
  • Server push: like SPDY, HTTP/2 has server push functionality.
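The shared header-table idea can be sketched as a toy; this shows only the indexing concept, while real HPACK additionally has a predefined static table and Huffman coding:

```python
class HeaderTable:
    # Toy dynamic table: the first time a header pair is sent, both
    # sides record it; afterwards only its index is transmitted.
    def __init__(self):
        self.table = []

    def encode(self, name, value):
        entry = (name, value)
        if entry in self.table:
            return self.table.index(entry)  # send just an index
        self.table.append(entry)
        return entry  # send the literal pair, and index it for next time

    def decode(self, token):
        if isinstance(token, int):
            return self.table[token]
        self.table.append(token)
        return token

sender, receiver = HeaderTable(), HeaderTable()
first = receiver.decode(sender.encode(":method", "GET"))   # literal pair
second = receiver.decode(sender.encode(":method", "GET"))  # index only
```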

The difference between HTTP/2 multiplexing and HTTP/ 1.x long connection multiplexing

  • HTTP/1.x: one request and one response per connection; a connection is established, used, and closed, so each request sets up its own connection;
  • HTTP/1.1 pipelining: several requests are queued on one connection, but the server must process and answer them serially, in order, so one slow request blocks everything behind it;
  • HTTP/2: multiple requests can execute in parallel on a single connection at the same time; a time-consuming request does not affect the normal execution of the others.

Performance comparison between HTTP/1.1 and HTTP/2

(Figure: a side-by-side demo loading the same page over HTTP/1.1 and HTTP/2.)

The demo loads the same large image, composed of many small tiles, over each protocol: HTTP/1.1 takes 7.41 seconds, while HTTP/2 takes only 1.47 seconds, nearly five times faster. Because many small tiles must be requested to assemble the large image, HTTP/1.1's serial requests are much slower than HTTP/2's parallel ones.

Differences between HTTP and HTTPS

HTTPS

HTTPS is the secure version of HTTP: a security-oriented HTTP channel. HTTPS runs HTTP over SSL, which sits between TCP/IP and the various application-layer protocols and provides security support for data communication.

The SSL protocol is divided into two layers:

  • The SSL Record Protocol sits on top of a reliable transport protocol (such as TCP) and provides higher-level protocols with basic functions such as data encapsulation, compression, and encryption.

  • The SSL Handshake Protocol runs on top of the SSL Record Protocol. It is used for identity authentication, encryption-algorithm negotiation, and encryption-key exchange between the communicating parties before data transmission begins.
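Application code rarely touches these layers directly. Python's ssl module, for instance, wraps a TCP socket in TLS; a sketch where the host name is illustrative and the network call is guarded:

```python
import socket
import ssl

def fetch_tls(host: str, port: int = 443) -> bytes:
    # create_default_context() enables certificate verification and
    # hostname checking -- the identity-authentication half of HTTPS.
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port)) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            req = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(req.encode("ascii"))
            chunks = []
            while data := tls.recv(4096):
                chunks.append(data)
    return b"".join(chunks)

if __name__ == "__main__":
    print(fetch_tls("example.com")[:80])
```

Everything written to `tls` is encrypted by the record layer; the handshake happens inside `wrap_socket`.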

The advantages of the HTTPS

  • Using the HTTPS protocol, users and servers can be authenticated, ensuring that data is sent to the correct clients and servers.

  • HTTPS is a network protocol that uses SSL and HTTP to encrypt transmission and authenticate identity. It is more secure than HTTP and protects data from theft and modification during transmission, ensuring data integrity.

  • HTTPS is the most secure solution under the current architecture, and while it is not absolutely secure, it significantly increases the cost of man-in-the-middle attacks.

The disadvantage of the HTTPS

  • HTTPS handshake takes a long time and lengthens the page loading time.

  • HTTPS connection caching is not as efficient as HTTP, increases data overhead, and even compromises existing security measures.

  • The HTTPS protocol is secure in scope and offers little protection from hacks, denial-of-service attacks, and server hijackings.

  • SSL certificates usually need to be bound to IP addresses. Multiple domain names cannot be bound to the same IP address. IPv4 resources cannot support such consumption.

  • After HTTPS is deployed, extra computing resources are consumed. For example, the ENCRYPTION algorithm of SSL and the number of SSL interactions consume computing resources and server costs.

  • The HTTPS protocol also has a limited range of encryption. Most importantly, the SSL certificate credit chain system is not secure, especially in cases where some countries can control the CA root certificate, man-in-the-middle attacks are just as feasible.

Differences from HTTP

  • HTTP is a hypertext transfer protocol that transmits information in plain text; HTTPS encrypts transmission with SSL.
  • HTTP uses port 80; HTTPS uses port 443.
  • HTTP connections are simple and stateless; HTTPS layers SSL under HTTP to encrypt transmission and authenticate identity, making it more secure than HTTP. Stateless means that packets are sent, transmitted, and received independently of each other; connectionless means that neither party retains information about the other for long.
  • HTTPS requires a certificate, which is rarely available for free.

