Preface

After studying HTTP through a video course, I felt it was worth writing an article to record what I learned, both to reinforce my understanding and to make it easier to look things up later.

Origin of HTTP

The origin of HTTP goes back to Tim Berners-Lee, the father of the World Wide Web, who proposed the HTTP protocol in October 1990. In 1991 he published a document describing the protocol and its implementation. Although it was never published by any standards body, it has come to be known as the original version of HTTP, HTTP/0.9.

HTTP/0.9

The HTTP/0.9 implementation was very simple: it supported only GET requests, transmitted only plain text, and had no request or response headers. For the academic exchange it was designed for at the time, that was enough.

HTTP/1.0

However, as the Internet grew, it became clear that HTTP/0.9 was no longer sufficient and that a more capable version of HTTP was needed for everyday use, so HTTP/1.0 was introduced. The new features of HTTP/1.0 are as follows:

  • Added request and response headers
  • Supports transferring multiple content types, not just plain text
  • Added the POST and HEAD request methods
  • Added a caching mechanism and identity authentication
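
To make the difference between the two versions concrete, here is a minimal sketch of what a request looks like on the wire in each of them. The helper names `build_http09_request` and `build_http10_request` are my own and only build the bytes; they do not open a connection.

```python
def build_http09_request(path: str) -> bytes:
    # HTTP/0.9: a single line, GET only, no version string and no headers.
    return f"GET {path}\r\n".encode("ascii")


def build_http10_request(method: str, path: str, headers: dict) -> bytes:
    # HTTP/1.0: a request line with a version, then headers, then a blank line.
    if method not in {"GET", "POST", "HEAD"}:
        raise ValueError(f"unsupported HTTP/1.0 method: {method}")
    lines = [f"{method} {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    print(build_http09_request("/index.html"))
    print(build_http10_request("HEAD", "/", {"User-Agent": "demo"}))
```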

Disadvantages

Although HTTP/1.0 is more flexible than HTTP/0.9, it still has some disadvantages:

  • HTTP/1.0 uses short-lived connections: every request needs its own TCP connection, with a three-way handshake to open it and a four-way teardown to close it, which costs performance and increases latency
  • HTTP/1.0 has no Host field. If one physical machine hosts multiple virtual hosts, the server cannot tell which virtual host a request is meant for and so cannot apply the corresponding logic

HTTP/1.1

Because of the shortcomings above, and because the growing Internet needed a more complete version of HTTP, HTTP/1.1 came into being. The improvements in HTTP/1.1 are as follows:

  • Added Connection: keep-alive, so that a TCP connection is not closed immediately after each request. The connection can be closed after a timeout or after a maximum number of requests, so multiple requests reuse the same TCP connection, reducing the number of RTTs and the latency. RTT (round-trip time) is the time between the client sending a packet and receiving the server's reply; a TCP three-way handshake costs 1.5 RTT.
  • Added the Host field to distinguish between the domain names of different virtual hosts, so that each can be handled with its own logic
  • Added the Range field to allow range requests
  • Improved the caching mechanism, adding the ETag field to complement Last-Modified
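
The saving from keep-alive can be estimated with simple arithmetic. The sketch below uses the 1.5 RTT handshake cost quoted above and assumes, as a simplification of my own, that each request/response pair costs exactly one RTT and that connection teardown is free.

```python
HANDSHAKE_RTT = 1.5  # cost of a TCP three-way handshake, as quoted in the text
REQUEST_RTT = 1.0    # assumption: one round trip per request/response pair


def total_rtt(num_requests: int, keep_alive: bool) -> float:
    """Estimate the RTTs needed to fetch num_requests resources sequentially."""
    if num_requests <= 0:
        return 0.0
    # Without keep-alive every request pays for its own handshake.
    connections = 1 if keep_alive else num_requests
    return connections * HANDSHAKE_RTT + num_requests * REQUEST_RTT


if __name__ == "__main__":
    print(total_rtt(10, keep_alive=False))  # 25.0
    print(total_rtt(10, keep_alive=True))   # 11.5
```

Under these assumptions, ten sequential requests drop from 25 RTT to 11.5 RTT once the connection is reused.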

Disadvantages

Although HTTP/1.1 is excellent, it still has some problems, mainly the following:

  • Bandwidth will not be fully utilized

Bandwidth refers to the maximum amount of data that a network can send or receive per unit of time.

Because HTTP/1.1 can open multiple TCP connections (Chrome allows up to six per domain), each connection competes for bandwidth. If bandwidth is insufficient, requests and responses slow down, and if the affected resource is a critical one such as JS or CSS, rendering of the home page is delayed. TCP also has a slow start phase: the transmission rate does not reach its stable value immediately but ramps up gradually, much like a car accelerating from a standstill. For example, the theoretical rate of a 100 Mbit/s broadband link is 12.5 MB/s, but the first few requests may reach only 2.5 MB/s. As a result, the bandwidth is not fully utilized.

  • Although connections are persistent, the server returns responses in order, so if the response to one request is held up for some reason, it blocks all the responses behind it, causing head-of-line blocking
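
The bandwidth figure and the slow start ramp can be worked through in a few lines. This is a deliberately simplified model: the window simply doubles every round trip until it hits a cap, and the initial window of 10 segments and the cap of 100 are assumptions chosen for illustration, not values from any particular TCP stack.

```python
def link_rate_mb_per_s(megabits_per_second: float) -> float:
    # 8 bits per byte: a 100 Mbit/s link moves at most 12.5 MB per second.
    return megabits_per_second / 8


def slow_start_windows(initial_segments: int, max_segments: int, rounds: int) -> list:
    """Congestion window (in segments) at the start of each round trip."""
    windows = []
    cwnd = initial_segments
    for _ in range(rounds):
        windows.append(cwnd)
        # Simplified slow start: double per RTT, capped at max_segments.
        cwnd = min(cwnd * 2, max_segments)
    return windows


if __name__ == "__main__":
    print(link_rate_mb_per_s(100))          # 12.5
    print(slow_start_windows(10, 100, 5))   # [10, 20, 40, 80, 100]
```

A short request that finishes within the first one or two round trips never sees the full window, which is why small resources on a fresh connection fall well short of the link rate.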
  • Requests can only be initiated by the client, and the server cannot push messages
  • There are many repeated fields in the request header and response header, which wastes traffic
  • Request and response headers are sent without compression
  • Requests are in plain text and are not secure

HTTP/2

To solve these problems, the Internet standards community introduced HTTP/2, which is based on SPDY, a protocol Google had created to address the shortcomings of HTTP/1.1. Google later abandoned SPDY and devoted itself to building HTTP/2. Here is a look at the problems HTTP/2 solves:

  • HTTP/1.1 opens too many TCP connections, which compete for bandwidth and leave it underused. HTTP/2 therefore establishes only one TCP connection per domain name, and all requests and responses for that domain are carried over it, which solves the problem above

  • The biggest problem of HTTP/1.1 is head-of-line blocking. To solve it, HTTP/2 introduced multiplexing: requests are sent in parallel and responses are received in parallel within a single TCP connection. The key question is then how to keep the data from getting mixed up, and the answer is binary framing, the core technology of HTTP/2. The basic principle of binary framing is as follows:

    1: A binary framing layer is added on top of the TLS layer

    2: Requests sent in parallel pass through the binary framing layer, which splits each one into small binary frames. Every frame carries the ID of the stream it belongs to and is sent to the server

    3: The server receives the frames, reassembles each complete message according to the stream ID, processes it, and sends the response data back in parallel

    4: The response data also passes through the binary framing layer and is tagged with the corresponding stream ID. The browser receives the frames and reassembles the complete responses according to the stream ID

  • HTTP/2 supports header compression, reducing wasted traffic

  • In practice HTTP/2 runs over encrypted connections (browsers support it only over TLS), which ensures transmission security

  • Requests can be given priorities, so that when they are sent in parallel the more important responses are returned first
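
The binary framing steps described above can be sketched as a toy model. Real HTTP/2 frames have a fixed binary header with a length, type, flags, and stream identifier; here a frame is just a `(stream_id, chunk)` tuple, and the function names are my own.

```python
from collections import defaultdict
from itertools import zip_longest


def split_into_frames(stream_id: int, payload: bytes, frame_size: int) -> list:
    # Cut one message into small chunks, each tagged with its stream ID.
    return [(stream_id, payload[i:i + frame_size])
            for i in range(0, len(payload), frame_size)]


def interleave(*frame_lists) -> list:
    # Send frames from all streams round-robin over the single connection.
    wire = []
    for group in zip_longest(*frame_lists):
        wire.extend(frame for frame in group if frame is not None)
    return wire


def reassemble(wire: list) -> dict:
    # The receiver groups chunks by stream ID to rebuild each message.
    streams = defaultdict(bytes)
    for stream_id, chunk in wire:
        streams[stream_id] += chunk
    return dict(streams)


if __name__ == "__main__":
    css = split_into_frames(1, b"GET /style.css", 4)
    js = split_into_frames(3, b"GET /app.js", 4)
    wire = interleave(css, js)
    print(wire)
    print(reassemble(wire))
```

Because every frame names its stream, the frames of different requests can be mixed freely on the wire and still be put back together correctly on the other side.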

Disadvantages

HTTP/2 is well done and reflects the combined wisdom of the Internet community, but one problem remains: blocking. Because HTTP/2 uses a single TCP connection, the packets of all parallel requests travel inside that one connection. TCP guarantees reliable, in-order delivery, so once a packet is lost, delivery stalls until that packet has been retransmitted, and every request on the connection is blocked. When many packets are lost, tests show HTTP/2 performing even worse than HTTP/1.1. HTTP/1.1 uses several TCP connections, so a stall on one does not affect the others, whereas HTTP/2 has only one: when packets from several requests are lost, each must be retransmitted and the responses to all other requests wait. So although HTTP/2 brings many optimizations, packet loss is what it fears most.
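
The effect of one lost packet on in-order delivery can be shown with a small simulation. It models only the receiver's side of TCP: packets are numbered from 0, and nothing can be handed to the application until every earlier packet has arrived. The function name is my own.

```python
def delivered_after_each_arrival(arrivals: list) -> list:
    """For each arriving sequence number, how many packets are deliverable in order."""
    received = set()
    next_expected = 0
    delivered_counts = []
    for seq in arrivals:
        received.add(seq)
        # Advance past every packet that is now contiguous from the start.
        while next_expected in received:
            next_expected += 1
        delivered_counts.append(next_expected)
    return delivered_counts


if __name__ == "__main__":
    # Packet 1 is lost and only arrives after a retransmission.
    print(delivered_after_each_arrival([0, 2, 3, 4, 1]))  # [1, 1, 1, 1, 5]
```

Packets 2, 3, and 4 may belong to entirely different HTTP/2 streams, yet they all sit in the buffer until packet 1 is retransmitted.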

HTTP/3

HTTP/3 is a protocol that is still being rolled out, and it will take some time to become widespread. As mentioned above, both HTTP/1.1 and HTTP/2 suffer from problems rooted in TCP. Changing TCP itself is not realistic, because a change could cause other problems, so HTTP/3 takes a different approach: it replaces TCP with UDP and thereby avoids TCP's drawbacks. Using UDP alone is not enough, however, because UDP has drawbacks of its own, so the reliability and security of the transmitted data still have to be provided on top of it.

[Figure: HTTP/3 concept map]

The QUIC layer is added on top of the UDP layer, and much of the logic that compensates for UDP's shortcomings lives in this layer. HTTP/3 can be regarded as a much more complete version of the protocol. See an introductory article on HTTP/3 to learn more.

The HTTPS protocol

HTTPS is built on TCP, with an SSL/TLS layer added between HTTP and TCP to encrypt the transmitted data. Before explaining HTTPS encryption, we need to talk about symmetric and asymmetric encryption.

Symmetric encryption

Symmetric encryption means that both parties exchange information using a single key: the same key is used for encryption and for decryption. This method is relatively simple and fast, but it is relatively insecure, because the key itself has to be shared.
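
The "one key for both directions" property is easy to see in a toy cipher. The sketch below XORs the data with a keystream derived from the key via SHA-256; it is for illustration only and is not a secure cipher (real TLS uses algorithms such as AES). The function names are my own.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    # Stretch the key into `length` pseudo-random bytes using a counter.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    # XOR is its own inverse, so this one function encrypts and decrypts.
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


if __name__ == "__main__":
    shared_key = b"shared secret"
    ciphertext = xor_cipher(shared_key, b"hello server")
    print(ciphertext.hex())
    print(xor_cipher(shared_key, ciphertext))
```

Anyone who learns `shared_key` can run the same function and read the message, which is exactly the weakness when the key has to travel over the network.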

Asymmetric encryption

Many articles on the Internet describe asymmetric encryption as encryption in which exchanging information requires two keys, a public key and a private key. That is true: any scheme that uses such a key pair can be called asymmetric encryption. However, for two-way communication there are actually two pairs of public and private keys, four keys in total: one pair for the client and one pair for the server. Asymmetric encryption is a little convoluted, so here is the process in my own words:

  • The client sends a request and provides its public key to the server. The client keeps its private key
  • The server obtains the public key of the client, encrypts its own public key (server) with the public key of the client, and returns it to the client
  • Because the server's public key is encrypted with the client's public key, only the client's private key can decrypt it. After decrypting, the client has the server's public key
  • Now the client has the server’s public key and its own private key, and the server has the client’s public key and its own private key
  • When the client and server transmit data, each encrypts with the other party's public key and decrypts with its own private key, thus achieving security
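
The exchange above can be tried out with textbook RSA on tiny numbers. This is a sketch only: the primes are far too small to be secure, there is no padding, and messages must be integers smaller than the modulus. The function names and the chosen primes are my own.

```python
def make_keypair(p: int, q: int, e: int):
    """Build a toy RSA key pair from two small primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (needs Python 3.8+)
    return (e, n), (d, n)  # (public key, private key)


def encrypt(public_key, message: int) -> int:
    e, n = public_key
    return pow(message, e, n)


def decrypt(private_key, ciphertext: int) -> int:
    d, n = private_key
    return pow(ciphertext, d, n)


# Each side has its own pair: four keys in total.
client_public, client_private = make_keypair(61, 53, 17)
server_public, server_private = make_keypair(47, 71, 79)

if __name__ == "__main__":
    # Client -> server: encrypt with the server's public key.
    to_server = encrypt(server_public, 42)
    print(decrypt(server_private, to_server))  # 42
    # Server -> client: encrypt with the client's public key.
    to_client = encrypt(client_public, 65)
    print(decrypt(client_private, to_client))  # 65
```

Each message is encrypted with the recipient's public key and can only be opened with the recipient's own private key, which never leaves its owner.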

Asymmetric encryption is more complex and takes more time. However, neither of the schemes above is fully secure, because both are exposed to an attack known as a man-in-the-middle attack.

Man-in-the-middle attack

A man-in-the-middle attack is an attack in which a third party intercepts the content transmitted between the client and the server and may modify it. Symmetric encryption requires the key to be transmitted, so a third party can easily intercept it and read the information. With asymmetric encryption, merely intercepting the client's or server's public key is useless, because data encrypted with a public key can only be decrypted with the matching private key. A man-in-the-middle is rarely that naive, though: he can intercept your public key and substitute his own. The other side then encrypts with the attacker's public key, and the attacker decrypts the data with his own private key.

The CA certificate

Since HTTPS is supposed to be secure, its encryption process is certainly not that simple. HTTPS encryption is in fact more complex and relies on CA certificates. To see why a CA certificate is trustworthy, we need to look at the whole process.

  • The server operator first has to obtain a certificate from a CA (certificate authority). This step cannot be faked, just as the seller cannot be swapped out by someone else when you buy something on Taobao. When purchasing a certificate, the applicant provides its public key, domain name, and other information. The CA, for its part, verifies online and offline that the applicant is legitimate, for example that it is a properly registered company or organization, and issues the certificate only after all verification is complete. So what does a certificate contain?

    1: The certificate contains the purchaser's domain name and public key, the issuing CA, and other information

    2: It also contains a digital signature, which is very important. How is the digital signature produced? The CA first computes a digest (using an irreversible hash algorithm) of the information in item 1, and then encrypts the digest with the CA's private key to produce the signature

  • The server sends the certificate to the client (more on this later), and the client verifies that the certificate is valid. Once the certificate has been validated, the client can obtain the server's public key from it, which is the prerequisite for generating the HTTPS session key.

How to ensure the validity of the certificate

Browsers and operating systems already cooperate with a number of certificate authorities, and the root certificates of many of them are preinstalled in our operating system. If a CA's root certificate is missing, you have to install it yourself. Suppose our operating system already has the root certificate of the relevant CA. When the client receives the certificate sent by the server, it takes the CA's public key from the root certificate. If that public key can decrypt the certificate's signature, the certificate was issued by that CA and is legitimate. But couldn't a man-in-the-middle simply obtain a certificate of his own? As mentioned above, issuing a certificate is not that easy and requires online and offline verification, so there is no need to worry about a man-in-the-middle forging one.

How to ensure that the contents of the certificate are not modified

As described above, the legitimacy of the certificate is guaranteed, so an attacker cannot swap in a different certificate. He could still try to tamper with the certificate's contents, so how do we make sure the certificate has not been modified? This is where the digital signature inside the certificate comes into play. If a man-in-the-middle changes the contents of the certificate, the client computes a digest of the contents it received and uses the CA's public key from its root certificate to decrypt the signature. If the two digests differ, the certificate has been tampered with. Some will ask: what if the attacker changes the certificate information and then replaces the signature as well? That is not possible. Even if the attacker has the CA's public key and computes a new digest, the signature must be produced by encrypting the digest with the CA's private key, which an attacker does not have. So we consider this scenario infeasible.
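
The signing and verification steps described above can be sketched with textbook RSA and SHA-256. Everything here is a toy under stated assumptions: the CA key is built from two known Mersenne primes (far too weak for real use), the digest is reduced modulo the key size, and the certificate is just a byte string. Real certificates use X.509 encoding and padded signatures.

```python
import hashlib

# Toy CA key pair (assumption for illustration; never use such primes in practice).
P = 2**61 - 1
Q = 2**89 - 1
CA_N = P * Q
CA_E = 65537                                # CA public exponent (in the root certificate)
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))     # CA private exponent (kept secret by the CA)


def digest_as_int(data: bytes) -> int:
    # Irreversible digest of the certificate information, as an integer below CA_N.
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % CA_N


def ca_sign(cert_info: bytes) -> int:
    # The CA "encrypts" the digest with its private key to produce the signature.
    return pow(digest_as_int(cert_info), CA_D, CA_N)


def client_verify(cert_info: bytes, signature: int) -> bool:
    # The client "decrypts" the signature with the CA public key and compares digests.
    return pow(signature, CA_E, CA_N) == digest_as_int(cert_info)


if __name__ == "__main__":
    cert = b"domain=example.com;public_key=abc123;issuer=Toy CA"
    sig = ca_sign(cert)
    print(client_verify(cert, sig))
    tampered = b"domain=evil.example;public_key=abc123;issuer=Toy CA"
    print(client_verify(tampered, sig))
```

An attacker who edits the certificate cannot produce a matching signature without `CA_D`, so the comparison in `client_verify` fails for the tampered contents.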

HTTPS handshake

We can now obtain the server's public key in a very secure way. So does HTTPS simply use this public key to encrypt and transmit all the information? No, it is more complicated than that. The CA certificate is only one part of the HTTPS encryption process, so next we focus on what the whole handshake looks like.

Explanation

A general explanation of the handshake diagram:

  • The first RTT of an HTTPS request is an ordinary TCP three-way handshake, assuming the user did not type https://
  • Because people do not necessarily type https:// when entering the site name, the backend needs to return a 302 redirect to the HTTPS address
  • Since the default HTTPS port is 443, a new TCP three-way handshake has to be established
  • TLS handshake, stage one
  • TLS handshake, stage two

The first stage of the TLS handshake

Client Hello

In the first stage of the TLS handshake, the client first sends a Client Hello message, which contains:

  • The protocol versions supported by the client
  • A 32-byte random number used later to generate the session key
  • The list of cipher suites the client supports (very important)

Server Hello

After receiving the Client Hello, the server must reply with a Server Hello, which contains:

  • The protocol version confirmed for the communication
  • A 32-byte random number used later to generate the session key
  • The confirmed cipher suite. This is important, because different cipher suites lead to different subsequent steps
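
The hello exchange boils down to two things: each side contributes a random value, and the server picks one cipher suite that both sides support. A minimal sketch, assuming the server walks its own preference list (real servers can also be configured to honour the client's order); the function names are my own, while the suite names are standard TLS identifiers.

```python
import os


def make_hello_random() -> bytes:
    # Each hello message carries a 32-byte random value.
    return os.urandom(32)


def choose_cipher_suite(client_suites: list, server_preference: list) -> str:
    # Pick the first suite in the server's preference order that the client offers.
    for suite in server_preference:
        if suite in client_suites:
            return suite
    raise ValueError("no cipher suite in common")


if __name__ == "__main__":
    client_hello = {
        "random": make_hello_random(),
        "cipher_suites": [
            "TLS_RSA_WITH_AES_128_CBC_SHA",
            "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
        ],
    }
    server_preference = [
        "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_RSA_WITH_AES_128_CBC_SHA",
    ]
    server_hello = {
        "random": make_hello_random(),
        "cipher_suite": choose_cipher_suite(client_hello["cipher_suites"], server_preference),
    }
    print(server_hello["cipher_suite"])
```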

Server Certificate

In this step the server sends its certificate to the client. After receiving it, the client first verifies that the certificate is legitimate and then checks that it has not been modified. If both checks pass, the client reads the contents of the certificate and obtains the server's public key. This answers the question raised earlier: the public key is not the key used to encrypt the final session, but a key used in generating the final symmetric key. It is an asymmetric-encryption public key whose purpose is to encrypt the pre-master key. HTTPS encryption is therefore a combination of asymmetric and symmetric encryption, and the actual communication is encrypted symmetrically.

Server Key Exchange

This step is optional and depends on the cipher suite that was finally confirmed. There are two main types of key exchange: RSA and ECDHE. If the cipher suite's key exchange is not one of DHE_DSS, DHE_RSA, ECDHE_ECDSA, or ECDHE_RSA, this step is skipped. Otherwise, additional data is sent to agree on the key. For details, see the articles on the Tengine HTTPS principles and the HTTPS handshake process listed in the references.
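
The rule above can be written as a small check on the cipher suite name. This sketch assumes suite names of the form `TLS_<key exchange>_WITH_<cipher>`, and the helper names are my own.

```python
# Key exchange methods named in the text as requiring a Server Key Exchange message.
EPHEMERAL_KEY_EXCHANGES = {"DHE_DSS", "DHE_RSA", "ECDHE_ECDSA", "ECDHE_RSA"}


def key_exchange_of(suite: str) -> str:
    # "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256" -> "ECDHE_RSA"
    prefix, _, _ = suite.partition("_WITH_")
    return prefix[len("TLS_"):] if prefix.startswith("TLS_") else prefix


def needs_server_key_exchange(suite: str) -> bool:
    return key_exchange_of(suite) in EPHEMERAL_KEY_EXCHANGES


if __name__ == "__main__":
    for name in ("TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256", "TLS_RSA_WITH_AES_128_CBC_SHA"):
        print(name, needs_server_key_exchange(name))
```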

Server Hello Done

This tells the client that the server has finished sending this time and is waiting for the client to respond

The second stage of the TLS handshake

Client Key Exchange

In this step, if the Server Key Exchange step was skipped, the client generates a random pre-master key, encrypts it with the server's public key (this is where the server's public key is used), and sends it to the server. If the Server Key Exchange step was not skipped, see the two articles referenced above for how the pre-master key is generated.

Change Cipher Spec

This message tells the other party that the session key has been computed and that subsequent messages will be encrypted with it. As described below, the client now has the pre-master key, the client random number, and the server random number, and can therefore derive the session key.

Client Finished

Tells the server that the client's part of the handshake is complete

Change Cipher Spec Message

Tells the client that the server will take the pre-master key the client sent (the server decrypts it with its own private key), combine it with the client random number and the server random number, and generate the same session key

Server Finished

Tells the client that the server's part of the handshake is complete

Difference between pre-master key, master key and session key

  • The pre-master key is generated mainly by the client in the second stage of the TLS handshake.
  • The master key is generated from the pre-master key, the client random number, and the server random number using the PRF (pseudorandom function)
  • The session key is generated from the master key, the client random number, and the server random number using the PRF. The symmetric key used by TLS is this session key
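
The chain from pre-master key to session key can be sketched with the PRF as TLS 1.2 defines it (HMAC-SHA256 expansion with the labels "master secret" and "key expansion"). This is a minimal sketch assuming TLS 1.2; other versions derive keys differently, and the helper names are my own.

```python
import hashlib
import hmac


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    # Expand a secret into `length` bytes: A(0) = seed, A(i) = HMAC(secret, A(i-1)).
    out = b""
    a = seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    return p_sha256(secret, label + seed, length)


def derive_master_key(pre_master: bytes, client_random: bytes, server_random: bytes) -> bytes:
    # The master key is always 48 bytes.
    return prf(pre_master, b"master secret", client_random + server_random, 48)


def derive_session_keys(master: bytes, client_random: bytes, server_random: bytes, length: int) -> bytes:
    # The key block is later sliced into the actual encryption and MAC keys.
    return prf(master, b"key expansion", server_random + client_random, length)


if __name__ == "__main__":
    pre_master = bytes(range(48))          # placeholder values for illustration
    client_random = b"\x01" * 32
    server_random = b"\x02" * 32
    master = derive_master_key(pre_master, client_random, server_random)
    print(len(master), len(derive_session_keys(master, client_random, server_random, 64)))
```

Both sides hold the same three inputs after the handshake, so each can run this derivation independently and arrive at the same session key without ever sending it over the network.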

References

History of HTTP (HTTP0.9, HTTP1.0, HTTP1.1, HTTP2, HTTP3) comb notes

The most complete interpretation, practice and debugging of Tengine HTTPS principles in history

The HTTPS handshake process is described in detail