Do I really know the difference between HTTP and HTTPS?

Whenever I saw a question like “What’s the difference between HTTP and HTTPS?”, I knew that HTTPS is more secure than HTTP, but I could not say why, and I did not know many of the details. To find out, and to build a more systematic understanding of HTTP, I read Illustrated HTTP. I have to say the book is really easy to follow, and it cleared up many points I had only been vaguely aware of (protocols, messages, status codes, header fields, authentication, resource caching, web attacks, and so on). If you want to learn more about HTTP, you can also read HTTP: The Definitive Guide.

HTTP

HTTP, the Hypertext Transfer Protocol, is a data transfer protocol that specifies the rules for communication between clients and web servers and is used to transmit World Wide Web documents over the Internet. Its features are:

Stateless. Each request is finished once its response is returned, and each request is independent: how it is processed and what it returns are not directly related to the requests before or after it. It is not directly affected by the responses to earlier requests, and it does not directly affect the responses to later ones. The server keeps no client state, so the client must bring its own state with every request, as if every meeting were the first. For example, suppose a user has to log in before requesting some data. Because HTTP is stateless, the next request for that user’s data would require logging in again. This is annoying, so sessions and cookies are used for state management.
Plaintext transmission (unencrypted messages). Why is unencrypted communication a weakness? Because of the way the TCP/IP protocol family works, the contents of a communication can be observed anywhere along the communication path. Wherever in the world the server and client are, the network equipment, optical cables, and computers on the path between them are not anyone’s private property, so malicious snooping at some hop cannot be ruled out. Encrypted traffic can be observed just like unencrypted traffic. The difference is that when the communication is encrypted, an eavesdropper may be unable to work out what the message means, even though the encrypted message itself is still visible.
The identity of the communicating party is not verified, so impersonation is possible. HTTP requests and responses do not confirm who the other party is. That raises questions such as “Is this server really the host named in the URI of the request?” and “Was the response really returned to the client that made the request?” In HTTP, anyone can send a request because there is no step that confirms the communicating party. In addition, the server returns a response to any request it receives, no matter who sent it (as long as the web server does not restrict the sender’s IP address or port number). Not confirming the communicating party therefore carries the following risks:
1. The client cannot tell whether the web server that returned the response is the server the request was meant for. It could be a disguised web server.
2. The server cannot tell whether the client that received the response is the client it was meant for. It could be a disguised client.
3. It is impossible to tell whether the other party has access permission, which matters because some web servers hold important information and want to communicate only with specific users.
4. It is impossible to tell where a request came from or who sent it.
5. Even meaningless requests are accepted, so denial-of-service (DoS) attacks that flood the server with requests cannot be prevented.
Message integrity cannot be proven. There is no way to know whether a request or response was tampered with between being sent and being received. In other words, there is no way to confirm that the request or response that was sent is identical to the one that was received.
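
To make the statelessness point concrete, here is a minimal sketch of cookie-based session management. It is not a real web server: the handle function, the SESSIONS table, the /login and /profile paths, and the user name alice are all hypothetical names invented for illustration. Each call to handle sees only the headers passed to it, the way a stateless server sees each request in isolation.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side session table: session id -> user name.
SESSIONS = {}


def handle(path, headers):
    """Handle one request using only what arrives with that request."""
    cookie = SimpleCookie(headers.get("Cookie", ""))
    session_id = cookie["session_id"].value if "session_id" in cookie else None
    user = SESSIONS.get(session_id)

    if path == "/login":
        # Pretend the credentials were checked, then hand out a session id.
        session_id = secrets.token_hex(16)
        SESSIONS[session_id] = "alice"
        return 200, {"Set-Cookie": f"session_id={session_id}; HttpOnly"}, "logged in"
    if path == "/profile":
        if user is None:
            return 401, {}, "please log in"
        return 200, {}, f"profile of {user}"
    return 404, {}, "not found"


# Without a cookie the server has no idea who is asking.
status_before, _, _ = handle("/profile", {})

# After login, the client sends the cookie back with every later request.
_, login_headers, _ = handle("/login", {})
cookie_header = login_headers["Set-Cookie"].split(";")[0]
status_after, _, body_after = handle("/profile", {"Cookie": cookie_header})

print(status_before, status_after, body_after)
```

The state lives in the cookie the client resends and in the server-side table it points to, not in the protocol itself.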

HTTPS

HTTPS, short for Hypertext Transfer Protocol Secure, adds a security layer, TLS (formerly SSL), to HTTP. HTTP and HTTPS are both application-layer protocols that traditionally run over TCP, but they use different default ports: HTTP uses port 80 and HTTPS uses port 443.

HTTPS is not a new application-layer protocol. Only the part of HTTP that handles the communication interface is replaced by the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. Normally, HTTP communicates directly with TCP. When SSL is used, HTTP communicates with SSL first, and SSL then communicates with TCP. In short, HTTPS is HTTP wrapped in an SSL shell.
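
The layering can be sketched with Python’s standard socket and ssl modules. This is a rough illustration rather than a complete client: endpoint and open_connection are hypothetical helper names, and example.com is only a placeholder host. The point is that the TCP socket is opened the same way in both cases, and for https it is then wrapped in a TLS layer before any HTTP bytes are sent.

```python
import socket
import ssl
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def endpoint(url):
    """Return (scheme, host, port), falling back to the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.scheme, parts.hostname, port


def open_connection(url, timeout=5.0):
    """Open a TCP connection and, for https, wrap it in TLS."""
    scheme, host, port = endpoint(url)
    sock = socket.create_connection((host, port), timeout=timeout)  # plain TCP
    if scheme == "https":
        context = ssl.create_default_context()
        # HTTP will now talk to the TLS layer, and TLS talks to TCP.
        sock = context.wrap_socket(sock, server_hostname=host)
    return sock


print(endpoint("http://example.com/"))
print(endpoint("https://example.com/"))
```

Calling open_connection needs network access, so it is only defined here, not run.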

HTTPS solves the following problems:

Host trust. A server that uses HTTPS must obtain a certificate from a CA (a certificate authority, which acts as a third party trusted by both the client and the server). The certificate vouches for the server and is signed by the CA, so the client can check that the server is the one it claims to be. Today almost all critical applications, such as online shopping and online banking, are built on HTTPS. Clients can trust the certificate, and therefore trust the host.
Data leaks and tampering in transit. All communication between the server and the client is encrypted under HTTPS. Public-key encryption uses a pair of asymmetric keys, one called the private key and the other the public key. As the names suggest, the private key must not be known to anyone else, while the public key can be published freely for anyone to obtain. The sender encrypts the message with the recipient’s public key, and the recipient decrypts it with its own private key. The private key therefore never has to be sent, so there is no risk of it being eavesdropped on and stolen by an attacker. Recovering the original message from the ciphertext and the public key is extremely difficult, because it requires solving hard mathematical problems such as computing discrete logarithms. Cracking the encryption would only become feasible if very large integers could be factored quickly, which is not realistic with current technology.

Simply put: HTTP + authentication + encryption + integrity protection = HTTPS
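
On the client side, the authentication part of that sum shows up as certificate checking. The short sketch below only inspects the defaults of Python’s standard ssl module and makes no network connection. The variable names are mine, and the second context is there purely to show what gets lost when the checks are switched off.

```python
import ssl

# Default client context: trusts the system's CA certificates.
strict = ssl.create_default_context()
print(strict.verify_mode == ssl.CERT_REQUIRED)  # a CA-signed certificate is required
print(strict.check_hostname)                    # the certificate must match the host name

# Turning both checks off keeps the encryption but drops the authentication,
# so a disguised server would be accepted. Shown only for contrast.
lax = ssl.create_default_context()
lax.check_hostname = False  # must be disabled before verify_mode can be CERT_NONE
lax.verify_mode = ssl.CERT_NONE
```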

HTTPS communication steps

Step 1: The client starts SSL communication by sending a Client Hello message. The message contains the SSL version the client supports and a list of cipher suites (the encryption algorithms and key lengths it can use).
Step 2: If the server can do SSL communication, it responds with a Server Hello message. Like the client’s message, it contains the SSL version and a cipher suite. The server selects its cipher suite from the list it received from the client.
Step 3: The server then sends a Certificate message, which contains its public key certificate.
Step 4: The server sends a Server Hello Done message to tell the client that the initial phase of the SSL handshake negotiation is complete.
Step 5: After this first phase of the handshake, the client responds with a Client Key Exchange message. The message contains a random secret called the pre-master secret, which is used for encrypting the communication. The message is encrypted with the public key from Step 3.
Step 6: The client sends a Change Cipher Spec message. It tells the server that everything the client sends after this message will be encrypted with keys derived from the pre-master secret.
Step 7: The client sends a Finished message, which contains a checksum over all the handshake messages exchanged so far. Whether the handshake succeeds depends on whether the server can decrypt this message correctly.
Step 8: The server also sends a Change Cipher Spec message.
Step 9: The server also sends a Finished message.
Step 10: Once the server and client have exchanged Finished messages, the SSL connection is established, and from here on the communication is protected by SSL. Application-layer communication starts at this point: the client sends its HTTP request.
Step 11: Application-layer communication continues: the server sends the HTTP response.
Step 12: Finally, the client closes the connection by sending a close_notify message. After that, a TCP FIN packet is sent to close the TCP connection.
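
The order of the handshake messages in Steps 1 to 9, and the cipher suite selection in Step 2, can be sketched as plain data plus two small functions. This is a simplified model, not a TLS implementation: HANDSHAKE, choose_cipher_suite, and first_protected_message are hypothetical names, the suite names are only sample strings, and picking the first match in the client’s order is an assumption (a real server may apply its own preference order).

```python
# (sender, message) pairs in the order described in Steps 1 to 9.
HANDSHAKE = [
    ("client", "ClientHello"),
    ("server", "ServerHello"),
    ("server", "Certificate"),
    ("server", "ServerHelloDone"),
    ("client", "ClientKeyExchange"),
    ("client", "ChangeCipherSpec"),
    ("client", "Finished"),
    ("server", "ChangeCipherSpec"),
    ("server", "Finished"),
]


def choose_cipher_suite(client_offer, server_supported):
    """Step 2: pick a suite from the client's list that the server also supports."""
    for suite in client_offer:
        if suite in server_supported:
            return suite
    raise ValueError("no cipher suite in common")


def first_protected_message(sender):
    """Return the first message a side sends after its own ChangeCipherSpec."""
    names = [name for who, name in HANDSHAKE if who == sender]
    return names[names.index("ChangeCipherSpec") + 1]


offer = ["TLS_RSA_WITH_AES_256_CBC_SHA", "TLS_RSA_WITH_AES_128_CBC_SHA"]
supported = {"TLS_RSA_WITH_AES_128_CBC_SHA"}
print(choose_cipher_suite(offer, supported))
print(first_protected_message("client"))
```

For both sides the first protected message is Finished, which is why a successful decryption of Finished confirms that the two ends derived the same keys.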

Below is a diagram of the process. The figure shows the whole process of establishing HTTPS communication when only a server-side public key certificate (server certificate) is used.

HTTPS encryption technology

The dilemma of shared key encryption

Encrypting and decrypting with the same key is called a common key cryptosystem, also known as shared key or symmetric key encryption. To use shared key encryption, you must also get the key to the other party. But how do you deliver it safely? If the key is sent over the Internet and the communication is being monitored, the key can fall into an attacker’s hands, and the encryption becomes pointless. The received key also has to be kept safe. In other words, sending the key risks eavesdropping, but without the key the other party cannot decrypt anything. And if there were a way to deliver the key safely, the data could be delivered safely the same way, so encryption would not be needed in the first place.
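
The dilemma can be shown with a toy shared-key cipher. The cipher below (XOR with a SHA-256-based keystream) is an assumption made purely for illustration and is not secure; real HTTPS uses vetted algorithms such as AES. The names keystream, xor_cipher, and wire are hypothetical. What matters is that the same key both encrypts and decrypts, so anyone who captures the key in transit can read the message.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


key = b"shared-secret"
message = b"card=1234"
ciphertext = xor_cipher(key, message)

# The receiver can decrypt only because the sender also sent the key.
wire = [key, ciphertext]  # everything an eavesdropper on the line can see
received = xor_cipher(wire[0], wire[1])

# The eavesdropper saw the same two items and can do exactly the same thing.
stolen = xor_cipher(wire[0], wire[1])
print(received == message, stolen == message)
```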

Public key encryption using two keys

Public key encryption solves this difficulty of shared key encryption. It uses a pair of asymmetric keys: a private key and a public key. As the names suggest, the private key must not be known to anyone else, whereas the public key can be distributed freely to anyone. The sender encrypts the message with the recipient’s public key, and the recipient decrypts it with its own private key. The private key never has to be sent, so there is no risk of it being eavesdropped on and stolen by an attacker. In addition, recovering the original message from the ciphertext and the public key is extremely difficult, because it requires solving hard mathematical problems such as computing discrete logarithms. Cracking the encryption would only become feasible if very large integers could be factored quickly, which is not realistic with current technology.
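
A textbook RSA example with tiny numbers shows the idea. The choice of RSA and of these particular primes is mine, not something the text prescribes, and numbers this small offer no security at all. The factor function illustrates the last point of the paragraph: whoever can factor n can rebuild the private key, which is easy here and impractical for the key sizes used in practice.

```python
# Toy RSA key generation with deliberately tiny primes (requires Python 3.8+).
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent: modular inverse of e

public_key = (e, n)
private_key = (d, n)


def encrypt(m, key):
    """Encrypt integer m (0 <= m < n) with the public key."""
    exponent, modulus = key
    return pow(m, exponent, modulus)


def decrypt(c, key):
    """Decrypt integer c with the private key."""
    exponent, modulus = key
    return pow(c, exponent, modulus)


def factor(number):
    """Trial division: fine for a toy modulus, hopeless for real key sizes."""
    f = 2
    while f * f <= number:
        if number % f == 0:
            return f, number // f
        f += 1
    raise ValueError("no non-trivial factor found")


ciphertext = encrypt(65, public_key)
print(decrypt(ciphertext, private_key))

# An attacker who factors n recovers the private exponent from public data.
fp, fq = factor(n)
recovered_d = pow(e, -1, (fp - 1) * (fq - 1))
print(recovered_d == d)
```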

HTTPS uses a hybrid encryption mechanism

HTTPS uses a hybrid of shared key encryption and public key encryption. Public key encryption makes it possible to exchange a key securely, so in principle it could be used for all communication. However, public key encryption is slower than shared key encryption. HTTPS therefore combines the two to get the advantages of each: public key encryption is used for the key exchange, and shared key encryption is used for the messages exchanged afterwards. The master secret generated in the preceding figure serves as the shared key that encrypts the messages exchanged later.
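
The hybrid flow can be sketched end to end with toy parts. Everything here is a simplification under stated assumptions: the key pair is textbook RSA built from two small Mersenne primes, session_key uses a single SHA-256 hash in place of the real TLS key derivation, and xor_stream is a toy cipher rather than AES. All names are hypothetical. Only the shape matters: one slow public-key operation protects a random secret, and the fast shared-key cipher protects everything after it.

```python
import hashlib
import secrets

# Server's toy RSA key pair (insecure sizes, requires Python 3.8+).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, known only to the server


def session_key(pre_master: int) -> bytes:
    """Stand-in for the real key derivation: hash the secret into a key."""
    return hashlib.sha256(pre_master.to_bytes(16, "big")).digest()


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy shared-key cipher: XOR with a hash-based keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Client: choose a random secret and send it under the server's public key.
pre_master = secrets.randbits(128)
wrapped = pow(pre_master, E, N)        # public key step (slow, done once)

# Server: unwrap the secret with the private key.
unwrapped = pow(wrapped, D, N)

# Both sides now hold the same shared key without ever sending it in the clear.
client_key = session_key(pre_master)
server_key = session_key(unwrapped)

request = xor_stream(client_key, b"GET / HTTP/1.1")   # shared key step (fast)
print(xor_stream(server_key, request))
```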

Problems with HTTPS

Is HTTPS secure enough? Nothing in the world is absolutely secure. In 2014, for example, the Heartbleed bug in OpenSSL affected websites around the world, including Yahoo and Stack Overflow. In general, though, HTTPS is secure enough for most people, and at least more secure than HTTP.

Another problem with HTTPS: SSL slows things down

SSL is slow in two ways. One is that communication is slower. The other is that processing is slower, because SSL consumes a lot of CPU and memory.

Compared with plain HTTP, the added network load can make communication 2 to 100 times slower. On top of the TCP connection and the HTTP requests and responses, the SSL communication itself is also required, which inevitably increases the total amount of traffic.
The other point is that SSL has to encrypt everything. Both the server and the client must encrypt and decrypt, so they consume more hardware resources than with HTTP, which increases the load. There is no fundamental fix for this slowdown, but hardware such as an SSL accelerator can help. This dedicated hardware can perform SSL computations several times faster than software, and offloading SSL processing to it takes that load off the server.

Why not use HTTPS all the time?

Encrypted communication consumes more CPU and memory than plaintext communication. If every communication is encrypted, a considerable amount of resources is consumed, and the number of requests a single machine can handle is bound to fall. For this reason, sites often use HTTP for non-sensitive information and switch to HTTPS only when sensitive data such as personal information is involved. The load is especially significant for heavily visited websites. To save resources, they do not encrypt all content, only the content that needs to be hidden.
The wish to save money on certificates is another factor. Certificates are essential for HTTPS communication, and they are usually purchased from a certificate authority (CA). Prices vary somewhat from one authority to another. Services that do not want to pay for certificates, and personal websites, may simply choose HTTP.

Reference books

Illustrated HTTP

HTTP: The Definitive Guide