As the Internet has grown rapidly, security has become more and more important. People are “greedy” and far from satisfied with the security HTTP offers, so HTTPS was created. What makes HTTPS so much more secure than HTTP?







Problems with HTTP

  • Data is not encrypted: HTTP transmits everything in clear text. Once an attacker intercepts this clear text, the user’s privacy is completely exposed. The solution is to encrypt the data.






  • No authentication: in HTTP, the client and server cannot verify each other’s identity, because HTTP has no standard mechanism for authenticating the peer. The solution is to use certificates to verify the identity of the other party.




  • Data is vulnerable to tampering: HTTP data passes through many intermediate nodes in transit, any of which can modify it, and HTTP gives the client and server no way to check that the data received is exactly what the sender sent. The solution is to use algorithms to verify data integrity.






These problems with HTTP lead to significant security issues. Hence the birth of HTTPS.








HTTPS

HTTPS is not a new protocol. It is essentially HTTP carried over a security layer: instead of sending HTTP messages directly over TCP, HTTPS sends them through an SSL/TLS connection. You can think of it as HTTPS = HTTP + SSL/TLS.

SSL/TLS is a family of security protocols. SSL stands for Secure Sockets Layer and TLS for Transport Layer Security. SSL was the earliest of these security-layer protocols; after vulnerabilities were discovered in it, it was superseded by TLS. Today TLS 1.2 and TLS 1.3 are the most widely used versions.

Protocol version: to establish a secure connection, both parties must negotiate and agree on the same protocol version.
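A minimal sketch of the negotiation rule in Python (the language used for all examples here). The `negotiate` function and its version list are hypothetical and only model "pick the highest version both sides support"; the real TLS messages are more involved. The last lines use the standard `ssl` module's `minimum_version` setting, which is a real way to restrict what a client will agree to.

```python
import ssl

# Hypothetical ordering used only by this toy model (oldest to newest).
VERSION_ORDER = ["TLSv1.0", "TLSv1.1", "TLSv1.2", "TLSv1.3"]

def negotiate(client_versions, server_versions) -> str:
    """Toy negotiation: choose the highest version supported by both sides."""
    common = set(client_versions) & set(server_versions)
    if not common:
        # No shared version: the handshake cannot proceed.
        raise ValueError("no common protocol version")
    return max(common, key=VERSION_ORDER.index)

# Real-world knob: refuse to negotiate anything older than TLS 1.2.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

print(negotiate(["TLSv1.2", "TLSv1.3"], ["TLSv1.1", "TLSv1.2"]))  # TLSv1.2
```

Raising an error when there is no overlap mirrors the rule above: without an agreed version there is no secure connection at all.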



Before we get into how the browser transfers data over HTTPS, let’s cover some background.

Encryption algorithms

HTTP does not encrypt data, so HTTPS has to add encryption. There are many encryption algorithms, and they fall into two broad families: symmetric and asymmetric.

Key: a parameter used when converting plaintext into ciphertext (encryption) or ciphertext back into plaintext (decryption).

1. Symmetric algorithms

Symmetric algorithms, as their name suggests, use the same key to encrypt and decrypt data.

Advantages: very efficient, even for large amounts of data.

Disadvantages: the biggest problem is getting the key to both parties securely. The parties must first agree on a key by communicating, and an attacker may eavesdrop on or tamper with the key during that negotiation. Using one fixed key instead does not help, because anyone who obtains it can decrypt everything.
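To make "same key for both directions" concrete, here is a toy symmetric cipher. It is an assumption-laden stand-in, not a real algorithm: the Python standard library has no AES, so it XORs the data with a keystream built from SHA-256. It is not secure; it only shows that whoever holds `shared_key` can both encrypt and decrypt.

```python
import hashlib
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: concatenated SHA-256(key || nonce || counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # XOR with the keystream; applying it twice with the same key returns the input,
    # so one function both encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))

shared_key = secrets.token_bytes(32)   # both parties must already hold this key
nonce = secrets.token_bytes(16)        # must never repeat for the same key

ciphertext = xor_cipher(shared_key, nonce, b"hello https")
plaintext = xor_cipher(shared_key, nonce, ciphertext)
print(plaintext)  # b'hello https'
```

The hard part the text describes is not visible in the code at all: how did both sides get `shared_key` without an eavesdropper seeing it?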

2. Asymmetric algorithms

Unlike symmetric algorithms, asymmetric algorithms use two different keys.

  • Asymmetric algorithms have two keys: a public key and a private key.
  • The public key can be shared with anyone; the private key is kept secret by its owner and must never be disclosed.
  • Data encrypted with the public key can only be decrypted with the matching private key. Conversely, data encrypted with the private key can only be decrypted with the matching public key.

Advantages: it solves the symmetric algorithms’ problem of being “unable to exchange keys safely”, because the public key can be sent openly.

Disadvantages: poor performance, far less efficient than symmetric algorithms.
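A sketch of both directions described above, using textbook RSA with the classic tiny primes 61 and 53. This is an illustration only (real keys are thousands of bits and use padding), and `pow(e, -1, phi)` needs Python 3.8 or later.

```python
# Textbook RSA with tiny primes: for illustration only, never for real use.
p, q = 61, 53
n = p * q                  # modulus, part of both keys (3233)
phi = (p - 1) * (q - 1)    # Euler's totient of n
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e

public_key = (e, n)
private_key = (d, n)

def apply_key(key, m: int) -> int:
    exp, mod = key
    return pow(m, exp, mod)

message = 65  # a message must be an integer smaller than n

ciphertext = apply_key(public_key, message)     # encrypt with the public key
decrypted = apply_key(private_key, ciphertext)  # only the private key recovers it

signature = apply_key(private_key, message)     # "encrypt" with the private key
verified = apply_key(public_key, signature)     # anyone can check it with the public key

print(d, ciphertext, decrypted, verified)  # 2753 2790 65 65
```

Every operation is a modular exponentiation on large numbers in real use, which is why asymmetric algorithms are much slower than symmetric ones.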

3. Symmetric + asymmetric algorithms

Symmetric algorithms are efficient but cannot exchange keys safely; asymmetric algorithms can exchange keys safely but are slow. So we can use an asymmetric algorithm to exchange keys during the first negotiation, then switch to a symmetric algorithm for the rest of the conversation, combining the advantages of both. The steps are:




  • 1. When the client initiates a request, the server first returns its public key to the client in plaintext.
  • 2. The client encrypts its own key with the server’s public key and sends it to the server.
  • 3. The server decrypts this data with its private key and obtains the client’s key.
  • 4. The server encrypts its own key with the client’s key and sends it to the client.
  • 5. The client decrypts it with its own key and obtains the server’s key.

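This is the communication process of the symmetric + asymmetric scheme. Note, however, that the public key in step 1 travels in plaintext: an attacker in the middle could replace it with their own public key, and neither side could tell. This is why the server’s identity must also be authenticated.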
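The five steps can be sketched end to end in Python. Everything here is a toy under stated assumptions: the server key pair is textbook RSA with tiny primes, the symmetric cipher is a SHA-256 XOR keystream standing in for AES, and `int_to_key` is a hypothetical helper that stretches the small RSA-sized number into a 32-byte key. None of it is secure.

```python
import hashlib
import secrets

# Toy server key pair: textbook RSA with tiny primes (illustration only).
P, Q, E = 61, 53, 17
N = P * Q                          # 3233
D = pow(E, -1, (P - 1) * (Q - 1))  # server's private exponent

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 keystream (same call encrypts and decrypts)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def int_to_key(x: int) -> bytes:
    # Hypothetical helper: stretch the small integer into a 32-byte symmetric key.
    return hashlib.sha256(x.to_bytes(2, "big")).digest()

# 1. The server sends its public key (E, N) in plaintext.
server_public = (E, N)

# 2. The client picks its own key and encrypts it with the server's public key.
client_secret = secrets.randbelow(N - 2) + 2
wrapped = pow(client_secret, server_public[0], server_public[1])

# 3. The server decrypts it with its private key and obtains the client's key.
recovered = pow(wrapped, D, N)

# 4. The server encrypts its own key with the client's key and sends it.
server_key = secrets.token_bytes(32)
to_client = xor_cipher(int_to_key(recovered), server_key)

# 5. The client decrypts it with its own key; both sides now share server_key.
shared_key = xor_cipher(int_to_key(client_secret), to_client)

# From here on, traffic uses the fast symmetric cipher.
request = xor_cipher(shared_key, b"GET / HTTP/1.1")
print(xor_cipher(server_key, request))  # b'GET / HTTP/1.1'
```

Nothing in step 1 proves that `server_public` really belongs to the server, which is exactly the gap that certificates fill.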


The digital certificate

HTTP cannot authenticate the identity of either party, so HTTPS uses digital certificates to do it. So what is a certificate, and how is it used to verify identity?

1. What is a digital certificate?

A digital certificate is a digital credential that marks the identity of a party in Internet communication, so that parties can identify each other online. It is issued by an authoritative, trusted third party called a Certificate Authority (CA).

A digital certificate is like an ID card: a credential used to verify identity.

A server’s digital certificate contains the server’s information: its public key, the certificate signature, information about the issuing authority, and so on. After obtaining and verifying the server’s certificate, the client can trust the server’s public key and use it in the symmetric + asymmetric scheme described above.

In short, a digital certificate authenticates the source of the data and lets the client securely obtain the server’s public key for encrypted communication.


2. Certificate generation

Digital certificates are issued by a Certificate Authority (CA) or one of its subordinate CAs. The root certificate is the basis of the trust relationship between the CA and its users. Digital certificates must be downloaded and installed before users can use them.




Certificate generation: the CA hashes the basic information in the certificate and encrypts the hash with its own private key. The steps are:




  • The server generates its own key pair and sends the public key, together with some of its identity information, to a Certificate Authority (CA).
  • The CA runs a hash algorithm over the server information to obtain a fixed-length digest (128 bits with the older MD5; typically 256 bits with SHA-256 today), then encrypts the digest with its own private key to produce the certificate’s digital signature.
  • The CA sends the server information (in plaintext), the digital signature, and the CA’s own information back to the server; together these form the certificate.
  • When a client later makes a request, the server returns this certificate to the client.
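A toy version of the CA's signing step, assuming a hypothetical CA whose textbook-RSA key pair is built from the known primes 2^31 - 1 and 2^61 - 1 (far too small for real use). To keep the digest below that small modulus, SHA-256 is truncated to 64 bits; real certificates sign the full hash with padding. `issue_certificate`, the issuer name, and the certificate fields are illustrative, not a real certificate format.

```python
import hashlib

# Hypothetical CA key pair (textbook RSA, illustration only).
CA_P, CA_Q, CA_E = 2**31 - 1, 2**61 - 1, 65537
CA_N = CA_P * CA_Q
CA_D = pow(CA_E, -1, (CA_P - 1) * (CA_Q - 1))  # CA private exponent

def digest_int(data: bytes) -> int:
    # SHA-256 truncated to 64 bits so the value fits below the toy modulus.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def issue_certificate(server_info: bytes) -> dict:
    """Hash the server info and 'encrypt' the digest with the CA private key."""
    signature = pow(digest_int(server_info), CA_D, CA_N)
    return {
        "server_info": server_info,  # plaintext: domain name, server public key, ...
        "signature": signature,      # the CA's digital signature
        "issuer": "Toy Root CA",     # hypothetical issuer name
    }

cert = issue_certificate(b"CN=example.com; pubkey=(17, 3233)")
print(cert["issuer"])  # Toy Root CA
```

Anyone holding the CA's public key `(CA_E, CA_N)` can later check this signature, but only the CA could have produced it.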





3. Distribution of certificates

A CA can distribute certificates to users in several ways.

  • Out-of-band distribution: an offline mode. For example, a software operator generates the key pair on the customer’s behalf, downloads the certificate from the CA for the customer, then stores the private key and the downloaded certificate on a floppy disk and hands it to the user. The advantage is that users do not need to go online to download the certificate.
  • In-band distribution: users download the digital certificate from the Internet to their computers. When downloading, they must present a “reference number” and an “authorization code” to prove their identity to the CA. This approach is cheaper.
  • Public database query: the CA also publishes certificates centrally in a public database, where users can look them up.





4. How is a digital certificate verified?

Here’s an example:

Student A verifies student B’s certificate

  • Student A has obtained student B’s certificate (student B’s information, the digital signature issued by the CA, and information about the CA)
  • Student A uses the CA’s public key to decrypt the CA’s signature in B’s certificate and obtains a digest S1
  • Student A applies the same hash algorithm to student B’s information and obtains a digest S2
  • Student A compares S1 and S2: if they differ, the information has been tampered with
  • Student A also checks the certificate’s validity period, its revocation status (certificate revocation list (CRL) or OCSP), and the issuer’s signature (certificate chain)

Certificate validity period: when a certificate is issued, an expiration date is set. It has typically been one year for individuals and three years for enterprises; publicly trusted TLS server certificates are now limited to at most 398 days. Regular renewal is a chance to generate new key pairs, which is good for security.

Certificate revocation list (CRL): once a certificate has been issued it cannot be recalled, so when a certificate is revoked the CA can only publish a notice declaring it invalid. This “notice” is called the certificate revocation list.
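The checks student A performs can be sketched as one function. Assumptions, all hypothetical: a toy textbook-RSA CA key pair (primes 2^31 - 1 and 2^61 - 1), SHA-256 truncated to 64 bits so it fits the toy modulus, simple dictionary fields instead of a real X.509 structure, and a plain set of revoked serial numbers standing in for a CRL.

```python
import hashlib
from datetime import datetime, timedelta, timezone

# Hypothetical CA key pair (textbook RSA, illustration only).
CA_P, CA_Q, CA_E = 2**31 - 1, 2**61 - 1, 65537
CA_N = CA_P * CA_Q
CA_D = pow(CA_E, -1, (CA_P - 1) * (CA_Q - 1))

def digest_int(data: bytes) -> int:
    # SHA-256 truncated to 64 bits so it fits below the toy modulus.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

now = datetime.now(timezone.utc)
b_info = b"CN=student-b; serial=1001"
b_cert = {
    "info": b_info,
    "signature": pow(digest_int(b_info), CA_D, CA_N),  # signed by the toy CA
    "not_before": now - timedelta(days=1),
    "not_after": now + timedelta(days=365),
    "serial": 1001,
}
revoked_serials = {2002}  # hypothetical CRL contents

def verify(cert: dict, at: datetime) -> bool:
    s1 = pow(cert["signature"], CA_E, CA_N)  # "decrypt" the signature with the CA public key
    s2 = digest_int(cert["info"])            # hash the certificate information ourselves
    if s1 != s2:
        return False  # tampered with, or not signed by this CA
    if not (cert["not_before"] <= at <= cert["not_after"]):
        return False  # expired or not yet valid
    if cert["serial"] in revoked_serials:
        return False  # listed on the revocation list
    return True

print(verify(b_cert, now))  # True
```

The order of the checks is a design choice for the sketch; a certificate must pass all of them, so any order gives the same verdict.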





5. Certificate chain

When verifying digital certificates, we mentioned a certificate chain. What is a certificate chain? What does a certificate chain do?

Recall that the CA signs a certificate with its private key, and the user verifies it with the CA’s public key. But the CA’s public key is not obtained through this process, so how do we know there has been no hijacking (an attacker signing with their own private key while swapping the CA’s public key for their own)? This is what the certificate chain is for.

Open any HTTPS website and you will see a small lock icon on the left side of the address bar. Click it to view the certificate information.




The certificate hierarchy is GlobalSign Root CA –> GlobalSign Organization Validation CA –> baidu.com

  • End-user: here, baidu.com. Its certificate contains Baidu’s public key, which visitors use to encrypt data sent to Baidu
  • Intermediates: the Issuer mentioned above. Intermediate certificates vouch for the holder of the public key, certifying that the end-user certificate used for HTTPS really comes from Baidu. There can be several levels of intermediate certificates, which means there may be several levels of Issuer
  • Root: the highest-level Issuer, which vouches for the legitimacy of the intermediates

This forms a chain of trust. Its ultimate purpose is to ensure that the end-user certificate, and therefore the public key in it, can be trusted.







We have covered how a single certificate is verified. How, then, is a certificate chain verified?

  • To obtain the end-user’s public key, you need the end-user’s certificate, because that is where the public key is stored
  • To prove that this end-user certificate is trustworthy, check whether it has been certified by an intermediate authority, that is, whether it carries a valid digital signature from that authority
  • Is the intermediate authority trustworthy just because it signed? Verification must continue upward: check that the intermediate’s certificate carries a valid signature from the next higher authority
  • The chain ends at the Root CA, whose certificate is self-signed; its signature can only be trusted unconditionally, which is why root certificates come pre-installed in operating systems and browsers
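The upward walk can be sketched with three toy key pairs. Assumptions: textbook RSA built from known Mersenne primes (reusing primes across keys is insecure and done only to stay self-contained), SHA-256 truncated to 64 bits, and dictionaries in place of real certificates. `issue`, `verify_chain`, and the CA names are hypothetical; real path validation (RFC 5280) checks much more.

```python
import hashlib

E = 65537

def make_keypair(p: int, q: int):
    """Toy textbook-RSA key pair from two known primes; returns (public, private)."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))
    return (E, n), (d, n)

def digest(data: bytes) -> int:
    # SHA-256 truncated to 64 bits so it fits below every toy modulus.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def to_be_signed(cert: dict) -> bytes:
    return f"{cert['subject']}|{cert['issuer']}|{cert['public_key']}".encode()

def issue(subject: str, subject_pub, issuer: str, issuer_priv) -> dict:
    cert = {"subject": subject, "issuer": issuer, "public_key": subject_pub}
    d, n = issuer_priv
    cert["signature"] = pow(digest(to_be_signed(cert)), d, n)
    return cert

def signature_ok(cert: dict, issuer_pub) -> bool:
    e, n = issuer_pub
    return pow(cert["signature"], e, n) == digest(to_be_signed(cert))

root_pub, root_priv = make_keypair(2**89 - 1, 2**107 - 1)
inter_pub, inter_priv = make_keypair(2**61 - 1, 2**127 - 1)
leaf_pub, _ = make_keypair(2**31 - 1, 2**89 - 1)

root_cert = issue("Root CA", root_pub, "Root CA", root_priv)  # self-signed
inter_cert = issue("Intermediate CA", inter_pub, "Root CA", root_priv)
leaf_cert = issue("example.com", leaf_pub, "Intermediate CA", inter_priv)

# Root certificates the client already trusts (pre-installed).
trusted_roots = {"Root CA": root_cert}

def verify_chain(chain: list) -> bool:
    """chain = [end-user cert, intermediate, ...] as sent by the server."""
    for cert, issuer in zip(chain, chain[1:]):
        # Each certificate must be signed by the next one up.
        if cert["issuer"] != issuer["subject"] or not signature_ok(cert, issuer["public_key"]):
            return False
    root = trusted_roots.get(chain[-1]["issuer"])
    # The top of the chain must be signed by a root we already trust.
    return root is not None and signature_ok(chain[-1], root["public_key"])

print(verify_chain([leaf_cert, inter_cert]))  # True
```

An attacker can mint their own key pairs and certificates, but cannot make the last link verify against a root the client already trusts.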






That covers the basics of the certificate chain.


6. Hash algorithm

A hash algorithm, also called a digest algorithm, compresses a data object of any size into a digest. For a given hash algorithm, every digest has the same fixed length and format, so it serves as a “fingerprint” of the data. Even a tiny change to the original data produces a completely different “digital fingerprint”.

Hash algorithms are often used to check whether two data objects are identical. The same data always produces the same “digital fingerprint”. The converse does not strictly hold: because hash collisions are possible, two different data objects can share a fingerprint. A common real-world example is comparing the MD5 digests of two files to decide whether their contents are the same (MD5 is no longer collision-resistant, so SHA-256 is preferred where deliberate tampering is a concern).
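The fixed length and "small change, very different fingerprint" properties are easy to see with the standard `hashlib` module. `fingerprint` is just an illustrative wrapper around SHA-256.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """SHA-256 digest as a hex string: 64 hex characters (256 bits) for any input."""
    return hashlib.sha256(data).hexdigest()

a = fingerprint(b"hello https")
b = fingerprint(b"hello https")
c = fingerprint(b"hello httpS")  # one character changed

print(len(a))  # 64
print(a == b)  # True: same data, same fingerprint
print(a == c)  # False: a tiny change gives a completely different digest
```

The same approach works for files: hash both contents and compare the hex strings instead of comparing the files byte by byte.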

Differences between hash algorithms and encryption algorithms:

  • Encryption algorithm: the ciphertext can be decrypted. The purpose is to store or transmit data securely, and the original can be recovered with the key, so the process is reversible.
  • Hash algorithm: essentially produces a “digital fingerprint”. The original text cannot be recovered from the digest at the algorithm level, so the process is irreversible. So-called “decryption” with rainbow tables or dictionaries is essentially brute force: guessing inputs and comparing their hashes.
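A small sketch of why dictionary "decryption" is really guessing: nothing is ever reversed, the attacker just hashes candidates until one matches. The leaked digest and the `wordlist` are made-up examples.

```python
import hashlib

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

leaked = sha256_hex("letmein")  # the attacker only sees this digest

# Dictionary attack: hash each guess and compare; nothing is "decrypted".
wordlist = ["123456", "password", "qwerty", "letmein", "dragon"]

def crack(target: str, candidates):
    for guess in candidates:
        if sha256_hex(guess) == target:
            return guess
    return None  # the original was not among the guesses

print(crack(leaked, wordlist))  # letmein
```

If the original text is not in the list, the attack simply fails, which is the practical meaning of "irreversible".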



HTTPS Connection Process

With that in mind, let’s look at the process of establishing an HTTPS connection.

The flow is as follows:

  • 1. First, the client sends a request containing its TLS version, the cipher suites it supports, and a client-generated random number R1.
  • 2. After receiving the request, the server confirms the TLS version, selects one of the cipher suites offered by the client, and generates a server random number R2. It then returns the selected cipher suite, R2, and its certificate (which contains the server’s public key and the CA’s signature).
  • 3. After receiving the server’s certificate, the client verifies it (see the certificate verification section above). With RSA key exchange, the client then generates a third random number, the pre-master secret, encrypts it with the server’s public key, and sends it to the server. From R1, R2 and the pre-master secret, the client derives the session key and the keys P1 to P6.
  • 4. The server decrypts the pre-master secret with its private key and derives the same session key and P1 to P6.
  • 5. The client hashes the handshake messages, encrypts the result with P1 (from step 3) to produce the Encrypted Handshake Message, and sends it to the server as its Finished message, indicating that its side of the handshake is complete.
  • 6. The server uses P1 (from step 4) to verify the client’s Encrypted Handshake Message.
  • 7. If verification passes, the server hashes the handshake messages, encrypts the result with P2 to produce its own Encrypted Handshake Message, and sends it to the client as its Finished message.
  • 8. Once the client has verified the server’s Encrypted Handshake Message with P2, the HTTPS handshake is complete.
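A sketch of how both sides can arrive at the same keys independently, using the TLS 1.2 pseudo-random function (P_SHA256 from RFC 5246) built on the standard `hmac` module. Assumptions: the six slices of the key block stand in for this article's P1 to P6, their sizes are illustrative, and the handshake transcript is a placeholder. Real TLS also encrypts the Finished value; here we only compare the `verify_data` each side computes.

```python
import hashlib
import hmac
import secrets

def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_SHA256 expansion (RFC 5246, section 5)."""
    out, a = b"", seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) = HMAC(secret, A(i-1))
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]

def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    return p_sha256(secret, label + seed, length)

r1 = secrets.token_bytes(32)          # client random
r2 = secrets.token_bytes(32)          # server random
pre_master = secrets.token_bytes(48)  # sent encrypted with the server's public key

def derive(pre_master: bytes, r1: bytes, r2: bytes):
    master = prf(pre_master, b"master secret", r1 + r2, 48)
    # Assumed sizes: two 32-byte MAC keys, two 16-byte encryption keys, two 16-byte IVs.
    sizes = [32, 32, 16, 16, 16, 16]
    block = prf(master, b"key expansion", r2 + r1, sum(sizes))
    keys, pos = [], 0
    for size in sizes:
        keys.append(block[pos:pos + size])
        pos += size
    return master, keys  # keys stand in for P1..P6

client_master, client_keys = derive(pre_master, r1, r2)
server_master, server_keys = derive(pre_master, r1, r2)  # computed independently

transcript = b"ClientHello|ServerHello|Certificate|ClientKeyExchange"  # placeholder
handshake_hash = hashlib.sha256(transcript).digest()
client_finished = prf(client_master, b"client finished", handshake_hash, 12)
server_check = prf(server_master, b"client finished", handshake_hash, 12)
print(hmac.compare_digest(client_finished, server_check))  # True
```

If anyone altered a handshake message in transit, the two sides would hash different transcripts and the Finished check would fail.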

At this point, the two sides can safely communicate.

Questions:

1. What is the purpose of the client random number R1 and the server random number R2?
A: After the random numbers are exchanged, both parties have R1 and R2. The client generates a third number, the pre-master secret, and derives the session key from these three numbers. The server derives the same session key from the same three numbers.

2. What is the role of P1 to P6?
A: P1 to P6 are six keys used for subsequent verification and encryption.

3. What is the Encrypted Handshake Message for?
A: It applies a simple operation (hash + encryption) to all the data sent and received so far. First, it tells the peer what data was sent and received during the handshake, so both sides can make sure no one has tampered with the handshake messages. Second, it checks that the key is correct: the Encrypted Handshake Message is the first message encrypted with the symmetric key, so if it encrypts and decrypts correctly, the symmetric key is correct.



How Charles captures HTTPS traffic

As an extension, here is how Charles captures HTTPS traffic.

Charles acts as a middleman: to the client, Charles is the server; to the server, Charles is the client.




Simply put, Charles acts as a man-in-the-middle proxy: it obtains the server certificate’s public key and the symmetric keys of the HTTPS connection. This only works if the client has chosen to install and trust Charles’s CA certificate; otherwise the client raises a certificate warning and aborts the connection. So HTTPS is quite secure.
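On the client side, this "alarm" is just certificate verification. A sketch with Python's standard `ssl` module: a default client context only accepts certificates that chain to the system trust store and match the host name, so a proxy's self-made certificate fails. `make_client_context` is a hypothetical helper, and `extra_ca_file` would be the path to an exported proxy root certificate (an assumption, not something Charles provides under that name).

```python
import ssl
from typing import Optional

def make_client_context(extra_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS settings.

    By default only CAs in the system trust store are accepted, so a proxy that
    presents a certificate signed by its own CA fails verification and the
    handshake is aborted. Loading extra_ca_file is the code-level equivalent of
    "installing and trusting" the proxy's CA certificate.
    """
    ctx = ssl.create_default_context()
    if extra_ca_file is not None:
        ctx.load_verify_locations(cafile=extra_ca_file)
    return ctx

ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

With the default context, wrapping a socket that connects through an untrusted proxy raises `ssl.SSLCertVerificationError` during the handshake instead of silently exposing the traffic.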
