This article appears on GitHub Blog.

I had read about how HTTPS works before, but the overall process and some of the details stayed vague, and I was left with questions such as: “How is the certificate verified?”, “What does the TLS handshake look like?”, “How is the symmetric key calculated?”, and “How many random numbers go into the calculation, and where does the pre-master key fit in?” I spent some time working through these questions and, based on my own understanding, wrote this series of articles on HTTPS, hoping it helps readers who have the same questions.

This article is the first in the series. It explains what symmetric encryption, asymmetric encryption, digital certificates, key negotiation, and related concepts are and what each of them is for.

Potential problems with using HTTP

In HTTP, data travels over the network in plaintext, so it is easy for a man in the middle to eavesdrop on it or attack it. An attacker can forge data and send it to the server, and the server has no way to tell whether the data really came from the claimed source. Why use HTTPS? To put it bluntly, HTTP is not secure: it cannot guarantee the confidentiality, authenticity, or integrity of data.

What is the HTTPS protocol

HTTPS is not a new protocol. It is HTTP running on top of SSL/TLS, that is, HTTPS = HTTP + SSL/TLS, and it protects the integrity and confidentiality of data transmitted between the user’s computer and the website’s server.

In the OSI model diagram, the main difference is an extra SSL/TLS layer sitting directly between the application layer and the transport layer. SSL/TLS is the key part of our study of HTTPS: as a secure encryption protocol, it provides a secure communication channel over insecure infrastructure.

The name SSL/TLS is sometimes confusing. Today, we usually refer to the TLS protocol. Let’s take a look at its history.

History of SSL/TLS

SSL is short for Secure Sockets Layer. It was originally developed by Netscape, and the first version of the protocol was never released. SSL 2 was released in November 1994. It had been developed largely without consulting security experts outside Netscape, was found to have serious flaws, and ultimately failed. After that, Netscape redesigned the protocol entirely as SSL 3, which was released in 1995. SSL 3 itself is no longer considered secure, but its design lives on: many people may not know that TLS 1.0 is essentially a renamed and slightly revised SSL 3.

In May 1996, the TLS working group was formed to move SSL from Netscape to the IETF. This was a lengthy process, partly because of the fight between Netscape and Microsoft for dominance of the Web. In January 1999, the IETF standardized TLS 1.0, the successor to SSL 3.

TLS is short for Transport Layer Security. TLS 1.1 was released in April 2006. It fixed several key security issues, mainly by adding protection against attacks on CBC mode: the implicit IV was replaced with an explicit IV, and the handling of padding errors in block cipher mode was changed.

In August 2008, TLS 1.2 was released. It added support for the SHA-2 family of cryptographic hash functions, AEAD (authenticated encryption), the definition of TLS extensions, and AES cipher suites.

In August 2018, TLS 1.3 was released, with many changes to strengthen security and improve performance. For example, insecure or outdated algorithms such as MD5 and SHA-1 were removed, and only a small set of algorithms such as ECDHE and SHA-2 was retained. In terms of performance, the TLS handshake was shortened from 2-RTT to 1-RTT, with initial support for 0-RTT.

Choosing an appropriate encryption algorithm

When we talk about HTTPS, we know it is secure because it encrypts data in transit. Let’s first look at which encryption methods it uses to do that.

Symmetric encryption

Symmetric encryption relies on a shared secret: the client and server share one key and use it to encrypt the data they exchange. As long as the key is held only by the two communicating parties and never leaks, the communication is secure. In the real world this is hard to guarantee. For example, when the browser talks to a server, the server has to transmit the shared key to the browser. How can the key be protected from interception or tampering during that transmission?
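As a minimal sketch of the shared-secret idea, the Python snippet below encrypts and decrypts with the same key. The cipher is a toy made up for illustration (a SHA-256 based keystream XORed with the data) and is not what TLS uses; the names `keystream` and `xor_crypt` are hypothetical.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Stretch the key into `length` pseudo-random bytes (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_crypt(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


# The single key that both client and server must somehow share.
shared_key = secrets.token_bytes(32)

ciphertext = xor_crypt(shared_key, b"GET /account HTTP/1.1")
plaintext = xor_crypt(shared_key, ciphertext)

# Someone without the shared key only gets garbage.
wrong_guess = xor_crypt(secrets.token_bytes(32), ciphertext)
```

Everything depends on `shared_key` staying secret, which is exactly the distribution problem described above.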

Asymmetric encryption

Asymmetric encryption, also known as public key encryption, was introduced to address this. The algorithm uses a pair of different keys: data encrypted with the public key can only be decrypted with the corresponding private key, and conversely, data encrypted with the private key can only be decrypted with the corresponding public key. Note that the private key is kept by its owner alone, while the public key is open to everyone. Asymmetric encryption removes the need to share a secret in advance, but it requires far more computation than symmetric encryption, so it is not suitable for large amounts of data and would slow communication down. For this reason, TLS does not rely on asymmetric encryption alone.
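The two directions of a key pair can be illustrated with textbook RSA on tiny numbers. This is only a sketch under loud assumptions: the primes 61 and 53 are the usual classroom example, there is no padding, and real keys are 2048 bits or longer.

```python
# Toy RSA key pair built from two tiny primes (illustration only).
p, q = 61, 53
n = p * q                    # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (needs Python 3.8+)

message = 65                 # any integer smaller than n

# Direction 1: encrypt with the public key, decrypt with the private key.
ciphertext = pow(message, e, n)
recovered = pow(ciphertext, d, n)

# Direction 2: "encrypt" with the private key (sign),
# "decrypt" with the public key (verify).
signature = pow(message, d, n)
verified = pow(signature, e, n)
```

Each operation is a modular exponentiation on large integers, which hints at why this approach is much slower than symmetric encryption for bulk data.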

Hybrid encryption

TLS takes the best of both: it combines asymmetric and symmetric encryption in what we call “hybrid encryption”. Asymmetric encryption is used only once, for authentication and for negotiating the shared key, and all subsequent communication is encrypted with the symmetric key.

However, the exchange of public keys between client and server can still be hit by the classic man-in-the-middle attack. The public key is visible in transit, so a middleman can play the server to the client and the client to the server, and can still tamper with the data. Neither side can tell whether the source is reliable, so the problem remains.

Here’s an example:

  • The server uses an asymmetric encryption algorithm to generate a pair of public and private keys, which we call public key A and private key A, to solve the key exchange problem.
  • A middleman also generates a pair of public and private keys, which we call public key B and private key B.
  • The browser makes a request to the server, and the server returns its public key A. The response is intercepted in transit by the middleman (this is the problem), who replaces public key A with his own public key B before passing it on to the browser.
  • The browser receives public key B without knowing it belongs to a middleman. It generates a random number to serve as the “session key” for symmetric encryption and encrypts it with public key B.
  • The browser sends the encrypted “session key” to the server. The middleman intercepts it, decrypts it with his private key B, re-encrypts it with the server’s public key A, and sends it on to the server.
  • The server decrypts the message with its private key A and obtains the “session key”, unaware that a middleman has intercepted it.
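The steps above can be replayed in a few lines of Python. This is a simulation under stated assumptions: textbook RSA without padding, key pairs built from small Mersenne primes chosen only because they are easy to write down, and hypothetical helper names (`make_keypair`, `rsa_apply`).

```python
import secrets


def make_keypair(p: int, q: int, e: int = 65537):
    """Build a toy RSA key pair from two primes (Python 3.8+ for pow(e, -1, m))."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (n, e), (n, d)            # (public key, private key)


def rsa_apply(key, value: int) -> int:
    """Textbook RSA: raise the value to the key's exponent modulo n."""
    n, exponent = key
    return pow(value, exponent, n)


public_a, private_a = make_keypair(2**61 - 1, 2**89 - 1)     # server: key pair A
public_b, private_b = make_keypair(2**107 - 1, 2**127 - 1)   # middleman: key pair B

# The server returns public key A, but the middleman swaps in public key B.
key_seen_by_browser = public_b

# The browser picks a random session key and encrypts it with the key it received.
session_key = secrets.randbits(128)
sent_by_browser = rsa_apply(key_seen_by_browser, session_key)

# The middleman decrypts with private key B, then re-encrypts with public key A.
stolen_key = rsa_apply(private_b, sent_by_browser)
forwarded = rsa_apply(public_a, stolen_key)

# The server decrypts with private key A and notices nothing unusual.
key_seen_by_server = rsa_apply(private_a, forwarded)
```

At the end, the browser, the server, and the middleman all hold the same session key, and neither endpoint has any signal that something went wrong.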

That is not going to work, so what can we do? Hybrid encryption itself is sound: it strikes a balance between security and performance, and using asymmetric encryption to exchange the symmetric key gives us the confidentiality we need. What is still missing is trust in the public key.

Now we come to the next question: how do you ensure that the public key that the browser gets is trustworthy?

Digital certificates solve trust problems

Take a real-world example. We go to a bank and tell the clerk at the counter, “I am Zhang San, and I would like to do some business.” The clerk first asks for identification that proves we really are Zhang San. What proves it is an ID card, a certificate issued and endorsed by an authority (in the real world, the Public Security Bureau).

The Public Security Bureau of the Online World

Likewise, our website needs to apply for a digital certificate from a certificate authority (CA).

A certificate is a digital file containing the version, serial number, signature algorithm, issuer, validity period, public key, and so on. Before enabling HTTPS, we apply to a CA for a digital certificate for our website and install it on our server. When the browser sends a request, the server returns the digital certificate to the browser. How can we be sure the digital certificate has not been modified along the way?

Just as the Public Security Bureau uses anti-counterfeiting technology when issuing ID cards, the CA digitally signs each certificate it issues to protect the certificate’s integrity.

The digest algorithm

A digest algorithm, also known as a hashing algorithm, is a one-way function. It needs no key, and the original data cannot be recovered from its output.

It maps input of any size to a small fixed-length value, much like extracting a summary from an article. If the original changes at all, even by adding or removing a single punctuation mark, the resulting digest is completely different. Some once-common digest algorithms (MD5, SHA-1) are now considered insecure and were removed in TLS 1.3; SHA-2, for example SHA-256, is recommended instead.
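A short check with Python’s standard `hashlib` module shows both properties: the digest has a fixed length regardless of input size, and removing one punctuation mark changes it completely. The sample strings are made up for the example.

```python
import hashlib

original = b"HTTPS = HTTP + SSL/TLS."
modified = b"HTTPS = HTTP + SSL/TLS"      # the same text without the final period
large_input = b"x" * 1_000_000            # about 1 MB of data

digest_original = hashlib.sha256(original).hexdigest()
digest_modified = hashlib.sha256(modified).hexdigest()
digest_large = hashlib.sha256(large_input).hexdigest()

# Count how many of the 64 hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(digest_original, digest_modified))

print(digest_original)
print(digest_modified)
print(f"{differing} of {len(digest_original)} hex characters differ")
```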

The CA runs the digest algorithm over the certificate’s plaintext data to produce a hash value that cannot be reversed. This hash value cannot be sent along in plaintext, because a middleman who modified the certificate could simply recompute the digest to match.

Digital signature

The name means the same as it does in the real world. If I give you a document, the most effective way to prove that it came from me is to sign it or add a fingerprint, something that cannot be forged.

The CA also has its own pair of public and private keys. It encrypts the hash value produced by the digest algorithm with its private key, and the result is a digital signature that can be decrypted only with the CA’s public key.

Digital certificate

The CA combines the digital signature with the information in our request (server name, public key, host name, authority name, and other information), generates a digital certificate, and issues it to the server.
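Putting digest, signature, and certificate together, issuance could be sketched as follows. Everything here is an assumption made for illustration: the CA key is a toy textbook RSA key built from two Mersenne primes, the certificate is a plain dictionary rather than real X.509, and `issue_certificate` is a hypothetical helper.

```python
import hashlib
import json

# Toy CA key pair (textbook RSA, no padding, far too small for real use).
CA_P, CA_Q, CA_E = 2**61 - 1, 2**89 - 1, 65537
CA_N = CA_P * CA_Q
CA_D = pow(CA_E, -1, (CA_P - 1) * (CA_Q - 1))   # private exponent, held only by the CA


def digest_as_int(plaintext: bytes) -> int:
    """SHA-256 digest reduced modulo the toy modulus so it fits the tiny key."""
    return int.from_bytes(hashlib.sha256(plaintext).digest(), "big") % CA_N


def issue_certificate(subject: str, subject_public_key: str, issuer: str) -> dict:
    """Hash the certificate information, sign the hash, and bundle both."""
    info = {"subject": subject, "public_key": subject_public_key, "issuer": issuer}
    plaintext = json.dumps(info, sort_keys=True).encode()
    signature = pow(digest_as_int(plaintext), CA_D, CA_N)   # encrypt hash with private key
    return {"info": info, "signature": signature}


certificate = issue_certificate("www.nodejs.red", "<server public key>", "Toy CA")
```

The signature travels inside the certificate, while the private exponent `CA_D` never leaves the CA.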

The following is a screenshot of the certificate for the domain www.nodejs.red.

With digital certificates, clients and servers can use asymmetric keys to negotiate symmetric encryption keys for data encryption when they interact.

Negotiating symmetric encryption keys

Certificate authentication

When we open an HTTPS URL in the browser, it initiates a request. After the TCP connection is established, the TLS handshake begins, and the server returns a series of messages, including the Certificate message.

Certificate verification involves a chain of trust. The certificate we apply for is usually issued by an intermediate certificate authority. For example, the certificate for www.nodejs.red is issued by “R3”, an intermediate certificate that the free CA Let’s Encrypt released on November 20, 2020. “R3” is in turn issued by “ISRG Root X1”, which has no superior issuer and is therefore the root certificate.

The figure below shows the certificate chain for the www.nodejs.red website. A number of authority certificates are pre-installed in our operating system, and the browser trusts these root certificates. If the root certificate is present locally, the public key of the root certificate “ISRG Root X1” is used to verify that the intermediate certificate “R3” is trusted. If that succeeds, “R3” is used to verify the end-entity certificate “www.nodejs.red”. If that also succeeds, the certificate for www.nodejs.red is considered trusted.

Certificate verification generally follows this pattern: find the locally installed root certificate, then verify step by step back down the chain to confirm that the issuer of the website’s certificate is trusted, as shown in the figure below.
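The chain lookup itself can be sketched as a walk from the website’s certificate up to a locally trusted root. This simplified example only follows issuer names; the signature check at each step is deliberately left out, and the dictionaries and the `build_chain` helper are hypothetical stand-ins for real certificate data.

```python
# Issuer relationships as described for www.nodejs.red.
certificates = {
    "www.nodejs.red": {"issuer": "R3"},
    "R3": {"issuer": "ISRG Root X1"},
    "ISRG Root X1": {"issuer": "ISRG Root X1"},   # a root names itself as issuer
}

# Roots pre-installed in the operating system (assumed contents).
trusted_roots = {"ISRG Root X1"}


def build_chain(subject, certs, roots, max_depth=10):
    """Follow issuer links until a trusted root is reached; return None on failure."""
    chain = [subject]
    current = subject
    for _ in range(max_depth):
        if current in roots:
            return chain
        cert = certs.get(current)
        if cert is None or cert["issuer"] == current:
            return None        # unknown issuer, or a self-issued root we do not trust
        current = cert["issuer"]
        chain.append(current)
    return None                # chain too long, give up


chain = build_chain("www.nodejs.red", certificates, trusted_roots)
```

A real verifier would also check each certificate’s signature with its issuer’s public key, starting from the root and working back down to the website’s certificate.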

To verify the certificate returned by the server, the browser takes the plaintext and signature information from the digital certificate and performs the following steps:

  • Decrypt the signature with the CA’s public key (it does not need to be transmitted, since it is already in the root certificate provided by the operating system) to obtain the hash value computed by the digest algorithm, which we will call hashCode1.
  • Hash the plaintext data with the digest algorithm specified in the certificate to obtain hashCode2.
  • If the plaintext data has not been tampered with, hashCode2 equals hashCode1.
  • The certificate is now trusted, and the server’s public key can be taken from it.
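The comparison of the two hash values can be sketched as below. As before, this assumes a toy textbook RSA key for the CA and a made-up plaintext format; `hash_code1` and `hash_code2` mirror hashCode1 and hashCode2 from the steps above.

```python
import hashlib

# Toy CA key (textbook RSA on small Mersenne primes, illustration only).
CA_N = (2**61 - 1) * (2**89 - 1)
CA_E = 65537                                        # public, known to the browser
CA_D = pow(CA_E, -1, (2**61 - 2) * (2**89 - 2))     # private, known only to the CA


def digest(plaintext: bytes) -> int:
    """SHA-256 digest reduced modulo the toy modulus."""
    return int.from_bytes(hashlib.sha256(plaintext).digest(), "big") % CA_N


def verify(plaintext: bytes, signature: int) -> bool:
    hash_code1 = pow(signature, CA_E, CA_N)   # decrypt the signature with the CA public key
    hash_code2 = digest(plaintext)            # hash the plaintext ourselves
    return hash_code1 == hash_code2


# Done once by the CA when the certificate is issued.
plaintext = b"subject=www.nodejs.red;public_key=<server public key>"
signature = pow(digest(plaintext), CA_D, CA_N)

# Done by the browser on every connection.
untouched_ok = verify(plaintext, signature)
forged = plaintext.replace(b"www.nodejs.red", b"evil.example")
tampered_ok = verify(forged, signature)
```

The untouched certificate verifies, while the edited one does not, because nobody without `CA_D` can produce a signature matching the new hash.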

If the certificate information is tampered with, the attacker cannot produce a matching signature without the CA’s private key. After receiving the certificate, the client compares the signature against the certificate information and can tell whether the certificate has been altered.

Another question: what if a hacker swaps our certificate for a different but valid one? The domain name and other information in a certificate cannot be tampered with, so even if the hacker substitutes his own legitimate certificate, the browser will catch the problem by comparing the certificate’s domain name with the one it requested.
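A simplified version of that domain name comparison might look like this. It is a sketch, not a browser’s actual matching logic: it handles exact names and a single left-most wildcard label only, and `hostname_matches` and `evil.example` are hypothetical.

```python
def hostname_matches(requested: str, cert_names) -> bool:
    """Return True if the requested host is covered by one of the certificate names."""
    requested = requested.lower()
    for name in cert_names:
        name = name.lower()
        if name == requested:
            return True
        if name.startswith("*."):
            # A wildcard stands for exactly one left-most label.
            _, _, parent = requested.partition(".")
            if parent == name[2:]:
                return True
    return False


# The browser asked for www.nodejs.red.
genuine = hostname_matches("www.nodejs.red", ["nodejs.red", "www.nodejs.red"])

# A hacker presents his own valid certificate, issued for a different domain.
swapped = hostname_matches("www.nodejs.red", ["evil.example"])
```

The swapped certificate may carry a perfectly valid signature, yet it is rejected because it does not name the site the browser asked for.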

There is no absolute security. If a hacker manages to install his own root certificate on your computer, he can issue fake certificates for any domain name. So do not casually install files from unreliable sources, in order to keep your root certificate store safe.

Computing the encryption key

The browser sends a request to the server, and the server returns its certificate. During this exchange, the two sides also swap two parameters, the client random number and the server random number, which are later used to generate the master key. Generating the master key also depends on a pre-master key.

Different key exchange algorithms generate the pre-master key in different ways. One key exchange algorithm is RSA, and its process is very simple: the client generates the pre-master key, a 48-byte value made up of 2 bytes of protocol version followed by 46 random bytes, encrypts it with the server’s public key, and sends it to the server in the Client Key Exchange message.
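The RSA key exchange can be sketched as follows, assuming a toy server key built from two Mersenne primes and textbook RSA without the padding that real TLS applies. The 48-byte layout (2 version bytes followed by 46 random bytes) is the TLS 1.2 format; all variable names are made up for the example.

```python
import secrets

# Toy server key pair (textbook RSA, no padding, illustration only).
P, Q, E = 2**521 - 1, 2**607 - 1, 65537
N = P * Q                               # public modulus
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent (Python 3.8+)

# Client side: 2 bytes of protocol version plus 46 random bytes.
client_version = bytes([3, 3])          # TLS 1.2 is written as 3.3 on the wire
pre_master_secret = client_version + secrets.token_bytes(46)

# Encrypt with the server's public key and send it in the key exchange message.
encrypted = pow(int.from_bytes(pre_master_secret, "big"), E, N)

# Server side: decrypt with the private key to recover the same 48 bytes.
decrypted = pow(encrypted, D, N).to_bytes(48, "big")
```

Anyone who later obtains `D` can run the last line against recorded traffic, which is the weakness discussed next.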

RSA-based key exchange has a serious weakness: anyone who gains access to the server’s private key (for example through political pressure, bribery, or a break-in) can recover the pre-master key, rebuild the same master key, and ultimately decrypt all previously recorded traffic. This key exchange algorithm is being replaced by algorithms that support forward secrecy. For example, ECDHE derives a separate master key for each connection. If one key is compromised, only the current session is affected, and the key cannot be used to go back and decrypt any other traffic.

ECDHE is an ephemeral elliptic curve key exchange algorithm. The server and client exchange two pieces of information, Server Params and Client Params, and each side generates a fresh ephemeral public/private key pair for every connection. From these, the client and the server can each compute the same premaster secret on their own using the ECDHE algorithm.
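Python’s standard library has no elliptic curve support, so the sketch below swaps in classic Diffie-Hellman over integers modulo a prime. The principle is the same: ephemeral key pairs, exchanged public parameters, and an identical secret computed independently on each side. The prime, the generator, and the function names are assumptions for illustration and are far too weak for real use.

```python
import secrets

P = 2**127 - 1     # small public prime, agreed in advance (illustration only)
G = 3              # public base


def ephemeral_keypair():
    """Generate a fresh private/public pair for one connection only."""
    private = secrets.randbelow(P - 3) + 2     # random value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def run_handshake():
    """Simulate both sides of one connection and return the secret each computes."""
    client_private, client_params = ephemeral_keypair()
    server_private, server_params = ephemeral_keypair()

    # Only client_params and server_params cross the network.
    client_secret = pow(server_params, client_private, P)
    server_secret = pow(client_params, server_private, P)
    return client_secret, server_secret


client_secret, server_secret = run_handshake()
```

Because the private values are discarded after each connection, a later leak of the server’s long-term private key does not reveal the secrets of past sessions.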

At this point, the client and server share three values: Client Random, Server Random, and Premaster Secret.

In TLS 1.2, the master key is calculated with a pseudo-random function: master_secret = PRF(pre_master_secret, "master secret", ClientHello.random + ServerHello.random).

However, the master key is not the final session key. The final session key is generated by passing in the master key, client random number and server random number using PRF pseudo-random function.

key_block = PRF(master_secret, "key expansion", server_random + client_random)

The final session key material consists of symmetric encryption keys, MAC keys, and IVs, the last of which are generated only when necessary.
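The two PRF calls can be sketched with Python’s `hmac` and `hashlib` modules. The PRF below follows the HMAC-SHA256 construction defined for TLS 1.2 in RFC 5246. The random inputs and the key sizes (`MAC_LEN`, `KEY_LEN`, `IV_LEN`) are illustrative assumptions, since real sizes depend on the negotiated cipher suite, and `split_key_block` is a hypothetical helper.

```python
import hashlib
import hmac
import secrets


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_hash expansion: chain HMAC-SHA256 until enough bytes are produced."""
    out = b""
    a = seed                                                     # A(0)
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()         # A(i)
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    return p_sha256(secret, label + seed, length)


def split_key_block(block: bytes, mac_len: int, key_len: int, iv_len: int) -> dict:
    """Cut the key block into the six values, in the order TLS 1.2 defines."""
    names = ["client_write_MAC_key", "server_write_MAC_key",
             "client_write_key", "server_write_key",
             "client_write_IV", "server_write_IV"]
    sizes = [mac_len, mac_len, key_len, key_len, iv_len, iv_len]
    keys, offset = {}, 0
    for name, size in zip(names, sizes):
        keys[name] = block[offset:offset + size]
        offset += size
    return keys


# Stand-ins for the three values both sides hold after the key exchange.
client_random = secrets.token_bytes(32)
server_random = secrets.token_bytes(32)
pre_master_secret = secrets.token_bytes(48)

master_secret = prf(pre_master_secret, b"master secret",
                    client_random + server_random, 48)

MAC_LEN, KEY_LEN, IV_LEN = 32, 16, 4       # illustrative sizes, not a specific suite
key_block = prf(master_secret, b"key expansion",
                server_random + client_random,
                2 * (MAC_LEN + KEY_LEN + IV_LEN))
session_keys = split_key_block(key_block, MAC_LEN, KEY_LEN, IV_LEN)
```

Note the order of the random values: the master secret uses the client random first, while the key expansion uses the server random first.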

All of this is done in TLS’s handshake protocol. When the handshake is complete, the client/server establishes a secure communication tunnel to send application data.

HTTPS complete process diagram

The next article will use Wireshark to capture network packets and walk through them in practice, for a closer look at HTTPS.
