This article is also published on my WeChat official account. You can follow it by searching for “Guo Lin” on WeChat. New articles are posted every working day.

Today I have a technical post that has nothing to do with Android.

HTTPS is now widely used. Plain HTTP is on its way out, as Internet leaders such as Apple and Google push to require HTTPS in their operating systems, browsers and other mainstream products.

For a client developer, HTTPS takes almost no extra effort, because the code is the same as for an HTTP request. For that very reason, though, many client developers know little about HTTPS beyond the fact that it encrypts network traffic, and have no idea how it actually works.

Do client developers need to know HTTPS?

Do client developers need to know HTTPS at all? I think it is worth understanding how HTTPS works. It will help you understand and solve problems at work more effectively. In addition, many companies like to ask about HTTPS in interviews, and it is easy to be rejected if you know nothing about it.

When I was learning HTTPS, I looked up a lot of material online, but most of the articles were hard to follow, which leaves many people a little afraid of HTTPS. I don’t think you need every detail (many online articles go so deep that they become hard to understand). You just need to understand the overall workflow and why it makes network traffic secure. So today I’m going to try to give you the easiest-to-follow explanation of HTTPS I can.

Two basic concepts

Before we start talking about HTTPS, we need to understand two concepts: what is symmetric encryption, and what is asymmetric encryption? Both of these concepts are fundamental to cryptography, and they’re actually pretty easy to understand.

Symmetric encryption is simple. The client and server share the same key, which is used both to encrypt a piece of content and to decrypt it. Symmetric encryption has the advantage of being fast, but it can be weaker in terms of security, because the key stored on the client may be stolen. Representative symmetric algorithms are AES and the older DES.
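To make the idea concrete, here is a minimal sketch of a shared-key cipher using only the Python standard library. It is a toy construction (a SHA-256 keystream XORed with the data), not AES or DES, and the function names are my own; it only shows that one key does both jobs.

```python
import hashlib
import secrets

def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# Client and server both hold this key (how they got it is a separate problem).
shared_key = secrets.token_bytes(16)

ciphertext = xor_cipher(shared_key, b"hello, server")   # encrypt
plaintext = xor_cipher(shared_key, ciphertext)          # decrypt with the same key
```

Anyone holding `shared_key` can read the traffic, which is exactly why a stolen key is the weak point.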

Asymmetric encryption, on the other hand, is a bit more complicated and divides keys into two types: public and private. The public key is usually held by the client, and the private key is kept on the server. Data encrypted with the public key can be decrypted only with the private key, and in turn, data encrypted with the private key can be decrypted only with the public key (this second direction is what digital signatures are built on). The advantage of asymmetric encryption is higher security: information the client encrypts for the server can only be decrypted with the server’s private key, so there is no need to worry about others decrypting it. The disadvantage is that encryption and decryption are much slower than with symmetric encryption. Representative asymmetric algorithms are RSA, ElGamal, etc.
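The two directions can be sketched with textbook RSA in a few lines of Python. The primes below are tiny illustrative values of my own choosing, and real RSA adds padding and uses keys thousands of bits long, so treat this purely as a demonstration of the public/private relationship.

```python
# Textbook RSA with tiny primes: for illustration only, never for real use.
p, q = 61, 53
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)            # handed out to clients
private_key = (d, n)           # never leaves the server

def apply_key(key, number):
    """Raise `number` to the key's exponent modulo n."""
    exponent, modulus = key
    return pow(number, exponent, modulus)

message = 65                   # a message must be a number smaller than n

# Direction 1: encrypt with the public key, decrypt with the private key.
ciphertext = apply_key(public_key, message)
recovered = apply_key(private_key, ciphertext)

# Direction 2: transform with the private key, check with the public key.
signature = apply_key(private_key, message)
checked = apply_key(public_key, signature)
```

Both `recovered` and `checked` come back as the original message, but only the holder of `private_key` could have produced `signature` or read `ciphertext`.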

With these two concepts in hand, we’re ready to learn about HTTPS. Here’s an early question, and one that’s likely to be asked during an interview: Is HTTPS encrypted symmetrically or asymmetrically to ensure secure data transmission?

You will know the answer by the end of this article.

How HTTPS works

First, let’s take a look at the problems of traditional HTTP transmission over the network.

Because HTTP transmits data in plain text, it is easy for anyone on the network path to eavesdrop on the data and steal it. The schematic is as follows:

In addition, the transmitted data can be tampered with by someone with ulterior motives, so that what the browser receives is not what the site sent, and vice versa. The schematic is as follows:

In other words, there are at least two major risks, eavesdropping and tampering, so HTTP is an insecure transport protocol.

The solution, of course, is to use HTTPS, but let’s try to figure out for ourselves how to secure HTTP transport, so we can understand how HTTPS works step by step.

Since it is not safe to transmit data over the network in clear text, we obviously need to encrypt it. As just mentioned, there are two types of encryption: symmetric and asymmetric. Symmetric encryption is fast, and we need network transfers to stay fast, so symmetric encryption is the obvious choice here. The schematic is as follows:

As you can see, since all the data we transmit over the network is ciphertext, we are not afraid of being picked up by listeners, because they have no way of knowing what the text is. When the browser receives the ciphertext, it simply decrypts the data using the same key as the website.

This mechanism does seem to secure data transfers, but there is one big hole: how do the browser and the website agree on which key to use?

It’s a classic problem in computing. The browser and the website need to use the same key to encrypt and decrypt data. But how do you get the key into the hands of both of them, and nobody who is listening in? You’ll find that no matter how you negotiate, the first communication between the browser and the site must be in plain text. This means that we can never establish a symmetric encryption key securely with the workflow described above.

Therefore, symmetric encryption alone will never solve this problem. At this point, we need to bring in asymmetric encryption to help create the symmetric encryption key securely.

So why does asymmetric encryption solve this problem? Let’s try to understand it in a schematic way:

As you can see, if we want to create a symmetric encryption key securely, we can have the browser generate it randomly. Instead of sending the generated key directly over the network, though, the browser first encrypts it asymmetrically with the public key provided by the site. Since data encrypted with the public key can only be decrypted with the private key, an eavesdropper who captures this message cannot recover the key. When the website receives the message, it only needs to decrypt it with its private key to obtain the key generated by the browser.

In addition, asymmetric encryption is needed only when the browser and the site agree on the key for the first time. Once the site has received the key randomly generated by the browser, both parties switch to symmetric encryption, so communication stays efficient.
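The exchange described above can be sketched end to end in Python. This is a simplified model under stated assumptions, not the real TLS handshake: the RSA key is a toy built from two known Mersenne primes (far too small for real use), and the symmetric cipher is a SHA-256 keystream XOR standing in for AES.

```python
import hashlib
import math
import secrets

# --- Website: toy RSA key pair (assumed primes, insecure sizes) ---
P, Q = 2**61 - 1, 2**89 - 1           # two known Mersenne primes
N = P * Q
PHI = (P - 1) * (Q - 1)
E = 65537                             # public exponent
assert math.gcd(E, PHI) == 1
D = pow(E, -1, PHI)                   # private exponent, stays on the server

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 based keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# --- Browser: generate a random symmetric key and wrap it with (E, N) ---
session_key = secrets.token_bytes(16)
wrapped = pow(int.from_bytes(session_key, "big"), E, N)   # sent over the network

# --- Website: unwrap the key with the private exponent ---
unwrapped = pow(wrapped, D, N).to_bytes(16, "big")

# --- From here on, both sides use fast symmetric encryption only ---
request = xor_cipher(session_key, b"GET /account")        # browser encrypts
seen_by_site = xor_cipher(unwrapped, request)             # website decrypts
```

Only the single `wrapped` value pays the cost of asymmetric encryption; every later message uses the shared symmetric key.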

So, do you think the above mechanism is complete? Not really, because we’re still missing one crucial step: how does the browser get the site’s public key? The public key is public data, so it does not matter if someone sees it in transit, but what if someone tampers with it? The schematic is as follows:

That is to say, whenever we fetch a website’s public key over the network, there is a risk that the public key has been replaced. If you encrypt data with a fake public key, the attacker who supplied it can decrypt the data with the matching private key, with disastrous consequences.
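Here is a minimal sketch of that substitution attack, again with toy RSA keys built from known Mersenne primes. All names are hypothetical, and the key sizes are nowhere near secure; the point is only that the browser cannot tell whose public key it received.

```python
import secrets

def make_keypair(p, q, e=65537):
    """Build a textbook RSA key pair from two primes (toy sizes, no padding)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, n), (d, n)

def apply_key(key, number):
    exponent, modulus = key
    return pow(number, exponent, modulus)

site_public, site_private = make_keypair(2**61 - 1, 2**89 - 1)
attacker_public, attacker_private = make_keypair(2**107 - 1, 2**127 - 1)

# The site sends its public key in plain text; the attacker swaps it in transit.
key_seen_by_browser = attacker_public

# The browser wraps its random session key with whatever key it received.
session_key = int.from_bytes(secrets.token_bytes(16), "big")
wrapped = apply_key(key_seen_by_browser, session_key)

# The attacker unwraps it, then re-wraps it for the real site to stay unnoticed.
stolen_key = apply_key(attacker_private, wrapped)
forwarded = apply_key(site_public, stolen_key)

# The site unwraps the forwarded value and sees nothing wrong.
key_at_site = apply_key(site_private, forwarded)
```

At the end, the browser, the site and the attacker all hold the same session key, so the attacker can read and modify everything that follows.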

The design seems to have hit a dead end here, because there is no way to securely access a web site’s public key, and we obviously can’t have all the web sites’ public keys preinstalled in the operating system.

At this point, a new concept must be introduced to break the deadlock: the certificate authority (CA).

A CA issues digital certificates to websites so that browsers can obtain their public keys securely. So how do CAs accomplish this daunting task? Let’s analyze it step by step.

First of all, as the website administrator, we apply to a CA and submit our public key. The CA combines the public key we submitted with a series of other information, such as the domain name of the website and the validity period, to create a certificate.

After the certificate is created, the CA uses its private key to encrypt the certificate and returns the encrypted data to us. We only need to install this data on the website’s server. (Strictly speaking, the CA signs the certificate: it computes a hash of the certificate’s contents and encrypts that hash with its private key, while the certificate itself stays readable. For this article, treating it as “encrypting the certificate” is a close enough mental model.)

Then, whenever a browser requests our site, we first return this encrypted data to the browser, which then decrypts the data using the CA’s public key.

If the decryption succeeds, the browser obtains the certificate that the CA issued to our website, which includes our website’s public key. You can view the certificate details in your browser by clicking the small lock icon to the left of the URL in the address bar, as shown in the image below.

Once the browser has the public key, the rest of the process is the key exchange described earlier.

If the decryption fails, the data was not produced by a legitimate CA with its private key, and may have been tampered with. In that case the browser displays its familiar warning page, as shown in the following figure.

So you may ask, is it really safe once we have a CA? The browser needs the CA’s public key to decrypt the data, so how does it get the CA’s public key safely?

This problem is easily solved, because although there are countless websites in the world, there are comparatively few CAs. Any legitimate operating system ships with the public keys of all mainstream CAs built in, so nothing extra needs to be downloaded. To verify a certificate, the browser only needs to try the built-in CA public keys one by one. As long as any one of them decrypts the data correctly, the certificate is legitimate.
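The “try each built-in CA key” idea can be sketched as follows. This is a simplified model, assuming toy RSA keys and a certificate represented as a plain dictionary; the CA names and field names are hypothetical. Following how signing is usually done, the CA here applies its private key to a SHA-256 hash of the certificate rather than to the certificate itself.

```python
import hashlib
import json

def make_keypair(p, q, e=65537):
    """Textbook RSA key pair from two primes (toy sizes, no padding)."""
    n = p * q
    return (e, n), (pow(e, -1, (p - 1) * (q - 1)), n)

def digest(cert: dict, modulus: int) -> int:
    """Hash the certificate contents and reduce the hash below the modulus."""
    data = json.dumps(cert, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus

def sign(cert: dict, private_key) -> int:
    d, n = private_key
    return pow(digest(cert, n), d, n)

def verified_by(cert: dict, signature: int, public_key) -> bool:
    e, n = public_key
    return signature < n and pow(signature, e, n) == digest(cert, n)

def find_trusted_issuer(cert: dict, signature: int, trust_store: dict):
    """Try every built-in CA public key; return the first CA that verifies."""
    for name, public_key in trust_store.items():
        if verified_by(cert, signature, public_key):
            return name
    return None

ca_one_pub, ca_one_priv = make_keypair(2**61 - 1, 2**89 - 1)
ca_two_pub, ca_two_priv = make_keypair(2**107 - 1, 2**127 - 1)
rogue_pub, rogue_priv = make_keypair(2**89 - 1, 2**107 - 1)   # not built in

# What the operating system ships with: public keys only.
trust_store = {"Example CA One": ca_one_pub, "Example CA Two": ca_two_pub}

cert = {"domain": "abc.com", "public_key": [17, 3233], "valid_days": 365}
good_signature = sign(cert, ca_two_priv)
rogue_signature = sign(cert, rogue_priv)

issuer = find_trusted_issuer(cert, good_signature, trust_store)
rogue_issuer = find_trusted_issuer(cert, rogue_signature, trust_store)
```

A certificate signed by a built-in CA is matched to that CA, while one signed by an unknown key matches nothing and would trigger the browser’s warning page.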

The built-in certificates of Windows are as follows:

But even if the CA’s public key decrypts the data properly, the current process still has a problem. Every CA creates certificates for thousands of websites, so if an attacker knows that ABC.com uses a certificate from a particular CA, he can apply for a valid certificate of his own from that same CA and substitute it for the encrypted certificate data returned when the browser requests ABC.com. The schematic is as follows:

As you can see, since the certificate the attacker applied for was also issued by a legitimate CA, this encrypted data can of course be decrypted successfully.

For this reason, the certificates issued by every CA contain other data besides the website’s public key to assist verification. The domain name of the website is one of the most important of these.

In the same example, if the certificate contains the domain name of the site, the attacker gains nothing. Even though the encrypted data can be decrypted successfully, the domain name in the decrypted certificate does not match the domain name the browser requested, so the browser still displays the warning page. The schematic is as follows:
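The domain check can be sketched in a few lines. The first part is a toy model with hypothetical names (real hostname matching also handles wildcards and multiple names per certificate); the last lines show that Python’s standard `ssl` module turns on both certificate verification and hostname checking in its default client context.

```python
import ssl

def browser_accepts(cert: dict, requested_host: str, signature_ok: bool) -> bool:
    """Toy version of the two checks: trusted signature AND matching domain."""
    if not signature_ok:
        return False
    return cert["domain"].lower() == requested_host.lower()

genuine_cert = {"domain": "abc.com"}
attacker_cert = {"domain": "attacker.example"}   # hypothetical attacker domain

# Both certificates carry a valid CA signature, but only one names abc.com.
ok = browser_accepts(genuine_cert, "abc.com", signature_ok=True)
swapped = browser_accepts(attacker_cert, "abc.com", signature_ok=True)

# A default client context requires a valid certificate and a matching hostname.
context = ssl.create_default_context()
strict = context.check_hostname and context.verify_mode == ssl.CERT_REQUIRED
```

This is also why HTTPS code looks the same as HTTP code on the client: the checks run inside the networking library unless you go out of your way to disable them.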

Well, with the design at this point, our network transmission is secure enough. And, of course, this is essentially how HTTPS works.

So back to the original question: does HTTPS use symmetric or asymmetric encryption? The obvious answer is that HTTPS uses a combination of symmetric and asymmetric encryption.

Of course, if you want to dig deeper, there are plenty more details to explore in HTTPS. If I kept going, though, this would stop being an easy-to-follow explanation, so I think this is the right place to stop.

If, like me, you mainly do client-side development, you now know enough about HTTPS to handle common interview questions and everyday problems at work.
