This is the first day of my participation in the August Challenge. For details, see: the August Challenge.

Like and follow so you won't lose your way. Your support means a lot to me!

🔥 Hi, I'm Peng Chouchou. This article has been collected in GitHub · Android-Notebook. Welcome to grow together with Peng Chouchou. (Contact information & group entry are on GitHub.)

Preface

  • Secure data transmission is a very broad topic. Scenarios such as HTTPS, file signing, and application signing all solve the same underlying problem: how to transfer data securely.
  • In this article, I will walk you through the basic model of secure data transmission, as well as the concepts and roles of encryption, digests, signatures, and digital certificates. If this article helps you, please like and follow; it really means a lot to me.

Contents


1. Overview

1.1 Three elements of CIA

Before discussing today's topic, we need to clarify what "security" means. In the computer field, "security" actually refers to "information security", whose basic meaning can be summarized by three elements, known as the CIA triad:

  • 1. Confidentiality: ensures that data remains confidential during transmission and storage. The main means are encryption, permission management, and desensitization of sensitive information;

  • 2. Integrity: ensures that the content of the data is complete and has not been tampered with. The main means is the digital signature;

  • 3. Availability: ensures that services remain available, for example by withstanding DoS and other network attacks.

1.2 Risks of Insecure Channels

Data is transmitted through a channel. In the narrow sense, a channel is a communication medium such as a newspaper, a wired network, or radio waves; in the broad sense, a channel can be understood as the whole path data travels from sender to receiver. In an insecure channel, data faces the following security risks:

  • 1. Eavesdropping risk: modern computer networks are built on the transmission capabilities of the TCP/IP protocol family. Data may be eavesdropped on at any link along the transmission path, leading to leakage of sensitive data;

  • 2. Tampering risk: data may be tampered with in transit, as in a man-in-the-middle attack, where an attacker sets up separate connections with both parties so that each believes it is talking privately with the other, while the attacker silently tampers with the data in between;

  • 3. Masquerading risk: an attacker can masquerade as a legitimate party to communicate.

1.3 How to Achieve Transmission Security?

Our topic today, "encryption + signature + certificate", is essentially a solution for secure data transmission over an insecure channel. Achieving secure transmission means effectively addressing the three risks of the insecure channel:

  • 1. Encryption — prevent eavesdropping: plaintext is converted into ciphertext, and only the intended receiver can decrypt the ciphertext back into plaintext. Even if an attacker steals the ciphertext, the content cannot be understood;

  • 2. Verify integrity — prevent tampering: compute a digest of the original data and deliver the data together with the digest to the other party. After receiving the data, the receiver computes the digest of the data with the same algorithm and compares it with the received digest to determine whether the data has been tampered with. However, because the digest itself can also be tampered with in transit, a more secure mechanism is needed: the digital signature;

  • 3. Authenticate the data source — prevent masquerading: a digital signature not only verifies data integrity but also authenticates the source of the data, preventing masquerading.


2. Encryption — Prevent Eavesdropping

2.1 What is Encryption?

Encryption is the process of converting plaintext into ciphertext. Only the intended receiver can decrypt the ciphertext back into plaintext; even if the ciphertext is stolen by an attacker, the content of the data cannot be understood.

In the era of classical cryptography, data security depended on keeping the algorithm secret: once the encryption algorithm was cracked or leaked, security was lost. In the era of modern cryptography, Kerckhoffs's principle became the basic design principle of encryption systems: the security of data rests on the secrecy of the key rather than the secrecy of the algorithm.

Following Kerckhoffs's principle, the modern secure communication model is key-based. In this model, if encryption and decryption use the same key, it is a symmetric cryptosystem; if encryption and decryption use different keys, it is an asymmetric cryptosystem.

2.2 What are the differences between symmetric encryption and asymmetric encryption?

  • 1. Key management: in a symmetric encryption algorithm, the key must be delivered to the other party, so there is a risk of key leakage. In asymmetric encryption, the public key is public while the private key is kept secret and never transmitted;

  • 2. Key roles: data encrypted with the public key can only be decrypted with the private key; conversely, data encrypted with the private key can only be decrypted with the public key. (Note: data encrypted with the public key must not be decryptable with the public key itself; since the public key is public, if it could also decrypt, encryption would provide no security.)

  • 3. Computational performance: asymmetric encryption is computationally expensive, so in practice a hybrid scheme combining the two is often used: asymmetric encryption first establishes a secure channel to transmit a symmetric key, and subsequent communication is encrypted symmetrically with that key;

  • 4. Authentication: in an asymmetric encryption algorithm, only one party holds the private key, which provides authentication and non-repudiation (this property is applied in the digital signature algorithm in Section 3).

To improve performance, HTTPS adopts a hybrid encryption scheme combining symmetric and asymmetric encryption: when the connection is established, a session key is negotiated using asymmetric encryption, and subsequent communication is encrypted symmetrically with that key.
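To make the hybrid pattern concrete, here is a minimal Java sketch (not the real TLS handshake): a random AES session key is wrapped with the receiver's RSA public key, and the payload is encrypted symmetrically with AES-GCM. It uses only the standard javax.crypto APIs; class and variable names are illustrative.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;

public class HybridDemo {
    public static void main(String[] args) throws Exception {
        // Receiver's long-term RSA key pair (in HTTPS, the public half is in the server certificate).
        KeyPairGenerator rsaGen = KeyPairGenerator.getInstance("RSA");
        rsaGen.initialize(2048);
        KeyPair rsa = rsaGen.generateKeyPair();

        // 1. Sender generates a random AES session key.
        KeyGenerator aesGen = KeyGenerator.getInstance("AES");
        aesGen.init(256);
        SecretKey session = aesGen.generateKey();

        // 2. Sender wraps (encrypts) the session key with the receiver's RSA public key.
        Cipher wrap = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        wrap.init(Cipher.WRAP_MODE, rsa.getPublic());
        byte[] wrappedKey = wrap.wrap(session);

        // 3. The bulk data is encrypted symmetrically with AES-GCM.
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, session, new GCMParameterSpec(128, iv));
        byte[] ciphertext = enc.doFinal("hello".getBytes(StandardCharsets.UTF_8));

        // 4. Receiver unwraps the session key with its RSA private key and decrypts the data.
        Cipher unwrap = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        unwrap.init(Cipher.UNWRAP_MODE, rsa.getPrivate());
        SecretKey restored = (SecretKey) unwrap.unwrap(wrappedKey, "AES", Cipher.SECRET_KEY);

        Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, restored, new GCMParameterSpec(128, iv));
        System.out.println(new String(dec.doFinal(ciphertext), StandardCharsets.UTF_8)); // hello
    }
}
```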

2.3 Data Encryption Standard — DES

In 1977, the Data Encryption Standard (DES) became a Federal Information Processing Standard in the United States and gradually became the de facto standard. Many mainstream symmetric encryption algorithms were developed from DES.

The main disadvantages of DES are its weak encryption strength and poor computing performance:

  • 1. The key length is too short: the DES key is only 56 bits, so the key space is only 2^56. With the growth of computing power, the key can be cracked by brute force in a short time;

  • 2. It is weak against differential and linear cryptanalysis;

  • 3. Poor computing performance: when the DES key length is increased, the computational overhead of encryption and decryption grows dramatically.

2.4 Advanced Encryption Standard — AES

The Advanced Encryption Standard (AES), also known as the Rijndael [Rain-dahl] cipher, is one of the most popular symmetric encryption algorithms.

Compared with DES, the AES algorithm has the following advantages:

  • 1. Longer keys: the minimum key length is 128 bits and the maximum is 256 bits, making brute-force cracking impractical;

  • 2. It uses the Wide Trail Strategy (WTS) in its design to resist differential and linear cryptanalysis;

  • 3. High computing performance: low computation and memory overhead, suitable for resource-constrained devices.

2.5 RSA algorithm

In 1977, three MIT professors, Rivest, Shamir, and Adleman, proposed the RSA encryption algorithm; the name comes from the initials of their last names. RSA is a classic asymmetric encryption algorithm and also a classic digital signature algorithm.

The security of the RSA algorithm rests on a mathematical problem called "factoring large numbers": multiplying two large primes is very easy, but factoring their product is extremely hard. If a fast factorization algorithm were found, the reliability of RSA would be greatly reduced. RSA also has a systemic weakness: it does not provide forward secrecy. The server's RSA public key is relatively fixed, so once the server's private key is compromised, all previously sent encrypted data can be decrypted.
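As a small illustration of "public key encrypts, private key decrypts", here is a Java sketch using the JDK's Cipher API with RSA-OAEP padding; class and variable names are illustrative:

```java
import javax.crypto.Cipher;
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;

public class RsaDemo {
    public static void main(String[] args) throws Exception {
        // Generate an RSA key pair; 2048 bits is a common minimum today.
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair pair = gen.generateKeyPair();

        // Anyone can encrypt with the public key...
        Cipher enc = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        enc.init(Cipher.ENCRYPT_MODE, pair.getPublic());
        byte[] ciphertext = enc.doFinal("secret".getBytes(StandardCharsets.UTF_8));

        // ...but only the private-key holder can decrypt.
        Cipher dec = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        dec.init(Cipher.DECRYPT_MODE, pair.getPrivate());
        System.out.println(new String(dec.doFinal(ciphertext), StandardCharsets.UTF_8)); // secret
    }
}
```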

For an analysis of how the RSA algorithm works, see: A Brief Analysis of the RSA Algorithm.

2.6 DH algorithm

The security of the DH algorithm rests on a mathematical problem called the "discrete logarithm": given a base and an exponent, computing the modular power is very easy, but recovering the exponent (the discrete logarithm) from the result is extremely hard. If a fast discrete-logarithm algorithm were found, the reliability of the DH algorithm would be greatly reduced.

At present, the DH algorithm has several implementations; the main differences are:

  • Static DH algorithm: one party's private key is static (usually the server's private key is fixed), so it does not provide forward secrecy;

  • DHE algorithm: both parties' private keys are randomly generated for each key exchange, providing forward secrecy;

  • ECDHE algorithm: uses elliptic-curve cryptography (ECC) to compute the public and private keys with far less computation.
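As an illustration, here is a minimal DH(E)-style key exchange using the JDK's javax.crypto.KeyAgreement: both sides derive the same shared secret without ever transmitting it. Class and variable names are illustrative.

```java
import javax.crypto.KeyAgreement;
import javax.crypto.interfaces.DHPublicKey;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.Arrays;

public class DhDemo {
    public static void main(String[] args) throws Exception {
        // Party A generates an ephemeral DH key pair (DHE style: fresh keys per exchange).
        KeyPairGenerator aGen = KeyPairGenerator.getInstance("DH");
        aGen.initialize(2048);
        KeyPair a = aGen.generateKeyPair();

        // Party B generates its own pair over the same group parameters.
        KeyPairGenerator bGen = KeyPairGenerator.getInstance("DH");
        bGen.initialize(((DHPublicKey) a.getPublic()).getParams());
        KeyPair b = bGen.generateKeyPair();

        // Each side combines its own private key with the peer's public key.
        KeyAgreement aAgree = KeyAgreement.getInstance("DH");
        aAgree.init(a.getPrivate());
        aAgree.doPhase(b.getPublic(), true);

        KeyAgreement bAgree = KeyAgreement.getInstance("DH");
        bAgree.init(b.getPrivate());
        bAgree.doPhase(a.getPublic(), true);

        // Both derive the same shared secret; only the public keys ever crossed the wire.
        System.out.println(Arrays.equals(aAgree.generateSecret(), bAgree.generateSecret())); // true
    }
}
```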

For an analysis of how the DH algorithm works, see: "This HTTPS Is Really Awesome!"

2.7 Why use random numbers in security systems?

When generating an RSA key pair, we need to randomly generate two large primes. In fact, many security systems beyond RSA require random numbers. Why? The key is the unpredictability of random numbers, which makes cracking and packet-replay attacks much harder.

2.8 How do computers generate random numbers?

Random numbers are a very important topic in computer security. Many scenarios need random numbers to generate random events, such as key generation, file names, and sessionId/orderId/token values. Modern random number generation still follows the model designed by von Neumann in 1946:

1. Take any number as the "seed" and run it through the random number algorithm to get a random number;
2. Use the generated random number as the new seed and feed it into the next round of calculation;
3. Repeat steps 1 and 2 to produce a sequence of statistically random numbers.

However, the numbers produced by this model are not truly random: if you sample long enough, the outputs eventually fall into a repeating cycle. Numbers generated this way can therefore only be called "pseudo-random numbers", and the length of the repeating cycle is called the "period".
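As a toy illustration of this seed-based model, here is von Neumann's classic "middle-square" generator in Java. It is deliberately simple, has a very short period, and is absolutely not secure; it only shows the seed-to-next-seed loop described above.

```java
// A toy implementation of von Neumann's 1946 "middle-square" method:
// square the seed and keep the middle digits as the next value.
public class MiddleSquare {
    public static void main(String[] args) {
        long seed = 675248; // any 6-digit seed
        for (int i = 0; i < 5; i++) {
            long squared = seed * seed;               // e.g. 675248^2 = 455959861504 (12 digits)
            String s = String.format("%012d", squared);
            seed = Long.parseLong(s.substring(3, 9)); // take the middle 6 digits as the new seed
            System.out.println(seed);                 // 959861, ... eventually falls into a cycle
        }
    }
}
```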

To get truly random numbers, hardware support is needed. In 1999, Intel integrated the world's first true random number generator into its i810 chipset, using the thermal noise of the circuit (the irregular movement of particles) as the entropy source. Its drawback is low throughput, so the random numbers used in computers are still mostly pseudo-random numbers implemented in software. Although software cannot be truly random, it can improve the quality of the generator: for example, by using more robust random algorithms (such as Java's SecureRandom), more complex seeds (system time, mouse position, network speed, disk read/write timing), a larger output range, or a combination of multiple random algorithms.
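To illustrate the difference, here is a short Java sketch contrasting a plain seeded PRNG with java.security.SecureRandom (the class mentioned above); names are illustrative:

```java
import java.security.SecureRandom;
import java.util.Random;

public class RandomDemo {
    public static void main(String[] args) {
        // java.util.Random is a seeded PRNG: the same seed reproduces the same sequence,
        // so it must never be used for keys, tokens, or session IDs.
        Random weak1 = new Random(42);
        Random weak2 = new Random(42);
        System.out.println(weak1.nextInt() == weak2.nextInt()); // true: fully predictable

        // java.security.SecureRandom mixes in OS-provided entropy and is the JDK's
        // choice for security-sensitive randomness.
        SecureRandom strong = new SecureRandom();
        byte[] key = new byte[32]; // e.g. material for a 256-bit key
        strong.nextBytes(key);
    }
}
```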


3. Digital Signatures — Verify Integrity & Authenticate the Data Source

3.1 What is a Digital Signature?

A digital signature combines a message digest algorithm with an asymmetric encryption algorithm; it can both verify data integrity and authenticate the source of the data.

The digital signature model has two main stages:

  • 1. Signing: first compute the [digest] of the data, then encrypt the [digest] with the private key to produce the [signature], and send [data + signature] to the receiver;
  • 2. Verification: compute the [digest] of the received data with the same digest algorithm, then decrypt the [signature] with the previously obtained public key, and compare the [decrypted digest] with the [computed digest] for consistency. If they match, the data has not been tampered with.

Tip: see Section 4 for how the receiver can safely obtain the sender's public key in advance.
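To make the two stages concrete, here is a minimal Java sketch using the JDK's java.security.Signature API with the common "SHA256withRSA" scheme; class and variable names are illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

public class SignatureDemo {
    public static void main(String[] args) throws Exception {
        byte[] data = "important message".getBytes(StandardCharsets.UTF_8);

        // The sender's key pair; only the sender holds the private key.
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair pair = gen.generateKeyPair();

        // Sign: "SHA256withRSA" digests the data with SHA-256, then encrypts
        // the digest with the private key (the two-step model above).
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(pair.getPrivate());
        signer.update(data);
        byte[] signature = signer.sign();

        // Verify: the receiver recomputes the digest and checks it against
        // the signature using the sender's public key.
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(pair.getPublic());
        verifier.update(data);
        System.out.println(verifier.verify(signature)); // true: intact and from the key holder
    }
}
```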

3.2 Why Can a Digital Signature Verify Integrity?

Integrity verification relies mainly on the properties of the message digest algorithm. A digest algorithm extracts information from the original data according to fixed rules; the extracted information is the message digest of the data, also called its data fingerprint. Well-known digest algorithms include MD5 and the SHA family.

Digest algorithms have the following characteristics:

  • Consistency: the digest of the same data is identical however many times it is computed, and (collisions aside) different data produce different digests;
  • Irreversibility: a digest can only be derived forward from the original data; the original data cannot be recovered from the digest;
  • Efficiency: digest generation is fast and efficient.

The digest model has two main steps:

  • Digest generation: compute the [digest] of the data, then send [data + digest] to the receiver;
  • Digest verification: compute the [digest] of the received data with the same digest algorithm and compare the [received digest] with the [computed digest]. If they match, the data is complete.
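As an illustration, here is a minimal Java sketch of this digest-and-compare model using java.security.MessageDigest; names are illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class DigestDemo {
    public static void main(String[] args) throws Exception {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);

        // Sender computes the digest and transmits [data + digest].
        byte[] sent = MessageDigest.getInstance("SHA-256").digest(data);

        // Receiver recomputes the digest over the received data...
        byte[] recomputed = MessageDigest.getInstance("SHA-256").digest(data);

        // ...and compares; isEqual performs a constant-time comparison.
        System.out.println(MessageDigest.isEqual(sent, recomputed)); // true: data is intact
    }
}
```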

Note that data integrity cannot be rigorously verified by the digest algorithm alone. In an insecure channel both the data and the digest are at risk of tampering, and an attacker can tamper with the digest along with the data. That is why the digest algorithm must be combined with an encryption algorithm (as in the digital signature) to verify integrity strictly.

3.3 Why Can a Digital Signature Authenticate the Data Source?

Because the signature incorporates the sender's private information (the private key), only the legitimate sender can produce a digital signature (an encrypted digest) that others cannot forge, which proves the true source of the data. If the receiver has obtained the sender's public information (the public key) through a legitimate channel and the digital signature verifies successfully, then the data indeed comes from the legitimate sender.

Moreover, signing uses the sender's private information while verification uses the sender's public information, which exactly matches the shape of asymmetric encryption. Hence the private information used in signing is the private key, and the public information used in verification is the public key.

3.4 Digest Algorithms Have Collisions — Are They Insecure?

A digest algorithm (hash algorithm) is essentially a compressing map, so different original data are bound to map to the same hash value sometimes; this is a hash collision. In fact, practical collision attacks have been found against hash algorithms such as MD5 and SHA-1, and an attacker can construct a tampered file whose digest matches the original's in order to bypass the check. This does not mean such algorithms are completely unsafe, however, because forging content that is both meaningful and valuable while still colliding is far harder than producing an arbitrary collision.

3.5 Why Add Salt Before Digesting?

To improve security, we often add a salt to the original data before generating the digest. Why?

The goal is to defend against "rainbow table" attacks and improve the security of simple data. Because digest algorithms are consistent, the same data always yields the same digest. Exploiting this property, an attacker can pre-compute the digests of a large set of simple data and store the "digest → data" mapping, known as a rainbow table. After obtaining a digest, if it appears in the rainbow table, the original data is recovered immediately.

This is also why users should avoid simple passwords such as 123456: they are easily cracked by rainbow table attacks. To improve security, a salt is often added to the original data when transmitting or storing sensitive information such as phone numbers and passwords.
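Here is a sketch of the idea in Java, assuming a simple salt-then-hash scheme; real systems typically go further and use a dedicated slow password-hashing function such as PBKDF2, bcrypt, or scrypt. Names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;

public class SaltedDigestDemo {
    public static void main(String[] args) throws Exception {
        // A fresh random salt per record defeats precomputed rainbow tables:
        // the same password now maps to a different digest for every user.
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);

        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(salt);                                        // mix the salt in first
        md.update("123456".getBytes(StandardCharsets.UTF_8));   // then the weak password
        byte[] digest = md.digest();

        // Store [salt + digest]; to check a login, re-hash the attempt with the stored salt.
    }
}
```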

3.6 Can I First Use a Private Key to Sign the Original Data and Then Digest the Signature?

No, for two main reasons:

  • 1. Feasibility: the receiver verifies integrity by comparing digests, but if the signature were computed over the original data and then digested, the receiver could not reproduce the signature (it does not hold the private key), so it would have nothing to compare the digest against;

  • 2. Efficiency: signing (encrypting) the full original data is slow, while a digest algorithm is a compressing map, so digesting first and signing only the short digest greatly reduces the signing time.


4. Digital Certificates — Securely issue public keys

In Section 3, we mentioned that the receiver needs the sender's public key to verify the signature. So how can the receiver obtain the sender's public key securely? This is what digital certificates guarantee.

4.1 What is a Digital Certificate?

Digital signatures and digital certificates always come in pairs and cannot be separated. Digital signatures verify data integrity and authenticate the data source, while digital certificates securely issue public keys. A digital certificate consists of the user's information, the user's public key, and the CA's signature over the certificate entity.

The digital certificate model has two steps:

  • 1. Certificate issuance:

    • 1.1 The applicant sends the signature algorithm, public key, and validity period to the CA;
    • 1.2 After verifying the applicant's identity, the CA packages the submitted information into a certificate entity and computes its digest;
    • 1.3 The CA encrypts the digest with its private key to produce the Certificate Signature;
    • 1.4 The CA attaches the Certificate Signature to the certificate, forming a complete digital certificate.
  • 2. Certificate verification:

    • 2.1 The verifier computes the digest of the certificate entity using the same digest algorithm;
    • 2.2 The verifier decrypts the Certificate Signature using the CA's public key (CA public keys are built into browsers and operating systems);
    • 2.3 If the decrypted digest matches the computed digest, the certificate is trusted.
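As a sketch of the verification step, here is a minimal Java example using the JDK's java.security.cert API, assuming the certificate to check and the issuing CA's certificate are available as local files (server.crt and ca.crt are placeholder names):

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.PublicKey;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;

public class CertVerifyDemo {
    public static void main(String[] args) throws Exception {
        CertificateFactory cf = CertificateFactory.getInstance("X.509");

        // Load the certificate to check and the issuing CA's certificate.
        X509Certificate cert;
        X509Certificate caCert;
        try (InputStream in = new FileInputStream("server.crt")) {
            cert = (X509Certificate) cf.generateCertificate(in);
        }
        try (InputStream in = new FileInputStream("ca.crt")) {
            caCert = (X509Certificate) cf.generateCertificate(in);
        }

        // verify() recomputes the digest of the certificate entity and checks it
        // against the Certificate Signature using the CA's public key
        // (steps 2.1-2.3 above); it throws if the signature does not match.
        PublicKey caKey = caCert.getPublicKey();
        cert.verify(caKey);
        cert.checkValidity(); // also confirm the certificate is within its validity period
        System.out.println("Certificate was issued by this CA and is currently valid.");
    }
}
```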

4.2 What is a Certificate Authority (CA)?

A Certificate Authority (CA) is responsible for approving, issuing, archiving, and revoking digital certificates. CAs are divided into root CAs and intermediate CAs. In principle, end-entity certificates should be issued by intermediate CAs rather than directly by a root CA. This limits the blast radius of a compromise: if a root certificate were compromised or forged, the entire certificate chain beneath it would be at fault, whereas an intermediate CA confines the damage.

4.3 What is a Certificate Chain?

A certificate chain is a chain of trust established by multiple digital certificates. A digital certificate consists of three parts: the user's information, the user's public key, and the CA's signature over the certificate entity. To verify a certificate entity, you need the public key of the CA that issued it, and that public key lives in the next certificate up the chain. Therefore, verifying a certificate means walking up the chain until you reach the root certificate.

A root certificate is a self-signed certificate. If you install and trust a root certificate, you trust every certificate issued under it. Commonly trusted root certificates are built into the operating system or browser.

4.4 Digital Certificate Standards

A digital certificate consists of the user's information, the user's public key, and the CA's signature over the certificate entity. Today's digital certificates follow the X.509 standard used by Public Key Infrastructure (PKI). The standard has three versions, of which X.509 v3 is the most common. Its main fields are as follows:

| Field | Description |
| --- | --- |
| Version | The version of the certificate |
| Serial Number | The unique identifier of the certificate |
| Signature Algorithm | The algorithm used to sign the certificate |
| Validity | The start and end dates of the certificate's validity period |
| Holder (Subject) | The holder of the certificate |
| Public Key (Subject Public Key Info) | The holder's public key |
| Issuer | The issuer of the certificate |
| Signature (Certificate Signature) | The issuer's signature over the certificate entity |

5. Summary

By now, you should have a basic understanding of secure data transmission. Most of the time we encounter concepts like encryption, digests, signatures, and certificates while discussing the HTTPS protocol. In fact, these concepts go beyond HTTPS: wherever data crosses an insecure channel, these tools are needed to secure the transmission.

Later, I'll write a few articles discussing data security in more specific scenarios. Stay tuned:

  • When you build an Android APK, you need to sign the application. After reading this article, can you explain the role of the application signature? For a detailed analysis, see: Stones from Other Hills Can Polish Jade! Understand the V1/V2/V3 Signing Mechanisms in One Article.

References

  • The Definitive Guide to HTTPS, by Ivan Ristić
  • HTTPS Protocol, by Liu Chao, on Geek Time
  • The Illustrated HTTP (Chapters 7 and 8)
  • The Art of Java Encryption and Decryption, by Liang Dong
  • Illustrated Networks, by Kobayashi Coding
  • The Implementation Principle of VasDolly, by the Tencent VasDolly technical team
  • HTTP: The Definitive Guide (Chapters 12 and 13), by David Gourley, Brian Totty, et al.

Creation is not easy. Your likes, comments, and shares are Peng Chouchou's greatest motivation. See you next time!