Common encryption, encoding and Hash | Java development


In day-to-day development, we use encryption to keep applications and their communication secure. For example, we encrypt data with asymmetric encryption when calling an interface, or encrypt important strings in a program so that they cannot be read after decompilation. Today let’s take a look at a variety of encryption methods, along with encoding and hashing.

Symmetric encryption

The data is transformed using a secret key and an encryption algorithm; the resulting meaningless data is the ciphertext. Applying the secret key and the decryption algorithm to the ciphertext reverses the transformation and yields the original data.

The sender first encrypts the original data with the encryption algorithm and sends the ciphertext to the recipient. After receiving the ciphertext, the recipient decrypts it with the decryption algorithm.

Symmetric encryption can encrypt any binary data.
Classical algorithms: DES, AES

DES was abandoned because its key is too short (56 bits). Why does a short key make a cipher easy to crack? Because with brute force, an attacker can try every possible key in a reasonable amount of time.

AES is the mainstream choice today. Both DES and AES are symmetric encryption algorithms.
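As a rough illustration of the encrypt-then-decrypt flow described above, here is a minimal AES sketch using the standard `javax.crypto` API. The class name `AesDemo`, the choice of GCM mode, the 128-bit key size, and the 12-byte nonce are assumptions made for this example, not something the article prescribes.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class AesDemo {
    // Assumed mode for this sketch: AES in GCM mode with a 128-bit tag
    private static final String TRANSFORMATION = "AES/GCM/NoPadding";

    // Generates a random 128-bit AES key
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Generates a random 12-byte nonce; a nonce must not be reused with the same key
    static byte[] newNonce() {
        byte[] nonce = new byte[12];
        new SecureRandom().nextBytes(nonce);
        return nonce;
    }

    // Runs the cipher in the given mode (Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE)
    private static byte[] run(int mode, SecretKey key, byte[] nonce, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(mode, key, new GCMParameterSpec(128, nonce));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] encrypt(SecretKey key, byte[] nonce, byte[] plaintext) {
        return run(Cipher.ENCRYPT_MODE, key, nonce, plaintext);
    }

    static byte[] decrypt(SecretKey key, byte[] nonce, byte[] ciphertext) {
        return run(Cipher.DECRYPT_MODE, key, nonce, ciphertext);
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] nonce = newNonce();
        byte[] ciphertext = encrypt(key, nonce, "hello".getBytes(StandardCharsets.UTF_8));
        byte[] plaintext = decrypt(key, nonce, ciphertext);
        System.out.println(new String(plaintext, StandardCharsets.UTF_8)); // prints "hello"
    }
}
```

The same key is used in both directions, which is what makes the scheme symmetric. The nonce is not secret and would normally be sent along with the ciphertext.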

Asymmetric encryption

The public key is used to encrypt the data into ciphertext. The private key is used to decrypt the ciphertext back into the original data.

Unlike symmetric encryption, asymmetric encryption uses the same algorithm for encryption and decryption, but a different key for each.

Example: suppose two parties want to communicate, and the messages can contain only 10 characters: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. The encryption key is: add 4 to each character. The decryption key is: add 6 to each character.

Send message: 110

Encryption: 554

Decryption: 5 + 6 = 11, which overflows, so keep only the last digit, 1. The middle digit works the same way. The last digit is 4 + 6 = 10, which overflows to 0. The final result is 110.

This scheme is, of course, far too simple to be secure, but it illustrates the core principle of asymmetric encryption, the heart of which is overflow. Without overflow, asymmetric encryption would not work.
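The toy scheme above can be sketched in a few lines. The class and method names (`ToyShiftCipher`, `shift`) are made up for illustration, and the "overflow" is modeled as arithmetic modulo 10.

```java
public class ToyShiftCipher {
    // Adds `key` to every digit and keeps only the last digit of each sum (mod 10 overflow)
    static String shift(String digits, int key) {
        StringBuilder out = new StringBuilder();
        for (char c : digits.toCharArray()) {
            int digit = c - '0';
            out.append((digit + key) % 10);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String ciphertext = shift("110", 4);       // encryption key: +4
        String plaintext = shift(ciphertext, 6);   // decryption key: +6
        System.out.println(ciphertext + " -> " + plaintext); // prints "554 -> 110"
    }
}
```

The two keys undo each other only because 4 + 6 wraps around to 0 modulo 10. In this toy, anyone who knows the encryption key can work out the decryption key immediately, which is exactly what real algorithms make computationally infeasible.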

Problem: if A and B have already exchanged keys, communicating with asymmetric encryption is straightforward. But how do they send the keys to each other in the first place?

When A and B communicate, A has its own encryption key and decryption key, and so does B.

So how is the key transfer problem solved? The answer is to publish the encryption keys openly.

To understand this:

During communication, user A gives user B his encryption key, and user B gives user A his encryption key.

A encrypts a message with B’s encryption key and sends it to B. After receiving the ciphertext, B decrypts it with his own decryption key, which never left his machine. But what if C intercepts the encryption key and the ciphertext in transit? Can he decrypt the message? No: the encryption key and the decryption key are different, so the intercepted encryption key cannot decrypt the ciphertext.
In the scenario above:

The encryption key corresponds to: public key

Decryption key corresponds to: private key

The public key can be published freely, but the private key must never be disclosed to anyone or transmitted.
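The public-key and private-key roles above can be tried out with RSA from the standard library. This is only a sketch: the class name `RsaEncryptDemo`, the 2048-bit key size, and the OAEP padding are choices made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.Cipher;

public class RsaEncryptDemo {
    // Assumed padding scheme for this sketch
    private static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    // Generates a 2048-bit RSA key pair (public key + private key)
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Anyone holding the recipient's public key can encrypt
    static byte[] encrypt(PublicKey recipientPublicKey, byte[] plaintext) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, recipientPublicKey);
            return cipher.doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Only the holder of the matching private key can decrypt
    static byte[] decrypt(PrivateKey recipientPrivateKey, byte[] ciphertext) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, recipientPrivateKey);
            return cipher.doFinal(ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair keysOfB = newKeyPair(); // B publishes keysOfB.getPublic()
        byte[] ciphertext = encrypt(keysOfB.getPublic(), "hello".getBytes(StandardCharsets.UTF_8));
        byte[] plaintext = decrypt(keysOfB.getPrivate(), ciphertext);
        System.out.println(new String(plaintext, StandardCharsets.UTF_8)); // prints "hello"
    }
}
```

An eavesdropper such as C sees only the public key and the ciphertext, and neither is enough to run `decrypt`.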
Extended uses

Can the public key undo what the private key did?

Yes. If you first process the data with the private key to get a ciphertext, then processing that ciphertext with the public key returns the original data. This is the basis of signing and verification.

Why is that?

Encrypting with the public key and then applying the private key yields the original data. In the same way, applying the private key first and then the public key also yields the original data.
Signature and Verification

Asymmetric encryption can also be used for digital signatures, because the private key and the public key can each undo the other’s transformation.

Signing: use the private key to transform the original data with the algorithm (this step is called signing) to obtain the signature data

Verification: use the public key to transform the signature data with the algorithm (this step is called verification) to recover the original data

For example, after I sign a file, I get signature data. Anyone holding the file and the signature can verify it with my public key; if verification yields the original file, the file was signed by me. Because only I know the private key, nobody else can fabricate data that verifies to the original data under my public key.
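Signing and verification can be sketched with `java.security.Signature`. The class name `SignatureDemo` and the `SHA256withRSA` algorithm are assumptions for this example; note that this algorithm hashes the data first and signs the digest rather than the raw data.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

public class SignatureDemo {
    private static final String ALGORITHM = "SHA256withRSA"; // assumed for this sketch

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Only the private key holder can produce the signature
    static byte[] sign(PrivateKey privateKey, byte[] data) {
        try {
            Signature signer = Signature.getInstance(ALGORITHM);
            signer.initSign(privateKey);
            signer.update(data);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Anyone with the public key can check the signature against the data
    static boolean verify(PublicKey publicKey, byte[] data, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance(ALGORITHM);
            verifier.initVerify(publicKey);
            verifier.update(data);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair pair = newKeyPair();
        byte[] file = "contract v1".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(pair.getPrivate(), file);
        System.out.println(verify(pair.getPublic(), file, signature)); // prints "true"
        byte[] tampered = "contract v2".getBytes(StandardCharsets.UTF_8);
        System.out.println(verify(pair.getPublic(), tampered, signature)); // prints "false"
    }
}
```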
Use both encryption and signatures

The setup is the same as in the scenario above.

A sends a message to B, and C may intercept it. C cannot decrypt the original data, but C can use B’s public key to encrypt a different piece of data and send that to B instead.

For example, C sends: “Lend me 30,000 yuan.” B receives it, decrypts it with his private key, sees a loan request that appears to come from A, and sends the money. B has been cheated out of his money.

Solution: use encryption + signature

User A encrypts the message using B’s public key and signs the message using his own private key

After receiving the message, user B decrypts it using his own private key and verifies the signature using A’s public key

Now, even though C can still encrypt data with B’s public key, he cannot produce a valid signature without A’s private key, so the problem is solved.
Classical algorithms: RSA, DSA

RSA: can be used for encryption, decryption, and signing

DSA: designed specifically for signing. Its advantage is that it is faster.
Advantages: it can be used to communicate over insecure networks
Disadvantages: the calculation is complex, so performance is much worse than symmetric encryption

Key and login password

Key

A key is the piece of data that pairs with an encryption algorithm; with the right key, ciphertext can be decrypted
- Scenario: used for encryption and decryption
- Purpose: to ensure that data is unreadable even if it is stolen
Login password

Essentially a passcode: it is used for authentication.
- Scenario: used to verify identity when entering a website or logging in
- Purpose: the data provider protects the user’s data and grants access only after confirming that “you are you”

Base64

Base64 converts binary data into a string made up of 64 characters: the 26 uppercase and 26 lowercase letters (52 characters), the digits 0 to 9, and + and /, for a total of 64 characters.

What is binary data?

Here it means non-text data, such as pictures, music, movies, etc.
Uses
- Give the original data the properties of a string, so that it can be placed in a URL, saved in a text file, or sent as text through ordinary chat software
- Turn human-readable strings into strings that cannot be read at a glance, which reduces the risk of casual snooping
Is transmitting images as “Base64-encrypted” data more secure and more efficient? Really?
- Base64 provides no security whatsoever: anyone can recover the original data by looking each character up in the code table
- Base64’s efficiency is a myth. Base64 output is about one third larger than the original data (every 3 bytes become 4 characters), so transmission becomes less efficient, not more.
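Both points are easy to check with `java.util.Base64`. The sketch below (the class name `Base64Demo` is made up) shows that decoding needs no key at all and that the encoded form is larger than the input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Demo {
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Decoding needs no key: anyone can reverse Base64
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        String encoded = encode("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(encoded); // prints "aGVsbG8="
        System.out.println(new String(decode(encoded), StandardCharsets.UTF_8)); // prints "hello"

        // 300 bytes of binary data become 400 characters of text
        System.out.println(encode(new byte[300]).length()); // prints "400"
    }
}
```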
Variant: Base58

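It removes six characters from Base64: the digit 0, uppercase O, uppercase I, lowercase l, +, and /. The first four are removed because they are easy to confuse with one another; + and / are removed because they break double-click selection when copying.
Variant: URL Encoding

Reserved characters in a URL are percent-encoded (written as % followed by hexadecimal digits), and in the URL-safe form of Base64, + and / are replaced with two other characters (- and _)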

For example: blog.csdn.net/哈哈哈

Paste the URL above into your browser’s address bar and press Enter, then copy the address back out and paste it somewhere else. You get the following address:

Blog.csdn.net/%E5%93%88%E…

This is because URLs do not support Chinese characters directly. Even though the address bar still appears to show Chinese characters, they have already been converted.

Also note what happens if you type a word such as “China” with a space in the middle into the browser: the space is typically sent as + (or %20). / has its own special meaning in a URL as well, which is why the + and / characters need to be changed to something else.

Purpose: to eliminate ambiguity and avoid parsing errors
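The behavior described above can be reproduced with the standard library. In this sketch the class name `UrlEncodingDemo` is made up, and `URLEncoder` is used as a stand-in for what the browser does: it implements HTML form encoding, which is close to but not identical to the encoding a browser applies to a path. The string `\u54c8\u54c8\u54c8` is three copies of the Chinese character whose encoding is the `%E5%93%88` seen in the address above. The `Charset` overloads used here require Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class UrlEncodingDemo {
    // Percent-encodes a string as UTF-8 (form encoding; Java 10+ overload)
    static String percentEncode(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    static String percentDecode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    // Base64 with the URL-safe alphabet, where '-' and '_' replace '+' and '/'
    static String urlSafeBase64(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        System.out.println(percentEncode("\u54c8\u54c8\u54c8")); // %E5%93%88%E5%93%88%E5%93%88
        System.out.println(percentEncode("a b"));                // a+b
        System.out.println(percentEncode("a+b/c"));              // a%2Bb%2Fc

        byte[] bytes = {(byte) 0xfb, (byte) 0xff};
        System.out.println(Base64.getEncoder().encodeToString(bytes)); // +/8=
        System.out.println(urlSafeBase64(bytes));                      // -_8=
    }
}
```

Because a literal + already stands for a space and / separates path segments, standard Base64 output cannot be dropped into a URL unchanged, which is the ambiguity the URL-safe alphabet avoids.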

Compression and decompression

Compression: storing data in a different form so that it takes up less space
Decompression: restoring compressed data to its original form so that it can be used
Common compression algorithms: DEFLATE, JPEG, MP3
- DEFLATE: general-purpose compression, used for example when archiving a set of files into a ZIP and compressing them at the same time
- JPEG: compresses images
- MP3: compresses audio
Is compression encoding?
- What exactly does encoding mean?
  
  There is no official definition of encoding. A working definition: if you can convert A to B and back again without losing or adding any information, that is encoding.
Compression and decompression fit this definition (strictly speaking, only lossless compression gets back exactly the original data). So compression is also a form of encoding.
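To see the "convert A to B and back without losing information" property in practice, here is a small DEFLATE round trip using `java.util.zip`. The class name `DeflateDemo` and the buffer size are arbitrary choices for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateDemo {
    // Compresses the input with DEFLATE
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restores the original bytes from DEFLATE-compressed data
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input: stop instead of looping forever
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = new byte[10_000]; // highly repetitive data (all zeros)
        byte[] compressed = compress(original);
        byte[] restored = decompress(compressed);
        System.out.println(compressed.length < original.length); // prints "true"
        System.out.println(Arrays.equals(original, restored));   // prints "true"
    }
}
```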

Encoding and decoding of media data

What is image, audio, and video encoding and decoding?

Image encoding: converting image data into a file format such as JPG or PNG

Suppose you store the raw data directly. If a white pixel is represented by FFFFFF and the image is 64 × 64 pixels, the data contains 64 × 64 copies of FFFFFF.

To encode this 64 × 64 image, you could define a format like the following:

YS: FFFFFF = 64 * 64;

A format like this can be read unambiguously and decoded back into the image.

Of course, this is just a simple example, and real algorithms do not work exactly like this, but the idea is the same
Image decoding: parsing file data in a format such as JPG or PNG back into standard image data.
Audio and video encoding and decoding follow the same idea as above. On top of that, there is the distinction between lossy and lossless compression: lossy compression gives up a little sound or picture quality in exchange for a smaller size. Images can be compressed this way too. For example, WeChat stickers are currently limited to 1 MB, so you need to compress the image; a good algorithm produces a smaller image that still looks similar to the original.
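The "FFFFFF = 64 * 64" idea from the image-encoding example is essentially run-length encoding. A minimal sketch follows, in which the class name `RunLengthDemo` and the representation of a run as a two-element `int[]` (color, count) are invented for illustration; real image formats are far more elaborate.

```java
import java.util.ArrayList;
import java.util.List;

public class RunLengthDemo {
    // Encodes pixels as a list of runs, each run being {color, count}
    static List<int[]> encode(int[] pixels) {
        List<int[]> runs = new ArrayList<>();
        for (int pixel : pixels) {
            int[] last = runs.isEmpty() ? null : runs.get(runs.size() - 1);
            if (last != null && last[0] == pixel) {
                last[1]++;                       // same color: extend the current run
            } else {
                runs.add(new int[] {pixel, 1});  // new color: start a new run
            }
        }
        return runs;
    }

    // Expands the runs back into the original pixel array
    static int[] decode(List<int[]> runs) {
        int total = 0;
        for (int[] run : runs) {
            total += run[1];
        }
        int[] pixels = new int[total];
        int index = 0;
        for (int[] run : runs) {
            for (int k = 0; k < run[1]; k++) {
                pixels[index++] = run[0];
            }
        }
        return pixels;
    }

    public static void main(String[] args) {
        int[] white = new int[64 * 64];
        java.util.Arrays.fill(white, 0xFFFFFF);
        List<int[]> runs = encode(white);
        System.out.println(runs.size());           // prints "1"
        System.out.println(runs.get(0)[1]);        // prints "4096"
        System.out.println(decode(runs).length);   // prints "4096"
    }
}
```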

Serialization

The process of converting an object (usually one in memory) into a sequence of bytes

Java serialization mechanism

Purpose: to allow objects in memory to be stored and transferred
Is serialization encoding?

Not in the strict sense. Encoding converts data from format A to format B, and the two can be converted back and forth at will, whereas serialization turns an in-memory object into a sequence of bytes. The two are very similar, though, so it depends on how you interpret the terms.
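A minimal sketch of Java’s built-in serialization, turning an object into bytes and back. The `SerializationDemo` class and its nested `User` type are hypothetical names used only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializationDemo {
    // A hypothetical class; implementing Serializable opts it in to Java serialization
    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Object in memory -> sequence of bytes
    static byte[] toBytes(Serializable object) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
            out.flush(); // make sure buffered data reaches the byte array
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sequence of bytes -> new object in memory
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes(new User("Ann", 30));
        User copy = (User) fromBytes(bytes);
        System.out.println(copy.name + " " + copy.age); // prints "Ann 30"
    }
}
```

The byte array could just as well be written to a file or sent over a socket. Bytes from untrusted sources should not be deserialized this way.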

Hash

Hashing converts arbitrary data into data within a specified (usually small) range. Its main uses are as a digest and as a digital fingerprint. For example, if there are 200 people, a hash can assign each of them a number such as 001, 002. Each number corresponds to one person, and this number is called the hash value.

Classic algorithms: MD5, SHA1, SHA256, etc.

A hash value is computed by a hash algorithm. A good algorithm must keep the collision rate (the chance that different inputs produce the same hash value) very low, and it must be hard to reverse.
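For a feel of what one of these algorithms produces, the following sketch computes a SHA-256 digest with `java.security.MessageDigest` and prints it as hex. The class and method names (`Sha256Demo`, `sha256Hex`) are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha256Demo {
    // Returns the SHA-256 digest of the data as a 64-character lowercase hex string
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
        System.out.println(sha256Hex("abc".getBytes(StandardCharsets.UTF_8)));
        // Changing a single character gives a completely different value
        System.out.println(sha256Hex("abd".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Whatever the size of the input, the output is always 32 bytes (64 hex characters), which is what makes it usable as a fingerprint.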

If you have learned Java, you will know hashCode. You can override the hashCode method, or write a hash function of your own, to compute a custom hash value, for example:
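```
public class LengthHash {
    // Toy hash function: uses the length of the string as its hash value
    public static int hashCode(String source) {
        return source.length();
    }
}
// A 3-character input such as "abc" gives the hash value 3
// A 2-character input such as "ab" gives the hash value 2
```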
This simple algorithm can be used to obtain the corresponding hash value.

Keeping the collision rate very low, however, requires more sophisticated algorithms.
Practical uses
- Data integrity verification
  
  For example, suppose you download a file from the Internet. The author publishes a 5 GB file together with its hash value. After downloading the file, you compute its hash value yourself. If it matches the hash value provided by the author, the file is intact; otherwise it may have been tampered with or corrupted.
- Fast lookup: hashCode and HashMap
  
  How HashMap works
  
  The difference between hashCode and equals
  
  If you have learned Java, you know that whenever you override equals you must also override hashCode. Why is that?
  
  HashMap stores its data as an array plus linked lists. It uses the key’s hashCode to compute an array index and then decides how to store the entry. Entries are stored by key, and each key must be unique. When you put an entry, HashMap computes the hash of the key (you can see this in the source code) and checks whether an entry with the same hash is already stored at that index. If not, the entry is stored directly. If so, a hash collision has occurred, and HashMap compares the keys with equals. If the keys are not equal, the new entry is added alongside the existing one; if they are equal, the key already exists and the existing entry’s value is replaced. This is why two objects that are equal according to equals must return the same hashCode: otherwise HashMap would look in the wrong place and treat them as different keys.
- Privacy protection
  - Plaintext storage: some websites store user information in plaintext, that is, the account password is saved as-is and is directly visible in the database. If the database is leaked, others immediately get your account and password. That is the drawback of plaintext storage.
  - Hashed storage: instead of the password itself, store the hash of the password. At login, hash the entered password in the same way and compare the result with the stored hash; if they match, the password is correct, otherwise the login fails. If the database is compromised, the attacker only gets hash values, which cannot be used directly, so this is much safer.
  - Add salt
    
    Because MD5 is irreversible, the original password cannot be computed back from the hash. But attackers can hash a list of common passwords in advance and compare those hash values with the leaked ones to find matches.
    
    For this reason, each site uses its own salt: when saving a password, it combines the password with the salt, hashes the combination, and stores the result. Even with the hash values in hand, an attacker can no longer match them against precomputed hashes of common passwords, because the stored hashes are salted.
    
    So each website’s salt must be strictly protected and must not be leaked.
- Is hashing encoding?
  
  No. Hashing is irreversible. It simply extracts characteristics of the data and generates a hash value.
- Is hashing encryption? Is MD5 encryption?
  
  No, in fact. Encryption is reversible: encrypted data can be restored by computation. Hashing in general, and MD5 in particular, does not fall into this category. You can call them “irreversible conversions.”
- Hash and asymmetric encryption
  
  When signing with asymmetric encryption, you use the private key to sign the original data and obtain the signature data. But if the file is very large, the signature data is also very large, which is wasteful.
  
  Therefore, a hash algorithm is added to the signing process, which then works as follows:
  
  The hash algorithm extracts a digest from the original data, giving the hash value. The hash value is then encrypted with the private key (encryption with the private key is called signing) to obtain the signature.
  
  Verification: apply the public key to the signature to recover the hash value, then compute the hash value of the original data yourself. If the two match, verification succeeds; otherwise the file has been tampered with.
  
  This way, no matter how large the original data is, the signature sent with the message stays very small.
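Returning to the salted password storage described under Privacy protection above, here is one possible sketch. It deliberately swaps the MD5-plus-site-salt approach from the text for PBKDF2 with a random salt per password, a slower password-hashing function available in the JDK. The class name `SaltedPasswordDemo`, the 100,000 iteration count, and the 16-byte salt are assumptions for this example, not recommendations taken from the article.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class SaltedPasswordDemo {
    private static final int ITERATIONS = 100_000; // assumed work factor
    private static final int HASH_BITS = 256;

    // Generates a random 16-byte salt for one password
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Combines password and salt and hashes them with PBKDF2 (HMAC-SHA256)
    static byte[] hash(char[] password, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, HASH_BITS);
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return factory.generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Login check: hash the entered password the same way and compare with the stored hash
    static boolean matches(char[] enteredPassword, byte[] salt, byte[] storedHash) {
        return MessageDigest.isEqual(hash(enteredPassword, salt), storedHash);
    }

    public static void main(String[] args) {
        byte[] salt = newSalt();
        byte[] stored = hash("123456".toCharArray(), salt); // what the database would keep
        System.out.println(matches("123456".toCharArray(), salt, stored)); // prints "true"
        System.out.println(matches("654321".toCharArray(), salt, stored)); // prints "false"
    }
}
```

Because every password gets its own salt, two users with the same password end up with different stored hashes, so a precomputed table of common-password hashes no longer matches anything.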

Character set

A mapping from integers to the written symbols of the real world

Branches
- ASCII: 128 characters, 1 byte
- ISO-8859-1: 1 byte, an extension of ASCII
- Unicode: about 130,000 characters, multi-byte
  - UTF-8: an encoding of Unicode
  - UTF-16: an encoding of Unicode
- GBK/GB2312/GB18030
  
  Standards developed in China, multi-byte, each both a character set and an encoding
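The difference between a character set and its encodings shows up when the same character is turned into bytes under different encodings. In this sketch the class name `CharsetDemo` is made up, and only the charsets guaranteed by `StandardCharsets` are used, so the GBK family is left out; `\u4e2d` is a common Chinese character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Number of bytes needed to store the text in the given encoding
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String latin = "A";
        String chinese = "\u4e2d";
        System.out.println(byteLength(latin, StandardCharsets.US_ASCII));   // prints "1"
        System.out.println(byteLength(latin, StandardCharsets.ISO_8859_1)); // prints "1"
        System.out.println(byteLength(latin, StandardCharsets.UTF_8));      // prints "1"
        System.out.println(byteLength(latin, StandardCharsets.UTF_16BE));   // prints "2"
        System.out.println(byteLength(chinese, StandardCharsets.UTF_8));    // prints "3"
        System.out.println(byteLength(chinese, StandardCharsets.UTF_16BE)); // prints "2"
    }
}
```

The character is the same Unicode code point in both cases; only the byte representation chosen by the encoding differs.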

If this article has been helpful to you, we are honored. If you find mistakes or have questions about the article, you are welcome to point them out!