The MD5 Message-Digest Algorithm is a hash function widely used in computer security to protect message integrity. It is specified in RFC 1321 (R. Rivest, MIT Laboratory for Computer Science and RSA Data Security, Inc., April 1992).

MD5 was developed in the early 1990s by Ronald L. Rivest of the MIT Laboratory for Computer Science and RSA Data Security, Inc., as a successor to MD2, MD3, and MD4. Its purpose is to “compress” a large amount of information in a secure manner (that is, to convert a byte string of arbitrary length into a large integer of fixed size) before it is signed with a private key by digital signature software.

MD5 has been most widely used for password authentication and for validating the license keys of various software products, which people colloquially call serial numbers.

MD2 algorithm

Rivest developed the MD2 algorithm in 1989. The message is first padded so that its length in bytes is a multiple of 16. A 16-byte checksum is then appended to the message, and the hash value is computed over the resulting data. Rogier and Chauvaud later found that collisions can be produced in MD2 if the checksum is omitted. With the checksum in place, no collisions in the full algorithm were known at that time; that is, no two different messages had been found to give the same digest.
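The padding step described above can be sketched in a few lines of Python. This is only an illustration of the padding rule as RFC 1319 states it (append i bytes of value i); the function name `md2_pad` is made up for this example, and the checksum and the hash computation itself are deliberately left out.

```python
def md2_pad(message: bytes) -> bytes:
    """Return the message padded to a multiple of 16 bytes, MD2 style."""
    # Between 1 and 16 bytes are always added, even when the length
    # is already a multiple of 16.
    pad_len = 16 - (len(message) % 16)
    # Each padding byte holds the number of padding bytes that were added.
    return message + bytes([pad_len]) * pad_len


if __name__ == "__main__":
    padded = md2_pad(b"abc")
    print(len(padded), padded[-1])  # 16 13
```

Because the value of the padding bytes encodes how many were added, the padding can be removed unambiguously, which is presumably why the scheme never adds zero bytes.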

MD4 algorithm

To strengthen the algorithm, Rivest developed MD4 in 1990. MD4 also pads the message, this time so that its length in bits is congruent to 448 modulo 512 (bit length mod 512 = 448); a 64-bit representation of the original message length is then appended, which makes the total a multiple of 512 bits. Den Boer, Bosselaers, and others quickly found attacks on versions of MD4 in which the first or the third round is omitted. Despite this serious security weakness, MD4 strongly influenced the design of several hash algorithms developed afterwards. Besides MD5, the best known are SHA-1, RIPEMD, and HAVAL.
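The padding rule (bit length mod 512 = 448) is easier to see in code. The sketch below follows the padding steps given in RFC 1320 and RFC 1321, including the 64-bit length field that those documents append after the padding; the function name `md_pad` is an assumption made for this example, and nothing beyond the padding is implemented.

```python
import struct


def md_pad(message: bytes) -> bytes:
    """Pad a message the way MD4 and MD5 do before processing 64-byte blocks."""
    # Original length in bits, reduced modulo 2**64 as the RFCs specify.
    bit_length = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    # A single 1 bit followed by seven 0 bits.
    padded = message + b"\x80"
    # Zero bytes until the length is 56 mod 64 bytes (448 mod 512 bits).
    padded += b"\x00" * ((56 - len(padded)) % 64)
    # The original bit length as a 64-bit little-endian integer.
    return padded + struct.pack("<Q", bit_length)


if __name__ == "__main__":
    print(len(md_pad(b"abc")))      # 64
    print(len(md_pad(b"a" * 56)))   # 128
```

A 56-byte message already satisfies the congruence, yet it still grows to two blocks, because at least one padding bit is always added.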

MD5 algorithm

A year later, in 1991, Rivest developed the more mature MD5 algorithm, which adds “safety belts” to MD4. MD5 is slightly slower than MD4 but more secure. The algorithm consists of four rounds, each designed slightly differently from the rounds of MD4. The size of the message digest and the padding requirements are exactly the same as in MD4.

Den Boer and Bosselaers found pseudo-collisions in MD5, but for years no other cryptanalytic results were known. Van Oorschot and Wiener considered brute-force searches for collisions in hash functions, and they estimated that a machine designed specifically to search for MD5 collisions (costing about $10 million to build in 1994) could find one every 24 days on average. The fact that no MD6 or differently named replacement appeared in the decade from 1991 to 2001 was taken as evidence that these results did not much affect the security of MD5 in practice. That assessment no longer holds: practical collision attacks were published from 2004 onward, and MD5 is no longer considered collision-resistant.

Because MD5 can be used without paying any royalties, it was widely adopted in general (non-top-secret) applications, and even in top-secret applications it was regarded as a useful intermediate technique. MD5 is a one-way hash function rather than an encryption algorithm. The most widely used one-way hash algorithms in computer networks have been MD5, developed by Rivest at RSA Data Security, and the SHA family published by the National Institute of Standards and Technology (NIST) in the United States.

Applications of the algorithm

1. Generating message digests

A typical use of MD5 is to produce a digest of a message so that tampering can be detected. On UNIX systems, for example, many software packages are distributed together with a file that has the same name plus the extension .md5. This file usually contains a single line of text with the following general structure:

MD5 (abc.tar.gz) = 0ca175b9c0f726a831d895e244332461

This is the digest (sometimes loosely called the digital signature) of the file abc.tar.gz. MD5 treats the entire file as one large message and applies its irreversible transformation to produce a digest that is, for practical purposes, specific to that file. Every person on the planet has a unique fingerprint, which is often the most reliable way to identify criminals; similarly, MD5 creates a “digital fingerprint” for any file, regardless of its size or format. If anyone changes the file in any way, its MD5 value, the “digital fingerprint”, changes as well. Software download sites often publish the MD5 value next to each package. After downloading, we can run a dedicated tool (such as Windows MD5 Check) on the file to confirm that what we received is identical to what the site provides. MD5 is widely used in this way by software download sites, forum databases, system file integrity checks, and so on.
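The download check can be sketched with Python's standard `hashlib` module. This is a minimal illustration, not a replacement for a dedicated tool: the helper names `md5_of_file`, `parse_checksum_line`, and `verify_download` are invented for this example, and the regular expression assumes the one-line `MD5 (name) = digest` layout shown earlier.

```python
import hashlib
import re
from typing import Tuple

# Assumed layout of a .md5 file line: MD5 (name) = 32 hex digits
CHECKSUM_LINE = re.compile(r"^MD5 \((?P<name>.+)\) = (?P<digest>[0-9a-fA-F]{32})$")


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large downloads need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_line(line: str) -> Tuple[str, str]:
    """Split a checksum line into (file name, lowercase hex digest)."""
    match = CHECKSUM_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an MD5 checksum line: {line!r}")
    return match.group("name"), match.group("digest").lower()


def verify_download(path: str, checksum_line: str) -> bool:
    """Return True if the file at path has the digest published in the line."""
    _, expected = parse_checksum_line(checksum_line)
    return md5_of_file(path) == expected
```

A call such as `verify_download("abc.tar.gz", open("abc.tar.gz.md5").read())` would return True only if the downloaded file matches the published value; the `.md5` file name here is itself an assumption about how the site names its checksum files.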

2. Fingerprinting a byte string

MD5 can likewise fingerprint an arbitrary piece of data so that tampering can be detected. For example, if you write a paragraph in a file called myfile.txt and record the MD5 value of the file, you can pass the file on to others; if anyone changes anything in it, you will notice as soon as you recalculate the MD5 value. If a trusted third-party certification authority is involved, MD5 can also help prevent the author from repudiating the file. This is the so-called digital signature application.
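A small Python example makes the point about changes concrete. It is only a sketch: the `fingerprint` helper is a name chosen for this illustration, and the two byte strings stand in for the contents of myfile.txt before and after someone appends a single period.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the MD5 digest of a byte string as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()


original = b"The quick brown fox jumps over the lazy dog"
tampered = b"The quick brown fox jumps over the lazy dog."  # one byte added

if __name__ == "__main__":
    print(fingerprint(original))  # 9e107d9d372bb6826bd81d3542a419d6
    print(fingerprint(tampered))  # e4d909c290d0fb1ca068ffaddf22cbd0
    print(fingerprint(original) == fingerprint(tampered))  # False
```

The two digests share no obvious pattern even though the inputs differ by one character, which is what makes a recalculated MD5 value a convenient change detector.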

3. Login authentication

MD5 is also widely used for login authentication in operating systems, for example for the login passwords of Unix and the various BSD systems, as well as in digital signatures and many other areas. On UNIX systems, for example, passwords are hashed with MD5 (or a similar algorithm) before being stored in the file system. When a user logs in, the system hashes the password the user entered and compares the result with the stored MD5 value to decide whether the password is correct. In this way the system can validate a login without knowing the user’s plaintext password, which also keeps the password hidden from users with system administrator privileges.

MD5 maps a byte string of arbitrary length to a 128-bit integer, and it is difficult to work backwards from those 128 bits to the original string. In other words, even with the source code and the algorithm description in hand, you cannot transform an MD5 value back into the original string. Mathematically, this is because infinitely many original strings map to each value, somewhat like a function that has no inverse. So if you lose a password that is stored as an MD5 hash, the practical remedy is to use the system’s md5() function to hash a new password, such as admin, and overwrite the original hash value with the result.

For the same reason, one of the most popular password-cracking methods used by hackers today is the dictionary attack, known as “running a dictionary”. There are two ways to obtain the dictionary: one is to collect lists of strings commonly used as passwords, the other is to generate candidates by permutation and combination. An MD5 program computes the MD5 value of each dictionary entry, and the target MD5 value is then looked up in the dictionary. Suppose the password is at most 8 bytes long and may contain only letters and digits. Since characters may repeat, the total number of dictionary entries is 62^1 + 62^2 + … + 62^8, about 2.2 × 10^14, an astronomical number: storing the full dictionary would require a disk array holding thousands of terabytes, and the method also presupposes that the MD5 value of the target account’s password can be obtained. Storing passwords as hashes in this way is widespread on UNIX systems, which is one important reason why they are considered more robust than ordinary operating systems.
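The login check, the dictionary lookup, and the size of the dictionary can all be sketched in a few lines of Python. Everything here is illustrative: the function names are invented for this example, passwords are assumed to be UTF-8 text, and the count assumes that characters may repeat, so a password of length k contributes 62^k entries. Real systems also mix a per-user salt into the hash, which this sketch leaves out.

```python
import hashlib
import hmac
from typing import Dict, Iterable, Optional


def md5_hex(password: str) -> str:
    """Hash a password with MD5 (unsalted, for illustration only)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def check_login(stored_hash: str, attempt: str) -> bool:
    """Compare the hash of the attempt with the stored hash, never the plaintext."""
    return hmac.compare_digest(md5_hex(attempt), stored_hash)


def build_dictionary(candidates: Iterable[str]) -> Dict[str, str]:
    """Precompute a hash -> password table, as a dictionary attack does."""
    return {md5_hex(word): word for word in candidates}


def dictionary_lookup(table: Dict[str, str], target_hash: str) -> Optional[str]:
    """Return the matching password, or None if it is not in the dictionary."""
    return table.get(target_hash)


def keyspace_size(alphabet_size: int = 62, max_length: int = 8) -> int:
    """Number of passwords of length 1..max_length when characters may repeat."""
    return sum(alphabet_size ** k for k in range(1, max_length + 1))


if __name__ == "__main__":
    stored = md5_hex("admin")                   # value kept in the password file
    print(check_login(stored, "admin"))         # True
    print(check_login(stored, "guest"))         # False
    table = build_dictionary(["123456", "password", "admin"])
    print(dictionary_lookup(table, stored))     # admin
    print(keyspace_size())                      # 221919451578090
```

At 16 bytes per digest, the roughly 2.2 × 10^14 entries come to about 3.5 × 10^15 bytes before the passwords themselves are stored, which is why exhaustive dictionaries are impractical while short lists of common passwords are not.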