The previous Blockchain 100 lectures covered blockchain, computing power, mining, and more, with the word Hashing mentioned in almost every lecture. When talking about blockchain, you will also hear “hash”, “hash function”, “hash algorithm”, are you confused? Don’t worry, in this lecture we’re going to talk about what a hash algorithm is.
1
A hash is an encryption algorithm
A Hash Function is also called a Hash Function or Hash Function. A Hash function is an exposed function that maps a Message M of any length to a shorter, fixed-length Value H (M), called a Hash Value, Hash Value, or Message Digest. It is a one-way cipher system, that is, an irreversible mapping from plain text to ciphertext, only encryption process, no decryption process.
Its function expression is: h= h (m)
The output is a fixed-length string of bits, regardless of the input format or the size of the file. Take the Sh256 algorithm used by Bitcoin as an example. Whatever data file is input, the output is 256bit.
So each bit is one zero or one, and a 256bit is a string of 256 zeros or ones, and in hexadecimal numbers, how many digits?
Sixteen is equal to two to the fourth power, so each hexadecimal digit can represent four bits. So, the 256-bit bit is expressed as a hexadecimal number, which of course is 256 divided by 4 equals 64 bits.
So the hash that you would normally see would look something like this:
00740 f40257a13bf03b40f54a9fe398c79a664bb21cfa2870ab07888b21eeba8.
This is a hash copied from bTC.com. If you are not sure, you can count it
2
Characteristics of Hash functions
The Hash function has the following characteristics.
-
Easy to compress: The Hash length is small for any input x. In practice, the Hash length generated by H is fixed.
-
Easy to compute: It is easy to compute the Hash value of any given message.
-
Unidirectional: For a given Hash value, it is difficult to find something that is computationally infeasible, that is, to compute the inverse of the Hash. Given some hash function H and hash value H (M), it is concluded that M is computationally infeasible. That is, the output from the hash cannot be retracted from the original input. This is fundamental to the security of hash functions.
-
Collision-proof: The ideal Hash function is collision-free, but this is difficult to achieve in the design of real algorithms.
There are two kinds of collision resistance: one is weak collision resistance, that is, for a given message, to discover another message, satisfy is computationally infeasible; The other is strong collision resistance, which makes it computationally infeasible for any pair of different messages.
-
** High sensitivity: ** This is from a bit perspective, meaning that a change in the input of 1 bit causes a change in the input of 1/2 bit. Any change to message M causes the hash value H (M) to change. If the input is slightly different, the hash output must be different.
3
The hash algorithm
Convert the url, A, to the number 1. Web address B, converted to the number 2.
A website X, converted into a number N, according to the number N as a subscript, you can quickly find out the information of the website X. The process of transformation is called hashing.
Let’s say you have 10,000 songs, you’re given a new song X, and you’re asked to confirm whether it’s one of those 10,000 songs.
Of course, comparing 10,000 songs one by one is very slow. But if there is a way of ten thousand songs of each data can be enriched to a number (called the hash code), then get the ten thousand number, then use the same encoding algorithm to calculate the new song X, see if song X coding in before the ten thousand figures, will know whether song X in the ten thousand songs.
For example, if you were to organize 10,000 songs, a simple hash algorithm would be to hash the number of bytes the song occupies on the hard drive. That way, you could have 10,000 songs “sorted by size,” come across a new song, and just see if the new song has the same number of bytes as one of the existing 10,000 songs to know if the new song is in that 10,000.
A reliable hash algorithm should satisfy:
-
Given the data M, it is easy to compute the hash X=F(M);
-
It’s hard to figure out M in terms of X;
-
It’s hard to find M and N such that F(N)=F(M).
I mentioned that hash functions are crash-resistant, and crash-proof means that someone has to find even and odd to get the same hash result, but that’s computationally impossible.
First, by squeezing messages from a large space into a small space, collisions must exist. Assuming the hash length is fixed at 256 bits, if the sequence is 1,2… 2 to the 256 plus 1, these 2 to the 256 plus 1 inputs, if I hash them one by one, I’m sure I can find two inputs that have the same hash. But don’t get too excited, because you have to have time to work it out. It’s yours.
According to the birthday paradox, if 2^128+1 of these inputs are randomly selected, there is a 99.8% chance that at least one pair of colliding inputs will be found. So for a hash function with a length of 256 bits, it takes an average of 2^128 hashes to find a collision pair. If the computer performs 10,000 hashes per second, it would take about 10^27 years to complete 2^128 hashes. In blockchain, the crashworthiness of hash functions is used to verify the integrity of blocks and transactions, and tampering can be identified.
As mentioned above, mining requires miners to continuously calculate a value less than the given difficulty value through random numbers. The difficulty ** is an important reference for miners as they dig, determining approximately how many hashes a miner needs to run to produce a legitimate block. Bitcoin blocks are created roughly every 10 minutes, and in order for new blocks to be created at roughly the same rate, the difficulty value must be adjusted to reflect changes in computing power across the network.
The hash function adjusts the difficulty value to ensure that each block is dug in about 10 minutes. The difficulty value calculated by the hash function is of great significance to ensure the security of the blockchain system. As several American computer scientists wrote in a joint book, “Hash ciphers are the Swiss Army knives of cryptography. They find their place in a wide variety of applications that require different hash function characteristics to ensure security.” It has proved extremely difficult to determine a set of hash functions to achieve provable security across the board.”
Proof of work requires a target value. The target value of bitcoin proof of work is calculated by the following formula:
Target value = maximum target/difficulty value
Where, the maximum target value is a constant value:
0x00000000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
The size of the target value is inversely proportional to the difficulty value. ** Proof of work for Bitcoin is achieved by the fact that miners must calculate a block hash value less than the target value.
We can also simply understand that the process of proving the work of Bitcoin is to find out the hash value of a particular format (that is, a certain number of leading zeros is required) by constantly changing the block header (i.e., trying different random values) as the input for SHA256 hash operation. The more leading zeros you need, the harder it is.
4
Here’s an example to help you understand
Scene 1: Xiao Xing and Dumb are on the basketball court
Small star: silly, you are not thirsty, you want to buy water to drink, incidentally help me to buy a bottle of ha.
Ah stay: ha ha, your small idea I still don’t know, you also thirsty, you go, I don’t go.
Xiao Xing: ah ah, let’s not pull these useless, to flip a coin, ok, heads you go, tails I go, fair, how?
Silly: Ok.
……
Scene 2: Xiao Xing and A Silly are chatting on the spot
Silly: little star, come to my home to play today, on the way, there is a pizza shop, very delicious, by the way bring a little ha.
Xiao Xing: Oh, why don’t you come to my house and bring pizza?
Silly: little star, you even said so, it seems that we can only flip a coin.
Xiao Xing: ya, how do you throw this? How do I know if you have a trick.
Silly: well, that’s also true. How about this?
1. Consider encrypting the results
Dumb: I have A number in mind, let’s say A, and then A is multiplied by A number B to get C. A is my key, and I’ll give you the result C. You guess whether A is odd or even, and if you guess right, you win.
Xiao Xing: No, if you tell me C is 12 and I guess A is odd, you can say A is 4 and B is 3. I guess A is even, you could say A is 3, B is 4. Maybe when you tell me what C is, you tell me what B is.
Silly: that’s no good, tell you C and B, not equal to tell you how much A is, still guess fart. No, we have to do it another way.
2. Irreversible encryption
Silly: Xiao Xing, do you think this is ok? I want an A, after the following process:
A+123=B 2.B^2=C 3. A+123=B 2.B^2=C 3. D 4
Dumb: I give you E and the above calculation method, you guess whether A is odd or even, then I tell you what A is, you can follow the above calculation procedure to verify whether I am lying.
Xiao Xing: Well, let’s see, if dumb you think A is 5, then:
5+123=128 128^2=16384 D=638 E=638mod12=53
(mod stands for remainder of division)
Xiao Xing: oh, it’s amazing. A value of A corresponds to A unique value of E. A can’t be calculated according to E. You’re a bitch. Okay, that’s fair. Anyone can catch a lie.
Small star: silly, you set a question……
This type of encryption that loses some of the information is called “one-way encryption,” or hashing.
5
Common hash algorithms
1. Sha-1 algorithm
The input of SHA-1 is a message with a maximum length of less than 264 bits, the input message is processed in 512-bit packets, and the output is a 160-bit message digest. Sha-1 has the advantages of high implementation speed, easy implementation, and wide application range. Its algorithm is described as follows.
-
** Populates the input message: ** After populating, the message’s length mod 512 should be congruent with 448. The padding is 1 in the first place and 0 in the rest. The length before the message was filled is appended in a big-endian fashion to the last 64 bits left by the previous step. This step is necessary even if the message length is already the desired length. The length of the fill ranges from 1 to 512.
-
Initializer buffer: 160 bits can be used to hold the initial variable, intermediate digest, and final digest of the Hash function, but must be initialized first, assigning each 32-bit initial variable, i.e. :
- Enter the message processing main loop to process the message block: A 512-bit message block is processed at a time. A total of four rounds of processing are performed, with 20 operations in each round, as shown in the figure. The four rounds of processing have a similar structure, but each round uses different auxiliary functions and constants. The input for each round is the current group of messages being processed and the current values of the buffer, A, B, C, D, E. The output is still placed in the buffer to replace the old values of A, B, C, D, E. The output of the fourth round is then combined with the input CV of the first roundqAdd them up to produce CVq+ 1, where the addition is a buffer of 5 words CVqEach word in and the corresponding font 2 in32Add.
Figure the processing flow of a single 512 bit message block
- Output: After all message groups have been processed, the output of the last group is the resulting message digest value.
The step functions of SHA-1, shown in figure 1, are the most important functions and key components of SHA-1.
Figure SHA-1 step function
Each time SHA-1 runs the step function, the values of A, B, C, and D are assigned to registers B, C, D, and E. At the same time, the input values, constants, and submessage blocks of A, B, C, D, and E are assigned to A after the step function operation.
Where t is the number of steps, 0≤t≤79, Wt is a 32-bit word derived from the current 512-bit packet, and Kt is an addition constant.
The basic logic function f, whose input is three 32-bit words and output is one 32-bit word, is expressed as follows.
For the exported message group WT of each input group, the first 16 message words wt (0≤ T ≤15) are the 16 32-bit words corresponding to the message input group, and the rest wt (0≤t≤79) can be obtained according to the following formula:
Where, ROTLs represents the left circular shift S bit, as shown in the figure.
Figure SHA-1 shows the generation process of 80 message words
2. Sha-2 algorithm
The output length of the SHA-2 Hash algorithm can be 224, 256, 384, or 512 characters, corresponding to SHA-224, SHA-256, SHA-384, or SHA-512 respectively. It also contains two other algorithms: SHA-512/224 and SHA-512/256. The Hash algorithm has stronger security and more flexible output length than previous algorithms. Sha-256 is commonly used. The first four algorithms are briefly described below.
SHA – 256 algorithm
The input of sha-256 is a message with a maximum length of less than 264 bits, and the output is a 256-bit message digest. The input message is processed in 512-bit packets. The algorithm is described as follows.
(1) Message filling
Add one “1” and several “0” to make its length mod 512 congruent with 448. Appends a 64-bit length block to the end of the message, whose value is the length of the message before filling. The result is a message group with a length of 512 integer multiples. After filling, the message length is up to 264 bits.
(2) Initialize link variables
The intermediate and final results of the link variables are stored in A 256-bit buffer represented by eight 32-bit registers A, B, C, D, E, F, G, and H. The output is still placed in the buffer in place of the old A, B, C, D, E, F, G, and H. The initial link variable is initialized in eight registers A, B, C, D, E, F, G, and H:
The initial link variable is the first 32 bits of the binary representation of the decimal part taken from the square root of the first eight prime numbers (2, 3, 5, 7, 11, 13, 17, 19).
(3) Deal with the main loop module
Message blocks are processed in 512-bit chunks, in 64-step loops (see figure). The input for each round is the grouping of messages currently processed and the resulting 256-bit buffer A, B, C, D, E, F, G, and H of the previous round’s output. Different message words and constants are used in each step, and their retrieval methods are shown below.
Figure SHA-256 shows the compression function
(4) Obtain the final Hash value
After all 512-bit message block groups have been processed, the result of the last group processing is the final 256-bit message digest.
Step functions are the most important functions and key components of SHA-256. Its operation process is shown in the figure.
Figure SHA-256 step functions
Update registers A and E according to the values of T1 and T2. The input values of A, B, C, E, F, and G are assigned to B, C, D, F, G, and H in sequence.
Kt is obtained by taking the first 64 prime numbers (2,3,5,7…). The fractional part of the cube root, convert it to binary, and then take the first 64 bits of those 64 numbers as Kt. It provides a set of 64 bit random strings to eliminate any regularity in the input data.
For the exported message group Wt of each input group, the first 16 message words Wt (0≤ T ≤15) are directly calculated according to the 16 32-bit words corresponding to the message input group, and the rest are calculated according to the following formula:
Figure SHA-256 shows the process of generating 64 message words
SHA – 512 algorithm
Sha-512 is an algorithm with high security performance in SHA-2. It is mainly composed of plaintext filling, message expansion function transformation, and random number transformation. The initial value and intermediate calculation result are composed of eight 64-bit shift registers. The maximum length of the input is 2128 bits, and a 512-bit message digest is generated. The input message is divided into several 1024-bit blocks for processing. The specific parameters are: message digest length is 512 bits; Message length is less than 2128 bits; The message block size is 1024 bits; The message word size is 64 bits; The number of steps is 80. The following figure shows the entire process of processing the message and output the message digest as follows.
Figure SHA-512 shows the overall structure
-
Message filling: Fill one “1” and several “0” so that its length mod 1024 is the same as 896, and the number of fill bits is 0-1023. The length of the message before filling is appended to the fill message with a 128-bit field, whose value is the length of the message before filling.
-
Link variable initialization: The intermediate and final results of link variables are stored in A 512-bit buffer represented by eight 64-bit registers A, B, C, D, E, F, G, and H. The initial link variable is also stored in eight registers A, B, C, D, E, F, G and H, and its value is:
The initial link variable is stored in big-endian mode, that is, the most significant byte of the word is stored in the low address location. The initial link variable is taken from the first 64 bits of the binary representation of the fractional part of the square root of the first eight prime numbers.
-
Main loop operation: The message is processed in 1024-bit groups in an 80-step loop. Each iteration takes the 512-bit buffer values A, B, C, D, E, F, G, and H as input, and the values are taken from the calculation results compressed in the previous iteration. Different message words and constants are used in each step of calculation.
-
Compute the final Hash: After all N 1024-bit groups of the message have been processed, the 512-bit link variable output from the NTH iteration compression is the final Hash.
Step functions are the most critical part of SHA-512, and their operations are similar to SHA-256. The calculation equation of each step is as follows. The update values of B, C, D, F, G and H are the input status values of A, B, C, E, F and G respectively. Meanwhile, two temporary variables are generated to update registers A and E.
For each step t in the 80-step operation, a 64-bit message word Wt is used, whose value is exported from the 1024-bit message grouping Mi currently being processed, as shown in the figure. The first 16 message words Wt (0≤ T ≤15) correspond to the 16 32-bit words after the message input group, and the others are calculated according to the following formula:
Figure SHA-512 shows the process of generating 80 message words
Among them,
In the formula, ROTRn (X) indicates that the 64-bit variable X is rotated n bits right, and SHRn (X) indicates that the 64-bit variable X is rotated n bits right.
As can be seen from the figure, in the first 16 steps, the value of Wt is equal to the corresponding 64-bit word in the message group, while in the remaining 64 steps, its value is calculated from the previous 4 values, and two of the 4 values need to be shifted and cyclic shifted.
Kt is obtained by taking the first 80 primes (2,3,5,7…). The fractional part of the cube root, converting it to binary, and then taking the first 64 bits of the 80 numbers as Kt provides a random set of 64 bits to eliminate any regularity in the input data.
With SHA SHA – 224-384
Sha-256 and SHA-512 are fairly new Hash functions. The former defines a word as 32 bits and the latter as 64 bits. In fact, the structure of the two is the same, but there are differences in the number of cycles and the use of constants. Sha-224 and SHA-384 are truncated versions of the previous two Hash functions, which compute with different initial values.
Sha-224’s input message length is the same as sha-256’s, also shorter than 264 bits, and its packet size is 512 bits. The processing process is similar to sha-256’s, but there are two differences.
-
The message digest of SHA-224 is taken from the bit words of seven registers A, B, C, D, E, F, and G, while the message digest of SHA-256 is taken from the 32-bit words of eight registers A, B, C, D, E, F, G, and H.
-
Sha-224’s initial link variable is not the same as sha-256’s. It is stored in a high-end format, but it is obtained by taking the second 32 bits of the binary representation of the square root of the first 9 to 16 primes (23, 29, 31, 37, 41, 43, 47, 53). Sha-224’s initial link variables are as follows:
The detailed calculation procedure for SHA-224 is the same as that for SHA-256.
Sha-384 input message length is the same as SHA-512, also less than 2128 bits, and the packet size is also 1024 bits. The processing process is basically the same as SHA-512, but there are two differences.
-
The 384 bit message digest of SHA-384 is taken from A, B, C, D, E, and F, while the 384 bit message digest of SHA-512 is taken from A, B, C, D, E, F, G, and H.
-
Unlike sha-512, sha-384’s initial link variable is also stored in high-end format, but is obtained by taking the first 64 bits of the binary representation of the square root of the first 9 to 16 primes (23, 29, 31, 37, 41, 43, 47, 53). Sha-384’s initial link variables are as follows:
The detailed calculation steps for SHA-384 are the same as those for SHA-512.
Sha-3 algorithm
Sha-3 algorithm adopts Sponge structure, which is divided into two stages: absorption and extraction. The core substitution F of SHA-3 acts on a 5×5×64 three-dimensional matrix. The whole f has 24 rounds, and each round includes 5 elements θ, ρ, π, χ and τ. The five parts of the algorithm are applied to different dimensions of the three dimensional matrix. The θ element is a linear operation on a column; ρ link is acting on each line of linear operation, each line of 64 bits on the cycle shift operation; The π link is a linear operation that moves the elements of each path onto another path. χ segment is a nonlinear operation acting on each row, which is equivalent to replacing 5 bits in each row with another 5 bits. τ segment is a plus constant segment.
At present, the security analysis of SHA-3 algorithm in public literature is mainly carried out from the following aspects.
-
Collision attacks, preimage attacks, and second preimage attacks against the SHA-3 algorithm.
-
The core permutation of SHA-3 algorithm is analyzed. This kind of analysis mainly focuses on the distinction between algorithmic permutation and random permutation.
-
The difference characteristic of SHA-3 algorithm is developed, and the high probability difference chain of SHA-3 permutation is studied, and the difference partition splitter is constructed.
The idea of Keccak’s stereo encryption and sponge structure make SHA-3 superior to SHA-2 and even AES. The Sponge function establishes a mapping from an input of any length to an output of any length.
RIPEMD160 algorithm
RIPEMD (RACE Integrity Primitives Evaluation Message Digest) is a Message Digest for RACE Integrity check. RIPEMD uses the design principle of MD4 and improves the algorithm defects of MD4. Ripemd-128 was first released in 1996, and it is similar to SHA-1 in performance.
Ripemd-160 is an improvement of RIPEMD-128 and is the most common version of RIPEMD. Ripemd-160 outputs 160-bit Hash values. Violent collision search attacks on 160-bit Hash functions require 280 computations, greatly improving the computation intensity. The design of RIPEMD-160 fully absorbs some of the performance of MD4, MD5, and RIPEMD-128, so that ripEMD-160 has better collision resistance. It is intended to replace the 128-bit Hash functions MD4, MD5, and RIPEMD.
Ripemd-160 uses a 160-bit cache to store the intermediate result and the final Hash value of the algorithm. This cache consists of five 32-bit registers A, B, C, D, and E. The initial value of the register looks like this:
Data is stored in the form of low – byte storage at low – address.
The core of the processing algorithm is a compression function module with 10 loops, each of which consists of 16 processing steps. Using different primitive logic functions in each loop, the algorithm’s handling is divided into two different cases, in which five primitive logic functions are used in reverse order. Each loop returns A new value with input from the currently grouped message word and 160-bit cache values A, B, C, D, E. Each loop uses an extra constant, and at the end of the last loop, the final output is produced by adding the results A, B, C, D, E, A ‘, B ‘, C ‘, D ‘, E ‘and the initial values of the link variables. After processing all 512-bit packets, the resulting 160-bit output is the message digest.
In addition to the 128-bit and 160-bit versions, RIPEMD algorithms also have 256-bit and 320-bit versions, which together constitute the four members of the RIPEMD family: RIPEMD-128, RIPEMD-160, RIPEMD-256, and RIPEMD-320. The security of the 128-bit version has been questioned. The 256-bit and 320-bit versions reduce the possibility of accidental collisions, but they do not have a higher level of security than RIPEMD-128 and RIPEMD-160, because they are only based on the 128-bit and 160-bit versions. The initial parameters and s-box were modified to achieve 256-bit and 320-bit outputs.
Content Source:
Public id: Blockchain and Cryptocurrency Research, EXV Planet, Blockchain Addison
Huazhang Books: Linking the Future: Embracing the New Era of Blockchain and Digital Assets
Lecture 100: What is blockchain from the village ledger
Blockchain lecture 100: Why is blockchain called “block” “chain”?
Blockchain lecture 100: What is the Byzantine problem?
Blockchain lecture 100: Bitcoin gives central banks an example, says the World Bank
Blockchain lecture 100: What is the waste of computing power used to bash blockchain?
Blockchain 100 talk: Zhihu thousands of praise answer about clear mining process
Block chain 100 talk: single fight or group fight, two kinds of mining way you want to choose?
Block chain the encyclopedia dark | chatted 50 WeChat group, learning block chain required this 20 books
More than 400 white papers: Back to Basics on blockchain Technology
Activity recommended
Theme: Blockathon, challenge blockchain development, dare to come! (Click for details)
May 25-27, Blockathon2018 Beijing station, recruit 100 developers to challenge blockchain development.
Developers free, registration is subject to review. To register, identify the QR code below or click “Read the original text”.