This is the third day of my participation in Gwen Challenge

preface

To learn more, I plan to do a Why series of my own.

  • Interviewer: How much is 0.1 + 0.2
  • Classmate: It is not equal to 0.3, because of the accuracy problem
  • Interviewer: Can you tell me more about it
  • Classmate:…

Above the students, is once this I!

So,!

To solve 0.1 + 0.2 this primary school students will be the problem, roughly divided into four steps

  1. Hexadecimal conversion
  • Decimal to binary
  • Binary to decimal
  1. IEEE 754 floating point standard
  2. Floating point calculation
  3. 0.1 + 0.2

Hexadecimal conversion

Decimal to binary

  • Integer: mod by 2 and reverse order. Specific approach is: use 2 to divide decimal integer, can get a quotient and remainder; Then divide the quotient by 2, and another quotient and remainder will be obtained, and so on until the quotient is less than 1. Then, the remainder obtained first as the lowest significant bit of the binary number, and the remainder obtained later as the highest significant bit of the binary number are arranged in order.
  • Decimals: multiply by 2 and arrange in order. The specific approach is: multiply the decimal fraction with 2, you can get the product, take out the integral part of the product, and then multiply the remaining decimal part with 2, and get a product, and then take out the integral part of the product, and so on, until the decimal part of the product is zero, at this time 0 or 1 is the last bit of binary. Or achieve the required accuracy.

We still use 9.375 for analysis,

Let’s look at the integer part: 9, and as a rule,Mod 2, reverse order.

The result is: 1001

Let’s look at the decimal part: 0.375, as the rule goesMultiply by 2 and arrange in order

The result is: 011

Combined 9.375 = integer 2 + decimal 2 = 1001 +.011 = 1001.011

Validation:

(9.375).toString(2)  / / 1001.011
Number.prototype.toString.call(9.375.2)  / / 1001.011
Number.prototype.toString.call( Number(9.375), 2) / / 1001.011
Copy the code

Binary to decimal

  • Before the decimal point or the integer is multiplied from right to left by each binary number to the corresponding power of 2 and increasing
  • After the decimal point, we’re multiplying from left to right by two to the corresponding negative power and decreasing.

For example, the binary number 1001.011 is converted to decimal

Integer part: 1001 = 1 * 20 + 0 * 21 + 0 * 22 + 1 * 23 = 1 + 0 + 0 + 8 = 9 011 = 0 * 2-1 + 1 * 2-2 + + 1 * 2-3 = 0 + 0.25 + 0.125 = 0.375

1001.0112 = integer part + decimal part = 910 + 0.375 10 = 9.375

How do we verify the results, of course, call Number. The prototype. ToString

(9.375).toString(2)  / / 1001.011
Number.prototype.toString.call(9.375.2)  / / 1001.011
Number.prototype.toString.call( Number(9.375), 2) / / 1001.011

Copy the code

IEEE 754 floating point standard

Take the double precision floating point format, as shown above, with three parameters S E M:

Name Length Bit Position symbol bit Sign (S) : 1bit (B63) Exponent (E) : 11bit (B62-B52) Mantissa (M) : 52bit (B51-B0) The bias code used for the exponential part of the double precision (E) is 1023. S=1 means negative number and S=0 means positive numberCopy the code

(-1)^S*(1.m)*2^(e-1023) there is an additional parameter, the offset code 1023, more can be seen in IEEE 754 floating point standard 64-bit floating point. Binary might be hard to understand, but let’s start with a base 10 number, like 1001.125, which we can write as

  • 1001.125
  • 100.1125 * 101
  • 10.01125 * 102
  • 1.001125 * 103

The above (-1)^S*(1.m)*2^(e-1023) is the same as 1.001125*103, but in binary. If it’s a decimal, 0.0000125 is 1.25 times 10 minus 5

Let’s look at a binary example, such as 103.0625

  • S the sign bit

S is 0 because it’s positive

  • EIndex a
    1. Convert to binary1100111.0001
    2. Normalize 1.1001110001 * 26, 6 = E-1023, E = 1029
    3. E = 1029, convert to binary10000000101
  • M mantissa bits

Value of 1001110001 to 1.1001110001 M, because there are 52 length, adding 0 line behind, and the result is 1001110001000000000000000000000000000000000000000000

Joining together to 0-10000000101-1001110001000000000000000000000000000000000000000000

To verify this, go to IEEE 754 64-bit conversion tools.

You know, the decimal system has an infinite number of bad decimals, and the binary system also exists. In this case, base 10 is rounded, how about binary. It’s just 0 and 1, so 1 goes in. Of course, IEEE 754 has several rounding error modes. Read the IEEE 754 floating point standard for more details.

We look at 1.1, the tail section 000110011001100110011001100110011001100110011001100152 (11001) cycle, 53 is 1, carry it for you, The results of 0001100110011001100110011001100110011001100110011010

Now that we know how to convert decimal to binary floating-point storage, we can do the math.

Floating point calculation

The addition and subtraction operations of floating point numbers are generally completed by the following five steps: order matching, mansa operation, normalization, rounding, overflow judgment. For more details, read the steps for floating point numbers.

1. To order

The order here is the exponent digit, or simply the exponent digit is the same. △ E = E x-E y. Add △ E to the lower order code to make it equal to the higher order code. Take decimal notation for example, 123.5 + 1426.00456

  • Equivalent to 1.235*102 + 1.42600456 * 103
  • Align with the index high, the high is 3, become 0.1235*103 + 1.42600456 *103

123.5 + 1426.00456 = 0.1235*103 + 1.42600456 *103 = (0.125 + 1.42600456) *103 = 1.54950456 *103 = 1549.50456

For example, in decimal, the computer executes in binary. There could be several problems with this process.

  • It’s going to move to the right when it’s small versus big, because the exponent part, at most 52 bits, is going to be lost.
  • Adding or subtracting, the values may overflow, and hence the overflow judgment.

2. Mantissa operation

The mantissa operation is the addition and subtraction of the mantissa after the order is completed.

3. Normalization of results

In the machine, floating point numbers are stored in normalized form to ensure the uniqueness of their representation. For IEEE754 floating-point numbers, the mantissa must be of the form 1.m. Take decimal again.

80.5 + 90 = 8.05*101 + 9.0*101 = 17.051 Mantissa must be in the form of 1.m, normalized => 1.705 2

4. Rounding

Floating point operation in order or right gauge, mantissa needs to move right, the right out of the bits will be lost, resulting in the loss of accuracy of the operation result. To reduce this loss of precision, a certain number of removed bits can be reserved, known as guard bits, for rounding after normalization. The IEEE754 standard lists four alternative methods of rounding, with rounding used by default.

5. Overflow judgment

Different from fixed-point operation, the overflow of floating point number is judged by whether the value of the order code resulting from the operation produces an overflow. If the value of the order code exceeds the maximum positive number that the order code can represent, it is an overflow. Further, if the floating point number is positive at this time, it is a positive overflow, denoted as +∞; if the floating point number is negative, it is a negative overflow, denoted as -∞. If the value of the order code exceeds the minimum negative number that the order code can represent, it is underflow. Further, if the floating point number is positive, it is positive underflow, and if the floating point number is negative, it is negative underflow. Both positive and negative underflows are treated as 0

0.1 + 0.2 != 0.3

Convert to IEEE 754 standard binary data structure

On that basis, let’s start our topic 0.1 + 0.2! = 0.3, the computer will first convert decimal to binary and then add. The termination condition is until the fractional part of the product is zero, but from the following results, it is simply expressed as

Infinite loop, infinite loop, no, it’s the turn of the IEEE 754 standard, which defines single-precision and double-precision floating-point formats

2 * 0.1   
2 * 0.2        0

2 * 0.4        0
2 * 0.8        0
2 * 1.6        1
2 * 1.2        1

2 * 0.4        0
2 * 0.8        0
2 * 1.6        1
2 * 1.2        1

2 * 0.4        0
2 * 0.8        0
2 * 1.6        1
2 * 1.2        1. An infinite loop0011.Copy the code

Let’s start with 0.1 as an example

  • The sign bit S is zero because it’s positive.
  • Exponential part EFirst we convert 0.1 to base 2 0.0 0011 0011 (0011)An infinite loopBecause it is a positive number, the sign bit is 0. Then we express it in terms of normalized binary floating-point numbers, so it starts with 1.bbbbb… In this form, 1.1001 1001 (1001)An infinite loop x 24 –.

    E-1023 = -4 so E = 1019, the binary representation of 1019 is 1111111011, because there are 11 bits, 0 in front of it, 01111111011
  • Mantissa part M, an infinite loop 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001An infinite loop

    IEEE 754 Floating-Point uses round to nearest, tie to even rounding mode when 64-bit storage space cannot store complete infinite repeating decimals. 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 10011001The carry is 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1010

Finally you can also use a scale of 0-01111111011-1001100110011001100110011001100110011001100110011010 IEEE 754 64 conversion tool to verify their results are correct.

The same 0.2 the final result of 0-01111111100-1001100110011001100110011001100110011001100110011010

Next, standard floating-point arithmetic is performed

To order

Small versus large. 0.1 index 01111111011 = 1019 0.2 index 01111111100 = 1020 0.2 index is large, 0.1 adjustment index bit is 01111111100, at the same time the digit part moves one bit to the right, as follows:

0.1    0-01111111100-11001100110011001100110011001100110011001100110011010
0.2    0-01111111100-1001100110011001100110011001100110011001100110011010   
Copy the code

Mantissa operation

You can see there’s a carry

Mantissa 0.1 0.11001100110011001100110011001100110011001100110011010-1.1001100110011001100110011001100110011001100110011010 - 0.2 tail -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 10.01100110011001100110011001100110011001100110011001110 the resultsCopy the code

Normalization of results

Need one moves to the right, E + 1 = 1. 1020 + 1 = 1021 = 1111111101 M = 1.001100110011001100110011001100110011001100110011001110

Rounding processing

Mantissa decimal part 0011001100110011001100110011001100110011001100110011 10 length is 54, Round 0011001100110011001100110011001100110011001100110100

Overflow check

No index spillover

Result calculation:

E is 1021, S is 0 (1) ^ S * (1 M) * 2 ^ (E – 1023) = > (1.0011001100110011001100110011001100110011001100110100) * 2 ^ (2) = > 0.010011001100110011001100110011001100110011001100110100

Through online base conversion, the result is 0.30000000000000004.

Outside the words

Given this problem, how can we always guarantee the correctness of the results?

  • For example, if you know money, you can multiply 100 by two decimal places
  • Compared to a very small value. Such as Number. EPSILON

IEEE 754 64-bit conversion tool 0.1 + 0.2 === 0.30000000000000004