Preface

Floating-point numbers in JavaScript often produce surprising results, such as 0.1 + 0.2 !== 0.3 or 1.005.toFixed(2) returning 1.00. Questions such as the difference between Number.MAX_VALUE and Number.MAX_SAFE_INTEGER also come up often.

This article walks through how JavaScript stores floating-point numbers and works through the questions above in some detail. I hope it is useful to you.

IEEE 754

Unlike many other languages, JavaScript's Number type does not distinguish between integers and floating-point numbers. Every number, integer or fractional, is stored the same way: as a fixed-length 64-bit double-precision value defined by the IEEE 754 standard, also known as a double.

Since both integers and fractions are stored in 64 bits, there is no difference between them in memory.

For bitwise operations, the operands are converted to 32-bit signed integers and the fractional part is discarded.
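As a small illustration of that conversion, the sketch below uses bitwise OR with 0, which changes nothing except forcing the 32-bit conversion. The variable names are only for illustration.

```javascript
// Bitwise operators first convert their operands to 32-bit signed integers.
const truncated = 5.7 | 0;      // fractional part discarded: 5
const negative = -5.7 | 0;      // truncation is toward zero: -5
const wrapped = (2 ** 31) | 0;  // does not fit in 32 signed bits, wraps to -2147483648

console.log(truncated, negative, wrapped);
```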

The 64 bits consist of three parts.

  • Sign bit (S, sign): bit 63. 0 for a positive number, 1 for a negative number.
  • Exponent (E, exponent): bits 52 to 62, 11 bits in total. The stored value ranges from 0 to 2047. Because the exponent can be negative, an offset of 1023 is subtracted, so the range becomes [-1023, 1024].
  • Mantissa (M, mantissa): bits 0 to 51, 52 bits in total.

The calculation formula is.


$$V = (-1)^S \times 2^{E-1023} \times (M+1)$$
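The formula can be evaluated directly. The following is a minimal sketch that assumes a normalized number (E is neither 0 nor 2047); `fromFields` is a hypothetical helper name, and the three fields are passed as bit strings.

```javascript
// Evaluate V = (-1)^S * 2^(E - 1023) * (M + 1) for a normalized double.
function fromFields(signBit, exponentBits, mantissaBits) {
  const S = Number(signBit);
  const E = parseInt(exponentBits, 2);
  // M is the binary fraction 0.b1 b2 ... b52: bit i contributes 2^-(i + 1).
  let M = 0;
  for (let i = 0; i < mantissaBits.length; i++) {
    if (mantissaBits[i] === '1') M += 2 ** -(i + 1);
  }
  return (-1) ** S * 2 ** (E - 1023) * (M + 1);
}

// S = 0, E = 1026, mantissa 01101 padded with zeros to 52 bits
const value = fromFields('0', '10000000010', '01101'.padEnd(52, '0'));
console.log(value); // 11.25
```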

The 64-bit representation of 11.25 is derived as follows.

  • Convert the integer and fractional parts to binary: 11.25 = 1011.01
  • Move the binary point to between the first and second bits, normalizing to 1.01101 * 2 ^ 3
  • 11.25 is positive, so S = 0. The exponent is 3, so E = 1023 + 3 = 1026, i.e. 100 0000 0010
  • Drop the integer 1, leaving the mantissa 01101, and pad the remaining positions with 0, i.e. 0110 1000 ... 0000 (52 bits)
  • The 64-bit floating-point binary representation of 11.25 is 0 10000000010 01101000...0000
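The steps above can be checked against what the engine actually stores. This is a sketch using a DataView to read the raw bits; `toBits` is a hypothetical helper name, not a built-in.

```javascript
// Return the sign, exponent and mantissa bit strings of a double.
function toBits(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x); // written and read back big-endian (the default)
  const bits = view.getBigUint64(0).toString(2).padStart(64, '0');
  return {
    sign: bits.slice(0, 1),       // bit 63
    exponent: bits.slice(1, 12),  // bits 62..52
    mantissa: bits.slice(12),     // bits 51..0
  };
}

const parts = toBits(11.25);
console.log(parts.sign);     // 0
console.log(parts.exponent); // 10000000010
console.log(parts.mantissa); // 01101 followed by 47 zeros
```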


After normalization the integer part is always 1, so it does not need to be stored. Only the part after the binary point is recorded, which saves one bit.

Round to nearest

In ordinary rounding (round half up), consider the ten digits 0 to 9. 0 is special: nothing is dropped or carried, and the result is simply the current number. Of the remaining nine digits, 1 to 4 are rounded down and 5 to 9 are rounded up, with probabilities 4/9 and 5/9, so the rule is biased.

Round to nearest, ties to even, also called banker's rounding, rounds 4 and below down and 6 and above up. For exactly 5, it depends on whether the preceding digit is odd or even: round up if it is odd, round down if it is even. Up and down are therefore each chosen 50% of the time.

In binary, round to nearest works the same way. With four bits to be discarded, anything greater than 1000 (such as 1001) rounds up, anything less than 1000 (such as 0111) is discarded, and for exactly 1000 it depends on whether the preceding bit is odd or even: discard if it is 0, round up if it is 1.

1.001 1001 // 1.010
1.001 0111 // 1.001
1.001 1000 // 1.010

1.100 1001 // 1.101
1.100 0111 // 1.100
1.100 1000 // 1.100
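The rule can be written out as a short function. This is a sketch working on bit strings for readability, not how an engine implements it; `roundToNearestEven` is a hypothetical helper name.

```javascript
// Round a binary fraction string to `keep` fractional bits, ties to even.
// Spaces in the input (as in "1.001 1001") are ignored.
function roundToNearestEven(binary, keep) {
  const [intPart, fracPart] = binary.replace(/\s/g, '').split('.');
  const dropped = fracPart.slice(keep);
  if (dropped.length === 0) return intPart + '.' + fracPart; // nothing to drop

  const kept = BigInt('0b' + intPart + fracPart.slice(0, keep));
  const half = '1' + '0'.repeat(dropped.length - 1); // exactly halfway, e.g. "1000"

  let result = kept;
  if (dropped > half) {
    result += 1n; // more than half: round up
  } else if (dropped === half && (kept & 1n) === 1n) {
    result += 1n; // tie: round up only when the preceding bit is odd
  }

  const digits = result.toString(2).padStart(intPart.length + keep, '0');
  return digits.slice(0, digits.length - keep) + '.' + digits.slice(digits.length - keep);
}

console.log(roundToNearestEven('1.001 1000', 3)); // 1.010 (tie, preceding bit 1)
console.log(roundToNearestEven('1.100 1000', 3)); // 1.100 (tie, preceding bit 0)
```

Comparing the dropped bits as strings is safe here because both strings have the same length and contain only 0 and 1.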

Number.MAX_VALUE

Number.MAX_VALUE is the largest number that can be represented in JavaScript.

Going by the IEEE 754 64-bit layout, the obvious guess is the following representation, with every exponent and mantissa bit set to 1.

0 11111111111 1111111111111111111111111111111111111111111111111111

But note that an exponent field of all 1s denotes NaN or Infinity, so the largest usable exponent field is 111 1111 1110.

0 11111111110 1111111111111111111111111111111111111111111111111111
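The reserved all-ones exponent can be observed by dumping the raw bits. The sketch below assumes a hypothetical helper `fields`. Only the exponent and the zero or non-zero state of the mantissa are checked for NaN, since its exact mantissa bits are implementation-defined.

```javascript
// Split the 64 bits of a double into sign, exponent and mantissa strings.
function fields(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);
  const bits = view.getBigUint64(0).toString(2).padStart(64, '0');
  return { sign: bits[0], exponent: bits.slice(1, 12), mantissa: bits.slice(12) };
}

const inf = fields(Infinity);
const nan = fields(NaN);
const max = fields(Number.MAX_VALUE);

console.log(inf.exponent, /^0+$/.test(inf.mantissa));  // 11111111111 true (mantissa all 0)
console.log(nan.exponent, nan.mantissa.includes('1')); // 11111111111 true (mantissa non-zero)
console.log(max.exponent, /^1+$/.test(max.mantissa));  // 11111111110 true (mantissa all 1)
```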

Convert to decimal.

  1.1111111111111111111111111111111111111111111111111111 * 2 ^ (2046 - 1023)
= 1.1111111111111111111111111111111111111111111111111111 * 2 ^ 1023
= 1 1111111111111111111111111111111111111111111111111111 * 2 ^ (1023 - 52)
= 1 1111111111111111111111111111111111111111111111111111 * 2 ^ 971
= (2 ^ 53 - 1) * 2 ^ 971
= 1.7976931348623157e+308

That is Number.MAX_VALUE.

(Math.pow(2, 53) - 1) * Math.pow(2, 971) // 1.7976931348623157e+308
(Math.pow(2, 53) - 1) * Math.pow(2, 971) === Number.MAX_VALUE // true

Number.MAX_SAFE_INTEGER

Number.MAX_SAFE_INTEGER represents the largest safe integer in JavaScript.

The binary representation of Number.MAX_SAFE_INTEGER is shown below. Note that the exponent is exactly 52.

0 10000110011 1111111111111111111111111111111111111111111111111111

Convert to decimal.

  1.1111111111111111111111111111111111111111111111111111 * 2 ^ (1075 - 1023)
= 1.1111111111111111111111111111111111111111111111111111 * 2 ^ 52
= 1 1111111111111111111111111111111111111111111111111111
= 2 ^ 53 - 1
= 9007199254740991

So why is it called a safe integer? It means that when the integer is converted to binary, it fits entirely in the mantissa and is stored without rounding.

Neither Number.MAX_SAFE_INTEGER + 1 nor Number.MAX_SAFE_INTEGER + 2 is a safe integer.
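The built-in Number.isSafeInteger makes the boundary easy to check. A short sketch, with variable names chosen only for illustration:

```javascript
const max = Number.MAX_SAFE_INTEGER; // 2^53 - 1

console.log(Number.isSafeInteger(max));     // true
console.log(Number.isSafeInteger(max + 1)); // false
console.log(max + 1 === 2 ** 53);           // true: 2^53 itself is stored exactly
console.log(2 ** 53 + 1 === 2 ** 53);       // true: 2^53 + 1 rounds back to 2^53
```

The last two lines show why 2^53 is not considered safe even though it is stored exactly: a different integer, 2^53 + 1, rounds to the same stored value.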

Number.MAX_SAFE_INTEGER + 2 is 9007199254740993. Its binary representation is as follows.

100000000000000000000000000000000000000000000000000001

Normalization.

1.00000000000000000000000000000000000000000000000000001 * 2 ^ 53

Because the mantissa can hold only 52 bits, the 53rd bit has to be rounded. That bit is 1, an exact tie, and the bit before it is 0, so round to nearest discards it.

0000000000000000000000000000000000000000000000000000(1)
0000000000000000000000000000000000000000000000000000

64-bit floating-point representation.

0 10000110100 0000000000000000000000000000000000000000000000000000

The floating-point representation of Number.MAX_SAFE_INTEGER + 1, which is 9007199254740992, is the same.

0 10000110100 0000000000000000000000000000000000000000000000000000

This gives the following result.

Number.MAX_SAFE_INTEGER + 1 === Number.MAX_SAFE_INTEGER + 2 // true

0.1 + 0.2 != 0.3

0.1

Let's first convert 0.1 to binary by repeatedly multiplying by 2 and taking the integer part.

0.1 * 2 = 0.2 ----- integer part 0, fraction 0.2
0.2 * 2 = 0.4 ----- integer part 0, fraction 0.4
0.4 * 2 = 0.8 ----- integer part 0, fraction 0.8
0.8 * 2 = 1.6 ----- integer part 1, fraction 0.6
0.6 * 2 = 1.2 ----- integer part 1, fraction 0.2
0.2 * 2 = 0.4 ----- integer part 0, fraction 0.4

The binary representation of 0.1 is shown below. The group 0011 repeats forever.

0.0 0011 0011 0011 (0011)
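The multiply-by-2 procedure is easy to automate. The sketch below works on an integer numerator and denominator (1/10 rather than the literal 0.1) so that the decimal value is handled exactly; `fractionToBinary` is a hypothetical helper name.

```javascript
// Expand numerator/denominator (a value in [0, 1)) into binary digits
// by repeatedly doubling and taking the integer part.
function fractionToBinary(numerator, denominator, digits) {
  let bits = '';
  let remainder = numerator;
  for (let i = 0; i < digits; i++) {
    remainder *= 2;
    if (remainder >= denominator) {
      bits += '1';               // integer part is 1
      remainder -= denominator;  // keep only the fractional part
    } else {
      bits += '0';               // integer part is 0
    }
  }
  return '0.' + bits;
}

console.log(fractionToBinary(1, 10, 16)); // 0.0001100110011001
console.log(fractionToBinary(2, 10, 16)); // 0.0011001100110011
```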

Normalize it. 0.1 is positive, so S = 0. The exponent is -4, so E = -4 + 1023 = 1019, i.e. 011 1111 1011.

1.1 0011 0011 (0011) * 2 ^ -4

The mantissa stores at most 52 bits. The remaining bits are rounded, which carries 1.

1001100110011001100110011001100110011001100110011001(10011)
1001100110011001100110011001100110011001100110011010

64-bit floating-point representation of 0.1.

0 01111111011 1001100110011001100110011001100110011001100110011010

0.2

The binary representation of 0.2.

0.0011 0011 0011 (0011)

Normalize 0.2. S = 0, and the exponent is -3, so E = -3 + 1023 = 1020, i.e. 011 1111 1100.

1.1 0011 0011 (0011) * 2 ^ -3

Store 52 bits and round the rest, which carries 1.

1001100110011001100110011001100110011001100110011001(10011) 
1001100110011001100110011001100110011001100110011010

64-bit floating-point representation of 0.2.

0 01111111100 1001100110011001100110011001100110011001100110011010

Thus, both 0.1 and 0.2 actually lose precision when converted to 64-bit double-precision floating-point numbers.

0 01111111011 1001100110011001100110011001100110011001100110011010 // 0.1
0 01111111100 1001100110011001100110011001100110011001100110011010 // 0.2
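The lost precision becomes visible when more decimal digits are printed than the default formatting shows. A short sketch using toFixed with 20 decimal places:

```javascript
// The stored doubles are slightly larger than 0.1 and 0.2,
// because both mantissas were rounded up.
const storedTenth = (0.1).toFixed(20);
const storedFifth = (0.2).toFixed(20);

console.log(storedTenth); // 0.10000000000000000555
console.log(storedFifth); // 0.20000000000000001110
```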

Floating-point addition takes three steps: exponent alignment, summation, and normalization.

Exponent alignment

The exponents of 0.1 and 0.2 differ, so they must be made equal first, following the rule that the smaller exponent is aligned to the larger one.

For example, take 1.5 * 10 ^ 10 + 1.23 * 10 ^ 13 in decimal, keeping one digit before and one digit after the decimal point.

  • When the larger exponent is aligned to the smaller one, i.e. 1.5 * 10 ^ 10 + 1230 * 10 ^ 10, the result is 1231.5 * 10 ^ 10, which becomes 1.5 * 10 ^ 10 once the extra digits are discarded.
  • When the smaller exponent is aligned to the larger one, i.e. 0.0015 * 10 ^ 13 + 1.23 * 10 ^ 13, the result is 1.2315 * 10 ^ 13, which becomes 1.2 * 10 ^ 13 once the extra digits are discarded.

Clearly, aligning the smaller exponent to the larger one gives a result closer to the true value. Each time the exponent is increased by 1, the mantissa is shifted 1 bit to the right, and this repeats until the exponents match.

Why does the mantissa shift 1 bit to the right when the exponent is increased by 1?

The 64-bit floating-point representation of 0.1 is 1.1001 1001 ... 1001 1001 1010 * 2 ^ -4.

0 01111111011 (1.)1001100110011001100110011001100110011001100110011010

If the exponent is increased by 1 while the value stays the same, the binary point moves 1 place to the left, giving 0.1 1001 1001 ... 1001 1001 1010 * 2 ^ -3. This is the same as shifting the mantissa 1 bit to the right. On this first shift the vacated position is filled with 1, which is the omitted integer 1. On any further shift only 0 can be filled in, because the integer part is now 0.

0 01111111100 (0.)1100110011001100110011001100110011001100110011001101(0)

The alignment therefore proceeds as follows. The low bit that is shifted out is 0, so it can be dropped.

0 01111111011 1001100110011001100110011001100110011001100110011010 // 0.1
0 01111111100  100110011001100110011001100110011001100110011001101(0) // mantissa shifted right by 1 bit
0 01111111100 1100110011001100110011001100110011001100110011001101 // vacated position filled with the implicit 1

Note that when the mantissa is shifted right, the bits shifted out of the low end can cause a loss of precision. To reduce the error, some of those bits are retained and are rounded later, during normalization.

Summation

With the exponents equal, add the mantissas.

  0 01111111100 1100110011001100110011001100110011001100110011001101 // 0.1
+ 0 01111111100 1001100110011001100110011001100110011001100110011010 // 0.2

To make this easier to follow, the integer bits are written out below. Note that the mantissa addition produces a carry.

  0.1100110011001100110011001100110011001100110011001101
+ 1.1001100110011001100110011001100110011001100110011010
 10.0110011001100110011001100110011001100110011001100111

Normalization

The sum is 10.0 1100 1100 ... 1100 1100 111 * 2 ^ -3 = 1.00 1100 1100 ... 1100 1100 111 * 2 ^ -2.

1.00110011001100110011001100110011001100110011001100111

Since the mantissa can hold only 52 bits, the extra bit is rounded to nearest, which carries 1.

0011001100110011001100110011001100110011001100110011(1)
0011001100110011001100110011001100110011001100110100

IEEE 754 double precision 64-bit representation.

0 01111111101 0011001100110011001100110011001100110011001100110100

That is 1.0011 0011 ... 0011 0011 0100 * 2 ^ -2, which converts to the decimal 0.30000000000000004.
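The three steps can be replayed with BigInt arithmetic. The following is a sketch under stated assumptions: it handles only positive normal numbers, with no overflow or subnormal handling, and `decompose`, `roundShift` and `addByHand` are hypothetical helper names. Instead of shifting the smaller operand right and tracking the bits that fall off, it scales the larger operand left, which is arithmetically equivalent and loses nothing before the final rounding.

```javascript
// Split a positive normal double into its unbiased exponent and its
// 53-bit significand (the 52 stored bits plus the implicit leading 1).
function decompose(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);
  const bits = view.getBigUint64(0);
  return {
    exponent: Number((bits >> 52n) & 0x7ffn) - 1023,
    significand: (bits & ((1n << 52n) - 1n)) | (1n << 52n),
  };
}

// Drop the lowest `shift` bits of `value`, rounding to nearest, ties to even.
function roundShift(value, shift) {
  if (shift === 0n) return value;
  const kept = value >> shift;
  const dropped = value & ((1n << shift) - 1n);
  const half = 1n << (shift - 1n);
  const roundUp = dropped > half || (dropped === half && (kept & 1n) === 1n);
  return roundUp ? kept + 1n : kept;
}

function addByHand(x, y) {
  let a = decompose(x);
  let b = decompose(y);
  if (a.exponent < b.exponent) {
    [a, b] = [b, a]; // a now has the larger exponent
  }

  // 1. Align: express both significands in units of 2^(b.exponent - 52).
  const diff = BigInt(a.exponent - b.exponent);
  // 2. Sum.
  let sum = (a.significand << diff) + b.significand;
  let exponent = b.exponent - 52;

  // 3. Normalize: round back to 53 significant bits.
  const extra = BigInt(Math.max(sum.toString(2).length - 53, 0));
  sum = roundShift(sum, extra);
  exponent += Number(extra);
  if (sum === (1n << 53n)) { // rounding carried into a 54th bit
    sum >>= 1n;
    exponent += 1;
  }
  return Number(sum) * 2 ** exponent;
}

console.log(addByHand(0.1, 0.2));               // 0.30000000000000004
console.log(addByHand(0.1, 0.2) === 0.1 + 0.2); // true
```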

Summary

To be clear, numbers like 0.1 and 0.2 are finite in decimal but are infinitely repeating fractions in binary. JavaScript follows the IEEE 754 double-precision standard and has only 64 bits, so the low bits are discarded when the converted value is stored, and a loss of precision is unavoidable.

In addition, during addition, rounding may occur in exponent alignment, summation and normalization, and each can lose precision.

1.005.toFixed(2)

The binary representation of 1.005 is shown below. The group 0000 1010 0011 1101 0111 repeats indefinitely.

1.000 (0000 1010 0011 1101 0111)

IEEE 754 double precision 64-bit representation.

0 01111111111 0000000101000111101011100001010001111010111000010100(0)
0 01111111111 0000000101000111101011100001010001111010111000010100

After the trailing bits are truncated, the stored value is slightly less than 1.005. This can be seen by using Number.prototype.toPrecision to view 1.005 with 20 significant digits.
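A one-line check of the stored value, with a variable name chosen only for illustration:

```javascript
// Print the stored double nearest to 1.005 with 20 significant digits.
const stored = (1.005).toPrecision(20);
console.log(stored); // 1.0049999999999998934
```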

So toFixed(2) rounds down when keeping two decimal places.

1.005.toFixed(2) // 1.00

Number.MAX_VALUE + 1 is not Infinity

Number.MAX_VALUE + 1 === Number.MAX_VALUE

The 64-bit floating-point representations of Number.MAX_VALUE and 1.

0 11111111110 1111111111111111111111111111111111111111111111111111 // Number.MAX_VALUE
0 01111111111 0000000000000000000000000000000000000000000000000000 // 1

The exponent of 1 increases from 0 to 1023, and its mantissa is shifted 1023 bits to the right.

  0 11111111110 1111111111111111111111111111111111111111111111111111 // Number.MAX_VALUE
+ 0 11111111110 0000000000000000000000000000000000000000000000000000(000...0001) // 1

The sum.

  1.1111111111111111111111111111111111111111111111111111
+ 0.0000000000000000000000000000000000000000000000000000(000...0001)
  1.1111111111111111111111111111111111111111111111111111(000...0001)

During normalization the low bits are discarded, so the operation is effectively Number.MAX_VALUE plus 0, which gives the following result.

Number.MAX_VALUE + 1 === Number.MAX_VALUE // true

2 ^ 970

How much must be added to Number.MAX_VALUE to reach Infinity?

According to IEEE 754, any result whose magnitude is greater than or equal to the following value becomes Infinity.


$$2^{E_{max}} \left(2 - \frac{2^{1-p}}{2}\right)$$

In 64-bit floating-point numbers, Emax is 1023 and p is 53.

  2 ^ 1023 * (2 - 2 ^ (1 - 53) / 2)
= 2 ^ 1023 * (2 - 2 ^ -53)
= 2 ^ 970 * (2 ^ 54 - 1)

The difference between this result and Number.MAX_VALUE is as follows.

  (2 ^ 54 - 1) * 2 ^ 970 - (2 ^ 53 - 1) * 2 ^ 971
= 2 ^ 970
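Since every quantity in this derivation is an integer, it can be checked exactly with BigInt. A small sketch, with variable names chosen only for illustration:

```javascript
// 2^1023 * (2 - 2^-53), written as an exact integer
const threshold = 2n ** 970n * (2n ** 54n - 1n);
// Number.MAX_VALUE, written as an exact integer
const maxValue = (2n ** 53n - 1n) * 2n ** 971n;

console.log(BigInt(Number.MAX_VALUE) === maxValue); // true
console.log(threshold - maxValue === 2n ** 970n);   // true
```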

So adding at least 2 ^ 970 to Number.MAX_VALUE gives Infinity.

Number.MAX_VALUE + 2 ^ 970 === Infinity

The 64-bit floating-point representations of Number.MAX_VALUE and 2 ^ 970.

0 11111111110 1111111111111111111111111111111111111111111111111111 // Number.MAX_VALUE
0 11111001001 0000000000000000000000000000000000000000000000000000 // 2 ^ 970

The exponent of 2 ^ 970 increases from 970 to 1023, a difference of 53, so its mantissa is shifted 53 bits to the right.

  0 11111111110 1111111111111111111111111111111111111111111111111111 // Number.MAX_VALUE
+ 0 11111111110 0000000000000000000000000000000000000000000000000000(1) // 2 ^ 970

Sum, then round to nearest. The low bit to be discarded is an exact tie and the bit before it is 1, so the rounding carries 1.

  1.1111111111111111111111111111111111111111111111111111
+ 0.0000000000000000000000000000000000000000000000000000(1)
  1.1111111111111111111111111111111111111111111111111111(1)
 10.0000000000000000000000000000000000000000000000000000

Note that the mantissa produces a carry, so the exponent is incremented by 1 and the mantissa is shifted 1 bit to the right.

The result is the following 64-bit floating-point representation, which is in fact the representation of Infinity.

0 11111111111 0000000000000000000000000000000000000000000000000000
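The threshold can be confirmed directly. In this sketch, 2 ** 969 is used as an example of an addend below the threshold; the variable names are only for illustration.

```javascript
const atThreshold = Number.MAX_VALUE + 2 ** 970;    // exact tie, rounds up and overflows
const belowThreshold = Number.MAX_VALUE + 2 ** 969; // less than half a step, rounded away

console.log(atThreshold === Infinity);            // true
console.log(belowThreshold === Number.MAX_VALUE); // true
```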

