Background

Enter the following three expressions in the browser console, and the results are not what you might expect

0.1 + 0.2 > 0.3  // true
0.1 + 0.2        // 0.30000000000000004
0.1 * 0.1        // 0.010000000000000002

Prerequisite knowledge

  • Computers store everything as binary data, sequences of 0s and 1s, so the decimal numbers we use in everyday life must be converted before they can be stored

Decimal to binary

  • Converting a decimal number to binary works as follows:

    Integer part: repeatedly divide by 2 and record the remainder until the quotient is 0, then read the remainders in reverse order. Fractional part: repeatedly multiply by 2 and record the integer digit of each product until the fractional part is 0 or the required precision is reached, then read the digits in order
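
The procedure above can be sketched as a small function. This is only an illustration: the name decimalToBinary and the maxFractionBits parameter are made up for this example, and it assumes a non-negative input.

```javascript
// Convert a non-negative decimal number to a binary string.
// maxFractionBits (hypothetical parameter) caps the fractional digits,
// because many decimal fractions never terminate in binary.
function decimalToBinary(value, maxFractionBits = 20) {
  let integerPart = Math.floor(value);
  let fractionPart = value - integerPart;

  // Integer part: divide by 2, collect the remainders, read them in reverse.
  let integerBits = '';
  do {
    integerBits = (integerPart % 2) + integerBits; // prepending reverses the order
    integerPart = Math.floor(integerPart / 2);
  } while (integerPart > 0);

  // Fractional part: multiply by 2, collect the integer digit, read in order.
  let fractionBits = '';
  while (fractionPart > 0 && fractionBits.length < maxFractionBits) {
    fractionPart *= 2;
    const bit = Math.floor(fractionPart);
    fractionBits += bit;
    fractionPart -= bit;
  }

  return fractionBits ? `${integerBits}.${fractionBits}` : integerBits;
}

console.log(decimalToBinary(173.8125)); // "10101101.1101"
console.log(decimalToBinary(0.1, 12));  // "0.000110011001" (cut off after 12 bits)
```

Note that 0.1 never reaches a fractional part of 0, so the loop stops only because of the bit limit.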

Binary to decimal

// Take binary 10101101.1101 as an example
// For the integer part 10101101, the calculation logic is as follows
// ← go from right to left
1 * 2^0 + 0 * 2^1 + 1 * 2^2 + 1 * 2^3 + 0 * 2^4 + 1 * 2^5 + 0 * 2^6 + 1 * 2^7
= 1 + 0 + 4 + 8 + 0 + 32 + 0 + 128
= 173

// The fractional part 1101 is calculated as follows
// From left to right →
1 * 2^-1 + 1 * 2^-2 + 0 * 2^-3 + 1 * 2^-4
= 1/2 + 1/4 + 0 + 1/16
= 13/16
= 0.8125
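
The same weighted sum can be written as a short function. This is a minimal sketch; binaryToDecimal is a hypothetical helper name, and the input is assumed to be a well-formed unsigned binary string.

```javascript
// Convert an unsigned binary string such as "10101101.1101" to a decimal number.
function binaryToDecimal(binary) {
  const [integerBits, fractionBits = ''] = binary.split('.');
  let result = 0;

  // Integer part: weights 2^0, 2^1, ... going from right to left.
  for (let i = 0; i < integerBits.length; i++) {
    const bit = Number(integerBits[integerBits.length - 1 - i]);
    result += bit * 2 ** i;
  }

  // Fractional part: weights 2^-1, 2^-2, ... going from left to right.
  for (let i = 0; i < fractionBits.length; i++) {
    result += Number(fractionBits[i]) * 2 ** -(i + 1);
  }

  return result;
}

console.log(binaryToDecimal('10101101.1101')); // 173.8125
```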

Scientific notation

  • The scientific notation for decimal 173.8125 is 1.738125 * 10^2
  • Decimal 173.8125 corresponds to binary 10101101.1101, which in binary scientific notation is 1.01011011101 * 2^7. This works like decimal scientific notation with base 2 in place of base 10; the exponent 7 means the binary point moves 7 places to the right to recover the original number

    Key point: in 1.01011011101 * 2^7 the significand is binary. To convert it to base 10, first convert binary 1.01011011101 to decimal, which gives 1.35791015625, then multiply by 2^7 (1.35791015625 * 128) to get 173.8125

Binary boundary problem

The largest value that n binary digits can represent has every one of the n bits set to 1, which is 2^n - 1 in decimal

// The maximum value of the 8-bit binary number 11111111 can be calculated in two ways
// 1. Sum the weight of every bit
1*2^0 + 1*2^1 + 1*2^2 + 1*2^3 + 1*2^4 + 1*2^5 + 1*2^6 + 1*2^7 = 255
// 2. Use the formula 2^n - 1
2^8 - 1 = 255
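
A quick way to check the boundary in the console is sketched below. maxUnsigned is a hypothetical helper, and the result is exact only while the value stays within the integer precision of Number (up to 53 bits).

```javascript
// Largest value that fits in `bits` binary digits: all bits set to 1, i.e. 2^bits - 1.
function maxUnsigned(bits) {
  return 2 ** bits - 1;
}

console.log(maxUnsigned(8));             // 255
console.log(maxUnsigned(8).toString(2)); // "11111111"
console.log(parseInt('11111111', 2));    // 255
```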

IEEE 754 Specification – Double Precision (64-bit)

  • JavaScript's Number type strictly follows the IEEE 754 64-bit double-precision floating-point format, as defined by the ECMAScript specification (BigInt aside, it is the language's only numeric type). Floating-point representation has the following characteristics:

    • Floating point numbers can represent a much larger range of values than the integer representation of the same number of digits
    • Floating-point numbers cannot accurately represent all values in their range of values, whereas signed and unsigned integers accurately represent every value in their range of values
    • Floating point numbers can only accurately represent values of the form m * 2^e
    • When the biased exponent is 2^(k-1) - 1 (k being the number of exponent bits), the floating point number can accurately represent the integer values in the range
    • When the biased exponent is not 2^(k-1) - 1, the floating point number cannot accurately represent the integer values in the range

    Because some values cannot be stored exactly, the error becomes more visible after arithmetic is performed on them

  • An IEEE 754 floating-point number is made up of three parts: the sign bit, the biased exponent, and the fraction. The binary layout of a JavaScript number therefore looks like this:

    • Sign bit: 1 bit. 0 indicates a positive number and 1 indicates a negative number
    • Biased exponent: 11 bits. It encodes the power of 2 from the scientific notation, that is, how many places the binary point is moved. Because the exponent can be positive or negative, it is stored with an offset: the actual exponent plus 1023, which covers actual exponents in [-1023, 1024]. An actual exponent of -1023 is stored as all 0s, and 1024 as all 1s
    • Fraction: 52 bits. It holds the precision (the digits after the binary point). Because the digit before the binary point of a normalized number is always 1, that 1 is not stored (the hidden bit), so the effective precision is 52 + 1 = 53 bits
  • The value of a float can be written as value = (-1)^sign * 2^exponent * 1.fraction, where exponent is the actual (unbiased) exponent
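
To look at the three parts of a real number, the 64 bits can be read back through a DataView. The sketch below is illustrative only; toIeee754Fields is a hypothetical helper name, and it relies on DataView.getBigUint64, which needs an engine with BigInt support.

```javascript
// Split a Number into its IEEE 754 double-precision fields (as bit strings).
function toIeee754Fields(value) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, value); // DataView is big-endian by default
  const bits = view.getBigUint64(0).toString(2).padStart(64, '0');
  return {
    sign: bits.slice(0, 1),      // 1 bit
    exponent: bits.slice(1, 12), // 11 bits, biased by 1023
    fraction: bits.slice(12),    // 52 bits, hidden leading 1 not stored
  };
}

const fields = toIeee754Fields(0.1);
console.log(fields.sign);                          // "0"
console.log(fields.exponent);                      // "01111111011"
console.log(parseInt(fields.exponent, 2) - 1023);  // -4
console.log(fields.fraction);                      // the 52 stored fraction bits
```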

  • Converting a decimal number to IEEE 754 takes three main steps

    • Convert the number to binary
    • Write the binary number in scientific notation
    • Convert the scientific notation to the IEEE 754 representation
  • Converting decimal 0.1 gives the following

    (1) Convert 0.1 to binary: 0.00011001100110011001100110011001100110011001100110011001 (1001 repeats forever)

    (2) Write it in binary scientific notation: 1.1001100110011001100110011001100110011001100110011001 (1001 repeats forever) * 2^-4

    (3) From the scientific notation, the sign is 0 and the exponent is -4. Adding the fixed offset (1023 in the 64-bit case) gives -4 + 1023 = 1019, which is 01111111011 as an 11-bit binary number

    (4) The fraction is the part after the binary point in scientific notation: 1001100110011001100110011001100110011001100110011010. The fraction holds at most 52 bits: a shorter value is padded with 0s, but here there are more than 52 bits, so the value is cut off and rounded. The 1 to the left of the binary point is not stored; it is added back automatically when converting to base 10

    The binary value finally stored in the computer is: 0 01111111011 1001100110011001100110011001100110011001100110011010

    Step (4) shows that 0.1 is cut off and rounded when stored, because the fraction can hold at most 52 bits. This introduces a small error, so converting the stored binary back to decimal does not give exactly the original value

    A decimal fraction can have a finite binary expansion only if its last decimal digit is 5 (only 0.5 * 2 yields an integer; put differently, binary can exactly represent only fractions whose denominator is a power of 2). So among the one-digit decimals 0.1 to 0.9, every value except 0.5 loses precision in the binary conversion

  • Converting the stored binary back to decimal goes as follows:

    • The sign is 0, so the number is positive (+)
    • The biased exponent is 01111111011, which is 1019 in decimal; subtracting the fixed offset 1023 gives the actual exponent -4
    • The fraction (the digits to the right of the binary point) is 1001100110011001100110011001100110011001100110011010; the 1 to the left of the binary point was omitted when storing, so it is added back to give 1.1001100110011001100110011001100110011001100110011010
    • Applying the formula above gives value = + 2^-4 * 1.1001100110011001100110011001100110011001100110011010; converting this binary value to decimal gives 0.100000000000000005551115123126
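
The decoding steps can be replayed in code. This is a sketch under stated assumptions: fromIeee754Fields is a hypothetical helper, it takes the three fields as bit strings, and it handles normal numbers only (biased exponent neither all 0s nor all 1s).

```javascript
// Rebuild a Number from its sign, biased exponent and fraction bit strings.
function fromIeee754Fields(sign, exponent, fraction) {
  const s = sign === '1' ? -1 : 1;
  const e = parseInt(exponent, 2) - 1023;        // subtract the fixed offset
  const m = 1 + parseInt(fraction, 2) / 2 ** 52; // add back the hidden 1
  return s * m * 2 ** e;
}

// The stored fields of 0.1: fraction is "1001" twelve times followed by "1010".
const fraction01 = '1001'.repeat(12) + '1010';
const decoded = fromIeee754Fields('0', '01111111011', fraction01);

console.log(decoded === 0.1);         // true
console.log(decoded.toPrecision(30)); // 30 significant digits of the stored value, slightly above 0.1
```
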
  • So the previous formula can be written as follows, where e is the biased exponent:

    value = (-1)^sign * 2^(e - 1023) * 1.fraction

  • The fixed offset 1023 that is added when computing the biased exponent can be calculated with the following formula:

    bias = 2^(k-1) - 1

    where k is the number of bits that store the exponent. For 64-bit double precision k is 11, so the fixed offset is 2^10 - 1 = 1023

  • Storing the exponent as its actual value (such as the -4 above) plus a fixed offset has an advantage: every exponent can be represented as an unsigned integer as wide as the exponent field (11 bits here), which makes comparing the exponents of two floating-point numbers easier. In fact, two floating-point representations can be compared in lexicographic order

    The actual exponent can be positive or negative, so without an offset one of the 11 bits would have to act as a sign bit. If two's complement were used, the sign bit of the whole number and the sign of the exponent would prevent a simple magnitude comparison. That is why the fixed-offset scheme is used; the stored value is known as the biased exponent

  • With one of the 11 bits treated as a sign, the exponent could range from -1023 to +1023 (2^10 - 1). Adding the fixed offset 1023 maps this to 0 through 2046, so the biased exponent is never negative. There is no sign to worry about when storing it: the biased exponent is stored directly in 11 bits, whose full range is 0 to 2^11 - 1 (2047)

  • To recover the actual exponent, subtract the fixed offset 1023 from the biased exponent. The actual exponent therefore ranges from -1023 to 1024, where -1023 (all 0s) is reserved for 0 (and subnormal numbers) and 1024 (all 1s) is reserved for Infinity (and NaN). Excluding these two values, the exponent ranges from -1022 to 1023

  • The previous formula needs to be split into two cases, where e is the biased exponent:

    • When the biased exponent is not 0, the following formula is used: value = (-1)^sign * 2^(e - 1023) * 1.fraction

    • When the biased exponent is 0, the following formula is used: value = (-1)^sign * 2^-1022 * 0.fraction

      | sign | biased exponent = actual exponent + fixed offset 1023 | fraction | calculation | value in JS |
      |------|------|------|------|------|
      | 0 | 11111111110 (1023 + 1023) | 111...1 (52 ones) | (-1)^0 * 1.(52 ones) * 2^1023 ≈ 2^1024 ≈ 1.7976931348623157e+308 | Number.MAX_VALUE |
      | 0 | 00000000000 (exponent fixed at -1022) | 000...01 (51 zeros, then 1) | (-1)^0 * 2^-52 * 2^-1022 = 5e-324 | Number.MIN_VALUE (the smallest positive number) |
      | 0 | 10000110011 (52 + 1023) | 111...1 (52 ones) | (-1)^0 * (2^53 - 1) = 9007199254740991 | Number.MAX_SAFE_INTEGER |
      | 1 | 10000110011 (52 + 1023) | 111...1 (52 ones) | (-1)^1 * (2^53 - 1) = -9007199254740991 | Number.MIN_SAFE_INTEGER |
      | 0 | 11111111111 (1024 + 1023) | 000...0 (52 zeros) | (-1)^0 * 1 * 2^1024 | Infinity |
      | 1 | 11111111111 (1024 + 1023) | 000...0 (52 zeros) | (-1)^1 * 1 * 2^1024 | -Infinity |
      | 0 | 00000000000 | 000...0 (52 zeros) | (-1)^0 * 0 * 2^-1022 | 0 |
      | 1 | 00000000000 | 000...0 (52 zeros) | (-1)^1 * 0 * 2^-1022 | -0 |

    One step of the calculation is left out. Taking the first row as an example, this is where 2^1024 comes from: 1.(52 ones) * 2^(2046 - 1023) = 1.(52 ones) * 2^1023 = (53 ones) * 2^(1023 - 52) = (2^53 - 1) * 2^971, because 53 ones in binary is 2^53 - 1. Equivalently, 1.(52 ones) * 2^1023 ≈ 1.9999999999999998 * 2^1023, which is just below 2^1024
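
These boundary values can be checked directly in the console. The snippet below is a small sanity check rather than a derivation; the checks object is just a name chosen for this example.

```javascript
// Each entry compares a built-in constant with the formula from the table.
const checks = {
  maxValue: Number.MAX_VALUE === (2 ** 53 - 1) * 2 ** 971,
  minValue: Number.MIN_VALUE === 2 ** -1022 * 2 ** -52,
  maxSafeInteger: Number.MAX_SAFE_INTEGER === 2 ** 53 - 1,
  minSafeInteger: Number.MIN_SAFE_INTEGER === -(2 ** 53 - 1),
  overflowsToInfinity: Number.MAX_VALUE * 2 === Infinity,
  // 0 and -0 compare equal with ===, but Object.is can tell them apart.
  signedZero: -0 === 0 && !Object.is(-0, 0),
};

console.log(checks); // every property is expected to be true
```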

  • The large-number problem: why is 2^53 - 1 the largest safe integer, and what happens beyond it?

    • Use 2^53 to see why 2^53 - 1 is the largest safe integer, and what "safe" means

      2^53 in binary => 1 followed by 53 zeros => in scientific notation 1.(53 zeros) * 2^53 => when stored, the fraction has only 52 bits, so the last 0 is dropped and 52 zeros remain

      2^53 + 1 in binary => 1, then 52 zeros, then 1 => in scientific notation 1.(52 zeros)1 * 2^53 => when stored, the fraction has only 52 bits, so the trailing 1 is dropped and 52 zeros remain

    • So 2^53 and 2^53 + 1 have the same fraction and the same exponent once stored: two different numbers share one representation. Beyond this limit integers may lose precision, which is why they are considered unsafe, and why 2^53 - 1 is the largest safe integer in JavaScript
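
The collision is easy to reproduce. A minimal sketch (the variable name limit is only for illustration):

```javascript
const limit = 2 ** 53; // 9007199254740992

// 2^53 and 2^53 + 1 are stored as the same double.
console.log(limit === limit + 1);             // true

// Number.isSafeInteger reports where exact integer arithmetic ends.
console.log(Number.isSafeInteger(limit - 1)); // true
console.log(Number.isSafeInteger(limit));     // false
console.log(limit - 1 === Number.MAX_SAFE_INTEGER); // true
```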

  • If decimals lose precision when stored, why does const num = 0.1 still give 0.1?

    • Number.prototype.toPrecision() is similar to toFixed(): its argument is the number of significant digits to keep. We know that 0.1 loses precision when stored, because it is an infinitely repeating fraction in base 2. Writing 0.1 still gives 0.1 because JavaScript processes the value for us before displaying it

    If we keep 25 significant digits of 0.1 with toPrecision(25), the result is no longer 0.1, so JavaScript shortens the value for us by default. The question then becomes: what are the rules for shortening a double-precision floating-point number? The answer can be found in the English Wikipedia article on the double-precision floating-point format:

    • If a decimal has no more than 15 significant digits, it reads back the same as it was stored, and JS does not truncate it
    • If a decimal has 17 or more significant digits, the 53 bits of precision cannot hold all of them and the excess is cut off, so the stored value is the nearest representable number. For example, 0.1 and 0.10000000000000001 (17 digits) are stored in the same form, so the latter reads back as 0.1
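
The behaviour described above can be observed with toString() and toPrecision(). This is only a demonstration of the built-in methods; no particular engine internals are assumed.

```javascript
const num = 0.1;

// Default conversion prints the shortest decimal string that reads back as the same double.
console.log(num.toString());      // "0.1"

// Asking for more significant digits exposes the stored value.
console.log(num.toPrecision(17)); // a 17-digit string that no longer looks like 0.1
console.log(num.toPrecision(25)); // even more digits of the stored value

// A 17-digit string always converts back to the same double.
console.log(Number(num.toPrecision(17)) === num); // true
```
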
  • Why does 1.335.toFixed(2) give 1.33?

    • 1.335 cannot be stored exactly. The fraction is cut off at 53 bits of precision, and the value actually stored, converted back to base 10, is about 3.6 * 10^-17 below 1.335. Rounding that to two decimal places gives 1.33

    • Why does calling toPrecision() reveal these extra digits? A Number is limited by the number of bits it can store, so 1.335 is rounded when stored and is displayed in shortened form by default. toPrecision() returns a string, which can show as many digits of the stored value as requested. Converting that string back to a Number gives the same stored value again, because a double-precision floating-point number can hold no more bits
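
A short check of this explanation, using only built-in methods (1.125 is an extra value chosen here because it is exactly representable in binary):

```javascript
const value = 1.335;

console.log(value.toFixed(2));      // "1.33"
console.log(value.toPrecision(20)); // starts with 1.33499999..., just below 1.335

// The long string converts back to exactly the same double.
console.log(Number(value.toPrecision(20)) === value); // true

// For comparison, 1.125 is stored exactly, so the tie rounds up as expected.
console.log((1.125).toFixed(2));    // "1.13"
```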

Analysis: why 0.1 + 0.2 !== 0.3 is true

With the groundwork above, we can analyze 0.1 + 0.2 !== 0.3. This is not just a JavaScript problem: it appears in anything that uses IEEE 754 double-precision encoding to represent floating-point numbers. The analysis goes as follows

// 0.1
e = -4;
m = 1.1001100110011001100110011001100110011001100110011010 (52 bits)
// 0.2
e = -3;
m = 1.1001100110011001100110011001100110011001100110011010 (52 bits)

The m here shows the 52 bits after the binary point; the integer part 1 before the binary point is the hidden bit mentioned earlier

Next, add the two. What if the exponents differ? The usual approach is to shift the number with the smaller exponent to the right so that both use the larger exponent, because bits lost off the right end cost far less precision than bits lost off the left end

e = -4;
m = 1.1001100110011001100110011001100110011001100110011010 (52 bits)
+
e = -3;
m = 1.1001100110011001100110011001100110011001100110011010 (52 bits)

Align the exponents

e = -3;
m = 0.1100110011001100110011001100110011001100110011001101 (52 bits)
+
e = -3;
m = 1.1001100110011001100110011001100110011001100110011010 (52 bits)

The sum is

e = -3;
m = 10.0110011001100110011001100110011001100110011001100111 (52 bits)

Normalize so that only one digit remains before the binary point

e = -2;
m = 1.00110011001100110011001100110011001100110011001100111 (53 bits)

The fraction has now overflowed (53 bits, more than 52), so it must be rounded to the nearest representable value. But which is nearest when two candidates are equally close? For example, keeping 2 fractional digits of 1.101 could give 1.10 or 1.11, which are equally close. The rule is to keep the even one, which in this case is 1.10

Applying this rule here, the dropped bit is exactly halfway, so the value rounds up to the even neighbor and the result is

m = 1.0011001100110011001100110011001100110011001100110100 (52 bits)

This gives the final binary number

2^-2 * 1.0011001100110011001100110011001100110011001100110100
= 0.010011001100110011001100110011001100110011001100110100

Converted to decimal, this is 0.30000000000000004
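
The hand calculation can be cross-checked with Number.prototype.toString(2), which prints the stored binary value. This is a verification sketch only; the variable name sum is chosen for the example.

```javascript
const sum = 0.1 + 0.2;

// Binary form of the computed sum (trailing zeros are not printed).
console.log(sum.toString(2));
// Binary form of the double nearest to 0.3, which differs in the last bits.
console.log((0.3).toString(2));

// The two doubles are neighbors: they differ by one unit in the last place, 2^-54.
console.log(sum - 0.3 === 2 ** -54); // true
console.log(sum > 0.3);              // true
```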

Solutions

The special constant Number.EPSILON

According to the specification, Number.EPSILON represents the difference between 1 and the smallest floating-point number greater than 1, which is 2^-52. In practice it serves as the smallest error margin JavaScript can resolve for values near 1

Compare the absolute difference between the two numbers with Number.EPSILON

function numbersEqual(num1, num2) {
    return Math.abs(num1 - num2) < Number.EPSILON
}
const a = 0.1+0.2, b=0.3;
console.log(numbersEqual(a, b)); // true

For compatibility: Number.EPSILON is supported in Chrome but not in IE (IE10 included), so IE needs a fallback

// Resolve the compatibility issue: define Number.EPSILON only where it is missing.
// The check runs once, as soon as the JS file is loaded.
// (Assigning unconditionally, for example through a self-invoking function,
// throws in strict mode wherever the built-in read-only property already exists.)
if (!Number.EPSILON) {
  Number.EPSILON = Math.pow(2, -52);
}

function numbersEqual(a, b) {
  return Math.abs(a - b) < Number.EPSILON;
}

// Then compare
const a = 0.1 + 0.2, b = 0.3;
console.log(numbersEqual(a, b)); // true

toFixed()

  • The toFixed(num) method rounds a Number to the specified number of decimal places
  • num is the number of digits after the decimal point. The value ranges from 0 to 20 inclusive; some implementations support a larger range
  • Note in particular: toFixed() returns a string representation of the number, rounded if necessary and padded with zeros if necessary so that the fractional part has the specified number of digits. If the value is greater than or equal to 1e+21, the method simply calls Number.prototype.toString() and returns a string in exponential notation
  • The precision problem can be solved this way
parseFloat((mathematical expression).toFixed(digits)); // digits must be between 0 and 20
// Examples
parseFloat((0.1 + 0.2).toFixed(10)) // The result is 0.3
parseFloat((0.3 / 0.1).toFixed(10)) // The result is 3
parseFloat((0.7 * 180).toFixed(10)) // The result is 126
parseFloat((1.0 - 0.9).toFixed(10)) // The result is 0.1
parseFloat((9.7 * 100).toFixed(10)) // The result is 970
parseFloat((2.22 + 0.1).toFixed(10)) // The result is 2.32

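A simple wrapper

function add(num1, num2) {
  // Count the decimal places of each operand
  const num1Digits = (num1.toString().split('.')[1] || '').length;
  const num2Digits = (num2.toString().split('.')[1] || '').length;
  // Scale both operands to integers, add them, then scale back
  const baseNum = Math.pow(10, Math.max(num1Digits, num2Digits));
  // Math.round removes the error that the scaling multiplication itself can introduce
  return (Math.round(num1 * baseNum) + Math.round(num2 * baseNum)) / baseNum;
}

// Round x to n decimal places
function roundFractional(x, n) {
  return Math.round(x * Math.pow(10, n)) / Math.pow(10, n);
}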

Third-party libraries

math.js, D.js, bignumber.js, decimal.js, big.js, etc.