Why’s the Design is a series of articles about design decisions in the field of computing. In each article, we pose a specific question and discuss the pros and cons of the design and its impact on implementation from different perspectives.

The equation 0.1 + 0.2 = 0.3 seems obvious, but the previous article, Why 0.1 + 0.2 = 0.300000004, analyzed why it does not hold in most programming languages. Standard floating-point numbers come as 32-bit single-precision or 64-bit double-precision values, and every programming language that implements them correctly exhibits “errors” like this:

> 0.1 + 0.2
0.30000000000000004

Floating point is an essential concept in programming languages, and it is a trade-off between performance and precision: higher precision requires more bits and more computation, while precision that is too low cannot meet common computing requirements. This decision is no different from the usual problem in software engineering: how to make the most of limited resources to achieve a specific purpose.

Figure 1 – Tradeoff between performance and accuracy

Although floating-point numbers offer very good performance, using floating-point numbers with limited precision in a financial system can have serious consequences. Suppose we use a 64-bit double to store account balances at an exchange or bank. This opens the door to an attack in which the user exploits the precision limit of the double to create extra balance out of thin air:

Figure 2 – The financial system and floating point numbers

Suppose the user first deposits 0.1 units and then 0.2 units of an asset. With double-precision floating-point numbers the balance becomes 0.30000000000000004, so by withdrawing everything the user gains a windfall of 0.00000000000000004 [1]. Repeated enough times, this could in theory drive the bank out of business. Here is a piece of code that uses floating-point numbers to handle deposits and withdrawals:

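package main

import "fmt"

// balance is the account balance, stored (unsafely) as a float64.
var balance float64 = 0

func main() {
	deposit(.1)
	deposit(.2)
	// The stored balance is 0.30000000000000004, so this withdrawal succeeds.
	if amount, ok := withdraw(0.30000000000000004); ok {
		fmt.Println(amount)
	}
}

// deposit adds v to the balance.
func deposit(v float64) {
	balance += v
}

// withdraw removes v from the balance if enough funds are available.
func withdraw(v float64) (float64, bool) {
	if v <= balance {
		balance -= v
		return v, true
	}
	return 0, false
}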

The code above is an idealized case. Mature financial systems today do not make this kind of basic mistake; it may still be possible in some new exchanges, but it is very difficult to exploit in practice. If the resources at our disposal were infinite, we could naturally implement decimals with infinite precision. Resources are always limited, however, so some programming languages or libraries provide higher-precision decimals in the following two ways to make the equation 0.1 + 0.2 = 0.3 hold:

  • Use 128-bit high-precision decimal numbers or arbitrary-precision decimal numbers;
  • Use a rational number type and fractions to keep calculations exact;

Both of these methods can achieve higher precision decimal systems, but their principles are slightly different, and we’ll look at how they are designed.

Decimal fraction

Floating-point precision is often lost when values are converted between different bases. As we mentioned in Why 0.1 + 0.2 = 0.300000004, the decimal numbers 0.1 and 0.2 cannot be represented exactly with a finite number of binary digits, which results in a loss of precision. These small losses can add up to large errors in the end:

Figure 3 – Loss of precision between binary and decimal

As shown in the figure below, both 0.25 and 0.5 can be represented exactly as binary floating-point numbers, so 0.25 + 0.5 is also exact in floating point [2]:

Figure 4 – 0.25 and 0.5 represented as floating-point numbers
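
One way to check which decimal values survive the conversion to binary is to ask for the exact fraction that a float64 actually stores. The sketch below uses `big.Rat.SetFloat64` from Go's standard library; the helper name `exactValue` is made up for this example.

```go
package main

import (
	"fmt"
	"math/big"
)

// exactValue returns the exact fraction stored in a finite float64.
func exactValue(f float64) string {
	return new(big.Rat).SetFloat64(f).String()
}

func main() {
	fmt.Println(exactValue(0.25)) // 1/4
	fmt.Println(exactValue(0.5))  // 1/2
	fmt.Println(exactValue(0.1))  // 3602879701896397/36028797018963968

	a, b := 0.25, 0.5
	fmt.Println(a+b == 0.75) // true
}
```

The denominators of 0.25 and 0.5 are powers of two, so they are stored exactly; 0.1 is stored as the nearest fraction whose denominator is a power of two, which is close to 1/10 but not equal to it.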

To solve the precision problem of floating-point numbers, some programming languages introduced a decimal type, Decimal. Decimal is very common across language communities, and if a programming language has no native support for it, we are almost certain to find a Decimal library for that language in the open source community. Java provides arbitrary-precision decimals through BigDecimal, which contains three key member variables, intVal, scale, and precision [3]:

public class BigDecimal extends Number implements Comparable<BigDecimal> {
    private BigInteger intVal;
    private int scale;
    private int precision = 0;

    // ...
}

When we use BigDecimal to represent 1234.56, the three fields in BigDecimal will look like the following:

  • intVal stores all the digits with the decimal point removed, i.e. 123456;
  • scale stores the number of digits after the decimal point, i.e. 2;
  • precision stores the total number of significant digits, four before the decimal point and two after it, i.e. 6;

Figure 5 – BigDecimal implementation

By representing a number with several integers, BigDecimal sidesteps the problem that binary cannot represent some decimal fractions exactly. Because BigInteger uses an array to represent integers of arbitrary length, BigDecimal could in theory represent decimals with infinite precision if the machine had infinite memory.
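
The representation described above can be sketched in Go with `math/big`. This is a simplified illustration, not Java's implementation: the `Decimal` type, `NewDecimal`, and `rescale` are hypothetical names, the scale is assumed to be non-negative, and only addition is shown.

```go
package main

import (
	"fmt"
	"math/big"
	"strings"
)

// Decimal is a hypothetical, minimal analogue of BigDecimal:
// its value is unscaled * 10^(-scale). Assumes scale >= 0.
type Decimal struct {
	unscaled *big.Int
	scale    int
}

func NewDecimal(unscaled int64, scale int) Decimal {
	return Decimal{big.NewInt(unscaled), scale}
}

// rescale returns d's unscaled value expressed at a scale >= d.scale.
func (d Decimal) rescale(scale int) *big.Int {
	factor := new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(scale-d.scale)), nil)
	return factor.Mul(factor, d.unscaled)
}

// Add aligns both operands to the larger scale and adds the integers.
func (d Decimal) Add(o Decimal) Decimal {
	scale := d.scale
	if o.scale > scale {
		scale = o.scale
	}
	return Decimal{new(big.Int).Add(d.rescale(scale), o.rescale(scale)), scale}
}

// Precision is the number of decimal digits in the unscaled value.
func (d Decimal) Precision() int {
	return len(new(big.Int).Abs(d.unscaled).String())
}

func (d Decimal) String() string {
	digits := new(big.Int).Abs(d.unscaled).String()
	if len(digits) <= d.scale {
		// Pad with zeros so at least one digit precedes the decimal point.
		digits = strings.Repeat("0", d.scale-len(digits)+1) + digits
	}
	point := len(digits) - d.scale
	s := digits[:point]
	if d.scale > 0 {
		s += "." + digits[point:]
	}
	if d.unscaled.Sign() < 0 {
		s = "-" + s
	}
	return s
}

func main() {
	d := NewDecimal(123456, 2)
	fmt.Println(d, d.Precision()) // 1234.56 6

	sum := NewDecimal(1, 1).Add(NewDecimal(2, 1)) // 0.1 + 0.2
	fmt.Println(sum)                              // 0.3
}
```

Because every operation works on decimal integers, 0.1 and 0.2 are never converted to binary fractions, and their sum is exactly 0.3.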

Although some programming languages offer theoretically unlimited precision through BigDecimal, most of us do not need unlimited precision in practice. Programming languages such as C# provide 28 to 29 significant digits of precision through a 16-byte Decimal, which is generally enough to keep calculations in financial systems accurate [4].
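
The figure of 28 to 29 digits can be checked with a little arithmetic. According to the .NET documentation (this detail is not from the article itself), a 128-bit decimal stores a 96-bit integer together with a sign and a power-of-ten scale. The largest 96-bit integer has 29 digits, but not every 29-digit number fits below it, so only 28 digits are always available. A small Go sketch, with a hypothetical helper `maxUnsigned`:

```go
package main

import (
	"fmt"
	"math/big"
)

// maxUnsigned returns the largest integer that fits in the given number of bits.
func maxUnsigned(bits uint) *big.Int {
	limit := new(big.Int).Lsh(big.NewInt(1), bits) // 2^bits
	return limit.Sub(limit, big.NewInt(1))
}

func main() {
	max96 := maxUnsigned(96)
	fmt.Println(max96)               // 79228162514264337593543950335
	fmt.Println(len(max96.String())) // 29
}
```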

Rational number

While Decimal and BigDecimal largely solve the precision problem of floating-point numbers, they are still helpless when it comes to infinite decimals. A decimal type can never represent 1/3 exactly; no matter how many decimal places are used, some loss of precision is unavoidable:

Figure 6 – Accuracy of infinite decimals

When we run into this situation, rational numbers are the best way to solve the problem. Some programming languages, such as Julia [5] and Haskell [6], include rational numbers in the standard library to meet the needs of scientific computing. Fractions, an important part of the rational numbers, can represent 1/10, 1/5 and 1/3 exactly. In Julia, a language commonly used for scientific computing, we can write fractions as follows:

julia> 1//3
1//3

julia> numerator(1//3)
1

julia> denominator(1//3)
3

This way of solving the precision problem stays closer to the original mathematical formula. The numerator and denominator of a fraction are the two fields of the rational number structure, and addition, subtraction, multiplication and division on fractions work exactly as they do in mathematics, so they naturally lose no precision. Let's take a quick look at a Java implementation of rational numbers [7]:

public class Rational implements Comparable<Rational> {
    private int num;   // the numerator
    private int den;   // the denominator

    public double toDouble() {
        return (double) num / den;
    }

    // ...
}

num and den are the numerator and denominator of the fraction, and the toDouble method converts the current rational number to a floating-point number. Because floating-point numbers are far more common in software engineering, a practical approach for rigorous scientific calculations is to perform most of the computation with rational numbers and convert back to floating point only at the end, which reduces the accumulated error.
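
The same idea can be sketched in Go. The `Rational` type below is a hypothetical counterpart of the Java class, not a port of it: it uses int64 fields, reduces fractions with a greatest common divisor, and ignores integer overflow for brevity.

```go
package main

import "fmt"

// Rational is a hypothetical fraction type: num/den in lowest terms, den > 0.
type Rational struct {
	num, den int64
}

// gcd returns the non-negative greatest common divisor of a and b.
func gcd(a, b int64) int64 {
	for b != 0 {
		a, b = b, a%b
	}
	if a < 0 {
		return -a
	}
	return a
}

// NewRational reduces num/den to lowest terms. It panics if den is zero.
func NewRational(num, den int64) Rational {
	if den == 0 {
		panic("zero denominator")
	}
	if den < 0 {
		num, den = -num, -den
	}
	g := gcd(num, den)
	return Rational{num / g, den / g}
}

// Add computes a/b + c/d = (a*d + c*b) / (b*d), then reduces the result.
func (r Rational) Add(o Rational) Rational {
	return NewRational(r.num*o.den+o.num*r.den, r.den*o.den)
}

// ToFloat64 converts to floating point; rounding happens only here.
func (r Rational) ToFloat64() float64 {
	return float64(r.num) / float64(r.den)
}

func (r Rational) String() string {
	return fmt.Sprintf("%d/%d", r.num, r.den)
}

func main() {
	sum := NewRational(1, 10).Add(NewRational(1, 5))
	fmt.Println(sum)                       // 3/10
	fmt.Println(sum == NewRational(3, 10)) // true
	fmt.Println(sum.ToFloat64())           // 0.3
}
```

In real Go code, `big.Rat` from the standard library plays this role and does not overflow.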

It should be noted, however, that rational arithmetic is not only more cumbersome to use but also cannot match floating point in performance. A single addition or subtraction takes several times as many machine instructions as the equivalent floating-point operation, so rational numbers should be avoided where they are not needed.

Conclusion

Making the equation 0.1 + 0.2 = 0.3 hold is not complicated. There are probably other ways to do it besides the ones described in this article, but the two approaches described here are the most common. Let's review how to make 0.1 + 0.2 = 0.3 hold:

  • Use two decimal integers, an integer value and an exponent, to represent decimals with finite or arbitrary precision. Some programming languages use a 128-bit Decimal to represent numbers with 28 to 29 significant digits, while others use BigDecimal to represent numbers with arbitrary precision;
  • Use two integers, a numerator and a denominator, to represent exact fractions, which avoids the precision loss of floating-point calculations.

Rational numbers and decimals are mathematical concepts. Mathematics is a rigorous and precise discipline; by introducing a large number of concepts and symbols, calculations in mathematics can be absolutely exact. Software engineering, however, is an engineering discipline that has to solve problems in a complicated physical world with limited resources, so we must weigh and choose among several solutions. The rational and irrational numbers of mathematics can both be modeled in software, but before using them we must be clear about what we are sacrificing to get them. Finally, here are some more open-ended questions that interested readers can ponder:

  • What structures does your most commonly used programming language provide for decimals, and what fields do they contain?
  • How well do floating point, decimal and rational strategies perform in addition, subtraction, multiplication and division?

If you have questions about the content of this article or want to learn more about the reasons behind some design decisions in software engineering, you can leave a comment below on this blog. The author will respond to the questions in this article and select the appropriate topics for subsequent content.

Recommended reading

  • Why 0.1 + 0.2 = 0.300000004
  • Why does 0.1 + 0.2 = 0.3

  1. Deposit and Withdraw with float64 gist.github.com/draveness/f…

  2. IEEE-754 Floating Point Converter www.h-schmidt.net/FloatConver…

  3. Source for java.math.BigDecimal developer.classpath.org/doc/java/ma…

  4. Floating-point numeric types (C# reference) docs.microsoft.com/en-us/dotne…

  5. Rational Numbers docs.julialang.org/en/v1/manua…

  6. Data.Ratio hackage.haskell.org/package/bas…

  7. Rational.java introcs.cs.princeton.edu/java/92symb…

Creative Commons Attribution 4.0 International License agreement
