Seeing this title, many people may wonder what could be strange about decimals. Aren't they just like integers, with the same addition, subtraction, multiplication and division?

Let's look at a simple example: define a decimal and an integer, then print them.

The core code is only three lines:

float n1 = 2.1f;                        /* single-precision decimal */
int n2 = 1;                             /* integer */
printf("%f,%d \r\n\r\n\r\n", n1, n2);   /* print both over the serial port */

View the result using the serial port print tool

The output is also normal, nothing strange.

Now let's look at how these two values are stored in memory.

The integer 1 occupies two bytes starting at address 0x000010 and holds 0x0001, which is what you would expect. The decimal 2.1 occupies four bytes starting at address 0x00000C and holds 0x40066666. What is this value, and why does 2.1 look so different in memory?

The value stored for 2.1 is not exactly 2.1 but approximately 2.0999999044, so why does printf() print 2.1? Because printf() rounds the value to the requested number of digits as it prints, and %f shows six digits after the decimal point by default.

If this example doesn't seem intuitive, add 0.0001 to the float each time before printing it.

You can see that after adding 0.0001 to 2.1003, the result is already wrong. Why does this happen? To answer that, we have to start with how the microcontroller stores decimals internally.

Let's first convert 2.1 to hexadecimal. This website will calculate it for you:

h-schmidt.net/FloatConverter/IEEE754.html

You can see that the hexadecimal value of 2.1 is 0x40066666, the same as the value seen in memory. So the value in the microcontroller's memory is correct.

So where does this value come from? That depends on how floating-point numbers are stored. A C float on most platforms, including this one, is a 32-bit IEEE 754 single-precision value. The first bit is the sign bit: 0 for a positive number, 1 for a negative number. The next eight bits are the exponent: the number is stored in exponential form with base two, and these eight bits hold the exponent. The remaining 23 bits are the mantissa, that is, the significant digits.

Convert 0x40066666 to binary: 0100 0000 0000 0110 0110 0110 0110 0110

The highest bit, 0, means the number is positive. The next eight bits are the exponent. The stored exponent is offset by 127, so subtracting 127 gives the real exponent. Here the eight bits are 1000 0000, which is 128, and 128 - 127 = 1, so the exponent is 1 and the scale factor is 2^1. The remaining 23 bits, 000 0110 0110 0110 0110 0110, or 0x066666, are the mantissa, which is 419430 in decimal. Putting these together, the stored value is (1 + 419430 / 2^23) × 2^1, approximately 2.0999999.

The website shows this calculation as well. It can still be a little confusing, though: where do these steps actually come from? How is 2.1 converted to binary?

Step 1: Convert the integer part to binary. The integer part is 2, which is 10 in binary.

Step 2: Convert the decimal part to binary, that is, write 0.1 as a sum of powers of two with negative exponents: 2^a + 2^b + 2^c + ... + 2^n.

Finding this sum by inspection is difficult, but there is a simpler method. Multiply the decimal part by 2 and take the integer part of the result as the next binary digit, then repeat with the remaining decimal part until nothing remains. The steps are as follows:

0.1 * 2 = 0.2 0.2 * 2 = 0.4 0.4 * 2 = 0.8 0.8 * 2 = 1.6 1 0.6 * 2 = 1.2 1 0.2 * 2 = 0.4 0.4 * 2 = 0.8 0 0.8 * 2 = 1.6 1 0.6 * 2 = 1.2 1 0.2 * 2 = 0.4 0.4 * 2 = 0.8 0.8 * 2 = 1.6 1 0.6 * 2 = 1.2 1 0.2 * 2 = 0.4 0.4 * 2 = 0.8 0 0.8 * 2 = 1.6 1 0.6 * 2 = 1.2 1 0.2 * 2 = 0.4 0.4 * 2 = 0.8 0 0.8 * 2 = 1.6 1 0.6 * 2 = 1.2 1Copy the code

You can see that the digits repeat the pattern 0011 forever, which means the binary decimal part can only approach 0.1 and never equal it. So some precision is lost when this decimal is stored inside the microcontroller.

The decimal part in binary is 0.0 0011 0011 0011 0011 0011 0011 0011...

Then combine the integer and decimal parts. The integer part 10 plus the decimal part gives 10.0 0011 0011 0011 0011 0011 0011...

Now for the exponent. The integer part 2 is 2^1, so the exponent is 1. The number is then converted to exponential form.

Move the binary point one place to the left so that the integer part is 1, and multiply by 2^1: 1.00 0011 0011 0011 0011 0011 0011... × 2^1.

Step 3: Convert the exponent to binary.

The exponent is 1, but you can't simply write 0000 0001. The exponent field has no sign bit of its own, yet it must also represent negative exponents for numbers smaller than 1. So the format stores the exponent plus a fixed offset of 127: add 127 to the exponent, then convert the result to binary.

1 + 127 = 128, and 128 in binary is 1000 0000.

Step 4: Combine the exponent and the mantissa.

The exponent is 1000 0000 and the mantissa is 1.00 0011 0011 0011 0011 0011 0011...

When combining them, drop the 1 in the integer part of the mantissa, because in binary exponential form the integer part is always 1 and does not need to be stored. Only the first 23 bits after the binary point are kept, so the combined data is 1000 0000 followed by 000 0110 0110 0110 0110 0110.

Step 5: Add the sign bit.

The highest bit is the sign bit: 0 means the number is positive and 1 means it is negative. 2.1 is positive, so the highest bit is 0.

Putting everything together gives 0, then 1000 0000, then 000 0110 0110 0110 0110 0110. Regrouped four bits at a time:

0100 0000 0000 0110 0110 0110 0110 0110

That is one sign bit, eight exponent bits and 23 mantissa bits. If the mantissa has fewer than 23 bits, it is padded with zeros to 23 bits; if it has more, as here, the extra bits are rounded off.

Converted to hexadecimal, this binary number is 0x40066666, the same as the data stored inside the microcontroller.

As this shows, a floating-point number already loses accuracy when it is stored. As values are repeatedly stored and recalculated, the error accumulates and the loss becomes more severe. This explains why, in the example above, the error only appeared at the fourth step.

If someone says that 0.1 + 0.2 is not equal to 0.3, people may call him stupid. But now you can smile, because in single precision 0.1 + 0.2 really is not exactly 0.3; it comes out as about 0.300000012.

Decimals are this strange, so should you avoid them when writing programs? Of course not. When a program has decimal calculations and must not lose accuracy, the usual approach is to first scale the decimals up by a fixed factor into integers, do the calculation on the integers, and then scale the result back down by the same factor to get the decimal.

The program tested above is modified as follows:

Since each step adds 0.0001, all the data is scaled up by 10000, so each addition simply adds 1. After the addition, the data is scaled back down by 10000, and the printed result is correct.