For a better reading experience, see "Floating-point numbers stored in computers" on Mculover666's personal blog.

The float and double data types store numbers with a fractional part. But what binary encoding does a computer use to store a floating-point number?

This article explains how floating-point numbers are stored in a computer.

1. Writing floating-point numbers in code

Floating-point literals can be written in two ways:

  • Ordinary decimal notation: write the number directly (e.g. 0.1234)
  • Scientific notation: use e or E for the power of ten (e.g. 12.34E-2)

2. Converting between decimal and binary fractions

  • Decimal fraction -> binary floating-point number
  • Binary floating-point number -> decimal fraction

See "Common number bases and their conversion" for details.

3. Floating-point numbers are stored according to the IEEE 754 standard

IEEE 754

IEEE 754, in full ANSI/IEEE Std 754-1985, is a widely used standard for binary floating-point arithmetic, implemented by CPUs and floating-point units.

The standard defines the formats for representing floating-point numbers, some special values (infinity (Inf) and not-a-number (NaN)), and the floating-point operations on those values.

IEEE 754 specifies four formats for representing floating-point values:

  • Single precision (32 bits)
  • Double precision (64 bits)
  • Single-extended precision (at least 43 bits, rarely used)
  • Double-extended precision (at least 79 bits, usually implemented as 80 bits)

Storage format for binary floating point numbers in IEEE 754

The storage format for binary floating-point numbers defined by the IEEE 754 standard is shown in the figure. It has three fields:

  • Sign: the highest bit is the sign bit; 0 means a positive number and 1 means a negative number (the sign of the binary number)
  • Exponent: the encoded exponent (the exponent of the binary number in scientific notation)
  • Fraction: the encoded significand (the significant digits of the binary number, a value from 1 up to but not including 2)

So, the actual value of a binary floating point number is:

$$value = (-1)^{sign} \times fraction \times 2^{exponent}$$

Here’s an example:

eg.

The decimal number 5.625 is 101.101b in binary, which IEEE 754 normalizes to 1.01101 × 2^2, so the fields have the following values:

  • Sign: 0
  • Exponent: 2
  • Fraction: 1.01101

Applying the formula in this way gives the values that the three fields of the IEEE 754 format need to represent.

The length of each field in IEEE 754

In the IEEE 754 standard, the lengths of the three fields depend on the format.

For the common single- and double-precision formats:

  • 32-bit single precision: 1 sign bit, 8 exponent bits, 23 fraction bits
  • 64-bit double precision: 1 sign bit, 11 exponent bits, 52 fraction bits

Now for the crucial part: how are the significand and the exponent encoded?

Fraction field in the IEEE 754 storage format (the significand)

IEEE 754 requires the significand to be greater than or equal to 1 and less than 2.

This means the integer part of the significand is always 1, so the 1 and the binary point can be left out, and only the fractional part of the significand is stored.

This has the advantage of saving one bit. For example, 32-bit single precision sets aside only 23 bits for the significand, yet it effectively holds 24 significant bits.

eg.

In the previous example, the significand is 1.01101b. The integer part 1 is omitted and only 01101b is stored, padded with zeros on the right. In single-precision storage, the 23-bit fraction field is therefore 0110 1000 0000 0000 0000 000.

Exponent field in the IEEE 754 storage format

Storing the exponent E is more involved.

First, E is stored as an unsigned integer. If E is 8 bits wide, it ranges from 0 to 255; if E is 11 bits wide, it ranges from 0 to 2047. In scientific notation, however, the exponent can be negative, so IEEE 754 specifies that a bias (the middle of the range) is added to the true exponent before it is stored: 127 for an 8-bit E and 1023 for an 11-bit E.

Let’s use the previous example to illustrate this:

eg.

The exponent is 2. In 32-bit single-precision storage, 2 + 127 = 129, so the value stored in the 8-bit exponent field is 129 in binary, i.e. 1000 0001b.

Finally, the exponent E falls into three cases that determine how the number is interpreted:

  • E is neither all zeros nor all ones: the number follows the rules above. The true exponent is the stored value of E minus 127 (or 1023), and the implicit leading 1 is put back in front of the significand M.
  • E is all zeros: the true exponent is 1 - 127 (or 1 - 1023), and the significand M has no implicit leading 1; it is read as 0.xxxxxx. This makes it possible to represent ±0 and very small numbers close to zero.
  • E is all ones: if the significand M is all zeros, the value is ±infinity (the sign is given by the sign bit S); if M is not all zeros, the value is not a number (NaN).

Verifying the example with a C program

To sum up, the decimal number 5.625 from the example is stored in 32-bit single precision as:

0 10000001 01101000000000000000000
= 0100 0000 1011 0100 0000 0000 0000 0000
= 0x40b40000

Check whether the result is correct in C:

/**
 * @brief  Verifies how IEEE 754 stores a floating-point number in memory
 * @author Mculover666
 * @date   June 23, 2019 14:49:35
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    float i = 5.625f;   /* 0100 0000 1011 0100 0000 0000 0000 0000 */
    uint32_t bits;

    /* Copy the raw bytes; reading a float through an int pointer is not valid C. */
    memcpy(&bits, &i, sizeof bits);

    printf("i = %f\n", i);
    printf("i = %x\n", (unsigned)bits);

    system("pause");    /* Windows only: keeps the console window open */
    return 0;
}
/**
 * Mingw-w64
 * ---------------------------------------
 * i = 5.625000
 * i = 40b40000
 * ---------------------------------------
 */

Compiled with Mingw-w64 on 64-bit Windows 7, the program prints i = 5.625000 and i = 40b40000, which matches the value derived above.

Finally, here is an online tool for converting decimal numbers to IEEE 754 floating-point format: www.styb.cn/cms/ieee_75…