 # In-depth analysis of data storage in memory

Posted on May 27, 2023, 4:20 p.m. by Luis Bishop
Category: The code of life Tag: c++

This article explains how C stores data in memory. Each section includes some of my own understanding, so you can treat it as a supplement to what you already know, or as class or review notes. I will keep posting informative articles and classic written-test questions as my own studies progress. Click "like" or "bookmark" if you find it useful, and corrections are welcome in the comments section!

## 1. Data types

Basic built-in types:

```c
char        // character data type
short       // short integer
int         // integer
long        // long integer
long long   // longer integer
float       // single-precision floating point
double      // double-precision floating point
```

Note:

1. In C, a character is stored as its ASCII value, so `char` is usually counted among the integer types.
2. C has no string type; a string is treated as an array of characters.

### 1.1 Basic classification of types:

Integer family:

```c
char
unsigned char
signed char
short
unsigned short (int)
signed short (int)
int
unsigned int
signed int
long
unsigned long (int)
signed long (int)
```

Floating point family:

```c
float
double
```

Constructed types:

```
array types
structure types      struct
enumeration types    enum
union types          union
```

Pointer type:

```c
int *pi;
char *pc;
float *pf;
void *pv;
```

Void type:

`void` denotes the empty type (no type).

It is commonly used for function return types, function parameters, and pointer types.

## 2. How integers are stored in memory

Creating a variable reserves space in memory, and the size of that space is determined by the variable's type. So how is the data actually laid out in that space?

For example:

```c
int a = 20;   // allocates 4 bytes for a
int b = -10;  // allocates 4 bytes for b
```

That brings us to the following concepts.

### 2.1 Sign-magnitude, ones' complement, and two's complement

There are three ways to represent a signed integer in binary: sign-magnitude, ones' complement, and two's complement (often translated literally as "original code", "inverse code", and "complement code").

All three consist of a sign bit and value bits. The sign bit is 0 for positive and 1 for negative; the value bits are encoded differently in each representation.

Sign-magnitude: write the number directly in binary, with the sign bit set according to whether it is positive or negative.

Ones' complement: keep the sign bit of the sign-magnitude form and invert every other bit.

Two's complement: add 1 to the ones' complement.

Note: for a positive number, all three representations are identical.

Here's an example:

```c
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
int main()
{
    int a = -10;
    // 10000000000000000000000000001010 - sign-magnitude
    // 11111111111111111111111111110101 - ones' complement
    // 11111111111111111111111111110110 - two's complement (ones' complement + 1)
    return 0;
}
```

The question then arises: in which form is a value actually stored in the computer? Sign-magnitude, ones' complement, or two's complement?

In computer systems, integers are represented and stored in two's complement.

The reason is that with two's complement the sign bit and the value bits can be processed uniformly. Addition and subtraction can also be handled by the same circuit (the CPU only needs an adder). In addition, converting from two's complement to sign-magnitude uses the same operation as converting the other way (invert the value bits and add 1), so no extra hardware is needed.

To get a clearer idea of how data is stored, let's look at the following examples:

Example 1: `int a = -10;` appears in memory as the bytes `F6 FF FF FF`.

```c
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
int main()
{
    int a = -10;
    // 10000000000000000000000000001010 - sign-magnitude
    // 11111111111111111111111111110101 - ones' complement
    // 11111111111111111111111111110110 - two's complement (ones' complement + 1)
    // FFFFFFF6 in hexadecimal (the byte order seen in memory differs;
    // section 2.2 on endianness explains why)
    return 0;
}
```

Thus, integers are stored in memory in two's complement.

The example above shows the form in which integers are stored. Now let's see how integer arithmetic is carried out:

Example 2:

```c
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
int main()
{
    // 1 - 1 is computed as 1 + (-1)
    //
    // Adding the sign-magnitude forms:
    // 00000000000000000000000000000001 - sign-magnitude of 1
    // 10000000000000000000000000000001 - sign-magnitude of -1
    // 10000000000000000000000000000010 - result: -2, wrong
    //
    // Adding the two's complement forms:
    // 00000000000000000000000000000001 - two's complement of 1
    // 11111111111111111111111111111111 - two's complement of -1
    // 100000000000000000000000000000000 - result has 33 bits, one too many
    // An int holds only 32 bits, so the extra high bit is discarded
    // 00000000000000000000000000000000 - final result: 0, correct
    return 0;
}
```

As the comments show, the result is correct only when the calculation is done in two's complement.

Now let's go back and see how a positive number and a negative number are stored in memory:

```c
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
int main()
{
    int a = -10;
    // two's complement: 11111111111111111111111111110110
    // hexadecimal:      FFFFFFF6
    int b = 10;
    // two's complement: 00000000000000000000000000001010
    // hexadecimal:      0000000A
    return 0;
}
```

Note: when converting between hexadecimal and binary, each hexadecimal digit corresponds to 4 binary bits; between octal and binary, each octal digit corresponds to 3 bits. In general, one digit in base 2^n corresponds to n binary bits.

Now another question comes up: why do the bytes of a and b appear in memory in reverse order? This comes down to big-endian and little-endian byte order.

### 2.2 Big-endian and little-endian byte order, and how to detect it

What are big-endian and little-endian?

In big-endian (storage) mode, the low-order bytes of a value are stored at high memory addresses, and the high-order bytes are stored at low memory addresses.

In little-endian (storage) mode, the low-order bytes of a value are stored at low memory addresses, and the high-order bytes are stored at high memory addresses.

Here's an example. Take a hexadecimal value such as 0x11223344 and view it in memory, where addresses increase from left to right. 44 is the low-order byte of the value and 11 is the high-order byte. In memory, 44 appears on the left and 11 on the right, which means the low-order byte is stored at the low address and the high-order byte at the high address. So the current compiler and machine use little-endian byte order.

So why do big-endian and little-endian exist?

In computer systems, memory is addressed in bytes: each address refers to one byte, and each byte is 8 bits.

But in C, besides the 8-bit `char`, there are also the 16-bit `short` and the 32-bit `long` (the exact sizes depend on the compiler). On processors wider than 8 bits, such as 16-bit or 32-bit processors, registers are wider than one byte, so there has to be a rule for how the bytes of a multi-byte value are ordered. This is what leads to the big-endian and little-endian storage modes.

For example, suppose a 16-bit `short x` is stored at address 0x0010 and its value is 0x1122. Then 0x11 is the high byte and 0x22 is the low byte. In big-endian mode, 0x11 is placed at the low address, 0x0010, and 0x22 at the high address, 0x0011. Little-endian mode is exactly the opposite. The common x86 architecture is little-endian, while KEIL C51 is big-endian. Many ARM and DSP processors are little-endian, and some ARM processors let the hardware select big-endian or little-endian mode.

Let's look at a written-test question:

Baidu, systems engineer written test, 2015:

Briefly describe the concepts of big-endian and little-endian byte order, and write a small program that determines the byte order of the current machine.

Analysis:

The question asks us to pick a number and use code to decide whether the machine stores it in big-endian or little-endian order. Take `int a = 1`: it occupies 4 bytes, which written from high-order to low-order are 00 00 00 01. The key is to find out what the first byte of `a` (the one at the lowest address) contains.

Take the address of `a` and read its first byte. If that byte is 0, the machine is big-endian (the high-order byte 00 is stored at the low address). If that byte is 1, the machine is little-endian (the low-order byte 01 is stored at the low address). Casting the address to `char *` and dereferencing it yields exactly that first byte.

```c
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
int check_sys()
{
    int a = 1;
    // Cast the address of a (an int *) to char * so that only
    // the first byte is read
    char* p = (char*)&a;
    // Returns 1 when *p is 1, and 0 when *p is 0
    return *p;
}

int main()
{
    // Determine the byte order of the current machine
    int ret = check_sys();
    if (ret == 1)
    {
        printf("little-endian\n");
    }
    else
    {
        printf("big-endian\n");
    }

    return 0;
}
```

### 2.3 Exercises

Exercise 1:

```c
// What is the output?
#include <stdio.h>
int main()
{
    char a = -1;
    signed char b = -1;
    unsigned char c = -1;
    printf("a=%d,b=%d,c=%d", a, b, c);
    return 0;
}
```

1. `unsigned char` is an unsigned one-byte type. A `char` variable occupies 1 byte (8 bits on common platforms) and is an integer type.
2. Integer promotion: integer arithmetic in C is always performed with at least the precision of the default integer type, `int`. To reach that precision, character and short operands in an expression are converted to `int` before they are used. This conversion is called integer promotion, and it follows the signedness of the operand's type: signed types are extended with copies of the sign bit, unsigned types are extended with 0s.
3. The C standard does not specify whether plain `char` is signed or unsigned; that depends on the compiler. Plain `int` and `short`, however, are always signed: `int` is `signed int` and `short` is `signed short`.

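Output (when plain `char` is signed): `a=-1,b=-1,c=255`

Analysis: -1 is 11111111111111111111111111111111 in two's complement, and truncating it to 8 bits leaves 11111111 in all three variables. For `%d`, `a` and `b` are sign-extended back to -1, while `c` is zero-extended to 00000000000000000000000011111111, which is 255.

Exercise 2:

```c
// What is the output?
#include <stdio.h>
int main()
{
    char a = -128;
    printf("%u\n", a);  // %u prints an unsigned integer
    return 0;
}
```

Output (when plain `char` is signed and `int` is 32 bits): 4294967168

Analysis: -128 is stored in `a` as 10000000. Promotion sign-extends it to 11111111111111111111111110000000, and `%u` reads those bits as an unsigned number, 4294967168.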

Exercise 3:

```c
// What is the output?
#include <stdio.h>
int main()
{
    char a = 128;
    printf("%u\n", a);
    return 0;
}
```

Output: the same as in the previous exercise, 4294967168. The value 128 is 10000000 in its low 8 bits, which is the same bit pattern a signed `char` uses for -128.

Note: a signed `char` ranges from -128 to +127.

For example, if we define `char a = 129`, the low eight bits of 129 (after truncation) are 10000001, which as a signed `char` is the value -127.

Here's a trick: if the value is above the range, subtract 256; if it is below the range, add 256.

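Exercise 4:

```c
// What is the output?
#include <stdio.h>
int main()
{
    int i = -20;
    unsigned int j = 10;
    printf("%d\n", i + j);  // %d prints the result as a signed integer
    return 0;
}
```

Output: -10

Analysis: just work through the addition in two's complement. `i` is converted to `unsigned int`, giving 4294967276; adding 10 gives 4294967286, whose bit pattern is FFFFFFF6. Read with `%d` as a signed integer, that pattern is -10.

Exercise 5: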

```c
// What is the output?
#include <stdio.h>
int main()
{
    unsigned int i;
    for (i = 9; i >= 0; i--)
    {
        printf("%u\n", i);
    }
    return 0;
}
```

Output: an infinite loop. (If you predicted something else, step through the first few iterations yourself to see the pattern.)

Analysis: an `unsigned int` is never negative, so `i >= 0` is always true. After 9 down to 0 are printed, `i--` wraps 0 around to 4294967295 and the countdown continues from there.

Exercise 6:

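```c
// What is the output?
#include <stdio.h>
#include <string.h>
int main()
{
    char a[1000];
    int i;
    for (i = 0; i < 1000; i++)
    {
        a[i] = -1 - i;
    }
    printf("%d", (int)strlen(a));
    return 0;
}
```

Output: 255

Analysis: the elements run -1, -2, ..., -128, then wrap around to 127, 126, ..., 1, 0. `strlen` counts the characters before the first 0 (the `'\0'` terminator), which sits at index 255, so the length is 128 + 127 = 255.

Exercise 7: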

```c
// What is the output?
#include <stdio.h>
unsigned char i = 0;
int main()
{
    for (i = 0; i <= 255; i++)
    {
        printf("hello world\n");
    }
    return 0;
}
```

Output: `hello world` is printed in an infinite loop. An `unsigned char` only holds 0 to 255, so `i <= 255` is always true, and incrementing 255 wraps back to 0. Once you know how unsigned numbers behave, this is easy to understand.

That gives us a better picture of how the integer family is stored in memory. Next, let's look at how floating-point types are stored:

## 3. Memory storage of floating point types

Common floating point numbers:

3.14159

1E10 (the `E` form is C's scientific notation for floating-point constants; 1E10 means 1.0 × 10^10)

The floating point family includes the float, double, and long double types.

To take a closer look at floating point storage in memory, let's look at an example:

### 3.1 An Example

```c
#include <stdio.h>
int main()
{
    int n = 9;
    float *pFloat = (float *)&n;
    printf("n = %d\n", n);
    printf("*pFloat value: %f\n", *pFloat);

    *pFloat = 9.0;
    printf("num = %d\n", n);
    printf("*pFloat value: %f\n", *pFloat);
    return 0;
}
```

What is the output? Think about it yourself first, then run it. Is the result different from what you expected? Where do these values come from?

### 3.2 Floating point storage rules

`n` and `*pFloat` refer to the same bytes in memory, so why do they read so differently as an integer and as a floating-point number? To understand the result, you need to know how floating-point numbers are represented inside the computer.

In detail:

According to the international standard IEEE 754 (IEEE is the Institute of Electrical and Electronics Engineers), any binary floating-point number V can be expressed in the following form:

• V = (-1)^S * M * 2^E
• (-1)^S represents the sign. When S = 0, V is positive; when S = 1, V is negative.
• M is the significand, greater than or equal to 1 and less than 2.
• 2^E is the exponent part.

Here's an example:

5.0 in decimal, written in binary is 101.0, which is the same as (-1)^0*1.01 * 2^2.

(Aside: just as 101 in base 10 can be written as 1.01 × 10^2, 101 in base 2 can be written as 1.01 × 2^2.)

So, according to the format of V above, s=0, M=1.01, E=2.

-5.0 in decimal, written in binary is -101.0, which is the same as (-1)^1*1.01 * 2^2. So, s=1, M=1.01, E=2.

IEEE 754 specifies:

For a 32-bit floating-point number (`float`), the highest bit is the sign bit S, the next 8 bits are the exponent E, and the remaining 23 bits are the significand M. For a 64-bit floating-point number (`double`), the highest bit is the sign bit S, the next 11 bits are the exponent E, and the remaining 52 bits are the significand M.

IEEE 754 also has some special rules for the significand M and the exponent E.

As mentioned earlier, 1 ≤ M < 2, so M can be written as 1.xxxxxx, where xxxxxx is the fractional part.

Because the leading digit of M is always 1, IEEE 754 specifies that it is not stored; only the xxxxxx part is saved. For example, for 1.01 only 01 is stored, and the leading 1 is added back when the value is read. This saves one bit: a 32-bit float leaves only 23 bits for M, but with the leading 1 dropped those 23 bits carry 24 significant bits.

If the fractional part has fewer than 23 bits, it is padded with 0s on the right.

The exponent E is more complicated.

First, E is stored as an unsigned integer. If E is 8 bits wide, its stored value ranges from 0 to 255; if E is 11 bits wide, it ranges from 0 to 2047.

However, in scientific notation E can be negative (for example, 2^(-1) has E = -1). IEEE 754 therefore specifies that the value stored in memory is the true E plus a bias (the midpoint of the range): 127 for an 8-bit E and 1023 for an 11-bit E. For example, 2^10 has E = 10, so as a 32-bit float it is stored as 10 + 127 = 137, which is 10001001.

What does that mean? Let's look at this example:

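Example:

```c
#include <stdio.h>
int main()
{
    float f = 5.5;
    // What does the floating-point number f look like in memory?
    return 0;
}
```

What does f look like in memory? Step by step:

5.5 in binary is 101.1, which is (-1)^0 * 1.011 * 2^2, so S = 0, M = 1.011, and E = 2. The stored exponent is 2 + 127 = 129, which is 10000001. The stored significand is 011 padded to 23 bits. The full bit pattern is:

0 10000001 01100000000000000000000

which is 0x40B00000 in hexadecimal.

That is how a floating-point number is stored. Reading the exponent E back out of memory then falls into three cases: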

1. E is neither all 0s nor all 1s

Subtract 127 (or 1023) from the stored exponent to get the true exponent, then add the leading 1 back in front of the significand M.

For example, 0.5 (1/2) is 0.1 in binary. Since the integer part must be 1, move the binary point one place to the right to get 1.0 * 2^(-1). The stored exponent is -1 + 127 = 126, which is 01111110. The significand 1.0 leaves 0 once the integer part is dropped, padded to 23 bits as 00000000000000000000000. The binary representation is therefore:

0 01111110 00000000000000000000000

2. E is all 0s

The true exponent is then 1 - 127 (or 1 - 1023), and the significand M does not get the leading 1 added back; it is read as 0.xxxxxx. This makes it possible to represent ±0 as well as very small numbers close to 0.

3. E is all 1s

If the significand M is all 0s, the value is ± infinity (the sign depends on the sign bit S); if M is not all 0s, the value is not a number (NaN).

That covers the representation rules for floating-point numbers. Now let's go back to the puzzling example from earlier:

### Explaining the earlier example:

```c
#include <stdio.h>
int main()
{
    int n = 9;
    float *pFloat = (float *)&n;
    printf("n = %d\n", n);
    printf("*pFloat value: %f\n", *pFloat);

    *pFloat = 9.0;
    printf("num = %d\n", n);
    printf("*pFloat value: %f\n", *pFloat);
    return 0;
}
```

Let's take it step by step:

First, the value of n:

`int n = 9` is an integer and it is printed with `%d`, so the output is 9.

Now look at the first value of *pFloat:

```c
float *pFloat = (float *)&n;
```

This line takes the address of the integer n, casts it to a pointer to `float` (the bytes in memory are unchanged), and stores it in the `float` pointer pFloat.

```c
printf("*pFloat value: %f\n", *pFloat);
```

The `int` 9 has the bit pattern 0000 0000 0000 0000 0000 0000 0000 1001. Because it is read through a `float` pointer and printed with `%f`, those bits are interpreted as a floating-point number:

0 00000000 00000000000000000001001

S = 0, E = 0 (all 0s), and M = 0.00000000000000000001001 (since E is all 0s, no leading 1 is added). The value is therefore 0.00000000000000000001001 * 2^(-126), a positive number extremely close to 0. Because `%f` prints only 6 decimal places by default, the output is 0.000000.

Next, the value printed for num:

```c
*pFloat = 9.0;
```

9.0 in binary is 1001.0 = (-1)^0 * 1.001 * 2^3, so S = 0, M = 001 (stored without the leading 1), and the stored E = 3 + 127 = 130. The bit pattern written to memory is:

0 10000010 00100000000000000000000

The output uses `%d` (integer). The sign bit of this pattern is 0, so it is a positive number and its sign-magnitude form equals its two's complement form. Converting the binary number to decimal gives the final result: 1091567616.

Finally, look at the second value of *pFloat:

```c
printf("*pFloat value: %f\n", *pFloat);
```

This one is easy to explain: `*pFloat = 9.0` stores a floating-point value and it is printed with `%f`, so the output is the unsurprising 9.000000.

Do these explanations make sense?

This is my first blog post. It is fairly long and took me several days to write. If you found it useful, or it helped you even a little, please give it a like. Experienced readers are welcome to point out mistakes in the comments. See you next time!
