The elder brother of the bird

Why byte alignment

The root reason for byte alignment is the efficiency with which the CPU accesses data. The CPU typically reads memory in fixed-size chunks that start at addresses that are multiples of 4 bytes (32-bit CPU) or 8 bytes (64-bit CPU). If a 4-byte int is not aligned, it can straddle two such chunks, and the CPU may need two reads to fetch it. See the experiment below for a concrete demonstration.

The natural alignment of a data type

Each basic data type is aligned to its own size: the memory address of a variable is an integer multiple of the variable's size.

The experiment

#include <stdio.h>

int main(int argc, char const *argv[])
{
    char a = 1; // 0x7fff5fbff77f,sizeof(a):1
    int  b = 1; // 0x7fff5fbff778,sizeof(b):4
    int  c = 1; // 0x7fff5fbff774,sizeof(c):4
    char d = 1; // 0x7fff5fbff773,sizeof(d):1
    int  e = 1; // 0x7fff5fbff76c,sizeof(e):4

    // %p expects a void pointer; %zu is the format for size_t
    printf("%p,sizeof(a):%zu\n",(void *)&a,sizeof(a));
    printf("%p,sizeof(b):%zu\n",(void *)&b,sizeof(b));
    printf("%p,sizeof(c):%zu\n",(void *)&c,sizeof(c));
    printf("%p,sizeof(d):%zu\n",(void *)&d,sizeof(d));
    printf("%p,sizeof(e):%zu\n",(void *)&e,sizeof(e));

    return 0;
}

To illustrate: the left side of the figure shows the memory layout of the code above, with the gray areas marking memory the program does not use. The right side shows the layout after a short f is declared right after char a in the same code.

From the above experiment and the picture, we can find the following rules:

  1. The five variables a, b, c, d, and e are allocated at descending memory addresses;
  2. If you look closely, their memory addresses are not always right next to each other;
  3. Variables of type int have even memory addresses (an int variable never sits at an odd address);
  4. On closer inspection, the addresses of the int variables are all divisible by 4, so variables on the stack are aligned according to the size of their data type;
  5. The added short f is not placed immediately next to a; it is aligned to its own size, that is, placed at an even address;
  6. For each variable on the stack, the address reported is the lowest address of the contiguous memory it occupies.

Conversely, suppose variables a, b, and c were stored back to back without alignment, and the CPU reads 8 bytes at a time. Together they occupy 1 + 4 + 4 = 9 bytes, so the last byte of c falls outside the first 8-byte read and a second read is needed to fetch it. The efficiency of accessing data is reduced.

Rule 6 says that the address reported for each variable on the stack is the lowest address of the contiguous memory it occupies. What does that mean in practice?

To verify the memory diagram I drew above, take an int variable whose value occupies four bytes. How is the data laid out across those four bytes? Let's demonstrate with the hexadecimal value 0x12345678.

  1. Why an 8-digit hexadecimal number? An int is 4 bytes, each byte has 8 bits, and each bit has two states (0/1), so one byte holds 2^8 = 256 = 16^2 values, which is exactly two hexadecimal digits. Four bytes therefore need eight hexadecimal digits to fill the memory of an int.
  2. Why 12345678? Purely for demo convenience.

I first store a value in variable b, then use a char pointer p to read the four bytes of b one by one.

#include <stdio.h>

int main(int argc, char const *argv[])
{
    char a = 1;             // 0x7fff5fbff777
    int  b = 0x12345678;    // 0x7fff5fbff770
    char c = 1;             // 0x7fff5fbff76f
    printf("%p\n",(void *)&a);
    printf("%p\n",(void *)&b);
    printf("%p\n",(void *)&c);

    // unsigned char avoids sign extension when a byte is 0x80 or larger
    unsigned char *p = (unsigned char *)&b;

    printf("%x %x %x %x\n", p[0],p[1],p[2],p[3]); // 78 56 34 12
    printf("%p %p %p %p\n", (void *)&p[0],(void *)&p[1],(void *)&p[2],(void *)&p[3]); // 0x7fff5fbff770 0x7fff5fbff771 0x7fff5fbff772 0x7fff5fbff773

    return 0;
}

The most significant byte of variable b (0x12345678) is 0x12, and the least significant byte is 0x78.

Here we need to explain big-endian and little-endian byte order.

  1. Little-endian means that the least significant byte is stored at the lowest address of memory, that is, the start address of the value, and the most significant byte is stored at the highest address.
  2. Big-endian means that the most significant byte is stored at the lowest address of memory, that is, the start address of the value, and the least significant byte is stored at the highest address.

So my current environment is little-endian, because the byte at the lowest address (p[0]) is 0x78.

Why do both big-endian and little-endian exist? You would have to ask the hardware manufacturers; each went its own way, and that is how history turned out.

Byte alignment in structures

A structure is aligned according to the member with the largest natural alignment.

The experiment

#include <stdio.h>

int main(int argc, char const *argv[])
{
    struct str1{
        char a;
        short b;
        int c;
    };

    // the label "f" refers to struct str1
    printf("sizeof(f):%zu\n",sizeof(struct str1));

    struct str2{
        char a;
        int c;
        short b;
    };

    // the label "g" refers to struct str2
    printf("sizeof(g):%zu\n",sizeof(struct str2));

    struct str1 a;
    printf("a.a %p\n",(void *)&a.a);
    printf("a.b %p\n",(void *)&a.b);
    printf("a.c %p\n",(void *)&a.c);

    struct str2 b;
    printf("b.a %p\n",(void *)&b.a);
    printf("b.c %p\n",(void *)&b.c);
    printf("b.b %p\n",(void *)&b.b);

    return 0;
}

The results

sizeof(f):8
sizeof(g):12
a.a 0x7fff5fbff778
a.b 0x7fff5fbff77a
a.c 0x7fff5fbff77c
b.a 0x7fff5fbff768
b.c 0x7fff5fbff76c
b.b 0x7fff5fbff770

The principle

The gray cells in the figure are padding inserted for alignment. Padding places each member at an offset that is a multiple of its own size, and ensures that the final structure size is an integer multiple of the size of the largest member.

Exceptions

Are there cases in practice where data is deliberately not aligned? Yes. For example, in our RPC framework the transmitted data is packed compactly, which makes cross-platform and cross-language communication easier. Network code often uses #pragma pack(1), known as structure packing: it reduces network traffic and gives every system the same layout, so differences in default alignment between systems do not cause unpacking errors.

Both #pragma pack(1) and __attribute__((packed)) are used in yar_header.