An object file is an intermediate file that has been compiled from source code but not yet linked. It uses the same file format as an executable: ELF (Executable and Linkable Format) under Linux, and PE (Portable Executable) under Windows.

1. Object file structure

The object file contains the compiled machine instructions and data, plus auxiliary information such as symbol tables and string tables. It stores this information in sections (also loosely called segments), grouped by attribute.

Main parts:

Section   Content
.text     Code
.data     Initialized global variables and local static variables
.bss      Uninitialized global and local static variables, and global or local static variables initialized to 0; only the total size of these variables is recorded

Why are program instructions and program data stored in two separate sections?
  • Code and data are mapped to two different virtual memory regions.
  • The code region is read-only for the process, while the data region can be read and written. Keeping the two separate allows different permissions to be set, which prevents instructions from being modified accidentally or maliciously.
  • CPUs have separate instruction and data caches, so keeping instructions and data apart improves the cache hit ratio.
  • When several instances of the same program are running, their instructions are identical, so only one copy of the read-only instructions needs to be kept in memory. The same applies to other read-only resources.

Object file structure details

int printf(const char* format, ...);

int global_init_var = 84;
int global_uninit_var;

void func1(int i)
{
    printf("%d\n", i);
}

int main(void)
{
    static int static_init_var = 85;
    static int static_uninit_var;

    int a = 1;
    int b;

    func1(static_init_var + static_uninit_var + a + b);

    return a;
}

objdump -h source.o displays the section structure of the object file:

source.o:     file format elf64-x86-64

Sections:
Idx Name                Size      VMA               LMA               File off  Algn
  0 .text               0000005f  0000000000000000  0000000000000000  00000040  2**0
                        CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data               00000008  0000000000000000  0000000000000000  000000a0  2**2
                        CONTENTS, ALLOC, LOAD, DATA
  2 .bss                00000004  0000000000000000  0000000000000000  000000a8  2**2
                        ALLOC
  3 .rodata             00000004  0000000000000000  0000000000000000  000000a8  2**0
                        CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .comment            0000002b  0000000000000000  0000000000000000  000000ac  2**0
                        CONTENTS, READONLY
  5 .note.GNU-stack     00000000  0000000000000000  0000000000000000  000000d7  2**0
                        CONTENTS, READONLY
  6 .note.gnu.property  00000020  0000000000000000  0000000000000000  000000d8  2**3
                        CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .eh_frame           00000058  0000000000000000  0000000000000000  000000f8  2**3
                        CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

The CONTENTS flag means that the section has content in the object file. The .bss section has no CONTENTS flag, so it occupies no space in the object file. It only records the total size of the uninitialized global and local static variables, and of the global and local static variables initialized to 0. At run time, memory is allocated for these variables according to the size recorded for .bss.
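To see this from the program's side, here is a minimal sketch (the names scratch, scratch_store and scratch_count_nonzero are made up for illustration). A large static array without an initializer typically lands in .bss, so it adds almost nothing to the size of the object file, yet it reads as all zeros when the program starts.

```c
#include <stddef.h>

#define SCRATCH_LEN 1048576  /* 1 Mi ints, about 4 MiB at run time */

/* No initializer: typically placed in .bss, where only the size is recorded. */
static int scratch[SCRATCH_LEN];

/* Writing through this function keeps the compiler from treating the
   array as a compile-time constant. */
void scratch_store(size_t index, int value)
{
    if (index < SCRATCH_LEN)
        scratch[index] = value;
}

/* Counts the elements that are not zero. */
size_t scratch_count_nonzero(void)
{
    size_t count = 0;
    for (size_t i = 0; i < SCRATCH_LEN; i++)
        if (scratch[i] != 0)
            count++;
    return count;
}
```

Compiling this with gcc -c and running size on the result should show the array counted under bss rather than data; the exact numbers depend on the compiler and target.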

size source.o displays the size of each section in the ELF file:

   text    data     bss     dec     hex     filename
    219       8       4     231     e7      source.o

Code segment, data segment, and read-only data segment

  • .data section: initialized global variables and local static variables
  • .rodata section: read-only data such as string constants
  • const variables are also stored in the read-only data section

To place a variable or function in a specific section:

__attribute__((section("FOO"))) void foo(void)
{}
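One common use of a custom section is to collect scattered variables into one contiguous block. The sketch below assumes GCC or Clang on an ELF target with a GNU-style linker, which defines __start_NAME and __stop_NAME symbols for any section whose name is a valid C identifier. The variables foo_first and foo_second and both helper functions are hypothetical.

```c
#include <stddef.h>

/* Two variables placed in the custom section "FOO".
   "used" keeps them even though nothing references them by name. */
__attribute__((section("FOO"), used)) static int foo_first = 10;
__attribute__((section("FOO"), used)) static int foo_second = 32;

/* Assumption: the linker defines these at the start and end of FOO. */
extern int __start_FOO[];
extern int __stop_FOO[];

/* Number of ints stored in the FOO section. */
size_t foo_section_count(void)
{
    return (size_t)(__stop_FOO - __start_FOO);
}

/* Sum of every int stored in the FOO section. */
int foo_section_sum(void)
{
    int sum = 0;
    for (int *p = __start_FOO; p < __stop_FOO; p++)
        sum += *p;
    return sum;
}
```

Running objdump -h on the resulting object file should list FOO as an additional section next to .data.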

2. ELF file structure description

1. The file header

readelf -h source.o    # -h: display the ELF file header

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)          # ELF file type
  Machine:                           Advanced Micro Devices X86-64   # hardware platform
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          1184 (bytes into file)          # position of the section header table
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           64 (bytes)
  Number of section headers:         14                              # number of sections in the ELF file
  Section header string table index: 13                              # index of the section holding the section name strings

The ELF header describes basic information about the entire file, most importantly the file offset of the section header table (segment table) and the number of sections in the file. The section header table is like an array in which each element is the descriptor of one section, recording that section's basic attributes. An object file has no program headers. More information about program headers is in Chapter-6.
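As a rough illustration of how a tool might read these fields, the sketch below copies the 64-byte header out of a buffer and checks the magic number. The struct and function names are invented for this example (on Linux, <elf.h> provides the equivalent Elf64_Ehdr), and the code assumes the host has the same byte order as the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout of the 64-byte ELF64 file header (illustrative field names). */
typedef struct {
    unsigned char ident[16];  /* magic, class, data encoding, version, ... */
    uint16_t type;            /* 1 = REL, 2 = EXEC, 3 = DYN, 4 = CORE */
    uint16_t machine;
    uint32_t version;
    uint64_t entry;
    uint64_t phoff;           /* start of program headers */
    uint64_t shoff;           /* start of section headers */
    uint32_t flags;
    uint16_t ehsize;
    uint16_t phentsize;
    uint16_t phnum;
    uint16_t shentsize;
    uint16_t shnum;           /* number of section headers */
    uint16_t shstrndx;        /* section header string table index */
} Elf64Header;

/* Returns 0 on success, -1 if the buffer is too short, lacks the ELF
   magic number, or does not describe a 64-bit file. */
int read_elf64_header(const unsigned char *buf, size_t len, Elf64Header *out)
{
    static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};

    assert(buf != NULL && out != NULL);
    if (len < sizeof(Elf64Header))
        return -1;
    if (memcmp(buf, magic, sizeof magic) != 0)
        return -1;
    if (buf[4] != 2)          /* class byte: 2 means ELF64 */
        return -1;
    memcpy(out, buf, sizeof *out);
    return 0;
}
```

For source.o, the fields of interest after a successful call would be shoff (1184), shnum (14) and shstrndx (13), matching the readelf output.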

Supplement: the ELF file types are:

Type                      Description                                                                   Example
REL (Relocatable file)    Contains code and data; can be linked into an executable or a shared object   .o or .obj files
DYN (Shared object file)  Shared object file containing code and data                                   .so or DLL files
EXEC (Executable file)    Executable file                                                               /bin/bash
CORE (Core file)

2. The section header table (segment table)

The section header table (segment table) is an array of section descriptors. Each descriptor records information about one ELF section: its name, its length, its offset in the file, its read and write permissions, and so on.

readelf -S source.o    # -S: display the section header table

There are 14 section headers, starting at offset 0x4a0:

Section Headers:
  [Nr] Name              Type      Address           Offset     Size              EntSize          Flags Link  Info  Align
  [ 0]                   NULL      0000000000000000  00000000  0000000000000000  0000000000000000  0     0     0
  [ 1] .text             PROGBITS  0000000000000000  00000040  000000000000005f  0000000000000000  AX    0     0     1
  [ 2] .rela.text        RELA      0000000000000000  00000380  0000000000000078  0000000000000018  I     11    1     8
  [ 3] .data             PROGBITS  0000000000000000  000000a0  0000000000000008  0000000000000000  WA    0     0     4
  [ 4] .bss              NOBITS    0000000000000000  000000a8  0000000000000004  0000000000000000  WA    0     0     4
  [ 5] .rodata           PROGBITS  0000000000000000  000000a8  0000000000000004  0000000000000000   A    0     0     1
  [ 6] .comment          PROGBITS  0000000000000000  000000ac  000000000000002b  0000000000000001  MS    0     0     1
  [ 7] .note.GNU-stack   PROGBITS  0000000000000000  000000d7  0000000000000000  0000000000000000        0     0     1
  [ 8] .note.gnu.propert NOTE      0000000000000000  000000d8  0000000000000020  0000000000000000   A    0     0     8
  [ 9] .eh_frame         PROGBITS  0000000000000000  000000f8  0000000000000058  0000000000000000   A    0     0     8
  [10] .rela.eh_frame    RELA      0000000000000000  000003f8  0000000000000030  0000000000000018   I    11    9     8
  [11] .symtab           SYMTAB    0000000000000000  00000150  00000000000001b0  0000000000000018        12    12    8
  [12] .strtab           STRTAB    0000000000000000  00000300  0000000000000080  0000000000000000        0     0     1
  [13] .shstrtab         STRTAB    0000000000000000  00000428  0000000000000074  0000000000000000        0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

  • Every section descriptor has a fixed size, so the Name field (sh_name) does not hold the name itself. It holds the offset of the name string within .shstrtab, the section header string table.
  • Offset (sh_offset) is the section's offset within the ELF file. The offset of .bss equals that of .rodata: .bss has no content in the file, so its offset is meaningless.
  • The type NOBITS indicates that the section has no content in the file, as is the case for .bss.
  • The flags of a section describe its attributes in the process's virtual address space. For example, A (alloc) means that the section must be allocated space in the process's address space.
  • Sections of type RELA are relocation sections. For them, the Link field holds the index of the symbol table section they use, and the Info field holds the index of the section to which the relocations apply.
  • Address is the section's virtual address. Virtual addresses are not assigned until link time, so every address here is zero.
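The arithmetic behind the table can be sketched in a few lines. The function names are invented for this example; the value 8 for NOBITS is the SHT_NOBITS constant from the ELF specification.

```c
#include <stdint.h>

enum { SECTION_TYPE_NOBITS = 8 };  /* SHT_NOBITS in the ELF specification */

/* File offset of the descriptor with the given index: the table is a
   plain array of fixed-size entries starting at shoff. */
uint64_t section_header_offset(uint64_t shoff, uint16_t shentsize, uint16_t index)
{
    return shoff + (uint64_t)shentsize * index;
}

/* Number of bytes a section occupies in the file. A NOBITS section
   such as .bss occupies none, whatever its recorded size. */
uint64_t section_file_bytes(uint32_t sh_type, uint64_t sh_size)
{
    return sh_type == SECTION_TYPE_NOBITS ? 0 : sh_size;
}
```

With the values shown for source.o (table at offset 0x4a0, 64 bytes per descriptor), the descriptor of section 13 (.shstrtab) starts at 0x4a0 + 13 * 64 = 0x7e0, and the 14-entry table ends at 0x820, that is 2080 bytes into the file.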

3. Relocation table

objdump -r source.o displays the relocation table:

source.o:     file format elf64-x86-64

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE 
0000000000000017 R_X86_64_PC32     .rodata-0x0000000000000004
0000000000000021 R_X86_64_PLT32    printf-0x0000000000000004
000000000000003d R_X86_64_PC32     .data
0000000000000043 R_X86_64_PC32     .bss-0x0000000000000004
0000000000000056 R_X86_64_PLT32    func1-0x0000000000000004


RELOCATION RECORDS FOR [.eh_frame]:
OFFSET           TYPE              VALUE 
0000000000000020 R_X86_64_PC32     .text
0000000000000040 R_X86_64_PC32     .text+0x0000000000000028

Compare this with the disassembly of main in the object file:

0000000000000028 <main>:
0000000000000028: f3 0f 1e fa           endbr64 
000000000000002c: 55                    push   %rbp
000000000000002d: 48 89 e5              mov    %rsp,%rbp
0000000000000030: 48 83 ec 10           sub    $0x10,%rsp
0000000000000034: c7 45 f8 01 00 00 00  movl   $0x1,-0x8(%rbp)       # initialize variable a
000000000000003b: 8b 15 00 00 00 00     mov    0x0(%rip),%edx        # 41 <main+0x19>
0000000000000041: 8b 05 00 00 00 00     mov    0x0(%rip),%eax        # 47 <main+0x1f>
0000000000000047: 01 c2                 add    %eax,%edx
0000000000000049: 8b 45 f8              mov    -0x8(%rbp),%eax
000000000000004c: 01 c2                 add    %eax,%edx
000000000000004e: 8b 45 fc              mov    -0x4(%rbp),%eax
0000000000000051: 01 d0                 add    %edx,%eax
0000000000000053: 89 c7                 mov    %eax,%edi
0000000000000055: e8 00 00 00 00        callq  5a <main+0x32>
000000000000005a: 8b 45 f8              mov    -0x8(%rbp),%eax
000000000000005d: c9                    leaveq 
000000000000005e: c3                    retq  

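The four bytes at offset 0x56 are the operand of the call to func1 (the e8 opcode itself is at 0x55). They are left as 00 00 00 00 before linking and must be patched during linking. Relocation only concerns references whose addresses are unknown at compile time: global variables, static variables, functions, and external symbols. Stack variables such as a and b need no relocation.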

When linking, the linker must patch the address references in the object file (for example, the references to printf and func1). The information needed for this is recorded in the corresponding relocation section (relocation table). For the .text section, that relocation section is .rela.text.
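To make the patching step concrete, here is a minimal sketch of how a 32-bit PC-relative field could be filled in. In the x86-64 psABI, R_X86_64_PC32 is computed as S + A - P (symbol address plus addend minus the address of the patched field). R_X86_64_PLT32 uses the PLT entry address in place of S, which reduces to the same formula when the call is resolved directly to the function. The function name and the load address in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes S + A - P as a little-endian 32-bit value at section[offset].
   section_addr is the address the section will have after linking. */
void apply_pc32(uint8_t *section, uint64_t section_addr, uint64_t offset,
                uint64_t symbol_addr, int64_t addend)
{
    assert(section != NULL);
    uint64_t place = section_addr + offset;                        /* P */
    uint32_t value = (uint32_t)(symbol_addr + (uint64_t)addend - place);
    for (int i = 0; i < 4; i++)
        section[offset + i] = (uint8_t)(value >> (8 * i));
}
```

Take the entry at offset 0x56 (func1-0x4) and suppose .text ends up at the hypothetical address 0x401000, so func1 is at 0x401000. The value is 0x401000 - 4 - 0x401056 = -0x5a, stored as a6 ff ff ff. The call then jumps 0x5a bytes back from the next instruction (at 0x5a), which is the start of func1.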

4. String table and section header string table

All strings in an ELF file (variable names, function names, and so on) are stored together in string tables. Other tables, such as the section header table and the symbol table, refer to a string by its offset in a string table. .strtab holds symbol names, and .shstrtab holds section names.

Hex dump of section '.strtab':
  0x00000000 00736f75 7263652e 63007374 61746963 .source.c.static
  0x00000010 5f696e69 745f7661 722e3139 32320073 _init_var.1922.s
  0x00000020 74617469 635f756e 696e6974 5f766172 tatic_uninit_var
  0x00000030 2e313932 3300676c 6f62616c 5f696e69 .1923.global_ini
  0x00000040 745f7661 7200676c 6f62616c 5f756e69 t_var.global_uni
  0x00000050 6e69745f 76617200 61646472 0066756e nit_var.addr.fun
  0x00000060 6331005f 474c4f42 414c5f4f 46465345 c1._GLOBAL_OFFSE
  0x00000070 545f5441 424c455f 00707269 6e746600 T_TABLE_.printf.
  0x00000080 6d61696e 0076616c 756500            main.value.

Hex dump of section '.shstrtab':
  0x00000000 002e7379 6d746162 002e7374 72746162 ..symtab..strtab
  0x00000010 002e7368 73747274 6162002e 72656c61 ..shstrtab..rela
  0x00000020 2e746578 74002e64 61746100 2e627373 .text..data..bss
  0x00000030 002e7265 6c612e64 6174612e 72656c2e ..rela.data.rel.
  0x00000040 6c6f6361 6c002e72 6f646174 61002e63 local..rodata..c
  0x00000050 6f6d6d65 6e74002e 6e6f7465 2e474e55 omment..note.GNU
  0x00000060 2d737461 636b002e 6e6f7465 2e676e75 -stack..note.gnu
  0x00000070 2e70726f 70657274 79002e72 656c612e .property..rela.
  0x00000080 65685f66 72616d65 00                eh_frame.
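Looking up a name is just a bounds-checked pointer into the table. The sketch below uses the first bytes of the .shstrtab dump as sample data; the names shstrtab_prefix and strtab_get are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The beginning of the .shstrtab dump, written as a C string literal. */
static const char shstrtab_prefix[] =
    "\0.symtab\0.strtab\0.shstrtab\0.rela.text\0.data\0.bss";

/* Returns the NUL-terminated string that starts at offset, or NULL if
   the offset is out of range or the string is not terminated inside
   the table. */
const char *strtab_get(const char *table, size_t size, size_t offset)
{
    assert(table != NULL);
    if (offset >= size)
        return NULL;
    if (memchr(table + offset, '\0', size - offset) == NULL)
        return NULL;
    return table + offset;
}
```

In this data, offset 0x1b yields ".rela.text" and offset 0x20 yields ".text": the two names share the same bytes, so the shorter one costs no extra space in the table.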

5. The symbol table

Symbol table '.symtab' contains 18 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS source.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 
     6: 0000000000000004     4 OBJECT  LOCAL  DEFAULT    3 static_init_var.1920
     7: 0000000000000000     4 OBJECT  LOCAL  DEFAULT    4 static_uninit_var.1921
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT    7 
     9: 0000000000000000     0 SECTION LOCAL  DEFAULT    8 
    10: 0000000000000000     0 SECTION LOCAL  DEFAULT    9 
    11: 0000000000000000     0 SECTION LOCAL  DEFAULT    6 
    12: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    3 global_init_var
    13: 0000000000000004     4 OBJECT  GLOBAL DEFAULT  COM global_uninit_var
    14: 0000000000000000    40 FUNC    GLOBAL DEFAULT    1 func1
    15: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _GLOBAL_OFFSET_TABLE_
    16: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
    17: 0000000000000028    55 FUNC    GLOBAL DEFAULT    1 main


The contents of the symbol table:

  • Global symbols defined in this object file, which other object files can reference. For example: global_init_var, main, and func1.
  • Global symbols referenced in this object file but defined elsewhere, also called external symbols. For example: printf.
  • Local symbols, such as static_init_var.
  • Section names.

Description:

  • Value is the value of the symbol. For a variable or function in an object file, it is the symbol's offset within its section. For example, global_init_var is the first item in the .data section, so its Value is 0x0 and its size is 4 bytes. static_init_var is the second item in .data, so its Value is 0x4. The dump of .data confirms this: 0x54 (84) is followed by 0x55 (85).

    Hex dump of section '.data':
      0x00000000 54000000 55000000 T...U...
  • In an executable, Value is the symbol's final virtual address.

    54: 0000000000001129    24 FUNC    GLOBAL DEFAULT   14 func
    55: 0000000000001170   101 FUNC    GLOBAL DEFAULT   14 __libc_csu_init
    56: 0000000000004018     0 NOTYPE  GLOBAL DEFAULT   24 _end
    57: 0000000000001040    47 FUNC    GLOBAL DEFAULT   14 _start
    58: 0000000000004010     0 NOTYPE  GLOBAL DEFAULT   24 __bss_start
    59: 0000000000001141    37 FUNC    GLOBAL DEFAULT   14 main
  • Type is the type of the symbol: SECTION for a section, OBJECT for a data object such as a variable, FUNC for a function.

  • Ndx is the index of the section in which the symbol is defined. The initialized global variable global_init_var and the local static variable static_init_var are both in the .data section (index 3). printf is an undefined external symbol (UND). The uninitialized global variable global_uninit_var is placed in the COMMON block (COM); its space in .bss is allocated only when the object files are linked into the final executable (which is also an ELF file).
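The Type, Bind and Ndx columns are decoded from two small fields of each symbol table entry. In the ELF specification, st_info packs the binding into its high four bits and the type into its low four bits, and a few reserved st_shndx values stand for UND, ABS and COM. A minimal sketch, with function names invented for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Binding is in the high 4 bits of st_info, type in the low 4 bits. */
unsigned symbol_bind(uint8_t st_info) { return st_info >> 4; }
unsigned symbol_type(uint8_t st_info) { return st_info & 0x0f; }

const char *bind_name(unsigned bind)
{
    switch (bind) {
    case 0:  return "LOCAL";
    case 1:  return "GLOBAL";
    case 2:  return "WEAK";
    default: return "?";
    }
}

const char *type_name(unsigned type)
{
    switch (type) {
    case 0:  return "NOTYPE";
    case 1:  return "OBJECT";
    case 2:  return "FUNC";
    case 3:  return "SECTION";
    case 4:  return "FILE";
    default: return "?";
    }
}

/* Name of a reserved section index, or NULL when shndx is the index of
   an ordinary section. */
const char *ndx_name(uint16_t shndx)
{
    switch (shndx) {
    case 0x0000: return "UND";  /* SHN_UNDEF */
    case 0xfff1: return "ABS";  /* SHN_ABS */
    case 0xfff2: return "COM";  /* SHN_COMMON */
    default:     return NULL;
    }
}
```

For example, a GLOBAL FUNC symbol such as main has st_info 0x12, and global_uninit_var has st_shndx 0xfff2, which readelf prints as COM.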

6. Strong and weak symbols

extern int ext;                       // external reference (not a definition): neither strong nor weak

int weak;                             // uninitialized global variable: weak symbol
int strong = 1;                       // initialized global variable: strong symbol
__attribute__((weak)) int weak2 = 2;  // explicitly declared as a weak symbol

int main(void)
{
  return 0;
}
  • A strong symbol may not be defined more than once.
  • If a symbol is strong in one object file and weak in all the others, the strong definition is chosen.
  • If a symbol is weak in all object files, the definition that occupies the most space is chosen.
  • A strong reference causes a link error if the symbol is still undefined at link time.
  • A weak reference does not cause an error if the symbol is undefined; the linker resolves it to 0 or to some other special value.
  • The local variables a and b in the earlier example live on the stack. b is uninitialized, so its value is indeterminate.
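The weak-reference rule above is what makes optional functions possible. The sketch below assumes GCC or Clang on an ELF target; optional_hook is a hypothetical function that is declared weak and deliberately never defined, so the linker resolves its address to 0 instead of reporting an error.

```c
#include <stddef.h>

/* Weak reference: no definition of optional_hook exists in this program. */
__attribute__((weak)) int optional_hook(int value);

/* Calls the hook if some object file provides it; otherwise falls back
   to returning the value unchanged. */
int run_hook_or_default(int value)
{
    if (optional_hook != NULL)
        return optional_hook(value);
    return value;
}
```

If another object file linked into the program defines optional_hook, the same call picks up that definition without any change to this code.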