1. Why generate machine code from C

So far we have developed the operating system in assembly language. The drawback of assembly is that even simple tasks take a lot of code. For example, calculating the sum of the integers from 1 to 100 in assembly requires the following code:

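RESULT	dw	0		; loop counter (16 bits, matching the word accesses below)
mov	ax,	0		; ax accumulates the sum

SUM:
	inc word[RESULT]
	add ax, word[RESULT]
	cmp word[RESULT], 100	; stop once 100 has been added
	jne SUM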

In assembly we also have to check that each register's size matches the data it holds, and that every memory access uses the same width as the space we reserved. In C, the following code is enough. We do not have to think about registers or memory layout, because the compiler takes care of them:

int result = 0;
for(int i=1; i<=100; i++) { result = result + i; }

2. Compiling C into machine code with GCC

To compile an ordinary C file, we simply use the GCC compiler to turn it into an executable. For example:

gcc script.c -o run

Internally this happens in stages: GCC preprocesses and compiles the source to assembly, assembles that into an object file (script.o), and finally links the object file into the executable.

The script.o file produced along the way is similar to our earlier loader.bin binary file. The difference is that script.o also contains data structures that the current system needs in order to link and run it. We do not need them, so we will strip them out at a later stage. Let's take the simplest possible script.c file as an example:

int main()
{
	return 0;
}

We can produce the script.o object file directly with the following command:

gcc -mcmodel=large -fno-builtin -m64 -c script.c

Here -mcmodel=large selects the large code model, which makes no assumptions about the addresses and sizes of sections. -fno-builtin tells GCC not to recognize built-in functions unless they are prefixed with __builtin_. -m64 generates code for a 64-bit environment. -c stops after producing the object file, without linking. At this point, you can inspect the object file with the objdump command from the command line. The bytes 55, 48, 89, e5, … in its output are the machine code we need.

bash# objdump -d script.o

script.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <main>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	b8 00 00 00 00       	mov    $0x0,%eax
   9:	5d                   	pop    %rbp
   a:	c3                   	retq

We can use xxd to view a raw hex dump of the object file:

bash# xxd script.o
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0100 3e00 0100 0000 0000 0000 0000 0000  ..>.............
00000020: 0000 0000 0000 0000 0000 0000 0000       ................
00000030: 0000 0000 4000 0000 0000 4000 0b00 0a00  ....@.....@.....
00000040: 5548 89e5 b800 0000 005d c300 4743 433a  UH.......]..GCC:
00000050: 2028 474e 5529 2034 2e38 2e35 2032 3031   (GNU) 4.8.5 201
00000060: 3530 3632 3320 2852 6564 2048 6174 2034  50623 (Red Hat 4
00000070: 2e38 2e35 2d33 3929 0000 0000 0000 0000  .8.5-39)........
00000080: 1400 0000 0000 0000 017a 5200 0178 1001  .........zR..x..
00000090: 1b0c 0708 9001 0000 1c00 0000 1c00 0000  ................
000000a0: 0000 0000 0b00 0000 0041 0e10 8602 430d  .........A....C.
000000b0: 0646 0c07 0800 0000 0000 0000 0000 0000  .F..............
000000c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
...

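The machine code we need starts at offset 0x40, but the object file as a whole contains not only that machine code but also many other data structures. To extract just the machine code, we use the objcopy tool. In the command below, -I elf64-x86-64 names the input format, -R ".eh_frame" removes the .eh_frame section, and -O binary writes the output as a raw binary file: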

objcopy -I elf64-x86-64 -R ".eh_frame" -O binary script.o script.bin

We can then view the result with xxd:

bash# xxd script.bin
00000000: 5548 89e5 b800 0000 005d c3              UH.......].

The resulting script.bin contains exactly the machine code that we need to run on our operating system.