In this article, we'll walk through the compile-and-link process using examples.

The compile and link process

A program is compiled in stages, mainly the following:

  • Preprocessing
  • Compilation
  • Assembly
  • Linking

The whole process is illustrated as follows:

Next, let's use a minimal C file to illustrate the source-to-executable process.

Source code to executable file

There are many ways to compile source code on macOS, manually or automatically. Here we use Clang to demonstrate the process. To keep things simple, we start with a small hello.c program:

#include <stdio.h>
#define MAX_AGE 120

int main() {
    printf("%s\n", "hello world~");
    printf("%d\n", MAX_AGE); // Simple output
    return 0;
}

Normally we can run clang directly on hello.c. It compiles and links the source and outputs the executable a.out.

$ clang hello.c   # compile and link
$ ./a.out         # run a.out
hello world~
120

Clang can print every phase of this process with the following command:

$ clang -ccc-print-phases hello.c
+- 0: input, "hello.c", c
+- 1: preprocessor, {0}, cpp-output
+- 2: compiler, {1}, ir
+- 3: backend, {2}, assembler
+- 4: assembler, {3}, object
+- 5: linker, {4}, image
6: bind-arch, "x86_64", {5}, image

When we invoke Clang without any stage options, these details are hidden from us. Here we'll explore them step by step.

Clang options overview

Since we'll be using clang, which supports a large number of options, let's start with a brief overview. The excerpt below comes from man clang; for details on every option, see the official documentation: Clang – The Clang C, C++, and Objective-C Compiler

$ man clang
NAME
       clang - the Clang C, C++, and Objective-C compiler

DESCRIPTION
       clang  is a C, C++, and Objective-C compiler which encompasses preprocessing, pars-
       ing, optimization, code generation, assembly,  and  linking.   Depending  on  which
       high-level mode setting is passed, Clang will stop before doing a full link.  While
       Clang is highly integrated, it is important to understand the  stages  of  compila-
       tion, to understand how to invoke it.  These stages are:
...
       to use the static analyzer.

The man page divides the work into the following stages:

  • Driver
  • Preprocessing
  • Parsing and Semantic Analysis
  • Code Generation and Optimization
  • Assembler
  • Linker
  • Clang Static Analyzer

What we care about most are the stage selection options below, which are the key parameters in the demonstrations that follow.

OPTIONS
   Stage Selection Options
       -E     Run the preprocessor stage.

       -fsyntax-only
              Run the preprocessor, parser and type checking stages.

       -S     Run  the  previous stages as well as LLVM generation and optimization stages
              and target-specific code generation, producing an assembly file.

       -c     Run all of the above, plus the assembler, generating a  target  ".o"  object
              file.

       no stage selection option
              If no stage selection option is specified, all stages above are run, and the
              linker is run to combine the results into an executable or shared library.

Other common options

-o <file>   Write output to file.                           # specify the output file
-v          Show commands to run and use verbose output.

Let’s get started

The compilation process in depth

1. Preprocessing

The preprocessing phase handles the preprocessor directives in the code, which begin with #. For example, it:

  • Removes comments
  • Removes each #define and expands the macros it defines
  • Inserts the contents of each #include file at the location of the directive

In short, it replaces macros, removes comments, expands header files, and produces a .i file.

Preprocess the hello.c above:

$ clang -E hello.c -o hello.i

The output file hello.i looks like this. Because the file is long, only part of it is shown here:

# 1 "hello.c"
extern int __vsnprintf_chk (char * restrict, size_t, int, size_t, const char * restrict, va_list);
# 408 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/stdio.h" 2 3 4
# 2 "hello.c" 2


int main() {
 printf("%s\n", "hello world~");
    printf("%d\n", 120);
 return 0;
}

As you can see, the comment is gone and the MAX_AGE macro we defined has been replaced directly by 120. Otherwise, our own code is essentially unchanged.

2. Compilation

During the compilation phase, the preprocessed file (hello.i) normally goes through the following steps:

  • Lexical analysis
    • Cuts the code into tokens, such as parentheses and braces, and records their positions
  • Syntax analysis
    • Checks the syntax and combines sequences of tokens into phrases to form an abstract syntax tree (AST)
  • Intermediate code (IR) generation
    • Traverses the AST generated in the previous step and translates it into LLVM IR
  • Bitcode (.bc) generation (optional)
    • With bitcode enabled (available since Xcode 7), the optimized IR can also be emitted as .bc intermediate code, which lets Apple optimize it further
  • Assembly code generation
    • Runs the optimization passes (Pass) one by one and finally generates assembly code

Once the AST has been built, the compiler front end outputs IR, its intermediate code. With the -S option, the .i file can also be converted directly into assembly.

$ clang -fmodules -fsyntax-only -Xclang -ast-dump hello.c   # output the AST
$ clang -S -emit-llvm hello.c                               # generate an intermediate IR file
$ clang -S hello.i -o hello.s                               # compile directly to assembly

The output assembly code hello.s reads as follows:

	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 11, 0	sdk_version 11, 3
	.globl	_main                           ## -- Begin function main
	.p2align	4, 0x90
_main:                                  ## @main
	.cfi_startproc
## %bb.0:
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	subq	$16, %rsp
	movl	$0, -4(%rbp)
	leaq	L_.str(%rip), %rdi
	leaq	L_.str.1(%rip), %rsi
	movb	$0, %al
	callq	_printf
	leaq	L_.str.2(%rip), %rdi
	movl	$120, %esi
	movl	%eax, -8(%rbp)                  ## 4-byte Spill
	movb	$0, %al
	callq	_printf
	xorl	%ecx, %ecx
	movl	%eax, -12(%rbp)                 ## 4-byte Spill
	movl	%ecx, %eax
	addq	$16, %rsp
	popq	%rbp
	retq
	.cfi_endproc
                                        ## -- End function
	.section	__TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
	.asciz	"%s\n"

L_.str.1:                               ## @.str.1
	.asciz	"hello world~"

L_.str.2:                               ## @.str.2
	.asciz	"%d\n"

.subsections_via_symbols

3. Assembly

In this stage, the assembly code hello.s generated in the previous stage is converted into machine code for the target platform and stored in an object file. The output format is .o.

$ clang -c hello.s -o hello.o

The object file has now been generated. Its contents begin as follows:

cffa edfe 0700 0001 0300 0000 0100 0000
0400 0000 0802 0000 0020 0000 0000 0000
1900 0000 8801 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
c000 0000 0000 0000 2802 0000 0000 0000
c000 0000 0000 0000 0700 0000 0700 0000
0400 0000 0000 0000 5f5f 7465 7874 0000
0000 0000 0000 0000 5f5f 5445 5854 0000

As you can see, the contents are binary, shown here in hexadecimal. Does the leading cffa edfe look familiar? 😆 You guessed it: the object file has a specific structure. It is a typical Mach-O file, and you can inspect it by dragging it into MachOView.

At this point the object file has been generated. Although it contains machine code, it cannot be executed directly. It first has to be linked with everything it depends on.

4. Linking

The link phase links object files into an executable. In this phase, multiple object files are merged into one executable file or dynamic library, and the output format is .out or .so.

We usually develop across multiple files or modules and share code through libraries, so different object files may reference each other's variables or call each other's functions. The linker is the program that links these object files (.o files) together. For example, we often call methods and variables from the Foundation and UIKit frameworks, but these frameworks are not in the same object file as our code, so the linker has to link them with our own code.

Recall that hello.c above calls the system library function printf, which cannot be used without linking. Let's try linking hello.o:

$ ld hello.o
Undefined symbols for architecture x86_64:
  "_printf", referenced from:
      _main in hello.o
ld: symbol(s) not found for architecture x86_64
# _printf is an external symbol, so we need to specify the library that provides it in order to link

$ ld hello.o /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib/libc.tbd

# The SDK path can be obtained from the command line, which simplifies the command above
$ ld hello.o `xcrun --show-sdk-path`/usr/lib/libc.tbd

# Run the generated a.out file
$ ./a.out
hello world~
120

At this point, our object file has been linked into a.out, an executable program. It is also a Mach-O file, as a look inside a.out shows.

To summarize, the compiler does two important things when it compiles code:

  • Compile the source code into assembly
  • Categorize and summarize symbols

In the next article, we'll talk about symbols.
