The main steps from source code to an executable program are as follows:

  • Preprocessing
  • Compilation
  • Assembly
  • Linking

1 Preprocessing

Preprocessing mainly reads the source program and processes its pseudo-instructions (directives) and special symbols, including:

  • Macro definition: #define Name TokenString replaces every later occurrence of Name with TokenString
  • Conditional compilation directives such as #ifdef
  • Header file inclusion: the #include directive
  • Special symbols: __LINE__ expands to the current line number and __FILE__ to the name of the source file; comments are also handled at this stage

The output of preprocessing is a file that has the same meaning as the original source file but different content: it contains no macros, conditional compilation directives, #include directives, or other special symbols. This file is the input to compilation.
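The directives listed above can be seen together in one short file. This is a minimal sketch: the names BUFFER_SIZE, SQUARE, GREETING, USE_LONG_GREETING and the four functions are made up for illustration and do not come from any real project.

```c
#include <assert.h>             /* header inclusion: the header's text replaces this line */

#define BUFFER_SIZE 64          /* macro: each later BUFFER_SIZE becomes 64 */
#define SQUARE(x) ((x) * (x))   /* macro with a parameter, parenthesised for safety */

#ifdef USE_LONG_GREETING        /* conditional compilation: only one branch survives */
#define GREETING "hello, preprocessor"
#else
#define GREETING "hello"
#endif

/* After preprocessing the return statement reads: return ((n) * (n)) + 64; */
int scaled(int n) {
    assert(n < BUFFER_SIZE);    /* assert is itself a macro supplied by the header */
    return SQUARE(n) + BUFFER_SIZE;
}

/* Which string is returned depends on whether USE_LONG_GREETING was defined. */
const char *greeting(void) { return GREETING; }

/* __LINE__ becomes the number of the line it is written on. */
int line_here(void) { return __LINE__; }

/* __FILE__ becomes a string literal holding the source file name. */
const char *file_here(void) { return __FILE__; }
```

With gcc or clang, the -E option stops after preprocessing and prints the resulting text, which is a convenient way to check what each directive turned into.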

2 Compilation

Broadly speaking, compilation converts a program written in a high-level language into a program written in a low-level language. Here, it means converting a C program into assembly code. The compilation process can be further divided into:

  • Lexical analysis
  • Syntax analysis
  • Semantic analysis
  • Intermediate code generation
  • Intermediate code optimization
  • Object code generation

The stages above describe the classic compilation process, but not every compiler is divided this way: some compilers do not generate intermediate code, and some do not optimize.

  • The difference between an interpreter and a compiler
    • The interpreter translates as it executes and finally gets the result of the program’s execution, without generating object code
    • The compiler only translates the source program into the object program and outputs the object program equivalent to the source program

2.1 Lexical analysis

The task of lexical analysis is to read the preprocessed source program character by character from left to right, scan and decompose the character stream, and identify the words (tokens) one by one.

Word: a group of logically connected characters with a collective meaning, such as a keyword, operator, delimiter, or constant.

Each word output by a lexical analyzer can be expressed as a pair: (word category, value of the word itself).
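The (category, value) pair can be sketched as a small C struct with a scanner that fills it. This is an illustrative sketch, not the lexer of any real compiler: the names TokenKind, Token, next_token and tokenize are hypothetical, keywords are not separated from identifiers, and only single-character operators and delimiters are recognized.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Word categories; the names are illustrative only. */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_OPERATOR, TOK_DELIMITER, TOK_END } TokenKind;

/* A word as a pair: (category, value). The value is kept as text. */
typedef struct {
    TokenKind kind;
    char text[32];
} Token;

/* Read one word starting at *src and advance *src past it. */
Token next_token(const char **src) {
    Token tok = { TOK_END, "" };
    const char *p = *src;
    size_t n = 0;

    while (isspace((unsigned char)*p)) p++;          /* skip blanks between words */

    if (*p == '\0') {
        /* end of input: the kind stays TOK_END */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        tok.kind = TOK_IDENT;
        /* words longer than the buffer are split; acceptable for a sketch */
        while ((isalnum((unsigned char)*p) || *p == '_') && n < sizeof tok.text - 1)
            tok.text[n++] = *p++;
    } else if (isdigit((unsigned char)*p)) {
        tok.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p) && n < sizeof tok.text - 1)
            tok.text[n++] = *p++;
    } else {
        /* one character per operator or delimiter */
        tok.kind = (*p == ';' || *p == '(' || *p == ')') ? TOK_DELIMITER : TOK_OPERATOR;
        tok.text[n++] = *p++;
    }
    tok.text[n] = '\0';
    *src = p;
    return tok;
}

/* Split src into at most max words; returns how many were stored. */
size_t tokenize(const char *src, Token *out, size_t max) {
    size_t count = 0;
    assert(src != NULL && out != NULL);
    while (count < max) {
        Token tok = next_token(&src);
        if (tok.kind == TOK_END) break;
        out[count++] = tok;
    }
    return count;
}
```

For the input "x1 = 42 + y;" this yields six pairs: (identifier, x1), (operator, =), (number, 42), (operator, +), (identifier, y), (delimiter, ;).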

2.2 Syntax analysis

Building on lexical analysis, syntax analysis (parsing) decomposes the sequence of words into grammatical phrases such as statements and expressions.

Using the grammar rules of the language, that is, the rules describing the structure of a program, syntax analysis determines whether the entire input string forms a grammatically correct program.

Syntax analysis can be divided into top-down analysis and bottom-up analysis:

  • Top-down analysis: starting from the start symbol of the grammar, use the current input symbol to decide uniquely which production should replace the corresponding non-terminal in the next derivation step, as in LL analysis. In short, start from the start symbol and see whether the target string can be derived.

  • Bottom-up analysis: at each step, select a reducible substring (a handle) from the current sentential form and reduce it to a non-terminal, as in operator-precedence analysis and LR analysis.
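Top-down analysis is commonly hand-written as a recursive-descent parser, with one function per non-terminal and the current input character choosing the production. The sketch below assumes a small arithmetic grammar chosen for illustration (expr, term, factor); the names Parser and parse are hypothetical, and the parser evaluates each phrase as it recognizes it instead of building a tree, to stay short.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Parser state: the remaining input and a flag that is cleared on a syntax error. */
typedef struct {
    const char *p;
    int ok;
} Parser;

static long parse_expr(Parser *ps);    /* forward declaration: factor refers back to expr */

static void skip_blanks(Parser *ps) {
    while (isspace((unsigned char)*ps->p)) ps->p++;
}

/* factor -> NUMBER | '(' expr ')' */
static long parse_factor(Parser *ps) {
    long value = 0;
    skip_blanks(ps);
    if (isdigit((unsigned char)*ps->p)) {
        while (isdigit((unsigned char)*ps->p))
            value = value * 10 + (*ps->p++ - '0');
    } else if (*ps->p == '(') {
        ps->p++;
        value = parse_expr(ps);
        skip_blanks(ps);
        if (*ps->p == ')') ps->p++;
        else ps->ok = 0;               /* missing closing parenthesis */
    } else {
        ps->ok = 0;                    /* no production of factor starts with this symbol */
    }
    return value;
}

/* term -> factor { ('*' | '/') factor } */
static long parse_term(Parser *ps) {
    long value = parse_factor(ps);
    for (;;) {
        skip_blanks(ps);
        char op = *ps->p;
        if (op != '*' && op != '/') return value;
        ps->p++;
        long rhs = parse_factor(ps);
        if (op == '*') value *= rhs;
        else if (rhs != 0) value /= rhs;
        else ps->ok = 0;               /* division by zero is reported as an error */
    }
}

/* expr -> term { ('+' | '-') term } */
static long parse_expr(Parser *ps) {
    long value = parse_term(ps);
    for (;;) {
        skip_blanks(ps);
        char op = *ps->p;
        if (op != '+' && op != '-') return value;
        ps->p++;
        long rhs = parse_term(ps);
        value = (op == '+') ? value + rhs : value - rhs;
    }
}

/* Returns 1 and stores the value if text is a well-formed expression, otherwise 0. */
int parse(const char *text, long *result) {
    Parser ps = { text, 1 };
    assert(text != NULL && result != NULL);
    long value = parse_expr(&ps);
    skip_blanks(&ps);
    if (*ps.p != '\0') ps.ok = 0;      /* leftover input means the string was not derived */
    if (ps.ok) *result = value;
    return ps.ok;
}
```

Calling parse("2 + 3 * (4 - 1)", &v) accepts the string and stores 11 in v, while parse("2 + * 3", &v) returns 0 because no production of factor begins with '*'.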

2.3 Semantic Analysis

Semantic analysis checks the source program for semantic errors and collects type information for the code generation phase. For example, it checks that each operator has operands of the correct type, and converts the type of an operand where the language requires it.
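A type checker for a single operator gives a feel for both tasks: rejecting operands that make no sense and recording a conversion where the operand types differ. The sketch below assumes a toy language in which '+' accepts only int and double; the names Type, Node, widened and check are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TY_INT, TY_DOUBLE, TY_STRING, TY_ERROR } Type;
typedef enum { N_LITERAL, N_ADD } NodeKind;

/* One node of an expression tree, as a parser might have built it. */
typedef struct Node {
    NodeKind kind;
    Type type;               /* given for literals, computed by check() for N_ADD */
    int widened;             /* set when this operand must be converted int -> double */
    struct Node *left;
    struct Node *right;
} Node;

/* Compute the type of n, marking any operand that needs a conversion. */
Type check(Node *n) {
    assert(n != NULL);
    if (n->kind == N_LITERAL) return n->type;       /* literals carry their own type */

    Type lt = check(n->left);
    Type rt = check(n->right);

    if (lt == TY_ERROR || rt == TY_ERROR || lt == TY_STRING || rt == TY_STRING) {
        n->type = TY_ERROR;                         /* '+' is not defined for these operands */
    } else if (lt == rt) {
        n->type = lt;                               /* int + int or double + double */
    } else {
        /* one int and one double: convert the int operand, the result is double */
        if (lt == TY_INT) n->left->widened = 1;
        else n->right->widened = 1;
        n->type = TY_DOUBLE;
    }
    return n->type;
}
```

For an int literal added to a double literal, check returns TY_DOUBLE and sets widened on the int operand; adding a string literal to an int yields TY_ERROR.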

2.4 Intermediate code generation

Intermediate code: a notation system with a simple structure and a clear meaning. It can be designed in many forms, guided by two principles: it should be easy to generate, and it should be easy to translate into target code. Many compilers use quadruples, a form close to three-address instructions, written as (operator, operand 1, operand 2, result).

  • The role of intermediate code
    • It is used as a bridge between the source language and the target language, avoiding the large semantic span between the two, and making the logical structure of the compiler simpler and clearer
    • It makes the compiler easier to retarget to another machine
    • It enables target-independent optimizations

2.5 Code Optimization

Code optimization transforms the intermediate code generated in the previous stage so that the final code is more efficient, saving both time and space.

  • Type of code optimization
    • Local optimization
      • Common subexpression elimination
      • Constant folding
      • Copy propagation
      • Useless (dead) assignment elimination
    • Loop optimization
      • Code hoisting (loop-invariant code motion)
      • Strength reduction
      • Transformation of loop control conditions
    • Global optimization
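The local and loop optimizations listed above are easiest to see as before and after pairs. A compiler applies them to intermediate code; the sketch below writes them at the C source level purely for readability, and all function names are made up for the illustration.

```c
#include <assert.h>

/* Before local optimization. */
int local_before(int a, int b) {
    int k = 2 * 8;           /* both operands are known at compile time */
    int x = a * b + k;
    int c = a;               /* plain copy of a */
    int y = c * b - 1;       /* same product as a * b once the copy is seen through */
    int unused = x * 3;      /* assigned but never read */
    unused = 0;
    return x + y;
}

/* After constant folding, copy propagation, common subexpression
   elimination, and removal of the useless assignments. */
int local_after(int a, int b) {
    int ab = a * b;          /* the common subexpression is computed once */
    int x = ab + 16;         /* 2 * 8 folded into 16 */
    int y = ab - 1;
    return x + y;
}

/* Before loop optimization. */
int loop_before(int n, int limit) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        int bound = limit * 2;       /* does not depend on i */
        sum += i * 4 + bound;        /* one multiplication per iteration */
    }
    return sum;
}

/* After code hoisting, strength reduction, and a rewritten loop condition. */
int loop_after(int n, int limit) {
    int sum = 0;
    int bound = limit * 2;           /* hoisted out of the loop */
    int end = n * 4;                 /* loop test expressed in terms of i4 */
    for (int i4 = 0; i4 < end; i4 += 4) {
        sum += i4 + bound;           /* the multiplication became an addition */
    }
    return sum;
}

/* An optimization must not change results; spot-check a few inputs. */
void check_equivalence(void) {
    for (int v = -3; v <= 3; v++) {
        assert(local_before(v, 5) == local_after(v, 5));
        assert(loop_before(v, 7) == loop_after(v, 7));
    }
}
```

Each optimized function returns the same value as its unoptimized counterpart for the inputs tried, which is the basic requirement every optimization has to meet.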

2.6 Object code generation

The task of object code generation is to translate the intermediate code into code for a particular machine: absolute machine code, relocatable machine code, or assembly code.

3 Assembly

Assembly translates the assembly code into machine instructions. It is a straightforward translation step, and its output is object (.obj) files.

4 Linking

Linking combines the object files produced in the previous steps from the source files of the different modules into a single executable file, which is the program we actually run. Linking is divided into static linking and dynamic linking:

  • Static linking: before the program runs, the object modules and the library functions they need are linked into one complete load module, also known as an executable. It is usually not taken apart again, and it is loaded into memory as a whole when it needs to run.
  • Dynamic linking: the object modules obtained after compilation are linked while they are being loaded into memory, or later at run time. For example, an external module is located and loaded into memory only when the program needs to call it. This brings two benefits:
    • Easy to modify and update
    • Facilitate the realization of the target module sharing