Reference:

  • Software Engineering Research Module:blog.csdn.net/LoveLion
    • Eclipse AST overview of AST and ASTView
    • [Eclipse AST] Obtaining and accessing the AST
    • [Eclipse AST] Create an AST
    • [Eclipse AST] Changes to the AST
  • Redy Syntax Analysis – An introduction to abstract syntax trees
  • Detail AST abstract syntax tree

AST profile

Abstract syntax tree (AST) ** is a tree representation of the abstract syntax structure of the source code. Each node in the tree represents a structure in the source code. This is abstract because the abstract syntax tree does not represent every detail of the occurrence of the real syntax. Nested parentheses are implicit in the structure of the tree and are not represented as nodes. The abstract syntax tree does not depend on the source language syntax, that is to say the grammar analysis phase method context, no text, because when writing grammar, often to equivalent transformation grammar (to eliminate left recursion, backtracking, ambiguity, etc.), this will give the grammar analysis is introduced into some redundant components, adverse effects on the later stages, even can make us a stage becomes chaos. For this reason, many compilers often have to construct parse trees independently to establish a clear interface for the front end and back end. Abstract syntax trees are widely used in many fields, such as browsers, intelligent editors, and compilers.

The instance

  1. Four operational expressions

Expression: 1+3*(4-1)+2

  1. xml
<letter>
  <address>
    <city>ShiChuang</city>
  </address>
  <people>
    <id>12478</id>
    <name>Nosic</name>
  </people>
</letter>
Copy the code

  1. Procedure 1
whileb ! =0
{
    if a > b
        a = a-b
    else
        b = b-a
}
return a
Copy the code

  1. Program 2
sum=0
for i in range(0.100)
    sum=sum+i
end
Copy the code

According to Eclipse AST definitions, each node in the diagram corresponds to a node name, each child node also plays a role (identity) in the parent node to which it is attached, and leaf nodes are basically names, operators, and literals. Table 1 shows the node names of the four children of the ForStatement node in Figure 1 and the roles attached to the parent node.Table 1 Node names and roles of each child node

Text -> Process of abstracting syntax trees

  1. Lexical analysis: text -> Token list
    • Remove whitespace, sort tokens, remove whitespace, and then sort tokens, which belong to syntax keywords, which belong to operators, which belong to statement cutoff positions, which belong to data
  2. Syntax analysis: Token list -> syntax binary tree
    • Scan the token stream and analyze its syntax. This step should be to analyze a token stream. The final statement details the execution rules, then uses a combination of inverse Polish expressions to form a binary tree, which is merged from the bottom up

The first step: Lexical analysis, also known as scanning scanner. It reads our code and combines them into tokens according to predetermined rules. It also removes whitespace, comments, and so on. Finally, the entire code is split into a list of tokens (or a one-dimensional array).

const a = 5;
/ / convert
[{value: 'const', type: 'keyword'}, {value: 'a', type: 'identifier'}, ...]
Copy the code

When lexically analyzing the source code, it will read the code letter by letter, so it is figuratively called scanning-scans. When it encounters Spaces, operators, or special symbols, it considers a statement to be complete. Step 2: Parsers, also known as parsers, convert the lexically parsed array into a tree form and, at the same time, validate the syntax. Throw syntax errors if there are any.

[{value: 'const', type: 'keyword'}, {value: 'a', type: 'identifier'}, ...]
// Parse the tree form
{
   type: "VariableDeclarator", 
   id: {
       type: "Identifier",
       name: "a"},... }Copy the code

When generating a tree, the parser removes unnecessary tokens (such as incomplete parentheses), so the AST does not match 100% of the source code. A parser that covers 100% of all code structures is called a CST (Concrete syntax tree).

Use ASTView to browse the abstract syntax tree

IDEA also provides a plugin for JDT AstView to browse abstract syntax trees. Here is a screenshot of Eclipse JDT AstView to have a perceptual understanding of AST

【Eclipse AST】 Overview of AST and ASTView