A detailed look at how javac compiles Java code


Preface

When I discussed Android resource packaging earlier, I described the overall process of compiling and packaging an Android app into an APK. Converting Java files into class files is a very important part of that process. This article explores that conversion. Understanding it gives us a clearer picture of how Java files are compiled, and it also helps us gain a deeper grasp of APT-related knowledge.

The execution flow of Java code

Java code is executed in three main steps:

Compiling Java files into class files (compile time)
Loading the class files into memory via the class loader (load time)
Executing the code (run time)
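The three steps can be seen end to end with the standard javax.tools API. This sketch assumes a full JDK 11+ (ToolProvider.getSystemJavaCompiler() returns null on a runtime without javac) and uses a hypothetical Hello class: it writes the source to a temp directory, compiles it, loads it through a URLClassLoader, and invokes its method by reflection.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class ThreeSteps {

    // Runs the three steps on a tiny hypothetical class "Hello" and returns what its method returns.
    public static String compileLoadRun() throws Exception {
        Path dir = Files.createTempDirectory("javac-demo");
        Path source = dir.resolve("Hello.java");
        Files.writeString(source,
                "public class Hello { public static String greet() { return \"hi\"; } }");

        // Step 1 (compile time): Hello.java -> Hello.class via the javac API
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null without a JDK
        int status = javac.run(null, null, null, "-d", dir.toString(), source.toString());
        if (status != 0) {
            throw new IllegalStateException("compilation failed");
        }

        // Step 2 (load time): load Hello.class from the temp directory
        try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
            Class<?> hello = loader.loadClass("Hello");

            // Step 3 (run time): execute the loaded code
            Method greet = hello.getMethod("greet");
            return (String) greet.invoke(null);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(compileLoadRun()); // prints "hi"
    }
}
```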

What is Javac? Overview of the code compilation process

Definition

Javac is short for Java compiler; a compiler converts code written to one language specification into another. Javac's job is to convert Java source code into binary code that the JVM can recognize, in the form of class files. In effect, it turns a Java source file into a sequence of bytes whose meaning only the JVM understands.

How to obtain the javac source code

Go to the javac source download site.

Download the source code as a zip archive.

Decompress it and import it into your IDE.

With the source code in hand, we can explore it step by step and lift its veil.

An overview of the compilation process

The entry point of javac's compilation is com.sun.tools.javac.main.JavaCompiler; the main logic lives in its compile() and compile2() methods:

```java
// com.sun.tools.javac.main.JavaCompiler.java
public void compile(List<JavaFileObject> sourceFileObjects,
                    List<String> classnames,
                    Iterable<? extends Processor> processors)
{
    ...
    try {
        // Initialize the plug-in annotation processors
        initProcessAnnotations(processors);

        // These method calls must be chained to avoid memory leaks
        delegateCompiler =
            processAnnotations(              // annotation processing
                enterTrees(                  // fill the symbol table
                    stopIfError(CompileState.PARSE,
                        parseFiles(sourceFileObjects))), // lexical + syntax analysis
                classnames);

        // analysis and code generation
        delegateCompiler.compile2();
        delegateCompiler.close();
        elapsed_msec = delegateCompiler.elapsed_msec;
    } catch (Abort ex) {
        if (devVerbose)
            ex.printStackTrace(System.err);
    } finally {
        if (procEnvImpl != null)
            procEnvImpl.close();
    }
}

private void compile2() {
    try {
        switch (compilePolicy) {
        ...
        case BY_TODO:
            while (!todo.isEmpty())
                // attribute: attribute checking
                // flow:      data and control flow analysis
                // desugar:   remove syntactic sugar
                // generate:  generate bytecode
                generate(desugar(flow(attribute(todo.remove()))));
            break;
        }
    } ...
}
```

From this code, javac's compilation can be divided into three steps:

Parse and populate the symbol table
The annotation processing of a plug-in annotation processor
Analysis and bytecode generation


Next, let's look at each step in detail.

Parse and populate the symbol table

Lexical analysis

Lexical analysis: the lexer converts the source file into a token stream, that is, it splits the source code into individual tokens.

Tokens fall into the following categories:

Java keywords: package, import, public, private, class, int, for, and so on
Custom words: package name, class name, method name, variable name, etc
Symbols: +, -, *, /, (, {, &, etc

In the javac source, lexical analysis is implemented by classes in the com.sun.tools.javac.parser package:

Lexer and Parser: javac's main lexer and parser interfaces
Scanner: the default implementation of Lexer; it reads characters from the Java source file one by one and classifies the tokens it recognizes
JavacParser: determines which token sequences conform to the Java language specification
Token: defines all valid tokens of the Java language, including the keywords
Names: stores and represents the resolved names

Inside Token, the TokenKind enum defines each keyword.
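To make the idea concrete, here is a simplified, hypothetical enum in the spirit of TokenKind (javac's real TokenKind has many more constants and additional fields). Each keyword or symbol kind carries the exact text it matches, while kinds such as IDENTIFIER and EOF carry none; the lookup method shows how a scanned word is classified.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily reduced sketch modeled on the idea behind javac's TokenKind.
public enum SimpleTokenKind {
    EOF(null),
    IDENTIFIER(null),
    PACKAGE("package"),
    IMPORT("import"),
    PUBLIC("public"),
    CLASS("class"),
    INT("int"),
    SEMI(";"),
    LBRACE("{"),
    RBRACE("}"),
    EQ("="),
    PLUS("+");

    // The exact source text this kind matches, or null for kinds without fixed text.
    public final String text;

    private static final Map<String, SimpleTokenKind> BY_TEXT = new HashMap<>();
    static {
        for (SimpleTokenKind k : values()) {
            if (k.text != null) {
                BY_TEXT.put(k.text, k);
            }
        }
    }

    SimpleTokenKind(String text) {
        this.text = text;
    }

    // Keywords and symbols map to their own kind; any other word is treated as an
    // identifier (this sketch has no literal kinds).
    public static SimpleTokenKind lookup(String word) {
        return BY_TEXT.getOrDefault(word, IDENTIFIER);
    }
}
```

For example, lookup("class") yields CLASS, while lookup("Cifa") yields IDENTIFIER.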

Lexical analysis is driven by the parseCompilationUnit method of JavacParser, which pulls tokens from the Scanner one at a time (via nextToken()) as it parses:

```java
public JCTree.JCCompilationUnit parseCompilationUnit() {
    Token firstToken = token;
    JCExpression pid = null;
    JCModifiers mods = null;
    boolean consumedToplevelDoc = false;
    boolean seenImport = false;
    boolean seenPackage = false;
    List<JCAnnotation> packageAnnotations = List.nil();
    if (token.kind == MONKEYS_AT)
        mods = modifiersOpt();

    // Parse the package declaration
    if (token.kind == PACKAGE) {
        seenPackage = true;
        if (mods != null) {
            checkNoMods(mods.flags);
            packageAnnotations = mods.annotations;
            mods = null;
        }
        nextToken();
        pid = qualident(false);
        accept(SEMI);
    }
    ListBuffer<JCTree> defs = new ListBuffer<JCTree>();
    boolean checkForImports = true;
    boolean firstTypeDecl = true;
    while (token.kind != EOF) {
        if (token.pos > 0 && token.pos <= endPosTable.errorEndPos) {
            // error recovery
            skip(checkForImports, false, false, false);
            if (token.kind == EOF)
                break;
        }
        // Parse import declarations
        if (checkForImports && mods == null && token.kind == IMPORT) {
            seenImport = true;
            defs.append(importDeclaration());
        } else {
            // Parse the class body (type declarations)
            Comment docComment = token.comment(CommentStyle.JAVADOC);
            if (firstTypeDecl && !seenImport && !seenPackage) {
                docComment = firstToken.comment(CommentStyle.JAVADOC);
                consumedToplevelDoc = true;
            }
            JCTree def = typeDeclaration(mods, docComment);
            if (def instanceof JCExpressionStatement)
                def = ((JCExpressionStatement)def).expr;
            defs.append(def);
            if (def instanceof JCClassDecl)
                checkForImports = false;
            mods = null;
            firstTypeDecl = false;
        }
    }
    ... // build and return the JCCompilationUnit
}
```

Starting from the first token of the source file, the parser follows the Java grammar to recognize the package declaration, the imports, the class definitions, and the field and method definitions in turn, and finally builds an abstract syntax tree.

Here’s an example:

```java
package test;
public class Cifa {
    int a;
    int b = a + 1;
}
```

After lexical analysis, this code becomes a stream of tokens.

In this token stream, besides the keywords defined by Java, there is a special token kind, IDENTIFIER, which represents user-defined names such as class names, method names, and variable names. During lexical analysis, JavacParser uses the Java language specification to control which tokens may appear, in what order, and where.
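The token stream is easiest to understand by producing one. Below is a toy lexer, a hypothetical sketch far simpler than javac's Scanner (no comments, string literals, or multi-character operators), that tokenizes the Cifa example. Keywords come out as upper-case kind names in the style of TokenKind, user-defined names as IDENTIFIER, and numbers as INTLITERAL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ToyLexer {

    // Only the keywords needed for the Cifa example.
    static final Set<String> KEYWORDS = Set.of("package", "public", "class", "int");

    public static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char ch = src.charAt(i);
            if (Character.isWhitespace(ch)) {
                i++; // whitespace separates tokens but is not itself a token
            } else if (Character.isJavaIdentifierStart(ch)) {
                int start = i;
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) {
                    i++;
                }
                String word = src.substring(start, i);
                // A keyword gets its own kind; anything else is a user-defined name.
                tokens.add(KEYWORDS.contains(word) ? word.toUpperCase() : "IDENTIFIER(" + word + ")");
            } else if (Character.isDigit(ch)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) {
                    i++;
                }
                tokens.add("INTLITERAL(" + src.substring(start, i) + ")");
            } else {
                tokens.add("'" + ch + "'"); // single-character symbol such as ; { } = +
                i++;
            }
        }
        tokens.add("EOF");
        return tokens;
    }

    public static void main(String[] args) {
        String src = "package test;\npublic class Cifa {\n    int a;\n    int b = a + 1;\n}";
        System.out.println(tokenize(src));
    }
}
```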

Syntax analysis

Syntax analysis: the parser converts the token stream into an abstract syntax tree (AST). You can think of it as combining the tokens produced by lexical analysis into sentences and checking them for syntax errors.

An AST is a tree representation of the syntactic structure of program code. Each node of the tree represents a syntactic construct in the code, such as a package, type, modifier, interface, or return value.

com.sun.tools.javac.tree.TreeMaker: every syntax tree node is created by this factory class, which builds nodes from Name objects and other information.
com.sun.tools.javac.tree.JCTree$JCIf: every syntax node implements an xxxTree interface, which in turn extends the Tree interface. For example, the IfTree node represents an if statement: `public static class JCIf extends JCStatement implements IfTree {}`. All JCxxx classes are defined as static inner classes of JCTree.

com.sun.tools.javac.tree.JCTree has three key properties:
- Tag: identifies the kind of syntax node. In older javac versions each kind is an integer constant, each one greater than the previous by one; newer versions use the JCTree.Tag enum.
- pos: also an integer, storing the node's starting position in the source code. Position 0 is the start of the file, and -1 means no position exists.
- type: the Java type the node represents, such as int, float, or String.

Example of syntax analysis:

```java
package compile;

public class Yufa {
    int a;
    private int b = a + 1;

    //getter
    public int getB() {
        return b;
    }

    //setter
    public void setB(int b) {
        this.b = b;
    }
}
```

Syntax analysis turns this code into an abstract syntax tree whose root is a JCCompilationUnit, with a JCClassDecl node for the Yufa class.

Each branch under JCClassDecl is a complete block of code. This class has four branches, corresponding to the two field declarations and the two method blocks in our code. This is the parser's job in action: assembling tokens into sentences (or blocks of sentences).
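The JDK exposes this parse step through the public Compiler Tree API, so we can inspect the tree ourselves. The sketch below assumes a full JDK 11+ (javac must be reachable through ToolProvider). It feeds the Yufa class to javac from an in-memory string, calls JavacTask.parse() (lexical and syntax analysis only, no symbol table or semantic checks), and lists the kind of each member directly under the class node.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class ShowAst {

    // In-memory source file, so the example needs nothing on disk.
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String path, String code) {
            super(URI.create("string:///" + path), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Parses only, then returns the kinds of the members of each top-level class.
    public static List<String> memberKinds(String path, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(path, code)));
        Iterable<? extends CompilationUnitTree> units;
        try {
            units = task.parse();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> kinds = new ArrayList<>();
        for (CompilationUnitTree unit : units) {
            for (Tree decl : unit.getTypeDecls()) {
                ClassTree cls = (ClassTree) decl; // the JCClassDecl node
                for (Tree member : cls.getMembers()) {
                    kinds.add(member.getKind().toString());
                }
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        String yufa = "package compile; public class Yufa { int a; private int b = a + 1;"
                + " public int getB() { return b; } public void setB(int b) { this.b = b; } }";
        System.out.println(memberKinds("compile/Yufa.java", yufa));
    }
}
```

For Yufa this should print [VARIABLE, VARIABLE, METHOD, METHOD]: the two fields and the two methods. No default constructor appears yet, because javac only adds it in a later phase.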

Fill the symbol table

Once lexical and syntax analysis are complete, the next step is filling the symbol table. A symbol table is a table made up of a set of symbol addresses and symbol information. The information registered in it is needed at different stages of compilation: during semantic analysis it is used for semantic checking and intermediate code generation, and during object code generation it is the basis for allocating addresses to symbol names.

In the javac source, filling the symbol table is implemented by the com.sun.tools.javac.comp.Enter class.
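The basic idea of a symbol table can be shown with a structure much smaller than javac's own Enter, Scope, and Symbol classes. The following is a minimal, hypothetical sketch (Java 16+ for the record type): each scope maps names to symbol information, and lookups search from the innermost scope outward, so a parameter shadows a field of the same name, much as in Yufa's setB(int b).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class SymbolTable {

    // The "symbol information" recorded for each name (hypothetical fields).
    public record Symbol(String name, String kind, String type) {}

    // Innermost scope at the head of the deque.
    private final Deque<Map<String, Symbol>> scopes = new ArrayDeque<>();

    public void openScope()  { scopes.push(new HashMap<>()); }
    public void closeScope() { scopes.pop(); }

    // Registers a symbol in the current scope; a duplicate in the same scope is an error.
    public void enter(Symbol sym) {
        Map<String, Symbol> current = scopes.peek();
        if (current == null) {
            throw new IllegalStateException("no open scope");
        }
        if (current.putIfAbsent(sym.name(), sym) != null) {
            throw new IllegalArgumentException(sym.name() + " is already defined in this scope");
        }
    }

    // Resolves a name, innermost scope first (ArrayDeque iterates from the head).
    public Optional<Symbol> lookup(String name) {
        for (Map<String, Symbol> scope : scopes) {
            Symbol s = scope.get(name);
            if (s != null) {
                return Optional.of(s);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SymbolTable table = new SymbolTable();
        table.openScope();                                   // members of class Yufa
        table.enter(new Symbol("b", "field", "int"));
        table.openScope();                                   // body of setB(int b)
        table.enter(new Symbol("b", "parameter", "int"));
        System.out.println(table.lookup("b").get().kind());  // prints "parameter"
        table.closeScope();
        System.out.println(table.lookup("b").get().kind());  // prints "field"
    }
}
```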

Annotation processors

Since JDK 1.5, Java has supported annotations, which, like ordinary Java code, take effect at runtime. The JSR-269 specification, implemented in JDK 1.6, provides a standard API for plug-in annotation processors that handle annotations at compile time. These processors can be thought of as compiler plug-ins that can read, modify, and add any element of the abstract syntax tree. If a processor changes the syntax tree during annotation processing, the compiler goes back to parsing and filling the symbol table, and repeats this until no processor makes further changes. Each of these iterations is called a Round.

With the annotation processor API, we can influence the compiler's behavior. Any element of the syntax tree, even code comments, can be accessed from a processor, so plug-in annotation processors can do a great deal. I won't go into the details here.
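The round mechanism can be observed with the standard javax.annotation.processing API. The sketch below assumes a full JDK 11+. It registers a hypothetical LoggingProcessor on an in-memory source file, runs javac with -proc:only (annotation processing only, no class files written), and records how many times the processor was called and which root elements it saw. Since this processor only reads elements and generates nothing, javac typically calls it once for the source and once more in a final round where RoundEnvironment.processingOver() is true.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class RoundDemo {

    // Hypothetical processor that records the root elements it sees in each round.
    @SupportedAnnotationTypes("*")
    static class LoggingProcessor extends AbstractProcessor {
        final List<String> seen = new ArrayList<>();
        int rounds = 0;

        @Override
        public SourceVersion getSupportedSourceVersion() {
            return SourceVersion.latestSupported();
        }

        @Override
        public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
            rounds++; // one call per round, including the final round
            for (Element e : roundEnv.getRootElements()) {
                seen.add(e.getSimpleName().toString());
            }
            return false; // do not claim the annotations
        }
    }

    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    public static LoggingProcessor run(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        LoggingProcessor processor = new LoggingProcessor();
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, null,
                List.of("-proc:only"), // annotation processing only
                null,
                List.of(new StringSource(className, code)));
        task.setProcessors(List.of(processor));
        task.call();
        return processor;
    }

    public static void main(String[] args) {
        LoggingProcessor p = run("Demo", "public class Demo { @Deprecated int x; }");
        System.out.println("rounds=" + p.rounds + " seen=" + p.seen);
    }
}
```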

Semantic analysis

After syntax analysis, the compiler has an abstract syntax tree representing the program code. The syntax tree describes a well-formed abstract structure of the source program, but it cannot guarantee that the program is logically meaningful. The main task of semantic analysis is to check context-dependent properties of a structurally correct program, for example through type checking.

```java
int a = 1;
boolean b = false;
char c = 2;
// ... then one of the following assignments:
int d = a + c;
int d = b + c;
char d = a + c;
```

Each of the three assignments above yields a well-formed syntax tree, but only the first is semantically valid. The other two are illogical and fail to compile: a boolean cannot be added to a char, and the int result of a + c cannot be assigned to a char without an explicit cast.
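The gap between a well-formed tree and a meaningful program can be demonstrated with the Compiler Tree API: JavacTask.parse() stops after syntax analysis, while JavacTask.analyze() continues through attribute checking and flow analysis. This sketch assumes a full JDK 11+ and a hypothetical wrapper class Sem; it places each assignment in a method body and counts the errors reported after each phase. All three should parse cleanly, and only the last two should produce errors after analysis.

```java
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class SemanticCheck {

    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String path, String code) {
            super(URI.create("string:///" + path), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Returns {errors after parse(), errors after analyze()} for one assignment.
    public static long[] errorCounts(String statement) {
        String code = "class Sem { void m() { int a = 1; boolean b = false; char c = 2; "
                + statement + " } }";
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> diags = new DiagnosticCollector<>();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, diags, null, null, List.of(new StringSource("Sem.java", code)));
        try {
            task.parse();                 // lexical + syntax analysis
            long parseErrors = countErrors(diags);
            task.analyze();               // attribute checking + flow analysis
            return new long[] {parseErrors, countErrors(diags)};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static long countErrors(DiagnosticCollector<JavaFileObject> diags) {
        return diags.getDiagnostics().stream()
                .filter(d -> d.getKind() == Diagnostic.Kind.ERROR)
                .count();
    }

    public static void main(String[] args) {
        for (String s : List.of("int d = a + c;", "int d = b + c;", "char d = a + c;")) {
            long[] n = errorCounts(s);
            System.out.println(s + "  parse errors=" + n[0] + ", errors after analyze=" + n[1]);
        }
    }
}
```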

Attribute checking

In javac, semantic analysis is split into two steps: attribute checking, and data and control flow analysis. These correspond to the attribute() and flow() methods seen at the beginning.

Attribute checking verifies, for example, whether a variable has been declared before it is used, and whether the type of a variable matches the type of the value assigned to it.

For details, see the Attr and Check classes in the com.sun.tools.javac.comp package.

Data and control flow analysis

Data and control flow analysis further validates the program's contextual logic. For example, it checks whether a local variable is assigned before it is used, whether every path through a method returns a value, and whether all checked exceptions are handled correctly.
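Each of these flow checks can be triggered on purpose. This sketch, again assuming a full JDK 11+ and using a hypothetical class Flw, runs JavacTask.analyze() on small methods and returns javac's diagnostic keys for any errors. The first three methods are syntactically valid and well-typed but fail flow analysis (uninitialized local, missing return, unhandled checked exception); the last is a fixed version and should report nothing.

```java
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;
import java.util.stream.Collectors;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class FlowCheck {

    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String path, String code) {
            super(URI.create("string:///" + path), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Wraps one method in class Flw, runs full analysis, and returns the error keys.
    public static List<String> errorCodes(String method) {
        String code = "class Flw { " + method + " }";
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> diags = new DiagnosticCollector<>();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, diags, null, null, List.of(new StringSource("Flw.java", code)));
        try {
            task.analyze(); // parse, enter, attribute checking, flow analysis
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return diags.getDiagnostics().stream()
                .filter(d -> d.getKind() == Diagnostic.Kind.ERROR)
                .map(Diagnostic::getCode)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // local variable possibly read before assignment
        System.out.println(errorCodes("int f(boolean x) { int y; if (x) y = 1; return y; }"));
        // not every path returns a value
        System.out.println(errorCodes("int g(boolean x) { if (x) return 1; }"));
        // checked exception neither caught nor declared
        System.out.println(errorCodes("void h() { throw new Exception(); }"));
        // fixed: y is definitely assigned on both branches
        System.out.println(errorCodes("int f(boolean x) { int y; if (x) y = 1; else y = 2; return y; }"));
    }
}
```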

Data and control flow analysis at compile time serves essentially the same purpose as the data and control flow analysis performed at class load time, but the scope of verification differs: some checks can only be performed at compile time, and others only at run time.

Desugaring

Syntactic sugar: syntax that makes an operation easier to express in a programming language. It makes the language more convenient to use: operations become clearer, more concise, or closer to the programmer's habits.
Desugaring: syntactic sugar exists mainly for developer convenience, but the JVM does not recognize it, so it is converted back to simpler basic syntax at compile time. In the javac source, the desugar() step in com.sun.tools.javac.main.JavaCompiler exists specifically for this. Which syntactic sugar is removed? Generics (through type erasure), autoboxing and unboxing, the enhanced for-each loop, variable-length (varargs) parameters, inner classes, and so on. For the concrete implementation, see the Lower and TransTypes classes.
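To show what desugaring means in practice, here is a hand-written comparison. It is an illustration only: the second method is roughly what Lower and TransTypes turn the first one into (erased generics with an explicit cast, an explicit Iterator loop, and an explicit intValue() unboxing call), not javac's literal output.

```java
import java.util.Iterator;
import java.util.List;

public class DesugarDemo {

    // Sugared: generics, for-each and auto-unboxing.
    public static int sumSugared(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Roughly desugared: raw types (generics erased), a cast where the element is used,
    // an explicit iterator loop, and explicit unboxing.
    @SuppressWarnings("rawtypes")
    public static int sumDesugared(List values) {
        int sum = 0;
        for (Iterator it = values.iterator(); it.hasNext(); ) {
            int v = ((Integer) it.next()).intValue();
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4);
        System.out.println(sumSugared(data) + " " + sumDesugared(data)); // prints "10 10"
    }
}
```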

Bytecode generation

Bytecode generation is the last stage of javac's compilation process. In the javac source, it is performed by the com.sun.tools.javac.jvm.Gen class.

The bytecode generation phase does more than convert the information produced by the previous steps (the syntax tree and symbol table) into bytecode and write it to disk; the compiler also adds and transforms a small amount of code.

At this stage, the instance constructor <init>() and the class constructor <clinit>() are added to the syntax tree. Besides generating these constructors, the compiler performs other optimizations, such as replacing string + operations with append() calls on a StringBuffer or StringBuilder.
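Both transformations can be observed from ordinary code. In this hypothetical demo, the log shows that the static initializers (gathered into <clinit>) run before the instance field initializer and constructor body (gathered into <init>). The second half writes out by hand the StringBuilder chain that javac up to JDK 8 generates for string +. Note that since JDK 9 (JEP 280) javac instead emits an invokedynamic call to StringConcatFactory, so that part of the description applies to older compilers.

```java
import java.util.ArrayList;
import java.util.List;

public class GenDemo {

    static final List<String> LOG = new ArrayList<>();

    static int record(String step, int value) {
        LOG.add(step);
        return value;
    }

    // javac collects the static initializers into <clinit>, and the instance field
    // initializer (after the super() call) plus the constructor body into <init>.
    static class Sample {
        static int s = record("static field initializer", 1); // -> <clinit>
        static { LOG.add("static block"); }                   // -> <clinit>
        int x = record("instance field initializer", 2);      // -> <init>
        Sample() { LOG.add("constructor body"); }             // -> <init>
    }

    public static List<String> initOrder() {
        new Sample();
        return new ArrayList<>(LOG);
    }

    public static String withPlus(String name, int count) {
        return "Hello " + name + ", you have " + count + " messages";
    }

    // Roughly what javac up to JDK 8 generates for withPlus.
    public static String withBuilder(String name, int count) {
        return new StringBuilder()
                .append("Hello ").append(name)
                .append(", you have ").append(count)
                .append(" messages")
                .toString();
    }

    public static void main(String[] args) {
        // [static field initializer, static block, instance field initializer, constructor body]
        System.out.println(initOrder());
        System.out.println(withPlus("Ann", 3).equals(withBuilder("Ann", 3))); // true
    }
}
```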

After traversing and adjusting the syntax tree, javac hands the fully populated symbol table to the com.sun.tools.javac.jvm.ClassWriter class, whose writeClass() method outputs the bytecode and produces the final class file.

```java
public JavaFileObject writeClass(ClassSymbol c)
    throws IOException, PoolOverflow, StringOverflow
{
    JavaFileObject outFile
        = fileManager.getJavaFileForOutput(CLASS_OUTPUT,
                                           c.flatname.toString(),
                                           JavaFileObject.Kind.CLASS,
                                           c.sourcefile);
    OutputStream out = outFile.openOutputStream();
    try {
        writeClassFile(out, c);
        if (verbose)
            log.printVerbose("wrote.file", outFile);
        out.close();
        out = null;
    } finally {
        if (out != null) {
            // if we are propagating an exception, delete the file
            out.close();
            outFile.delete();
            outFile = null;
        }
    }
    return outFile; // may be null if write failed
}
```
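Every class file written this way begins with the fixed header defined in the JVM specification: the u4 magic number 0xCAFEBABE, followed by the u2 minor and major version numbers. This sketch assumes a full JDK 11+ and a hypothetical class Tiny; it compiles the class into a temp directory and reads those first eight bytes back.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class ClassFileHeader {

    // Returns {magic, minor_version, major_version} of a freshly compiled class.
    public static int[] compileAndReadHeader() throws IOException {
        Path dir = Files.createTempDirectory("classwriter-demo");
        Path src = dir.resolve("Tiny.java");
        Files.writeString(src, "class Tiny { int x; }");

        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac.run(null, null, null, "-d", dir.toString(), src.toString()) != 0) {
            throw new IllegalStateException("compilation failed");
        }

        try (InputStream raw = Files.newInputStream(dir.resolve("Tiny.class"));
             DataInputStream in = new DataInputStream(raw)) {
            int magic = in.readInt();           // u4 magic, always 0xCAFEBABE
            int minor = in.readUnsignedShort(); // u2 minor_version
            int major = in.readUnsignedShort(); // u2 major_version (52 = Java 8, 61 = Java 17)
            return new int[] {magic, minor, major};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] h = compileAndReadHeader();
        System.out.printf("magic=%08X minor=%d major=%d%n", h[0], h[1], h[2]);
    }
}
```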

Conclusion

Leaving annotation processing aside, let's review the overall javac compilation process. Compilation can be roughly divided into three parts:

Lexical analysis: the lexer produces a token stream
Syntax analysis: the parser produces an abstract syntax tree and verifies its structure
Semantic analysis: the semantic analyzer produces an attributed syntax tree and verifies the program's logic

After the above three steps, the bytecode file is finally generated and the compilation is complete.

javac's compilation process is complex, but understanding it is a great help in mastering Java and the JVM.
