What is a virtual machine?

“Virtual machine” is a very broad concept. Taken literally, a virtual machine is a “virtual computer”. I believe most of you have come across one while learning server-side programming. Consider this scenario: most of the computers we use every day run Windows, but the vast majority of server software runs on Linux. If we develop on Windows, we cannot test that software directly on Windows, which is very inconvenient. Virtual machines arose from this kind of scenario: they let us run a Linux system on top of Windows, so we can conveniently test Linux programs from a Windows machine. How the Linux operating system is virtualized is too complicated to describe in a few words.

The virtual machines I want to talk about today are slightly different, but they solve a similar problem. The virtual machine described above virtualizes a complete operating system, so I call it an “operating-system-level virtual machine”. The virtual machine we are going to talk about today is aimed at programming languages. What it achieves is that the same code runs on different operating systems and produces the same results: write once, run anywhere. Familiar languages such as Java, PHP, and Python are in fact all based on virtual machines. They are cross-platform: we write the code once, run it on different operating systems, and get almost identical results.

Those of you who have done some system programming know that different operating systems may provide different “system APIs” for the same function. For example, both Windows and Linux provide APIs for listening on the network, but their socket APIs are different. If we use a platform-specific programming language (e.g. C or C++), we must pay attention to such differences and handle compatibility for each operating system. Otherwise the program may run normally on Linux but report an error on Windows. There are many differences like this, and the only way to learn the details is to read the system programming manual for each platform. Some system APIs are completely different, while others share the same name and differ only in individual parameters. Programmers have to watch out for all of this to write robust cross-platform code, which is very difficult for beginners. As a result, the programmer spends a large part of their energy on compatibility issues instead of focusing on the actual functionality.

With virtual machines, this problem goes away. Put simply, a virtual machine plays the role of an intermediary. Take renting a home as an example. When we first arrive in a big city such as Beijing, Shanghai, or Guangzhou, there are a great many landlords. Without a real estate agent (the virtual machine), we would need to contact N landlords before finding a suitable place. With a real estate agent (the virtual machine), we only need to describe what kind of place we want, and the agent coordinates with the various landlords until we rent the right one. The process is different, but the final result is the same. Similarly, taking a socket API call as an example, we hand our code to the virtual machine, and the virtual machine is responsible for calling the system API, which is equivalent to adding an intermediary layer in between. The virtual machine chooses the correct socket API for the operating system and completes the job for us. The advantage is that programmers no longer need to pay attention to the details of the underlying API and can focus on writing the real functionality. The virtual machine shields us from the details of the underlying system API, which greatly lowers the barrier to programming and greatly improves code robustness.
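As a small illustration of this intermediary role, here is a minimal sketch in Python, another virtual-machine-based language mentioned above. The helper name `round_trip` is made up for this example. The code asks the runtime for a connected pair of sockets and never mentions a platform-specific API; how the pair is actually created is left to the runtime on each operating system.

```python
import socket
import sys


def round_trip(message: bytes) -> bytes:
    """Send a short message through a connected socket pair and read it back."""
    # socketpair() returns two already-connected sockets. The runtime, not our
    # code, decides which operating-system facility is used to create them.
    left, right = socket.socketpair()
    with left, right:
        left.sendall(message)
        return right.recv(1024)


if __name__ == "__main__":
    # The same script can be run unchanged on different operating systems.
    print(sys.platform, round_trip(b"hello"))
```

The point is only that the calling code stays the same across platforms. A real server would of course bind, listen, and accept connections instead of using a local pair.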

The execution of PHP

How PHP is interpreted and executed

As those of you who know PHP are aware, PHP is an interpreted language, also known as a scripting language, characterized by its lightness and ease of use. Traditional compiled languages have to be compiled and linked before they can be run. A scripting language like PHP omits these steps: a script can be executed directly with a shell command and prints its results, which is lightweight, intuitive, and easy to use. To tell you the truth, I also learned Java when I started programming. Why did I end up with PHP? Maybe it was these characteristics that attracted me. I have just talked about the advantages of PHP, but most of the time you win some and lose some, and I think the same holds for programming languages. PHP being so lightweight and easy to use has to come at the expense of something else, otherwise why wouldn’t other programming languages do the same? Next, let’s talk about how PHP code is executed. I think once we understand this process, we can understand the trade-offs in the design of the PHP language.

The figure below shows the main process by which PHP runs a program with Opcache caching enabled.



Figure 1

As can be seen from Figure 1, after the PHP code file is loaded, the lexer (re2c/Lex) first extracts tokens from the code. The parser (Yacc/Bison) then works out the syntax structure from the tokens and generates the abstract syntax tree (AST). Next, the compiler generates opcodes from the AST, and finally the interpreter executes each opcode, simulating machine instructions.
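The same pipeline (tokens, then an AST, then instructions for an interpreter) can be observed in other virtual-machine-based languages. The sketch below uses Python's standard `tokenize`, `ast`, and `dis` modules purely as an analogy. It shows Python's pipeline, not the Zend engine's, and the helper names are made up for this example.

```python
import ast
import dis
import io
import tokenize

SOURCE = "result = 1 + 2 * 3\n"


def token_strings(source):
    """Lexical analysis: split the source text into tokens."""
    reader = io.StringIO(source).readline
    # Drop the NEWLINE and end-of-file tokens, which carry no visible text.
    return [tok.string for tok in tokenize.generate_tokens(reader) if tok.string.strip()]


def build_ast(source):
    """Syntax analysis: turn the source into an abstract syntax tree."""
    return ast.parse(source)


def compile_to_bytecode(tree):
    """Compilation: turn the AST into a code object holding bytecode."""
    return compile(tree, "<example>", "exec")


def run(code):
    """Interpretation: let the virtual machine execute the bytecode."""
    namespace = {}
    exec(code, namespace)
    return namespace


if __name__ == "__main__":
    print(token_strings(SOURCE))
    tree = build_ast(SOURCE)
    print(ast.dump(tree))
    code = compile_to_bytecode(tree)
    dis.dis(code)  # prints the instructions the interpreter will execute
    print(run(code)["result"])
```

The exact instruction names printed by `dis` differ between Python versions, so the sketch does not rely on them.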

In addition, when Opcache is enabled, the compiled opcodes are cached in shared memory. Not only that, the compiled opcodes are also optimized using techniques such as function inlining, constant propagation, and de-duplication. With Opcache, lexing, parsing, and compilation can be skipped on later runs, and because the opcodes have been optimized, the program runs faster than it did the first time.
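The caching idea can be sketched in a few lines. The example below is a simplified stand-in written in Python, not Opcache itself: the real cache lives in shared memory and is used across PHP processes, whereas this hypothetical `CodeCache` class is an in-process dictionary keyed by a hash of the source text.

```python
import hashlib


class CodeCache:
    """Toy cache that stores compiled code so that it is compiled only once."""

    def __init__(self):
        self._entries = {}
        self.compilations = 0  # how many times we actually had to compile

    def load(self, source):
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        code = self._entries.get(key)
        if code is None:
            # Cache miss: lex, parse, and compile, then remember the result.
            code = compile(source, "<cached>", "exec")
            self._entries[key] = code
            self.compilations += 1
        return code


def run(cache, source):
    """Execute the source, reusing a cached compilation when one exists."""
    namespace = {}
    exec(cache.load(source), namespace)
    return namespace


if __name__ == "__main__":
    cache = CodeCache()
    script = "total = sum(range(10))"
    for _ in range(3):
        print(run(cache, script)["total"])
    print("compilations:", cache.compilations)
```

Running the same script three times compiles it once. The later runs go straight to execution, which is the saving described above.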

This is how PHP is interpreted and executed. The process is very programmer-friendly because there is no separate static compilation step, but that work has not disappeared: the virtual machine does it for us, sacrificing some performance in exchange for lightness, ease of use, and flexibility. Lexical analysis, syntax analysis, compilation, and interpretation all happen at execution time.

How compiled languages are executed

Now that we’ve looked at how interpreted languages are executed, let’s look at a compiled language to see how it compares.



Figure 2

From Figure 2 we can see that the process in the dotted box includes lexical analysis, syntax analysis, and compilation. PHP’s interpreted execution has these three steps too. The only difference is that for C/C++ the compiler completes them in advance, at compile time, which saves a lot of time and overhead at run time. After the assembly code is generated, the fourth step is linking, which produces the executable file. Here the executable file means binary machine code that the CPU can execute directly without any further translation. These four steps together are called static compilation. As you can see, a compiled language does more work up front than an interpreted language, but in return it gets better performance and execution efficiency. Therefore, in large projects with high performance requirements and a large amount of code, an interpreted language can significantly reduce execution efficiency, while static compilation achieves better efficiency and lowers the cost of server procurement.

What is JIT?

JIT is arguably the most technically sophisticated part of virtual machine technology. We have just gone through how interpreted and compiled languages are executed and analyzed their respective strengths and weaknesses. So we can ask: is there a technology that has the lightness and ease of use of interpreted languages and also the performance of compiled languages? The answer is JIT. In programming languages, JIT stands for just-in-time compilation. What does that mean? Let’s start with Wikipedia’s definition of just-in-time compilation.

In computing, just-in-time (JIT) compilation, also known as dynamic translation or run-time compilation, is a way of executing computer code that involves compilation during execution of a program (at run time) rather than before execution. Typically, this consists of translating source code or, more commonly, bytecode into machine code, which is then executed directly. A system implementing a JIT compiler typically continuously analyzes the code being executed and identifies the parts of the code that are worth compiling or recompiling.

We just said that JIT combines the lightweight ease of use of interpreted languages with high performance, so how does it get there? The following figure describes the execution process of PHP with JIT enabled. The PHP 8 JIT goes one step further than the Opcache optimization. After the opcodes stored in Opcache have been optimized, it compiles them into binary machine code that the CPU can execute directly, equivalent to what a C++ compiler produces. The difference is that this does not have to be done before the program runs. Instead, the virtual machine converts the opcodes into machine code at run time and caches the result. The next time the same logic is executed, the CPU can run the cached machine code directly, without going through the interpreter again. In theory, performance can approach that of C++. The benefit is that PHP keeps its ease of use and flexibility while gaining performance.



Figure 3

JIT trigger conditions

JIT essentially converts part of the code into machine code at run time and caches it, to speed up the next execution of that code. So is the JIT triggered as soon as the program starts?

The JIT does not kick in when the program first starts. The PHP/Java code is still interpreted the first time it runs, and the JIT is not triggered until the program has been running for some time. At this point you may be wondering, as I did, why the JIT does not simply compile all the code into machine code at startup, just like C++. Wouldn’t that be more efficient? A few Java applications do work this way, but they are not mainstream. There are several reasons:

  1. Compiling everything into binary takes a lot of time, so the program would start very slowly, which is unacceptable for large projects
  2. Not all code needs to be optimized for performance, and most code is rarely executed in real-world scenarios
  3. The compiled binary code takes up a lot of space
  4. Compiling everything ahead of time is equivalent to static compilation, and JIT compilation has many irreplaceable advantages over static compilation

JIT is triggered mainly by “hot spot detection counters”. The virtual machine keeps a counter for each method (or code block). Once a counter exceeds a certain threshold, the method is considered a “hot spot”, and the virtual machine starts a background thread to compile that code block into machine code and cache it in memory, which speeds up the next execution. This is only a brief description of the trigger rules for hot spot code. The actual rules adopted by virtual machines are much more complex.
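The counter-and-threshold rule can be sketched as follows. This is a toy model in Python with invented names (`HotSpotFunction`, `threshold`), not how any real virtual machine is implemented. “Interpreting” here means re-parsing an expression string on every call, and “compiling” means parsing it once and reusing the compiled code object. For simplicity, the compilation happens synchronously rather than on a background thread.

```python
class HotSpotFunction:
    """Evaluates an expression in x, switching to a compiled form once hot."""

    def __init__(self, expr, threshold=3):
        self.expr = expr
        self.threshold = threshold
        self.calls = 0          # the hot spot counter
        self._compiled = None   # filled in once the counter passes the threshold

    @property
    def is_compiled(self):
        return self._compiled is not None

    def _interpret(self, x):
        # Slow path: the expression text is parsed again on every call.
        return eval(self.expr, {"x": x})

    def __call__(self, x):
        if self._compiled is None:
            self.calls += 1
            if self.calls > self.threshold:
                # Hot spot detected: compile once and cache the result.
                self._compiled = compile(self.expr, "<hot>", "eval")
        if self._compiled is not None:
            # Fast path: reuse the cached compiled code.
            return eval(self._compiled, {"x": x})
        return self._interpret(x)


if __name__ == "__main__":
    square_plus_one = HotSpotFunction("x * x + 1", threshold=3)
    for value in range(5):
        print(value, square_plus_one(value), square_plus_one.is_compiled)
```

With a threshold of 3, the first three calls take the slow path and the fourth call triggers compilation. The results are the same on both paths, only the cost per call changes.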

JIT vs. pre-compilation: advantages and disadvantages

JIT compilation happens at run time, so it is easy to see that it has some obvious disadvantages compared with pre-compilation (compiling ahead of time). First, JIT compilation consumes computing resources at run time that could otherwise be used to execute the program. No matter how well the JIT compiler is optimized (for example, with tiered compilation), this cost cannot be avoided. One of the most resource-intensive steps is analysis, for example determining whether a method is never called, or whether an abstract method will only ever be dispatched to a single implementation. This information is very valuable for generating high-quality code, but obtaining it accurately requires a lot of time-consuming computation at run time. If all of this time-consuming work is done ahead of time instead, the program gets the performance of high-quality code at run time. At worst, compiling ahead of time takes a little longer, but that is acceptable.

Having said that, is JIT compilation really no better than pre-compilation when it comes to performance optimization? The answer is no: JIT compilation has many irreplaceable advantages over pre-compilation. Because the JIT compiler works at run time, it can obtain real data about the running program. By continuously collecting monitoring information at run time and analyzing it, the JIT compiler can apply aggressive optimizations that a static compiler cannot.

First of all, profile-guided optimization. While the program is running, the JIT compiler monitors it. If the monitoring data shows that some code blocks are executed particularly often, the compiler can focus on optimizing those blocks, for example by giving them better register allocation and caching.

Then there is aggressive speculative optimization. Suppose an interface has three implementation classes, but in real operation the implementation class A is the one running more than 95% of the time. Based on the collected data, the JIT compiler can aggressively predict that A will be executed every time. If the prediction turns out to be wrong on a few occasions, execution falls back to interpretation. This is a low-probability event, and it does not affect the results of the program.
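A minimal sketch of this guard-and-fall-back idea, written in Python with invented class names (`Square`, `Rect`, `SpeculativeCallSite`): the call site remembers the first receiver type it sees, speculates that later calls will use the same type, and takes the generic path whenever the guard fails. A real JIT does this in generated machine code, so this only models the control flow.

```python
class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Rect:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class SpeculativeCallSite:
    """Call site for shape.area() that speculates on a single receiver type."""

    def __init__(self):
        self.expected_type = None  # the type we bet on
        self.fast_path = None      # the method bound to that type
        self.fallbacks = 0         # how often the bet was wrong

    def call(self, shape):
        if type(shape) is self.expected_type:
            # Guard passed: skip the generic lookup and call the known method.
            return self.fast_path(shape)
        if self.expected_type is None:
            # First call: record the observed type and speculate on it.
            self.expected_type = type(shape)
            self.fast_path = type(shape).area
        else:
            # Guard failed: count it and fall back to the generic path.
            self.fallbacks += 1
        return shape.area()


if __name__ == "__main__":
    site = SpeculativeCallSite()
    shapes = [Square(2), Square(3), Rect(2, 3), Square(4)]
    print([site.call(shape) for shape in shapes])
    print("fallbacks:", site.fallbacks)
```

The wrong guess for `Rect` costs one slow call but never changes the result, which matches the point above that a failed prediction does not affect the program's output.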

Finally, link-time optimization. In a traditional compiler, optimization and linking are separate steps. What does that mean? Suppose a program uses three libraries, A, B, and C. The compiler first compiles and optimizes each of the three libraries separately and saves the resulting assembly code to files. The last step links the three files and converts them into an executable. The problem is that A, B, and C are optimized separately at compile time. If some methods in A and B are executed repeatedly, or could be inlined across libraries, that optimization cannot be done. A JIT compiler is different: code is linked dynamically at run time, so it can optimize across the program’s whole call stack, which makes the optimization more thorough.

Conclusion

The main purpose of this post is to summarize what I have learned about virtual machine technology recently. When I searched Google for articles about the PHP virtual machine, I found very few to refer to. Since the execution principles of Java and PHP are very similar, I figured I could learn how ZendVM works by studying the Java Virtual Machine. The JVM is very mature and can be considered a pioneer among language virtual machines, and there are many excellent books about it. JIT technology amazes me. Finally, PHP is the best language in the world!

References

  • Deep Understanding of the Java Virtual Machine (3rd Edition)
  • Deep understanding of PHP opcode optimization
  • A JIT introduction to the new PHP 8 features
  • PHP JIT in Depth
  • A preliminary study of Java 9 AOT
  • How PHP’s Just In Time compiler works