What is the JVM?

JVM is short for Java Virtual Machine. It is a specification for a computing device: an abstract computer that is implemented by emulating the functions of a computer on a real machine.

With the introduction of the Java virtual machine, Java programs do not need to be recompiled to run on different platforms. The Java language uses the Java virtual machine to hide platform-specific details, so a Java compiler only needs to generate target code (bytecode) for the Java virtual machine, and that bytecode can run unmodified on multiple platforms.

Second, the basic structure

Looking at the logical structure of the Java platform, the following figure shows where the JVM fits:

The figure above shows the logical modules that the Java platform contains, and the difference between the JDK and the JRE. The JVM, JRE, and JDK are the pillars of the Java language, and they work together. The difference is that the JDK and JRE exist physically as software packages, whereas the JVM is an abstract concept defined by a specification.

JDK

The Java Development Kit (JDK) is a software development kit (SDK) for the Java language. The JDK exists physically and is a collection of programming tools, the JRE, and the JVM.

JRE

The Java Runtime Environment (JRE) is a physical runtime environment that consists of the Java APIs and the JVM, and provides the minimum environment needed to execute Java applications.

JVM

The JVM is a specification for a computing device, realized as a software implementation of an abstract computer. Simply put, the JVM is a container for running bytecode programs.

Third, how the JVM works

The JVM is the core and foundation of Java: a virtual processor that sits between the Java compiler and the OS platform. It is an abstract computer, implemented in software on top of the underlying operating system and hardware, on which Java bytecode programs can be executed.

The Java compiler only targets the JVM, generating bytecode files that the JVM can understand. Java source files are compiled into bytecode programs, and the JVM translates each bytecode instruction into machine code for the specific platform it is running on.

The JVM system:

1. ClassLoader

The classloader is responsible for loading the types (classes and interfaces) in the program and giving them unique names to identify them.

(1) The class loader process

  • Loading

The binary bytecode is located and loaded into the JVM. The JVM finds a class through its class name, the package in which the class resides, and the ClassLoader. A loaded class is therefore identified by: class name + package name + ClassLoader instance ID.

  • Linking

Linking is responsible for verifying the format of the binary bytecode, preparing the static variables of the class being loaded, and resolving the classes and interfaces it references. After verification is complete, the JVM allocates the static variables in the class and assigns them default values. Finally, the fields and methods referenced by the class are resolved and checked to ensure that they exist and that the class has permission to access them (private, public, and so on). If a check fails, errors such as NoSuchMethodError or NoSuchFieldError are raised.

  • Initializing

Initializing is responsible for executing the static initialization blocks and static field initializers of the class. Initialization is triggered in four cases:

① new is called on the class

② A method of the class is invoked through reflection

③ A subclass is initialized (its superclass is initialized first)

④ The class is the initial class specified when the JVM starts
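The first trigger can be observed directly. The following is a minimal sketch, not taken from the original text: the class names `InitDemo` and `Lazy` are hypothetical, and the trace is recorded once while `InitDemo` itself is being initialized. It shows that merely referring to a class literal does not run the static initializer, while `new` does.

```java
class InitDemo {
    // Collects a trace of what happened, in order.
    static final StringBuilder LOG = new StringBuilder();

    static class Lazy {
        // Static initializer: runs when Lazy is initialized, not when it is merely loaded.
        static { LOG.append("init;"); }
    }

    // Computed once, so the trace is the same no matter how often it is read.
    static final String TRACE = trace();

    private static String trace() {
        LOG.append("before;");
        Class<?> c = Lazy.class;                 // class literal: no initialization yet
        LOG.append("literal:").append(c.getSimpleName()).append(';');
        new Lazy();                              // "new" triggers initialization
        LOG.append("after;");
        return LOG.toString();
    }

    public static void main(String[] args) {
        System.out.println(TRACE);
    }
}
```

In the trace, "init;" appears only after the class literal has been used and immediately before "after;", which is the point where `new Lazy()` runs.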

(2) The relationship between class loaders

The Bootstrap ClassLoader is initialized after the Java virtual machine starts.

The Bootstrap ClassLoader is responsible for loading the ExtClassLoader and setting the parent loader of the ExtClassLoader to the Bootstrap ClassLoader.

After loading the ExtClassLoader, the Bootstrap Classloader loads the AppClassLoader and specifies the parent loader of the AppClassLoader as the ExtClassLoader.
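The parent chain described above can be inspected at run time. This is a small sketch, assuming only the standard `ClassLoader.getParent()` API; the class name `LoaderChain` is hypothetical. Note that the Bootstrap ClassLoader is not a Java object, so it shows up as `null`, and that the concrete loader class names printed depend on the JDK version (on JDK 9 and later the extension loader is replaced by a platform loader).

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    /** Walks from the given loader up to the top of the parent chain. */
    static List<String> chain(ClassLoader start) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        // getParent() returns null once the bootstrap loader is reached.
        names.add("bootstrap (represented as null)");
        return names;
    }

    public static void main(String[] args) {
        chain(ClassLoader.getSystemClassLoader()).forEach(System.out::println);
        // Core classes are loaded by the bootstrap loader, reported as null.
        System.out.println(String.class.getClassLoader());
    }
}
```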

(3) The role of class loaders

Class loader | Implementation | Responsible for loading
Bootstrap Loader | C++ | Classes in %JAVA_HOME%/jre/lib and %JAVA_HOME%/jre/classes, and in the path specified by the -Xbootclasspath argument
Extension ClassLoader | Java | All classes directories under %JAVA_HOME%/jre/lib/ext, and class libraries in the path specified by the java.ext.dirs system variable
Application ClassLoader | Java | Classes or jar files at the locations specified by the classpath; it is also the default class loader for Java programs

(4) Classloader features

  • Hierarchy

(1) Bootstrap ClassLoader

When the JVM starts, it initializes this ClassLoader and loads all the class files in $JAVA_HOME/jre/lib/rt.jar (Sun JDK implementation). This JAR contains all the interfaces and implementations defined by the Java specification.

(2) Extension ClassLoader

The JVM uses this ClassLoader to load several JAR packages that extend its functionality.

(3) System ClassLoader

The JVM uses this ClassLoader to load the jar packages and directories on the classpath specified in the startup parameters. In the Sun JDK, the class name of this ClassLoader is AppClassLoader.

(4) User-defined ClassLoader

User-defined Classloaders are classloaders implemented by Java developers based on the abstract ClassLoader class, which can be used to load jars and directories that are not in the ClassPath.
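A user-defined loader usually only needs to override `findClass`. The sketch below is one possible shape, not a definitive implementation: the class name `DirectoryClassLoader`, the `root` directory, and the `canLoad` helper are all hypothetical. It reads `.class` files from a directory that is not on the classpath and leaves everything else to the inherited `loadClass`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class DirectoryClassLoader extends ClassLoader {
    private final Path root; // directory holding compiled .class files (assumed layout)

    DirectoryClassLoader(Path root, ClassLoader parent) {
        super(parent);
        this.root = root;
    }

    // Called by the inherited loadClass only after the parent chain has failed.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        Path file = root.resolve(name.replace('.', '/') + ".class");
        try {
            byte[] bytes = Files.readAllBytes(file);
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    /** Convenience helper for the demo: true if the loader can resolve the name. */
    static boolean canLoad(ClassLoader loader, String name) {
        try {
            loader.loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = new DirectoryClassLoader(
                Paths.get("extra-classes"), ClassLoader.getSystemClassLoader());
        System.out.println(canLoad(loader, "java.lang.String"));        // resolved by a parent
        System.out.println(canLoad(loader, "com.example.NotThere"));    // nobody can find it
    }
}
```

Overriding `findClass` rather than `loadClass` keeps the normal parent-first lookup intact, so core classes are still served by the loaders higher up.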

  • Delegation: based on the hierarchy, the loading of a class can be delegated between loaders. When a loader is asked to load a class, it first checks whether the class has already been loaded by its parent loader. If it has, that class is used directly. Otherwise, the current class loader loads the class itself.

  • Visibility limitation: a child loader can find classes in a parent loader, but a parent loader cannot find classes in a child loader.

  • No unloading: a class loader can load a class but cannot unload it. It is possible, however, to discard the current class loader and create a new one to load the class again.

(5) Delegation Mode

In Java, a ClassLoader loads classes using the parent delegation mechanism. The steps for loading a class with this mechanism are as follows:

  1. The current ClassLoader first checks whether the class is among the classes it has already loaded. If so, it returns the previously loaded class directly.
  2. If the class is not found in the current ClassLoader's cache, it delegates to its parent class loader. The parent class loader follows the same strategy, first checking its own cache and then delegating to its own parent, all the way up to the Bootstrap ClassLoader.
  3. When none of the parent class loaders can load the class, the current class loader loads it and puts it in its own cache so that it can be returned directly the next time there is a load request.
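The three steps above can be sketched with a toy model. This is not how `java.lang.ClassLoader` is implemented; `ToyLoader`, its `findable` set, and the string-based "classes" are hypothetical stand-ins used only to make the lookup order visible.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class ToyLoader {
    final String name;
    final ToyLoader parent;                               // null for the top-most loader
    final Set<String> findable;                           // classes this loader can locate itself
    final Map<String, String> cache = new HashMap<>();    // class name -> defining loader

    ToyLoader(String name, ToyLoader parent, Set<String> findable) {
        this.name = name;
        this.parent = parent;
        this.findable = findable;
    }

    /** Returns the name of the loader that defines the class, or null if none can. */
    String load(String className) {
        // Step 1: check this loader's own cache.
        String cached = cache.get(className);
        if (cached != null) {
            return cached;
        }
        // Step 2: delegate to the parent, which applies the same strategy.
        if (parent != null) {
            String fromParent = parent.load(className);
            if (fromParent != null) {
                return fromParent;
            }
        }
        // Step 3: no parent could load it, so load it here and cache it.
        if (findable.contains(className)) {
            cache.put(className, name);
            return name;
        }
        return null;
    }

    public static void main(String[] args) {
        ToyLoader boot = new ToyLoader("bootstrap", null, Set.of("java.lang.String"));
        ToyLoader app = new ToyLoader("app", boot,
                Set.of("java.lang.String", "com.example.Main"));
        System.out.println(app.load("java.lang.String"));   // defined by the parent
        System.out.println(app.load("com.example.Main"));   // defined by app itself
    }
}
```

Even though the toy "app" loader could find `java.lang.String` on its own, the parent gets the first chance, which is why a core class cannot be replaced by a same-named class on the application classpath.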

2. Runtime data area

Block 1: PC register

It stores the address of the JVM instruction that each thread will execute next. If the current method is native, nothing is stored in the PC register. In Java multithreading, each thread has its own PC register, so that execution can resume at the right place after switching between thread contexts.

Block 2: JVM stack

The JVM stack is thread private: each thread creates its own JVM stack. The JVM stack holds the current thread's local variables of primitive types (boolean, char, byte, short, int, long, float, double), partial results, and stack frames. For objects of non-primitive types, the JVM stack holds only an address that points to the heap.

Block 3: Heap

This is the area where the JVM stores object instances and array values. You can assume that the memory for all new objects in Java is allocated here, and that the memory for objects in the Heap needs to be collected by the GC.

(1) The heap is shared by all threads in the JVM, so allocating object memory on it requires locking, which makes creating new objects relatively expensive.

(2) To make object allocation more efficient, the Sun HotSpot JVM allocates a separate space, the TLAB (Thread Local Allocation Buffer), for each thread it creates. The JVM calculates the size of the TLAB according to runtime conditions. Allocating objects in a TLAB requires no locking, so the JVM tries to allocate a thread's objects in its TLAB. In this case, the JVM is almost as efficient at allocating object memory as C, but an object that is too large is still allocated directly in the shared heap space.

(3) TLABs only apply to the Eden space of the young generation, so when writing Java programs it is usually more efficient to allocate several small objects than one large one.

(4) All newly created objects are stored in the Young Generation. Objects in the Young Generation that survive one or more GCs are moved to the Old Generation. New objects are always created in Eden space.

Block 4: Method area

(1) In the Sun JDK, this area corresponds to the Permanent Generation, also known as the persistent generation.

(2) The method area stores the information of each loaded class (name, modifiers, and so on), the static variables in the class, constants defined as final in the class, the field information of the class, and the method information of the class. When a developer obtains information through methods such as getName and isInterface on a Class object, this data comes from the method area. The method area is also shared globally and is also garbage collected under certain conditions. When the method area needs more memory than it is allowed to use, an OutOfMemoryError is thrown.
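To see how a running JVM actually divides these areas, the standard `java.lang.management` API can list the memory pools. This is only a sketch with a hypothetical class name `MemoryAreas`; the pool names printed depend on the JVM version and the selected garbage collector, so no particular output should be assumed. On recent HotSpot versions the heap pools typically cover Eden, Survivor, and the old generation, while class metadata appears among the non-heap pools.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class MemoryAreas {
    /** Names of the memory pools of the given type (HEAP or NON_HEAP). */
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```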

Block 5: Runtime constant pool

Its space is allocated from the method area. It stores the constants of a class together with its method and field references.

Block 6: Native method stacks

The JVM supports the execution of native methods with a native method stack, which stores the state of each native method call.

3. Garbage collector

Algorithms for determining whether an object is dead

Since the program counter, the Java virtual machine stack, and the native method stack are all owned by a thread, the memory they occupy is created with the thread and reclaimed when the thread ends. The Java heap and the method area, on the other hand, are shared by threads and are the focus of GC.

Almost all objects live in the heap, and GC needs to work out which objects are still alive and which are dead and can be collected.

There are two algorithms for determining whether an object is alive:

1) Reference counting algorithm: a reference counter is added to each object. The counter is incremented by 1 every time the object is referenced somewhere and decremented by 1 when a reference becomes invalid. When the counter is 0, the object is dead and can be reclaimed. However, this approach struggles when two objects reference each other in a cycle.

2) Reachability analysis algorithm: starting from a set of objects called "GC Roots", the collector searches downward from these nodes, and each path it follows is called a reference chain. When an object is not connected to any GC Root by a reference chain (that is, the object is unreachable from the GC Roots), it is considered dead and can be reclaimed. In Java, the objects that can act as GC Roots include objects referenced from the virtual machine stack, objects referenced by native methods in the native method stack, objects referenced by static fields in the method area, and objects referenced by constants in the method area.

Mainstream implementations of commercial programming languages, Java among them, use reachability analysis to determine whether an object is alive.
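The difference between the two algorithms shows up on a cycle. The following toy model is a sketch, not JVM code: `Liveness`, `Node`, and `link` are hypothetical names, and the "heap" is just a small object graph. Two nodes that only reference each other keep a non-zero reference count, yet a traversal from the roots never reaches them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Liveness {
    static class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        int refCount;                       // what a reference-counting collector would track

        Node(String name) { this.name = name; }

        void link(Node target) {
            refs.add(target);
            target.refCount++;
        }
    }

    /** Reachability analysis: everything found by traversing from the roots is alive. */
    static Set<Node> reachable(List<Node> roots) {
        Set<Node> seen = new HashSet<>();
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (seen.add(n)) {
                work.addAll(n.refs);        // follow the reference chain
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node a = new Node("a");
        Node b = new Node("b");
        a.link(b);
        b.link(a);                          // a and b form a cycle, unreachable from root

        Set<Node> live = reachable(Collections.singletonList(root));
        System.out.println("a.refCount = " + a.refCount + ", reachable = " + live.contains(a));
        System.out.println("b.refCount = " + b.refCount + ", reachable = " + live.contains(b));
    }
}
```

A pure reference-counting collector would never free `a` and `b` because their counters stay at 1, whereas reachability analysis classifies both as dead.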

Garbage collection

Basic principle of garbage collection (GC): memory objects that are no longer used are reclaimed, and the component that performs the reclamation is called the collector. Because GC consumes resources and time, Java analyzes the lifecycle characteristics of objects and collects them by young generation and old generation, in order to keep the pauses that GC imposes on the application as short as possible.

  • A collection of young-generation objects is called a Minor GC;

  • A collection of old-generation objects is called a Full GC;

  • A call to System.gc() in the program also requests a Full GC (the JVM treats the call as a request, not a guarantee).

There are four types of references to an object in the JVM:

  • Strong references: by default, all references are strong references (an object is collected by GC only when no strong reference to it remains)

  • Soft references: a reference type provided by Java that suits caching scenarios (the referent is collected only when memory is about to run out)

  • Weak references: the referent is always collected when a GC runs

  • Phantom references: used only to learn whether an object has been collected
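The four kinds map onto classes in `java.lang.ref`. The sketch below, with the hypothetical class name `ReferenceKinds`, only checks facts that hold deterministically: while a strong reference exists, soft and weak references still return the object, and a phantom reference never returns its referent. When a referent is actually cleared depends on GC timing and is deliberately not asserted here.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKinds {
    /** Returns {softHolds, weakHolds, phantomHidden} for an object that is still strongly held. */
    static boolean[] observe() {
        Object strong = new Object();                              // strong reference
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        // A phantom reference must be registered with a queue to be useful.
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        boolean softHolds = soft.get() == strong;       // still reachable, so not cleared
        boolean weakHolds = weak.get() == strong;       // same, as long as "strong" is in use
        boolean phantomHidden = phantom.get() == null;  // get() on a phantom is always null
        return new boolean[] {softHolds, weakHolds, phantomHidden};
    }

    public static void main(String[] args) {
        boolean[] r = observe();
        System.out.println("soft holds: " + r[0] + ", weak holds: " + r[1]
                + ", phantom hidden: " + r[2]);
    }
}
```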

(1) Garbage collection algorithms

Mark-sweep algorithm

This is the most basic algorithm and is divided into two phases, marking and sweeping: first, the objects that need to be reclaimed are marked, and once marking is complete all marked objects are reclaimed together.

It has two disadvantages. One is efficiency: neither marking nor sweeping is very efficient. The other is space: after sweeping, a large number of discontinuous memory fragments are left behind (similar to disk fragmentation on a computer). With too much fragmentation, a later allocation of a large object may fail to find enough contiguous memory and has to trigger another garbage collection early.

Copying algorithm

To solve the efficiency problem, the copying algorithm was developed. It divides the available memory into two equally sized halves and uses only one of them at a time. When that half runs out, the surviving objects are copied to the other half, and the half that was just in use is cleaned up in one pass. This solves the memory fragmentation problem, but at the cost of halving the usable memory.

Mark-compact algorithm

When the object survival rate is high, the copying algorithm has to perform many copy operations and becomes less efficient. Hence the mark-compact algorithm: marking works the same way as in mark-sweep, but instead of reclaiming dead objects in place, the next step moves all living objects toward one end and then frees the memory beyond the end boundary.
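The fragmentation difference between mark-sweep and mark-compact can be shown on a toy heap. This sketch models the heap as an array of slots and assumes the mark phase has already produced a `marked` array; `ToyHeap` and its methods are hypothetical names, not JVM internals.

```java
import java.util.Arrays;

class ToyHeap {
    /** Mark-sweep: dead slots become free (null), survivors stay where they are. */
    static String[] sweep(String[] heap, boolean[] marked) {
        String[] out = new String[heap.length];
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) {
                out[i] = heap[i];
            }
        }
        return out;
    }

    /** Mark-compact: survivors slide to one end, leaving a single contiguous free area. */
    static String[] compact(String[] heap, boolean[] marked) {
        String[] out = new String[heap.length];
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) {
                out[next++] = heap[i];
            }
        }
        return out;
    }

    /** Length of the longest run of free (null) slots. */
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = {"a", "b", "c", "d", "e", "f"};
        boolean[] marked = {true, false, true, false, true, false}; // a, c, e are alive
        System.out.println(Arrays.toString(sweep(heap, marked)));
        System.out.println(Arrays.toString(compact(heap, marked)));
    }
}
```

Both results free three slots, but after the sweep the largest contiguous free run is a single slot, while after compaction all three free slots are adjacent.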

Generational collection algorithm

The GCs of current commercial virtual machines use a generational collection algorithm. It is not a new idea in itself; it simply divides the heap according to object lifetime into a young generation and an old generation, with the method area referred to as the permanent generation. (The permanent generation was removed in JDK 8 and replaced by Metaspace. The permanent generation used JVM-managed memory, while Metaspace uses native memory directly.)

In this way, different collection algorithms can be used according to the characteristics of each generation.

Objects in the young generation "die quickly": in each GC a large number die and only a few survive, so the copying algorithm is used. The young generation is divided into an Eden space and two Survivor spaces (Survivor from and Survivor to). The default size ratio is 8:1:1.

Objects in the old generation use the mark-sweep or mark-compact algorithm, because their survival rate is high and there is no extra space to act as an allocation guarantee.

New objects are allocated in Eden. When Eden is full, a Minor GC is performed: the surviving objects in Eden and Survivor from are copied into Survivor to, and then Eden and Survivor from are cleared. The original Survivor from then becomes the new Survivor to, and the original Survivor to becomes the new Survivor from. During copying, if Survivor to cannot hold all the surviving objects, the overflow is copied into the old generation under the old generation's allocation guarantee (similar to a loan guarantee from a bank). If the old generation cannot hold them either, a Full GC is performed.


Large objects go directly into the old generation: the JVM parameter -XX:PretenureSizeThreshold makes objects larger than the configured value go directly into the old generation. The purpose is to avoid large amounts of memory copying between Eden and the Survivor spaces.

Long-lived objects enter the old generation: the JVM defines an age counter for each object. If an object born in Eden is still alive after its first Minor GC and fits in Survivor, it is moved to Survivor with an age of 1. Each time it survives another Minor GC in Survivor, its age increases by 1, and when it reaches a certain age (15 by default, configurable with -XX:MaxTenuringThreshold) it is moved to the old generation. However, the JVM does not always require an object to reach the maximum age before promotion. If the total size of all objects of the same age (say, age x) in the Survivor space is greater than half of the Survivor space, all objects of age x or older go directly to the old generation without waiting for the maximum age.

(2) Garbage collector

The garbage collection algorithm is the methodology, and the garbage collector is the implementation. The JVM specification sets no rules for how a garbage collector should be implemented, so the collectors provided by different vendors and different versions of virtual machines can vary considerably. Only the HotSpot virtual machine is discussed here.

As of JDK 7/8, the collectors of the HotSpot virtual machine and the combinations in which they can be paired (shown as connecting lines) are as follows:

Serial collector

The Serial collector is the most basic and oldest collector, and was once the only choice for young-generation collection. It is single-threaded: it uses only one CPU or one collection thread to complete garbage collection, and it must "Stop the World" by suspending all other threads while it is collecting. Stopping all user threads is unacceptable for many applications. Imagine being in the middle of a task and having someone force you to stop: it is easy to picture how frustrating that would be.

Nevertheless, it is still the default young-generation collector for a virtual machine running in Client mode: it is simple and efficient (compared with the single-threaded operation of other collectors, because it has no thread-interaction overhead).

Working Schematic:

ParNew collector

The ParNew collector is a multithreaded version of the Serial collector. Apart from using multiple threads, it behaves the same as the Serial collector (collection algorithm, Stop the World, object allocation rules, reclamation policy, and so on).

It is the young-generation collector of choice for many JVMs running in Server mode. One of the more important reasons is that, apart from Serial, it can work together with the old-generation CMS collector.

Working Schematic:

Parallel Scavenge collector

A young-generation collector that is parallel and multithreaded. Its goal is to achieve a controllable throughput, where throughput is the ratio of CPU time spent running user code to total CPU time consumed, that is, throughput = time running user code / (time running user code + garbage collection time). This lets the CPU time be used efficiently to finish the program's computation as soon as possible. It is suitable for tasks that run in the background and do not require much interaction.

Serial Old collector

The old-generation version of the Serial collector. It is single-threaded, uses the mark-compact algorithm, and is intended primarily for virtual machines in Client mode.

In Server mode, it has two uses:

  • Paired with the Parallel Scavenge collector in JDK 1.5 and earlier versions

  • As a fallback for CMS when a Concurrent Mode Failure occurs

Working Schematic:

Parallel Old collector

The old-generation version of the Parallel Scavenge collector. It is multithreaded and uses the mark-compact algorithm. It was not available until JDK 1.6. Before that, Parallel Scavenge could only be paired with Serial Old, and because of Serial Old's poor performance the advantage of Parallel Scavenge could not be realized.

With the arrival of the Parallel Old collector, "throughput first" collectors finally have a true combination. The Parallel Scavenge/Parallel Old combination is a good fit for applications that care about throughput and are sensitive to CPU resources. The working diagram of the combination is as follows:

CMS collector

The Concurrent Mark Sweep (CMS) collector aims for the shortest possible collection pause. Short pauses mean a better user experience.

It is based on the mark-sweep algorithm and offers concurrent collection with low pauses. Its operation is more complex and is divided into 4 steps:

1) Initial mark: only marks the objects that GC Roots can reach directly. This is fast, but requires "Stop The World".

2) Concurrent mark: the process of tracing reference chains, which can run concurrently with user threads.

3) Remark: corrects the mark records of objects whose marks changed during the concurrent mark phase because user threads kept running. This pause is longer than the initial mark but much shorter than the concurrent mark.

4) Concurrent sweep: reclaims the objects marked as collectable, and can run concurrently with user threads.

Since concurrent mark and concurrent sweep, the two longest phases of the whole process, can run alongside user threads, the CMS collector's memory reclamation is, overall, performed concurrently with user threads.

Working Schematic:

The CMS collector has three disadvantages:

1) Very sensitive to CPU resources

Concurrent collection does not pause user threads, but it still slows down the application and reduces overall throughput because it takes up some of the CPU resources.

The default number of CMS collection threads is (number of CPUs + 3) / 4. With four or more CPUs, the collection threads take up at least 25% of the CPU resources during concurrent collection, which may affect user programs. With fewer than four CPUs, the impact is greater and may not be acceptable.

2) Cannot handle floating garbage (garbage produced by user threads during the concurrent sweep is called floating garbage), so a "Concurrent Mode Failure" may occur.

Because user threads keep running during the concurrent sweep, memory must be reserved for them. Unlike other collectors, CMS therefore cannot wait until the old generation is almost full before collecting. If the memory CMS reserves is not enough for the program, a "Concurrent Mode Failure" occurs. In this case, the JVM falls back to a backup plan: it temporarily enables the Serial Old collector, which results in another Full GC.

3) Large amount of memory fragmentation: CMS is based on the mark-sweep algorithm, and without compaction after sweeping it produces a large number of discontinuous memory fragments. As a result, when a large object is allocated, enough contiguous memory may not be found, and another Full GC has to be triggered early.
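The thread-count formula quoted under the first disadvantage is easy to work through. The sketch below simply applies (number of CPUs + 3) / 4 with integer division, as stated in the text; the class name `CmsThreads` is hypothetical, and the actual defaults of a given HotSpot release may be derived differently or overridden by flags.

```java
class CmsThreads {
    /** Collection threads according to the formula quoted in the text (integer division). */
    static int defaultThreads(int cpus) {
        return (cpus + 3) / 4;
    }

    /** Fraction of the CPUs occupied by collection threads during concurrent phases. */
    static double cpuShare(int cpus) {
        return (double) defaultThreads(cpus) / cpus;
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {1, 2, 4, 8, 16}) {
            System.out.printf("%2d CPUs -> %d thread(s), %.0f%% of CPU%n",
                    cpus, defaultThreads(cpus), cpuShare(cpus) * 100);
        }
    }
}
```

With 2 CPUs the single collection thread occupies half of the machine, while with 4, 8, or 16 CPUs the share settles at 25%, which is why the impact is most noticeable on small machines.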


G1 collector

Garbage-First (G1) was released as a commercially supported collector in JDK 7u4. G1 is a garbage collector for server-side applications, and its mission is to replace the CMS collector in the future.

G1 collector features:

Parallelism and concurrency: G1 can take full advantage of multi-CPU, multi-core hardware to shorten pause times, and it can run concurrently with user threads.

Generational collection: G1 can manage the entire heap on its own without cooperating with other collectors, and it treats newly created objects differently from objects that have already survived for some time.

Space compaction: overall it is based on the mark-compact algorithm, and locally (between two Regions) on the copying algorithm. It produces no memory fragmentation, so a GC is not triggered early because a large object cannot find enough contiguous space. This is an advantage over the CMS collector.

Predictable pauses: in addition to pursuing low pauses, G1 can build a predictable pause-time model that lets the user explicitly specify that no more than N milliseconds be spent on garbage collection within any time slice of M milliseconds. This is another advantage over the CMS collector.

Why predictable pauses?

Because G1 can avoid, in a planned way, collecting the whole Java heap in a single pass.

The G1 collector divides memory into independent regions of the same size. The concept of the new generation and the old generation are retained, but they are no longer physically isolated.

G1 tracks each Region to obtain its collection value and maintains a priority list in the background.

According to the allowed collection time, the Region with the largest value is collected First (the origin of the name garbage-first).

This ensures that the maximum collection efficiency can be achieved in the limited time.
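The idea of collecting the most valuable Regions first within an allowed time can be sketched as a greedy selection. This is an illustration only, not G1's actual heuristic: `RegionPicker`, `Region`, `garbageBytes`, `costMs`, and the "value = garbage per millisecond" measure are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RegionPicker {
    static class Region {
        final String id;
        final long garbageBytes;   // memory that collecting this region would free (assumed known)
        final double costMs;       // estimated time to collect it (assumed known)

        Region(String id, long garbageBytes, double costMs) {
            this.id = id;
            this.garbageBytes = garbageBytes;
            this.costMs = costMs;
        }

        /** Collection value: bytes freed per millisecond spent. */
        double value() {
            return garbageBytes / costMs;
        }
    }

    /** Picks regions in descending value order until the pause budget is used up. */
    static List<Region> pick(List<Region> regions, double pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingDouble(Region::value).reversed());

        List<Region> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.costMs <= pauseBudgetMs) {
                chosen.add(r);
                spent += r.costMs;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("A", 900, 3.0));
        regions.add(new Region("B", 500, 1.0));
        regions.add(new Region("C", 100, 2.0));
        for (Region r : pick(regions, 4.0)) {
            System.out.println(r.id);
        }
    }
}
```

With a 4 ms budget, the sketch picks B (highest value) and then A, and skips C because it no longer fits in the remaining time.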

What if the object is referenced by another Region object?

Does the entire Java heap need to be scanned to determine accurately whether an object is alive? The same problem exists in other generational collectors (it is simply more pronounced in G1): does collecting the young generation require scanning the old generation as well? Whether in G1 or in other generational collectors, the JVM uses a Remembered Set to avoid a global scan. Each Region has a corresponding Remembered Set. Each time a reference-type field is written, a write barrier briefly interrupts the operation and checks whether the reference being written points to an object in a different Region from the one holding the field (in other collectors: whether an old-generation object refers to a young-generation object). If it does, the relevant reference information is recorded through the CardTable into the Remembered Set of the Region that contains the referenced object.

During garbage collection, adding the Remembered Sets to the enumeration range of the GC roots guarantees that no global scan is needed and that nothing is missed.


Without counting the operation of maintaining the Remembered Set, the recycling process can be broken down into four steps (similar to CMS) :

1) Initial mark: only marks the objects that GC Roots can reach directly, and updates the value of TAMS (Next Top at Mark Start) so that, when user programs run concurrently in the next phase, new objects are created in the correct available Regions. "Stop The World" is required.

2) Concurrent mark: performs reachability analysis starting from the GC Roots to find the surviving objects. This takes a long time, but can run concurrently with user threads.

3) Final mark: corrects the mark records of objects whose marks changed during the concurrent mark phase because user threads kept running. During concurrent marking, the virtual machine records object changes in per-thread Remembered Set Logs. In the final mark phase, the Remembered Set Logs are merged into the Remembered Sets. This takes longer than the initial mark but much less time than the concurrent mark. "Stop The World" is required.

4) Screening and collection: first, the collection value and cost of each Region are sorted, and a collection plan is drawn up according to the user's expected GC pause time. Then the garbage in the highest-value Regions is collected according to the plan. The copying algorithm is used during collection: living objects are copied from one or more Regions to an empty Region on the heap, and memory is compacted and freed in the process. This phase can be done concurrently, reducing pause times and increasing throughput.

Working Schematic:

4. Execution engine

After the class loader loads the bytecode into memory, the execution engine reads the Java bytecode one instruction at a time. The machine cannot execute Java bytecode directly, so it must be converted into platform-specific machine code. This is the job of the execution engine.

The JVM provides four instructions for invoking methods (Java 7 added a fifth, invokedynamic):

① invokestatic: invokes a static method of a class.

② invokevirtual: invokes a method of an object instance.

③ invokeinterface: invokes a method through a reference whose declared type is an interface.

④ invokespecial: used when the JVM initializes an object (the Java constructor is the method <init>) and when invoking a private method of an object instance.
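A small program makes the mapping concrete. The comments in this sketch state which instruction a typical javac emits for each call site; the names `InvokeKinds`, `Greeter`, and `English` are hypothetical, and the exact bytecode can be confirmed for a particular compiler version with `javap -c`.

```java
class InvokeKinds {
    interface Greeter {
        String greet();
    }

    static class English implements Greeter {
        English() {
            super();                       // invokespecial: Object.<init>
        }

        @Override
        public String greet() {
            return "hello";
        }
    }

    static String emphasize(String s) {
        return s.concat("!");              // invokevirtual: String.concat
    }

    static String demo() {
        English e = new English();         // new, then invokespecial: English.<init>
        Greeter g = e;
        String viaInterface = g.greet();   // invokeinterface: receiver typed as Greeter
        String viaClass = e.greet();       // invokevirtual: receiver typed as English
        return emphasize(viaInterface.concat(viaClass)); // invokestatic: InvokeKinds.emphasize
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The same method, `greet`, is reached through two different instructions depending only on the static type of the receiver, which is the distinction between items ② and ③ above.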

Main execution techniques:

Interpretation, just-in-time compilation, adaptive optimization, and chip-level direct execution.

Interpretation belongs to the first generation of JVMs.

Just-in-time (JIT) compilation belongs to the second generation of JVMs.

Adaptive optimization is the technique used by the (Sun) HotSpot JVM. Drawing on the experience of the first- and second-generation JVMs, it combines the two approaches: it starts by interpreting all code while monitoring its execution, then starts a background thread that compiles frequently called methods into native code and optimizes them. If a method is no longer used frequently, its compiled code is discarded and the method goes back to being interpreted.
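Whether a running JVM has a JIT compiler, and how much time it has spent compiling, can be queried through the standard management API. This is a minimal sketch with a hypothetical class name `JitInfo`; the compiler name and timings it reports vary between JVMs and runs.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class JitInfo {
    /** Describes the JIT compiler of the running JVM, if it has one. */
    static String describe() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            // The platform documentation allows a JVM without a compilation system.
            return "no JIT compiler (interpreter only)";
        }
        String time = jit.isCompilationTimeMonitoringSupported()
                ? jit.getTotalCompilationTime() + " ms spent compiling so far"
                : "compilation time not monitored";
        return jit.getName() + ": " + time;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```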