Ali Architect takes you deep into the JVM

In this article, we look at the internal structure of the JVM: multithreading in components, JVM system threads, the local variable array, and more.

JVM

JVM = Classloader + Execution engine + Runtime data area

The following figure shows the key internal components of a typical JVM (consistent with the JVM Specification Java SE 7 Edition).

Multithreading in components

Multithreading, or free threading, refers to the ability of a program to execute multiple threads of operation simultaneously. As an example of a multithreaded application, a program might receive user input on one thread, perform complex calculations on a second thread, and update a database on a third. In a single-threaded application, users may spend time waiting for calculations or database updates to complete. In a multithreaded application, these processes can take place in the background, so they do not waste the user's time.

Multithreading can be a very powerful tool in component programming. By writing multithreaded components, you can create components that perform complex calculations in the background while leaving the user interface (UI) free to respond to user input. While multithreading is powerful, it can be difficult to use correctly. Improperly implemented multithreaded code can degrade application performance or even cause the application to freeze.

The .NET Framework, for example, provides several options for multithreading in components. The functionality in the System.Threading namespace is one option. The event-based asynchronous pattern is another. The BackgroundWorker component is an implementation of that pattern; it encapsulates advanced functionality in a component for ease of use.

JVM system threads

If you use JConsole or any other debugging tool, you can see that numerous threads are running in the background. These background threads run in addition to the main thread, which is created to invoke public static void main(String[]), and any threads created by the main thread. The main background system threads in the HotSpot JVM are shown in the following table:

VM thread	This thread waits for operations to appear that require the JVM to reach a safepoint. These operations have to happen on a separate thread because they all require the JVM to be at a safepoint, where modifications to the heap cannot occur. The operations performed by this thread include stop-the-world garbage collections, thread stack dumps, thread suspension, and biased locking revocation.
Periodic task thread	This thread is responsible for timer events (that is, interrupts) that are used to schedule the execution of periodic operations
GC threads	These threads support the different types of garbage collection activities that occur in the JVM
Compiler threads	These threads compile bytecode to native machine code at run time
Signal dispatcher thread	This thread receives signals sent to the JVM process and handles them inside the JVM by calling the appropriate JVM methods
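The thread list that JConsole displays can also be approximated from inside a program. The following is a minimal sketch using the standard java.lang.management API; the class name ThreadLister and its method are hypothetical names chosen for illustration. It reports Java-level threads (on HotSpot this typically includes entries such as "Signal Dispatcher"), while purely native threads such as the VM thread usually appear only in a full thread dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
class ThreadLister {

    // Returns the names of all live threads visible to the ThreadMXBean.
    static List<String> liveThreadNames() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        List<String> names = new ArrayList<>();
        for (ThreadInfo info : bean.getThreadInfo(bean.getAllThreadIds())) {
            // An entry is null if the thread terminated between the two calls.
            if (info != null) {
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        liveThreadNames().forEach(System.out::println);
    }
}
```

The exact names and number of threads depend on the JVM vendor, version, and garbage collector in use.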

A single thread

Each thread of execution contains the following components

Program counter (PC)

The PC holds the address of the current instruction (or opcode), unless the current method is native, in which case the PC is undefined. All CPUs have a PC, which is typically incremented after each instruction so that it holds the address of the next instruction to be executed. The JVM uses the PC to keep track of where it is executing instructions. The PC in fact points at a memory address in the method area.

The native stack

Not all JVMs support native methods, but those that do typically create a native method stack per thread. If the JVM implements the JNI (Java Native Interface) using a C-linkage model, the native stack will be a C stack. In this case, the order of the arguments and the return value on the native stack are identical to those of a typical C program. A native method can typically (depending on the JVM implementation) call back into the JVM and invoke a Java method. Such a native-to-Java invocation occurs on the normal Java stack: the thread leaves the native stack and creates a new frame on the Java stack.

The stack

Each thread has its own stack, which holds a frame for each method executing on that thread. The stack is a last-in, first-out (LIFO) data structure, so the currently executing method is at the top of the stack. For each method invocation, a new frame is created and pushed onto the top of the stack. The frame is removed (popped) when the method returns normally or when an uncaught exception is thrown during the invocation. The stack is not manipulated directly, except to push and pop frame objects. Frame objects may therefore be allocated on the heap, and the memory does not need to be contiguous (note the distinction between frame pointers and frame objects).

The stack limit

A stack can be of dynamic or fixed size. If a thread requires a larger stack than is allowed, a StackOverflowError is thrown. If a thread requires a new frame and there is not enough memory to allocate it, an OutOfMemoryError is thrown.
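The stack limit can be observed with a small experiment. The following is a minimal sketch, with the hypothetical class name StackDepthProbe, that recurses without a base case until the JVM throws StackOverflowError and then reports how many frames were pushed. The number printed is not fixed: it depends on the JVM, the configured thread stack size, and whether the method has been JIT-compiled.

```java
// Hypothetical demo class: counts how many frames fit on the current thread's stack.
class StackDepthProbe {

    private static int depth;

    // Each call pushes one more frame; there is deliberately no base case.
    private static void recurse() {
        depth++;
        recurse();
    }

    static int measureMaxDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack limit was reached; the frames have been popped
            // while the error propagated up to this handler.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Frames pushed before StackOverflowError: " + measureMaxDepth());
    }
}
```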

Frame

For each method invocation, a new frame is created and pushed onto the top of the stack. The frame is removed (popped) when the method returns normally or when an uncaught exception is thrown during the invocation.

Local variable array

The local variable array contains all the variables used during the execution of a method, including a reference to this, all method parameters, and other locally defined variables. For class methods (that is, static methods), the method parameters start at index 0; for instance methods, slot 0 is reserved for this.

The operand stack

The operand stack is used during the execution of bytecode instructions, in a similar way to how general-purpose registers are used by a native CPU. Most JVM bytecode spends its time manipulating the operand stack by pushing, popping, duplicating, swapping, or executing operations that produce or consume values. Instructions that move values between the local variable array and the operand stack are therefore very frequent in bytecode.
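To make the interplay between the local variable array and the operand stack concrete, here is a minimal sketch. The class and method names are hypothetical, and the instructions in the comments are what javac typically emits for this source; the exact output can be checked with javap -c.

```java
// Hypothetical demo class; bytecode in the comments is typical javac output.
class OperandStackDemo {

    // Static method: there is no 'this', so a is slot 0, b is slot 1, sum is slot 2.
    static int addAndDouble(int a, int b) {
        int sum = a + b;   // iload_0, iload_1, iadd, istore_2
        return sum * 2;    // iload_2, iconst_2, imul, ireturn
    }

    // Instance method: slot 0 holds 'this', so x is slot 1.
    int negate(int x) {
        return -x;         // iload_1, ineg, ireturn
    }

    public static void main(String[] args) {
        System.out.println(addAndDouble(2, 3));
        System.out.println(new OperandStackDemo().negate(4));
    }
}
```

Every arithmetic instruction here takes its inputs from the operand stack and pushes its result back, while the load and store instructions shuttle values between the stack and the numbered local variable slots.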

Dynamic link

Each frame contains a reference to the runtime constant pool. The reference points to the constant pool of the class that owns the method being executed. This reference helps to support dynamic linking.

When a Java class is compiled, all references to variables and methods are stored in the class's constant pool as symbolic references. A symbolic reference is a logical reference, not a reference that points to a physical memory location. A JVM implementation can choose when to resolve symbolic references. Resolution can happen when the class file is verified, after it has been loaded; this is called eager or static resolution. Alternatively, it can happen when the symbolic reference is used for the first time; this is called lazy or late resolution. Either way, the JVM must behave as if resolution occurs when each reference is first used, and must throw any resolution errors at that point. Binding is the process of replacing the field, method, or class identified by a symbolic reference with a direct reference. This happens only once, because the symbolic reference is completely replaced. If a symbolic reference refers to a class that has not yet been resolved, that class is loaded. Each direct reference is stored as an offset into the storage structure associated with the runtime location of the variable or method.

Shared between threads

The heap

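In the JVM, the heap is the runtime data area, shared by all threads, in which class instances and arrays are allocated. The name is also used for the heap data structure, which has the following properties:

The value of a node is always no greater than (or always no less than) the value of its parent;
The heap is always a complete binary tree.

A heap whose root holds the largest value is called a max heap (or large root heap), and a heap whose root holds the smallest value is called a min heap (or small root heap). Common heaps include the binary heap and the Fibonacci heap.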

A heap is defined as follows: a sequence of n elements $\{k_1, k_2, \dots, k_n\}$ is called a heap if and only if one of the following relations holds for every index:

$(k_i \le k_{2i},\ k_i \le k_{2i+1})$ or $(k_i \ge k_{2i},\ k_i \ge k_{2i+1})$, for $i = 1, 2, \dots, \lfloor n/2 \rfloor$

(the condition on $k_{2i+1}$ applies only when $2i+1 \le n$).

If the one-dimensional array corresponding to the sequence is regarded as a complete binary tree, the definition means that the value of every non-terminal node is not greater than (or not less than) the values of its left and right children. Thus, if the sequence $\{k_1, k_2, \dots, k_n\}$ is a heap, the top element of the heap (the root of the complete binary tree) must be the minimum (or maximum) of the n elements in the sequence.
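The min-heap form of the relation above can be checked mechanically. The following is a minimal sketch with a hypothetical class name, HeapCheck. Java arrays are 0-based, so the children of the element at index i sit at 2i+1 and 2i+2 rather than at 2i and 2i+1 as in the 1-based formula.

```java
// Hypothetical helper: tests whether an array satisfies the min-heap relation.
class HeapCheck {

    static boolean isMinHeap(int[] k) {
        for (int i = 0; i < k.length; i++) {
            int left = 2 * i + 1;   // 0-based index of the left child
            int right = 2 * i + 2;  // 0-based index of the right child
            if (left < k.length && k[i] > k[left]) {
                return false;
            }
            if (right < k.length && k[i] > k[right]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isMinHeap(new int[] {1, 3, 2, 7, 4}));
        System.out.println(isMinHeap(new int[] {5, 1, 2}));
    }
}
```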

Non-heap memory

Objects that are logically considered part of the JVM mechanics are not created on the heap.

Non-heap memory includes:

The permanent generation, which contains the method area and interned strings
The code cache, which is used to compile and store methods that the JIT compiler has compiled to native code
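The split between heap and non-heap memory can be inspected at run time. The following is a minimal sketch using the standard java.lang.management API; the class name MemoryPools is hypothetical. Pool names are implementation specific: the article describes the Java SE 7 layout with a permanent generation, whereas HotSpot from Java 8 onward reports Metaspace instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups the JVM's memory pools by type.
class MemoryPools {

    // Returns the names of all memory pools of the given type (HEAP or NON_HEAP).
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("Non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```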

Memory management

Objects and arrays are never explicitly deallocated; instead, the garbage collector reclaims them automatically.

Typically, this works as follows:

New objects and arrays are created in the young generation
Minor garbage collection operates on the young generation. Objects that are still alive are moved from the Eden space to a survivor space
Major garbage collection, which typically causes the application threads to pause, moves objects between generations. Objects that are still alive are moved from the young generation to the old (tenured) generation
The permanent generation is collected every time the old generation is collected. Both are collected when either becomes full

The JIT compiler

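Java bytecode is interpreted at first, which is slower than executing native code directly on the host CPU. To improve performance, the HotSpot VM looks for "hot" regions of bytecode that are executed frequently and has the JIT compiler translate them into native machine code. The native code is stored in the code cache in non-heap memory, and when the method is encountered again, the compiled code is executed directly from that cache. This differs from the .NET CLR, which creates internal data structures for a type when it is loaded and JIT-compiles each function the first time it is called.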

Method area

All threads share the same method area, so access to method area data and the process of dynamic linking must be thread safe. If two threads attempt to access a field or method of a class that has not yet been loaded, the class must be loaded only once, and neither thread may continue executing until it has been loaded.
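The once-only guarantee can be observed from Java code through class initialization, which the language specification also requires to happen exactly once per class. The following is a minimal sketch; ClassInitOnce, Config and INIT_RUNS are hypothetical names. Two threads race to read a static field of a class that has not been initialized yet, and a counter records how many times its static initializer ran.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of any library.
class ClassInitOnce {

    static final AtomicInteger INIT_RUNS = new AtomicInteger();

    static class Config {
        // Assigned in a static block, so this is not a compile-time constant
        // and reading it triggers initialization of Config.
        static final int VALUE;
        static {
            INIT_RUNS.incrementAndGet();
            VALUE = 42;
        }
    }

    static int initRunsAfterConcurrentAccess() throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        Runnable reader = () -> {
            try {
                start.await();              // line both threads up
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            int ignored = Config.VALUE;     // first use triggers initialization
        };
        Thread t1 = new Thread(reader);
        Thread t2 = new Thread(reader);
        t1.start();
        t2.start();
        start.countDown();                  // release both threads together
        t1.join();
        t2.join();
        return INIT_RUNS.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Static initializer ran " + initRunsAfterConcurrentAccess() + " time(s)");
    }
}
```

Whichever thread arrives second blocks until the first has finished initializing Config, so the counter ends at 1.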

Class file structure

A compiled class file contains the following structure:

    ClassFile {
        u4             magic;
        u2             minor_version;
        u2             major_version;
        u2             constant_pool_count;
        cp_info        constant_pool[constant_pool_count - 1];
        u2             access_flags;
        u2             this_class;
        u2             super_class;
        u2             interfaces_count;
        u2             interfaces[interfaces_count];
        u2             fields_count;
        field_info     fields[fields_count];
        u2             methods_count;
        method_info    methods[methods_count];
        u2             attributes_count;
        attribute_info attributes[attributes_count];
    }

magic, minor_version, major_version	Specify information about the version of the class file and the version of the JDK it was compiled for
constant_pool	Similar to a symbol table, although it contains more data
access_flags	Provides the list of modifiers for the class
this_class	An index into the constant pool that provides the fully qualified name of the class, such as org/jamesdbloom/foo/Bar
super_class	An index into the constant pool that provides a symbolic reference to the superclass, such as java/lang/Object
interfaces	An array of indexes into the constant pool that provide symbolic references to all implemented interfaces
fields	An array of indexes into the constant pool that give a complete description of each field
methods	An array of indexes into the constant pool that give a complete description of each method signature; if the method is not abstract or native, the bytecode is also present
attributes	An array of different values that provide additional information about the class, including any annotations with RetentionPolicy.CLASS or RetentionPolicy.RUNTIME

You can use the javap command to view the bytecode of a compiled Java class.

The following opcodes are used in this class file:

aload_0	This opcode is one of a group of opcodes with the format aload_<n>. They all load an object reference onto the operand stack. The <n> refers to the location in the local variable array that is being accessed, but it can only be 0, 1, 2, or 3. There are similar opcodes for loading values that are not object references: iload_<n>, lload_<n>, fload_<n>, and dload_<n>, where i stands for int, l for long, f for float, and d for double. The same range of n applies to all *load_<n> opcodes. Local variables with an index higher than 3 can be loaded using iload, lload, fload, dload, and aload. These opcodes take a single operand that specifies the index of the local variable to load.
ldc	This opcode pushes a constant from the runtime constant pool onto the operand stack
getstatic	This opcode pushes the value of a static field, identified by a field reference in the runtime constant pool, onto the operand stack
invokespecial, invokevirtual	These opcodes belong to a group of opcodes that invoke methods: invokedynamic, invokeinterface, invokespecial, invokestatic, and invokevirtual. In this class file, invokevirtual is used to invoke an instance method based on the class of the object, and invokespecial is used to invoke instance initialization methods, as well as private methods and methods of a superclass of the current class.
return	This opcode is one of a group of opcodes: ireturn, lreturn, freturn, dreturn, areturn, and return. Each of them is a typed return statement, where i stands for int, l for long, f for float, d for double, and a for an object reference. The opcode with no leading letter, return, only returns void

As in any typical bytecode, the majority of these opcodes interact with the local variables, the operand stack, and the run-time constant pool.

The constructor has two instructions before its return. The first pushes this onto the operand stack. The second invokes the superclass constructor, which consumes the value of this and therefore pops it off the operand stack.

The sayHello() method is more complex, because it has to resolve symbolic references to actual references using the runtime constant pool. The first instruction, getstatic, pushes a reference to the static field out of the System class onto the operand stack. The next instruction, ldc, pushes the string literal "Hello" onto the operand stack. Finally, the invokevirtual instruction invokes the println method of System.out, which pops "Hello" off the operand stack as an argument and creates a new frame for the current thread.
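The walk-through above can be read alongside the source. The following sketch repeats the article's SimpleClass (without its package declaration, so that it compiles on its own) and lists in comments the instructions described above. The constant pool indexes (#1 to #4) follow the article's constant pool listing and are an assumption for any other compilation; javap -c -v shows the real values.

```java
// The article's example class, annotated with the bytecode described in the text.
class SimpleClass {

    // Implicit default constructor, as compiled by javac:
    //   aload_0             push 'this' (local variable slot 0)
    //   invokespecial #1    java/lang/Object."<init>":()V, pops 'this'
    //   return

    public void sayHello() {
        //   getstatic     #2    push the value of System.out (a PrintStream reference)
        //   ldc           #3    push the String constant "Hello"
        //   invokevirtual #4    PrintStream.println(String), pops both values
        //   return
        System.out.println("Hello");
    }

    public static void main(String[] args) {
        new SimpleClass().sayHello();
    }
}
```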

Class loader

The JVM starts up by loading an initial class using the bootstrap class loader. The class is then linked and initialized before public static void main(String[]) is invoked. The execution of this method in turn drives the loading, linking, and initialization of additional classes and interfaces as required.

Loading: Loading is the process of finding the class file that represents the class or interface type with a particular name and reading it into a byte array. The bytes are then parsed to confirm that they represent a Class object and have the correct major and minor version numbers. Any class or interface named as a direct superclass is also loaded. Once this is done, a class or interface object is created from the binary representation.

Linking: Linking is the process of taking a class or interface and verifying and preparing the type, its direct superclass, and its superinterfaces. Linking consists of three steps: verification, preparation, and (optionally) resolution.

Verification: This phase confirms that the class or interface representation is structurally correct and obeys the semantic requirements of the Java programming language and the JVM.

Performing these checks during verification means that they do not need to be performed at run time. Verification during linking slows down class loading, but it avoids the need to repeat these checks while the bytecode is executing.

Preparation: This phase allocates memory for static storage and for any data structures used by the JVM, such as method tables. Static fields are created and initialized to their default values. However, no initializers or code are executed at this stage, because that happens as part of initialization.

Resolution: This is an optional stage. It checks symbolic references by loading the referenced classes or interfaces and checking that the references are correct. If this does not take place at this point, the resolution of symbolic references is deferred until just before they are used by a bytecode instruction.

Initialization: Initialization of a class or interface consists of executing the class or interface initialization method, <clinit>.
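The difference between preparation (fields receive default values) and initialization (initializers run in textual order) can be made visible from Java code. The following is a minimal sketch with hypothetical names, InitOrder and Example. The first field is initialized through a method call that reads the second field before that field's own initializer has run, so it observes the default value assigned during preparation.

```java
// Hypothetical demo class, not part of any library.
class InitOrder {

    static class Example {
        // Initializers run in textual order during initialization (<clinit>).
        static int first = peekSecond();  // runs while 'second' still holds its default value
        static int second = 5;

        static int peekSecond() {
            return second;
        }
    }

    public static void main(String[] args) {
        System.out.println("first  = " + Example.first);
        System.out.println("second = " + Example.second);
    }
}
```

Here first ends up as 0, the default value for int, while second ends up as 5.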

There are several class loaders with different responsibilities within the JVM. Each class loader delegates to its parent class loader (the one that loaded it), except the bootstrap class loader, which is the root of the hierarchy.

Bootstrap class loader: When a Java program runs, the Java virtual machine needs to load Java classes, and this requires a class loader. A class loader is itself a Java class, which raises a chicken-and-egg question: what loads the first class loader?

In fact, the Java virtual machine has a built-in class loader called the bootstrap class loader, which is implemented in native, OS-specific code and belongs to the core of the Java virtual machine. The bootstrap class loader does not need to be loaded by another class loader. It is responsible for loading the classes in the Java core packages.

Extension class loader: Loads classes from the standard Java extension APIs, for example the security extension functions.

System class loader: This is the default class loader for your application. It loads application classes from the classpath.

User-defined class loaders: Additional class loaders can be defined to load application classes. User-defined class loaders are used for special purposes, such as reloading classes at run time or separating groups of loaded classes from each other (a requirement commonly found in web servers such as Tomcat).
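The delegation hierarchy described above can be walked from Java code. The following is a minimal sketch with a hypothetical class name, ClassLoaderChain. It follows getParent() from the system class loader upward. The bootstrap class loader is native and is represented as null in the Java API. The intermediate loader is the extension class loader on the Java versions this article describes and the platform class loader on Java 9 and later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prints the parent chain of a class loader.
class ClassLoaderChain {

    // Walks from the given loader to the root, recording each loader's class name.
    static List<String> chainOf(ClassLoader loader) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        chain.add("bootstrap (represented as null)");
        return chain;
    }

    // Core classes report a null class loader because the bootstrap loader loaded them.
    static boolean loadedByBootstrap(Class<?> type) {
        return type.getClassLoader() == null;
    }

    public static void main(String[] args) {
        chainOf(ClassLoader.getSystemClassLoader()).forEach(System.out::println);
        System.out.println("String loaded by bootstrap: " + loadedByBootstrap(String.class));
    }
}
```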

Faster class loading

A feature called Class Data Sharing (CDS) was introduced in the HotSpot JVM in version 5.0. During JVM installation, the installer loads a set of core Java classes (from rt.jar, for example) into a memory-mapped shared archive. CDS reduces the time it takes to load these classes, which improves JVM start-up speed, and allows the classes to be shared between different JVM instances, which reduces the memory footprint.

Location of method area

The JVM Specification Java SE 7 Edition clearly states that, although the method area is logically part of the heap, simple implementations may choose to neither garbage collect nor compact it. In contrast, JConsole for the Oracle JVM shows the method area (and the code cache) as non-heap. The OpenJDK code shows that the CodeCache is a separate field of the VM from the ObjectHeap.

Class loader references

Classes are usually loaded on demand, that is, when they are first used. Thanks to the classloader, the Java runtime system does not need to know about files and file systems.

Run-time constant pool

The JVM maintains a constant pool for each type, which is a run-time data structure similar to a symbol table, but containing much more data. Java bytecode requires some data that is often too large to be stored directly in bytecode. Instead, it is stored in a constant pool, and the bytecode contains a reference to the constant pool. The runtime constant pool is primarily used for dynamic linking.

Several types of data are stored in constant pools. They are:

Numeric literals
String literals
Class references
Field references
Method references

If you compile the following simple class:

    package org.jvminternals;

    public class SimpleClass {

        public void sayHello() {
            System.out.println("Hello");
        }

    }

The resulting class file’s constant pool should look something like the following:

    Constant pool:
       #1 = Methodref          #6.#17         //  java/lang/Object."<init>":()V
       #2 = Fieldref           #18.#19        //  java/lang/System.out:Ljava/io/PrintStream;
       #3 = String             #20            //  "Hello"
       #4 = Methodref          #21.#22        //  java/io/PrintStream.println:(Ljava/lang/String;)V
       #5 = Class              #23            //  org/jvminternals/SimpleClass
       #6 = Class              #24            //  java/lang/Object
       #7 = Utf8               <init>
       #8 = Utf8               ()V
       #9 = Utf8               Code
      #10 = Utf8               LineNumberTable
      #11 = Utf8               LocalVariableTable
      #12 = Utf8               this
      #13 = Utf8               Lorg/jvminternals/SimpleClass;
      #14 = Utf8               sayHello
      #15 = Utf8               SourceFile
      #16 = Utf8               SimpleClass.java
      #17 = NameAndType        #7:#8          //  "<init>":()V
      #18 = Class              #25            //  java/lang/System
      #19 = NameAndType        #26:#27        //  out:Ljava/io/PrintStream;
      #20 = Utf8               Hello
      #21 = Class              #28            //  java/io/PrintStream
      #22 = NameAndType        #29:#30        //  println:(Ljava/lang/String;)V
      #23 = Utf8               org/jvminternals/SimpleClass
      #24 = Utf8               java/lang/Object
      #25 = Utf8               java/lang/System
      #26 = Utf8               out
      #27 = Utf8               Ljava/io/PrintStream;
      #28 = Utf8               java/io/PrintStream
      #29 = Utf8               println
      #30 = Utf8               (Ljava/lang/String;)V

The constant pool contains the following types:

Integer	A 4-byte int constant
Long	An 8-byte long constant
Float	A 4-byte float constant
Double	An 8-byte double constant
String	A String constant that points at another Utf8 entry in the constant pool, which contains the actual bytes
Utf8	A stream of bytes representing a UTF-8 encoded sequence of characters
Class	A Class constant that points at another Utf8 entry in the constant pool, which contains the fully qualified class name in the JVM's internal format (it is used by the dynamic linking process)
NameAndType	A colon-separated pair of values, each pointing at another entry in the constant pool. The first value (before the colon) points at a Utf8 entry that holds the method or field name. The second value points at a Utf8 entry that represents the type. In the case of a field this is the fully qualified class name; in the case of a method it is a list of fully qualified class names, one per parameter
Fieldref, Methodref, InterfaceMethodref	A dot-separated pair of values, each pointing at another entry in the constant pool. The first value (before the dot) points at a Class entry. The second value points at a NameAndType entry

Exception table

The exception table stores information about each exception handler:

The start point
The end point
The PC offset of the handler code
The constant pool index of the exception class being caught

If a method defines a try-catch or try-finally exception handler, an exception table is created. It contains information about each exception handler or finally block, including the range over which the handler applies, the type of exception being handled, and the location of the handler code.

When an exception is thrown, the JVM looks for a matching handler in the current method. If none is found, the method terminates abruptly, popping the current stack frame, and the exception is rethrown in the calling method (the new current frame). If no exception handler is found before all frames have been popped, the thread is terminated. This can also cause the JVM itself to terminate if the exception is thrown in the last non-daemon thread, for example the main thread.

A finally exception handler matches all types of exception and therefore always executes whenever an exception is thrown. When no exception is thrown, the finally block is still executed at the end of the method. This is achieved by jumping to the finally handler code immediately before the return statement is executed.
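The behaviour described above can be observed with a short experiment. The following is a minimal sketch with a hypothetical class name, ExceptionFlow. It records the order in which the try, catch, and finally blocks run, both when an exception is thrown and when it is not.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: logs the order in which handler blocks execute.
class ExceptionFlow {

    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try {
            log.add("try");
            if (fail) {
                throw new IllegalStateException("boom");
            }
            log.add("no exception");
        } catch (IllegalStateException e) {
            // Reached through the exception table entry for IllegalStateException.
            log.add("catch");
        } finally {
            // Runs on both the normal and the exceptional path.
            log.add("finally");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(true));   // prints [try, catch, finally]
        System.out.println(run(false));  // prints [try, no exception, finally]
    }
}
```

Compiling this class and running javap -c -v on it shows the exception table entries that back the catch and finally blocks.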

Character comparison

Character comparison refers to the operation of comparing the size of a single character or string according to the dictionary order. Generally, the size of ASCII code value is used as the standard for character comparison.

The symbol table

During compilation, a compiler needs to collect, record, and use the types and characteristics of the syntactic symbols in the source program. This information is usually stored in tables such as a constant table, variable name table, array name table, procedure name table, and label table, which are collectively known as the symbol table. The organization, construction, and management of the symbol table directly affect the efficiency of the compiler.

In the HotSpot JVM, interned strings are held in the string table, which is a Hashtable that maps object pointers to symbols (that is, Hashtable<oop, Symbol>) and is stored in the permanent generation.

String literals are automatically interned by the compiler and added to the string table when the class is loaded. In addition, instances of the String class can be interned explicitly by calling String.intern(). When String.intern() is called and the string table already contains the string, a reference to the existing string is returned. If the string is not in the table, it is added to the string table and its reference is returned.
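The effect of interning is easy to observe with reference comparisons. The following is a minimal sketch with a hypothetical class name, InternDemo. A string built at run time is a distinct object from the literal with the same content, but its interned form is the same object as the literal.

```java
// Hypothetical demo class: compares references before and after interning.
class InternDemo {

    // Returns { sameObjectBeforeIntern, sameObjectAfterIntern, sameContent }.
    static boolean[] compare() {
        String literal = "Hello";  // interned automatically when the class is loaded
        String built = new String(new char[] {'H', 'e', 'l', 'l', 'o'});  // new heap object
        return new boolean[] {
            built == literal,           // false: different objects
            built.intern() == literal,  // true: intern() returns the table's entry
            built.equals(literal)       // true: same characters
        };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println("same object before intern: " + r[0]);
        System.out.println("same object after intern:  " + r[1]);
        System.out.println("same content:              " + r[2]);
    }
}
```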
