Autumn recruitment season has begun. I was delayed for half a month because I had other things to deal with, and it is easy to forget what was learned before. So while rereading Understanding the Java Virtual Machine I wanted to take notes: integrate the scattered pieces of knowledge for my own future reference, and also share them with everyone. The content includes my own understanding; if there are any mistakes, corrections are welcome.

Note: This post will continue to be updated. Please stay tuned to GitHub && Blog

1. Java memory areas and memory overflow exceptions

1.1 Runtime data area

According to the Java Virtual Machine Specification (Java SE 7 Edition), the following figure shows the memory managed by the Java virtual machine.

1.1.1 Program counter

A small memory area, private to each thread. The bytecode interpreter works by changing the value of this counter to select the next bytecode instruction to execute. Basic functions such as branching, looping, jumping, exception handling, and thread recovery all rely on this counter.

If a thread is executing a Java method, this counter records the address of the bytecode instruction being executed; if it is executing a native method, the counter value is undefined. This is the only region for which the Java Virtual Machine Specification does not specify any OutOfMemoryError condition.

1.1.2 Java Virtual Machine Stack

It is thread-private and has the same life cycle as its thread. It describes the memory model of Java method execution: each method invocation creates a stack frame to store the local variable table, operand stack, dynamic link, method exit, and other information. Each method, from invocation to completion, corresponds to one stack frame being pushed onto and popped off the virtual machine stack.

Local variable table: stores the primitive types known at compile time (boolean, byte, char, short, int, float, long, double), object references (reference types), and returnAddress types (which refer to the address of a bytecode instruction).

StackOverflowError: thrown when the stack depth requested by a thread exceeds the depth allowed by the virtual machine. OutOfMemoryError: thrown when the VM stack can be dynamically expanded but sufficient memory cannot be allocated during expansion.
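A minimal sketch of the first case, along the lines of the classic demo: unbounded recursion keeps pushing stack frames until the thread's stack depth exceeds the limit (tunable with a flag such as -Xss128k) and StackOverflowError is thrown.

public class JavaVMStackSOF {
    private int stackLength = 1;

    public void stackLeak() {
        stackLength++;
        stackLeak();                     // each call pushes one more stack frame
    }

    public static void main(String[] args) {
        JavaVMStackSOF oom = new JavaVMStackSOF();
        try {
            oom.stackLeak();
        } catch (Throwable e) {
            System.out.println("stack length: " + oom.stackLength);
            throw e;
        }
    }
}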

1.1.3 Native method stack

It differs from the Java virtual machine stack in that the virtual machine stack serves the execution of Java methods (that is, bytecode), whereas the native method stack serves the native methods used by the virtual machine. StackOverflowError and OutOfMemoryError exceptions can also occur here.

1.1.4 Java heap

For most applications, this is the largest chunk of memory managed by the JVM. It is shared by all threads and mainly stores object instances and arrays. It may be divided internally into multiple Thread Local Allocation Buffers (TLABs). It can be physically discontinuous as long as it is logically contiguous.

OutOfMemoryError: Thrown if there is no memory in the heap to complete instance allocation and the heap can no longer be extended.
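A minimal sketch of filling the heap until that OutOfMemoryError appears; run it with small heap flags such as -Xms20m -Xmx20m so the error shows up quickly.

import java.util.ArrayList;
import java.util.List;

public class HeapOOM {
    static class OOMObject { }

    public static void main(String[] args) {
        List<OOMObject> list = new ArrayList<>();
        while (true) {
            list.add(new OOMObject());   // keep strong references so nothing can be collected
        }
    }
}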

1.1.5 Method area

It is a memory area shared by all threads that stores data such as class information loaded by the virtual machine, constants, static variables, and code compiled by the just-in-time compiler.

Now use a diagram to illustrate what each area stores.

1.1.6 Runtime constant pool

It is part of the method area and holds the various literals and symbolic references generated at compile time. Constants can be placed into the pool both at compile time and at run time (for example, String.intern()). An OutOfMemoryError is thrown when the pool can no longer obtain memory.
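A minimal sketch of intern() interacting with the runtime constant pool. Note that the exact output depends on the JDK version: since JDK 7 the string pool lives in the heap, so intern() may simply record a reference to the heap object instead of copying it.

public class RuntimeConstantPoolDemo {
    public static void main(String[] args) {
        String s1 = new StringBuilder("ja").append("va").toString();
        System.out.println(s1.intern() == s1);   // false on HotSpot: "java" is already pooled at VM startup

        String s2 = new StringBuilder("jvm-").append("notes").toString();
        System.out.println(s2.intern() == s2);   // true on JDK 7+: the pool now references s2 itself
    }
}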

1.1.7 Direct Memory

It is not part of the virtual machine's runtime data areas.

NIO (New Input/Output), introduced in JDK 1.4, provides channel- and buffer-based I/O that can use native libraries to allocate off-heap memory directly, and then operates on it through a DirectByteBuffer object stored in the Java heap as a reference to that memory. This avoids costly copying of data back and forth between the Java heap and the native heap. OutOfMemoryError: direct memory is limited by native memory; it occurs when the sum of all memory regions exceeds the physical memory limit and dynamic expansion fails.
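A minimal sketch of allocating direct (off-heap) memory through NIO; the total direct memory is bounded by -XX:MaxDirectMemorySize (which defaults to roughly the same size as -Xmx).

import java.nio.ByteBuffer;

public class DirectMemoryDemo {
    public static void main(String[] args) {
        // Allocates 1 MB outside the Java heap; the DirectByteBuffer object itself lives in the heap
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024);
        buffer.putInt(0, 1127);
        System.out.println(buffer.getInt(0));
    }
}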

1.2 Exploring objects in the HotSpot virtual machine

This section focuses on how objects are created, laid out in memory, and accessed.

1.2.1 Object Creation

The creation process is complicated, so reading the book itself is recommended; here is a personal summary.

When the virtual machine encounters a new instruction, it first checks whether the instruction's argument can locate a symbolic reference to a class in the constant pool, and whether the class represented by that symbolic reference has been loaded, resolved, and initialized. If not, the corresponding class-loading process is performed first. After the class-loading check passes, memory is allocated for the new object (the required size can be determined once class loading completes) by carving out a block of free memory in the heap, using either "pointer bumping" (when heap memory is regular) or a "free list" (when heap memory is fragmented). As mentioned above, each thread has a private allocation buffer (TLAB) in the heap, which largely avoids the thread-safety problems caused by frequent object creation under concurrency. Once allocated, the memory is initialized to zero (excluding the object header), and the object header is filled with information such as which class the object is an instance of, how to find the class's metadata, the object's hash code, and its GC generational age. After the new instruction, the <init> method runs before a usable object is truly created.

1.2.2 Memory Layout of objects

In the HotSpot virtual machine, an object's layout in memory has three parts: the object header, instance data, and alignment padding.

Header: contains two parts. The first stores the object's own runtime data, such as its hash code, GC generational age, lock state flags, the lock held by a thread, biased thread ID, biased timestamp, and so on; it is 32 bits wide on 32-bit VMs and 64 bits on 64-bit VMs, and is officially called the "Mark Word". The second part is the type pointer, a pointer to the object's class metadata, which the virtual machine uses to determine which class the object is an instance of. In addition, if the object is a Java array, the header must also record the array's length, because the size of a normal object can be determined from its class metadata while the size of an array cannot. Instance Data: the contents of the fields defined in program code, both those inherited from parent classes and those defined in the class itself. Padding: not necessarily present; it simply ensures that the object's size is a multiple of 8 bytes.

1.2.3 Object Access Positioning

When working with objects, specific objects on the heap are manipulated using reference data on the stack.

Access via a handle

A chunk of memory is allocated in the Java heap as a handle pool. Reference stores the address of the handle. See figure for details.

Access via a direct pointer

The object's address is stored directly in the reference.

Comparison: the biggest advantage of handles is that the reference stores a stable handle address; when objects are moved (during GC), only the instance-data pointer inside the handle changes, and the reference itself does not need to be modified. The biggest advantage of direct pointers is speed, since one level of pointer indirection is saved. Handles work well when objects are moved frequently by GC, while direct pointers work well when objects are accessed frequently.

1.3 Hands-on practice

// To be filled in

2. Garbage collector and memory allocation strategy

2.1 Overview

The program counter, virtual machine stack, and native method stack live and die with their thread (since they are thread-private), and stack frames are pushed and popped in an orderly way as methods enter and exit, so memory in these areas is allocated and reclaimed deterministically. The Java heap and the method area are different: the multiple implementation classes of an interface may need different amounts of memory, and the different branches of a method may need different amounts of memory too; we only know which objects are created while the program is running. Allocation and reclamation of this part of memory are dynamic, and this is the memory the garbage collector is concerned with.

2.2 Is the object dead?

Before reclaiming anything, the first thing to determine is which objects are "dead" and which are still "alive".

2.2.1 Reference counting method

A reference counter is added to each object. However, it is difficult for this approach to handle circular references.
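A minimal sketch of the circular-reference case that defeats reference counting: after the two local variables are cleared, objA and objB still reference each other, so their counters never drop to zero, yet both are unreachable, and HotSpot (which uses reachability analysis, not reference counting) can still collect them.

public class ReferenceCountingGC {
    public Object instance = null;

    public static void main(String[] args) {
        ReferenceCountingGC objA = new ReferenceCountingGC();
        ReferenceCountingGC objB = new ReferenceCountingGC();
        objA.instance = objB;            // A -> B
        objB.instance = objA;            // B -> A

        objA = null;
        objB = null;

        System.gc();                     // both objects can be reclaimed despite the cycle
    }
}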





2.2.2 Reachability analysis

Starting from a set of objects called "GC Roots", the search proceeds downward; the path traversed is called a reference chain. An object is unreachable (and thus eligible for collection) when no reference chain connects it to the GC Roots.

Objects that can serve as GC Roots:

  • Objects referenced in the virtual machine stack (in the local variable table of a stack frame)
  • Objects referenced by class static properties in the method area
  • Objects referenced by constants in the method area
  • Objects referenced by JNI (commonly called native methods) in the native method stack

2.2.3 References

The previous two ways of determining survival are both related to ‘references’. However, since JDK 1.2, the reference concept has been expanded, as described below.

The following four kinds of references decrease in strength in order:

Strong reference

Created by code such as Object obj = new Object(); as long as the strong reference exists, the object will never be collected.

Soft reference

Implemented by the SoftReference class. Softly referenced objects are added to the collection scope for a second round of collection just before the system is about to run out of memory.

Weak reference

Implemented by the WeakReference class. Weakly referenced objects survive only until the next garbage collection: when the collector runs, objects associated only with weak references are reclaimed regardless of whether memory is sufficient.

Phantom reference

Implemented by the PhantomReference class. An object instance cannot be obtained through a phantom reference; the only purpose of associating a phantom reference with an object is to receive a system notification when the object is reclaimed by the collector.
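A minimal sketch contrasting soft and weak references. System.gc() is only a hint, so the printed results are the usual outcome rather than a guarantee: a weakly referenced object is normally reclaimed at the next GC, while a softly referenced object survives until memory is about to run out.

import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class ReferenceDemo {
    public static void main(String[] args) {
        SoftReference<byte[]> soft = new SoftReference<>(new byte[1024]);
        WeakReference<byte[]> weak = new WeakReference<>(new byte[1024]);

        System.gc();                                  // suggest a collection

        System.out.println(soft.get() != null);       // usually true: plenty of memory left
        System.out.println(weak.get() != null);       // usually false: weak referents go at the next GC
    }
}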

2.2.4 To be or not to be

Even objects found unreachable by the reachability analysis are not "doomed to die"; at this point they are only "on probation". To actually declare an object dead, it must be marked at least twice. If an object is found to have no reference chain to the GC Roots after reachability analysis, it is marked for the first time and then filtered according to whether it is necessary to execute its finalize() method. If the object does not override finalize(), or finalize() has already been called by the virtual machine, both cases count as "not necessary to execute". If the object is judged necessary to execute finalize(), it is placed into a queue called F-Queue and later "executed" by a low-priority Finalizer thread automatically created by the virtual machine. "Executed" here means the virtual machine triggers the method but does not promise to wait for it to finish. finalize() is the last chance for an object to escape death: later the GC marks the objects in F-Queue a second time on a smaller scale, and if an object wants to save itself it only needs to re-associate itself with any object on a reference chain inside finalize(). Note that finalize() is called by the system at most once for any object.
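A minimal sketch of the classic "self-rescue in finalize()" demo: the object escapes the first collection by re-attaching itself to a reachable field, but because finalize() is only ever called once, the second attempt fails.

public class FinalizeEscapeGC {
    public static FinalizeEscapeGC SAVE_HOOK = null;

    @Override
    protected void finalize() throws Throwable {
        super.finalize();
        System.out.println("finalize method executed!");
        FinalizeEscapeGC.SAVE_HOOK = this;            // re-associate with a reachable reference
    }

    public static void main(String[] args) throws Throwable {
        SAVE_HOOK = new FinalizeEscapeGC();

        SAVE_HOOK = null;
        System.gc();
        Thread.sleep(500);                            // finalize() runs on a low-priority thread
        System.out.println(SAVE_HOOK != null ? "I am still alive :)" : "I am dead :(");

        SAVE_HOOK = null;
        System.gc();
        Thread.sleep(500);
        System.out.println(SAVE_HOOK != null ? "I am still alive :)" : "I am dead :(");  // dead this time
    }
}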

2.2.5 Reclaiming the method area

In the heap, especially the new generation, a single garbage collection can typically recover 70% to 95% of the space, while collection in the permanent generation is far less efficient. Garbage collection in the permanent generation mainly reclaims two kinds of content: discarded constants and useless classes.

Discarded constants: determined, in general, by whether anything still references the constant. Useless classes: all three of the following conditions must be met.

  • All instances of the class are already recycled, meaning that there are no instances of the class in the Java heap
  • The ClassLoader that loaded the class has been reclaimed
  • The java.lang.Class object corresponding to this Class is not referenced anywhere, and there is no way to access its methods through reflection anywhere

2.3 Garbage collection algorithm

Only the ideas are outlined here.

2.3.1 Mark-sweep algorithm

Mark the objects to be collected, then sweep them away directly.

Two disadvantages:

  • Marking and sweeping are both inefficient
  • It produces a lot of memory fragmentation

2.3.2 Copying algorithm

Divide the space into two halves and use only one at a time. When that block of memory is used up, the surviving objects are copied to the other block and the used block is cleaned up in one pass.

This fixes the drawbacks of the previous approach, but space utilization is low. Because most new-generation objects do not survive their first GC, there is no need to split the space 1:1. Instead, split it into a larger Eden space and two smaller Survivor spaces, using Eden and one Survivor at a time. During collection, the surviving objects in Eden and the in-use Survivor are copied into the other Survivor, and then Eden and the used Survivor are cleaned up. The default size ratio is 8:1:1, so only 10% of the space (one Survivor) is "wasted" each time. But what if more than 10% of the objects survive? An allocation guarantee strategy is used here: the extra objects go straight into the old generation.

2.3.3 Mark-compact algorithm

Unlike the copying algorithm used for the new generation, this algorithm is designed around the characteristics of the old generation: live objects are moved toward one end of memory, and everything beyond the boundary is cleaned up directly.

2.3.4 Generational collection

Memory is divided into several areas according to object lifetimes, generally a new generation and an old generation, and then a collection algorithm suited to the characteristics of each generation is chosen.

The new generation

Since a large number of objects die and only a few survive each garbage collection, the replication algorithm is a reasonable choice.

The old generation

In the old generation, objects have a high survival rate and there is no extra space to act as an allocation guarantee, so collection must use the mark-sweep or mark-compact algorithm.

2.4 HotSpot algorithm implementation

// To be filled in

2.5 Garbage collector

Collection algorithms are the theory of memory collection, while garbage collectors are the practice of memory collection.





2.5.1 Serial Collector

This is a single-threaded collector. This means that it only uses one CPU or one collection thread to complete the collection and must suspend all other worker threads until the collection is complete.

2.5.2 ParNew collector

Think of it as a multithreaded version of the Serial collector.

Parallel

Refers to multiple garbage collection threads working in parallel while the user thread is in a waiting state

Concurrent

Refers to user threads and garbage collection threads executing at the same time (not necessarily in parallel; they may alternate), with the user program continuing to run while the collection thread runs on another CPU.

2.5.3 Parallel Scavenge

This is a new-generation collector, also implemented with the copying algorithm, and it is a parallel multi-threaded collector as well.

The focus of collectors such as CMS is to minimize the time user threads are paused during garbage collection, whereas the goal of the Parallel Scavenge collector is a controllable throughput. As a throughput-first collector, it lets the virtual machine collect performance-monitoring information about the running system and dynamically adjust parameters such as pause times; this is the GC adaptive tuning strategy (GC Ergonomics).

2.5.4 Serial Old Collector

The old-generation version of the Serial collector: single-threaded, using the mark-compact algorithm.

2.5.5 Parallel Old Collector

Parallel Old is the old-generation version of the Parallel Scavenge collector: multi-threaded, using the mark-compact algorithm.

2.5.6 CMS Collector

The CMS (Concurrent Mark Sweep) collector is a collector whose goal is the shortest possible collection pause time. It is implemented based on the mark-sweep algorithm.

Operation steps:

  1. CMS Initial Mark: Marks objects to which GC Roots can be directly associated
  2. CMS Concurrent Mark: GC Roots Tracing
  3. CMS remark: Corrects changes during concurrent marking
  4. CMS Concurrent sweep

Disadvantages: sensitive to CPU resources; unable to handle floating garbage; the mark-sweep algorithm it uses produces space fragmentation.

2.5.7 G1 collector

A garbage collector aimed at server-side applications.

Advantages: parallelism and concurrency, generational collection, spatial integration, predictable pauses.

Operation steps:

  1. Initial Marking
  2. Concurrent Marking
  3. Final Marking
  4. Live Data Counting and Evacuation

2.6 Memory Allocation and Reclaiming Policy

2.6.1 Objects are allocated in Eden preferentially

Objects are allocated in the Eden area of the new generation. If thread-local allocation buffers are enabled, allocation happens in the thread's TLAB first. In a few cases objects are allocated directly in the old generation.

In general, the memory model of the Java heap looks like this:

New-generation GC (Minor GC)

Garbage collection that happens in the new generation; it is frequent and fast.

Old GC (Major GC/Full GC)

Garbage collection that happens in the old generation; a Major GC is often (though not always) accompanied by at least one Minor GC. A Major GC is typically more than ten times slower than a Minor GC.
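A minimal sketch of the allocation behaviour described above. Assuming flags such as -Xms20M -Xmx20M -Xmn10M -XX:+PrintGCDetails -XX:SurvivorRatio=8, the first three 2 MB arrays fit in Eden, and allocating the 4 MB array triggers a Minor GC.

public class AllocationDemo {
    private static final int _1MB = 1024 * 1024;

    public static void main(String[] args) {
        byte[] allocation1 = new byte[2 * _1MB];
        byte[] allocation2 = new byte[2 * _1MB];
        byte[] allocation3 = new byte[2 * _1MB];
        byte[] allocation4 = new byte[4 * _1MB];   // Eden is full: a Minor GC happens here
    }
}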

2.6.2 Large objects enter the old generation directly

2.6.3 Long-surviving objects enter the old generation

2.6.4 Dynamic object age determination

2.6.5 Space allocation guarantee

3. Java memory model and threads

3.1 Java memory model

The Java memory model masks the differences in memory access across various hardware and operating systems.

3.1.1 Interaction between main memory and working memory

| Operation | Acts on | Explanation |
| --- | --- | --- |
| lock | Main memory | Marks a variable as exclusively owned by one thread |
| unlock | Main memory | Releases a variable from the locked state so that it can be locked by other threads |
| read | Main memory | Transfers the value of a variable from main memory into the thread's working memory, for use by the subsequent load operation |
| load | Working memory | Puts the value obtained from main memory by the read operation into the working-memory copy of the variable |
| use | Working memory | Passes the value of a working-memory variable to the execution engine; performed whenever the VM encounters a bytecode instruction that needs the variable's value |
| assign | Working memory | Assigns a value received from the execution engine to a working-memory variable; performed whenever the VM encounters a bytecode instruction that assigns to the variable |
| store | Working memory | Passes the value of a working-memory variable to main memory, for use by the subsequent write operation |
| write | Main memory | Puts the value obtained from working memory by the store operation into the variable in main memory |

3.1.2 Special rules for Volatile variables

The keyword volatile is the lightest-weight synchronization mechanism provided by the Java virtual machine.

A variable declared volatile has two properties:

  1. It guarantees that the variable is visible to all threads. However, operations on it are not atomic, so it is not automatically safe under concurrency.

A volatile variable is only appropriate when the computed result does not depend on the variable's current value (or only a single thread ever modifies it) and the variable does not participate in invariants together with other state variables; otherwise atomicity still has to be ensured by locking (with synchronized or the atomic classes in java.util.concurrent).
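A minimal sketch of point 1: volatile guarantees visibility but not atomicity. race++ compiles to several bytecode instructions, so concurrent increments are lost and the final value is usually well below 20 * 10000.

public class VolatileTest {
    public static volatile int race = 0;

    public static void increase() {
        race++;                                       // not atomic: read, add, write back
    }

    private static final int THREADS_COUNT = 20;

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[THREADS_COUNT];
        for (int i = 0; i < THREADS_COUNT; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10000; j++) {
                    increase();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(race);                     // almost always less than 200000
    }
}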

  2. It forbids instruction-reordering optimization.

This is achieved by inserting memory barriers.
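A minimal sketch of why point 2 matters: double-checked locking is only safe when the instance field is volatile; otherwise another thread may observe a reference to a partially constructed object.

public class Singleton {
    private static volatile Singleton instance;

    private Singleton() { }

    public static Singleton getInstance() {
        if (instance == null) {                       // first check, no locking
            synchronized (Singleton.class) {
                if (instance == null) {               // second check, under the lock
                    instance = new Singleton();       // volatile forbids reordering of
                }                                     // allocation / init / field assignment
            }
        }
        return instance;
    }
}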

3.1.3 Special rules for long and double variables

The Java memory model requires the eight operations between main memory and working memory to be atomic, but there is a relaxed rule for 64-bit data types: the virtual machine is allowed to split reads and writes of 64-bit values that are not declared volatile into two 32-bit operations, that is, it may choose not to guarantee the atomicity of load, store, read, and write for such values. This is the so-called non-atomic treatment of long and double.

3.1.4 Atomicity, visibility and orderliness

A review of the characteristics the Java memory model is built around, and of how the operations above support them: atomicity, visibility, and ordering.

  • Atomicity (Atomicity)

The atomic variable operations guaranteed directly by the Java memory model are read, load, assign, use, store, and write, so accesses to primitive data types can basically be regarded as atomic. For atomicity over a wider scope there are lock and unlock, which the higher-level bytecode instructions monitorenter and monitorexit use implicitly; these are what the synchronized block is built on.

  • Visibility (Visibility)

When one thread modifies the value of a shared variable, other threads can immediately see the change. The underlying mechanism is synchronizing the modified value back to main memory (a volatile variable is refreshed from main memory before each use). Besides volatile, synchronized and final also provide visibility. The visibility of synchronized blocks comes from the rule that a variable must be synchronized back to main memory (store, write) before unlock is performed on it. The visibility of final means that once a final field has been initialized in the constructor, and the constructor has not leaked a reference to "this" (this-reference escape is dangerous because other threads might see the "half-initialized" object through it), the value of the final field is visible to other threads.

  • Ordering

Observed within a single thread, all operations appear ordered; observed from one thread looking at another, all operations appear unordered. The first half refers to "within-thread as-if-serial semantics"; the second half refers to the phenomena of instruction reordering and of the delay in synchronizing working memory with main memory. The Java language uses the keywords volatile and synchronized to guarantee ordering between threads: volatile itself forbids instruction reordering, while synchronized relies on the rule that a variable may be locked by only one thread at a time, which means two synchronized blocks holding the same lock can only be entered serially.

3.1.5 The happens-before principle

The happens-before principle is the primary basis for determining whether there is a data race and whether a thread is safe. Happens-before is a partial-order relation between two operations defined in the Java memory model.

Happens-before relations that exist naturally:

| Rule | Explanation |
| --- | --- |
| Program order rule | Within a thread, operations happen in the order of the control flow written in the program |
| Monitor lock rule | An unlock operation happens-before a subsequent lock operation on the same lock |
| Volatile variable rule | A write to a volatile variable happens-before a subsequent read of that variable |
| Thread start rule | The start() method of a Thread object happens-before every action of that thread |
| Thread termination rule | All operations in a thread happen-before the detection of that thread's termination (detected via Thread.join() returning or Thread.isAlive() returning false) |
| Thread interruption rule | A call to Thread.interrupt() happens-before the interrupted thread's code detects the interruption (detected via Thread.interrupted()) |
| Object finalization rule | The completion of an object's initialization (the end of its constructor) happens-before the start of its finalize() method |
| Transitivity | If operation A happens-before operation B and B happens-before C, then A happens-before C |

3.2 Java and threads

3.2.1 Implementation of threads

Implemented using kernel threads

Threads directly supported by the operating system kernel and switched by the kernel. Instead of using kernel threads directly, programs use a high-level interface to kernel threads called lightweight processes (LWP). Lightweight processes are threads in the usual sense, and each lightweight process is backed by a kernel-level thread.

Implemented using user threads

Broadly speaking, anything that is not a kernel thread can be considered a user thread, and therefore a lightweight process can be considered a user thread. In the narrow sense, it is completely built on user-space thread libraries and is not perceived by the kernel system.

Implemented using a mix of user threads and lightweight processes

See the figure.

Java thread implementation

This is implemented differently depending on the platform, but can be thought of as a Java thread mapping to a lightweight process.

3.2.2 Java Thread Scheduling

Collaborative thread scheduling

Thread execution time is controlled by the threads themselves: a thread actively notifies the system to switch after finishing its work. This is simple to implement, and since the switch points are known to the thread itself, there are basically no thread-synchronization problems. The disadvantage is that execution time is uncontrollable, and one badly behaved thread can easily block everything.

Preemptive thread scheduling

Each thread is allocated execution time by the system.

3.2.3 State Transition

Six states:

  • New (new)

A thread that has not been started since it was created.

  • Runnable

Runnable includes both Running and Ready in operating-system thread terms: a thread in this state may be executing or may be waiting for the CPU to allocate time to it.

  • Waiting indefinitely

Threads in this state are not allocated time by the CPU; they wait for other threads to wake up.

The following methods put a thread into the indefinite-wait state: 1. Object.wait() without a timeout. 2. Thread.join() without a timeout parameter. 3. LockSupport.park().

  • Timed Waiting

Threads in this state are also not allocated CPU time, but instead of waiting to be explicitly woken by other threads, they are automatically woken by the system after a certain amount of time.

The following methods put a thread into the timed-wait state: 1. Thread.sleep(). 2. Object.wait() with a timeout parameter. 3. Thread.join() with a timeout parameter. 4. LockSupport.parkNanos(). 5. LockSupport.parkUntil().

  • Blocked

The thread is blocked. The difference between the blocked state and the waiting states is that a blocked thread is waiting to acquire an exclusive lock, which happens when another thread gives the lock up, whereas a waiting thread is waiting for a period of time to elapse or for a wake-up action to occur. A thread enters the blocked state while the program is waiting to enter a synchronized region.

  • Terminated

The thread state of a terminated thread.
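A minimal sketch of observing these states through Thread.getState(); the RUNNABLE and TIMED_WAITING readings depend on timing, so treat the comments as the typical outcome.

public class ThreadStateDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(1000);                   // TIMED_WAITING while sleeping
            } catch (InterruptedException ignored) { }
        });
        System.out.println(t.getState());             // NEW: created but not started
        t.start();
        System.out.println(t.getState());             // usually RUNNABLE right after start()
        Thread.sleep(100);
        System.out.println(t.getState());             // TIMED_WAITING: inside sleep()
        t.join();
        System.out.println(t.getState());             // TERMINATED
    }
}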

4. Thread safety and lock optimization

// To be filled in

5. Class file structure

// To be filled in

Feeling a little lazy… let me just post some URLs first.

  1. Official: The class File Format
  2. Yishan: Java Virtual Machine Principles Diagram — 1.1 Basic organization structure of class files

6. VM class loading mechanism

The virtual machine loads the data describing the Class from the Class file into memory, verifies, parses, and initializes the data, and eventually forms Java types that the virtual machine can use directly.

In the Java language, loading, linking, and initialization of types are all performed while the program is running.

6.1 Class loading timing

Life cycle of a class (7 phases)

The order of the five phases loading, verification, preparation, initialization, and unloading is fixed. Only the resolution phase may, in some cases, start after initialization (to support runtime binding, also called dynamic or late binding).

There are five, and only five, cases in which a class must be initialized immediately (and loading, verification, and preparation naturally need to happen before that):

  1. When one of the four bytecode instructions new, getstatic, putstatic, or invokestatic is encountered and the class has not been initialized, initialization is triggered. Typical scenarios: instantiating an object with the new keyword, reading or writing a static field of a class (except static fields that are final and whose values were placed into the constant pool at compile time), and calling a static method of a class.
  2. When a reflective call is made to a class using the methods of the java.lang.reflect package.
  3. When initializing a class, if the parent class is not initialized, the initialization of the parent class is triggered first.
  4. When the virtual machine starts, the user specifies a main class (the one containing the main() method) to load, and the virtual machine initializes this main class.
  5. When using the dynamic language support of JDK 1.7: if the final resolution result of a java.lang.invoke.MethodHandle instance is a REF_getStatic, REF_putStatic, or REF_invokeStatic method handle, and the class that the handle corresponds to has not been initialized, its initialization must be triggered first.

The five cases above are called active references to a class. All other ways of referencing a class do not trigger initialization and are called passive references. A few examples:

public class SuperClass {
    static {
        System.out.println("SuperClass init!");
    }
    public static int value = 1127;
}

public class SubClass extends SuperClass {
    static {
        System.out.println("SubClass init!");
    }
}

public class ConstClass {
    static {
        System.out.println("ConstClass init!");
    }
    public static final String HELLOWORLD = "hello world!";
}

public class NotInitialization {
    public static void main(String[] args) {
        System.out.println(SubClass.value);
        /**
         * output: SuperClass init!
         * Referring to a static field of the parent class through a subclass does not trigger
         * initialization of the subclass; only the class that directly defines the field is initialized.
         */

        SuperClass[] sca = new SuperClass[10];
        /**
         * output: (nothing)
         * Referencing a class through an array definition does not trigger initialization of that class;
         * the virtual machine dynamically creates the array class at run time.
         */

        System.out.println(ConstClass.HELLOWORLD);
        /**
         * output: hello world!
         * Constants are stored in the constant pool of the calling class at compile time, so there is
         * no direct reference to the class that defines the constant and its initialization is not
         * triggered; "hello world!" was copied into NotInitialization's constant pool by compile-time
         * constant propagation.
         */
    }
}

6.2 The loading process of classes

6.2.1 Loading
  1. Obtain the binary byte stream that defines the class by its fully qualified name (from a ZIP package, the network, runtime computation, JSP generation, a database, etc.).
  2. Transform the static storage structure represented by this byte stream into the runtime data structure of the method area.
  3. Generate a java.lang.Class object in memory that represents the class and acts as the access entry for the class's various data in the method area.

Array classes are special: an array class itself is not created by a class loader but directly by the Java virtual machine. However, array classes are still closely related to class loaders, because the component type of an array class is ultimately loaded by a class loader. Array class creation follows these rules:

  1. If the component type of the array is a reference type, it is recursively class-loaded.
  2. If the component type of the array is not a reference type, the Java virtual machine associates the array class with the bootstrap class loader.
  3. The visibility of an array class is the same as that of its component type. If the component type is not a reference type, the visibility of the array class defaults to public.

The java.lang.Class object, although it is an object, is stored in the method area and serves as the external interface through which the program accesses the type data in the method area. Parts of the loading phase and the linking phase are interleaved in time, but their start order remains fixed.

6.2.2 Verification

This is the first step of linking; it ensures that the byte stream in the Class file conforms to the requirements of the current virtual machine and does not endanger its safety.

File format validation
  1. Does it start with the magic number 0xCAFEBABE
  2. Check whether the major and minor versions are within the processing range of the current VM
  3. Whether the constant pool contains constants of unsupported types (check the constant tag flag)
  4. Whether any index value points to a nonexistent constant or a constant of the wrong type
  5. Whether any CONSTANT_Utf8_info constant contains data that is not valid UTF-8
  6. Whether any parts of the Class file, or the file itself, have information deleted or extra information appended

Only after passing this stage of verification is the byte stream allowed to enter the method area for storage, so the following three verification stages all operate on the method area's storage structure rather than directly on the byte stream.

Metadata validation
  1. Does this class have a parent class (every class except java.lang.Object should have one)
  2. Does this class inherit from a class that is not allowed to be inherited (a final class)
  3. If the class is not abstract, does it implement all the methods required by its parent class or interfaces
  4. Do fields or methods in the class conflict with those of the parent class (e.g., overriding a final field in the parent, or illegal method overloading)

This stage mainly performs semantic verification of the class's metadata to ensure there is no metadata that violates the Java language specification.

Bytecode verification
  1. Ensure that the data types on the operand stack and the instruction sequence always work together (e.g., an int on the stack is never read as a long)
  2. Ensure that jump instructions do not jump to bytecode instructions outside the method body
  3. Ensure that type conversions in the method body are valid (it is safe to assign a subclass object to a superclass data type, and the reverse is illegal)

This is the most complex stage of verification; its main purpose is to determine, through data-flow and control-flow analysis, that the program semantics are legal and logical. In this phase the method bodies of the class are verified and analyzed to ensure that the methods of the class being verified do not do anything that endangers VM safety at run time.

Symbolic reference verification
  1. Whether a class can be found for the fully qualified name described by a string in the symbolic reference
  2. Whether the specified class contains the field or method matching the descriptor and simple name described in the symbolic reference
  3. Whether the accessibility (private, protected, public, default) of the classes, fields, and methods in symbolic references allows access by the current class

This final stage of verification happens when the virtual machine converts symbolic references into direct references, which takes place during resolution, the third stage of linking. Symbolic reference verification can be thought of as checking that information outside the class itself (the various symbolic references in the constant pool) matches, as listed above. Its purpose is to ensure that the resolution action can execute normally; if symbolic reference verification fails, a subclass of java.lang.IncompatibleClassChangeError is thrown, such as java.lang.IllegalAccessError, java.lang.NoSuchFieldError, or java.lang.NoSuchMethodError.

6.2.3 Preparation

This phase formally allocates memory for class variables and sets their initial values; the memory is allocated in the method area (class variables means static variables here, not instance variables).

For public static int value = 1127;, the value after the preparation phase is 0, because no Java methods have been executed yet: the putstatic instruction that assigns 1127 to value is compiled into the class constructor <clinit>() method, so value only becomes 1127 during initialization.

Zero values for the primitive data types:

| Data type | Zero value | Data type | Zero value |
| --- | --- | --- | --- |
| int | 0 | boolean | false |
| long | 0L | float | 0.0f |
| short | (short) 0 | double | 0.0d |
| char | '\u0000' | reference | null |
| byte | (byte) 0 | | |

Special case: if a ConstantValue attribute exists in the field's attribute table (as with public static final fields), the VM sets value to 1127 according to the ConstantValue during the preparation phase.
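A minimal sketch of that difference; the class and field names are illustrative only.

public class PreparationDemo {
    public static int value = 1127;               // 0 after preparation; 1127 is assigned by
                                                  // the putstatic in <clinit> during initialization
    public static final int CONSTANT = 1127;      // carries a ConstantValue attribute, so the VM
                                                  // already sets it to 1127 in the preparation phase
}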

6.2.4 Resolution

This stage is where the virtual machine replaces symbolic references in the constant pool with direct references.

  1. Symbolic reference: describes the referenced target with a set of symbols, which can be any form of literal.
  2. Direct reference: can be a pointer to the target, a relative offset, or a handle that can indirectly locate the target. Direct references are tied to the memory layout of the specific virtual machine implementation.

Resolution mainly targets seven kinds of symbolic references: classes or interfaces, fields, class methods, interface methods, method types, method handles, and call site specifiers, corresponding to seven constant types in the constant pool.

6.2.5 Initialization

Whereas the previous phases are driven entirely by the virtual machine, the initialization phase is when the Java code defined in the class actually begins to execute.

6.3 Class loaders

A class loader obtains the binary byte stream that describes a class by the class's fully qualified name.

6.3.1 Parent delegation model

From the perspective of the Java virtual machine, there are only two kinds of class loaders: the bootstrap class loader (implemented in C++ as part of the virtual machine itself), and all other class loaders (implemented in Java, independent of the virtual machine, and all inheriting from java.lang.ClassLoader).

  1. The bootstrap class loader loads the classes in the lib directory or in the path specified by -Xbootclasspath.

  2. The extension class loader loads the classes in lib/ext or in the path specified by the java.ext.dirs system variable.

  3. The application class loader is responsible for loading the classes specified on the user's classpath.
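The sketch below shows the delegation order itself. It is abridged from the JDK's java.lang.ClassLoader.loadClass() and is not standalone code, since parent, findLoadedClass(), findBootstrapClassOrNull(), findClass(), and resolveClass() are members or internals of ClassLoader: check the cache first, delegate to the parent (or the bootstrap loader), and only call findClass() as a last resort.

protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
        // First, check whether this loader has already loaded the class
        Class<?> c = findLoadedClass(name);
        if (c == null) {
            try {
                if (parent != null) {
                    // Delegate to the parent class loader first
                    c = parent.loadClass(name, false);
                } else {
                    // No parent: delegate to the bootstrap class loader
                    c = findBootstrapClassOrNull(name);
                }
            } catch (ClassNotFoundException e) {
                // The parent could not load the class; fall through
            }
            if (c == null) {
                // Only when every parent fails does this loader try to load the class itself
                c = findClass(name);
            }
        }
        if (resolve) {
            resolveClass(c);
        }
        return c;
    }
}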





6.3.2 Breaking the parent delegation model

Keyword: Thread Context ClassLoader
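A minimal, hypothetical sketch of how SPI-style code uses the thread context class loader to break parent delegation: core-library code (loaded by the bootstrap loader) asks the context class loader, normally the application class loader, to load a provider class it could never see through normal upward delegation. The provider class name below is made up for illustration.

public class ContextClassLoaderDemo {
    public static void main(String[] args) throws Exception {
        ClassLoader ccl = Thread.currentThread().getContextClassLoader();
        // Load a (hypothetical) provider implementation visible only on the application classpath
        Class<?> providerClass = Class.forName("com.example.SomeProvider", true, ccl);
        Object provider = providerClass.getDeclaredConstructor().newInstance();
        System.out.println("Loaded " + provider.getClass().getName()
                + " via " + providerClass.getClassLoader());
    }
}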

Final thoughts

After two cursory readings I could understand the content but had trouble remembering the details. I often searched the Internet whenever I hit an unfamiliar point, and knowledge that fragmented, without a system in my head, is not only harder to remember but also easier to confuse. By taking notes this way, however, I find things are much clearer, and even if I forget something later, the cost of picking it up again is much lower.

Although I have read several chapters this time, I have not finished writing notes for all of them; I will record the rest in my spare time as my understanding deepens. The content here comes from Zhou Zhiming's Understanding the Java Virtual Machine; if you are interested, get the paper edition.

Thank you for reading