Hello everyone, this is Programmer Cxuan. Welcome to my new article. I have spent several days summarizing a round of basic JVM knowledge and interview questions for you, though the coverage is not exhaustive.

What are the main functions of the JVM?

The JVM (Java Virtual Machine) shields programs from the details of any particular operating system platform. A Java program only needs to be compiled into object code (bytecode) that runs on the Java Virtual Machine, and it can then run on different platforms.

Describe the memory area of Java?

During the execution of a Java program, the JVM divides the memory it manages into several different areas, some of which are thread private and some of which are thread shared. Java memory areas are also called runtime data areas and are divided as follows:

  • Virtual machine stack: The Java virtual machine stack is a thread-private data area with the same life cycle as its thread. It is also where local variables are stored. Each time a method executes, the JVM creates a stack frame for it in the virtual machine stack; the span from a method's invocation to its completion corresponds to one stack frame being pushed onto and popped off the stack.

  • Native method stack: The native method stack is also thread-private. It plays the same role as the virtual machine stack, but serves methods declared with the native keyword.

  • Program counter: The program counter is also thread-private. It stores the address of the instruction the thread is executing, and basic functions such as branching, looping, jumping, exception handling, and thread switching and recovery all rely on it.

  • Method area: The method area is shared by all threads. It stores class information loaded by the virtual machine, constants, static variables, and code compiled by the just-in-time compiler.

  • Heap: The heap is shared by all threads. It is the largest storage area in the JVM, and almost all object instances are allocated on it. Since JDK 7, the string constant pool has been moved out of the permanent generation and into the heap.

    Default memory allocation of the heap space:

    • Old generation: two-thirds of the heap space
    • Young generation: one-third of the heap space
      • Eden area: 8/10 of the young generation space
      • Survivor 0: 1/10 of the young generation space
      • Survivor 1: 1/10 of the young generation space

    To view the default JVM parameters, run the following command on the command line.

    java -XX:+PrintFlagsFinal -version

    The output is quite large, but only two lines reflect the memory allocation results above

  • Runtime constant pool: The runtime constant pool is part of the method area (the method area itself is often nicknamed "non-heap"). Constants are not required to be generated only at compile time: besides the constants placed in the pool at compile time, new constants can be added at run time. A typical example is the String intern method.
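To make the String intern example concrete, here is a minimal sketch. The class name InternDemo and the helper sameObject are made up for illustration; only String.intern and StringBuilder are real JDK APIs.

```java
public class InternDemo {
    // True when both references point to the very same String object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "jvm-pool";   // a literal, so it lives in the string pool
        // Built at run time: a separate String object with the same characters.
        String built = new StringBuilder("jvm-").append("pool").toString();

        System.out.println(sameObject(literal, built));          // false
        System.out.println(sameObject(literal, built.intern())); // true
    }
}
```

intern() returns the pooled copy, which is why the second comparison succeeds even though the string was assembled at run time.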

Describe the class loading mechanism in Java?

The Java virtual machine loads the data describing a class from the class file into memory, then verifies, converts, resolves, and initializes that data, finally producing a Java type that the virtual machine can use directly. This process is called the Java class loading mechanism.

A class's life cycle runs from the moment it is loaded into virtual machine memory to the moment it is unloaded.

That life cycle consists of five steps: loading, linking, initialization, use, and unloading.

The linking stage is subdivided into three phases: verification, preparation, and resolution. Loading, verification, preparation, initialization, and unloading begin in a fixed order, but resolution does not: in some cases it can start after initialization, in order to support the runtime binding feature of the Java language (also known as dynamic binding). The phases are also usually interleaved rather than run strictly one after another.

Let’s talk about these processes.

Loading

The Java Virtual Machine Specification does not mandate when loading must start, so that is left to each virtual machine implementation. Loading is the first stage of the entire class loading process, and in this stage the Java virtual machine needs to do three things:

  • Obtain the binary byte stream that defines a class, using the class's fully qualified name.
  • Convert the static storage structure represented by this byte stream into a runtime data structure in the method area.
  • Generate an in-memory Class object that represents the class and serves as the access point to that data in the method area.

The Java Virtual Machine Specification does not say where or how this binary byte stream must be obtained, so the industry has come up with many sources:

  • Reading it from a ZIP package, which eventually became the basis of the JAR, EAR, and WAR formats.
  • Obtaining it from the network; the most typical application was the Web Applet.
  • Generating it at run time; the most common use is dynamic proxy technology.
  • Generating it from other files, as with JSP, where a JSP file produces the corresponding class file.
  • Reading it from a database, which is a less common scenario.
  • Obtaining it from encrypted files, a typical protection against class files being decompiled.

The loading phase can be completed with either the VM's built-in bootstrap class loader or a user-defined class loader. Programmers can control how the byte stream is obtained by defining their own class loaders.

Array classes are not created by a class loader; the virtual machine constructs them directly in memory. However, the element type of the array (the type left after removing all dimensions) is ultimately loaded by a class loader.
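The relationship between an array class and its element type can be observed through Class.getClassLoader. This is a small sketch; ArrayLoaderDemo and Element are hypothetical names used only for illustration.

```java
public class ArrayLoaderDemo {
    static class Element {}   // hypothetical user-defined element type

    // An array class reports the same class loader as its element type.
    static boolean sharesLoaderWithElement() {
        return Element[][].class.getClassLoader() == Element.class.getClassLoader();
    }

    public static void main(String[] args) {
        // Primitive element type: the array class has no class loader at all.
        System.out.println(int[].class.getClassLoader());    // null
        // String is loaded by the bootstrap loader, which Java code sees as null.
        System.out.println(String[].class.getClassLoader()); // null
        System.out.println(sharesLoaderWithElement());       // true
    }
}
```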

Verification

The next step after loading is verification. The previous step produced a Class object in memory as the access point to the class data; this step ensures that the byte stream of the class file complies with the requirements of the Java Virtual Machine Specification, so that the information does not threaten the VM's security once it is run as code.

The verification stage is mainly divided into four checks:

  • File format verification.
  • Metadata verification.
  • Bytecode verification.
  • Symbolic reference verification.

File format verification

This phase may include the following verification points:

  • Whether the file begins with the magic number 0xCAFEBABE.
  • Whether the major and minor versions are within the range the current Java VM accepts.
  • Whether the constant pool contains any unsupported constant type.
  • Whether any index value points to a constant that does not exist or has the wrong type.
  • Whether any CONSTANT_Utf8_info constant contains data that is not valid UTF-8.
  • Whether any part of the class file, or the file itself, has had information deleted or appended.

In fact, there are more verification points than that, and these are just a few excerpts from HotSpot source code.
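The first two checks in the list above can be sketched in a few lines. This is only an illustration of the class file header layout (a 4-byte magic number, then a 2-byte minor version and a 2-byte major version), not HotSpot's actual verifier; the class name ClassHeaderCheck and the helper majorVersion are assumptions.

```java
import java.nio.ByteBuffer;

public class ClassHeaderCheck {
    static final int MAGIC = 0xCAFEBABE;

    // Returns the major version if the header starts with the magic number, otherwise -1.
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            return -1;                                      // too short to hold a header
        }
        ByteBuffer in = ByteBuffer.wrap(classBytes);        // big-endian by default, like class files
        if (in.getInt() != MAGIC) {
            return -1;                                      // wrong magic number
        }
        in.getShort();                                      // skip minor_version
        return in.getShort() & 0xFFFF;                      // major_version as an unsigned value
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        System.out.println(majorVersion(header)); // 52, the major version used by Java 8
        byte[] bogus = {0, 1, 2, 3, 0, 0, 0, 52};
        System.out.println(majorVersion(bogus));  // -1
    }
}
```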

Metadata verification

This stage performs semantic analysis on the information described by the bytecode, to ensure that it complies with the Java Language Specification. Verification points include:

  • Whether the class has a superclass (every class except Object should have one).
  • Whether the class's superclass is a class that may not be inherited from (a class declared final).
  • If the class is not abstract, whether it implements all the methods required by its superclass or interfaces.
  • Whether fields or methods conflict with the superclass, for example by overriding a final field or by improper overloading.

Keep in mind that this stage only checks conformance with the Java Language Specification.

Bytecode verification

Bytecode verification is the most complex phase. Its main purpose is to determine that the program semantics are legal and logical. In this phase the method bodies of the class (the Code attribute in the class file) are checked and analyzed. This part of the verification includes:

  • Ensuring that the data types on the operand stack always match the instructions that operate on them.
  • Ensuring that no jump instruction jumps to a bytecode instruction outside the method body.
  • Ensuring that type conversions in the method body are valid: assigning a subclass object to a superclass type is safe, but assigning a superclass object to a subclass type is not.
  • Other checks.

If a method body fails bytecode verification, it definitely has a problem. Passing bytecode verification, however, does not guarantee that the program is safe.

Symbolic reference verification

The last verification phase occurs when the virtual machine converts symbolic references into direct references, which happens in the third phase of linking, the resolution phase. Symbolic reference verification can be regarded as checking that the class matches all the information outside itself that it refers to. It mainly includes:

  • Whether a class can be found for the fully qualified name given as a string in the symbolic reference.
  • Whether the specified class contains a method or field that matches the given simple name and descriptor.
  • Whether the class, field, or method in the symbolic reference is accessible to the current class.
  • Other checks.

This stage mainly ensures that resolution can be performed properly. If symbolic reference verification fails, errors such as IllegalAccessError, NoSuchFieldError, and NoSuchMethodError are thrown.

The verification phase is very important to the virtual machine, but once code has passed it, verification has no further effect on the program at run time.

Preparation

The preparation phase allocates memory for the static variables of a class and sets their initial values. Conceptually, the memory used by these variables is allocated in the method area, which is a logical concept. In JDK 7 and earlier, HotSpot implemented the method area with the permanent generation. From JDK 8 on, class variables are stored in the Java heap along with the Class object.

Primitive and reference types receive their usual zero values at this point, for example 0 for int, false for boolean, and null for references.

In addition to the “usual” cases, there is an exception involving the ConstantValue attribute. If a class field has a ConstantValue attribute, the variable is initialized during preparation to the value that attribute specifies, for example

public static final int value = 666;

Here javac generates a ConstantValue attribute for value at compile time, so value is set to 666 in the preparation phase.
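The difference between an ordinary static variable and a compile-time constant can be observed by peeking at both before their declarations have run. This is a sketch with made-up names (PreparationDemo, plain, CONST); note that javac also inlines compile-time constants at their use sites, so the demo shows the observable effect rather than the ConstantValue mechanism in isolation.

```java
public class PreparationDemo {
    // Static initializers run in textual order, so both peeks execute
    // before the two declarations further down have been processed.
    static int seenPlain = peekPlain();
    static int seenConst = peekConst();

    static int plain = 123;        // assigned during initialization; still zero when peeked
    static final int CONST = 666;  // compile-time constant, never observed as zero

    static int peekPlain() { return plain; }
    static int peekConst() { return CONST; }

    public static void main(String[] args) {
        System.out.println(seenPlain); // 0
        System.out.println(seenConst); // 666
    }
}
```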

Resolution

The resolution phase is the process by which the Java virtual machine replaces symbolic references in the constant pool with direct references.

  • Symbolic reference: A symbolic reference describes the referenced target with a group of symbols. It can be any form of literal, as long as it locates the target unambiguously; it is independent of the virtual machine's memory layout.
  • Direct reference: A direct reference is a pointer that points directly to the target, a relative offset, or a handle that locates the target indirectly. Direct references depend on the virtual machine's memory layout, so the same symbolic reference generally translates to different direct references on different virtual machines. If a direct reference exists, its target must already be loaded into memory.

If you don’t understand, let me put it another way:

At compile time, each Java class is compiled into a class file, but the actual address of a referenced class is not known at that point, so symbolic references are used instead. The resolution phase converts those symbolic references into actual addresses.

The Java Virtual Machine Specification does not specify when the resolution phase occurs. It only requires that the symbolic references used by the following 17 bytecode instructions be resolved before the instructions are executed: anewarray, checkcast, getfield, getstatic, instanceof, invokedynamic, invokeinterface, invokespecial, invokestatic, invokevirtual, ldc, ldc_w, ldc2_w, multianewarray, new, putfield, and putstatic.

Resolution mainly covers four kinds of symbolic references:

  • Class or interface resolution
  • Field resolution
  • Method resolution
  • Interface method resolution

Initialization

Initialization is the last step in the class-loading process, and while the Java virtual machine has taken the lead in the previous stages, this step passes the initiative to the application.

For the initialization phase, the Java Virtual Machine Specification strictly states that class initialization can only be triggered under these six circumstances.

  • When one of four bytecode instructions (new, getstatic, putstatic, or invokestatic) is encountered and the class has not yet been initialized. Judging from their names, these four bytecodes correspond to three scenarios: creating an instance with the new keyword, reading or setting a static field, and calling a static method.
  • When a class is being initialized and its superclass has not been initialized yet, the superclass is initialized first.
  • When a reflective call is made on the class using the methods of the java.lang.reflect package.
  • When the virtual machine starts, the user specifies a main class to execute, and the virtual machine initializes that main class first.
  • When using the dynamic language support added in JDK 7, if a java.lang.invoke.MethodHandle instance finally resolves to a method handle of type REF_getStatic, REF_putStatic, REF_invokeStatic, or REF_newInvokeSpecial, and the class corresponding to that method handle has not been initialized, it is initialized first.
  • When an interface defines a default method, new in JDK 8 (an interface method declared with the default keyword), and a class implementing that interface is initialized, the interface is initialized before it.

In fact, there are only the first four that you need to know, the last two are less popular.
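The first two triggers can be watched with static initializer blocks that record when they run. The following is a sketch; InitOrderDemo, Parent, and Child are hypothetical names, and the log list is just a convenient way to observe the order.

```java
import java.util.ArrayList;
import java.util.List;

public class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static int parentField = 1;
        static { LOG.add("Parent"); }
    }

    static class Child extends Parent {
        static final int CONST = 3;   // compile-time constant
        static int childField = 2;
        static { LOG.add("Child"); }
    }

    // Touches the classes step by step and returns the initialization order seen so far.
    static List<String> run() {
        int c = Child.CONST;        // constant is inlined: no class is initialized
        int p = Child.parentField;  // getstatic on a field declared in Parent: only Parent is initialized
        int f = Child.childField;   // getstatic on Child's own field: Child is initialized now
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(run()); // [Parent, Child]
    }
}
```

Reading a static field through a subclass name initializes only the class that declares the field, which is why "Parent" is logged before Child's own field is touched.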

This is enough for class loading, but for completeness, let’s talk about the next two processes as well.

Use

There is not much to say about this stage: after initialization, the code is invoked dynamically by the JVM.

Unloading

When the Class object representing a class is no longer referenced, the life of that Class object ends and the corresponding data in the method area is unloaded.

⚠️ Note, however, that classes loaded by the JVM's own built-in class loaders are not unloaded, whereas classes loaded by user-defined class loaders can be.

How are objects created in the JVM?

When it comes to how objects are created, the answer we usually think of is simply “new one”. That answer is not limited to programming; it applies to other aspects of our lives too.

But you can’t just say “new one” in an interview, because the interviewer more likely wants you to explain what happens behind the scenes when the program executes the new instruction.

So you need to explain this from the perspective of the JVM.

When the virtual machine reaches a new instruction (a bytecode instruction), it first checks whether the instruction’s operand locates a symbolic reference to a class in the constant pool, and whether the class represented by that symbolic reference has been loaded, resolved, and initialized.

Symbolic references are used because you probably don’t know what the specific class is at this point.

If it is found that the class did not go through the above classloading process, the corresponding classloading process is performed.

After the class check is complete, the virtual machine will then allocate memory for the new objects, the size of which can be determined after the class is loaded (which I will cover in the interview questions below).

Allocating memory amounts to carving a block of fixed size out of the heap. Once the block is allocated, the virtual machine initializes all of it to zero. If a thread-local allocation buffer (TLAB) is used, this zeroing can be done in advance, when the TLAB itself is allocated. This step ensures that the object’s instance fields can be used in Java code without being assigned first.

Next, the Java virtual machine performs the necessary setup on the object, such as recording which class the object is an instance of, the object’s hash code, and the object’s GC generational age. This information is stored in the object header.

Once all of the above is done, a new object exists from the virtual machine’s perspective. But from the programmer’s perspective, object creation is just beginning, because the constructor, that is, the <init>() method in the class file, has not yet been executed, and all fields still hold their default zero values. After the new instruction, the <init>() method is executed to initialize the object as the programmer intends, and only then is the object fully constructed.

What are the memory allocation methods?

After the class is loaded, the virtual machine needs to allocate memory for new objects. Allocating memory for objects is like dividing a certain area of the heap, which involves the question of whether the heap to be divided is tidy.

Assume that the Java heap is tidy, with all used memory on one side, unused memory on the other, and a pointer in the middle marking the boundary. Allocating memory for a new object then just means moving the pointer toward the free space by a distance equal to the object’s size. This allocation method is called bump the pointer.

If the memory in the Java heap is not tidy, with used and unused memory interleaved, bump-the-pointer allocation cannot be used. The alternative is a free list: the virtual machine maintains a list recording which memory blocks are available, finds a block in the list large enough for the object instance, and updates the list.

So the choice between the two allocation methods depends on whether the Java heap is tidy. Collectors with a compaction step, such as Serial and ParNew, use bump-the-pointer allocation; the CMS collector, which is based on a sweep algorithm, uses a free list. We’ll talk about collectors later.
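The two strategies can be modeled in a few lines. This is a toy simulation over plain numbers, not how HotSpot is implemented; the class names, the first-fit search policy, and the use of -1 for "out of memory" are all assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class AllocationSketch {
    /** Bump the pointer: free space is one contiguous range [top, limit). */
    static class BumpAllocator {
        private long top;
        private final long limit;

        BumpAllocator(long start, long limit) {
            this.top = start;
            this.limit = limit;
        }

        long allocate(long size) {
            if (top + size > limit) {
                return -1;              // not enough contiguous space left
            }
            long address = top;
            top += size;                // move the boundary pointer by the object size
            return address;
        }
    }

    /** Free list: search a list of free blocks for the first one that fits. */
    static class FreeListAllocator {
        private final List<long[]> freeBlocks = new ArrayList<>(); // each entry is {start, size}

        void addFreeBlock(long start, long size) {
            freeBlocks.add(new long[] {start, size});
        }

        long allocate(long size) {
            for (int i = 0; i < freeBlocks.size(); i++) {
                long[] block = freeBlocks.get(i);
                if (block[1] >= size) {
                    long address = block[0];
                    block[0] += size;   // shrink the block from the front
                    block[1] -= size;
                    if (block[1] == 0) {
                        freeBlocks.remove(i);   // block fully used, drop it from the list
                    }
                    return address;
                }
            }
            return -1;                  // no block is large enough
        }
    }

    public static void main(String[] args) {
        BumpAllocator bump = new BumpAllocator(0, 32);
        System.out.println(bump.allocate(16)); // 0
        System.out.println(bump.allocate(8));  // 16

        FreeListAllocator list = new FreeListAllocator();
        list.addFreeBlock(0, 8);
        list.addFreeBlock(32, 64);
        System.out.println(list.allocate(16)); // 32, the first block is too small
        System.out.println(list.allocate(8));  // 0
    }
}
```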

Can you describe the memory layout of objects?

In the HotSpot virtual machine, an object’s memory layout is divided into three areas:

  • Object header (Header)
  • Instance data (Instance Data)
  • Alignment padding (Padding)

The memory distribution of these three areas is shown in the figure below

Let’s take a closer look at what’s in the above object.

Object header (Header)

The object header contains the Mark Word, the Klass Pointer, and, if the object is an array, the length of the array.

In a 32-bit virtual machine, the Mark Word, the Klass Pointer, and the array length each take up 32 bits, or 4 bytes.

In a 64-bit virtual machine, the Mark Word and the Klass Pointer each take up 64 bits, or 8 bytes (the Klass Pointer shrinks to 4 bytes when compressed class pointers are enabled), while the array length still takes 4 bytes.

On a 32-bit VM the Mark Word is laid out as follows in each lock state:

  • Unlocked: 25 bits store the object’s hash code, 4 bits store the generational age, 1 bit stores the biased-lock flag (0), and 2 bits store the lock flag, which is 01.
  • Biased lock: the same 25 bits are split into 23 bits for the thread ID and 2 bits for the epoch; 4 bits store the generational age; 1 bit stores the biased-lock flag (0 means unlocked, 1 means biased); and the lock flag is still 01.
  • Lightweight lock: 30 bits store a pointer to the lock record on the stack, and 2 bits store the lock flag, which is 00.
  • Heavyweight lock: as with the lightweight lock, 30 bits store a pointer, this time to the heavyweight lock (the monitor), and 2 bits store the lock flag, which is 10.
  • GC mark: the 30 bits are left unused, and the 2-bit lock flag is 11.

The lock flag bit of both no-lock and biased lock is 01, but the 1 bit in front distinguishes whether the state is no-lock or biased lock.
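The bit layout can be decoded with a few shifts and masks. The sketch below works on a plain int standing in for a 32-bit Mark Word; it does not read a real object header, and the names MarkWordSketch, State, age, and hash are made up for illustration. It assumes HotSpot's lock flag values: 01 for unlocked or biased, 00 for lightweight, 10 for heavyweight, and 11 for the GC mark.

```java
public class MarkWordSketch {
    enum State { UNLOCKED, BIASED, LIGHTWEIGHT_LOCKED, HEAVYWEIGHT_LOCKED, GC_MARKED }

    // The lowest 2 bits are the lock flag; the bit above them is the biased-lock flag.
    static State state(int markWord) {
        int lockBits = markWord & 0b11;
        boolean biased = ((markWord >>> 2) & 1) == 1;
        switch (lockBits) {
            case 0b00: return State.LIGHTWEIGHT_LOCKED;
            case 0b10: return State.HEAVYWEIGHT_LOCKED;
            case 0b11: return State.GC_MARKED;
            default:   return biased ? State.BIASED : State.UNLOCKED; // lock flag 01
        }
    }

    // 4 bits of generational age sit above the biased-lock flag.
    static int age(int markWord) {
        return (markWord >>> 3) & 0xF;
    }

    // In the unlocked state the top 25 bits hold the hash code.
    static int hash(int markWord) {
        return markWord >>> 7;
    }

    public static void main(String[] args) {
        int word = (0x123 << 7) | (5 << 3) | 0b01;   // hash 0x123, age 5, unlocked
        System.out.println(state(word));             // UNLOCKED
        System.out.println(age(word));               // 5
        System.out.println(Integer.toHexString(hash(word))); // 123
        System.out.println(state(word | 0b100));     // BIASED
    }
}
```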

The enumeration in the markOop.hpp file in OpenJDK provides a clue as to why the bits are allocated this way

To explain

  • age_bits is the generational age mentioned above, occupying 4 bits.
  • lock_bits is the lock flag, occupying 2 bits.
  • biased_lock_bits indicates whether the lock is biased, occupying 1 bit.
  • max_hash_bits is the number of bits available for the hash code in the unlocked state: 32 - 4 - 2 - 1 = 25 bits on a 32-bit VM, or 64 - 4 - 2 - 1 = 57 bits on a 64-bit VM. On a 64-bit VM, however, 25 bits are left unused, so the hash code occupies 31 bits.
  • hash_bits: if max_hash_bits is greater than 31, 31 is used; otherwise max_hash_bits is used.
  • cms_bits is 1 bit on a 64-bit VM and 0 bits otherwise.
  • epoch_bits is the size of the epoch field, which is 2 bits.

In the object header layout above, we can see several lock states: unlocked, biased locking, lightweight locking, and heavyweight locking. Lightweight locking and biased locking were added when the synchronized lock was optimized in JDK 1.6; their purpose is to improve lock performance, so from JDK 1.6 on synchronized is considerably cheaper. In terms of whether a lock is actually held, there are really only two states, unlocked and heavyweight-locked; biased and lightweight locking exist to improve performance and do not introduce a new kind of lock.

So our focus is on the heavyweight lock behind synchronized: an object is locked when its monitor is held by a thread. In the HotSpot virtual machine, the monitor is implemented by ObjectMonitor, whose main data structure is as follows (located in the objectMonitor.hpp file of the HotSpot source code and implemented in C++)

There are several fields to note in this C++ code: _WaitSet, _EntryList, and _owner. Each thread waiting to acquire the lock is wrapped in an ObjectWaiter object.

_owner points to the thread that holds the ObjectMonitor, and _WaitSet and _EntryList are the lists that hold the waiting threads.

So what’s the difference between these two lists? It will become clear once we walk through the lock acquisition process.

Two lists of locks

When multiple threads reach a synchronized block at the same time, they first enter the _EntryList. When a thread acquires the object’s monitor, the _owner field of the ObjectMonitor is set to that thread. If the thread calls wait(), it releases the monitor it holds: _owner is set to NULL, _count is decremented, and the thread enters the _WaitSet to wait until it is woken up. If the thread instead finishes the synchronized block, it also releases the monitor and _count is reset, but it does not enter the _WaitSet.
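From Java code the internal lists are not visible, but the effect of wait() releasing the monitor can be observed through thread states. The following is a minimal sketch; MonitorDemo and its fields are hypothetical names, and the mapping of Thread.State.WAITING to the _WaitSet is an interpretation of the description above rather than something the API exposes.

```java
public class MonitorDemo {
    private final Object lock = new Object();
    private boolean ready = false;   // guarded by lock

    // Returns the waiter's state while it is parked in wait() and after it has finished.
    String run() {
        Thread waiter = new Thread(() -> {
            synchronized (lock) {                 // acquire the monitor
                while (!ready) {
                    try {
                        lock.wait();              // release the monitor and wait to be notified
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }                                     // leaving the block releases the monitor
        });
        try {
            waiter.start();
            Thread.State observed = waiter.getState();
            while (observed != Thread.State.WAITING) {
                Thread.sleep(1);                  // poll until the waiter is parked in wait()
                observed = waiter.getState();
            }
            // The waiter has released the monitor, so this thread can acquire it.
            synchronized (lock) {
                ready = true;
                lock.notifyAll();                 // wake the waiter so it can compete for the monitor again
            }
            waiter.join();
            return observed + " -> " + waiter.getState();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(new MonitorDemo().run()); // WAITING -> TERMINATED
    }
}
```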

The Klass Pointer is a type pointer: a pointer to the object’s class metadata, which the virtual machine uses to determine which class the object is an instance of.

If you don’t really understand what a pointer is, you can simply say that a pointer is an address that points to some data.

Instance Data

The instance data is the information the object actually stores, that is, the contents of the fields defined in the code; its size is the sum of the field sizes, such as 1 byte for a byte and 4 bytes for an int.

Alignment Padding

Alignment padding is not necessarily present and carries no meaning of its own; it merely acts as a placeholder. The HotSpot JVM requires that an object’s starting address be an integer multiple of 8 bytes, which means that the object’s size must also be an integer multiple of 8 bytes.
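The rounding rule is simple arithmetic. Here is a small sketch of rounding a size up to the next multiple of 8; AlignmentSketch and alignUp are made-up names, and the sample sizes are arbitrary.

```java
public class AlignmentSketch {
    static final int ALIGNMENT = 8;   // bytes, as required by the HotSpot JVM

    // Rounds size up to the next multiple of ALIGNMENT (ALIGNMENT must be a power of two).
    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) & ~(long) (ALIGNMENT - 1);
    }

    public static void main(String[] args) {
        System.out.println(alignUp(12)); // 16: 4 bytes of padding are added
        System.out.println(alignUp(16)); // 16: already aligned, no padding
        System.out.println(alignUp(17)); // 24: 7 bytes of padding are added
    }
}
```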

What are the ways in which objects can be accessed and positioned?

We create an object, of course, in order to use it, but once an object is created, how is it accessed in the JVM? There are generally two ways to access: through a handle and through a direct pointer.

  • If handle access is used, a block of memory in the Java heap is set aside as a handle pool. The reference stores the address of the object’s handle, and the handle contains the addresses of the object’s instance data and of its type data. As shown in the figure below.

  • If direct pointers are used, the reference on the stack stores the address of the object in the heap directly, so accessing the object itself costs no extra indirection. The object’s layout in the heap must then hold a pointer to its type data, which lives in the method area. As shown in the figure below

The two access methods each have their advantages. The biggest advantage of handles is that the reference stores a stable handle address: when the object is moved, only the instance-data pointer inside the handle has to change, not the reference itself.

Direct pointers are faster because they save the cost of one pointer lookup; since object access is so frequent in Java, that saving is well worth having.

The object’s type data is the type of the object, its parent class, interface and method implemented, etc.

How do I determine if an object is dead?

As we all know, almost all objects are allocated on the heap, and when we no longer use an object, the garbage collector reclaims it ♻️. So how does the JVM determine which objects are “garbage”?

There are two ways to do this. Let’s start with the first one: reference counting.

Reference counting works by adding a reference counter to each object. The counter is incremented each time a reference to the object is created and decremented each time a reference becomes invalid; an object whose counter is zero can no longer be used. Although this approach is simple and usually efficient, mainstream Java virtual machines such as HotSpot do not use it, because reference counting cannot handle circular references between objects.

The circular reference problem is simply this: two objects reference each other and nothing else references them. Their counters never drop to zero, so a reference-counting collector can never reclaim them.
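The cycle problem is easy to reproduce with a hand-rolled counter. The JVM does not keep such counters; the sketch below only simulates what a reference-counting collector would see, and every name in it (RefCountSketch, Node, addRef, dropRef) is hypothetical.

```java
public class RefCountSketch {
    static class Node {
        int refCount;   // the counter a reference-counting collector would maintain
        Node ref;       // a reference to another object
    }

    static void addRef(Node target)  { target.refCount++; }
    static void dropRef(Node target) { target.refCount--; }

    // Builds a two-object cycle, drops the outside references, and returns both counters.
    static int[] simulateCycle() {
        Node a = new Node(); addRef(a);   // referenced by a local variable
        Node b = new Node(); addRef(b);   // referenced by a local variable

        a.ref = b; addRef(b);             // a references b
        b.ref = a; addRef(a);             // b references a

        dropRef(a);                       // the local variables go away...
        dropRef(b);
        return new int[] {a.refCount, b.refCount};   // ...but the cycle keeps both counters at 1
    }

    public static void main(String[] args) {
        int[] counts = simulateCycle();
        System.out.println(counts[0] + " " + counts[1]); // 1 1
    }
}
```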

Another way to tell if an object is useless is the reachability analysis algorithm.

Today’s mainstream JVMs all use the reachability analysis algorithm. Its basic idea is to take a set of root objects called GC Roots as the starting nodes and search downward from them along reference relationships; the path a search follows is called a reference chain. If no reference chain connects an object to the GC Roots, that is, if the object is unreachable from the GC Roots, the object is considered useless and can be garbage collected.

This reference relationship is as follows

As shown in the figure above, the traversal starts from the GC Roots. Objects 1, 2, 3, and 4 are reachable through reference relationships, while objects 5, 6, and 7 have no reference chain to the GC Roots, so they are considered reclaimable.
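Reachability analysis is essentially a graph traversal. Below is a minimal sketch over object numbers; the class name, the map-based graph, and in particular the edges between the objects are assumptions chosen to match the description (objects 1 to 4 reachable, 5 to 7 not), since the exact edges of the figure are not given in the text.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ReachabilitySketch {
    // refs maps an object to the objects it references; rootRefs are the objects
    // referenced directly by the GC Roots. Returns every object reachable from them.
    static Set<Integer> reachable(Map<Integer, List<Integer>> refs, Collection<Integer> rootRefs) {
        Set<Integer> marked = new TreeSet<>();
        Deque<Integer> pending = new ArrayDeque<>(rootRefs);
        while (!pending.isEmpty()) {
            int obj = pending.pop();
            if (marked.add(obj)) {       // first visit: follow its outgoing references
                pending.addAll(refs.getOrDefault(obj, Collections.emptyList()));
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> refs = new HashMap<>();
        refs.put(1, Arrays.asList(2, 3));   // hypothetical edges
        refs.put(3, Arrays.asList(4));
        refs.put(5, Arrays.asList(6, 7));   // 5, 6, 7 reference each other...
        refs.put(6, Arrays.asList(5));      // ...but nothing reachable references them

        System.out.println(reachable(refs, Arrays.asList(1))); // [1, 2, 3, 4]
    }
}
```

Unlike reference counting, the cycle between objects 5 and 6 does not keep them alive here, because the traversal never reaches them.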

In the Java technology system, objects that can serve as GC Roots mainly include

  • Objects referenced in the virtual machine stack (the local variable table in a stack frame).

  • Objects referenced by static fields of classes in the method area, such as reference-type static variables of a Java class.

  • Objects referenced by constants in the method area, such as references in the string constant pool.

  • Objects referenced by JNI in the native method stack.

  • References held inside the JVM, such as the Class objects corresponding to primitive types, resident exception objects such as NullPointerException and OutOfMemoryError, and the system class loader.

  • All objects locked by synchronized.

  • Objects that reflect the JVM’s internal state, such as JMXBeans, callbacks registered in JVMTI, and the native code cache.

  • Depending on the garbage collector the user chose and the memory region currently being reclaimed, other objects may also be added temporarily to make up the complete set of GC Roots.

Although the two methods above judge whether an object can be reclaimed in different ways, both reference counting and reachability analysis from GC Roots depend on references.

This involves strong references, soft references, weak references, and phantom references. You can read this article by the author

Be careful not to end up in the trash.

How do you identify a class that is no longer in use?

Determining a type as a “class that is no longer in use” requires the following three criteria

  • All instances of the class have been reclaimed, that is, the Java heap contains no instances of the class or of any of its subclasses.
  • The class loader that loaded the class has been reclaimed. This is generally hard to achieve unless the class loader was designed for it, as in OSGi or JSP reloading.
  • The Class object for the class is not referenced anywhere, so the class’s fields and methods cannot be accessed through reflection anywhere.

A virtual machine is allowed to reclaim useless classes that meet all three conditions.