The runtime data area

This series of articles addresses JVM questions that come up often in job interviews. They are common because they are fundamental: any programmer with some ambition is expected to be familiar with the features and principles by which the JVM works. I’ve had my fair share of setbacks and failures during the interview process, and I’ve summarized some common points that can be used both for interviews and to give readers an outline of the JVM.

In a programming language like C, the programmer has to allocate and release memory manually. Java is different: it has a garbage collector, which is responsible for freeing memory.

During the execution of a Java program, the Java virtual machine divides the memory it manages into different data areas. Let’s take a look at how a Java program is executed:

Images from https://www.cnblogs.com/dolphin0520/p/3613043.html

Java source files (with the .java suffix) are first compiled into bytecode files (with the .class suffix) by the Java compiler. The bytecode file for each class is then loaded by the JVM’s class loader and handed to the JVM execution engine for execution. The runtime data area is the memory the JVM uses to store the data and information needed while the program executes. So the memory management we often talk about in Java is the management of this space: how memory is allocated and reclaimed.

Main contents of this article:

JVM memory partitioning
- The heap
- Method area
- Run-time constant pool
- Java virtual machine stack
- Native method stack
- Program counter
- The stack and the heap
Direct memory
- Out-of-heap memory garbage collection mechanism
JVM class loading
- Class loading process
- Class loaders predefined by the JVM
- Parent delegation model
  - Parent delegation mechanism
  - Purpose of parent delegation
- Object creation
  - Object memory layout
- Object access location

JVM memory partitioning

The runtime data area is divided into thread-private areas and areas shared by all threads. The thread-private areas are the program counter, the Java virtual machine stack, and the native method stack. The shared areas are the Java heap and the method area, which contains the runtime constant pool.

Let’s take a look at each of these data areas in turn.

The heap

The heap holds object instances and is the largest area of memory managed by the JVM. The Java heap is shared by all threads and is created when the virtual machine starts. Its sole purpose is to hold object instances, and almost all object instances and arrays are allocated here. The Java heap is the primary area managed by the garbage collector and is therefore also known as the garbage-collected heap (GC heap). From a garbage collection point of view, the Java heap can be subdivided into the young generation and the old generation, since collectors now generally use generational collection algorithms. The young generation is further divided into the Eden space, the From Survivor space, and the To Survivor space. The purpose of this finer division is to reclaim memory more effectively or allocate it faster.

Method area

The method area, like the Java heap, is an area of memory shared by all threads. It stores data such as class information loaded by the virtual machine, constants, static variables, and code compiled by the just-in-time compiler. In the HotSpot virtual machine the method area is often called the permanent generation, but the two are not actually equivalent. It is just that the HotSpot design team chose to implement the method area with the permanent generation, so that HotSpot’s garbage collector could manage this memory in the same way as the Java heap. This turned out not to be a good idea, because it makes out-of-memory errors more likely (since JDK 8, HotSpot has replaced the permanent generation with Metaspace, which is allocated in native memory). Garbage collection is relatively rare in this area, but data that enters the method area does not stay there permanently.
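
To see how a running JVM splits its memory into heap and non-heap areas, the standard java.lang.management API can list the memory pools. The following is a minimal sketch; the class name MemoryPoolsDemo and its helper method are made up for illustration, and the pool names that come back depend on the JVM version and the garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of any library.
class MemoryPoolsDemo {
    /** Returns one line per memory pool in the form "heap <name>" or "non-heap <name>". */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String kind = (pool.getType() == MemoryType.HEAP) ? "heap" : "non-heap";
            lines.add(kind + " " + pool.getName());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Pool names vary by JVM and collector, so no fixed output is assumed here.
        describePools().forEach(System.out::println);
    }
}
```

On a HotSpot JVM the heap pools usually correspond to the generations described above, while areas such as the method area show up among the non-heap pools.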

Run-time constant pool

The runtime constant pool is part of the method area. In addition to the class version, fields, methods, interfaces, and other descriptive information, a Class file also contains a constant pool table, which stores the literals and symbolic references generated at compile time. After the class is loaded, this content is placed in the runtime constant pool of the method area.
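
One visible effect of compile-time literals being kept in the constant pool is that identical string literals resolve to the same String object. The sketch below illustrates this with a hypothetical class ConstantPoolDemo; it relies only on the language rules for string literals and String.intern(), and it says nothing about where the interned strings physically live, which differs between JDK versions.

```java
// Hypothetical demo class; the method names are illustrative only.
class ConstantPoolDemo {
    /** Two occurrences of the same literal refer to one object. */
    static boolean sameLiteral() {
        String a = "jvm";
        String b = "jvm";
        return a == b; // true: both come from the same constant pool entry
    }

    /** A constant expression is folded by the compiler into a single literal. */
    static boolean compileTimeConcat() {
        return "jvm" == ("j" + "vm"); // true
    }

    /** new String(...) always creates a separate object on the heap. */
    static boolean newStringIsDistinct() {
        return new String("jvm") != "jvm"; // true
    }

    /** intern() returns the pooled instance for equal content. */
    static boolean internReturnsPooled() {
        return new String("jvm").intern() == "jvm"; // true
    }

    public static void main(String[] args) {
        System.out.println(sameLiteral() && compileTimeConcat()
                && newStringIsDistinct() && internReturnsPooled());
    }
}
```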

Java virtual machine stack

The Java virtual machine stack is thread-private and has the same lifecycle as its thread. It describes the memory model of Java method execution: each method call creates a stack frame that stores the local variable table, the operand stack, dynamic linking information, the method exit, and so on. Java memory is often roughly divided into heap memory and stack memory; the “stack” in that sense is the virtual machine stack, or more precisely the local variable table part of it. The local variable table mainly holds the primitive data types known at compile time and object references.
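
Because every method call pushes a new frame onto the thread’s stack, unbounded recursion eventually exhausts it and the JVM throws StackOverflowError. The following sketch, using a hypothetical class StackDepthDemo, counts how many frames fit before that happens. The number is not fixed: it depends on the stack size (for example the -Xss option) and on the JVM.

```java
// Hypothetical demo class; catching StackOverflowError is done here only to illustrate the limit.
class StackDepthDemo {
    private static int depth;

    /** Recurses forever; each call adds one frame to the current thread's stack. */
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Returns how many nested calls succeeded before the stack overflowed. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The thread's stack is exhausted; the frames unwind back to here.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before overflow: " + measureDepth());
    }
}
```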

Native method stack

The native method stack plays a role very similar to that of the virtual machine stack. The difference is that the virtual machine stack serves the Java methods (that is, bytecode) executed by the virtual machine, while the native method stack serves the native methods it uses. A native method is an interface through which a Java program calls non-Java code. When you declare a native method you do not provide a body (somewhat like declaring a method in a Java interface), because the body is implemented outside Java, in another language. The native modifier can be combined with most other Java modifiers, but not with abstract.

We know that when a class is first used, its bytecode is loaded into memory, and it is loaded only once. At the entry to the loaded bytecode, the virtual machine maintains a list of descriptors for all the methods of the class. Each descriptor records where the method code is stored, what parameters the method takes, its modifiers (public and so on), and so on.

If a method is marked native, its descriptor block holds a pointer to the method’s implementation. These implementations live in DLL files (shared libraries on other platforms), which the operating system loads into the address space of the Java program. When a class with a native method is loaded, its associated DLL is not loaded, so the pointer to the method implementation is not yet set. The DLL is loaded only before the native method is called, which is done by calling java.lang.System.loadLibrary().

Note that using native methods has an overhead and gives up many of the benefits of Java. Use native methods only when there is no alternative.
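
The point that a native method has a declaration but no loaded implementation until its library is loaded can be sketched as follows. The class NativeMethodDemo and the method hypotheticalNativeAdd are invented for this example, and no native library actually provides the method, so calling it is expected to fail with UnsatisfiedLinkError.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical demo class; no native library backs hypotheticalNativeAdd.
class NativeMethodDemo {
    // Declared without a body, like an interface method; the body would live in native code.
    static native int hypotheticalNativeAdd(int a, int b);

    /** Checks through reflection that the method carries the native modifier. */
    static boolean isDeclaredNative() {
        try {
            Method m = NativeMethodDemo.class
                    .getDeclaredMethod("hypotheticalNativeAdd", int.class, int.class);
            return Modifier.isNative(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the call fails because no library was loaded via System.loadLibrary(). */
    static boolean callFailsWithoutLibrary() {
        try {
            hypotheticalNativeAdd(1, 2);
            return false;
        } catch (UnsatisfiedLinkError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("declared native: " + isDeclaredNative());
        System.out.println("fails without library: " + callFailsWithoutLibrary());
    }
}
```

Loading the class succeeds even though the implementation is missing; the failure only appears at the first call, which matches the lazy linking described above.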

Program counter

The program counter is a small memory space that can be viewed as a line number indicator for the bytecode being executed by the current thread. The bytecode interpreter works by changing the value of this counter to select the next bytecode instruction to execute. Branches, loops, jumps, exception handling, thread resumption, and other basic functions all rely on this counter. In addition, so that a thread can resume at the correct position after a thread switch, each thread needs its own program counter. The counters of different threads do not affect each other and are stored independently, which is why we call this kind of memory area “thread-private” memory.
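
How an interpreter uses a counter to pick the next instruction can be sketched with a toy machine. Everything below is hypothetical: the three opcodes and the class ToyInterpreter are invented for illustration and are not real JVM bytecode, but the loop shows how sequential execution, branches, and loops all come down to updating one counter.

```java
// Toy interpreter with an invented instruction set; not real JVM bytecode.
class ToyInterpreter {
    static final int INC = 0;          // acc = acc + 1
    static final int JUMP_IF_LESS = 1; // operands: target, limit; jump to target if acc < limit
    static final int HALT = 2;         // stop and return acc

    /** Runs the program and returns the accumulator when HALT is reached. */
    static int run(int[][] program) {
        int pc = 0;  // program counter: index of the next instruction
        int acc = 0; // single accumulator register
        while (true) {
            int[] insn = program[pc];
            switch (insn[0]) {
                case INC:
                    acc++;
                    pc++; // fall through to the next instruction in sequence
                    break;
                case JUMP_IF_LESS:
                    // A branch is just a different value written to the counter.
                    pc = (acc < insn[2]) ? insn[1] : pc + 1;
                    break;
                case HALT:
                    return acc;
                default:
                    throw new IllegalStateException("unknown opcode " + insn[0]);
            }
        }
    }

    /** A small loop: increment until the accumulator reaches 5. */
    static int[][] countToFive() {
        return new int[][] {
            {INC},
            {JUMP_IF_LESS, 0, 5},
            {HALT}
        };
    }

    public static void main(String[] args) {
        System.out.println(run(countToFive())); // prints 5
    }
}
```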

The stack and the heap

The stack solves the problem of program execution: how the program runs and how it processes data. The heap solves the problem of data storage: how and where data is kept. In Java, every thread has its own thread stack. This is easy to understand, because different threads execute different logic and therefore each need a separate stack. The heap, by contrast, is shared by all threads. The stack is the unit of execution, so what it stores relates to the current thread (or program): local variables, the running state of the program, method return values, and so on. The heap is only responsible for storing object information.

The Java heap is a runtime data area from which space for class instances (objects) is allocated. These objects are created by bytecode instructions such as new, newarray, anewarray, and multianewarray, and program code does not need to release them explicitly. The heap is managed by garbage collection. Its advantage is that memory can be allocated dynamically: lifetimes do not have to be known to the compiler in advance, because memory is allocated at run time and the garbage collector automatically reclaims data that is no longer used. The downside is slower access, precisely because memory is allocated dynamically at run time.

The advantage of the stack is that access is faster than the heap, second only to registers. The disadvantage is that the size and lifetime of the data on the stack must be known in advance, which makes it less flexible. The stack holds variables of primitive types (int, short, long, byte, float, double, boolean, char) and object references.
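
The difference between a primitive held in a stack frame and an object reached through a reference can be seen in a few lines. This is a small sketch with a hypothetical class StackHeapDemo: the int is copied into the callee’s frame, while the array reference is copied but still points at the same heap object.

```java
// Hypothetical demo class illustrating frame-local values versus heap objects.
class StackHeapDemo {
    static void bump(int local, int[] shared) {
        local++;     // changes only this frame's copy of the primitive
        shared[0]++; // changes the heap array that the caller also references
    }

    /** Returns {local, shared[0]} as seen by the caller after bump() has run. */
    static int[] demo() {
        int local = 1;      // primitive value stored in this frame
        int[] shared = {1}; // reference in this frame, array on the heap
        bump(local, shared);
        return new int[] {local, shared[0]};
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println(result[0] + " " + result[1]); // prints "1 2"
    }
}
```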

Direct memory

In Java, ByteBuffer is often used when we want to work with data at a lower level, usually as raw bytes. ByteBuffer provides two static factory methods:

public static ByteBuffer allocate(int capacity)  
public static ByteBuffer allocateDirect(int capacity) 

Why offer two ways? This has to do with how Java uses memory. The first kind, the heap ByteBuffer, is allocated in the JVM’s heap memory and is garbage collected directly by the Java virtual machine. The second kind, the direct ByteBuffer, is allocated in memory outside the virtual machine heap through native code. The NIO (New Input/Output) classes added in JDK 1.4 introduced an I/O model based on channels and buffers, which can allocate off-heap memory directly using native libraries and then operate on it through a DirectByteBuffer object stored in the Java heap as a reference to that memory. This can significantly improve performance in some scenarios because it avoids copying data back and forth between the Java heap and the native heap. Direct memory allocation is not limited by the size of the Java heap, but since it is still memory, it is limited by the total native memory and by the processor’s address space. You cannot see the usage of this memory with jmap; you can only observe it with a tool such as top.

Direct memory is not part of the virtual machine’s runtime data area, nor is it defined in the virtual machine specification, but it is used frequently and can also cause an OutOfMemoryError. Its capacity can be set with -XX:MaxDirectMemorySize. If the option is not specified, it defaults to the maximum size of the Java heap.
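
The two kinds of buffer can be told apart at run time with the standard isDirect() and hasArray() methods. The sketch below uses a hypothetical class BufferKindsDemo; the buffer size of 1024 bytes is arbitrary.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class comparing heap and direct buffers.
class BufferKindsDemo {
    /** Describes whether a buffer is direct and whether it is backed by an accessible byte[]. */
    static String describe(ByteBuffer buf) {
        return (buf.isDirect() ? "direct" : "heap") + ", hasArray=" + buf.hasArray();
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(1024);         // backed by a byte[] on the Java heap
        ByteBuffer direct = ByteBuffer.allocateDirect(1024); // backed by memory outside the heap
        System.out.println(describe(heap));
        System.out.println(describe(direct));
    }
}
```

A heap buffer exposes its backing array through array(), whereas a direct buffer has no such array because its contents are not stored in a Java object.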

Out-of-heap memory garbage collection mechanism

The memory of a direct ByteBuffer is reclaimed through full GC. When a direct ByteBuffer is allocated, the allocation code checks whether enough direct memory is left and calls System.gc() if it is not. However, the -XX:+DisableExplicitGC flag turns every System.gc() call into a no-op, so with that flag set this memory is not reclaimed in this way and we need to release it manually.

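    import java.lang.reflect.Field;
    import java.nio.ByteBuffer;
    import org.junit.Test;
    import sun.misc.Cleaner; // assumes JDK 8; later JDKs move this class and restrict the reflective access

    @Test
    public void testGcDirectBuffer() throws NoSuchFieldException, IllegalAccessException {
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024);
        // DirectByteBuffer keeps its Cleaner in a private field named "cleaner".
        Field cleanerField = buffer.getClass().getDeclaredField("cleaner");
        cleanerField.setAccessible(true);
        Cleaner cleaner = (Cleaner) cleanerField.get(buffer);
        // Free the off-heap memory right away instead of waiting for a GC.
        cleaner.clean();
    }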

In addition, CMS GC also reclaims the memory of direct ByteBuffers. CMS mainly collects the old generation.

JVM class loading

In Java, types are loaded, linked, and initialized at run time. This strategy adds some performance overhead to class loading, but provides a high degree of flexibility. Java’s nature as a dynamically extensible language relies on this dynamic loading and linking at run time.

The virtual machine loads the data describing a class from the Class file into memory, then verifies, converts, resolves, and initializes the data, finally forming Java types that the virtual machine can use directly. This is the class loading mechanism of the Java virtual machine. A Class file here means a binary stream of bytes, and each Class file represents one class or interface of the Java language.

Class loading process

The entire life cycle of a class, from being loaded into virtual machine memory to being unloaded from it, includes seven stages: Loading, Verification, Preparation, Resolution, Initialization, Using, and Unloading. Verification, Preparation, and Resolution are collectively referred to as Linking.

Loading finds and loads the binary data of the class. It is the first stage of the class loading process, in which the virtual machine does three things:
- Obtain the binary byte stream that defines the class, using the class’s fully qualified name;
- Transform the static storage structure represented by the byte stream into the runtime data structure of the method area;
- Generate a java.lang.Class object representing the class in the Java heap, as the access point to the data in the method area.
Verification ensures the correctness of the class being loaded. Its purpose is to make sure that the information in the byte stream of the Class file meets the requirements of the current virtual machine and does not compromise the security of the virtual machine itself. It consists of four checks: file format verification, metadata verification, bytecode verification, and symbolic reference verification.
- File format verification checks that the byte stream complies with the Class file format specification and can be processed by the current version of the virtual machine. The checks include whether the stream starts with the magic number, whether the major and minor version numbers are within the range this virtual machine can handle, whether the constant pool contains unsupported constant types, whether any part of the file has been deleted or has extra information appended, and so on. Only a byte stream that passes file format verification is stored in the method area, so the following three verification stages all work on the storage structure in the method area and no longer operate on the byte stream directly.
- Metadata verification performs semantic analysis of the information described by the bytecode, to ensure that it conforms to the Java language specification. The checks include: whether the class has a superclass (every class except java.lang.Object must have one); whether the superclass is a class that must not be inherited (a class modified by final); if the class is not abstract, whether it implements all the methods required by its superclass or interfaces; and whether the fields and methods of the class conflict with those of the superclass (overriding a final field of the superclass, improper method overloading, and so on). In short, this stage makes sure that no metadata violating the Java language specification exists.
- Bytecode verification analyzes data flow and control flow to determine that the program semantics are legal and logical. After the previous stage has verified the data types in the metadata, this stage verifies and analyzes the method bodies of the class, to ensure that its methods will not harm the security of the virtual machine when they run. The checks include ensuring that the operand stack’s data types and the instruction sequence always work together, that jump instructions do not jump outside the method body, and that type conversions inside the method body are valid. Even so, a method body that has passed bytecode verification is not guaranteed to be safe.
- Symbolic reference verification is the final check. It takes place when the virtual machine converts symbolic references into direct references, which happens in the third phase of linking, the resolution phase. It can be seen as checking that information outside the class itself (the various symbolic references in the constant pool) matches. The checks include: whether the class named by the fully qualified name string in a symbolic reference can be found; whether the specified class contains a method or field that matches the simple name and the descriptor; and whether the class, field, or method in the symbolic reference is accessible to the current class. The purpose is to ensure that resolution can proceed normally. If symbolic reference verification fails, a subclass of java.lang.IncompatibleClassChangeError is thrown, such as IllegalAccessError, NoSuchFieldError, or NoSuchMethodError.
Preparation allocates memory for the static variables of the class and initializes them to default values. It is the phase that formally allocates memory and sets initial values for class variables, all of which are allocated in the method area.
Resolution converts the symbolic references in a class into direct references. In this phase, the virtual machine replaces symbolic references in the constant pool with direct references. Resolution is mainly performed on references to classes or interfaces, fields, class methods, interface methods, method types, method handles, and call site specifiers.
Initialization assigns the correct initial values to the static variables of a class. The JVM is responsible for initializing classes, primarily their class variables.
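
The difference between the default value set during preparation and the value assigned during initialization can be observed with a small trick. In the sketch below, the hypothetical class PreparationDemo reads a static field through a method before that field’s own initializer has run, so it still sees the default value of zero.

```java
// Hypothetical demo class; static initializers run in textual order during initialization.
class PreparationDemo {
    // Runs first: at this point `value` only holds the default from the preparation phase.
    static int seenEarly = peek();

    // Runs second: the initialization phase assigns the value written in the source.
    static int value = 123;

    static int peek() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(seenEarly + " " + value); // prints "0 123"
    }
}
```

Reading the field through a method is what makes this compile; referring to value directly in the earlier initializer would be rejected as an illegal forward reference.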

Class loaders predefined by the JVM

- The Bootstrap class loader is implemented in native code and is responsible for loading the class libraries under <Java_Runtime_Home>/lib into memory. Because it involves details of the virtual machine’s native implementation, developers cannot obtain a reference to the bootstrap class loader directly.
- The Extension class loader is responsible for loading the libraries in <Java_Runtime_Home>/lib/ext, or in the location specified by the system property java.ext.dirs, into memory. Developers can use the extension class loader directly.
- The Application class loader is responsible for loading the class libraries on the user’s classpath.

In addition, there are user-defined class loaders, which are subclasses of java.lang.ClassLoader. While the program is running, such a subclass can load Class files dynamically, which reflects Java’s ability to load classes dynamically at run time.
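
The loader hierarchy can be inspected from code by starting at the application class loader and following getParent(). The sketch below uses a hypothetical class LoaderChainDemo. The concrete loader class names are not assumed, because they differ between JDK versions (from JDK 9 onward the extension class loader is replaced by a platform class loader); the bootstrap class loader is represented by null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class that walks the class loader parent chain.
class LoaderChainDemo {
    /** Lists loader class names from the application loader upward, ending with the bootstrap loader. */
    static List<String> chain() {
        List<String> names = new ArrayList<>();
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        while (loader != null) {
            names.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        names.add("bootstrap (null)"); // the bootstrap loader has no Java object
        return names;
    }

    /** Core classes such as String are loaded by the bootstrap loader, reported as null. */
    static boolean stringLoadedByBootstrap() {
        return String.class.getClassLoader() == null;
    }

    public static void main(String[] args) {
        chain().forEach(System.out::println);
        System.out.println(stringLoadedByBootstrap());
    }
}
```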

Parent delegation model

The parent delegation model works like this: when a class loader receives a class loading request, it does not try to load the class itself first, but delegates the request to its parent loader, and so on up the chain. All class loading requests are therefore eventually passed to the bootstrap class loader at the top, and a child loader tries to load the class itself only if its parent cannot find it.

Parent delegation mechanism

- When AppClassLoader is asked to load a class, it does not try to load the class itself first. Instead, it delegates the request to its parent class loader, ExtClassLoader.
- When ExtClassLoader is asked to load a class, it does not try to load the class itself first either. Instead, it delegates the request to the bootstrap class loader.
- If the bootstrap class loader fails to load the class, ExtClassLoader tries to load it.
- If ExtClassLoader also fails, AppClassLoader tries to load it. If AppClassLoader fails too, a ClassNotFoundException is thrown.
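
The delegation order described above can be observed with a user-defined loader. In this sketch the classes TracingLoader and DelegationDemo and the class name no.such.HypotheticalClass are all invented for illustration. The loader relies on the default loadClass() logic of java.lang.ClassLoader, which asks the parent first and calls findClass() only when the parent chain has failed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader that records when delegation falls through to it.
class TracingLoader extends ClassLoader {
    final List<String> findClassCalls = new ArrayList<>();

    TracingLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // Reached only after every parent loader has failed to load the class.
        findClassCalls.add(name);
        throw new ClassNotFoundException(name);
    }
}

// Hypothetical demo class exercising the loader.
class DelegationDemo {
    /** A core class is supplied by the parent chain, so findClass() is never called. */
    static boolean coreClassComesFromParent() {
        TracingLoader loader = new TracingLoader(ClassLoader.getSystemClassLoader());
        try {
            Class<?> c = loader.loadClass("java.lang.String");
            return c == String.class && loader.findClassCalls.isEmpty();
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** An unknown class falls through all parents, reaches findClass(), and finally fails. */
    static boolean unknownClassFallsThrough() {
        TracingLoader loader = new TracingLoader(ClassLoader.getSystemClassLoader());
        try {
            loader.loadClass("no.such.HypotheticalClass");
            return false;
        } catch (ClassNotFoundException e) {
            return loader.findClassCalls.contains("no.such.HypotheticalClass");
        }
    }

    public static void main(String[] args) {
        System.out.println(coreClassComesFromParent());
        System.out.println(unknownClassFallsThrough());
    }
}
```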

Purpose of parent delegation

The priority hierarchy prevents the same class from being loaded more than once. It also keeps Java programs safe and stable, because the types defined in the Java core API cannot be arbitrarily replaced.

Object creation

When the virtual machine encounters a new instruction, it first checks whether the instruction’s argument can locate a symbolic reference to a class in the constant pool, and whether the class represented by that symbolic reference has been loaded, resolved, and initialized. If not, the corresponding class loading process must be performed first.

After the class loading check passes, the virtual machine allocates memory for the new object. The amount of memory an object needs is fully determined once its class is loaded, so allocating space for an object amounts to carving a block of that size out of the Java heap. There are two allocation methods, bump-the-pointer (sometimes translated as “pointer collision”) and free list:

- Bump-the-pointer: used memory and free memory sit on either side of a pointer that marks the boundary, and allocation simply moves the pointer toward the free side by the size of the object.
- Free list: the virtual machine maintains a list of available memory blocks. On allocation it finds a block in the list that is large enough for the object and updates the list.

Which method is used depends on whether the Java heap is regular (free of fragmentation), which in turn depends on whether the garbage collector in use can compact memory. Because objects are created very frequently and from many threads, the virtual machine uses CAS (compare-and-swap) with retry on failure to keep the update atomic.
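
Bump-the-pointer allocation combined with CAS and retry can be sketched in plain Java. The class BumpAllocator below is a hypothetical model, not how HotSpot is implemented: it hands out offsets within an imaginary region instead of real memory, and it uses AtomicLong.compareAndSet as the CAS operation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of bump-the-pointer allocation; offsets stand in for addresses.
class BumpAllocator {
    private final long limit;                         // end of the region
    private final AtomicLong top = new AtomicLong(0); // boundary between used and free space

    BumpAllocator(long capacity) {
        this.limit = capacity;
    }

    /** Returns the start offset of the new block, or -1 if the region has no room left. */
    long allocate(long size) {
        while (true) {
            long start = top.get();
            long end = start + size;
            if (end > limit) {
                return -1; // not enough free space
            }
            // CAS: move the pointer only if no other thread moved it in the meantime.
            if (top.compareAndSet(start, end)) {
                return start;
            }
            // Another thread won the race; retry with the new pointer value.
        }
    }

    public static void main(String[] args) {
        BumpAllocator region = new BumpAllocator(32);
        System.out.println(region.allocate(16)); // prints 0
        System.out.println(region.allocate(8));  // prints 16
        System.out.println(region.allocate(16)); // prints -1 (only 8 bytes left)
    }
}
```

A failed compareAndSet means another thread allocated first, so the loop simply recomputes the block from the updated pointer. This is the "retry on failure" part of the scheme.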

Object memory layout

In the HotSpot virtual machine, the layout of an object in memory can be divided into three areas: the object header, the instance data, and the alignment padding.

- Object header. The object header in HotSpot contains two parts. The first part stores the object’s own runtime data (hash code, GC generational age, lock status flags, and so on). The second part is the type pointer, a pointer to the object’s class metadata, which the virtual machine uses to determine which class the object is an instance of.
- Instance data is the useful information the object actually stores, that is, the contents of the fields of various types defined in the program.
- Alignment padding is not always present and has no special meaning; it only serves as a placeholder. HotSpot’s automatic memory management requires the starting address of every object to be a multiple of 8 bytes, in other words the size of every object must be a multiple of 8 bytes. The object header is exactly a multiple of 8 bytes (1 or 2 times), so when the instance data part is not aligned, padding is added to fill it up.
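
The padding rule is simple arithmetic: round the object size up to the next multiple of 8. The sketch below uses a hypothetical class AlignmentDemo; the 16-byte header in the example is an assumption chosen to match the "1 or 2 times 8 bytes" statement above, and real header sizes depend on the JVM and its settings.

```java
// Hypothetical helper showing how a size is rounded up to an 8-byte boundary.
class AlignmentDemo {
    static final int ALIGNMENT = 8;

    /** Rounds size up to the next multiple of ALIGNMENT (which must be a power of two). */
    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) & ~(long) (ALIGNMENT - 1);
    }

    /** Number of padding bytes needed after the header and instance data. */
    static long padding(long size) {
        return alignUp(size) - size;
    }

    public static void main(String[] args) {
        long headerBytes = 16; // assumed header size for this example
        long oneByteField = 1;
        long raw = headerBytes + oneByteField;  // 17 bytes before padding
        System.out.println(alignUp(raw));       // prints 24
        System.out.println(padding(raw));       // prints 7
        System.out.println(padding(headerBytes)); // prints 0: already aligned
    }
}
```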

Object access location

Objects are created in order to be used. A Java program operates on a specific object in the heap through the reference data on the stack. How the object is reached is up to the virtual machine implementation; the two mainstream approaches are handles and direct pointers:

- With handles, a block of memory in the Java heap is set aside as a handle pool. The reference stores the handle address of the object, and the handle contains the addresses of the object’s instance data and of its type data.
- With direct pointers, the layout of the object in the Java heap must take into account where to place the information needed to access its type data, and the reference stores the object’s address directly.

Both access methods have their own advantages. The biggest benefit of handles is that the reference stores a stable handle address: when the object is moved, only the instance data pointer in the handle changes, and the reference itself does not need to be modified. The biggest advantage of direct pointers is speed, since they save the cost of one pointer indirection.

Summary

This article focused on the partitioning of the runtime data area and the class loading mechanism of the JVM. Once objects have been created in the JVM, how are useless objects reclaimed? What garbage collection algorithms does the JVM use, and what garbage collectors are there? The next article will explain this in detail.
