
Introduction

Dear readers, the first three installments of this JVM series covered class loaders. This article moves on to another topic in the JVM world: the Java memory model. A thorough understanding of the Java memory model is essential. Once you understand it and the regions of memory it defines, you can better understand how Java creates objects and allocates space for them, which lays a solid foundation for later JVM tuning. In today's Internet industry, high concurrency and high availability have become essential, and JVM tuning skills help not only with optimizing high-concurrency systems at work but also with day-to-day troubleshooting and system optimization. I hope this article teaches you something practical. Thank you for your continued attention and support.

One: JDK architecture

The relationship between JDK, JRE, and JVM

JDK: Java Development Kit, which contains the JRE plus development tools such as javac and javah (the latter generates the C header and source files needed to implement native methods). JRE: Java Runtime Environment, which contains the JVM and the core class libraries. JVM: Java Virtual Machine, responsible for loading and executing conforming class files.

The cross-platform nature of the Java language

The location of the JVM

(1) In everyday work we mostly deal with Java libraries, applications, and the core class library, and we know how to use them. Ultimately, though, our code is compiled into class files that are loaded and executed by the Java virtual machine, and the results or behavior we observe can be explained by how the virtual machine works. The same code can even produce different results depending on the virtual machine implementation.

(2) Looking at the structure of the Java platform, the Java Virtual Machine (JVM) sits at its core, keeping programs independent of the underlying operating system and hardware. Below it is the porting interface, which consists of two parts: the adapter (the platform-dependent part) and the Java operating system. The JVM is implemented on a specific platform and operating system through this porting interface. Above the JVM sit Java's base and extension class libraries and their APIs. Applications and applets written against the Java API can run on any Java platform regardless of the underlying system, because the JVM separates the program from the operating system and thereby gives Java its platform independence.

(3) The JVM specification is an abstract specification: a collection of concepts described in detail in The Java Virtual Machine Specification. A JVM implementation is either software, or a combination of software and hardware; many vendors have produced implementations across multiple platforms. The task of running a particular Java program is carried out by a single runtime instance of the JVM.

(4) The JVM can be implemented by different vendors, and different vendors inevitably produce somewhat different implementations, such as the well-known TaobaoVM in China. Even so, the JVM remains cross-platform, thanks to the architecture it was designed around.

(5) The JVM has one explicit task during its lifetime: loading and running bytecode files. Once bytecode enters the virtual machine, it is interpreted by the interpreter or, optionally, compiled into machine code by the just-in-time (JIT) compiler; that is how Java programs are executed. When a Java program starts, an instance of the JVM is created; when the program ends, that instance disappears.

The Class bytecode

Compiled code to be executed by the Java virtual machine is represented in a platform-neutral (hardware- and operating-system-independent) binary format and is usually (though not always) stored as a file, hence the name Class file format. The Class file format precisely defines how classes and interfaces are represented, including detailed conventions that a platform-dependent object file format would otherwise dictate. In other words, Java defines its own binary format to achieve platform independence, and it is usually stored in a file called a class file. As a result, the same class file can run on any platform on which a Java virtual machine and Java runtime environment (JRE) are installed.

The Java platform is built from the Java virtual machine and the Java application programming interfaces, and the Java language is the gateway to the platform: programs written and compiled in Java run on it. Bytecode files are compiled from Java source files, and the process is quite involved. Anyone who has studied compiler theory knows it passes through lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and so on; compiling Java source to bytecode goes through the same steps. The final step invokes the compiler's code-generation class (com.sun.tools.javac.jvm.Gen in the javac implementation), which turns the syntax tree into a Java bytecode file. In essence, compiling to bytecode is nothing more than converting syntactically valid Java code into class files that conform to the JVM specification.

The architectural model of the JVM is stack-based, and most of its work is done through the stack. The bytecode structure is special: it contains no separators and no human-readable section breaks (the class file is meant to be read by a machine), so both the order and the number of bytes follow strict rules. All 16-, 32-, and 64-bit data are built from 2, 4, and 8 consecutive bytes respectively, and multi-byte items are always stored in big-endian order (the most significant byte at the lowest address, the least significant byte at the highest address). As described in the Java Virtual Machine Specification, Java SE 7 edition, each class file corresponds to exactly one class or interface definition. The specification uses a pseudo-structure, similar to a C struct, to describe the class file format; the basic types u1, u2, u4, and u8 denote unsigned values of 1, 2, 4, and 8 bytes respectively.

Class file: general format

It is worth mentioning that the first four bytes of a valid class file are fixed at 0xCAFEBABE and are called the "magic number". The JVM uses it to verify that the file it is reading is a valid class file; as you can see, the JVM does not rely on the file extension, which guards against manual renaming.
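As a quick illustration, here is a minimal sketch (the file path is just a placeholder) that reads the header of a compiled class file and checks the magic number. DataInputStream reads multi-byte values in big-endian order, which matches the class file format.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class MagicCheck {
    public static void main(String[] args) throws IOException {
        // Path to any compiled class file; adjust to your own build output.
        String path = args.length > 0 ? args[0] : "MagicCheck.class";
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            // Class files store multi-byte values in big-endian order,
            // which is exactly how DataInputStream reads them.
            int magic = in.readInt();           // u4: should be 0xCAFEBABE
            int minor = in.readUnsignedShort(); // u2: minor version
            int major = in.readUnsignedShort(); // u2: major version
            System.out.printf("magic=0x%08X, version=%d.%d%n", magic, major, minor);
            System.out.println("valid class file? " + (magic == 0xCAFEBABE));
        }
    }
}
```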

JVM underlying architecture diagram

The diagram above, which I put a lot of thought into, covers essentially the whole structure of the Java memory model. This article will walk through it piece by piece until the picture is clear.

Runtime data area:

1. The heap

The Java heap is created at virtual machine startup and is used to allocate memory for class instances and arrays. The Java Virtual Machine specification does not prescribe how objects are laid out in the heap. In practice, the heap is divided into two distinct regions: the young generation (Young) and the old generation (Old). This split comes from the "generational collection" approach used by the JVM: objects with different characteristics are stored and collected with different strategies, so the allocation mechanism and collection algorithm naturally differ. The young generation is further divided into three regions: Eden, From Survivor, and To Survivor.

Generational collection: different algorithms are used to store and reclaim short-lived versus long-lived Java objects. Most Java objects are short-lived; they are born and die quickly, are usually allocated in the young generation, and are reclaimed with a copying algorithm. Objects in the old generation tend to live long, in extreme cases as long as the JVM itself, and the old generation is usually collected with a mark-compact algorithm. The point of this partitioning is to let the JVM manage heap objects better, for both allocation and reclamation. The Java heap can raise the following error: if the heap required exceeds the maximum capacity the automatic memory management system can provide, the Java virtual machine throws an OutOfMemoryError (OOM for short).

Heap size = young generation + old generation. The size of the heap can be specified with the arguments -Xms (initial heap size) and -Xmx (maximum heap size).
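As a rough illustration of how these flags show up at run time, the following sketch prints what the running JVM reports through Runtime. Note that maxMemory() corresponds only approximately to -Xmx; depending on the collector, one survivor space may be excluded from the reported figure.

```java
public class HeapSize {
    public static void main(String[] args) {
        // Run with explicit sizes, e.g.:  java -Xms64m -Xmx256m HeapSize
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        System.out.println("max heap (roughly -Xmx):        " + rt.maxMemory() / mb + " MB");
        System.out.println("currently committed heap:       " + rt.totalMemory() / mb + " MB");
        System.out.println("free within the committed heap: " + rt.freeMemory() / mb + " MB");
    }
}
```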

The young generation is subdivided into Eden and two Survivor regions, named "from" and "to" to tell them apart. By default, Eden : from : to = 8 : 1 : 1 (this can be set with the parameter -XX:SurvivorRatio).

That is, Eden takes 8/10 of the young-generation space, and from and to each take 1/10.

The JVM uses only Eden and one of the two Survivor regions at a time to serve object allocation, so at any given moment one Survivor region is always empty.

The actual memory available for the new generation is 9/10 (90%) of the new generation space.

The Java heap is the primary area for garbage collection (GC). There are two types of GC: Minor GC and Full GC (also called Major GC).

A Minor GC is a garbage collection that occurs in the young generation and uses a copying algorithm. Garbage collection generally happens region by region in the heap, and the young generation is where almost all Java objects are born: it is where Java objects are allocated and stored. Most Java objects do not live long and die quickly; once an object is judged "dead", it is the GC's job to reclaim its memory. The young generation is therefore where garbage collection happens most frequently.
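The following sketch (class name and sizes are arbitrary) churns through short-lived allocations so that minor GCs in the young generation can be observed. The GC-logging flag differs by JDK version, as noted in the comments.

```java
import java.util.ArrayList;
import java.util.List;

public class YoungGenChurn {
    public static void main(String[] args) {
        // Run with GC logging to watch minor GCs in the young generation, e.g.:
        //   java -Xmx64m -verbose:gc YoungGenChurn      (classic flag)
        //   java -Xmx64m -Xlog:gc YoungGenChurn         (JDK 9+ unified logging)
        List<byte[]> survivors = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            // Most of these arrays become garbage immediately: typical
            // short-lived objects that die in Eden during minor GC.
            byte[] transientData = new byte[1024];
            if (i % 1000 == 0) {
                // Keep a small fraction alive so some objects age and
                // may eventually be promoted to the old generation.
                survivors.add(transientData);
            }
        }
        System.out.println("kept " + survivors.size() + " longer-lived arrays");
    }
}
```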

2. Method area (metaspace)

The method area is created at virtual machine startup and stores per-class structural information, such as the runtime constant pool, field and method data, the bytecode of constructors and ordinary methods, and the special methods used during class, instance, and interface initialization. The method area can raise the following error: if its memory cannot satisfy an allocation request, the Java virtual machine throws an OutOfMemoryError.
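A commonly used way to provoke this error is to keep defining classes so that their metadata accumulates in the metaspace. The sketch below is one illustrative approach, not an official recipe: it re-defines the same class bytes in ever-new class loaders and keeps references so the classes cannot be unloaded. The class and method names are made up for the demo.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class MetaspaceFiller {
    // Minimal loader that exposes defineClass for the demo.
    static class LeakyLoader extends ClassLoader {
        Class<?> define(byte[] bytes) {
            return defineClass(null, bytes, 0, bytes.length);
        }
    }

    public static void main(String[] args) throws Exception {
        // Run with a small metaspace to see the error quickly, e.g.:
        //   java -XX:MaxMetaspaceSize=32m MetaspaceFiller
        byte[] classBytes = readClassBytes("MetaspaceFiller$LeakyLoader.class");
        List<Class<?>> keepAlive = new ArrayList<>(); // prevents class unloading
        int count = 0;
        while (true) {
            // Every fresh loader defines its own copy of the class, so class
            // metadata piles up until java.lang.OutOfMemoryError: Metaspace.
            keepAlive.add(new LeakyLoader().define(classBytes));
            if (++count % 10_000 == 0) {
                System.out.println("defined " + count + " classes so far");
            }
        }
    }

    private static byte[] readClassBytes(String resource) throws Exception {
        try (InputStream in = MetaspaceFiller.class.getResourceAsStream(resource);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```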

3. JVM stack space

Each Java virtual machine thread has its own Java virtual machine stack. The stack stores stack frames, and a stack frame mainly contains a local variable table, an operand stack, and a dynamic link. The Java virtual machine stack may be implemented with a fixed size or with a dynamically expandable size.

The Java virtual machine uses the local variable table to pass parameters during method calls. The length of the local variable table is determined at compile time and stored in the binary representation of the class or interface. One local variable slot can hold a value of type boolean, byte, char, short, int, float, reference, or returnAddress; two slots together hold a value of type long or double.

The Java virtual machine provides bytecode instructions for copying constants, or the values of local variables and object fields, onto the operand stack, as well as instructions for taking data off the operand stack, operating on it, and pushing the result back. During a method call, the operand stack is also used to prepare the arguments of the called method and to receive its return value.

Each stack frame contains a reference into the runtime constant pool, used to support dynamic linking of the current method. In a class file, method calls and accesses to member variables are expressed as symbolic references; the purpose of dynamic linking is to convert these symbolic references into direct references to the actual method, or into the correct memory offsets for the variables being accessed.

In short, the Java virtual machine stack is where local variables and intermediate results live. It can raise the following errors: if the stack is implemented with a fixed size and a thread requires more stack space than the maximum allowed, the Java virtual machine throws a StackOverflowError; if the stack can expand dynamically but an attempted expansion cannot be satisfied, or there is not enough memory to create the stack for a new thread, the Java virtual machine throws an OutOfMemoryError.
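A StackOverflowError is easy to reproduce with unbounded recursion. The sketch below (names arbitrary) shows how the reachable depth shrinks as the per-thread stack size, set with -Xss, gets smaller.

```java
public class StackDepthDemo {
    private static int depth = 0;

    private static void recurse() {
        depth++;    // each call adds one stack frame
        recurse();  // never returns, so frames keep piling up
    }

    public static void main(String[] args) {
        // Try different stack sizes, e.g.:  java -Xss256k StackDepthDemo
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("StackOverflowError at depth " + depth);
        }
    }
}
```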

1. Symbolic References:

A symbolic reference describes the referenced target with a group of symbols. The symbols can be any form of literal, as long as they locate the target unambiguously. In a class file they appear as constants of types such as CONSTANT_Class_info, CONSTANT_Fieldref_info, and CONSTANT_Methodref_info. Symbolic references are independent of the virtual machine's memory layout, and the referenced target need not even be loaded into memory yet. In Java, a class is compiled into a class file; at compile time the class does not know the actual addresses of the classes it references, so it has to use symbolic references instead. For example, if org.simple.People refers to org.simple.Language, then at compile time People does not know the actual memory address of Language, so it can only use the symbol org.simple.Language (conceptually; in the class file it is actually represented by a constant such as CONSTANT_Class_info) to stand in for the Language class. Different virtual machine implementations may lay out memory differently, but they all accept the same symbolic references, because the literal form of symbolic references is explicitly defined by the class file format in the Java Virtual Machine specification.

2. Direct references:

A direct reference can be

(1) A pointer to a target (for example, a direct reference to a Class object, a Class variable, or a Class method may be a pointer to the method area)

(2) Relative offsets (e.g., direct references to instance variables and instance methods are offsets)

(3) A handle that can be indirectly located to the target

A direct reference is tied to the virtual machine's memory layout: the direct reference translated from the same symbolic reference may differ between VM instances. If a direct reference exists, its target must already be loaded in memory.

4. Native method stack

A running Java program may also use data areas associated with native methods. When a thread calls a native method, it enters a new world no longer constrained by the virtual machine: a native method can access the virtual machine's runtime data areas through the native method interface, but it can also do just about anything else it wants.

Native methods are inherently implementation-dependent, and the designers of a virtual machine implementation are free to decide what mechanism to use to let Java programs call native methods.

Any native method interface uses some kind of native method stack. When a thread calls a Java method, the virtual machine creates a new stack frame and pushes it onto the thread's Java stack. When it calls a native method, however, the virtual machine leaves the Java stack unchanged: no new frame is pushed. The virtual machine simply links dynamically to the specified native method and calls it directly.

If a virtual machine's native method interface uses the C linkage model, its native method stack is a C stack. When a C program calls a C function, the stack behavior is well defined: arguments are pushed in a particular order and the return value is passed back to the caller in a particular way. That is likewise how the native method stack behaves in such a virtual machine implementation.

The native method interface may well need to call back into Java methods in the Java virtual machine; in that case the thread saves the state of the native method stack and enters a Java stack again.

The following diagram depicts a scenario in which a thread calls a native method, which in turn calls back into another Java method in the virtual machine.

This diagram gives a panoramic view of a thread running inside the Java virtual machine. A thread might spend its whole life executing Java methods and manipulating its Java stack, or it might jump back and forth between the Java stack and the native method stack.

The thread first calls two Java methods; the second Java method calls a native method, which causes the virtual machine to use a native method stack. Suppose this is a C stack containing two C functions: the first C function is called as a native method by the second Java method and in turn calls the second C function. The second C function then calls back into a Java method (the third Java method) through the native method interface, and that Java method eventually calls another Java method, which becomes the current method in the figure.

A native method is, in essence, Java calling a native C/C++ library directly through JNI; you can think of a native method as an interface that C/C++ exposes to Java, which Java invokes to reach the C/C++ implementation. When a thread calls a Java method, the virtual machine creates a stack frame and pushes it onto the Java virtual machine stack; when it calls a native method, the Java virtual machine stack is left unchanged and no new frame is pushed. The virtual machine simply links dynamically to the specified native method and calls it directly.
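The following sketch does not write any JNI code itself; it simply uses reflection to show that many core JDK methods are declared native, that is, implemented in C/C++ behind the native method interface.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class NativeMethodScan {
    public static void main(String[] args) {
        // Many core JDK methods are implemented in C/C++ and declared
        // with the native keyword; reflection lets us spot them.
        for (Class<?> c : new Class<?>[] {Object.class, System.class, Thread.class}) {
            for (Method m : c.getDeclaredMethods()) {
                if (Modifier.isNative(m.getModifiers())) {
                    System.out.println(c.getSimpleName() + "." + m.getName() + " is native");
                }
            }
        }
    }
}
```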

5. Program counter

The program counter is a line-number indicator of sorts: it records which bytecode instruction the current thread is executing.

Bytecode compiled from Java code is executed by the bytecode interpreter until it is JIT-compiled. Put simply, the interpreter reads the bytecode loaded into memory and processes its instructions in order; after reading an instruction it "translates" it into a fixed operation, and branches, loops, jumps, and other control flow are carried out based on these operations.

From this description, one might conclude that the program counter is redundant: since execution follows the instruction order, and even branches and jumps simply continue in order from the target instruction, the execution order of the program seems fully determined. If the program always had exactly one thread, that objection would be right and no program counter would be needed. But programs are in fact executed cooperatively by multiple threads.

First we need to understand how the JVM implements multithreading. The JVM multiplexes threads using CPU time slices: threads take turns being switched in and allocated processor time. That means a thread may be suspended mid-execution because its time slice runs out, while another thread receives a slice and starts executing. When a suspended thread gets a time slice again, it must know where it last stopped so it can resume from there. In the JVM, the program counter records each thread's bytecode execution position; program counters are therefore thread-private, that is, each thread has its own independent counter.
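A small sketch of this interleaving: two threads run the same loop, the scheduler switches between them, and each one resumes at its own position, which is exactly the bookkeeping the per-thread program counter performs inside the JVM. The exact interleaving of the output depends on the scheduler.

```java
public class InterleavingDemo {
    public static void main(String[] args) throws InterruptedException {
        Runnable counter = () -> {
            for (int i = 0; i < 5; i++) {
                // When the scheduler switches this thread out and back in,
                // it resumes exactly here, at its own value of i: that
                // per-thread "where was I" is what the program counter records.
                System.out.println(Thread.currentThread().getName() + " step " + i);
            }
        };
        Thread t1 = new Thread(counter, "worker-1");
        Thread t2 = new Thread(counter, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```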

Program counter features

1. Thread isolation: each thread has its own independent counter.

2. When a Java method is executing, the program counter has a value: it records the address of the bytecode instruction being executed (see the description in the previous section).

3. When a native method is executing, the program counter's value is undefined (empty). A native method is Java calling a native C/C++ library directly through JNI; it is roughly an interface that C/C++ exposes to Java, and Java reaches the C/C++ implementation by calling it. Since such a method is implemented in C/C++ rather than Java, no bytecode is generated for it, and the memory used while executing the C/C++ code is managed by that language, not by the JVM.

4. The program counter footprint is so small that it can be ignored during JVM memory calculations.

5. The program counter is the only area for which the Java Virtual Machine specification does not prescribe any OutOfMemoryError conditions.

6. Thread stack

A thread stack, also called a thread call stack, is a snapshot of the state of the threads (including locks) in a virtual machine: the running state of every thread in the system at a given moment, including each thread's call stack and the locks it holds. Although the printed format varies between virtual machines, thread stack information includes:

1. Thread name, ID, number of threads, etc.

2. The running state of each thread and the lock state (which thread holds a lock, which threads are waiting for it, etc.)

3. The call stack, with fully qualified class names, the methods being executed, and source line numbers.

Because thread stacks are instantaneous snapshots of thread state and call relationships, stack information can be used to analyze problems such as thread deadlocks, lock contention, infinite loops, and time-consuming operations. A thread stack is an instantaneous record, so there is no history to trace back through; for that we generally need program logs. In general, thread stacks help analyze the following performance problems (a sketch of how to take such a dump from inside the process follows the list):

1. The system's CPU usage is abnormally high for no obvious reason

2. The system hangs and does not respond

3. The system is running slower and slower

4. Performance bottlenecks (e.g., insufficient CPU utilization)

5. Thread deadlocks, infinite loops, etc.

6. Memory overflow due to too many threads (e.g., being unable to create new threads)
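Besides jstack, a thread dump with states and lock information can also be taken from inside the process through the standard ThreadMXBean API; the sketch below prints one.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class InProcessThreadDump {
    public static void main(String[] args) {
        // Equivalent in spirit to `jstack <pid>`, but taken from inside the JVM.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // The two true flags also report locked monitors and synchronizers.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            System.out.println(info.getThreadName() + " : " + info.getThreadState());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```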

Thread stack state

A thread in a stack dump can be in one of six states:

1. NEW

2. RUNNABLE

3. BLOCKED

4. WAITING

5. TIMED_WAITING

6. TERMINATED

The six thread states are described in turn below.

1. NEW

The thread has just been created (via new) but start() has not yet been called. This state is almost never seen when we use jstack to take a thread dump, because it only exists right when the thread is created.

2. RUNNABLE

From the virtual machine's point of view the thread is in the running state, which means it is executing normally. Of course, it may be in the middle of a time-consuming operation, an I/O wait, or a CPU time-slice switch. In this state the thread is usually waiting for other system resources, not for locks or sleeps.

A thread in the RUNNABLE state is not necessarily consuming CPU. Take socket I/O: the thread is reading data from the network, and even though it is RUNNABLE, the I/O actually leaves it suspended most of the time, waking it only when data arrives. The suspension happens in native code, where the virtual machine has no visibility. Unlike an explicit sleep or wait call, which the virtual machine can track, a suspension inside native code leaves the virtual machine unable to know the thread's true state, so it is reported as RUNNABLE.

3. BLOCKED

The thread is blocked, waiting for a monitor lock. Usually this means the thread shares a lock with another thread: the lock is currently held by another thread executing a synchronized block or method, and this thread needs the same lock to enter its own synchronized code, so it blocks.

Real life examples:

You have an interview at Ali today. It's your dream job, one you've been eyeing for years. You get up in the morning, get ready, put on your best coat, and check yourself in the mirror. Then you walk into the garage and find your friend has already taken the car. You only have one car, so what do you do? In real life this might end in a fight over the car. Right now you're BLOCKED because your friend took the car, and you can't get to the interview.

This is BLOCKED. In technical terms, you are thread T1, your friend is thread T2, and the lock is the car. T1 is BLOCKED on the lock (the car in the example) because T2 has already acquired the lock.

4. WAITING

A thread holding a lock enters this state when it calls the lock's wait method and waits for another thread (the lock's owner) to call notify/notifyAll. The difference between BLOCKED and WAITING is that the former waits outside the critical section to get in, while the latter waits inside it to be notified. A thread also enters WAITING when it calls join on another thread, waiting for that thread to finish. A thread in WAITING consumes almost no CPU.

Real life examples:

A few minutes later you watch your friend drive back home; the lock is released. Now you realize it's almost time for your interview, but it's a long drive, so you floor it. The speed limit is 120 km/h and you're doing 160 km/h. Unfortunately, a traffic cop catches you speeding and pulls you over. Now you're in WAITING: you stop the car and sit there waiting for the cop to come over, write the ticket, and let you through. Basically, you can do nothing but wait for him to release you. You're stuck in WAITING.

In technical terms, you are thread T1 and the traffic cop is thread T2. You release your lock (in this case, you stopped the car) and stay in WAITING until the cop (T2) lets you go.

5. TIMED_WAITING

The thread is waiting with a timeout, via sleep, wait(timeout), join(timeout), or park. Unlike WAITING, TIMED_WAITING waits for a specified amount of time, so it can end either when the time expires or when an external change wakes it.

Real life examples:

Despite the drama on the way to the interview, you did a great job, impressed everyone, and landed the high-paying job. You go home and tell your neighbor about the new job and how excited you are. Your friend tells you he works in the same building and suggests you ride to work in his car. Sounds good, you think. So on your first day at Ali you drive over to your neighbor's house and park in front of it. You wait ten minutes, but your neighbor doesn't show up, so you drive on to work in your own car so you won't be late on day one. This is TIMED_WAITING.

In technical terms, you are thread T1 and your neighbor is thread T2. You release the lock (in this case, you stop driving) and wait a full ten minutes; if your neighbor T2 doesn't show up, you keep driving (experienced drivers mind their speed, and passengers remember to buy a ticket).

6. TERMINATED

The thread has terminated. We rarely see threads in this state when we use jstack to take a thread dump.
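To tie the six states together, here is a sketch (names and sleep times arbitrary) that drives threads into each state and prints Thread.getState(). The short sleeps make the result scheduling-dependent, so treat it as an illustration rather than a test.

```java
public class ThreadStateTour {
    public static void main(String[] args) throws InterruptedException {
        final Object lock = new Object();

        // NEW: created but not yet started.
        Thread fresh = new Thread(() -> { });
        System.out.println("NEW:           " + fresh.getState());

        // TERMINATED: started and already finished.
        fresh.start();
        fresh.join();
        System.out.println("TERMINATED:    " + fresh.getState());

        // RUNNABLE: busy spinning (daemon so the JVM can still exit).
        Thread busy = new Thread(() -> { while (true) { } });
        busy.setDaemon(true);
        busy.start();

        // TIMED_WAITING: sleeping with a timeout.
        Thread sleeper = new Thread(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException ignored) { }
        });
        sleeper.setDaemon(true);
        sleeper.start();

        // WAITING: wait() with no timeout; this releases the monitor.
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try { lock.wait(); } catch (InterruptedException ignored) { }
            }
        });
        waiter.setDaemon(true);
        waiter.start();
        Thread.sleep(200); // let the waiter enter wait() and release the lock

        // A holder grabs the monitor and keeps it...
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                try { Thread.sleep(60_000); } catch (InterruptedException ignored) { }
            }
        });
        holder.setDaemon(true);
        holder.start();
        Thread.sleep(200); // let the holder acquire the lock

        // ...so this thread is BLOCKED trying to enter the same monitor.
        Thread blocked = new Thread(() -> {
            synchronized (lock) { }
        });
        blocked.setDaemon(true);
        blocked.start();
        Thread.sleep(200); // let it hit the contended monitor

        System.out.println("RUNNABLE:      " + busy.getState());
        System.out.println("TIMED_WAITING: " + sleeper.getState());
        System.out.println("WAITING:       " + waiter.getState());
        System.out.println("BLOCKED:       " + blocked.getState());
    }
}
```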

1. Local variable table

The local variable table is a storage area for variable values; it holds method parameters and the local variables defined inside a method. Its capacity is measured in variable slots. The Java Virtual Machine specification does not define how much memory a slot must occupy, only that one slot must be able to hold a value of up to 32 bits.

When a Java program is compiled to a class file, the max_locals item in the Code attribute of each method records the maximum size of the local variable table that the method needs (the maximum number of slots).

One slot can hold a value of type boolean, byte, char, short, int, float, reference, or returnAddress. The reference type represents a reference to an object instance. The returnAddress type serves the jsr, jsr_w, and ret instructions and is rarely used any more.

The virtual machine accesses local variables by index, ranging from 0 to the maximum capacity of the table. Since one slot holds at most 32 bits, a variable of a 64-bit type (long or double) occupies two consecutive slots.
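As a concrete illustration of slot usage, consider the method below; the expected slot layout is given in the comments. Compiling it and inspecting with javap -v shows max_locals, and compiling with javac -g additionally shows the LocalVariableTable with names.

```java
public class SlotDemo {
    // For an instance method, slot 0 holds `this`.
    // long and double each take two consecutive slots.
    int layout(int a, long b, double c, Object d) {
        // Expected local variable slots (see `javap -v SlotDemo`, locals=8):
        //   slot 0: this    slot 1: a       slots 2-3: b
        //   slots 4-5: c    slot 6: d       slot 7: sum (local)
        int sum = a + (int) b + (int) c + (d != null ? 1 : 0);
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(new SlotDemo().layout(1, 2L, 3.0, "x")); // prints 7
    }
}
```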

2. Operand stack

The operand stack, also known as the operation stack, is a last-in-first-out (LIFO) stack. As with the local variable table, the maximum depth of the operand stack is written into the max_stack item of the method's Code attribute at compile time.

Each element of the operand stack can be any Java data type: 32-bit types occupy one stack entry and 64-bit types occupy two. At no point during the method's execution can the depth of the operand stack exceed the maximum set in max_stack.

When a method begins executing, its operand stack is empty. As the method runs and bytecode instructions execute, constants and the values of local variables or object fields are copied onto the operand stack, elements are popped off for computation, and results are written back to local variables or returned to the caller; in other words, a sequence of push and pop operations. A complete method execution usually consists of many such push/pop cycles.
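A tiny example of these push/pop cycles: for the static method below, the comments sketch roughly the bytecode that javap -c prints, showing operands being pushed, combined, and returned through the operand stack.

```java
public class OperandStackDemo {
    static int add(int a, int b) {
        // Roughly what `javap -c OperandStackDemo` shows for this method:
        //   iload_0   // push a (slot 0) onto the operand stack
        //   iload_1   // push b (slot 1) onto the operand stack
        //   iadd      // pop two ints, push their sum
        //   ireturn   // pop the sum and return it to the caller
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // 5
    }
}
```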

3. Dynamic linking

In a class file, when one method calls another, the symbolic references to the called methods need to be converted into direct references to their memory addresses; these symbolic references live in the runtime constant pool in the method area.

In the Java virtual machine stack, each stack frame holds a reference into the runtime constant pool for the method the frame belongs to. This reference is kept to support dynamic linking during method invocation.

Some of these symbolic references are converted into direct references during class loading or on first use; this is called static resolution. The rest are converted into direct references during each execution; this is called dynamic linking.
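The sketch below (class names invented for the example) contrasts the two cases: a static method call whose target can be resolved up front, and a virtual call whose actual target is only determined at run time from the object's class.

```java
public class DispatchDemo {
    static class Animal { String speak() { return "..."; } }
    static class Dog extends Animal { @Override String speak() { return "woof"; } }
    static class Cat extends Animal { @Override String speak() { return "meow"; } }

    // invokestatic: the symbolic reference can be resolved once, up front.
    static String greet() { return "hello"; }

    public static void main(String[] args) {
        System.out.println(greet());

        // invokevirtual: the same call site dispatches to different methods
        // depending on the runtime type, so the direct reference is found
        // dynamically through the object's actual class.
        Animal[] pets = { new Dog(), new Cat() };
        for (Animal pet : pets) {
            System.out.println(pet.speak());
        }
    }
}
```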

4. Static linking

With static linking, the linked content is baked into the generated executable; even if you delete the static library afterwards, the executable still runs. With dynamic linking, the content is not linked in ahead of time; instead it is looked up during execution, and the resulting executable does not contain the linked content, so if you delete the dynamic library the program will no longer run.

A statically linked library (lib) has the code it provides linked directly into the target program, so the program runs without needing other library files. Dynamic linking records in the target program which module (DLL) contains a called function and where the function lives in that file; at run time the corresponding function code is looked up in the DLL, so the DLL file must be present.

This article has introduced the concepts shown above. In the next article I will string these concepts together, covering, for example, how objects are created and how memory space is used for them. Thank you for your continued attention.

The Art of the JVM: Class Loaders, Part 1 (complete)

The Art of the JVM: Class Loaders, Part 2 (complete)

The Art of the JVM: Class Loaders, Part 3 (complete)

The Art of the JVM: Java Memory Management, Part 4 (this article)

In addition, I have written a series of introductions to the JVM in my public account. For more information, please follow my public account: Geek Time