The JVM is a must-have skill for both Java programmers and big data developers. It comes up often in interviews, and it also matters in real-world application tuning and in troubleshooting problems such as memory overflow, stack overflow, and memory leaks. I will cover the JVM in a series of articles.

This article focuses on JVM memory management, direct memory, garbage collection, and common garbage collection algorithms.

Runtime data area

When the JVM runs a program (typically a Java or Scala program), it divides the memory it manages into several data areas. Each area has its own purpose and its own creation and destruction time: some exist for as long as the virtual machine is running, while others are created and destroyed together with user threads. The following figure shows how the JVM divides its data areas at runtime:

1. Method area

The method area is a memory area shared by all threads. It holds data that rarely changes, such as class metadata, constants (for example, values declared final), static variables, and method information. Because this data is seldom garbage collected once it has been loaded, the method area is also called the permanent generation (note that the two are not the same thing: the method area is a concept defined by the JVM specification, and the permanent generation was HotSpot's implementation of it).

Part of the method area is the constant pool, which stores literals and symbolic references produced at compile time as well as constants created at run time (such as entries in the String constant pool). The static section of the method area holds class variables (static fields), static blocks, and so on.
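The effect of the constant pool is easiest to see through string identity. Below is a minimal sketch (the class name StringPoolDemo is just for illustration) that compares references with ==: two identical literals, and a compile-time constant expression such as "j" + "vm", resolve to the same pooled String, while new String(...) always creates a separate heap object until intern() is called. This identity behavior is specified by the Java Language Specification, regardless of where a particular JDK physically stores the string pool.

```java
public class StringPoolDemo {
    // Compares references (==), not contents, to show which strings share one pooled instance.
    static String report() {
        String literal = "jvm";              // literal: taken from the constant pool
        String sameLiteral = "jvm";          // same literal, so the same pooled instance
        String folded = "j" + "vm";          // constant expression, folded to "jvm" at compile time
        String heapCopy = new String("jvm"); // new always creates a distinct object on the heap

        return (literal == sameLiteral) + " "
                + (literal == folded) + " "
                + (literal == heapCopy) + " "
                + (literal == heapCopy.intern()); // intern() returns the pooled instance
    }

    public static void main(String[] args) {
        System.out.println(report()); // true true false true
    }
}
```

Comparing strings with == is done here only to expose object identity; ordinary code should compare strings with equals().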

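The method area, also known as non-heap memory, is limited in size. If it needs more memory than has been allocated to it, an OutOfMemoryError is thrown, for example java.lang.OutOfMemoryError: PermGen space on HotSpot before JDK 8. Since JDK 8, HotSpot has replaced the permanent generation with Metaspace, and the corresponding message is java.lang.OutOfMemoryError: Metaspace.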

2. Java VM stack

The Java virtual machine stack is thread-private and has the same life cycle as its thread. It describes the memory model of Java method execution: it serves the Java (bytecode) methods that the virtual machine executes.

Each method invocation creates a stack frame that stores the local variable table (primitive values and object references known at compile time), the operand stack, dynamic linking information, the method return address, and so on. The process from a method's invocation to its completion corresponds to its stack frame being pushed onto, and later popped off, the virtual machine stack.
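To make the pushing and popping of stack frames concrete, here is a small sketch that counts the frames on the current thread's stack with java.lang.StackWalker (available since Java 9). FrameDemo and its methods are hypothetical names for this example; each extra nested call should add exactly one frame.

```java
public class FrameDemo {
    // Counts the frames on the current thread's stack, starting from the caller of walk().
    static long frameCount() {
        return StackWalker.getInstance().walk(frames -> frames.count());
    }

    // Each recursive call pushes one more frame before frameCount() is reached.
    static long countAtDepth(int extraCalls) {
        if (extraCalls == 0) {
            return frameCount();
        }
        return countAtDepth(extraCalls - 1);
    }

    public static void main(String[] args) {
        long base = countAtDepth(0);
        long deeper = countAtDepth(3);
        System.out.println("frames added by 3 extra calls: " + (deeper - base)); // 3
    }
}
```

The absolute frame count depends on how the program was launched, which is why the sketch only compares two depths measured from the same caller.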

If a thread requests a stack depth greater than the virtual machine allows, a StackOverflowError is thrown. If the stack can be expanded but not enough memory can be obtained to expand it, or to create the stack of a new thread, an OutOfMemoryError is thrown.

You can set the stack size of each thread with the -Xss JVM option (for example, -Xss512k).
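A minimal sketch of the StackOverflowError case: a method that recurses with no base case keeps pushing frames until the depth limit is reached. The class StackOverflowDemo is illustrative, and StackOverflowError is caught here only to report the depth that was reached; production code should not rely on catching it.

```java
public class StackOverflowDemo {
    private static int depth;

    // No base case: every call pushes one more frame onto the thread's stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Recurses until the stack is exhausted and returns the depth that was reached.
    static int measureMaxDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // By the time we get here the frames have been popped, so it is safe to continue.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("StackOverflowError after about " + measureMaxDepth() + " calls");
    }
}
```

Running it with different stack sizes, for example java -Xss256k StackOverflowDemo and then java -Xss4m StackOverflowDemo, should report a larger depth for the larger stack. The exact numbers depend on the JVM version, the platform, and JIT compilation, so treat them as rough.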

3. Native method stack

The native method stack serves native methods, that is, methods implemented in another language such as C or C++ and called through the Java Native Interface (JNI). Native method interfaces use some form of native method stack.

When a thread calls a Java method, the virtual machine creates a new stack frame and pushes it onto the thread's Java stack. When it calls a native method, however, the virtual machine leaves the Java stack unchanged and does not push a new frame onto it. Instead, it dynamically links to the specified native method and calls it directly.
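Ordinary JDK calls often end in native methods. As a small sketch, reflection can tell whether a method is declared native via java.lang.reflect.Modifier.isNative. The class name is hypothetical, and the results depend on the class library: in OpenJDK, System.currentTimeMillis() and Object.hashCode() are declared native, while String.length() is plain Java.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class NativeMethodDemo {
    // Returns true if the named public no-argument method of the given class is declared native.
    static boolean isNative(Class<?> type, String name) {
        try {
            Method m = type.getMethod(name);
            return Modifier.isNative(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no public no-arg method " + name, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("System.currentTimeMillis native? " + isNative(System.class, "currentTimeMillis"));
        System.out.println("Object.hashCode native?          " + isNative(Object.class, "hashCode"));
        System.out.println("String.length native?            " + isNative(String.class, "length"));
    }
}
```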

4. The heap

The heap is the largest area of memory managed by the JVM. It is shared by all Java threads, is created when the virtual machine starts, and stores object instances and arrays. The heap only needs to be logically contiguous; it can occupy physically discontiguous memory.
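The heap size can be inspected from inside a program through java.lang.Runtime. This sketch (the class name is illustrative) prints the maximum heap, the heap currently reserved, and the free part of it. The relationship to -Xmx and -Xms is approximate, since a JVM may report values that differ slightly from the flags.

```java
public class HeapSizeDemo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        // maxMemory: the most the heap may grow to (roughly -Xmx); Long.MAX_VALUE if there is no limit.
        System.out.println("max heap:   " + rt.maxMemory() / mb + " MB");
        // totalMemory: heap currently reserved for objects (starts near -Xms and can grow).
        System.out.println("total heap: " + rt.totalMemory() / mb + " MB");
        // freeMemory: the unused part of totalMemory.
        System.out.println("free heap:  " + rt.freeMemory() / mb + " MB");
    }
}
```

For example, java -Xms64m -Xmx256m HeapSizeDemo should report a maximum close to 256 MB.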

The heap is the main area managed by the garbage collector. It is divided into the new generation and the old generation, and the new generation is further divided into the Eden space and two survivor spaces, called from and to.

When objects are created, they are first allocated in the new generation. Eden holds newly created objects, and the two survivor spaces hold objects that have survived garbage collections of the new generation. A very large object, however, can be allocated directly in the old generation.
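The generations are visible at runtime through the platform MemoryPoolMXBeans in java.lang.management. The sketch below (class name illustrative) lists every pool with its type and current usage. Pool names depend on the garbage collector in use (with G1 they are typically G1 Eden Space, G1 Survivor Space, and G1 Old Gen), and the from and to survivor spaces are usually reported together as a single survivor pool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class MemoryPoolDemo {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // HEAP pools correspond to the generations; NON_HEAP pools include e.g. Metaspace and the code cache.
            String kind = pool.getType() == MemoryType.HEAP ? "heap    " : "non-heap";
            MemoryUsage usage = pool.getUsage(); // null only if the pool is no longer valid
            long usedKb = usage == null ? -1 : usage.getUsed() / 1024;
            System.out.println(kind + "  " + pool.getName() + "  used=" + usedKb + " KB");
        }
    }
}
```

Running it with a different collector, for example with -XX:+UseSerialGC or -XX:+UseParallelGC, should show differently named pools for the same generations.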

5. Program counter

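The program counter is thread-private: each thread has its own program counter, which records the address of the bytecode instruction the thread is currently executing (while a native method is running, its value is undefined). It is the only runtime data area for which the JVM specification defines no OutOfMemoryError.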

Direct memory

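Direct memory is not part of the JVM runtime data areas; it is off-heap memory, used for example by NIO direct buffers. Because it is used frequently and is still limited by the machine's physical memory, an OutOfMemoryError can easily occur if you size each memory area (heap, method area, and so on) without leaving room for direct memory, so that their total exceeds the physical memory.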
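In Java code, direct memory is most commonly obtained through ByteBuffer.allocateDirect. This sketch (class name illustrative) allocates a 16 MB direct buffer and prints the JVM's "direct" buffer pool, read from BufferPoolMXBean, before and after the allocation.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectMemoryDemo {
    // Prints the "direct" buffer pool, which tracks memory allocated outside the Java heap.
    static void printDirectPool(String label) {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals("direct")) {
                System.out.println(label + ": count=" + pool.getCount()
                        + ", used=" + pool.getMemoryUsed() + " bytes");
            }
        }
    }

    public static void main(String[] args) {
        printDirectPool("before");
        // allocateDirect reserves native memory; only the small ByteBuffer object itself is on the heap.
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024);
        printDirectPool("after ");
        System.out.println("isDirect=" + buffer.isDirect() + ", capacity=" + buffer.capacity());
    }
}
```

HotSpot caps direct memory with -XX:MaxDirectMemorySize (by default tied to the maximum heap size); exceeding the cap typically fails with java.lang.OutOfMemoryError: Direct buffer memory.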

Garbage collection

Garbage collection, or GC, is the process by which the JVM reclaims memory occupied by objects that are no longer in use.

Developers focus on implementing business requirements and leave memory management to the JVM, so if garbage is not collected, or is collected incorrectly, the program can become unstable or even crash. Java's GC automatically detects objects that can no longer be reached and reclaims their memory without manual intervention. This prevents many (though not all) memory leaks and makes efficient use of the available memory.
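The idea that the GC reclaims objects once they are no longer reachable can be observed with a WeakReference, which does not keep its referent alive. In this sketch (names hypothetical), the byte array is reachable only through the weak reference, so after a collection ref.get() should return null. System.gc() is only a request, so the result is not guaranteed: on a HotSpot JVM with default settings it normally reports that the object was reclaimed, but flags such as -XX:+DisableExplicitGC change that.

```java
import java.lang.ref.WeakReference;

public class ReachabilityDemo {
    // Creates an object and returns only a weak reference, so no strong reference escapes.
    static WeakReference<byte[]> allocateUnreachable() {
        return new WeakReference<>(new byte[1024 * 1024]);
    }

    // Requests GC a few times and reports whether the weakly referenced object was reclaimed.
    static boolean reclaimedAfterGc(WeakReference<?> ref) {
        for (int attempt = 0; attempt < 10 && ref.get() != null; attempt++) {
            System.gc(); // only a request; the JVM is allowed to ignore it
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        WeakReference<byte[]> ref = allocateUnreachable();
        System.out.println("reclaimed: " + reclaimedAfterGc(ref));
    }
}
```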

There are three main types of GC: minor GC, major GC, and full GC.

A minor GC collects the new generation, and a major GC collects the old generation. A full GC collects the entire heap and can be triggered for many reasons, for example when the old generation runs out of space. It stops all application threads (“stop the world”), and if it is not handled properly it can affect the stability of the whole program and even make the system unavailable, so it requires special attention.
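The JVM exposes per-collector statistics through GarbageCollectorMXBean. This sketch (class name illustrative) creates some short-lived garbage and then prints each collector's name, collection count, accumulated time, and the memory pools it manages. The JVM does not label collections as minor, major, or full; the collector names depend on the GC in use (with G1, for example, G1 Young Generation and G1 Old Generation), and mapping them onto the three types above is a convention.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Arrays;

public class GcStatsDemo {
    public static void main(String[] args) {
        // Create short-lived garbage; storing into an array keeps the allocations from being optimized away.
        byte[][] recent = new byte[16][]; // only the last 16 arrays stay reachable
        for (int i = 0; i < 200_000; i++) {
            recent[i % recent.length] = new byte[1024];
        }

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() may return -1 if the collector does not track it.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms"
                    + ", pools=" + Arrays.toString(gc.getMemoryPoolNames()));
        }
    }
}
```

With a small heap (for example -Xmx64m), the young-generation collector usually shows several collections while the old-generation collector shows few or none.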

Common garbage collection algorithms

1. Mark-sweep algorithm

First, all objects that need to be reclaimed are marked; after marking is complete, all marked objects are reclaimed in one pass. (Equivalently, the live objects can be marked and everything left unmarked reclaimed.) The algorithm has two disadvantages:

1. Low efficiency

The objects to be reclaimed must first be marked and then cleared together, and both the marking and the sweeping phases are inefficient, especially when the heap contains many objects.

2. Memory fragmentation problem

Mark-sweep leaves behind a large number of discontiguous memory fragments. When fragmentation is severe, the program may be unable to find a large enough contiguous block when it needs to allocate a large object, so another garbage collection has to be triggered early, which hurts performance.
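To illustrate both phases and the fragmentation they leave behind, here is a toy simulation in Java. The heap is an array of slots, references are slot indices, and the class and method names are hypothetical. As is common in practice, it marks the reachable objects and sweeps everything unmarked; live objects are never moved, so the freed slots stay scattered.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class MarkSweepDemo {
    // Toy object in a heap slot; references are slot indices, i.e. toy "addresses".
    static final class Obj {
        final String name;
        final List<Integer> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    final Obj[] heap; // null means the slot is free

    MarkSweepDemo(int size) { heap = new Obj[size]; }

    // Mark phase: flag every object reachable from the root slots.
    void mark(List<Integer> roots) {
        Deque<Integer> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = heap[work.pop()];
            if (o.marked) continue;
            o.marked = true;
            work.addAll(o.refs);
        }
    }

    // Sweep phase: free every unmarked object in place; live objects are not moved.
    int sweep() {
        int freed = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) continue;
            if (heap[i].marked) {
                heap[i].marked = false; // reset for the next cycle
            } else {
                heap[i] = null;
                freed++;
            }
        }
        return freed;
    }

    // Longest run of adjacent free slots: the largest object that still fits contiguously.
    int largestFreeRun() {
        int best = 0, run = 0;
        for (Obj o : heap) {
            run = (o == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        MarkSweepDemo h = new MarkSweepDemo(6);
        for (int i = 0; i < 6; i++) h.heap[i] = new Obj((i % 2 == 0 ? "live" : "garbage") + i);
        h.heap[0].refs.add(2); // root 0 -> 2 -> 4; slots 1, 3, 5 are unreachable
        h.heap[2].refs.add(4);
        h.mark(List.of(0));
        int freed = h.sweep();
        System.out.println("freed " + freed + " slots, largest contiguous free run = " + h.largestFreeRun());
        // freed 3 slots, largest contiguous free run = 1
    }
}
```

Three slots are freed, but no two of them are adjacent, so an object needing two contiguous slots could not be placed without another collection.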

2. Copying algorithm

The available memory is divided into two halves of equal size, and only one half is used at a time. When that half is full, the surviving objects are copied to the other half, and the half that was in use is then cleared all at once.

Advantages: each collection reclaims an entire half of memory, so allocation never has to deal with fragmentation. Memory is allocated sequentially just by moving a pointer at the top of the used space, which makes the scheme simple to implement and efficient to run.

Disadvantages: it is not suitable when many objects survive, because every survivor has to be copied, which hurts efficiency. In addition, the usable memory shrinks to half of the allocated memory, because only one half is in use at any time.
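A toy semispace copying collector can be sketched in Java as follows. It simulates the idea rather than real memory: objects are not physically moved, the two halves are arrays of slots, and the class and method names are hypothetical. Collection follows Cheney's breadth-first scheme, one common way to implement the copying algorithm: the roots are copied first, then the copied objects are scanned and whatever they reference is copied after them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class SemiSpaceDemo {
    // Toy object: a name plus references to other toy objects.
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    private Obj[] fromSpace; // the half currently used for allocation
    private Obj[] toSpace;   // the idle half: this is the "wasted" 50%
    private int top = 0;     // bump pointer into fromSpace

    SemiSpaceDemo(int halfSize) {
        fromSpace = new Obj[halfSize];
        toSpace = new Obj[halfSize];
    }

    // Allocation writes at the bump pointer and advances it; no free-list search is needed.
    Obj allocate(String name, List<Obj> roots) {
        if (top == fromSpace.length) {
            collect(roots);
            if (top == fromSpace.length) throw new IllegalStateException("toy semispace is full");
        }
        Obj o = new Obj(name);
        fromSpace[top++] = o;
        return o;
    }

    // Cheney-style copy: copy the roots, then scan copied objects and copy whatever they reference.
    void collect(List<Obj> roots) {
        Set<Obj> copied = Collections.newSetFromMap(new IdentityHashMap<>());
        int free = 0; // bump pointer into toSpace
        for (Obj r : roots) {
            if (copied.add(r)) toSpace[free++] = r;
        }
        for (int scan = 0; scan < free; scan++) {
            for (Obj child : toSpace[scan].refs) {
                if (copied.add(child)) toSpace[free++] = child;
            }
        }
        // Swap roles: survivors now sit contiguously at the start of the new fromSpace,
        // and the whole old half is discarded at once.
        Obj[] old = fromSpace;
        fromSpace = toSpace;
        toSpace = old;
        Arrays.fill(toSpace, null);
        top = free;
    }

    // Names of the objects currently occupying slots in the active half.
    List<String> occupiedNames() {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < top; i++) names.add(fromSpace[i].name);
        return names;
    }

    public static void main(String[] args) {
        SemiSpaceDemo heap = new SemiSpaceDemo(4);
        List<Obj> roots = new ArrayList<>();
        Obj a = heap.allocate("a", roots);
        roots.add(a);
        a.refs.add(heap.allocate("b", roots)); // b is reachable only through a
        heap.allocate("tmp1", roots);          // garbage: nothing references it
        heap.allocate("tmp2", roots);          // garbage
        System.out.println(heap.occupiedNames()); // [a, b, tmp1, tmp2]
        heap.collect(roots);
        System.out.println(heap.occupiedNames()); // [a, b]
    }
}
```

Allocation is just a bump of top, and after a collection the survivors sit contiguously at the start of the active half. The price is visible in the constructor: toSpace is allocated but sits idle between collections.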

3. Mark-compact algorithm

The marking phase is the same as in mark-sweep. Then, instead of sweeping in place, all surviving objects are moved toward one end of memory, and everything beyond the new end boundary is cleared directly. This solves the memory fragmentation problem.
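Mark-compact can also be sketched as a toy simulation. In this sketch (hypothetical names; references are slot indices), compaction follows the style of the classic LISP2 sliding algorithm: compute each live object's new address, rewrite all references and roots to point at the new addresses, then slide the objects toward slot 0 and clear the rest. Unlike the copying algorithm, no second half of memory is needed, at the cost of extra passes over the heap.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class MarkCompactDemo {
    // Toy object in a heap slot; references are slot indices, i.e. toy "addresses".
    static final class Obj {
        final String name;
        final List<Integer> refs = new ArrayList<>();
        boolean marked;
        int forward; // new address computed during compaction
        Obj(String name) { this.name = name; }
    }

    final Obj[] heap;
    int top = 0; // slots [0, top) are occupied

    MarkCompactDemo(int size) { heap = new Obj[size]; }

    int allocate(String name) {
        heap[top] = new Obj(name);
        return top++;
    }

    // Mark phase: the same as mark-sweep, flag everything reachable from the roots.
    void mark(List<Integer> roots) {
        Deque<Integer> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = heap[work.pop()];
            if (o.marked) continue;
            o.marked = true;
            work.addAll(o.refs);
        }
    }

    // Compact phase: compute new addresses, fix every reference, then slide live objects
    // toward slot 0 and clear everything past the new boundary. Returns the updated roots.
    List<Integer> compact(List<Integer> roots) {
        int free = 0;
        for (int i = 0; i < top; i++) {
            if (heap[i].marked) heap[i].forward = free++;
        }
        for (int i = 0; i < top; i++) {
            if (heap[i].marked) heap[i].refs.replaceAll(addr -> heap[addr].forward);
        }
        List<Integer> newRoots = new ArrayList<>();
        for (int r : roots) newRoots.add(heap[r].forward);
        for (int i = 0; i < top; i++) {
            Obj o = heap[i];
            if (o.marked) {
                o.marked = false;    // reset for the next cycle
                heap[o.forward] = o; // forward <= i, so no unvisited object is overwritten
            }
        }
        Arrays.fill(heap, free, top, null);
        top = free;
        return newRoots;
    }

    List<String> names() {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < top; i++) out.add(heap[i].name);
        return out;
    }

    public static void main(String[] args) {
        MarkCompactDemo h = new MarkCompactDemo(8);
        int a = h.allocate("a");
        h.allocate("garbage1");
        int b = h.allocate("b");
        h.allocate("garbage2");
        int c = h.allocate("c");
        h.heap[a].refs.add(c); // a -> c
        List<Integer> roots = List.of(a, b);
        h.mark(roots);
        List<Integer> newRoots = h.compact(roots);
        System.out.println(h.names());                     // [a, b, c]
        System.out.println(newRoots);                      // [0, 1]
        System.out.println(h.heap[newRoots.get(0)].refs);  // [2]: a still points to c at its new address
    }
}
```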

4. Generational collection algorithm

Generational collection applies different garbage collection algorithms to the new generation and the old generation of the Java heap. In the new generation only a few objects survive each collection (those that keep surviving eventually move to the old generation), so the copying algorithm is a good fit: only a small number of live objects have to be copied. In the old generation, objects have a high survival rate and there is no extra space to copy them into, so mark-sweep (or mark-compact) is used instead.

In practice, of course, the algorithm used depends on the garbage collector used.
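The generational idea can be illustrated with a small simulation. This sketch simplifies heavily and its assumptions should be kept in mind: Eden and the two survivor spaces are merged into a single young list, reachability is a boolean flag rather than a real mark phase, and tenuringThreshold is a hypothetical stand-in for the age limit that HotSpot controls with -XX:MaxTenuringThreshold. Each minor GC looks only at the young generation; survivors age by one and are promoted to the old generation once their age reaches the threshold.

```java
import java.util.ArrayList;
import java.util.List;

public class GenerationalDemo {
    // Toy object: reachability is a flag (standing in for a real mark phase), age counts survived minor GCs.
    static final class Obj {
        final String name;
        boolean reachable = true;
        int age = 0;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> young = new ArrayList<>(); // Eden + survivors, merged for simplicity
    final List<Obj> old = new ArrayList<>();
    final int tenuringThreshold; // hypothetical stand-in for HotSpot's -XX:MaxTenuringThreshold

    GenerationalDemo(int tenuringThreshold) { this.tenuringThreshold = tenuringThreshold; }

    Obj allocate(String name) {
        Obj o = new Obj(name);
        young.add(o); // new objects always start in the young generation
        return o;
    }

    // Minor GC: only the young generation is examined. Dead objects are dropped,
    // survivors age by one and are promoted once their age reaches the threshold.
    void minorGc() {
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : young) {
            if (!o.reachable) continue;
            o.age++;
            if (o.age >= tenuringThreshold) {
                old.add(o);
            } else {
                survivors.add(o);
            }
        }
        young.clear();
        young.addAll(survivors);
    }

    public static void main(String[] args) {
        GenerationalDemo heap = new GenerationalDemo(3);
        Obj cache = heap.allocate("long-lived cache");
        for (int round = 0; round < 3; round++) {
            Obj temp = heap.allocate("temp" + round);
            temp.reachable = false; // short-lived: already dead at the next minor GC
            heap.minorGc();
        }
        System.out.println("young: " + heap.young.size() + ", old: " + heap.old.size()
                + ", cache promoted: " + heap.old.contains(cache));
        // young: 0, old: 1, cache promoted: true
    }
}
```

Each minor GC touches only the young list, so the short-lived objects are discarded cheaply, while the long-lived one ends up in the old generation, where a mark-sweep or mark-compact collector would handle it.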

