1. Relationship between the JVM and the operating system

Java Virtual Machine

JVM stands for Java Virtual Machine. It recognizes files with the .class suffix, parses the instructions they contain, and ultimately calls operating system functions to do what we want.

Translation

Java programs work differently from native programs: after javac compiles them into .class files, they must be launched explicitly with the java command, because the operating system does not recognize .class files. The JVM therefore acts as a translator.

With the JVM as an abstraction layer, Java can run cross-platform: as long as the JVM executes the .class file correctly, the same program runs on Linux, Windows, macOS, and other platforms.

From cross-platform to cross-language

Cross-platform: a class we write, such as a Person class, behaves the same on different operating systems (Linux, Windows, macOS, etc.). This is the cross-platform nature of the JVM.

To make this possible, each operating system has its own JDK build: www.oracle.com/java/techno…

Cross-language: the JVM only recognizes bytecode, so it is decoupled from the Java language itself. It does not translate .java files; it reads class files, whose contents are commonly called bytecode. Languages such as Groovy, Kotlin, and JRuby are also compiled to bytecode and therefore run on the JVM. This is the cross-language feature of the JVM.

2. Relationship between JVM, JRE, and JDK

The JVM is only a translator that turns class files into code the machine understands. It does not write programs by itself: programs are written by developers and depend on many libraries. This is where the JRE comes in.

What is the JRE? Besides the JVM, it provides a number of libraries (jar packages with ready-to-use functionality such as reading and manipulating files, connecting to the network, and performing I/O). The JVM together with this set of base libraries makes up the Java Runtime Environment, commonly known as the JRE.

For programmers, however, the JRE is not enough. We write code, compile it, debug it, package it, and sometimes decompile it. That is why we use the JDK, which adds useful tools such as javac (compiles code), java (runs programs), jar (packages code), and javap (disassembles class files). That is the JDK.

Downloads and documentation are available on the official website: www.oracle.com/java/techno…

3. The JVM as a whole

A Java program is first compiled by javac into a .class file, then loaded by the JVM into the method area and executed by the execution engine, which translates it into calls to operating-system-specific functions. The JVM thus acts as the translator of .class files: bytecode goes in, and operating system calls come out.

The process is as follows: Java source file -> compiler (javac) -> bytecode -> JVM -> machine code.

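Interpreted execution and JIT compilation: the execution engine can interpret bytecode one instruction at a time, and it can also use the just-in-time (JIT) compiler to compile frequently executed (hot) code into native machine code.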

In this article, "JVM" refers narrowly to HotSpot: there are many JVM implementations, but HotSpot is the most widely used, so we follow HotSpot unless otherwise specified. Java is cross-platform because of the JVM, and Java bytecode is the bridge between the Java language and the JVM, and between the JVM and the operating system.

4. Run time data area

Java prides itself on its automatic memory management. Java programs are much easier to write than C++ programs, with their manual memory management and complex pointers.

JVM runtime memory is divided into the heap, program counters, the method area, virtual machine stacks, and native method stacks.

Program counter

A small memory area that serves as the line number indicator of the bytecode being executed by the current thread. Each thread has its own counter, and the counters do not affect each other.

The program counter records the address of the bytecode instruction that each thread is executing. Branches, loops, jumps, exception handling, thread resumption, and so on all depend on it.

Java is a multithreaded language. When there are more running threads than CPU cores, threads compete for CPU time through time slicing. If a thread uses up its time slice, or loses the CPU early for some other reason, it needs its own program counter to remember which instruction it was executing so that it can resume from there later.

The program counter is also the only memory area in the JVM for which no OutOfMemoryError (OOM) condition is specified.

The virtual machine stack

A stack is a first-in, last-out (FILO) data structure.

The virtual machine stack stores data, instructions, and return addresses required by the current thread’s execution method while the JVM is running.

The Java virtual machine stack is thread-based. Even if you only have a main() method, it runs as a thread. In the life cycle of a thread, the data involved in the calculation is frequently pushed on and off the stack, and the life cycle of a stack is the same as that of a thread.

Each element of the stack is a stack frame. Every time a Java method is called, a stack frame is created and pushed onto the stack; when the call completes, the frame is popped. When all frames have been popped, the thread terminates.

The stack size is typically 1 MB by default (on 64-bit Linux, for example). You can adjust it with -Xss, for example -Xss256k.
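The effect of stack size on call depth can be seen with a small probe. This sketch (class and method names are hypothetical) recurses without a base case and counts frames until StackOverflowError. It uses the Thread constructor that accepts a per-thread stackSize, which the JVM treats only as a hint, so the exact depths vary by platform, JVM version, and JIT state.

```java
// Measures how many frames a simple recursion reaches before StackOverflowError,
// in threads that request different stack sizes.
class StackDepthDemo {
    private static int depth; // only touched by one probe thread at a time

    private static void recurse() {
        depth++;
        recurse(); // no base case: runs until the thread's stack is exhausted
    }

    /** Runs the recursion in a new thread that requests stackSize bytes of stack. */
    static int measure(long stackSize) {
        int[] result = new int[1];
        Runnable probe = () -> {
            depth = 0;
            try {
                recurse();
            } catch (StackOverflowError e) {
                result[0] = depth; // frames pushed before the overflow
            }
        };
        // The stackSize argument is only a hint: the JVM may round it or ignore it.
        Thread t = new Thread(null, probe, "depth-probe", stackSize);
        t.start();
        try {
            t.join(); // join() also makes result[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("256 KB requested: " + measure(256 * 1024) + " frames");
        System.out.println("4 MB requested:   " + measure(4L * 1024 * 1024) + " frames");
    }
}
```

Rerunning with a different -Xss value changes the default stack size for threads that do not request one, and with it the depth they can reach.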

Each stack frame contains four areas: the local variable table, the operand stack, the dynamic link, and the return address.

  • Local variable table: as the name implies, a table that stores the method's local variables. Each slot is 32 bits wide. Values of the eight primitive types that fit in 32 bits occupy one slot; 64-bit values (long and double) occupy two consecutive slots. For a local object such as an Object, only its reference is stored.
  • Operand stack: a first-in, last-out stack that holds the operands the method works on. Its entries can be of any Java data type. When a method starts, the operand stack is empty; while the method runs, the JVM continuously pushes values onto it and pops them off.
  • Dynamic link: a reference to the runtime constant pool that lets method calls be resolved at run time. Java needs this for polymorphism, where the concrete method to invoke is only known at run time.
  • Return address: on normal return, execution continues at the address taken from the caller's program counter; on an exception, the continuation is determined by the exception handler table, which is not stored in the stack frame.

Effect of stack frame execution on memory area

Bytecode mnemonic reference: cloud.tencent.com/developer/a…

The JVM's interpreter is a stack-based execution engine, and the stack in question is the operand stack.
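To make the operand stack concrete: for a static method `static int add(int a, int b) { int c = a + b; return c; }`, javac typically emits `iload_0, iload_1, iadd, istore_2, iload_2, ireturn` (visible with `javap -c`). The sketch below is a hypothetical toy interpreter, not HotSpot's, that executes that sequence with an explicit local variable table and operand stack; the `Op` names mirror real opcodes, but the enum itself is made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A toy stack-based interpreter: one frame = local variable table + operand stack.
class MiniFrame {
    enum Op { ILOAD_0, ILOAD_1, ILOAD_2, IADD, ISTORE_2, IRETURN }

    static int execute(Op[] code, int... args) {
        int[] locals = new int[3];                       // local variable table, slots 0..2
        System.arraycopy(args, 0, locals, 0, args.length); // arguments occupy the first slots
        Deque<Integer> stack = new ArrayDeque<>();       // operand stack, empty at method start

        for (Op op : code) {
            switch (op) {
                case ILOAD_0:  stack.push(locals[0]); break;   // push local 0
                case ILOAD_1:  stack.push(locals[1]); break;   // push local 1
                case ILOAD_2:  stack.push(locals[2]); break;   // push local 2
                case IADD:     stack.push(stack.pop() + stack.pop()); break; // pop two, push sum
                case ISTORE_2: locals[2] = stack.pop(); break; // pop into local 2
                case IRETURN:  return stack.pop();             // pop the return value
            }
        }
        throw new IllegalStateException("method ended without IRETURN");
    }

    public static void main(String[] args) {
        Op[] add = { Op.ILOAD_0, Op.ILOAD_1, Op.IADD, Op.ISTORE_2, Op.ILOAD_2, Op.IRETURN };
        System.out.println(execute(add, 2, 3)); // prints 5
    }
}
```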

Native method stack

The native method stack is similar to the Java virtual machine stack: the virtual machine stack manages calls to Java methods, while the native method stack manages calls to native methods. Native methods are not implemented in Java; they are usually implemented in C or C++.

Because the native method stack is so similar to the virtual machine stack and only serves native methods, you can even think of the two as the same area.

The virtual machine specification does not mandate how this area is implemented, so each virtual machine is free to implement it as it sees fit. HotSpot simply merges the native method stack with the virtual machine stack.

Areas shared by threads: the method area and the heap

Method area/permanent generation

Many developers tend to refer to method areas as “permanent generations”, but the two are not equivalent.

HotSpot used the permanent generation to implement the method area, but other virtual machines, such as Oracle's JRockit and IBM's J9, have no permanent generation. In other words, the method area is just a part of the JVM specification, and HotSpot's designers implemented it with the permanent generation.

The method area is used to store class-related information that has been loaded by virtual machines, including class information, static variables, constants, runtime constant pool, and string constant pool.

Before the JVM can execute a class, the class must be loaded (loading, verification, preparation, resolution, and initialization). The JVM first loads the class file, which contains the class version, fields, methods, and interfaces, as well as the constant pool table, which holds the literals and symbolic references generated during compilation.

Literals include strings (String a = "b") and constants of primitive types (final variables). Symbolic references include the fully qualified names of classes and interfaces (for example, the fully qualified name of the String class is java/lang/String), field names and descriptors, and method names and descriptors.

When a class is loaded into memory, the JVM stores the contents of the class file's constant pool in the runtime constant pool. During the resolution phase, the JVM replaces symbolic references with direct references (for example, pointers to or offsets of the target).

For example, a string constant in a class is stored in the class file's constant pool. After loading the class, the JVM puts the string constant into the runtime constant pool, and during resolution the entry is linked to the actual String object. Each class has its own runtime constant pool, but the string constant pool is shared globally by all classes, so only one copy of a given string literal exists in it.
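The sharing of the string constant pool can be observed with reference comparison (==). This is a minimal sketch; StringPoolDemo and isPooledB are hypothetical names. Identical literals resolve to the same pooled instance, new String creates a separate heap object, and String.intern() returns the pooled instance.

```java
// String literals resolve to one shared instance in the string constant pool.
class StringPoolDemo {
    /** True only if s is the very same object as the pooled literal "b". */
    static boolean isPooledB(String s) {
        return s == "b"; // reference comparison on purpose
    }

    public static void main(String[] args) {
        String literal = "b";               // resolved to the pooled instance
        String created = new String("b");   // a distinct String object on the heap
        String interned = created.intern(); // returns the pooled instance

        System.out.println(isPooledB(literal));  // true
        System.out.println(isPooledB(created));  // false
        System.out.println(isPooledB(interned)); // true
        System.out.println(created.equals("b")); // true: same characters, different object
    }
}
```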

Like the heap, the method area is shared memory, so it is shared by all threads. If two threads try to access the same class information in the method area and the class has not yet been loaded, only one thread is allowed to load it while the other waits.

In HotSpot, Java 7 moved static variables and the string constant pool out of the permanent generation into the heap, while the rest stayed in the JVM's non-heap memory. Java 8 removed the permanent generation entirely and replaced it with Metaspace, which stores class metadata in native memory.

Metaspace size parameters:

  • JDK 1.7 and earlier (initial and maximum permanent generation size): -XX:PermSize, -XX:MaxPermSize
  • JDK 1.8 and later (initial threshold and maximum Metaspace size): -XX:MetaspaceSize, -XX:MaxMetaspaceSize
  • From JDK 1.8 on, if no limit is set, Metaspace is limited only by the total native memory
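To see which memory pools a running HotSpot JVM reports, the standard java.lang.management API can be used. The sketch below (class name hypothetical) prints each pool's type, name, usage, and maximum. On JDK 8 and later the non-heap pools typically include "Metaspace", and its maximum may be reported as -1 (undefined) when -XX:MaxMetaspaceSize is not set; pool names depend on the JVM and the garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Lists the JVM's memory pools, both heap and non-heap.
class MemoryPools {
    /** Counts the pools of the given type (HEAP or NON_HEAP). */
    static int countPools(MemoryType type) {
        int n = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage == null) {
                continue;
            }
            // getMax() returns -1 when no maximum is defined.
            System.out.printf("%-16s %-32s used=%,d max=%,d%n",
                    pool.getType(), pool.getName(), usage.getUsed(), usage.getMax());
        }
    }
}
```

For example, running it with -XX:MaxMetaspaceSize=64m should show a defined maximum for the Metaspace pool.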

The JVM parameter reference: docs.oracle.com/javase/8/do…

Why did Java 8 replace the permanent generation with Metaspace, and what are the benefits?

The official explanation:

The permanent generation was removed as part of the effort to merge HotSpot with JRockit: JRockit has no permanent generation, so there is nothing of this kind to configure.

In practice, the permanent generation often ran out of memory, throwing java.lang.OutOfMemoryError: PermGen space. In JDK 1.7 and earlier, the PermGen area had a fixed maximum size that had to be chosen in advance. Class metadata in PermGen could be reclaimed only during a Full GC, and little was usually reclaimed, so the results were rarely satisfactory. It was also hard to decide how much space to give PermGen, since the right size depends on many factors, such as the total number of classes the JVM loads, the size of the constant pools, and the size of the methods.

The heap

The heap is the largest memory area in the JVM, and almost all the objects we create are stored there. Garbage collection mainly operates on the heap.

Heap space is usually reserved at JVM startup, but not all of it is used right away.

As objects keep being created, more and more of the heap is occupied, and objects that are no longer used must be reclaimed from time to time. In Java this is called garbage collection (GC).

When an object is created, is it allocated on the heap or on the stack? That depends on two things: the type of the data and where it is declared in the Java class.

For this purpose, Java data can be divided into primitive types and ordinary objects.

For ordinary objects, the JVM first creates the object on the heap and then uses references to it elsewhere, for example by storing the reference in the local variable table of a virtual machine stack frame.

For primitive types (byte, short, int, long, float, double, char, boolean), there are two cases. A primitive local variable declared in a method body is allocated directly on the stack. In all other cases, for example as a field of an object, it lives on the heap.

Heap size parameters:

Parameter         Description
-Xms              Initial (minimum) heap size
-Xmx              Maximum heap size
-Xmn              Young generation size
-XX:NewSize       Minimum young generation size
-XX:MaxNewSize    Maximum young generation size

For example: -Xmx256m
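The effect of these flags can be checked from inside the program with java.lang.Runtime. In this minimal sketch (class name hypothetical), maxMemory() roughly corresponds to -Xmx, totalMemory() is the heap currently committed (starting around -Xms), and freeMemory() is the unused part of that committed heap. The exact numbers depend on the garbage collector.

```java
// Reports heap sizes as seen by the running JVM.
class HeapInfo {
    /** Returns {max, total, free} in bytes. */
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        return new long[] { rt.maxMemory(), rt.totalMemory(), rt.freeMemory() };
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        long[] s = snapshot();
        System.out.println("max   (about -Xmx)                  = " + s[0] / mb + " MB");
        System.out.println("total (committed, starts near -Xms) = " + s[1] / mb + " MB");
        System.out.println("free  (unused part of total)        = " + s[2] / mb + " MB");
    }
}
```

Run it as, for example, java -Xms64m -Xmx256m HeapInfo and compare the printed values with the flags.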

5. Direct memory

Direct memory is not part of the virtual machine's runtime data area, nor is it a memory area defined in the Java Virtual Machine specification.

  • With NIO, this area is used frequently; Java code references and manipulates it directly through DirectByteBuffer objects that live in the Java heap.
  • This memory is not limited by the Java heap size but by the total native memory. Its limit can be set with -XX:MaxDirectMemorySize (by default about the same as the maximum heap size), and exceeding it causes an OutOfMemoryError.
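The difference between heap and direct buffers can be seen with java.nio.ByteBuffer. In this sketch (class name hypothetical), allocate() returns a buffer backed by a byte array in the Java heap, while allocateDirect() returns one backed by native memory; the JDK's BufferPoolMXBean named "direct" reports how much direct memory live buffers hold.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Heap buffers vs direct buffers, plus the JVM's own accounting of direct memory.
class DirectMemoryDemo {
    /** Total capacity of live direct buffers, or -1 if the "direct" pool is not reported. */
    static long directPoolCapacity() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getTotalCapacity();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ByteBuffer onHeap = ByteBuffer.allocate(1024 * 1024);        // backed by a byte[] in the Java heap
        ByteBuffer offHeap = ByteBuffer.allocateDirect(1024 * 1024); // backed by native memory
        System.out.println("onHeap.isDirect()  = " + onHeap.isDirect());  // false
        System.out.println("offHeap.isDirect() = " + offHeap.isDirect()); // true
        System.out.println("direct pool capacity = " + directPoolCapacity() + " bytes");
    }
}
```

Allocating and retaining direct buffers in a loop under, say, -XX:MaxDirectMemorySize=16m is expected to end with java.lang.OutOfMemoryError: Direct buffer memory.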

6. Understand the run-time data area from the ground up

Starting the HSDB tool

To start HSDB on JDK 1.8 (Windows), sawindbg.dll must be copied into the jre in the corresponding directory, for example:

C:\Program Files\Java\jdk1.8.0_101\lib

Then run: java -cp .\sa-jdi.jar sun.jvm.hotspot.HSDB

When a program like this (one that creates a Teacher object) is run with java, the JVM processes it as follows:

  1. The JVM requests memory from the operating system, based on the configuration parameters or their defaults.
  2. Once it has the memory, the JVM sizes the heap, stacks, and method area according to the configuration parameters.
  3. Class loading: the JVM runs the class initializer (<clinit>). The compiler builds it when the .java file is compiled into a .class file, by collecting the class's static variable assignments and static blocks. Class information, static methods, static variables, and constants are placed in the method area.
  4. Method execution: the JVM starts the main thread, runs the main method, and begins with its first line of code. A Teacher object is created in the heap, and the reference to it, student, is stored on the stack.

When other methods are executed, the details are those described in "Effect of stack frame execution on memory area".

Differentiate between heap and stack in depth

Function

  • Stacks store method calls as stack frames, together with the primitive-type variables (int, short, long, byte, float, double, boolean, char, etc.) and object reference variables used in those calls. This memory is allocated on the stack, and variables are released automatically when they go out of scope.
  • Heap memory stores Java objects. Whether a reference is a member variable, a local variable, or a class variable, the object it points to is stored in heap memory.

Thread exclusive or shared

  • The stack memory belongs to a single thread, and each thread has a stack memory. The variables stored in the stack memory can only be seen in its owning thread, that is, the stack memory can be understood as the private memory of the thread.
  • Objects in heap memory are visible to all threads. Objects in heap memory can be accessed by all threads.

Size

  • Stack memory is much smaller than heap memory, and stack depth is limited, which can lead to StackOverflowError.

7. Memory overflow

Stack overflow

Parameter: -Xss1m. For the platform-specific defaults, see docs.oracle.com/javase/8/do…

In HotSpot, a thread's stack has a fixed size and cannot be extended.

Ordinary method calls rarely cause java.lang.StackOverflowError; when it does occur, the usual cause is unbounded (infinite) recursion.

The takeaway from the virtual machine stack is that calling methods has a cost, so a method-based (recursive) implementation is usually slower than a loop that does the same work. That is why both recursive and non-recursive (loop-based) tree traversals are worth knowing: recursive code is concise, while non-recursive code is more complex but usually faster.
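The recursion-versus-loop trade-off can be sketched with an in-order binary tree traversal (class and method names are hypothetical). The recursive version uses one stack frame per tree level, so a very deep tree can overflow the thread stack; the iterative version keeps its own explicit stack on the heap instead. The actual speed difference depends on the JIT and the workload.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// In-order traversal two ways: recursion (call stack) vs loop (explicit stack).
class TreeTraversal {
    static final class Node {
        final int value;
        final Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // One stack frame per tree level: very deep trees can overflow the thread stack.
    static void inorderRecursive(Node n, List<Integer> out) {
        if (n == null) return;
        inorderRecursive(n.left, out);
        out.add(n.value);
        inorderRecursive(n.right, out);
    }

    // An explicit heap-allocated stack replaces the call stack.
    static List<Integer> inorderIterative(Node root) {
        List<Integer> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        Node cur = root;
        while (cur != null || !stack.isEmpty()) {
            while (cur != null) {      // walk left, remembering the path
                stack.push(cur);
                cur = cur.left;
            }
            cur = stack.pop();         // visit the leftmost unvisited node
            out.add(cur.value);
            cur = cur.right;           // then traverse its right subtree
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node(2, new Node(1, null, null), new Node(3, null, null));
        List<Integer> rec = new ArrayList<>();
        inorderRecursive(root, rec);
        System.out.println(rec + " " + inorderIterative(root)); // [1, 2, 3] [1, 2, 3]
    }
}
```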

OutOfMemoryError: if threads keep being created, the JVM keeps requesting stack memory for them until the machine runs out of memory. (This is generally not demonstrated, because the demonstration tends to freeze the machine.)

Heap overflow

Memory overflow: a memory request exceeds the maximum heap size.

If a heap overflow occurs, increase the -Xms and -Xmx parameters.

If it is not a memory leak, that is, the objects in memory genuinely must stay alive, check the JVM heap parameters against the machine's physical memory to see whether there is room to increase them. Then check the code for objects whose life cycles are too long, objects held for too long, unreasonable storage structure designs, and so on, to minimize memory consumption while the program runs.

Method area overflow

  1. Runtime constant pool overflow
  2. The Class object saved in the method area was not reclaimed in time or the Class information took up more memory than we configured.

Note that the conditions under which a class can be reclaimed are very strict:

  1. All instances of the class have been reclaimed, that is, there are no instances of the class in the heap.
  2. The ClassLoader that loaded the class has been reclaimed.
  3. The java.lang.Class object corresponding to this Class is not referenced anywhere, and the methods of this Class cannot be accessed anywhere through reflection.

Code examples:

CGLIB is a powerful, high-performance, high-quality code generation library that can extend Java classes and implement Java interfaces at run time.

Under the hood, CGLIB uses ASM, a small and fast bytecode processing framework, to transform bytecode and generate new classes. Besides CGLIB, scripting languages such as Groovy and BeanShell also use ASM to generate Java bytecode. Using ASM directly is discouraged, however, because it requires familiarity with the JVM's internals, including the class file format and the instruction set.
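CGLIB is a third-party library, so as a self-contained stand-in this sketch uses the JDK's own java.lang.reflect.Proxy, which also generates classes at run time. Each new ClassLoader gets its own generated proxy class, and every such class occupies method area (Metaspace) memory as long as its loader is reachable. The class and method names are hypothetical, and the loop here is bounded so it finishes.

```java
import java.lang.reflect.Proxy;
import java.util.HashSet;
import java.util.Set;

// Generates classes at run time (JDK Proxy used in place of CGLIB).
class ClassGenerationDemo {
    /** Defines one new proxy class per fresh class loader; returns the distinct classes. */
    static Set<Class<?>> generate(int count) {
        Set<Class<?>> classes = new HashSet<>(); // keeps classes (and their loaders) reachable
        ClassLoader parent = ClassGenerationDemo.class.getClassLoader();
        for (int i = 0; i < count; i++) {
            ClassLoader loader = new ClassLoader(parent) { }; // a new loader => a new proxy class
            Object proxy = Proxy.newProxyInstance(loader, new Class<?>[] { Runnable.class },
                    (p, method, methodArgs) -> null); // handler does nothing
            classes.add(proxy.getClass());
        }
        return classes;
    }

    public static void main(String[] args) {
        System.out.println(generate(1000).size() + " distinct classes generated");
    }
}
```

Making the loop unbounded while keeping the generated classes reachable, and running with a small -XX:MaxMetaspaceSize, is the usual way to reproduce java.lang.OutOfMemoryError: Metaspace; the exact failure point depends on the JVM.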

Native direct memory overflow

The direct memory size can be set with -XX:MaxDirectMemorySize (by default about the same as the maximum heap size); exceeding it causes an OutOfMemoryError.

One fairly obvious sign of an overflow caused by direct memory is that the heap dump shows no obvious problem. If an OOM occurs and the dump file is very small, consider direct memory as the likely cause.

8. Virtual machine optimization technology

Compiler optimization techniques – method inlining

Method inlining optimizations simply “copy” the code of the target method into the calling method, avoiding the actual method call.
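As an illustration (not something you write by hand), the sketch below shows a caller before and after its small getters are inlined. HotSpot's JIT performs this on compiled code at run time; the class and method names are hypothetical.

```java
// Illustration only: the JIT inlines on compiled code; the source is never rewritten.
class InliningSketch {
    private final int width;
    private final int height;

    InliningSketch(int width, int height) {
        this.width = width;
        this.height = height;
    }

    int getWidth()  { return width; }
    int getHeight() { return height; }

    // As written: two calls, each of which conceptually pushes its own stack frame.
    int area() {
        return getWidth() * getHeight();
    }

    // What area() effectively becomes once the small getters are inlined:
    // the callee bodies are copied in, so no frames are created for them.
    int areaAfterInlining() {
        return width * height;
    }

    public static void main(String[] args) {
        InliningSketch s = new InliningSketch(3, 4);
        System.out.println(s.area() + " " + s.areaAfterInlining()); // 12 12
    }
}
```

On HotSpot, the diagnostic flags -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining print the JIT's inlining decisions.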

Stack optimization techniques – data sharing between stack frames

In the conceptual model, the memory areas of two different stack frames are independent, but most JVMs optimize this by letting the two frames partially overlap, mainly for parameter passing: part of the lower frame's operand stack overlaps part of the upper frame's local variable table. This saves some space and, more importantly, lets the shared data be used directly during a method call, so the arguments do not need to be copied.

The HSDB tool can also be used to view the stack space.

9. Objects in the VM

Object allocation

When the virtual machine encounters a new instruction, it first checks whether the referenced class has been loaded by a class loader. If not, it must first run the corresponding class loading process.

Class loading is the process of loading classes into the JVM’s runtime data area (a special topic on class loading is covered later).

1. Class loading check

First, the JVM checks whether the instruction's argument can be resolved to a symbolic reference to a class in the constant pool (a symbolic reference describes its target with a set of symbols), and whether that class has already been loaded, resolved, and initialized.

2. Allocate memory

Next, the virtual machine allocates memory for the new object. Allocating space for an object amounts to carving a block of a certain size out of the Java heap.

Bump the pointer

If the Java heap is perfectly regular, with all used memory on one side and all free memory on the other, a pointer in the middle marks the boundary. Allocating memory then just means moving that pointer toward the free side by a distance equal to the object's size. This allocation method is called bump the pointer (pointer collision).

Free list

If the memory in the Java heap is not regular, with used and free memory interleaved, the pointer cannot simply be bumped. The virtual machine must instead maintain a list recording which memory blocks are available. At allocation time it finds a block in the list large enough for the object, assigns it, and updates the list. This allocation method is called a free list.
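Both strategies can be modeled with toy allocators that only hand out offsets into an imaginary heap. This is a simplified sketch with hypothetical names, not HotSpot's implementation; the free list uses first-fit search, one of several possible policies.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy allocators over an imaginary heap; they only hand out offsets.
class ToyAllocators {
    /** Bump the pointer: free space is one contiguous region starting at top. */
    static final class BumpPointer {
        private final int capacity;
        private int top = 0; // boundary between used and free memory

        BumpPointer(int capacity) { this.capacity = capacity; }

        int allocate(int size) {
            if (top + size > capacity) return -1; // no room: a GC would be needed
            int address = top;
            top += size;                          // move the pointer by the object size
            return address;
        }
    }

    /** Free list: free blocks are scattered, so {start, size} records are kept. */
    static final class FreeList {
        private final List<int[]> blocks = new ArrayList<>();

        void free(int start, int size) { blocks.add(new int[] { start, size }); }

        int allocate(int size) {
            for (Iterator<int[]> it = blocks.iterator(); it.hasNext(); ) {
                int[] b = it.next();
                if (b[1] >= size) {             // first block that is large enough
                    int address = b[0];
                    b[0] += size;               // shrink the block from the front
                    b[1] -= size;
                    if (b[1] == 0) it.remove(); // block fully consumed
                    return address;
                }
            }
            return -1;                          // no block is large enough
        }
    }

    public static void main(String[] args) {
        BumpPointer bump = new BumpPointer(100);
        System.out.println(bump.allocate(16) + ", " + bump.allocate(24)); // 0, 16

        FreeList list = new FreeList();
        list.free(0, 8);   // two holes left behind by a non-compacting collector
        list.free(40, 32);
        System.out.println(list.allocate(16)); // 40: the first hole is too small
    }
}
```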

Which allocation method is used depends on whether the Java heap is regular, which in turn depends on whether the garbage collector compacts the heap.

Collectors with compaction, such as Serial and ParNew, use bump the pointer, which is simple and efficient.

Collectors without compaction, such as CMS, can in theory only use the more complex free list.

Concurrency safety

Besides deciding how to divide the available space, there is another issue: object creation is a very frequent operation in a virtual machine, and even just moving a pointer is not thread-safe under concurrency. While memory is being allocated for object A, the pointer might not yet have been updated, and object B might allocate memory using the same original pointer.

CAS mechanism

There are two solutions to this problem. The first is to synchronize the memory allocation operation; in practice, the virtual machine uses CAS with retry to make the pointer update atomic.
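The CAS-and-retry idea can be sketched on top of a bump-pointer allocator using AtomicInteger.compareAndSet. This toy model (names hypothetical) is not HotSpot's code: each thread reads the pointer, computes the new value, and publishes it only if no other thread changed the pointer in the meantime; otherwise it retries.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Bump-pointer allocation made thread-safe with CAS plus retry (toy model).
class CasBumpPointer {
    private final int capacity;
    private final AtomicInteger top = new AtomicInteger(0); // shared allocation pointer

    CasBumpPointer(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the start offset of the new block, or -1 if the space is exhausted. */
    int allocate(int size) {
        while (true) {
            int current = top.get();
            int next = current + size;
            if (next > capacity) {
                return -1;
            }
            if (top.compareAndSet(current, next)) {
                return current; // no other thread moved the pointer in between
            }
            // Another thread won the race: re-read the pointer and retry.
        }
    }

    /** Lets several threads allocate concurrently; returns every address handed out. */
    static Set<Integer> allocateConcurrently(CasBumpPointer heap, int threads, int perThread, int size) {
        Set<Integer> addresses = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    addresses.add(heap.allocate(size));
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return addresses;
    }

    public static void main(String[] args) {
        Set<Integer> addrs = allocateConcurrently(new CasBumpPointer(1_000_000), 4, 1000, 8);
        // Every one of the 4000 allocations received a distinct offset.
        System.out.println(addrs.size()); // prints 4000
    }
}
```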

Thread local allocation buffer

The second solution is to give each thread a small private block of memory in the Java heap, called a Thread Local Allocation Buffer (TLAB). Each thread then has its own buffer and allocates memory from it, so there is no contention and allocation becomes much more efficient. When a thread's TLAB is used up, it requests a new one from the Eden area.

The goal of TLAB is to let each Java application thread use its own dedicated allocation pointer for new objects, reducing synchronization overhead.

A TLAB only gives each thread a private allocation pointer. The memory holding the objects is still accessible to all threads; other threads just cannot allocate in that area. When a TLAB is full, the thread requests a new one.

Parameters:

-XX:+UseTLAB

Enables the use of thread-local allocation blocks (TLABs) in the young generation space. This option is enabled by default. To disable TLABs, specify -XX:-UseTLAB.

docs.oracle.com/javase/8/do…

3. Initialize the memory space

After memory allocation is complete, the virtual machine initializes the allocated memory (excluding the object header) to zero values (0 for int, false for boolean, and so on). This step ensures that an object's instance fields can be used in Java code without being explicitly initialized: the program reads the zero value of each field's type.
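The effect of zeroing is visible from Java: fields without initializers can be read right after construction and hold their type's zero value. A minimal sketch (class name hypothetical). Local variables, by contrast, get no such guarantee, and the compiler rejects reading one before it is assigned.

```java
// Instance fields are readable before any explicit assignment: they start at zero values.
class ZeroValues {
    int i;
    long l;
    double d;
    boolean flag;
    char c;
    Object ref;

    public static void main(String[] args) {
        ZeroValues z = new ZeroValues();
        System.out.println(z.i + " " + z.l + " " + z.d + " " + z.flag + " " + (int) z.c + " " + z.ref);
        // prints: 0 0 0.0 false 0 null

        // int local;
        // System.out.println(local); // compile error: variable local might not have been initialized
    }
}
```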

4. Set up the object

Next, the virtual machine sets up the object as necessary, such as which class instance the object is, how to find the metadata information about the class (Java classes are represented as class metadata within Java hotspot VM), the object’s hash code, the object’s GC generation age, and so on. This information is stored in the object header of the object.

5. Initialize the object

After the above work is done, a new object has been created from the virtual machine’s point of view, but from the Java program’s point of view, object creation has just begun, with all fields still having zero values. So, in general, the new instruction is followed by the initialization (constructor) of the object as the programmer wishes, so that a usable object is fully generated.

Object memory layout

In the HotSpot virtual machine, an object's memory layout can be divided into three areas: the object header, instance data, and alignment padding.

The object header contains two parts. The first part stores the object's own runtime data, such as its hash code, GC generation age, lock state flags, locks held by threads, the biased thread ID, and the bias timestamp.

The other part of the object header is the type pointer, which points to the object's class metadata; the virtual machine uses it to determine which class the object is an instance of.

If the object is a Java array, the object header also records the length of the array.

The instance data holds the contents of the object's fields.

The third part, alignment padding, does not necessarily exist and has no special meaning; it only serves as a placeholder. HotSpot's automatic memory management requires object sizes to be a multiple of 8 bytes, so when the rest of the object does not add up to a multiple of 8, padding fills the gap.
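The padding rule reduces to rounding up to a multiple of 8. This simplified estimate (names hypothetical) assumes a 64-bit HotSpot JVM with compressed class pointers, where the header takes 12 bytes (8-byte mark word plus 4-byte class pointer). It ignores field reordering and does not cover arrays, whose header also holds the 4-byte length. For exact layouts, a tool such as OpenJDK's JOL (Java Object Layout) can be used.

```java
// Simplified size estimate: header + field sizes, rounded up to a multiple of 8.
class ObjectSizeEstimate {
    // Assumption: 64-bit HotSpot with compressed class pointers (8-byte mark word + 4-byte class pointer).
    static final int HEADER_BYTES = 12;
    static final int ALIGNMENT = 8;

    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** fieldSizes: bytes per field, e.g. 4 for int, 8 for long, 1 for boolean. */
    static long estimate(int... fieldSizes) {
        long size = HEADER_BYTES;
        for (int f : fieldSizes) {
            size += f;
        }
        return alignUp(size); // padding makes up the difference
    }

    public static void main(String[] args) {
        System.out.println(estimate());        // 16: 12-byte header + 4 bytes of padding
        System.out.println(estimate(4));       // 16: header + one int, no padding needed
        System.out.println(estimate(8));       // 24: header + one long = 20, padded to 24
        System.out.println(estimate(8, 4, 1)); // 32: 25 bytes padded to 32
    }
}
```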

Object access location

Objects are created in order to be used: a Java program manipulates a specific object on the heap through reference data on the stack. There are currently two main access methods: handles and direct pointers.

Handle

With handle access, the Java heap sets aside a block of memory as a handle pool. The reference stores the object's handle address, and the handle contains the addresses of the object's instance data and of its type data.

Direct pointers

With direct pointer access, the reference stores the object's address directly.

Each method has its advantages. The biggest advantage of handles is that the reference stores a stable handle address: when the object is moved (which happens very often during garbage collection), only the instance data pointer in the handle changes, and the reference itself does not need to be modified.

The biggest advantage of direct pointers is speed. They save the cost of one pointer indirection, and since objects are accessed very frequently in Java, that overhead would otherwise add up to a significant execution cost.

HotSpot uses direct pointers for object access.

10. Determine the survival of the object

Almost all object instances are stored in the heap, and the garbage collector needs to determine which of these objects are “alive” and which are “dead” (dead being objects that can no longer be used in any way) before collecting them.

Reference counting method

Add a reference counter to the object: increment it whenever a new reference to the object is created, and decrement it when a reference becomes invalid. An object whose counter is zero can no longer be used.

Python uses this approach, but mainstream JVMs do not: when objects reference each other in a cycle, their counters never drop to zero, and handling that requires an additional mechanism that hurts efficiency.

In a code test where two objects reference only each other, the JVM still reclaims them, which shows that the JVM does not use reference counting.
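Why cycles defeat reference counting can be shown with a toy simulation that maintains the counters by hand (this is not how the JVM tracks objects; names are hypothetical). Two objects point at each other, the outside references are dropped, and both counters stay at 1.

```java
// Toy reference counting, to show why cycles defeat it (not how HotSpot works).
class RefCountDemo {
    static final class Obj {
        int refCount;
        Obj field;
    }

    /** Simulates holder.field = target, maintaining the counters. */
    static void assign(Obj holder, Obj target) {
        if (holder.field != null) holder.field.refCount--;
        holder.field = target;
        if (target != null) target.refCount++;
    }

    /** Builds a two-object cycle, drops the external references, returns the final counts. */
    static int[] cycleAfterRelease() {
        Obj a = new Obj();
        Obj b = new Obj();
        a.refCount++;  // local variable a -> A
        b.refCount++;  // local variable b -> B
        assign(a, b);  // A.field = B, so B's count is 2
        assign(b, a);  // B.field = A, so A's count is 2
        a.refCount--;  // local a released
        b.refCount--;  // local b released
        // Nothing outside the cycle points at A or B, yet neither count reached 0.
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = cycleAfterRelease();
        System.out.println(counts[0] + " " + counts[1]); // prints "1 1": never freed
    }
}
```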

Reachability analysis

(An important point to remember for interviews)

This algorithm determines whether an object is alive. The basic idea is to start from a set of objects called "GC Roots" and search downward; the paths searched are called reference chains. When no reference chain connects an object to the GC Roots, the object is proved to be unusable.

Objects that serve as GC Roots include the following:

  1. Objects referenced from the virtual machine stack (the local variable table in a stack frame).
  2. Objects referenced by static fields of classes in the method area.
  3. Objects referenced by constants in the method area.
  4. Objects referenced by JNI (commonly known as native methods) in the native method stack.
  5. Internal JVM references (Class objects, exception objects such as NullPointerException and OutOfMemoryError, the system class loader).
  6. All objects held as locks by synchronized.
  7. JMXBeans inside the JVM, callbacks registered with JVMTI, the native code cache, and so on.
  8. Temporary objects in the JVM implementation, and objects referenced across generations (when a generational collector collects only part of the heap).
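The marking step of reachability analysis is a graph traversal from the roots. In this toy sketch (names hypothetical), objects are strings, references are edges in a map, and "A" stands in for a GC Root such as a local variable. X and Y reference each other but are unreachable, so, unlike with reference counting, they are not marked.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Reachability analysis over a toy object graph: mark everything reachable from the roots.
class ReachabilityDemo {
    static Set<String> mark(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (!marked.add(obj)) continue; // already visited; also stops cycles from looping
            for (String next : refs.getOrDefault(obj, Collections.emptyList())) {
                work.push(next);
            }
        }
        return marked; // anything not marked is garbage
    }

    /** A toy heap: A -> B -> C is reachable from root A; X <-> Y is an unreachable cycle. */
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("A", Arrays.asList("B"));
        refs.put("B", Arrays.asList("C"));
        refs.put("X", Arrays.asList("Y"));
        refs.put("Y", Arrays.asList("X"));
        return refs;
    }

    public static void main(String[] args) {
        Set<String> live = mark(sampleGraph(), Arrays.asList("A")); // "A" plays the role of a GC Root
        System.out.println(new TreeSet<>(live)); // [A, B, C]; X and Y are garbage
    }
}
```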

The above determines whether objects can be collected. Classes have their own, stricter conditions:

Note that a class must satisfy all of the following conditions before it can be reclaimed:

  1. All instances of the class have been reclaimed, that is, there are no instances of the class in the heap.
  2. The ClassLoader that loaded the class has been reclaimed.
  3. The java.lang.Class object corresponding to this Class is not referenced anywhere, and the methods of this Class cannot be accessed anywhere through reflection.
  4. Parameter control:

Discarded constants are reclaimed in a way very similar to objects. For example, if the string "king" has entered the constant pool and nothing in the system references it any longer, it can be cleared out of the pool during collection.

The finalize method

Even an object judged unreachable by reachability analysis is not necessarily dead; it is first put "on probation". To declare an object truly dead, it must be marked twice. It is marked the first time when no reference chain to the GC Roots is found. Then it is filtered: if the object overrides finalize(), it can rescue itself inside finalize() by making itself reachable again.

Code demo:

The results

As the results show, an object can rescue itself only once: finalize() runs the first time, but not the second.

Let’s change the code. Let’s do it again.

The results

This time the object is not rescued. The finalize() method runs slowly, and the garbage collector has already reclaimed the object before finalize() can save it.

We therefore recommend avoiding finalize() as much as possible, because it is too unreliable: in production you can hardly control when it runs or in what order objects are finalized. Just forget about finalize()! Anything it can do is done better in Java by try-finally or other mechanisms.
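A sketch of the try-with-resources alternative: cleanup runs deterministically when the block exits, even if the body throws. Object.finalize() has been deprecated since Java 9; for a safety net on objects that are not closed explicitly, java.lang.ref.Cleaner (Java 9+) is the usual replacement. The class name here is hypothetical.

```java
// Deterministic cleanup with try-with-resources instead of relying on finalize().
class ManagedResource implements AutoCloseable {
    private boolean closed;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        // Release files, sockets, native handles, etc. here.
        closed = true;
        System.out.println("closed");
    }

    /** Uses a resource inside try-with-resources and returns it afterwards for inspection. */
    static ManagedResource useAndClose() {
        ManagedResource resource = new ManagedResource();
        try (ManagedResource r = resource) {
            System.out.println("using resource, closed = " + r.isClosed()); // false
        } // r.close() runs here, even if the body throws
        return resource;
    }

    public static void main(String[] args) {
        System.out.println(useAndClose().isClosed()); // true
    }
}
```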

11. Reference types of objects

Strong reference

Object obj = new Object() is a strong reference. In any case, the garbage collector will never reclaim the referenced object as long as there is a strong reference association (reachable to the root).

Soft reference SoftReference

Soft references are for objects that are useful but not essential. Softly referenced objects are reclaimed before the system runs out of memory; if there is still not enough memory after that collection, an OutOfMemoryError is thrown. See the code:

VM parameters: -Xms10m -Xmx10m -XX:+PrintGC

The results

For example, consider a program that processes user-supplied images. If all images are kept in memory, they open quickly, but they take a lot of memory, and rarely used images waste space unless they are removed manually. If every image is instead read from disk each time it is opened, memory usage stays small, but frequently used images hit the disk every time, which is costly. In this situation, a cache built on soft references is a good fit.
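A minimal sketch of such a soft-reference cache (class name hypothetical; the loader function stands in for reading an image from disk). The map holds SoftReference wrappers, so the GC may clear cached values under memory pressure, and get() reloads a value whose reference was cleared. It is not thread-safe.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// A minimal soft-reference cache: values may be reclaimed by the GC under memory pressure.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();
    private final Function<K, V> loader; // the slow source, e.g. reading an image from disk

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get(); // null if never cached or already cleared
        if (value == null) {
            value = loader.apply(key);              // (re)load from the slow source
            map.put(key, new SoftReference<>(value));
        }
        return value; // strongly reachable while the caller holds it
    }

    public static void main(String[] args) {
        SoftCache<String, String> cache = new SoftCache<>(name -> {
            System.out.println("loading " + name); // stands in for reading an image file
            return "<pixels of " + name + ">";
        });
        String first = cache.get("cat.png");  // loads
        String second = cache.get("cat.png"); // served from the cache
        System.out.println(first == second);  // true, since first keeps the value strongly reachable
    }
}
```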

Weak reference (WeakReference)

Weak references also describe objects that are useful but not essential, even less so than soft references. An object reachable only through weak references survives only until the next garbage collection: when GC occurs, it is reclaimed regardless of whether memory is sufficient.

See the code:
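Here is a minimal sketch. It assumes a HotSpot JVM where System.gc() normally triggers a full collection; because that call is only a hint, the hypothetical helper retries a few times.

```java
import java.lang.ref.WeakReference;

// Sketch: a weakly reachable object is cleared at the next GC, whatever the free memory.
class WeakReferenceDemo {
    /** Requests GC a few times until the referent is cleared; returns whether it was. */
    static boolean clearedAfterGc(WeakReference<?> ref, int attempts) {
        for (int i = 0; i < attempts && ref.get() != null; i++) {
            System.gc(); // only a hint to the JVM
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        Object strong = new Object();
        WeakReference<Object> ref = new WeakReference<>(strong);
        System.gc();
        // Still strongly reachable through 'strong', so it must not be cleared.
        System.out.println("strongly held: " + (ref.get() == strong ? "alive" : "cleared"));

        strong = null; // now the object is only weakly reachable
        System.out.println("weakly held: " + (clearedAfterGc(ref, 10) ? "cleared" : "alive"));
    }
}
```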

Note: SoftReference and WeakReference are both suitable for caching non-essential data when memory is tight: when the system runs low on memory, the contents of the cache can be freed.

Practical applications: WeakHashMap, ThreadLocal (each ThreadLocalMap entry holds its ThreadLocal key through a weak reference)
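Here is a small WeakHashMap sketch (the class and helper names are hypothetical). An entry disappears once its key is no longer strongly reachable and a GC has run. The sketch uses new Object() keys on purpose: string literals are interned and would never be collected.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Sketch: WeakHashMap entries vanish once their keys are only weakly reachable.
class WeakHashMapDemo {
    /** Requests GC until the map shrinks to the expected size or attempts run out. */
    static int sizeAfterGc(Map<?, ?> map, int expected, int attempts) {
        for (int i = 0; i < attempts && map.size() > expected; i++) {
            System.gc(); // a hint only; HotSpot normally honors it with a full GC
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return map.size(); // size() first expunges entries whose keys were cleared
    }

    public static void main(String[] args) {
        Map<Object, String> cache = new WeakHashMap<>();
        Object liveKey = new Object();
        cache.put(liveKey, "kept: key still strongly referenced");
        cache.put(new Object(), "dropped: key only weakly referenced");
        System.out.println("size after GC: " + sizeAfterGc(cache, 1, 10));
        System.out.println(cache.get(liveKey));
    }
}
```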

Phantom reference (PhantomReference)

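Phantom references are the weakest kind of reference: they do not keep the object alive at all (it can be reclaimed at any time), and PhantomReference.get() always returns null.

Their purpose is to receive a notification when the object is garbage collected. This can be used, for example, to monitor whether the garbage collector is working properly.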
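Here is a minimal sketch of receiving that notification through a ReferenceQueue. The helper names are hypothetical. Because System.gc() is only a hint, the helper retries. The PhantomReference object itself must stay strongly reachable, or it will never be enqueued.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

// Sketch: a phantom reference only tells us, via its queue, that the object was collected.
class PhantomReferenceDemo {
    /** Requests GC and waits until the phantom reference shows up in the queue, or gives up. */
    static boolean notifiedAfterGc(ReferenceQueue<Object> queue, PhantomReference<Object> ref, int attempts) {
        for (int i = 0; i < attempts; i++) {
            System.gc();
            try {
                Reference<?> polled = queue.remove(50); // blocks for up to 50 ms
                if (polled == ref) {
                    return true;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> ref = new PhantomReference<>(new Object(), queue);
        System.out.println("get() = " + ref.get()); // always null for phantom references
        System.out.println("notified: " + notifiedAfterGc(queue, ref, 10));
    }
}
```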

12. Why learn about garbage collection

The biggest technical difference between Java and languages like C++ is automatic garbage collection (GC).

Why do you need to know about GC and memory allocation policies?

  1. It comes up in interviews;
  2. GC affects application performance;
  3. It helps you write better code.

Stack: data on the stack lives and dies with its thread, so it is generally not a GC concern.

Heap: objects in the heap are the focus of garbage collection.

Method area/metaspace: garbage collection also happens in this area, but it reclaims relatively little, so it is generally not our focus.

13. Object allocation policy

Object allocation rules

  • Objects are allocated in Eden first
  • Space allocation guarantee
  • Large objects go directly into the old generation
  • Long-lived objects enter the old generation
  • Dynamic object age determination

Allocation on the stack

No escape

That is, an object created in a method never leaves that method.

Escape analysis works by analyzing an object's dynamic scope. When an object is defined in a method, it may be referenced by external methods, for example by being passed as an argument to another method; this is called method escape. It may even be accessed by other threads, for example by being assigned to a variable that other threads can access; this is called thread escape.

No escape, method escape, and thread escape are the escape levels of an object, from lowest to highest.
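To make the three levels concrete, here is a hypothetical sketch with one method per level. The classification is conceptual: after inlining sum(), a JIT compiler may still prove that the object in methodEscape does not escape.

```java
// Hypothetical examples of the three escape levels.
class EscapeLevels {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // A static field is reachable from every thread.
    static Point shared;

    /** No escape: p never leaves this method. */
    static int noEscape(int x, int y) {
        Point p = new Point(x, y);
        return p.x + p.y;
    }

    /** Method escape: p is passed as an argument to another method. */
    static int methodEscape(int x, int y) {
        Point p = new Point(x, y);
        return sum(p);
    }

    private static int sum(Point p) {
        return p.x + p.y;
    }

    /** Thread escape: p is published through a field that other threads can read. */
    static void threadEscape(int x, int y) {
        shared = new Point(x, y);
    }
}
```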

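If the JVM can prove that an object does not escape its method, it can allocate the object on the stack instead of the heap, which improves efficiency. (HotSpot implements this through scalar replacement, breaking the object's fields into local variables.)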

Escape analysis code

public class EscapeAnalysisTest {
    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        for (int i = 0; i < 50000000; i++) {
            allocate();
        }
        System.out.println((System.currentTimeMillis() - start) + " ms");
        // Keep the process alive so the heap can be inspected
        Thread.sleep(600000);
    }

    // The MyObject created here is never used outside this method: it does not escape
    static void allocate() {
        MyObject myObject = new MyObject(2020, 2020.6);
    }

    static class MyObject {
        int a;
        double b;

        MyObject(int a, double b) {
            this.a = a;
            this.b = b;
        }
    }
}

In this code, myObject never leaves the allocate() method, so it does not escape, and the JVM can allocate it on the stack.

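Then observe the difference by turning escape analysis on and off with the DoEscapeAnalysis flag.

Enable escape analysis (the JVM default, -XX:+DoEscapeAnalysis) and check the execution time.

Disable escape analysis (-XX:-DoEscapeAnalysis) and check the execution time again.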

The test results show that enabling escape analysis has a significant impact on the code's performance. Why does it have that effect?

Escape analysis

If escape analysis shows that an object can be allocated on the stack, the object's lifetime follows its stack frame and no garbage collection is needed for it. If the method is called frequently, this yields a significant performance improvement.

With escape analysis enabled, objects that do not escape are allocated on the stack.

Without escape analysis, these objects are allocated on the heap, which frequently triggers garbage collection (hurting system performance) and makes the code run slowly.

Code validation

Enable GC log printing: -XX:+PrintGC

With escape analysis enabled:

There is no GC log output at all.

With escape analysis disabled:

As you can see, with escape analysis turned off the JVM performs a lot of garbage collection (GC), and it is this work that makes such a big difference in performance.

Objects are allocated in Eden first

VM parameters:

-Xms20m -Xmx20m -Xmn10m -XX:+PrintGCDetails

-XX:+PrintGCDetails prints garbage collection logs and, when the program exits, displays the current memory allocation.

Note: the young generation already shows some usage at startup, before our own objects are allocated.

In most cases, objects are allocated in Eden, in the young generation. When Eden does not have enough space for an allocation, the virtual machine initiates a Minor GC.
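Here is a sketch of the classic allocation experiment. It assumes the VM parameters above plus -XX:+UseSerialGC. That flag is an assumption here: the default G1 collector on newer JDKs lays out the heap differently and would treat these arrays as humongous objects. With -Xmn10m and the default SurvivorRatio of 8, Eden is about 8 MB and each Survivor space is about 1 MB.

```java
// Sketch: run with -Xms20m -Xmx20m -Xmn10m -XX:+PrintGCDetails -XX:+UseSerialGC
class AllocationDemo {
    private static final int _1MB = 1024 * 1024;

    /** Allocates three 2 MB arrays and then a 4 MB array; returns the total bytes allocated. */
    static long allocate() {
        byte[] a1 = new byte[2 * _1MB];
        byte[] a2 = new byte[2 * _1MB];
        byte[] a3 = new byte[2 * _1MB];
        // Eden (about 8 MB) already holds 6 MB, so this allocation is expected to trigger a Minor GC.
        byte[] a4 = new byte[4 * _1MB];
        return (long) a1.length + a2.length + a3.length + a4.length;
    }

    public static void main(String[] args) {
        System.out.println("allocated " + allocate() / _1MB + " MB");
    }
}
```

In the GC log, expect a Minor GC before the 4 MB allocation. The three 2 MB arrays are still live but do not fit in a 1 MB Survivor space, so the allocation guarantee moves them to the old generation, and the 4 MB array is then allocated in Eden.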

Large objects go directly into the old generation

-Xms20m
-Xmx20m
-Xmn10m
-XX:+PrintGCDetails
-XX:PretenureSizeThreshold=4m
-XX:+UseSerialGC

The PretenureSizeThreshold parameter is only valid for Serial and ParNew collectors.

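The most typical large objects are long strings and large arrays. Allocating them directly in the old generation serves two purposes: 1. it avoids copying large amounts of memory (between Eden and the Survivor spaces); 2. it avoids triggering garbage collection early when there is still free memory, just not enough contiguous space for the object.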
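Here is a sketch for the flags above (-XX:PretenureSizeThreshold=4m, -XX:+UseSerialGC). A single 5 MB array would still fit in the roughly 8 MB Eden. Because it exceeds the threshold, however, it is expected to go directly into the old generation, which shows up in the heap summary printed by -XX:+PrintGCDetails. The class name is hypothetical.

```java
// Sketch: run with the VM parameters listed above.
class PretenureDemo {
    private static final int _1MB = 1024 * 1024;

    /** Allocates one 5 MB array, larger than the 4 MB PretenureSizeThreshold. */
    static byte[] allocateLarge() {
        // Expected to land in the old ("tenured") generation rather than Eden.
        return new byte[5 * _1MB];
    }

    public static void main(String[] args) {
        byte[] big = allocateLarge();
        System.out.println("allocated " + big.length / _1MB + " MB");
    }
}
```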

Long-lived objects enter the old generation

If an object is born in Eden, survives its first Minor GC, and fits in a Survivor space, it is moved to Survivor and its age is set to 1. Each time it survives another Minor GC in Survivor, its age increases by 1. When its age reaches a certain threshold (15 by default; 6 with CMS), it is promoted to the old generation.

The threshold is adjusted with -XX:MaxTenuringThreshold.

Dynamic object age determination

To adapt better to the memory behavior of different programs, the virtual machine does not always require an object's age to reach MaxTenuringThreshold before promotion. If the total size of all objects of the same age in the Survivor space exceeds half of the Survivor space, objects of that age or older go straight to the old generation without waiting until the age specified by MaxTenuringThreshold. (More precisely, HotSpot adds up object sizes age by age, starting from age 1, and uses the first age at which the running total exceeds the target as the new threshold.)

Space allocation guarantee

Before a Minor GC, the virtual machine checks whether the largest contiguous free space in the old generation is larger than the total size of all objects in the young generation. If so, the Minor GC is guaranteed to be safe. If not, the virtual machine checks the HandlePromotionFailure setting to see whether a guarantee failure is allowed. If it is, the VM checks whether the largest contiguous free space in the old generation is larger than the average size of objects previously promoted to the old generation. If so, it attempts a Minor GC even though this Minor GC is risky; if the guarantee then fails, a Full GC is performed. If the space is smaller, or if HandlePromotionFailure does not allow the risk, a Full GC is performed instead. (Since JDK 6 Update 24, HandlePromotionFailure no longer has any effect: a Minor GC is performed whenever the old generation's largest contiguous space exceeds either the total size of the young generation's objects or the average promotion size; otherwise a Full GC is performed.)
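Here is the decision sequence above written as a hypothetical pure function. The names, the three-way result, and the byte-based sizes are illustrative assumptions. It models the classic HandlePromotionFailure logic described in the text.

```java
// Hypothetical model of the pre-Minor-GC space allocation guarantee check.
class PromotionGuarantee {
    enum Decision { MINOR_GC_SAFE, MINOR_GC_RISKY, FULL_GC }

    static Decision decide(long oldMaxContiguousFree, long youngTotalUsed,
                           boolean handlePromotionFailure, long avgPromotedSize) {
        if (oldMaxContiguousFree > youngTotalUsed) {
            return Decision.MINOR_GC_SAFE;  // even if everything survives, it fits
        }
        if (handlePromotionFailure && oldMaxContiguousFree > avgPromotedSize) {
            return Decision.MINOR_GC_RISKY; // try it; fall back to Full GC if promotion fails
        }
        return Decision.FULL_GC;
    }
}
```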

Thread Local Allocation Buffer (TLAB)

See the section on allocation buffers for details.

14. Garbage collection algorithms

Implementing garbage collection algorithms involves a great many program details, and virtual machines on different platforms manipulate memory in different ways, so there is no need to understand the implementations in depth. Here we focus on generational collection theory and the ideas behind the three algorithms.

Generation collection theory

Garbage collectors in current commercial virtual machines are mostly designed around the “generational collection” theory, which roughly says:

  1. The vast majority of objects die young.
  2. The more garbage collections an object has survived, the less likely it is to die.

Based on these two hypotheses, short-lived objects are kept in one area and hard-to-kill objects in another; these two areas are the young generation and the old generation.

GC types

There are many names for garbage collection in circulation. Roughly, they break down as follows:

  1. Minor GC/Young GC: collects only the young generation.
  2. Major GC/Old GC: collects only the old generation. Currently only the CMS collector collects the old generation separately like this. Note that “Major GC” is used loosely: some sources use it to mean a collection of the entire heap.
  3. Full GC: collects the entire Java heap and the method area (note that the method area is included).

Copying algorithm

The available memory is divided into two halves of equal size, and only one half is used at a time. When that half is used up, the surviving objects are copied to the other half, and the used half is then cleared in one go. Because a whole half is reclaimed each time, memory fragmentation never has to be considered when allocating: memory is simply allocated in order. The implementation is simple and runs efficiently, but the price is that usable memory is cut in half.

Note: moving memory means really moving (copying) the objects; it cannot be faked by juggling pointers.

The copying algorithm suits the young generation: because most objects die young, few objects need to be copied, so efficiency is naturally high, and clearing the other half in one go is fast.

Features:

  • Simple implementation and efficient operation
  • Survivors are copied, so there is no memory fragmentation
  • Memory utilization is only 50%
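Here is a toy model of a semispace copying collector using Cheney's breadth-first copying, a well-known way to implement this idea. The classes are hypothetical, and this is not how HotSpot is written. Objects reachable from the roots are copied into a fresh to-space, references are rewritten through a forwarding table, and the old space is then dropped as a whole.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy semispace copying collector (Cheney-style), a model rather than real JVM code.
class SemispaceCollector {
    /** A heap "object": a name plus references to other objects. */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();

        Obj(String name) {
            this.name = name;
        }
    }

    List<Obj> fromSpace = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        fromSpace.add(o); // allocation simply appends in order
        return o;
    }

    /** Copies everything reachable from the roots into a new space and discards the old one. */
    List<Obj> collect(List<Obj> roots) {
        List<Obj> toSpace = new ArrayList<>();
        Map<Obj, Obj> forwarding = new IdentityHashMap<>(); // old object -> its copy
        List<Obj> newRoots = new ArrayList<>();
        for (Obj r : roots) {
            newRoots.add(copy(r, toSpace, forwarding));
        }
        // Cheney scan: walk the copies in order, copying whatever they still point to.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj copied = toSpace.get(scan);
            for (int i = 0; i < copied.refs.size(); i++) {
                copied.refs.set(i, copy(copied.refs.get(i), toSpace, forwarding));
            }
        }
        fromSpace = toSpace; // the old half is reclaimed wholesale
        return newRoots;
    }

    private static Obj copy(Obj o, List<Obj> toSpace, Map<Obj, Obj> forwarding) {
        Obj existing = forwarding.get(o);
        if (existing != null) {
            return existing; // already copied: reuse it (handles cycles)
        }
        Obj c = new Obj(o.name);
        c.refs.addAll(o.refs); // still old references; fixed up during the scan
        forwarding.put(o, c);
        toSpace.add(c);
        return c;
    }
}
```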

Appel-style collection

A more refined copying strategy for the young generation: one large Eden space and two small Survivor spaces (called From and To, or Survivor1 and Survivor2).

Studies have shown that 98% of objects in the young generation die young, so there is no need to divide memory 1:1. Instead, memory is split into one large Eden space and two small Survivor spaces, and each allocation uses Eden plus one of the Survivors [1]. During a collection, the surviving objects in Eden and that Survivor are copied into the other Survivor space in one pass, and then Eden and the Survivor just used are cleared.

By default, HotSpot uses an 8:1 ratio of Eden to each Survivor, so 90% (80% + 10%) of the young generation is usable at any time and only 10% is “wasted”. Of course, the 98% figure only holds in typical scenarios; there is no guarantee that no more than 10% of objects survive each collection. When the Survivor space is insufficient, the collector relies on other memory (here, the old generation) as an allocation guarantee (Handle Promotion).
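The 8:1:1 split can be checked with a little arithmetic. This hypothetical helper ignores HotSpot's alignment and rounding. It assumes two Survivor spaces, each 1/(SurvivorRatio + 2) of the young generation. For -Xmn10m and SurvivorRatio = 8, it gives an 8 MB Eden, 1 MB per Survivor, and 9 MB (90%) usable at any time.

```java
// Simplified young-generation layout arithmetic (ignores alignment and rounding).
class YoungGenLayout {
    /** Returns {eden, oneSurvivor, usable} in bytes for Eden:Survivor = survivorRatio:1. */
    static long[] layout(long youngBytes, int survivorRatio) {
        long survivor = youngBytes / (survivorRatio + 2); // each of the two Survivor spaces
        long eden = youngBytes - 2 * survivor;           // everything else is Eden
        long usable = eden + survivor;                    // Eden plus one Survivor in use at a time
        return new long[] {eden, survivor, usable};
    }
}
```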

Mark-sweep algorithm

The algorithm has two phases, “mark” and “sweep”: first, all objects that need to be reclaimed are marked; after marking is complete, all marked objects are reclaimed.

Its efficiency is unstable. If the heap contains many objects and most of them are dead, a large number of objects must be marked and reclaimed, so its efficiency is much lower than that of copying.

Its main problem is space fragmentation: mark-sweep leaves a large number of discontiguous memory fragments. With too much fragmentation, when the program later needs to allocate a large object, it may not find enough contiguous memory and has to trigger another garbage collection early.

The more objects that need to be reclaimed, the more marking and sweeping work there is. That is why mark-sweep suits the old generation (where relatively few objects die in each collection), while copying suits the young generation.

Features:

  • Execution efficiency is unstable
  • Memory fragmentation causes premature GC
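Here is a toy mark-sweep over a small heap of fixed-size slots. It is a hypothetical model: each slot holds the indices it references, or null when free. The test case shows the fragmentation problem: after sweeping, half the slots are free, but no two free slots are adjacent.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Toy mark-sweep: slot i holds the indices it references, or null if the slot is free.
class MarkSweep {
    /** Marks everything reachable from the roots, then frees unmarked slots. Returns slots freed. */
    static int collect(List<int[]> heap, int[] roots) {
        boolean[] marked = new boolean[heap.size()];
        Deque<Integer> stack = new ArrayDeque<>();
        for (int r : roots) {
            stack.push(r);
        }
        // Mark phase: depth-first traversal from the roots.
        while (!stack.isEmpty()) {
            int slot = stack.pop();
            if (marked[slot] || heap.get(slot) == null) {
                continue;
            }
            marked[slot] = true;
            for (int ref : heap.get(slot)) {
                stack.push(ref);
            }
        }
        // Sweep phase: free unmarked slots in place; survivors are NOT moved.
        int freed = 0;
        for (int i = 0; i < heap.size(); i++) {
            if (heap.get(i) != null && !marked[i]) {
                heap.set(i, null);
                freed++;
            }
        }
        return freed;
    }

    /** Longest run of free slots: the largest object that could still be placed contiguously. */
    static int largestFreeRun(List<int[]> heap) {
        int best = 0;
        int run = 0;
        for (int[] slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }
}
```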

Mark-compact algorithm

The first step marks all objects that need to be reclaimed. After marking, instead of cleaning up the reclaimable objects directly, the collector moves all surviving objects toward one end of the memory space and then clears everything beyond that boundary. There is no memory fragmentation, but mark-compact is less efficient.

The main difference between mark-compact and mark-sweep is that mark-compact moves objects. Moving objects not only burdens the system; it also requires pausing user threads throughout, and every reference to a moved object must be updated.

Features:

  • No memory fragmentation
  • Moving objects is expensive: user threads are paused and all references must be updated

So, for the old generation, mark-compact and mark-sweep each have their own advantages and disadvantages.
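Here is the same toy slot model with a sliding compaction. It is hypothetical code, not HotSpot's implementation. Survivors keep their order and move to the front, and every reference between them is rewritten through a forwarding table. That rewriting is exactly the extra work that mark-sweep avoids.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy mark-compact: slot i holds the indices it references, or null if the slot is free.
class MarkCompact {
    /** Mark phase: flags every slot reachable from the roots. */
    static boolean[] mark(List<int[]> heap, int[] roots) {
        boolean[] marked = new boolean[heap.size()];
        Deque<Integer> stack = new ArrayDeque<>();
        for (int r : roots) {
            stack.push(r);
        }
        while (!stack.isEmpty()) {
            int slot = stack.pop();
            if (marked[slot] || heap.get(slot) == null) {
                continue;
            }
            marked[slot] = true;
            for (int ref : heap.get(slot)) {
                stack.push(ref);
            }
        }
        return marked;
    }

    /** Slides live slots to the front and rewrites references (root updates omitted here). */
    static List<int[]> compact(List<int[]> heap, int[] roots) {
        boolean[] marked = mark(heap, roots);
        int[] forward = new int[heap.size()]; // old index -> new index
        int next = 0;
        for (int i = 0; i < heap.size(); i++) {
            if (marked[i]) {
                forward[i] = next++;
            }
        }
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < heap.size(); i++) {
            if (!marked[i]) {
                continue;
            }
            int[] refs = heap.get(i).clone();
            for (int k = 0; k < refs.length; k++) {
                refs[k] = forward[refs[k]]; // point at the object's new location
            }
            result.add(refs);
        }
        while (result.size() < heap.size()) {
            result.add(null); // all free space is now one contiguous block at the end
        }
        return result;
    }
}
```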

15. Common garbage collectors in the JVM

The idea of generational collection

In the young generation, most objects are found dead at each collection and only a few survive, so the copying algorithm is chosen: a collection costs only the copying of the few survivors.

In the old generation, objects have a high survival rate and there is no extra space to guarantee their allocation, so the “mark-sweep” or “mark-compact” algorithm must be used.

Remember which collectors can be combined with which. For the specific JVM parameters, see the official documentation: docs.oracle.com/javase/8/do…

Parallel: multiple garbage collection threads work at the same time (user threads wait).

Concurrent: garbage collection threads and application threads run at the same time.

Note: Throughput = user code run time / (user code run time + garbage collection time)

Garbage collection time = number of garbage collections × time per garbage collection

16. How garbage collectors work

Serial/Serial Old

The oldest collectors: single-threaded, exclusive (they stop all user threads while collecting), mature, and suited to single-CPU servers.

-XX:+UseSerialGC: Serial for the young generation, Serial Old for the old generation
-XX:+UseParNewGC: ParNew for the young generation, Serial Old for the old generation
-XX:+UseParallelGC: Parallel Scavenge for the young generation, Serial Old for the old generation (since JDK 7u4, Parallel Old is used by default instead)

ParNew

Essentially the multithreaded version of Serial: it uses multiple threads and multiple CPUs, so its pauses are shorter than Serial's.

-XX:+UseParNewGC: ParNew for the young generation, Serial Old for the old generation

Parallel Scavenge (ParallelGC)/Parallel Old

A throughput-oriented garbage collector. High throughput means CPU time is used efficiently to finish the program's computation as quickly as possible, which mainly suits background tasks that need little interaction.

Throughput is the ratio of CPU time spent running user code to total CPU time consumed, i.e. Throughput = time running user code / (time running user code + garbage collection time). If the virtual machine runs for 100 minutes in total and garbage collection takes 1 minute, the throughput is 99%.
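Here is the throughput formula as a tiny helper (the names are hypothetical). It uses the numbers from the example: 100 minutes in total, 1 minute of which is GC, leaves 99 minutes of user code.

```java
// Throughput arithmetic from the formulas in the text.
class Throughput {
    /** Throughput = user time / (user time + GC time). */
    static double throughput(double userMinutes, double gcMinutes) {
        return userMinutes / (userMinutes + gcMinutes);
    }

    /** GC time = number of collections x average time per collection. */
    static double gcTime(int collections, double avgPauseMinutes) {
        return collections * avgPauseMinutes;
    }

    public static void main(String[] args) {
        System.out.println(throughput(99, 1)); // 0.99, i.e. 99%
    }
}
```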

Concurrent Mark Sweep (CMS)

CMS is a collector whose goal is the shortest possible collection pause time. Today, a large share of Java applications run on the server side of Internet sites or B/S systems. These applications care especially about service response speed and want system pauses to be as short as possible to give users a better experience. The CMS collector is a good fit for such applications.

As the name (which includes “Mark Sweep”) suggests, the CMS collector is based on the “mark-sweep” algorithm, and its operation is more complex than that of the earlier collectors. The process has four steps:

  • Initial mark: short (stop-the-world); it only marks objects that GC Roots can reach directly, so it is very fast.

  • Concurrent mark: traces from GC Roots while the user's application keeps running, marking all objects reachable from GC Roots by traversing the whole object graph. This takes a long time, so it is done concurrently (garbage collector threads and user threads work at the same time).

  • Remark: short (stop-the-world). It corrects the marks of objects whose marking changed because the user program kept running during concurrent marking. Its pause is generally slightly longer than the initial mark but much shorter than the concurrent mark phase.

  • Concurrent sweep: runs at the same time as the user threads.

The longest phases, concurrent mark and concurrent sweep, let the collector threads work alongside the user threads, so the CMS collector's memory reclamation as a whole runs concurrently with the user threads.

-XX:+UseConcMarkSweepGC: ParNew for the young generation, CMS for the old generation

CPU sensitivity: CMS is sensitive to processor resources because it collects concurrently. By default it starts (number of cores + 3) / 4 collector threads, so when there are fewer than 4 cores, CMS takes a large share of CPU away from the user program.
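CMS starts (cores + 3) / 4 concurrent GC threads by default, using integer division. This hypothetical helper computes the share of CPU those threads take from the application, which shows why machines with few cores suffer most.

```java
// Arithmetic for the default CMS concurrent thread count, (cores + 3) / 4.
class CmsThreads {
    static int gcThreads(int cores) {
        return (cores + 3) / 4; // integer division, as in the formula
    }

    /** Fraction of the CPU used by CMS threads during concurrent phases. */
    static double cpuShare(int cores) {
        return (double) gcThreads(cores) / cores;
    }

    public static void main(String[] args) {
        for (int cores : new int[] {2, 4, 8}) {
            System.out.println(cores + " cores -> " + gcThreads(cores) + " GC thread(s), "
                    + Math.round(cpuShare(cores) * 100) + "% of CPU");
        }
    }
}
```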

Floating garbage: because user threads keep running during CMS's concurrent sweep phase, new garbage is naturally produced as the program runs. This garbage appears after marking has finished, so CMS cannot deal with it in the current collection and has to leave it for the next GC. It is called “floating garbage”.

Because of floating garbage, some memory must be set aside, which means CMS cannot wait until the old generation is nearly full before collecting, as other collectors do.

In JDK 1.6, the old-generation usage threshold that triggers CMS is 92%.

If the reserved memory is not enough to hold the floating garbage, a Concurrent Mode Failure occurs, and the VM temporarily falls back to Serial Old in place of CMS.

Space fragmentation: the mark-sweep algorithm leaves discontiguous fragments of free space.

Overall, CMS was the JVM's first concurrent garbage collector, which makes it quite representative.

However, its biggest problem is that CMS uses mark-sweep, so memory becomes fragmented, and heavy fragmentation makes allocating large objects very troublesome. To address this, CMS provides the parameter -XX:+UseCMSCompactAtFullCollection (enabled by default). When a large object cannot be allocated and CMS has to fall back to a Full GC, this parameter compacts the memory fragments.

This compaction is done by Serial Old. Because Serial Old is single-threaded, the pause can be very long with a large heap and many objects, which is a real problem for CMS.

17. Stop The World

Every garbage collector suspends the business (application) threads at some point; this is STW, Stop The World. Our goal in GC tuning is therefore to minimize both the duration and the frequency of STW pauses.

G1

-XX:+UseG1GC

Memory layout: the collectors before G1 always collected the entire young generation or the entire old generation; G1 does not. With G1, the Java heap layout is very different from that of other collectors: the whole heap is divided into multiple independent regions of equal size. The concepts of young and old generation are retained, but they are no longer physically separated; each is a set of regions (which need not be contiguous). The region size can be set with -XX:G1HeapRegionSize=size.

Among the regions are special Humongous regions for storing large objects. G1 treats any object larger than half of a region's capacity as a large object. An extremely large object is stored across N contiguous Humongous regions.
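Here is a hypothetical helper for the size check. It uses G1's rule that an object larger than half a region is humongous. It works in bytes and ignores object headers and alignment.

```java
// Simplified G1 humongous-object arithmetic (bytes; headers and alignment ignored).
class G1Humongous {
    /** An object larger than half a region is treated as humongous. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    /** Number of contiguous humongous regions needed (ceiling division). */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }
}
```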

Parallelism and concurrency: G1 makes full use of multi-CPU, multi-core hardware, using multiple CPUs (or CPU cores) to shorten stop-the-world pauses. Some GC work that other collectors can only do with Java threads paused, G1 can do concurrently while the Java program keeps running.

Generational collection: as with other collectors, G1 keeps the generational concept. G1 can manage the entire GC heap on its own without cooperating with other collectors. Even so, it handles newly created objects differently from old objects that have been around for a while and survived multiple GCs, to get better collection results.

Space integration: unlike CMS's “mark-sweep”, G1 as a whole is a “mark-compact” collector, and locally (between two regions) it is based on the “copying” algorithm. Either way, G1 does not produce memory fragmentation during operation and provides neat, usable free memory after a collection. This helps programs run for a long time: allocating a large object will not trigger the next GC prematurely because no contiguous memory can be found.

Predictable pause times:

-XX:MaxGCPauseMillis specifies the target maximum pause time. G1 tries to adjust the ratio of young to old generation, the heap size, and the promotion age to meet this target.

-XX:ParallelGCThreads: sets the number of GC worker threads.

As a rule of thumb, the balance point between G1 and CMS is a heap of about 6-8 GB; G1 shows its advantages only with more memory.

18. Constant pool and String

Constant pools come in several kinds: the runtime constant pool, the class (static) constant pool, and the string constant pool.

The virtual machine specification only defines these areas as part of the method area; it does not prescribe how virtual machine vendors implement them.

Strictly speaking, the static constant pool holds string literals, symbolic references, and class and method information, while the runtime constant pool holds runtime direct references. After a class is loaded, the symbolic references in the static constant pool are transferred to the runtime constant pool; after the class is resolved, symbolic references are replaced with direct references. Since JDK 1.7, the string constant pool has been stored in heap memory (the physical space), while logically it still belongs to the method area (the logical partition).

Literal:

A fixed value written directly in the code, such as the values we assign to primitive variables, is called a literal.

For example: int i = 120; long j = 10L;

Symbolic references: include the fully qualified names of classes (for example, the fully qualified name of the String class is java/lang/String), field names and descriptors, and method names and descriptors.

Direct reference: a value that locates the target directly, such as a pointer to, an offset of, or a handle for a concrete object.

How is a String object implemented?

Looking at the String implementation, you will notice that the String class is declared final, and so is its char array (a byte[] since JDK 9). A final class cannot be inherited, and making the char[] private and final means the String's contents cannot be changed. This property is called the immutability of String: once a String has been created, it cannot be changed.

In Java, there are usually two ways to create string objects.

One is as a string literal, for example String str = "abc";

The JVM first checks whether the string is already in the string constant pool. If it is, a reference to that object is returned; otherwise a new string is created in the constant pool. This saves memory by avoiding repeated creation of string objects with the same value.

The other is with new, for example String str = new String("abc");

In this case, the "abc" constant is placed in the class file's constant pool at compile time, and "abc" is created in the string constant pool when the class is loaded. Then, when new is called, the JVM invokes the String constructor, which creates a String object in heap memory that refers to the "abc" string in the constant pool. Finally, str refers to that heap String.

If you call the intern() method, it checks whether the string constant pool already holds a string equal to this object. If not (the string is encountered for the first time), the string is added to the constant pool; if so, the reference from the constant pool is returned. (This describes JDK 1.7 and later, where adding to the pool records a reference to the existing heap string rather than a copy.)
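Here is a small check of the behavior described above (the class and method names are hypothetical). The second method builds a string at runtime so that the text is not already in the pool. On JDK 7 and later, interning it records that very heap object.

```java
// Sketch of literal vs. new String vs. intern() identity.
class StringPoolDemo {
    private static int calls = 0;

    /** Returns {literal == created, literal.equals(created), literal == created.intern()}. */
    static boolean[] compare() {
        String literal = "abc";              // resolved from the string constant pool
        String created = new String("abc");  // a distinct String object in the heap
        return new boolean[] {
            literal == created,              // false: two different objects
            literal.equals(created),         // true: same characters
            literal == created.intern()      // true: intern() returns the pooled instance
        };
    }

    /** JDK 7+: interning a string not yet in the pool records this very heap object. */
    static boolean firstInternKeepsHeapObject() {
        // Built at runtime with a fresh suffix, so this exact text is not yet pooled.
        String s = new StringBuilder("jvm-pool-demo-").append(++calls).toString();
        return s.intern() == s;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(compare())); // [false, true, true]
        System.out.println(firstInternKeepsHeapObject());          // true on JDK 7 and later
    }
}
```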

19. Common interview questions

Describe the JVM memory structure.

This is an open question; the details can be found in the section on the runtime data areas.

An answer usually starts from two dimensions, thread-private and thread-shared areas, and then goes into the details of each memory area.

The Java virtual machine stack is per-thread. Even if you only have a main() method, it runs as a thread. Over a thread's lifetime, the data involved in computation is constantly pushed onto and popped off the stack, and the stack's lifetime is the same as the thread's.

Each element of the stack is a stack frame. Each time a Java method is called, a stack frame is created and pushed onto the stack; once the call completes, the frame is popped. When all frames have been popped, the thread terminates. Each stack frame contains four areas:

  • Local variable table
  • Operand stack
  • Dynamic linking
  • Return address

The native method stack is an area very similar to the virtual machine stack; it serves native methods.

The program counter is a small memory space that acts as a line-number indicator of the bytecode being executed by the current thread.

The heap is the largest memory area in the JVM, and almost all the objects we create are stored there. When we talk about garbage collection, the heap is mainly what is collected.

The method area stores class information, the constant pool, method data, method code, and so on.

When does a stack overflow occur?

A java.lang.StackOverflowError usually indicates infinite (or excessively deep) recursion.

An OutOfMemoryError can occur when so many threads are created that the machine runs out of memory as the JVM requests stack memory for each new thread.
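Here is a minimal way to trigger the first case (the class name is hypothetical). The depth reached depends on the thread stack size (-Xss) and on the frame size, so it differs between JVMs and runs.

```java
// Sketch: unbounded recursion exhausts the thread's stack and throws StackOverflowError.
class StackDepthDemo {
    private static int depth = 0;

    private static void recurse() {
        depth++;
        recurse(); // no base case: every call pushes another stack frame
    }

    /** Recurses until the stack is exhausted and returns the depth reached. */
    static int maxDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames have already been unwound here, so execution can continue normally.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("StackOverflowError after about " + maxDepth() + " calls");
    }
}
```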

Describe the process of creating an object with new.

See the section on object allocation for details.

Will Java objects be allocated on the stack?

Yes. If escape analysis shows that an object does not escape, the virtual machine may allocate it on the stack in certain cases.

What algorithms can determine whether an object can be reclaimed, and which one do real virtual machines mostly use?

Reference counting and reachability analysis (tracing from GC Roots) are the two common methods; mainstream virtual machines such as HotSpot use reachability analysis.
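Here is a toy model (with hypothetical classes) of why reference counting is not enough. Two objects that refer only to each other keep a count of 1 each, yet reachability analysis from the GC roots correctly finds them dead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy comparison of reference counting and reachability analysis.
class LivenessModel {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        int refCount = 0; // number of incoming references

        Node(String name) {
            this.name = name;
        }

        void pointTo(Node other) {
            refs.add(other);
            other.refCount++;
        }
    }

    /** Reachability analysis: everything reachable from the GC roots is live. */
    static Set<Node> reachable(List<Node> roots) {
        Set<Node> live = new HashSet<>();
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (live.add(n)) {
                work.addAll(n.refs);
            }
        }
        return live;
    }
}
```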

What are the GC collection algorithms? What are their characteristics?

Copying, mark-sweep, and mark-compact. Copying is fast and leaves no memory fragmentation, but wastes space. Mark-sweep uses space well but leaves memory fragmentation. Mark-compact leaves no fragmentation but performs poorly because it moves objects. Each of the three algorithms has its own strengths and weaknesses.

What does a complete GC flow look like in the JVM? How do objects get promoted to the old generation?

Objects are allocated in the young generation first; if there is not enough space, a Minor GC occurs. Large objects (which need a large amount of contiguous memory) go directly into the old generation, and long-lived objects are promoted to the old generation.

If an object is born in the young generation and survives its first Minor GC, its age becomes 1, and it increases by 1 with each further Minor GC the object survives. Once its age reaches a certain limit (15 by default), the object is promoted to the old generation.

What are the different kinds of references in Java?

Strong reference

Object obj = new Object() creates a strong reference. As long as an object is strongly reachable (reachable from GC Roots through strong references), the garbage collector will never reclaim it, under any circumstances.

Soft reference (SoftReference)

Soft references describe objects that are useful but not essential. Such objects are reclaimed only when the system is about to run out of memory; if there is still not enough memory after they are reclaimed, an OutOfMemoryError is thrown.

Weak reference (WeakReference)

Weak references describe objects that are useful but not essential, even less so than soft references. Such objects survive only until the next garbage collection: when GC occurs, they are reclaimed regardless of whether memory is sufficient.

Phantom reference (PhantomReference)

Phantom references are the weakest kind of reference: the object can be reclaimed at any time, and PhantomReference.get() always returns null.

Their purpose is to receive a notification when the object is garbage collected, for example to monitor whether the garbage collector is working properly.

What is the difference between final, finally, and finalize?

In Java, final can modify classes, methods, and variables (member variables or local variables).

A final class cannot be inherited by other classes. Use final when a class should never be subclassed, but note:

All member methods in a final class are implicitly final.

There are two main reasons to use final methods:

(1) To lock the method so that subclasses cannot change (override) it.

(2) Efficiency. In early versions of Java, final methods could be turned into inline calls. However, if the method is large, inlining may not improve performance much, and recent Java versions no longer need final methods for these optimizations.

A final variable can be assigned only once, after which its value cannot change; final member variables therefore act as constants.

finally is part of exception handling. It can only be used together with a try statement (try/finally or try/catch/finally), and its block is executed at the end whether or not an exception is thrown. It is often used where resources need to be released.

The finalize() method in Object

Even if reachability analysis finds an object unreachable, the object is not necessarily dead; it is only on “probation”. Declaring an object truly dead takes two marking passes. First, when no reference chain to GC Roots is found, the object is marked for the first time. Then it goes through a filter: if the object overrides finalize() (and finalize() has not already been called), it gets a chance to save itself inside finalize().

Therefore, avoid finalize() as much as possible: it is too unreliable. In production you can hardly control when it runs or in what order objects are finalized, so the best advice is to forget about finalize() altogether. Anything finalize() can do is done better in Java with try-finally or other mechanisms.

String s = new String("xxx"); How many objects are created?

2 (assuming "xxx" is not already in the string constant pool)

1. When the class is loaded, the literal "xxx" creates a String object in the constant pool.

2. Calling new creates another String object in the heap, and that object shares its char array with the "xxx" string in the constant pool.