1. What is JVM

What is a JVM

The Java Virtual Machine (JVM) is a software-emulated computer that provides the computing functions of a real machine. It can execute the same Java bytecode on different computer architectures because it hides the differences between the hardware and operating systems of each platform, leaving the platform-specific work to the JVM vendor.

2. Basic structure of JVM

What is the basic structure of the JVM

The JVM consists of three main parts:

  1. Class loading subsystem

    Loads the required classes into the JVM, either at JVM startup or when a class is first needed at run time

  2. Runtime data area

    Memory is divided into several areas that play the roles of the storage, recording and scheduling components of a real machine, such as its registers or its program counter (PC)

  3. Execution engine

    Executes the bytecode instructions contained in class files; it is the equivalent of the CPU on a real machine

3. Runtime data area

What does the runtime data area consist of, and what exactly does each area hold

The runtime data area is divided into two categories: thread-private areas and shared areas

Thread-private: program counter, virtual machine stack, and native method stack

Shared: Java heap, method area (Metaspace since Java 8)

  • Program counter: records the position of the instruction the current thread is executing

  • Virtual machine stack: made up of stack frames. Each method call pushes one stack frame, which contains the operand stack, local variable table, dynamic linking information and method return address. The local variable table stores the 8 primitive types and object references

  • Native method stack: similar in features and function to the virtual machine stack, but serves native methods

  • Heap: holds all object instances and arrays

  • Method area: holds class information loaded by the virtual machine, constants, static variables, code compiled by the just-in-time (JIT) compiler, and other data
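To make the thread-private stack concrete, here is a minimal sketch (the class name `StackDepthDemo` is hypothetical). Each call pushes a new frame onto the current thread's virtual machine stack, so unbounded recursion eventually exhausts the stack and the JVM throws `StackOverflowError`. The depth reached depends on the JVM and its stack size setting, so no particular number is assumed.

```java
class StackDepthDemo {
    private static int depth;

    // Each invocation pushes one more stack frame onto the current thread's VM stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many frames were pushed before the stack ran out.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The thread's stack is exhausted; frames are popped as the error unwinds.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Frames pushed before StackOverflowError: " + measureDepth());
    }
}
```

The objects a method creates live on the shared heap; only the frame itself (local variables, operand stack) sits on the thread-private stack, which is why the limit here is frame count rather than heap size.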

4 HotSpot method area implementation

What does the HotSpot virtual machine's method area hold, and what is the difference between 1.7 and earlier versus 1.8 and later

For the common HotSpot virtual machine, the method area differs between 1.7 and earlier and 1.8 and later. In 1.7 and earlier, the method area is also known as the permanent generation. It stores class information, constants, static variables, and code compiled by the just-in-time compiler. From 1.8 on, the method area is implemented by Metaspace. The permanent generation was removed and Metaspace is stored in native memory. Class metadata lives in Metaspace, while the string constant pool and static variables are placed in the Java heap

The biggest difference between Metaspace and the permanent generation is that Metaspace is not inside the memory managed by the virtual machine, but uses native memory.

Why did JDK 1.8 move the method area from the JVM (permanent generation) to direct memory (Metaspace)

Reason one: from the perspective of data flow, direct memory takes a shorter path. With non-direct memory, data travels from local I/O through direct memory into non-direct memory and back out the same way; with direct memory the path is simply local I/O > direct memory > local I/O.

Reason two: the permanent generation has a fixed maximum size set in the JVM that cannot be adjusted at run time, whereas Metaspace uses direct memory and is limited only by the memory available on the machine (unless capped with -XX:MaxMetaspaceSize). This makes java.lang.OutOfMemoryError much less likely, although it can still occur.
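One way to see how a running JVM splits its memory is the standard `java.lang.management` API. The sketch below (class name `MemoryPoolDemo` is hypothetical) lists memory pools by type. Pool names are JVM-specific; on HotSpot 8 and later the non-heap list typically includes a pool named "Metaspace", but that is an assumption about the JVM you run it on.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class MemoryPoolDemo {
    // Returns the names of all memory pools of the given type (HEAP or NON_HEAP).
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("Non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```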

5 Structure of the heap

What are the partitions of the heap like and what are their characteristics

The areas shared by JVM threads can be divided into three parts: young generation, old generation, and permanent generation. The JVM heap itself is divided into the young generation and the old generation

Young generation:

Eden space, From Survivor space, and To Survivor space

New objects are allocated in Eden. A minor GC clears Eden and the Survivor space in use and moves the surviving objects to the other Survivor space


Old generation:

Objects created in the young generation that survive many collections are promoted to the old generation.

6 Why does the young generation have two Survivor spaces

Why does the young generation set up two Survivor spaces in addition to Eden

The Survivor spaces exist to make the copying algorithm convenient: the memory is divided into two parts and only one is used at a time. During garbage collection, the live objects in the part in use are copied to the other part, the part that was in use is then cleared, and the two parts swap roles, which completes the collection.

Why use the copying algorithm in the young generation: GC is more frequent there and the object survival rate is low, so copying is efficient and leaves no memory fragmentation. The cost of the plain copying algorithm, however, is that half of the memory sits unused. To avoid wasting that much memory, the young generation instead sets aside two Survivor spaces of equal size, From and To. After each GC, the surviving objects are copied to the other Survivor space, and Eden and the Survivor space that was just in use are emptied.
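The role swap between the two Survivor spaces can be sketched with a toy model. This is only an illustration, not how HotSpot is implemented: the class `YoungGenSim`, the use of strings as "objects", and passing the live set in by hand are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class YoungGenSim {
    final List<String> eden = new ArrayList<>();
    List<String> from = new ArrayList<>();
    List<String> to = new ArrayList<>();   // always empty between collections

    // New objects are allocated in Eden.
    void allocate(String obj) {
        eden.add(obj);
    }

    // Copy live objects from Eden and "from" into "to", clear both, then swap roles.
    void minorGc(Set<String> live) {
        for (String o : eden) {
            if (live.contains(o)) to.add(o);
        }
        for (String o : from) {
            if (live.contains(o)) to.add(o);
        }
        eden.clear();
        from.clear();
        List<String> tmp = from;
        from = to;
        to = tmp;
    }
}
```

After every call to `minorGc`, all survivors sit compactly in `from` and both `eden` and `to` are empty, which is why no fragmentation builds up and only one Survivor space is ever idle.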

7 Locating objects for access

What methods are available for locating an object when it is accessed

  • Handle access

    Objects are easy to move, which makes GC faster

  • Direct pointer access (used by HotSpot)

    Faster access to objects, since it saves one level of indirection

8 Determining whether an object is alive

How do you determine if an object is alive

  • Reference counting

    Each object instance has a counter, which is incremented when a reference to it is created and decremented when a reference becomes invalid; the object is reclaimed when the counter reaches 0. The JVM doesn't use it because it can't handle circular references

  • Reachability analysis (used by the JVM)

    Searches downward from the GC Roots; objects that cannot be reached are reclaimed

    GC Roots objects:

    • Objects referenced from the virtual machine stack
    • Objects referenced by static class fields in the method area
    • Objects referenced by constants in the method area
    • Objects referenced by JNI (native methods) in the native method stack
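Reachability analysis is a graph search, which a small sketch can make concrete. The class `ReachabilityDemo` and the string-named object graph are hypothetical; a real collector walks actual references rather than a map. Note how a cycle that no root points to is never marked, which is exactly the case reference counting cannot handle.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReachabilityDemo {
    // Breadth-first search from the GC roots over the reference graph.
    // refs maps each object to the objects it references.
    static Set<String> reachable(Map<String, List<String>> refs, List<String> roots) {
        Set<String> marked = new HashSet<>(roots);
        Deque<String> queue = new ArrayDeque<>(roots);
        while (!queue.isEmpty()) {
            String obj = queue.poll();
            for (String target : refs.getOrDefault(obj, Collections.emptyList())) {
                // Set.add returns true only the first time, so each object is visited once.
                if (marked.add(target)) {
                    queue.add(target);
                }
            }
        }
        return marked;
    }
}
```

For a graph where `root -> A -> B` and `C <-> D` reference only each other, `reachable` returns `root`, `A` and `B`; `C` and `D` stay unmarked and would be reclaimed.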

9 GC safepoint

What is a safepoint and how do you select a safepoint

HotSpot decides which objects to reclaim by enumerating GC Roots. There are two ways to find the GC Roots: one is to scan the whole method area and stack (conservative GC); the other is to record the locations of references in an OopMap data structure (exact GC). Conservative GC is too expensive, so HotSpot uses OopMaps to mark where object references are. An OopMap records the references from variables on the stack to objects on the heap. With OopMaps, HotSpot can locate the GC Roots quickly and accurately.

GC operations require STW (stop the world: all worker threads must pause)

A safepoint is a point at which the state of all worker threads is determined, so the JVM can safely perform GC.

Too many safepoints increase the runtime load; too few make the wait before GC can start too long.

Generally, safepoints are placed at the following locations:

1. The end of a loop

2. Before a method returns

3. After a method call

4. Where an exception is thrown

Why these locations were selected as safepoints:

They keep a program from running for a long time without reaching a safepoint. The JVM waits for all application threads to reach a safepoint before GC, so if one thread never reaches one, the GC pause is prolonged

How do all threads run to the nearest safepoint and stop when GC occurs?

There are two main ways:

Preemptive suspension: when GC occurs, all threads are first interrupted; if a thread is found not to be at a safepoint, it is resumed and runs on to a safepoint.

Voluntary suspension: instead of interrupting threads directly when GC occurs, the JVM simply sets a flag. Each thread polls this flag while executing and suspends itself when it finds the flag set.

The JVM uses voluntary suspension. The polling points coincide with the safepoints.
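The flag-polling approach can be sketched in plain Java. This is a simplified illustration, not HotSpot's mechanism: the class `SafepointPollDemo`, the `volatile` flag and the latches are assumptions for the example, and the worker simply stops at the poll instead of being parked and later resumed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class SafepointPollDemo {
    // Flag set by the "collector"; volatile so the worker sees the update.
    private static volatile boolean safepointRequested = false;

    // Starts a worker, requests a safepoint, and reports whether the worker
    // noticed the request and stopped within the timeout.
    static boolean requestAndAwait(long timeoutMillis) {
        safepointRequested = false;
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch reached = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            long work = 0;
            started.countDown();
            while (true) {
                work++;                       // one unit of application work
                if (safepointRequested) {     // poll at the end of each loop iteration
                    reached.countDown();      // report arrival at the safepoint
                    return;                   // simplified: stop instead of parking
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
        try {
            started.await();
            safepointRequested = true;        // the collector asks threads to stop
            return reached.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because the poll sits at the loop end, the worker can never run for long without checking the flag, which is the reason loop ends are chosen as safepoint locations.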

10 GC

Let’s talk about GC

There are three main questions:

  1. What is garbage
  2. Where is garbage collected
  3. How is it collected

1. Memory holds numerous objects, which must be accurately marked as either live or garbage. The marking method is reachability analysis, as described earlier.

2. Collection takes place in the runtime data area. Thread-private memory (virtual machine stacks, program counters, and native method stacks) is reclaimed automatically when its thread dies, so GC concentrates on the shared areas, mainly the heap.

3. There are currently three main GC algorithms:

  • Mark-sweep algorithm

    First, the objects reachable from GC Roots are marked, and then the unmarked objects are swept away.

    Disadvantages: memory fragmentation, low efficiency

  • Copying algorithm

    When one block of memory is used up, the live objects are copied to another block and the used block is cleared in one go

    Disadvantages: trades space for time, sacrificing part of the memory

  • Mark-compact algorithm

    The surviving objects are marked starting from GC Roots

    The surviving objects are then moved to one end, ordered by memory address, and the memory beyond the end boundary is cleared directly.

    Disadvantages: less efficient, even slower than mark-sweep

    [Figures: mark-sweep, copying algorithm, mark-compact]
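The fragmentation difference between mark-sweep and mark-compact can be shown on a toy heap. In this sketch (class `HeapCompactionDemo` is hypothetical) the heap is an array of equally sized slots and the mark phase is assumed to have already produced the `marked` array.

```java
class HeapCompactionDemo {
    // Slots: true = occupied by a live object, false = free.

    // Mark-sweep: dead objects are freed in place, live objects do not move.
    static boolean[] sweep(boolean[] occupied, boolean[] marked) {
        boolean[] after = new boolean[occupied.length];
        for (int i = 0; i < occupied.length; i++) {
            after[i] = occupied[i] && marked[i];
        }
        return after;
    }

    // Mark-compact: live objects slide to the low end of the heap.
    static boolean[] compact(boolean[] occupied, boolean[] marked) {
        boolean[] after = new boolean[occupied.length];
        int next = 0;
        for (int i = 0; i < occupied.length; i++) {
            if (occupied[i] && marked[i]) {
                after[next++] = true;
            }
        }
        return after;
    }

    // Length of the longest run of free slots, i.e. the largest object that still fits.
    static int largestFreeRun(boolean[] heap) {
        int best = 0;
        int run = 0;
        for (boolean slot : heap) {
            run = slot ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }
}
```

With 8 full slots of which every second object is dead, `sweep` frees 4 slots but the largest free run is only 1, whereas `compact` leaves a single free run of 4 at the cost of moving the live objects.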

How are these algorithms used in the JVM

As the three GC algorithms above show, no algorithm is perfect in both space and time efficiency, so they are combined in a generational scheme

The JVM divides objects into the young generation and the old generation based on their lifetime. New objects are generally allocated in the young generation first. If a young generation object survives longer than a certain threshold, it is moved to the old generation.

The young generation reclaims a large number of objects in each GC, so the copying algorithm suits it. The memory is not split 1:1; by default it is divided into Eden and two Survivor spaces at 8:1:1. Eden and one Survivor space are used together for allocation; at GC time the surviving objects are copied to the other, free Survivor space, the two Survivor spaces swap roles, and so on. When the Survivor space cannot hold all surviving objects, the overflow is moved to the old generation through the allocation guarantee mechanism.

In the old generation the object survival rate is high and there is no extra space to act as an allocation guarantee, so the mark-sweep or mark-compact algorithm is used.

11 Memory reclamation and allocation policies

When do objects enter the old generation

  1. Large objects go straight to the old generation

    • What are large objects?

    In general, large objects are long strings and arrays, or static objects.

    • So how big is a large object?

    The virtual machine provides the parameter -XX:PretenureSizeThreshold=n; an object larger than this value goes directly into the old generation.

  2. Long-lived objects go into the old generation

    Each time an object survives a Minor GC its age increases by 1; when the age reaches the threshold (15 by default), the object is promoted to the old generation

  3. Dynamic age judgment

    If the total size of all objects of the same age in the Survivor space is greater than half of the Survivor space, objects of that age or older go directly into the old generation

What is the space allocation guarantee strategy

Before a Minor GC, the JVM checks whether the largest contiguous free space in the old generation is larger than the total size of all objects in the young generation. If it is, the Minor GC is safe. If it is not, the JVM checks whether that space is larger than the average size of the objects promoted to the old generation in previous collections. If it is, the JVM attempts the Minor GC, accepting the risk that promotion may fail; otherwise it performs a Full GC instead
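The decision flow of the allocation guarantee check can be written as a small function. This is a sketch of the rule as commonly described, not HotSpot source code; the class `AllocationGuarantee`, the enum values and the parameter names are all hypothetical, and sizes are in arbitrary units.

```java
class AllocationGuarantee {
    enum Decision { SAFE_MINOR_GC, RISKY_MINOR_GC, FULL_GC }

    // maxContiguousOldFree: largest contiguous free space in the old generation
    // youngUsed:            total size of all objects currently in the young generation
    // avgPromoted:          average size promoted to the old generation in past Minor GCs
    static Decision decide(long maxContiguousOldFree, long youngUsed, long avgPromoted) {
        if (maxContiguousOldFree > youngUsed) {
            // Even if every young object survived, the old generation could take them all.
            return Decision.SAFE_MINOR_GC;
        }
        if (maxContiguousOldFree > avgPromoted) {
            // Likely enough room based on history, but promotion may still fail.
            return Decision.RISKY_MINOR_GC;
        }
        // Not even the historical average fits: collect the whole heap first.
        return Decision.FULL_GC;
    }
}
```

For example, `decide(100, 50, 10)` yields `SAFE_MINOR_GC`, `decide(40, 50, 10)` yields `RISKY_MINOR_GC`, and `decide(5, 50, 10)` yields `FULL_GC`.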

12 GC Collector

Introduce the JVM garbage collector

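The young generation garbage collectors are Serial, ParNew and Parallel Scavenge; the old generation garbage collectors are CMS, Parallel Old and Serial Old; G1 covers both generations. In the collector diagram, a connecting line between two collectors indicates that they can be used together.

Throughput: the ratio of application execution time (excluding GC time) to total time. When GC time is allowed to take up at most 1/(1+N) of the total, the throughput is 1 - 1/(1+N), where N is the value of -XX:GCTimeRatio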
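A quick worked example of the throughput expression, assuming N is the ratio value passed to the `-XX:GCTimeRatio` option (the source gives the expression without naming N). The class `ThroughputDemo` is hypothetical.

```java
class ThroughputDemo {
    // Fraction of total time left for the application when GC may take 1/(1+n) of it.
    static double targetThroughput(int n) {
        return 1.0 - 1.0 / (1 + n);
    }

    public static void main(String[] args) {
        System.out.println("N=99 -> " + targetThroughput(99));
        System.out.println("N=19 -> " + targetThroughput(19));
    }
}
```

With N = 99 the GC share is 1/100, so the target throughput is 99%; with N = 19 the GC share is 1/20, giving 95%.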

CMS and G1 are the main focus and are analyzed separately

Collector | Serial, parallel or concurrent | Generation | Algorithm | Goal | Applicable scenario
Serial | Serial | Young | Copying | Response time first | Client mode in a single-CPU environment
Serial Old | Serial | Old | Mark-compact | Response time first | Client mode in a single-CPU environment; fallback for CMS
ParNew | Parallel | Young | Copying | Response time first | Multi-CPU environment, paired with CMS in Server mode
Parallel Scavenge | Parallel | Young | Copying | Throughput first | Background computation without much interaction
Parallel Old | Parallel | Old | Mark-compact | Throughput first | Background computation without much interaction
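To check which collectors a running JVM has actually selected, the standard `java.lang.management` API can be queried. The sketch below (class name `GcBeansDemo` is hypothetical) prints each collector with its collection count and accumulated time. The names reported are JVM-specific and depend on the collector flags in use, so none are assumed here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcBeansDemo {
    // Names of the garbage collectors registered in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is the approximate accumulated time in milliseconds.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
    }
}
```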

A few words about the CMS garbage collector

CMS (Concurrent Mark Sweep) collector goal: the shortest collection pause time. It is implemented with the “mark-sweep” algorithm, fits a wide range of scenarios, and is relatively mainstream

  • The working process

    1. Initial mark: marks only the objects directly reachable from GC Roots; very fast; “Stop The World”

    2. Concurrent mark: starting from the objects marked in the first step, marks all reachable objects concurrently with the application.

    3. Remark: corrects the marks of objects whose reachability changed during concurrent marking because the user program kept running. The pause in this phase is generally slightly longer than the initial mark but much shorter than concurrent marking. “Stop The World”.

    4. Concurrent sweep.

  • Advantages and disadvantages:

    Advantages: concurrent collection, short pause time

    Disadvantages:

    1. The mark-sweep algorithm causes memory fragmentation

    2. Concurrent mode failure. It occurs while the CMS GC and the application threads are running at the same time, in two situations: (1) after a Minor GC completes, some live objects need to be promoted to the old generation, but the old generation has not been cleaned yet, so there is not enough space; (2) during a Minor GC, the objects fit neither in the young generation nor in the old generation

    3. Promotion failed

      During a Minor GC, objects that do not fit in the Survivor space can only be placed in the old generation, and the old generation cannot hold them either. This is mostly because the old generation has enough free space in total but is too fragmented to offer a contiguous area for the object.

  • Solutions to the CMS shortcomings

    1. Fragmentation problem:

      Set the parameter -XX:CMSFullGCsBeforeCompaction=n, which specifies how many Full GCs must have run since the last CMS concurrent GC before a compaction is performed. The default is 0, which means compaction is done every time CMS GC fails and falls back to a Full GC.

    2. Concurrent mode failure

      Set the parameters -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=60, which make CMS start collecting when old generation occupancy reaches 60%

      Because enough memory must be reserved for user threads while CMS GC is running, the CMS collector cannot wait until the old generation is almost completely full before collecting, as other collectors do.

    3. Promotion failed problem

      CMS provides the following parameters to run the mark-compact algorithm after a certain number of Full GCs:

      -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=5

      In other words, CMS runs the mark-compact algorithm after 5 Full GCs, which keeps old generation fragmentation within a certain bound

    Summary: use mark-compact to remove fragmentation, and start CMS collection early.

Introduce the G1 collector

Traditional GC collectors divide contiguous memory into young generation, old generation, and permanent generation (replaced by Metaspace in JDK 8); each generation occupies a logically contiguous range of addresses. In G1, by contrast, the generations are not contiguous: each generation is made up of a number of non-adjacent Regions of equal size, and each Region occupies a contiguous range of virtual memory.

Some Regions are marked H for Humongous, meaning they store humongous objects (H-obj), that is, objects whose size is >= half of a Region.

H-obj features:

  • H-obj is allocated directly in the old generation, which avoids repeated copying
  • H-obj is reclaimed in the cleanup phase of global concurrent marking and during Full GC
  • Before an H-obj is allocated, the JVM checks whether the Java heap usage threshold is exceeded; if so, concurrent marking is started early in order to avoid evacuation failures and Full GC

To reduce the impact of continuous H-obj allocation on GC, large objects need to be turned into ordinary objects; increasing the Region size is recommended.

  • The GC process

    G1 offers two GC modes, Young GC and Mixed GC, both of which are fully Stop The World.

    • Young GC: collects all Regions in the young generation. The pause time of a young GC is controlled by controlling the number of young Regions, i.e. the memory size of the young generation.
    • Mixed GC: collects all Regions in the young generation plus some old generation Regions that global concurrent marking has found to have a high reclamation payoff. Within the pause-time target specified by the user, it picks the old Regions with the highest payoff.


How is G1 better than CMS

  • G1 compacts memory as part of collection, so it does not produce much memory fragmentation.
  • G1's Stop The World (STW) pauses are more controllable: G1 adds a pause-time prediction mechanism that lets users specify a desired pause time.
  • Viewed as a whole G1 is based on the mark-compact algorithm, while locally (between two Regions) it is based on the copying algorithm

13 Class loading process

Describe how the JVM loads Class files

All classes in Java must be loaded into the JVM by a class loader before they can run. A class loader is itself a class, whose job is to read class files from disk into memory. Apart from explicit loading, such as through reflection, you rarely need to care about class loading because it happens implicitly

Java classes are loaded dynamically: the JVM makes sure the base classes the program runs on are fully loaded, while other classes are loaded only when needed, which saves memory.

Talk about the process of JVM class loading

Class loading process: loading -> linking -> initialization. Linking can be divided into three steps: verification -> preparation -> resolution.

The whole process: loading reads the class by its fully qualified name and creates the Class object in memory. Verification checks the class file, including file format, metadata and bytecode verification. Preparation allocates memory for the class's static variables and sets their default values. Resolution converts symbolic references into direct references (pointers). Initialization executes the class initializer, i.e. the static variable assignments and static blocks
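That loading and initialization are separate steps can be observed from Java code. In this sketch the classes `InitLog`, `Lazy` and `ClassInitDemo` are hypothetical. `Class.forName` with `initialize` set to `false` loads the class without running its static initializer; the initializer runs only on the first active use, here the read of a non-constant static field.

```java
import java.util.ArrayList;
import java.util.List;

class InitLog {
    static final List<String> EVENTS = new ArrayList<>();
}

class Lazy {
    static int value = 42;   // assigned when the class is initialized
    static {
        InitLog.EVENTS.add("Lazy initialized");
    }
}

class ClassInitDemo {
    // Records the order in which loading, initialization and first use happen.
    static List<String> run() {
        try {
            // Load Lazy, but pass initialize=false so its static initializer does not run yet.
            Class<?> c = Class.forName(Lazy.class.getName(), false,
                                       ClassInitDemo.class.getClassLoader());
            InitLog.EVENTS.add("loaded " + c.getSimpleName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        int v = Lazy.value;  // first read of a non-constant static field triggers initialization
        InitLog.EVENTS.add("read value " + v);
        return InitLog.EVENTS;
    }
}
```

On the first call the recorded order is "loaded Lazy", then "Lazy initialized", then "read value 42": the static block runs after loading but before the field read completes.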

What class loaders are available in the JVM

Class loaders predefined by the JVM

There are three important class loaders built into the JVM. Except for BootstrapClassLoader, all are implemented in Java and inherit from java.lang.ClassLoader:

  • BootstrapClassLoader: loads the system classes, i.e. the jar packages and classes in the %JAVA_HOME%/lib directory
  • ExtensionClassLoader: loads the jar packages and classes in the %JRE_HOME%/lib/ext directory, or the jar packages in the path specified by the java.ext.dirs system property
  • AppClassLoader: the user-facing loader, which loads all jar packages and classes on the current application's classpath

In addition, you can also customize class loaders

Talk about the parental delegation model

During class loading, the system first checks whether the class has already been loaded. A class that has already been loaded is returned directly; otherwise an attempt is made to load it. The loader first delegates the request to its parent class loader, recursively all the way to the top, and only when the parent cannot complete the request does the child loader try to load the class itself. Here "parent" means the parent class loader, not a superclass. When the parent class loader is null, BootstrapClassLoader is used as the parent class loader.

The parental delegation model is not a mandatory constraint, but the way the Java designers recommend that class loaders be used.
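Delegation can be observed with a custom loader that inherits the default `loadClass()` from `java.lang.ClassLoader`. In this sketch the class `DelegationDemo` and its helper method are hypothetical. Even though the request is made to the child loader, a core class such as `java.lang.String` ends up being defined by the bootstrap loader, which Java reports as a `null` class loader.

```java
class DelegationDemo extends ClassLoader {
    DelegationDemo(ClassLoader parent) {
        super(parent);
    }

    // Called by the inherited loadClass() only after the whole parent chain has failed.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        throw new ClassNotFoundException("no loader in the chain could load " + name);
    }

    // Asks the child loader for a class and reports which loader actually defined it.
    static String definingLoaderOf(String className) {
        ClassLoader child = new DelegationDemo(ClassLoader.getSystemClassLoader());
        try {
            ClassLoader definer = child.loadClass(className).getClassLoader();
            // getClassLoader() returns null for classes defined by the bootstrap loader.
            return definer == null ? "bootstrap" : definer.getClass().getName();
        } catch (ClassNotFoundException e) {
            return "not found";
        }
    }
}
```

`definingLoaderOf("java.lang.String")` returns `"bootstrap"`, and a name that no loader knows reaches `findClass` in the child and comes back as `"not found"`.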

What are the benefits of the parental delegation model

(1) Security: it prevents user-written classes from dynamically replacing core Java classes such as String.

(2) It avoids loading the same class twice. The JVM identifies a class not only by its class name but also by the class loader that loaded it, so the same class file loaded by two different class loaders yields two different classes.

Is there a way to break the parental delegation model

In some cases, class files need to be loaded by the child class loader, and the parental delegation model has to be broken. To bypass parental delegation, you can define a custom class loader and override loadClass().

Tomcat class loaders: each Tomcat WebappClassLoader loads the class files in its own directory without first passing the request to the parent class loader. The objectives are:

  1. The classes and lib of each webapp should be isolated from each other, so that a class library loaded in one application does not affect another application. For libraries used by many applications, a shared lib is needed so that resources are not wasted. (For example, if log4j is used by both Webapp1 and Webapp2, you can add log4j to tomcat/lib so that all applications share this library. Imagine that log4j is large and 20 applications each load it separately. That is really unnecessary.)
  2. Use a separate class loader to load Tomcat's own libraries, to protect them from malicious or unintentional damage by other code
  3. Hot deployment: Tomcat periodically checks whether hot deployment is needed and, if so, recreates the class loader and reloads the related classes