The JVM is both a virtual machine and a specification, and it follows the design principles of the von Neumann architecture. The von Neumann architecture holds that the data and instructions a computer processes are both binary numbers, stored without distinction in the same memory and executed sequentially. An instruction consists of an operation code and an address code: the operation code determines the type of operation and the type of the operands, and the address code gives the address of the instruction and of its operands. From DOS to Windows 8, from Unix to Ubuntu and CentOS, as well as Mac OS, different operating systems have different instruction sets and data structures. By building a virtual machine on top of the operating system, the JVM defines its own unified set of data structures and operation instructions and translates the same language for each mainstream operating system, which makes cross-platform execution possible. It is fair to say that the JVM is the core of Java and the reason Java can be compiled once and run everywhere.

First, the composition and operation principle of JVM.

After all, the JVM is a virtual machine and a specification. Although it conforms to von Neumann’s concept of computer design, it is not a physical computer, so it is not made up of memory, a controller, an arithmetic unit, and input and output devices. In my opinion, the JVM behaves more like an application or process running on a real operating system. Its composition can be understood as the functional modules of the JVM process, and the way these modules work together can be seen as the operating principle of the JVM. There are many JVM implementations, such as those from Oracle, HP, and IBM; this article studies the most widely used one, the Oracle HotSpot JVM.

1. The position of the JVM in the JDK.

The JDK is the essential toolkit for Java development. One part of the JDK is the JRE, the Java runtime environment, and the JVM is the core of the JRE. I took a picture of the composition of the JDK Standard Edition from Oracle.com.

The JVM sits at the very bottom of the picture, which shows how important it is: in real projects, performance tuning of Java applications, handling of OOM exceptions and so on all ultimately come down to the JVM. HotSpot is the trademark of Oracle’s JVM, which distinguishes it from the JVMs developed by IBM, HP, and others. The Java HotSpot Client VM and the Java HotSpot Server VM are the JDK’s two implementations of the JVM: the former is tuned to reduce startup time and memory footprint, the latter to provide better application speed (see http://docs.oracle.com/javase/8/docs/technotes/guides/vm/index.html, which provides information about each version of the JVM). On the command line, you can view information about the JVM on the current machine with java -version. Here is a screenshot of the command I executed on Win8.

As you can see, I installed build 20.13-B02, a HotSpot Server mode JVM.

2. The composition of the JVM

The JVM consists of four parts: ClassLoader, Runtime Data Area, Execution Engine, and Native Interface.

I found a diagram from CSDN that describes the general structure of the JVM:

2.1. The ClassLoader is responsible for loading class files. A class file carries a specific marker at the beginning of the file, and the ClassLoader is responsible only for loading it.

2.2. The Native Interface is responsible for calling native interfaces, that is, interfaces written in other languages, on behalf of Java. It records the corresponding native methods in the Native Method Stack and then loads the corresponding native library through the Execution Engine when the method is called. It was originally used mostly in specialized fields such as Java drivers and map-rendering engines; today this kind of native method call has largely been replaced by Socket communication, WebService and similar approaches.

2.3. The Execution Engine is also called the Interpreter. Once a class file is loaded, its instructions and data are placed in memory, and the Execution Engine interprets these instructions for the operating system.

2.4. The Runtime Data Area stores data and is divided into five parts: Stack, Heap, Method Area, PC Register, and Native Method Stack. Almost all questions about Java memory focus on this area. The following is a description of the Run-time Data Areas from Javapapers.com:

Javapapers.com considers the Method Area to be a logical area of the Heap, but this depends on the JVM implementer; the HotSpot JVM classifies the Method Area as non-heap memory, so it is clearly not part of the Heap. NonHeap contains both PermGen and the Code Cache. PermGen includes the Method Area and is no longer used in Java SE 8. According to https://abhirockzz.wordpress.com/2014/03/18/java-se-8-is-knocking-are-you-there/, PermGen has been removed from the JVM and replaced by MetaSpace in Java 8, so the OOM: PermGen space exception no longer occurs in Java 8. The Runtime Data Area is composed of:

2.4.1. The Stack is the Java stack memory, which is equivalent to the stack in the C language. The memory of the Stack does not need to be contiguous, and each thread has its own Stack. The Stack holds StackFrames, which the Chinese version of the JVM Specification translates as Java virtual machine frames. A StackFrame contains three types of information: local variables, the execution environment, and the operand stack. The local variables area stores the local variables used in a method. The execution environment holds the information the interpreter needs to interpret Java bytecode, including the previously called method, a pointer to the local variables, and pointers to the top and bottom of the operand stack. The operand stack stores the operands and results needed for operations. A StackFrame is created when a method is called. At any point in time only one frame is active in a thread; it is called the Current Frame, its method is called the Current Method, and the class that defines the method is called the Current Class. Operations on the local variables and the operand stack always refer to the current frame. When the method of a StackFrame finishes executing, or calls another method, a different StackFrame becomes the current frame. Stack sizes are of two types, fixed and dynamic; a dynamic stack can be allocated according to the needs of the thread. The following two pictures give a basic description of the stack and of the relationship between the stack and heap memory (from http://www.programering.com/a/MzM3QzNwATA.html):

2.4.2. The Heap is used to hold object information, unlike the Stack, which represents a run-time state. In other words, the stack is the run-time unit that solves the problem of how the program executes, while the heap is the storage unit that solves the problem of where data is stored. The Heap is created at JVM startup and stores all object instances and arrays. Like the stack, the heap does not need contiguous storage. It is divided into the Young Generation and the Old Generation (also known as the Tenured Generation). The Young Generation is divided into Eden and Survivor, and Survivor is divided into From Space and To Space.

A concept often mentioned together with the Heap is PermanentSpace, a dedicated memory area for loading class objects. It is non-heap memory, which together with the Heap makes up Java memory, and it contains the Method Area (if we ignore the Code Cache in HotSpot JVM implementations, the Method Area is equivalent to the Permanent Generation space). During JVM initialization, we can specify parameters such as the size of PermanentSpace, the heap size, the ratio of Young Generation to Old Generation, and the ratio of Eden to From Space, and so tune the JVM at a fine grain to the memory requirements of different Java applications.

2.4.3. The PC Register is the program counter register. Each Java thread has its own PC Register, which points to the next instruction to be read by the Execution Engine. If the thread is executing a Java method, the PC Register stores the address of the instruction being executed; if it is executing a native method, the value of the PC Register is undefined. The PC Register is very small, occupying only one word, and can hold a returnAddress or a native pointer on the specific platform.

2.4.4. In the HotSpot JVM implementation, the Method Area belongs to the non-heap area. The non-heap area has two parts, the Permanent Generation and the Code Cache, and the Method Area belongs to the Permanent Generation. The Permanent Generation stores class information such as class definitions, structures, methods, fields, method data and code, and constants. The Code Cache stores compiled code, that is, native code, which the HotSpot JVM generates with the JIT (Just In Time) compiler. The JIT compiler improves instruction execution efficiency by compiling bytecode into native machine code, as shown below:

A classic question: given String a = ”xx”; String b = ”xx”; is a == b? Answering it involves the Runtime Constant Pool in the Method Area. When a is assigned “xx”, a String constant is generated in the Runtime Constant Pool. When b is assigned “xx”, the constant pool is checked for an existing constant with the value “xx”. If one exists, b also points to the address of “xx” instead of a new String constant being generated. I looked up some material on the web about how String constants are stored, and the sources differ slightly on where they are stored. One view is that a constant pool is allocated on the Heap for constant storage and is shared by all threads. The other is that the constant pool is part of the Method Area, which is not in the heap.

In my opinion, both interpretations are true. The Method Area can logically be part of the Heap, and in some JVM implementations it is reasonable to set aside storage on the Heap to record constants, so the former is fine. As for the latter claim, the HotSpot JVM implementation does place the Method Area in non-heap memory, meaning it is not on the heap. I did a simple experiment on the HotSpot JVM: after defining a large number of constants, the application raised an OOM: PermGen space exception, confirming that the constant pool is in PermanentSpace in that JVM implementation. However, my JDK version is 1.6. In JDK 1.7, interned strings are no longer stored in PermanentSpace but in the Heap; in JDK 8, PermanentSpace has been removed completely and class metadata has moved to MetaSpace (running out of it reports OOM: Metaspace), while interned strings remain in the Heap (see http://blog.csdn.net/zhyhang/article/details/17246223). Details like these differ between versions, so it is more reliable to read the source code of the corresponding JDK version or to refer to the matching version of the JVM Specification.
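The a == b question can be checked directly. The following is a minimal sketch, not the author’s experiment; the class name InternDemo and its helper methods are hypothetical names introduced for illustration. It only shows the reference-equality behaviour of string literals and says nothing about where a particular JVM version stores the pool.

```java
class InternDemo {
    // Two identical literals resolve to the same pooled String instance.
    static boolean sameLiteral() {
        String a = "xx";
        String b = "xx";
        return a == b;
    }

    // new String(...) always creates a distinct object, so the references differ.
    static boolean sameAsNew() {
        String a = "xx";
        String c = new String("xx");
        return a == c;
    }

    // intern() returns the pooled instance that equals the given string.
    static boolean sameAfterIntern() {
        String a = "xx";
        String c = new String("xx");
        return a == c.intern();
    }

    public static void main(String[] args) {
        System.out.println("literal == literal:      " + sameLiteral());     // true
        System.out.println("literal == new String:   " + sameAsNew());       // false
        System.out.println("literal == new.intern(): " + sameAfterIntern()); // true
    }
}
```

The comparison uses == on purpose, because the point is object identity; ordinary application code should compare strings with equals().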

2.4.5.Native Method Stack is a Stack used by Native methods (non-Java). Each thread holds a Native Method Stack.

3. Introduction to the running principle of JVM

After a Java program is compiled by the javac tool into a .class bytecode file, we run the java command and the class file is loaded by the JVM ClassLoader. The JVM is started by java.exe (or java) under the Java path. The steps from initialization through running to termination of a JVM are as follows:

The launcher calls the operating system API to determine the CPU architecture of the system, searches for the /lib/jvm.cfg file in the JRE directory according to the CPU type, and then finds the corresponding jvm.dll file based on the configuration file (if -server or -client is passed as a parameter, the jvm.dll for that mode is selected). Class files can then be loaded and processed through JNIEnv instances. Class files are bytecode files that define variables, methods, and other details according to the JVM’s specification; the JVM manages and allocates memory to execute the program, and also manages garbage collection. This continues until the program ends: either all non-daemon threads of the JVM stop, or the program calls System.exit(), and the JVM life cycle ends.

I studied how the JVM manages and allocates memory from two angles: class files and garbage collection.
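The end-of-life rule (the JVM exits once all non-daemon threads have stopped) can be observed with a small sketch. The class name LifecycleDemo, the helper newWorker, and the sleep durations are assumptions chosen for illustration, not part of the original article.

```java
class LifecycleDemo {
    // Creates (but does not start) a thread that sleeps for the given time.
    static Thread newWorker(String name, boolean daemon, long sleepMillis) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(sleepMillis);
                System.out.println(name + " finished");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.setDaemon(daemon); // must be set before start()
        return t;
    }

    public static void main(String[] args) {
        // The hook runs when the JVM begins its shutdown sequence.
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> System.out.println("JVM shutting down")));

        // A daemon thread that would sleep for a minute if the JVM waited for it.
        newWorker("daemon-worker", true, 60_000).start();
        // A non-daemon thread that keeps the JVM alive for about 200 ms.
        newWorker("user-worker", false, 200).start();

        System.out.println("main returns");
    }
}
```

When run, the JVM should shut down shortly after user-worker finishes, without waiting for daemon-worker. Calling System.exit() from any thread would end the JVM even while non-daemon threads are still running.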

Second, JVM memory management and garbage collection

Memory management in the JVM mainly means management of the Heap, because stacks, PC registers, and Native Method Stacks have the same life cycle as their threads and can be reused when a thread ends. Stack management is therefore not the focus, although it is not entirely trivial either.

1. Stack management

The JVM allows the stack size to be fixed or dynamically variable. According to Oracle’s official documentation on parameter settings (http://docs.oracle.com/cd/E13150_01/jrockit_jvm/jrockit/jrdocs/refman/optionX.html#wp1024112), the stack size is set with -Xss. The default stack size varies with the machine, the JVM vendor, and the version. The following is the default stack size for HotSpot:

We usually limit stack growth by reducing the number of constants and parameters; in program design this shows up as defining constants in an object and then referring to them. Making fewer recursive calls also reduces stack usage.

The stack does not need garbage collection, although garbage collection is a hot topic in Java memory management. From the garbage collector’s point of view, values on the stack are always live and reachable, so they do not need to be reclaimed; the space they occupy is freed when the thread ends. (Reference: http://stackoverflow.com/questions/20030120/java-default-stack-size)

Two exceptions generally occur with respect to stacks:

1. A StackOverflowError is raised when the stack required for calculations in a thread exceeds the allowable size.

2. When the Java stack tries to extend and there is not enough memory for the extension, the JVM raises an OutOfMemoryError.

I conducted an experiment on the stack. Since recursive calls can increase the references of the stack and lead to overflow, the design code is as follows:
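The original listing is not reproduced in this text. A minimal sketch of such an experiment might look like the following; the class name StackDepthDemo and its methods are hypothetical, and the depth reached depends on the JVM, the platform, and the -Xss setting.

```java
class StackDepthDemo {
    private static int depth = 0;

    // Unbounded recursion: every call pushes a new StackFrame onto the thread's stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Recurses until the stack overflows and returns the depth that was reached.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Expected: the thread needed more stack than it is allowed to use.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Depth reached before StackOverflowError: " + measureDepth());
    }
}
```

Running it with a larger stack, for example java -Xss3m StackDepthDemo, should report a greater depth, which matches the observation that increasing -Xss lets deeper recursion succeed.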

My machine is running x86_64, so the default Stack size is 128KB.

When I adjusted the -Xss parameter to 3M in Eclipse, the exception disappeared.

Another thing to note about the stack is that native code called from a thread may allocate memory of its own, for example C code calling malloc(). GC is not involved in this case, and we need to manage that memory manually in the program, using free() to release it.

2. Heap management

Heap management is much more complex than stack management. I studied the role of each part of the heap, its settings, the exceptions that can occur in each part, and how to avoid them.

The figure above shows how the Heap and PermanentSpace fit together. Eden holds newly created objects, while From Space and To Space hold the objects that survive each garbage collection, so Eden is emptied after every garbage collection. Surviving objects are placed first in From Space and moved to To Space when From Space is full; when To Space is full, they are moved to Old Space. The two Survivor zones are symmetric and have no ordering between them, so the same zone may contain both objects copied from Eden and objects copied from the other Survivor zone, while only objects copied from the first Survivor zone are moved to the old generation. One of the Survivor zones is always empty. Survivor zones can also be configured to be more than two depending on program requirements, which lengthens the time an object stays in the young generation and reduces the likelihood of it being placed in the old generation.

Old Space holds objects with long life cycles, and some large newly created objects are also placed directly in Old Space.

The heap size is set with -Xms and -Xmx, which specify the minimum and maximum values, and -Xmn specifies the size of the Young Generation (some older versions also use -XX:NewSize), which is the total size of Eden plus From Space and To Space in the figure above. -XX:NewRatio sets the ratio of the Old Generation to the Young Generation; if -Xms and -Xmx are equal and -Xmn is set, this parameter does not need to be set. -XX:SurvivorRatio sets the ratio of Eden to one Survivor zone. (Refer to the blog post: http://www.cnblogs.com/redcreen/archive/2011/05/04/2037057.html)

There are two types of heap problems: Out of Memory (OOM) and Memory Leak (ML). A memory leak will eventually lead to OOM. In a real application it shows up as follows: on the console, the memory monitoring curve stays at the top and the program responds slowly; at the thread level, most threads are busy with GC and consume a large amount of CPU; finally the program terminates abnormally and reports OOM. The time until OOM appears varies from one hour to ten days or a month. Once an OOM/ML problem is confirmed, preserve the scene first by dumping the heap; if no dump is available, enable the GC flags to collect garbage collection logs. If the problem is not a memory leak, it is usually solved by increasing the heap size, adding physical memory, or modifying the program logic.
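To see what the heap settings resolve to in a running JVM, the standard java.lang.management API can be queried. This is a minimal sketch; the class name HeapInfoDemo is hypothetical, and the memory pool names it prints (for example Eden, Survivor, and Old Gen pools) depend on the JVM version and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class HeapInfoDemo {
    // Snapshot of the whole heap: init, used, committed, and max sizes in bytes.
    static MemoryUsage heapUsage() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage heap = heapUsage();
        System.out.println("heap init      = " + heap.getInit());
        System.out.println("heap used      = " + heap.getUsed());
        System.out.println("heap committed = " + heap.getCommitted());
        System.out.println("heap max       = " + heap.getMax()); // -1 if undefined

        // Each pool is reported as HEAP or NON_HEAP, with its own usage figures.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.println(pool.getType() + "  " + pool.getName() + "  " + pool.getUsage());
        }
    }
}
```

Starting the program with different -Xms, -Xmx, and -Xmn values and comparing the printed figures is a quick way to confirm that the options took effect.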

3. Garbage collection

Reclamation is triggered in the JVM when an object is no longer referenced, an uncaught exception occurs in the scope, the program finishes executing normally, the program calls System.exit(), or the program terminates unexpectedly.

The algorithm the JVM uses to mark garbage is the root search algorithm. Simply put, the search starts from objects called GC Roots and works downward; if an object cannot be reached from the GC Roots, it can be reclaimed. This algorithm is better than another garbage-marking algorithm called reference counting, because it avoids the problem that two objects which reference each other can never be reclaimed.
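The idea of root search can be sketched as a graph traversal. The following is a toy model, not how HotSpot is implemented: objects are represented by string names, references by a map, and the class ReachabilityDemo with its methods is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ReachabilityDemo {
    // Marks every object reachable from the roots by following references.
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (marked.add(obj)) { // first visit: follow its outgoing references
                pending.addAll(refs.getOrDefault(obj, Collections.<String>emptyList()));
            }
        }
        return marked;
    }

    // root -> a -> b is live; c and d only reference each other.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("root", Arrays.asList("a"));
        refs.put("a", Arrays.asList("b"));
        refs.put("c", Arrays.asList("d"));
        refs.put("d", Arrays.asList("c"));
        return refs;
    }

    public static void main(String[] args) {
        Set<String> live = reachable(sampleGraph(), Arrays.asList("root"));
        System.out.println("live objects: " + new TreeSet<>(live)); // [a, b, root]
        System.out.println("c collectable: " + !live.contains("c")); // true
    }
}
```

Objects c and d each have a reference count of one, so pure reference counting would never free them; because neither is reachable from the root, the traversal leaves both unmarked.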

There are three algorithms for garbage collection in the JVM:

1. The mark-sweep algorithm scans the whole space from the root set and marks the surviving objects, then scans the whole space again and reclaims the unmarked objects. This algorithm is efficient when there are many surviving objects, but it produces memory fragmentation.

2. The copying algorithm scans from the root set and copies the surviving objects to a new space. This algorithm is more efficient when there are few surviving objects.

3. The mark-compact algorithm scans and marks the surviving objects just as mark-sweep does, but while reclaiming the unmarked objects it also compacts the marked ones, which solves the problem of memory fragmentation.

In the JVM, different memory regions have different roles and properties and use different garbage collection algorithms, so the JVM defines several different garbage collectors (a line between two collectors in the figure indicates that they can be used together):

1. Serial GC. As the name suggests, Serial GC is single-threaded, so it requires all threads to pause while it collects. This is unsuitable for high-performance applications, so Serial GC is typically used in Client mode JVMs.

2. ParNew GC. It adds a multi-threading mechanism on top of Serial GC. On a single-CPU machine, however, this collector is less efficient than Serial GC.

3. Parallel Scavenge GC. This collector is also called the throughput-first collector, where throughput = program run time / (JVM garbage collection time + program run time). Assuming the JVM runs for 100 minutes in total, of which garbage collection takes 1 minute, throughput is 99 / (1 + 99) = 99%. Parallel Scavenge GC is the default configuration of the Server mode JVM because it provides relatively good throughput.

4. ParallelOld is the old-generation parallel collector. It uses the mark-compact algorithm and was introduced in JDK 1.6; before that, only the serial collector was available for the old generation.

5. Serial Old is the default old-generation collector in Client mode. It runs in a single thread and also serves as the backup collector when the CMS collector fails.

6. CMS, also known as the response-time-first collector, uses the mark-sweep algorithm. Its number of collection threads is (number of CPU cores + 3) / 4, so it is more efficient when there are 2 CPU cores. CMS works in four phases: initial mark, concurrent mark, remark, and concurrent sweep.

7. Garbage First (G1). Notably, the G1 collector can collect both the Young Generation and the Tenured Generation. It was introduced in an update release of JDK 6, offers high performance, and balances throughput and response time.

The combined use of the garbage collector can be specified with parameters in the following table:

(MarsYoung’s note: the third entry in the picture is wrong; it should be UseConcMarkSweepGC.)

The default GC type can be viewed with jstat -gcutil [pid] 1000 or with a jmap heap dump, and GC logs can be recorded by adding -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:./gc.log to the startup parameters.
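The collectors that are active can also be listed from inside the running program through the standard GarbageCollectorMXBean interface. This is a minimal sketch; the class name GcInfoDemo is hypothetical, and the collector names printed depend on the JVM version and the GC options in effect.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GcInfoDemo {
    // Names of the collectors the running JVM has registered.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime()
                    + ", pools=" + Arrays.toString(gc.getMemoryPoolNames()));
        }
    }
}
```

Typically one bean covers the young generation and another the old generation, which corresponds to the pairing of collectors described above.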

There is a kind of collection called Full GC, also called Major GC, and the following conditions trigger it:

1. Tenured Space is too small to hold a large object or array being created. A Full GC is performed, and if there is still not enough space after the Full GC, OOM: Java heap space is thrown.

2. The Permanent Generation is too small because it stores too much class information, which triggers a Full GC in the non-CMS case. If there is still not enough space afterwards, OOM: PermGen space is thrown.

3. When promotion failed or concurrent mode failure occurs during a CMS GC, a Full GC is also triggered. Promotion failed occurs during a Minor GC when the surviving objects do not fit in the Survivor space and can only be placed in the old generation. Concurrent mode failure occurs when the old generation runs out of space while an object needs to be put into it during the CMS GC.

4. If, when a Minor GC is evaluated, the objects to be promoted to Tenured Space are larger than the space remaining in Tenured Space, a Full GC is also triggered.

As you can see, when Full GC occurs frequently, there is almost certainly a memory problem.

Third, JVM data format specification and Class file

1. Data type specification

According to von Neumann’s theory of computing, computers ultimately process binary numbers. How does the JVM turn a Java file into a binary form that every platform can recognize? The JVM itself defines an abstract unit of storage called a word. A word is large enough to hold a value of type byte, char, short, int, float, reference, or returnAddress, and two words are large enough to hold the larger types long and double. A word is usually the size of a native pointer on the host platform; on a 32-bit platform, for example, a word is 32 bits.

The JVM also defines the basic data types it supports, including two parts: numeric types and returnAddress types. Numeric types are either integer or floating point.

Integer:

Floating point:

A value of type returnAddress is a pointer to the opcode of a Java virtual machine instruction.

In contrast to Java’s basic data types, there is no boolean type among them in the JVM specification. This is because boolean operations in the JVM are handled with int, whereas boolean arrays are handled as byte arrays.

As for String, we know that it is stored in the constant pool, but it is not a basic data type; it can live in the constant pool because the JVM stipulates so. If we look at the String source code, we can see that a String is actually backed by an array of the basic data type char, as shown in the figure:
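The bit widths of the numeric types can be checked from the SIZE constants of the Java wrapper classes. The sketch below is illustrative only: the class name TypeWidthDemo and the helper wordsNeeded are hypothetical, and the 32-bit word is an assumption taken from the 32-bit platform example in the text.

```java
class TypeWidthDemo {
    // Assumed word size, matching the 32-bit platform example.
    static final int WORD_BITS = 32;

    // Number of words needed to hold a value of the given width.
    static int wordsNeeded(int bits) {
        return (bits + WORD_BITS - 1) / WORD_BITS;
    }

    public static void main(String[] args) {
        System.out.println("byte   " + Byte.SIZE      + " bits, words: " + wordsNeeded(Byte.SIZE));
        System.out.println("short  " + Short.SIZE     + " bits, words: " + wordsNeeded(Short.SIZE));
        System.out.println("char   " + Character.SIZE + " bits, words: " + wordsNeeded(Character.SIZE));
        System.out.println("int    " + Integer.SIZE   + " bits, words: " + wordsNeeded(Integer.SIZE));
        System.out.println("long   " + Long.SIZE      + " bits, words: " + wordsNeeded(Long.SIZE));
        System.out.println("float  " + Float.SIZE     + " bits, words: " + wordsNeeded(Float.SIZE));
        System.out.println("double " + Double.SIZE    + " bits, words: " + wordsNeeded(Double.SIZE));
    }
}
```

Under this assumption only long and double need two words, which agrees with the description of the word given earlier.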

2. Bytecode files

The format of the bytecode files shows how the JVM regulates data types. Here is the structure of a ClassFile:

The definition of each field is as follows (refer to the JVM Specification and the blog post http://www.cnblogs.com/zhuYears/archive/2012/02/07/2340347.html):

magic:

The magic number. Its only function is to determine whether the file is a class file acceptable to the virtual machine. The magic value is fixed at 0xCAFEBABE and will not change.

minor_version, major_version:

The minor and major version numbers of the class file. Together they form the format version number of the class file. Different versions of the VM support different class file versions, and a higher-version VM can support earlier class files.

constant_pool_count:

The constant pool counter. The value of constant_pool_count equals the number of members in the constant_pool table plus 1.

constant_pool[]:

The constant pool. constant_pool is a table structure that contains all string constants, class or interface names, field names, and other constants referenced in the ClassFile structure and its substructures. Unlike the other tables, the constant pool is indexed from 1 to constant_pool_count - 1.

access_flags:

access_flags is a mask of flags that indicates the access permissions and basic properties of a class or interface. The value range and meanings of access_flags are shown in the following table:

this_class:

The class index. The value of this_class must be a valid index into the constant_pool table. The constant_pool entry at this index must be a constant of type CONSTANT_Class_info, representing the class or interface defined by this class file.

super_class:

The superclass index. For a class, the value of super_class must be 0 or a valid index into the constant_pool table. If the value is not 0, the constant_pool entry at that index must be a CONSTANT_Class_info constant, representing the direct superclass of the class defined by this class file. If the value of super_class is 0, the class file must represent java.lang.Object, because it is the only class without a superclass.

interfaces_count:

The value of interfaces_count is the number of direct superinterfaces of the current class or interface.

interfaces[]:

The value of each member of the interfaces[] array must be a valid index into the constant_pool table, and the length of the array must be interfaces_count. Each member interfaces[i] must refer to a constant of type CONSTANT_Class_info.

fields_count:

The value of fields_count is the number of members of the fields[] array in the current class file.

fields[]:

The field table. Each member of the fields[] array must be a field_info structure that gives a complete description of a field in the current class or interface.

methods_count:

The value of methods_count is the number of members of the methods[] array in the current class file.

methods[]:

The method table. Each member of the methods[] array must be a method_info structure that gives a complete description of a method in the current class or interface.

attributes_count:

The value of attributes_count is the number of members of the attributes table in the current class file.

attributes[]:

The attribute table. Each item in the attributes table must be an attribute_info structure.
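The first few fields described above can be read from any class file with a DataInputStream, since the class file stores them as big-endian u4 and u2 values. This is a minimal sketch that reads only magic, minor_version, major_version, and constant_pool_count; the class name ClassHeaderDemo and the method readHeader are hypothetical, and using java/lang/Object.class as the sample input is just a convenient assumption.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class ClassHeaderDemo {
    // Returns {magic, minor_version, major_version, constant_pool_count}.
    static long[] readHeader(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        long magic = data.readInt() & 0xFFFFFFFFL; // u4, read as unsigned
        int minor = data.readUnsignedShort();      // u2
        int major = data.readUnsignedShort();      // u2
        int constantPoolCount = data.readUnsignedShort(); // u2
        if (magic != 0xCAFEBABEL) {
            throw new IOException("not a class file: bad magic number");
        }
        return new long[] { magic, minor, major, constantPoolCount };
    }

    public static void main(String[] args) throws IOException {
        // Sample input: the class file of java.lang.Object from the running JDK.
        try (InputStream in = ClassLoader.getSystemResourceAsStream("java/lang/Object.class")) {
            if (in == null) {
                System.out.println("class file not found");
                return;
            }
            long[] h = readHeader(in);
            System.out.printf("magic=0x%X minor=%d major=%d constant_pool_count=%d%n",
                    h[0], h[1], h[2], h[3]);
        }
    }
}
```

The remaining fields cannot be read this simply, because the constant pool entries have variable length and must be decoded one by one before access_flags can be located.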

Four, a Java class instance analysis

To understand the JVM’s data type specification and memory allocation in general, I created memerytest.java:

After compiling it to memerytest.class, I used WinHex to view the file. Each part of the bytecode file is defined differently, but you can imagine how much a rigorously formatted file helps the JVM with memory management and execution.

After running the program, I found the corresponding process ID in Windows Explorer.

Then I checked heap memory usage on the console with jmap -heap 10016:

The output shows that Parallel GC is running with 4 threads. The minimum free ratio of the heap is 40%, the maximum free ratio is 70%, and the maximum heap size is 4090M. The new generation occupies 1.5M, the Young Generation can grow to a maximum of 1363M, and the Tenured Generation has a size of 254.5M; NewRatio and SurvivorRatio are listed as well. The output then breaks down the current 1.5M of the Young Generation: Eden occupies 1.0M and is 5.4% used; From Space occupies 0.5M and is 93% used; To Space occupies 0.5M and is 0% used.

Next, we use jmap -dump to write the contents of the heap to a file:

Use Eclipse’s MAT plugin to open the corresponding file:

Select the first memory leak analysis report and open the test.bin file to display MAT’s analysis on possible memory leaks.

The report lists three places where a memory leak could occur, accounting for 22.10%, 13.78%, and 14.69% of the Heap; when a memory leak really occurs, some objects usually account for a very high proportion. Open the first Problem Suspect and the result is as follows:

ShallowHeap is the heap size of the object itself, excluding the objects it references. RetainedHeap is the object’s own ShallowHeap plus the ShallowHeap of the objects that can only be reached through it. During garbage collection, if an object is no longer referenced and is collected, its RetainedHeap is the total amount of memory that can be reclaimed. As you can see from the figure above, there is no memory leak in the program, so you can rest assured. If you are unsure about some objects, you can take heap dump files at several points in time and track how those objects change.

Five, the summary

These are the JVM-related materials I have sorted out over the last few days, mainly covering its basic composition and operating principles, memory management, basic data types, and bytecode files. The JVM is an excellent piece of software and a good specification. Sorting out and studying this material gave me a clearer understanding of the JVM and a deeper understanding of the Java language.

This learning process also strengthened my understanding of how a programmer develops: knowledge must be refined. As a next step, I will carefully read the three versions of the JVM Specification from Oracle alongside my work, and improve my Java fundamentals by combining them with practice.