Object instantiation and memory layout

Object instantiation

How objects are created

  • The new keyword
    • Variant 1: a static method of Xxx
    • Variant 2: a static method of XxxBuilder/XxxFactory (both the Builder and Factory patterns hand back objects)
  • Class.newInstance(): reflection; can only call the no-argument constructor, which must be public (deprecated since Java 9)
  • Constructor.newInstance(Xxx): reflection; can call no-argument or parameterized constructors, with no restriction on their access level
  • clone(): no constructor is called; the class must implement the Cloneable interface and override clone()
  • Deserialization: rebuild an object from a binary stream read from a file or from the network
  • Third-party library Objenesis
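
The creation paths listed above can be compared side by side. The following is a minimal sketch; the Point class, its field, and the helper method names are hypothetical and only serve the illustration. The counter shows that clone() and deserialization produce objects without running the constructor of Point.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Constructor;

// Hypothetical demo: Point and the helper names are not from the article.
class CreationDemo {

    static class Point implements Cloneable, Serializable {
        private static final long serialVersionUID = 1L;
        static int constructorCalls = 0;    // counts how often a constructor runs
        int x;

        private Point(int x) {              // private on purpose, see viaReflection()
            this.x = x;
            constructorCalls++;
        }

        static Point of(int x) {            // variant 1: static factory method using new
            return new Point(x);
        }

        @Override
        public Point clone() {
            try {
                return (Point) super.clone();   // copies the fields, runs no constructor
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    static Point viaReflection(int x) throws Exception {
        Constructor<Point> c = Point.class.getDeclaredConstructor(int.class);
        c.setAccessible(true);              // lifts the access check on the private constructor
        return c.newInstance(x);
    }

    static Point viaSerialization(Point p) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Point) in.readObject();  // rebuilt from the byte stream
        }
    }

    public static void main(String[] args) throws Exception {
        Point a = Point.of(1);
        Point b = viaReflection(2);
        Point c = a.clone();
        Point d = viaSerialization(b);
        System.out.println(a.x + " " + b.x + " " + c.x + " " + d.x);
        System.out.println("constructor calls: " + Point.constructorCalls);
    }
}
```

Four objects are created, but only the factory method and the reflective call go through the constructor.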

Object instantiation steps

1. Load the class metadata

When the virtual machine encounters a new instruction, it first checks whether the operand of the instruction can locate a symbolic reference to a class in the constant pool (in Metaspace), and whether the class that the symbolic reference represents has already been loaded, resolved, and initialized (that is, whether the class metadata exists).

If not, the current ClassLoader, following the parent delegation model, looks up the corresponding .class file using ClassLoader + package name + class name as the key. If the file is not found, a ClassNotFoundException is thrown; if it is found, the class is loaded and the corresponding Class object is generated.

Class loading stages: loading, linking (verification -> preparation -> resolution), initialization

2. Allocate memory for objects

The size of the object is calculated, and a block of that size is carved out of the heap for the new object. If an instance field is a reference, only space for the reference itself (an address) is allocated: 4 bytes with pointer compression enabled, 8 bytes without.

  • If memory is regular, use bump-the-pointer allocation

If the heap is regular, meaning that all used memory sits on one side and all free memory on the other, with a pointer in the middle marking the boundary, the virtual machine allocates by bumping the pointer: it simply moves the pointer toward the free side by a distance equal to the size of the object. Collectors that compact the heap, such as Serial and ParNew, leave it in this state, so bump-the-pointer allocation is typically used with collectors that have a compaction phase.

  • If memory is not regular, use a free list

If used and unused memory are interleaved, the virtual machine allocates from a free list: it maintains a list of the available memory blocks, finds a block large enough for the object instance, hands it out, and updates the list.

Note: Which allocation method is used depends on whether the Java heap is regular, which in turn depends on whether the garbage collector in use compacts memory.
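
To make the two strategies concrete, here is a toy model, assuming an imaginary heap addressed by integer offsets. The class names and the first-fit search are assumptions made for illustration; HotSpot's real allocators are considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: offsets into an imaginary heap, not real JVM allocation.
class AllocationSketch {

    /** Bump the pointer: used space on one side, free space on the other. */
    static class BumpAllocator {
        private final int capacity;
        private int top = 0;                // the dividing pointer

        BumpAllocator(int capacity) { this.capacity = capacity; }

        /** Returns the start offset of the new object, or -1 if the heap is full. */
        int allocate(int size) {
            if (top + size > capacity) return -1;
            int start = top;
            top += size;                    // move the pointer by the object size
            return start;
        }
    }

    /** Free list: a list of free blocks, searched first-fit (an assumption). */
    static class FreeListAllocator {
        private final List<int[]> free = new ArrayList<>();   // each entry is {offset, length}

        void addFreeBlock(int offset, int length) {
            free.add(new int[] {offset, length});
        }

        int allocate(int size) {
            for (int i = 0; i < free.size(); i++) {
                int[] block = free.get(i);
                if (block[1] >= size) {
                    int start = block[0];
                    block[0] += size;       // shrink the block from the front
                    block[1] -= size;
                    if (block[1] == 0) free.remove(i);   // block fully used: update the list
                    return start;
                }
            }
            return -1;                      // no block is large enough
        }
    }

    public static void main(String[] args) {
        BumpAllocator bump = new BumpAllocator(64);
        System.out.println(bump.allocate(16) + " " + bump.allocate(24));   // 0 16

        FreeListAllocator list = new FreeListAllocator();
        list.addFreeBlock(8, 8);
        list.addFreeBlock(40, 32);
        System.out.println(list.allocate(16) + " " + list.allocate(8));    // 40 8
    }
}
```

The bump allocator does constant work per allocation, while the free list has to search, which is one reason compacting collectors make allocation cheap.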

3. Handle concurrency problems

Another problem during memory allocation is keeping the creation of new objects thread-safe. Object creation is extremely frequent, so the virtual machine has to deal with concurrency. It does so in two ways:

  • CAS (Compare And Swap) with retry on failure, plus region locking: guarantees that the pointer update is atomic

  • TLAB: a small private block of memory in the Java heap is allocated to each thread in advance, called the Thread Local Allocation Buffer (TLAB). It can be switched on or off with the -XX:+/-UseTLAB parameter. (Covered in the heap chapter.)
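
Both ideas can be sketched on top of a shared allocation pointer. This is a toy model under stated assumptions: the heap is an imaginary range of integer offsets, and LocalBuffer is a hypothetical stand-in for a TLAB, not HotSpot code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model: offsets into an imaginary heap, not real JVM allocation.
class ConcurrentAllocationSketch {
    static final int HEAP_SIZE = 1 << 20;
    static final AtomicInteger top = new AtomicInteger(0);   // shared allocation pointer

    /** CAS with retry: loop until this thread wins the pointer update. */
    static int allocateShared(int size) {
        while (true) {
            int old = top.get();
            if (old + size > HEAP_SIZE) return -1;            // heap exhausted
            if (top.compareAndSet(old, old + size)) return old;
            // Another thread moved the pointer first: retry with the new value.
        }
    }

    /** TLAB-like buffer owned by one thread: one CAS per chunk, none per object. */
    static class LocalBuffer {
        private final int chunkSize;
        private int cursor;
        private int end;          // cursor == end means the chunk has no space left

        LocalBuffer(int chunkSize) { this.chunkSize = chunkSize; }

        int allocate(int size) {
            if (size > chunkSize) return allocateShared(size);   // too big for the buffer
            if (cursor + size > end) {                           // refill from the shared heap
                int start = allocateShared(chunkSize);
                if (start < 0) return -1;
                cursor = start;
                end = start + chunkSize;
            }
            int result = cursor;
            cursor += size;                                      // plain write, no CAS needed
            return result;
        }
    }

    /** Runs several threads against the shared pointer and returns its final value. */
    static int stress(int threads, int perThread, int size) throws InterruptedException {
        top.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) allocateShared(size);
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        return top.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(stress(4, 1000, 16));   // 64000: no update is lost

        top.set(0);
        LocalBuffer tlab = new LocalBuffer(1024);
        System.out.println(tlab.allocate(16) + " " + tlab.allocate(16) + " top=" + top.get());
    }
}
```

The second print shows the point of the buffer: two objects were allocated, but the shared pointer moved only once, by a whole chunk.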

4. Default initialization of fields (zero-value initialization)

When memory allocation is complete, the virtual machine initializes all of the allocated memory to zero (excluding the object header). This step guarantees that the instance fields of the object can be used in Java code without being assigned an initial value: the program simply reads the zero value that corresponds to the data type of each field.
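
The effect of zero-value initialization is visible from plain Java. A minimal sketch (the class and field names are made up for the example):

```java
// Fields are never assigned explicitly, so each one shows its zero value.
class ZeroValueDemo {
    boolean flag;
    byte b;
    short s;
    char c;
    int i;
    long l;
    float f;
    double d;
    Object ref;

    public static void main(String[] args) {
        ZeroValueDemo z = new ZeroValueDemo();
        // char is printed as a number because '\u0000' is not a visible character
        System.out.println(z.flag + " " + z.i + " " + z.l + " " + z.d + " " + (int) z.c + " " + z.ref);
        // prints: false 0 0 0.0 0 null
    }
}
```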

5. Set the object header of the object

The object header stores the class the object belongs to (that is, the class metadata), the hash code of the object, its GC information, lock information, and other data. Exactly how the header is set up depends on the JVM implementation.

6. Explicit field initialization, instance initializer blocks, and the constructor (the <init> method)

From the Java program’s point of view, initialization only begins now. The <init> method initializes member variables, runs instance initializer blocks, and runs the constructor body; the address of the heap object is then assigned to the reference variable. In general (it depends on whether the new bytecode is followed by an invokespecial instruction), the new instruction is followed by the <init> method, which initializes the object as the programmer intends. Only then is a truly usable object fully created.

Note:

  • Explicit field initialization here does not cover static variables. Static variables and static blocks are initialized in the <clinit>() method, which runs only once, during the initialization stage of class loading.

  • An instance initializer block is written {} with no static modifier. A block written static {} is a static initializer block and runs in <clinit>().

  • Instance initializer block (also called a construction block): {}

  • Static initializer block: static {}

The order in which the fields of an instance are assigned (<init>()):

  1. Default (zero-value) initialization of the field
  2. Explicit initialization and instance initializer blocks (in source order)
  3. Initialization in the constructor

Static variables and static blocks:

  1. Default (zero-value) initialization (linking: preparation stage)

  2. Explicit initialization of static variables and static blocks (initialization stage: <clinit>())
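
The ordering above can be observed by logging each step. A minimal sketch; the class name and the log helper are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    // Declared first so the log exists before the other static initializers run.
    static final List<String> LOG = new ArrayList<>();

    static int s = log("static field");           // <clinit>, in source order
    static { log("static block"); }

    int a = log("instance field");                // <init>, in source order
    { log("instance block"); }

    InitOrderDemo() { log("constructor body"); }  // <init>, after the two lines above

    static int log(String step) {
        LOG.add(step);
        return 1;
    }

    public static void main(String[] args) {
        new InitOrderDemo();
        new InitOrderDemo();
        // The static steps appear once, the instance steps once per object.
        LOG.forEach(System.out::println);
    }
}
```

Swapping the instance field and the instance block in the source swaps their order in the log, which is what "in source order" means in step 2.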

Object allocation example

public class Customer {
    int id = 1001;            // explicit field initialization
    String name;
    Account acct;

    {
        name = "Anonymous Client";   // instance initializer block
    }

    public Customer() {
        acct = new Account();        // initialization in the constructor
    }
}

class Account {
}

public class CustomerTest {
    public static void main(String[] args) {
        Customer cust = new Customer();
    }
}

Object location access

  1. Handle access

Advantages:

The reference stores a stable handle address. When the object is moved (which is common during garbage collection), only the instance-data pointer inside the handle changes; the reference itself does not need to be modified.

  2. Direct pointers (used by HotSpot)

Advantages:

The reference holds the address of the object directly, with no hop through the handle pool, which saves one level of indirection on every access. HotSpot takes this approach.

Object memory layout

Object header (Header)

Mark Word

The Mark Word in the object header records the lock state of the object, its GC age, and its hash code.

On 32-bit systems, it is 32 bits, or 4 bytes.

On 64-bit systems, it is 64 bits, or 8 bytes.

  • unused: unused bits

  • identity_hashcode: 31-bit identity hash code of the object, computed lazily. It is calculated when System.identityHashCode() is called, and the result is written into the object header. When the object is locked (biased, lightweight, heavyweight), the Mark Word no longer has room for the hash code, so the value is kept outside the header, in the lock record or the monitor.

  • age: 4-bit GC age of the object. Each time the object is copied within the Survivor spaces during GC, its age increases by 1. When it reaches a set threshold, the object is promoted to the old generation. By default the threshold is 15 for the parallel collectors and 6 for the concurrent collector (CMS). Since age has only four bits, its maximum value is 15, which is why -XX:MaxTenuringThreshold cannot exceed 15.

  • biased_lock: 1-bit flag that says whether biased locking is enabled for the object. 1 means enabled, 0 means not enabled. lock and biased_lock together indicate which lock state the object is in.

  • lock: 2-bit lock-state flag. It exists to express as much information as possible in as few bits as possible: depending on its value, the rest of the Mark Word has a different meaning. Together, biased_lock and lock encode the lock state: 0 + 01 = unlocked, 1 + 01 = biased, 00 = lightweight lock, 10 = heavyweight lock, 11 = marked for GC.

  • thread: ID of the thread that holds the biased lock.

  • epoch: epoch (timestamp) of the biased lock.

  • ptr_to_lock_record: in the lightweight lock state, pointer to the lock record on the stack.

  • ptr_to_heavyweight_monitor: in the heavyweight lock state, pointer to the object monitor.
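
The fields above are bit ranges inside a single 64-bit word, so they can be pulled apart with shifts and masks. The sketch below assumes the 64-bit HotSpot layout for an unlocked object (from the high bits down: 25 unused, 31 hash, 1 unused, 4 age, 1 biased_lock, 2 lock); the class and method names are hypothetical, and the hash and age fields are only meaningful while the object is unlocked.

```java
// Decodes a raw Mark Word value under the assumed 64-bit layout:
// unused:25 | identity_hashcode:31 | unused:1 | age:4 | biased_lock:1 | lock:2
class MarkWordSketch {
    static int lock(long mark)       { return (int) (mark & 0b11); }
    static int biasedLock(long mark) { return (int) ((mark >>> 2) & 0b1); }
    static int age(long mark)        { return (int) ((mark >>> 3) & 0b1111); }
    static int hash(long mark)       { return (int) ((mark >>> 8) & 0x7FFF_FFFFL); }

    static String state(long mark) {
        switch (lock(mark)) {
            case 0b01: return biasedLock(mark) == 1 ? "biased" : "unlocked";
            case 0b00: return "lightweight";
            case 0b10: return "heavyweight";
            default:   return "marked for GC";
        }
    }

    public static void main(String[] args) {
        // Hand-built example value: hash 0x1234, age 5, biased_lock 0, lock 01
        long mark = (0x1234L << 8) | (5L << 3) | 0b001L;
        System.out.println(state(mark) + " age=" + age(mark) + " hash=0x" + Integer.toHexString(hash(mark)));
        // prints: unlocked age=5 hash=0x1234
    }
}
```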

Lock escalation
  1. When the lock object is first created, no thread is competing for it. Its Mark Word is the first case in the figure: the biased_lock bit is 0 and the lock state is 01, meaning the object is unlocked.
  2. When a thread acquires the lock, biased locking is used first: the lock object favors this thread, which can then run any code guarded by the lock without further checks or switching. Contention is low in this situation, so it is very efficient. The Mark Word records the ID of the favored thread, as if recognizing an acquaintance (second case in the figure).
  3. When two threads start competing for the lock object, the bias (exclusive ownership) is dropped and the lock is upgraded to a lightweight lock; the two threads compete fairly (third case in the figure).
  4. If contention increases further, causing more switching and waiting, the JVM upgrades the lock to a heavyweight lock, also called a synchronization lock. The Mark Word changes again and now points to a monitor object, which registers and manages the queue of waiting threads in a collection.

Klass Pointer

The klass word is a pointer to the class metadata in the method area.

On 32-bit systems, it is 32 bits, or 4 bytes.

On 64-bit systems, it is 64 bits, or 8 bytes (4 bytes with pointer compression enabled).

On a 64-bit machine, when the maximum heap size is set below 32 GB, pointer compression is on by default and the 8-byte pointer is compressed to 4 bytes.

Disable pointer compression: -XX:-UseCompressedOops (compression of the klass pointer specifically is controlled by -XX:+/-UseCompressedClassPointers)

Array length (for array objects)

This field is optional: it is present only when the object is an array, and takes 4 bytes whether or not pointer compression is enabled.

Instance Data

Instance data is the body of the object, holding its fields and their values; its memory footprint depends on the number and types of the fields.

Primitive fields are sized according to their primitive type.

Reference variables on 32-bit systems are 32 bits, or 4 bytes.

Reference variables on 64-bit systems are 64 bits, or 8 bytes (4 bytes with pointer compression enabled).

The instance data section stores only instance data, not static data. In addition, the instance data of a subclass object includes all instance data inherited from the parent class, private fields included.

Alignment padding

  1. The HotSpot virtual machine requires the start address of every object to be a multiple of 8, which means the object size must also be a multiple of 8. If the object header plus instance data does not add up to a multiple of 8, padding is appended to round the total up to the next multiple of 8.

  2. Fields also have to be aligned relative to each other; the minimum unit of field alignment is 4 bytes.

Think of the virtual machine handing out boxes of 4 bytes (or a multiple of 4) for fields. For example, if a class has a boolean field and an int field, the first 4-byte box goes to the boolean, which uses 1 byte and wastes 3; the int needs 4 bytes, does not fit in the remainder, and gets a second 4-byte box.

The virtual machine does not hand out boxes in declaration order; it reorders the fields to make the most of each box. Take a class with the fields char, int, boolean, byte. In declaration order, char would get a box and waste 2 bytes; int would get a box and fill it exactly; boolean would get a box and waste 3 bytes; and byte would get another box and waste 3 more.

After reordering, the VM needs only two boxes, in the order int (4), then char (2) + boolean (1) + byte (1), which greatly reduces wasted memory. Fields of reference types are always allocated last.

Calculate object size

Size of each type of data

Type         Memory usage (bytes)
reference    4 with pointer compression enabled, 8 with it disabled
boolean      1
byte         1
short        2
char         2
float        4
int          4
long         8
double       8

Disable pointer compression: -XX:-UseCompressedOops

Object size = object header (mark word + type pointer + array length (optional)) + instance data + alignment padding

Static fields are not included in the object size.

To reduce wasted space, fields are generally allocated in this order of priority:

double > long > int > float > char > short > byte > boolean > object reference.

The general rule here is: whenever possible, allocate the most space-intensive types first.
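
The size formula can be turned into a small calculator. This is a simplified sketch that assumes a 64-bit VM with pointer compression (8-byte mark word, 4-byte type pointer) and ignores any gaps between fields; the class and method names are made up for the example, and JOL remains the authoritative way to measure a real object.

```java
// Simplified estimate: header + sum of field sizes, rounded up to a multiple of 8.
class ObjectSizeSketch {
    static final int MARK_WORD = 8;
    static final int TYPE_POINTER = 4;    // assumes pointer compression; 8 without it

    /** Rounds a size up to the next multiple of 8 (alignment padding). */
    static long alignUp(long size) {
        return (size + 7) & ~7L;
    }

    /** Estimated size of a non-array object, given the size in bytes of each instance field. */
    static long instanceSize(int... fieldBytes) {
        long size = MARK_WORD + TYPE_POINTER;
        for (int f : fieldBytes) {
            size += f;
        }
        return alignUp(size);
    }

    public static void main(String[] args) {
        System.out.println(instanceSize());             // 16: header 12 + padding 4
        System.out.println(instanceSize(4, 2, 1, 1));   // 24: int, char, boolean, byte
        System.out.println(instanceSize(8, 8));         // 32: two long fields
    }
}
```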

JOL (Java Object Layout)

  1. JOL can report object sizes; add the following dependency:
<dependency>
    <groupId>org.openjdk.jol</groupId>
    <artifactId>jol-core</artifactId>
    <version>0.16</version>
</dependency>
  2. Other ways to query object size:
  • Instrumentation
  • SA (Serviceability Agent)
  • Unsafe

The following example is on a 64-bit machine with pointer compression turned on by default.

Size of an object with no fields

import org.openjdk.jol.info.ClassLayout;

public class Test {
    public static void main(String[] args) {
        Object o = new Object();
        // Print the header, fields, and padding of the instance
        System.out.println(ClassLayout.parseInstance(o).toPrintable());
    }
}

Object size = mark word 8 + type pointer 4 + array length 0 + instance data 0 + alignment padding 4 = 16 bytes

Size of an object with fields

public class App {
    public char charP;
    public int intP;
    public boolean booleanP;
    public byte byteP;
}

Size = mark word 8 + type pointer 4 + array length 0 + instance data (intP 4 + charP 2 + booleanP 1 + byteP 1) + alignment padding 4 = 24 bytes

import org.openjdk.jol.info.ClassLayout;

public class Test {
    public static void main(String[] args) {
        App app = new App();
        System.out.println(ClassLayout.parseInstance(app).toPrintable());
    }
}

[Figure: JOL output with pointer compression enabled]

[Figure: JOL output with pointer compression disabled (-XX:-UseCompressedOops)]

Size of a subclass object with a parent class

  • Does the instance data of a subclass exclude the private instance field privateFlag of the parent class? No.

  • Does an instance field of the subclass override a parent field with the same name? No, both are stored.


class Father {
    public boolean publicFlag;         // 1 byte
    private boolean privateFlag;       // 1 byte
    public static boolean staticFlag;  // static: not part of any instance
}

public class Children extends Father {
    public boolean publicFlag;  // 1 byte
    private int b;              // 4 bytes
    protected double c;         // 8 bytes
    long d;                     // 8 bytes; note this is not the wrapper class Long
}

Calculate the Children object size:

Parent part = instance data 2 + alignment padding 2 = 4 bytes

Child part = mark word 8 + type pointer 4 + array length 0 + instance data 21 + alignment padding 3 = 36 bytes

Total = parent + child = 4 + 36 = 40 bytes

[Figure: JOL output for the Children object]

Conclusion:

  • The subclass object contains all the instance variables of the parent class, and the parent class instance variables are allocated first, and then the subclass instance variables are allocated.

  • Static variables are not allocated

Size of an object with an array field

Array size = mark word + type pointer + array length + instance data (array length * element size) + alignment padding

    private static class ObjectC {
        ObjectD[] array = new ObjectD[2];
    }

    private static class ObjectD {
        int value;
    }

ObjectC obj = new ObjectC();

ObjectC size = 8 (mark word) + 4 (type pointer) + 4 (ObjectD[] reference) = 16 bytes. The array itself is a separate object and is not counted in ObjectC.

Array object size

System.out.println(ClassLayout.parseInstance(new int[100]).toPrintable());
System.out.println(ClassLayout.parseInstance(new Object[100]).toPrintable());
   

new int[100] = 8 (mark word) + 4 (type pointer) + 4 (array length) + 100 × 4 (element count × element size) = 416 bytes
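
The same arithmetic works for other element types. A small sketch, assuming a 64-bit VM with pointer compression so that the array header is 8 + 4 + 4 = 16 bytes; the class and method names are hypothetical:

```java
// Estimated array size: 16-byte header + length * element size, rounded up to a multiple of 8.
class ArraySizeSketch {
    static final int ARRAY_HEADER = 8 + 4 + 4;   // mark word + type pointer + array length

    static long arraySize(int length, int elementBytes) {
        long size = ARRAY_HEADER + (long) length * elementBytes;
        return (size + 7) & ~7L;                  // alignment padding
    }

    public static void main(String[] args) {
        System.out.println(arraySize(100, 4));   // 416: int[100], or Object[100] with 4-byte references
        System.out.println(arraySize(100, 8));   // 816: long[100]
        System.out.println(arraySize(3, 1));     // 24: byte[3] needs 19 bytes, padded to 24
    }
}
```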

Oop-Klass model

HotSpot is written in C++, and C++ is an object-oriented language with the usual object-oriented features, so the simplest way to represent Java objects would be to generate one C++ class for each Java class.

The HotSpot JVM does not do this. Instead it uses the Oop-Klass model. OOP stands for Ordinary Object Pointer and represents the instance information of an object. Klass contains the metadata and method information that describe a Java class.

The designers of the HotSpot JVM did not want every object to carry a vtable (virtual function table), so they split the object model into Klass and OOP: OOP has no virtual functions, while Klass holds the virtual function table used for method dispatch.

Klass

  • When the JVM loads a class file, it creates an instanceKlass, a JVM-internal data structure that represents the class metadata (constant pool, fields, methods, and so on) and is stored in the method area
  • Contains the metadata and method information that describe a Java class
  • Provides the C++ type description equivalent to a Java class
  • Implements method dispatch for Java objects
  • Klass lives in the method area

  • Specialized subclasses of InstanceKlass:
  • InstanceMirrorKlass: describes java.lang.Class; a Java Class object is an instance of this C++ class, stored in the heap, and is known as the mirror class
  • InstanceRefKlass: represents java/lang/ref/Reference and its subclasses
  • InstanceClassLoaderKlass: used to iterate over the classes loaded by a given class loader
  • Java arrays are not static types but dynamic ones, generated at run time. Their metadata is represented by subclasses of ArrayKlass:
  • TypeArrayKlass: represents arrays of primitive types
  • ObjArrayKlass: represents arrays of reference types

OOP

  • Ordinary Object Pointer: represents the instance information of an object
  • An oop is created when a running Java program creates an object with new
  • The JVM uses oops to store the instance data of user objects
  • In HotSpot, instanceOopDesc or arrayOopDesc describes the object header; arrayOopDesc is used for array types.

Handle

Besides oop and klass, the JVM also has Handle:

  • Handle wraps oop behavior. Points to note:

  • In most cases, when the JVM accesses a Java class it goes through a Handle: the _handle field of the Handle yields the oop, and the oop yields the corresponding klass, so the Handle can reach the functions of the oop. When calling a function of the C++ class behind an oop from inside the JVM, there is no need to go through a Handle; the klass can be obtained directly from the oop.

Purpose

The designers of the HotSpot JVM did not want every object to carry a vtable (virtual function table), so they split the object model into Klass and OOP: OOP has no virtual functions, while Klass holds the virtual function table used for method dispatch.

To implement polymorphism, the JVM uses vtables and itables, which are stored in the Klass model.

Differences between C++ and Java

vtable: the table of all virtual functions of this class (everything except static and final methods) together with those of its parent class

itable: the table of functions with which the class implements its interfaces

Virtual functions: the counterpart of ordinary Java methods. In C++ a virtual function is declared with the virtual keyword, and every class that has one gets a virtual function table. The table starts out as a copy of the virtual function table of the parent class; when a subclass overrides a virtual function of the parent (the overriding function is virtual whether or not it is marked as such), the entry is replaced. Which function is called depends on the actual object: whatever the type of the pointer (the current class or a parent type), the virtual function defined in the class of the object being pointed to is the one called. Each class has a single virtual function table, shared by all of its objects. In Java this behavior is built in: polymorphism means a variable of the parent type calls the method of the subclass.

Pure virtual functions: the counterpart of Java abstract methods. A C++ class with a pure virtual function is an abstract class and cannot be used to declare objects; such classes exist so that the structure of the program maps directly onto the structure of the application domain. An abstract class can be inherited, and the subclass must implement the pure virtual functions. If it does not, the subclass is abstract too; only a subclass with no remaining pure virtual functions can be used to declare objects. Abstract classes can still be used to declare pointers or references, or appear in function declarations. Abstract classes also have constructors and destructors, which are normally protected. A pure virtual function that has no implementation must not be called, directly or indirectly, from the constructor or destructor of its class. The implementation of a pure virtual function can be defined outside the class declaration.

Abstract class: like a Java abstract class; a C++ class that contains both pure and non-pure virtual functions

Pure virtual class: a C++ class that contains only pure virtual functions

C++                      Java
Virtual function         Ordinary method
Pure virtual function    Abstract method
Abstract class           Abstract class
Pure virtual class       Interface

Extension: ordinary (non-virtual) functions in C++ are not polymorphic. The call is not resolved against the actual object; it is bound directly to the method of the declared type of the variable.
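
Java has no non-virtual instance methods to contrast with, but static methods behave the way the ordinary C++ functions described above do: they are bound to the declared type. A minimal sketch with made-up class names:

```java
class DispatchDemo {
    static class Animal {
        String sound() { return "..."; }             // instance method: dispatched on the runtime type
        static String kind() { return "animal"; }    // static method: bound to the declared type
    }

    static class Dog extends Animal {
        @Override
        String sound() { return "woof"; }
        static String kind() { return "dog"; }       // hides Animal.kind(), does not override it
    }

    static String callSound(Animal a) { return a.sound(); }

    // Calling a static method through a reference is legal but discouraged;
    // it is done here only to show that the declared type decides.
    static String callKind(Animal a) { return a.kind(); }

    public static void main(String[] args) {
        Animal a = new Dog();
        System.out.println(callSound(a));   // woof: resolved through the actual object
        System.out.println(callKind(a));    // animal: resolved from the variable type
    }
}
```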

InstanceKlass, instanceOopDesc, InstanceMirrorKlass, Class object

[Figure: HSDB view of the TestLog instance structure]

Conclusion:

  1. When the JVM loads a class, it locates the class file by the fully qualified name, creates an instanceKlass that represents the class metadata, and stores it in the method area. It also creates the Class object of this class in the heap, that is, the mirror object described by InstanceMirrorKlass.

  2. When an object is created with new, the JVM creates an instanceOopDesc to represent it and stores it in the heap. An oop represents the instance information of an object; it looks like a pointer but is really an object hidden behind a pointer. An instanceOopDesc corresponds to one Java object instance.

  3. Instead of exposing instanceKlass to Java, HotSpot creates a corresponding java.lang.Class object (corresponding to InstanceMirrorKlass) and calls that Class object the “Java mirror” of the klass.

  4. The klass holds a reference to the Class object (_java_mirror is the reference from instanceKlass to the Class object). Mirroring is regarded as an important mechanism for well-designed object-oriented reflection and metaprogramming.

  5. The reference returned by new points to an instanceOopDesc in the heap. The type pointer in the instanceOopDesc points to the instanceKlass in the method area, which in turn points to the Class object of the type: TestLog instance (instanceOopDesc) -> TestLog type information (instanceKlass) -> TestLog Class object.

  • JDK 8 removed the permanent generation and implements the method area with Metaspace instead; Class instances are created in the Java heap
  • In JDK 6 and earlier, static variables were stored in instanceKlass. Since JDK 7 they are stored with the mirror, that is, with the Class object in the heap.
  • When a class is first loaded, an object of type java.lang.Class is instantiated in the Java heap to give access to the class data in the method area
  • The Class object holds the type information of a class
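
From the Java side, the chain instance -> type information -> Class object is what getClass() exposes. A small sketch; TestLog here is a stand-in with the same name as the class in the text, not the original source:

```java
class MirrorDemo {
    static class TestLog {
    }

    public static void main(String[] args) {
        TestLog a = new TestLog();
        TestLog b = new TestLog();

        Class<?> mirror = a.getClass();                         // follows instance -> klass -> mirror
        System.out.println(mirror == b.getClass());             // true: one Class object per loaded class
        System.out.println(mirror == TestLog.class);            // true: the same mirror as the class literal
        System.out.println(mirror.getClass() == Class.class);   // true: the mirror is itself a java.lang.Class
        System.out.println(mirror.getSimpleName());             // TestLog
    }
}
```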

Direct memory

  • It is not part of the virtual machine run-time data area, nor is it an area of memory as defined in the Java Virtual Machine Specification.

  • Direct memory is an area of memory outside the Java heap that is requested directly from the system.

  • It originates from NIO: native memory is operated on through a DirectByteBuffer object that lives in the heap

  • In general, access to direct memory is faster than the Java heap. That is, the read and write performance is high.

    • Therefore, for performance reasons, direct memory may be considered in situations where read and write are frequent.
    • Java’s NIO library allows Java programs to use direct memory for data buffers
  • Direct memory can also cause an OutOfMemoryError

  • Because direct memory lies outside the Java heap, its size is not directly limited by the maximum heap size set with -Xmx. System memory is finite, however, so the sum of the Java heap and direct memory is still bounded by the maximum memory the operating system can provide.

  • -XX:MaxDirectMemorySize=size sets the direct memory limit; if it is not specified, the default equals the maximum heap size given by -Xmx

  • Disadvantages

    • Allocation and reclamation are expensive
    • It is not managed by JVM memory reclamation
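
The two kinds of buffer share one API, which makes the difference easy to see. A minimal sketch using only java.nio.ByteBuffer; the class and method names are invented for the example:

```java
import java.nio.ByteBuffer;

class DirectBufferDemo {
    /** Writes a value into the buffer and reads it back. */
    static long roundTrip(ByteBuffer buffer, long value) {
        buffer.clear();
        buffer.putLong(value);
        buffer.flip();              // switch from writing to reading
        return buffer.getLong();
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(16);           // backed by a byte[] in the Java heap
        ByteBuffer direct = ByteBuffer.allocateDirect(16);   // backed by native memory outside the heap

        System.out.println(heap.isDirect() + " " + heap.hasArray());       // false true
        System.out.println(direct.isDirect() + " " + direct.hasArray());   // true false
        System.out.println(roundTrip(heap, 42L) + " " + roundTrip(direct, 42L));   // 42 42
    }
}
```

Only the small DirectByteBuffer object lives in the heap; the 16 bytes it manages do not, which is why they count against -XX:MaxDirectMemorySize rather than -Xmx.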

Non-direct buffer

Reading and writing files means interacting with the disk, which requires switching from user mode to kernel mode. With a non-direct buffer, two copies of the same data are held in memory (one in kernel space, one in user space), which is inefficient.

Direct buffer

With NIO there is a single direct buffer, set aside by the operating system, that Java code can access directly. NIO is suitable for reading and writing large files.
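
As an illustration, a file copy through a direct buffer might look like the sketch below. The class name, the 64 KB buffer size, and the temporary files in main are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class DirectCopySketch {
    /** Copies source to target through a direct buffer and returns the number of bytes written. */
    static long copy(Path source, Path target) throws IOException {
        long total = 0;
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024);   // native memory, reused per chunk
            while (in.read(buffer) != -1) {
                buffer.flip();                       // switch from filling to draining
                while (buffer.hasRemaining()) {
                    total += out.write(buffer);
                }
                buffer.clear();                      // ready for the next chunk
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("direct-copy-src", ".bin");
        Path target = Files.createTempFile("direct-copy-dst", ".bin");
        Files.write(source, new byte[200_000]);
        System.out.println(copy(source, target));    // 200000
        Files.delete(source);
        Files.delete(target);
    }
}
```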

Advantages

  1. Less garbage collection work, and therefore fewer stop-the-world (STW) pauses.
  2. Faster copying. For heap data, I/O first copies the data out of the heap before sending it; with direct memory this step is skipped.
  3. Data can be shared between processes, which reduces object copying between JVMs and makes split deployment of JVMs easier.
  4. It extends the memory available beyond the heap.

Disadvantages

  1. An OOM is hard to troubleshoot.
  2. It is not suitable for storing complex objects; simple objects generally fit better.

Direct memory OOM

import java.nio.ByteBuffer;
import java.util.ArrayList;

public class BufferTest2 {
    private static final int BUFFER = 1024 * 1024 * 20; // 20 MB

    public static void main(String[] args) {
        ArrayList<ByteBuffer> list = new ArrayList<>();

        int count = 0;
        try {
            while (true) {
                // Allocate native memory directly, outside the Java heap
                ByteBuffer byteBuffer = ByteBuffer.allocateDirect(BUFFER);
                list.add(byteBuffer);   // keep a reference so the buffer is never released
                count++;
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        } finally {
            System.out.println(count);
        }
    }
}

Run with: -Xmx20m -XX:MaxDirectMemorySize=10m

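Operating on direct memory with Unsafe

import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class DirectMemoryTest {
    public static void main(String[] args) throws Exception {
        // Unsafe has no public constructor; read its singleton field reflectively
        Field unsafeField = Unsafe.class.getDeclaredField("theUnsafe");
        unsafeField.setAccessible(true);
        Unsafe unsafe = (Unsafe) unsafeField.get(null);

        long size = 8; // putAddress writes a native pointer: 8 bytes on a 64-bit JVM
        long address = unsafe.allocateMemory(size);
        try {
            unsafe.putAddress(address, 100);
            long readValue = unsafe.getAddress(address);
            System.out.println(readValue); // 100
        } finally {
            unsafe.freeMemory(address); // native memory is not reclaimed by the GC
        }
    }
}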

Addendum: local variable table slot size

A slot in the local variable table is 4 bytes.

boolean, byte, short, char, int, and float each occupy one 4-byte slot.

long and double occupy two slots, 8 bytes.

Most Java bytecode instructions have no form for the integer types byte, char, and short, and none has a form for boolean. The compiler sign-extends byte and short values to int at compile time or run time, and zero-extends boolean and char values to int. Likewise, loads from arrays of boolean, byte, short, and char use bytecode instructions that extend the values to int. As a result, most operations on boolean, byte, short, and char data actually use int as the computational type.

Data in the heap is stored at its actual size, while slots in the local variable table on the stack have a fixed size, which simplifies operations and keeps the number of instructions small.
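
The extension rules can be checked from Java source. A minimal sketch; the class and method names are made up for the example:

```java
class WideningDemo {
    static int widen(byte b) { return b; }   // byte -> int: sign-extended
    static int widen(char c) { return c; }   // char -> int: zero-extended

    static int add(byte x, byte y) {
        return x + y;                        // both operands are promoted to int before the addition
    }

    public static void main(String[] args) {
        System.out.println(widen((byte) 0xFF));          // -1
        System.out.println(widen('\uFFFF'));             // 65535
        System.out.println(add((byte) 100, (byte) 100)); // 200, outside the byte range
    }
}
```

The last line is why `byte z = x + y;` does not compile without a cast: the result of the addition is already an int.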

In-depth understanding of the JVM series

  • 1. In-depth understanding of the JVM (I) – Introduction and architecture
  • 2. In-depth understanding of the JVM (II) – Class loader subsystem
  • 3. In-depth understanding of the JVM (III) – Runtime data area (virtual machine stack)
  • 4. In-depth understanding of the JVM (IV) – Runtime data area (program counter + native method stack)
  • 5. In-depth understanding of the JVM (V) – Runtime data area (heap)
  • 6. In-depth understanding of the JVM (VI) – Runtime data area (method area)
  • 7. In-depth understanding of the JVM (VII) – Execution engine (interpreter and JIT compiler)
  • 8. In-depth understanding of the JVM (VIII) – String constant pool