Preface

In the real world, everything is an object with a complete life cycle, from birth through growth to death. In the computer world, objects also have a life cycle, covering object creation, object memory layout, object access, and object destruction. This is true of objects in C++ and of objects in Java.

The difference lies in who controls that cycle:

  • In C++, the programmer controls the entire life cycle: creation, use, and reclamation.
  • In Java, the programmer is only responsible for creating and using objects, and the Java virtual machine takes over reclaiming them.

Creating objects is easier than destroying them. Still, the fact that the virtual machine reclaims objects for us doesn't mean Java programmers can skip understanding the full object life cycle. Only by understanding an object's journey from birth to death can we truly love and hate it.

Object creation

1. The process of creating objects

From the programmer's point of view, creating an object is mostly just the new keyword, but inside the virtual machine much more happens. It includes at least the following stages:

  • Load the object's class (the compiled .class file) into memory, if it has not been loaded already
  • Allocate a block of memory on the heap large enough to hold an instance of the class
  • Initialize the allocated memory to zero values
  • Set up the object header, which holds information such as a pointer to the class metadata, the object's hash code, and its GC generational age
  • Call the class's <init> method (the constructor) to initialize the object at the Java level

The flow chart is as follows:
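The zeroing stage and the <init> stage can be observed from plain Java code. The sketch below uses hypothetical class names (CreationStages, Base, Derived) chosen only for illustration.

It works by calling an overridden method from a superclass constructor. At that moment the object's memory has already been zeroed, but Derived's field initializer has not yet run, so the field still reads 0. The static initializer marks the point at which the class is initialized, which happens only once, the first time the class is used.

```java
public class CreationStages {
    // Records the order in which events happen
    static final StringBuilder log = new StringBuilder();

    static class Base {
        Base() {
            // Runs before Derived's field initializers, so the overriding
            // method sees Derived.value still at its zeroed default
            describe();
        }
        void describe() {}
    }

    static class Derived extends Base {
        static { log.append("class-init;"); }  // runs once, when Derived is first initialized
        int value = 42;                         // assigned in Derived's <init>, after super()

        @Override
        void describe() { log.append("value=").append(value).append(';'); }
    }

    static String demo() {
        Derived d = new Derived();  // class init (first time only), allocation, zeroing, <init>
        d.describe();               // after <init> completes, the field holds 42
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // first call prints: class-init;value=0;value=42;
    }
}
```

Calling overridable methods from a constructor is generally discouraged precisely because of this behavior. Here it is used only to make the zero value visible.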

2. How to allocate object memory

When Java creates an object, how does the virtual machine carve a chunk of memory out of the heap for it? Virtual machines mainly use two methods:

  • Bump-the-pointer method

Suppose the memory in the Java heap is perfectly regular: all used memory sits on one side, all free memory on the other, and a pointer marks the dividing point. Allocating memory then simply means moving that pointer toward the free side by a distance equal to the object's size. This allocation scheme is called "bump the pointer".
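As a rough illustration, the sketch below models a heap region as a range of byte offsets and allocates by bumping a single pointer. The class BumpAllocator is hypothetical.

A real VM has more to handle than this sketch does. It must make this step thread-safe; HotSpot, for example, gives each thread its own allocation buffer (TLAB). It must also trigger a GC when space runs out, rather than simply failing.

```java
public class BumpAllocator {
    private final int limit;  // end of the heap region (exclusive)
    private int top = 0;      // boundary between used memory and free memory

    BumpAllocator(int capacity) { this.limit = capacity; }

    // Returns the start offset of the new object, or -1 if the region is exhausted.
    int allocate(int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        if (size > limit - top) return -1;  // not enough free space left
        int start = top;
        top += size;                        // "bump" the pointer by the object's size
        return start;
    }

    public static void main(String[] args) {
        BumpAllocator heap = new BumpAllocator(64);
        System.out.println(heap.allocate(16)); // 0
        System.out.println(heap.allocate(24)); // 16
        System.out.println(heap.allocate(32)); // -1: only 24 bytes remain
    }
}
```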

  • Free-list method

If the memory in the Java heap is not regular, with used and free memory interleaved, bumping a pointer is impossible. The virtual machine must instead maintain a list recording which memory blocks are available. At allocation time it finds a block in the list large enough for the object instance, assigns it, and updates the list. This allocation scheme is called a "free list".
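A correspondingly simplified free-list allocator is sketched below. It assumes a first-fit policy, which takes the first block that is large enough; real allocators may use other policies such as best fit. FreeListAllocator and its gap-array constructor are hypothetical.

The last call shows the typical weakness of a fragmented heap: enough memory is free in total, but no single block is large enough.

```java
import java.util.ArrayList;
import java.util.List;

public class FreeListAllocator {
    // A free block covering offsets [start, start + size)
    static final class Block {
        int start, size;
        Block(int start, int size) { this.start = start; this.size = size; }
    }

    private final List<Block> freeList = new ArrayList<>();

    // Hypothetical starting state: each {start, size} pair is a free gap between live objects.
    FreeListAllocator(int[][] gaps) {
        for (int[] g : gaps) freeList.add(new Block(g[0], g[1]));
    }

    // First fit: carve the object out of the first free block that is large enough.
    int allocate(int size) {
        for (int i = 0; i < freeList.size(); i++) {
            Block b = freeList.get(i);
            if (b.size >= size) {
                int start = b.start;
                b.start += size;                      // shrink the block from the front
                b.size -= size;
                if (b.size == 0) freeList.remove(i);  // block fully used: drop it from the list
                return start;
            }
        }
        return -1;  // no single block is large enough
    }

    public static void main(String[] args) {
        // Free gaps: 8..16 (8 bytes), 40..72 (32 bytes), 96..112 (16 bytes)
        FreeListAllocator heap = new FreeListAllocator(new int[][] {{8, 8}, {40, 32}, {96, 16}});
        System.out.println(heap.allocate(16)); // 40: the 8-byte gap is too small
        System.out.println(heap.allocate(8));  // 8: fits the first gap exactly
        System.out.println(heap.allocate(24)); // -1: 32 bytes are free, but not contiguously
    }
}
```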

Both methods above deal only with allocating object memory; a virtual machine must also reclaim it. Memory allocation and deallocation are discussed at length in C/C++, but they matter less for understanding Java object creation.

It is enough to know that there are two ways to allocate object memory, each with its own advantages and disadvantages. Which one a VM uses depends on whether its garbage collector leaves the heap compacted.

As an aside, when it comes to designing memory allocation and reclamation, the project I would single out is Nginx, which handles memory resource management superbly. Interested readers can look at its source code.

Layout of objects

A Java object contains not only the instance data defined in its class but also some additional information the virtual machine needs, plus optional padding. Its memory layout therefore has three parts: the object header (Header), the instance data (Instance Data), and alignment padding (Padding), as shown below:

  1. The object header consists of the Mark Word and a type pointer. The Mark Word is 32 bits on a 32-bit VM and 64 bits on a 64-bit VM, and the type pointer has the same width (unless pointer compression is enabled on a 64-bit VM). The fields in the object header have the following features:
  • The Mark Word is a deliberately flexible data structure that stores as much information as possible in very little space, reusing its bits according to the object's state. See the following figure for how the bits are reused in each state

  • The type pointer points to the metadata of the class the object belongs to; this is how the virtual machine determines which class the object is an instance of. Not every virtual machine keeps this pointer in the object, however. That depends on how the VM implements object access, which is discussed in the next section.
  • In addition, if the object is an array, the object header must also record the array's length. The virtual machine can determine the size of an ordinary object from its class metadata, but it cannot determine an array's size without knowing its length.
  2. The instance data is the part of the object that the user can see and control. It includes the fields inherited from all parent classes (up to Object) plus the fields defined in the subclass itself. Note that static variables, local variables, and methods are not part of the instance data stored in the object's memory:

Static variables and methods belong to the class, and each class has only one copy of them. Static variables are stored in the method area when the class is loaded (in HotSpot since JDK 7, they actually live alongside the Class object on the heap). Methods also reside in the method area as class metadata, where they may be compiled just in time. Local variables are allocated dynamically on the stack while a method executes, mainly in the local variable table of its stack frame.

  3. Alignment padding does not always exist and has no special meaning; it simply acts as a placeholder. HotSpot VM's automatic memory management requires every object's starting address to be a multiple of 8 bytes. In other words, every object's size must be a multiple of 8 bytes. Without pointer compression, the object header is exactly one or two times 8 bytes, so when the instance data does not end on an 8-byte boundary, padding fills the gap.
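To make the Mark Word's state-dependent reuse more concrete, the sketch below decodes a 64-bit Mark Word value. It rests on several assumptions:

  • It assumes the classic 64-bit HotSpot encoding from the JDK 8 era. Biased locking has been disabled by default since JDK 15.
  • It decodes a number we supply rather than reading a real object's header. Inspecting real headers needs a tool such as OpenJDK's JOL.
  • MarkWordDecoder and its methods are hypothetical names.

```java
public class MarkWordDecoder {
    // Assumed layout, from the lowest bit upward:
    // lock:2 | biased_lock:1 | age:4 | unused:1 | hash:31 | unused:25
    static String lockState(long mark) {
        int lockBits = (int) (mark & 0b11);
        boolean biasedBit = (mark & 0b100) != 0;
        switch (lockBits) {
            case 0b01: return biasedBit ? "biased" : "unlocked";
            case 0b00: return "lightweight-locked"; // other bits: pointer to a lock record on the stack
            case 0b10: return "heavyweight-locked"; // other bits: pointer to the monitor
            default:   return "marked";             // 0b11: used by the garbage collector
        }
    }

    // Only meaningful in the unlocked state; in other states these bits hold something else.
    static int age(long mark)  { return (int) ((mark >>> 3) & 0xF); }
    static int hash(long mark) { return (int) ((mark >>> 8) & 0x7FFF_FFFFL); }

    public static void main(String[] args) {
        long mark = (0x1234L << 8) | (5L << 3) | 0b001;  // hash = 0x1234, age = 5, unlocked
        System.out.println(lockState(mark) + " hash=" + hash(mark) + " age=" + age(mark));
        // unlocked hash=4660 age=5
        System.out.println(lockState(0b101L)); // biased
        System.out.println(lockState(0b10L));  // heavyweight-locked
    }
}
```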

There is a widely read C++ book, Inside the C++ Object Model, devoted entirely to the C++ object model. Java objects are much simpler than C++ objects. In C++ you can obtain an object's size directly with sizeof, but Java offers no such direct way. If you want to learn more, see the article on how much memory a Java object occupies. In fact, once you know the composition of an object in memory (header, instance data, and padding) and the size of each field type in the instance data, you can easily calculate the object's overall size.
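Following that recipe, the sketch below estimates an object's shallow size as header + instance data + padding. ShallowSize is a hypothetical helper, and Map.of needs Java 9 or later.

The numbers are assumptions for a 64-bit HotSpot JVM with compressed oops and compressed class pointers, the usual default for heaps smaller than 32 GB: a 12-byte header, 4-byte references, and 8-byte alignment. The sketch also ignores any gaps the JVM may leave when it reorders fields, so treat the result as an estimate.

```java
import java.util.Map;

public class ShallowSize {
    // Assumed field sizes in bytes (64-bit HotSpot, compressed oops enabled)
    static final Map<String, Integer> FIELD_BYTES = Map.of(
        "boolean", 1, "byte", 1, "char", 2, "short", 2,
        "int", 4, "float", 4, "long", 8, "double", 8, "reference", 4);

    static final int HEADER_BYTES = 12;  // 8-byte Mark Word + 4-byte compressed type pointer
    static final int ALIGNMENT = 8;      // objects start on 8-byte boundaries

    // Rounds n up to the next multiple of ALIGNMENT: this is the padding step.
    static long alignUp(long n) {
        return (n + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Approximate shallow size = header + instance data + padding.
    static long estimate(String... fieldTypes) {
        long size = HEADER_BYTES;
        for (String t : fieldTypes) {
            Integer bytes = FIELD_BYTES.get(t);
            if (bytes == null) throw new IllegalArgumentException("unknown type: " + t);
            size += bytes;
        }
        return alignUp(size);
    }

    public static void main(String[] args) {
        System.out.println(estimate());                                // 16: 12-byte header + 4 padding
        System.out.println(estimate("int"));                           // 16: 12 + 4, no padding needed
        System.out.println(estimate("long", "boolean"));               // 24: 12 + 8 + 1 = 21, padded to 24
        System.out.println(estimate("reference", "reference", "int")); // 24: 12 + 4 + 4 + 4 = 24
    }
}
```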

Object access

Just as people in the natural world create things in order to use them, we create objects in the computer world mainly in order to use them. Java objects are stored on the heap, while references to them are mostly stored on the stack. How a reference locates and accesses the object in the heap is decided by the virtual machine implementation. The two mainstream approaches are handles and direct pointers:

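  • Handles: the Java heap sets aside a handle pool, and the reference stores the address of the object's handle. The handle in turn holds a pointer to the object's instance data (and one to its type data), so reaching the instance data takes two levels of addressing. The specific process is shown in the following figure: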

The advantage of this scheme is that the reference stores a stable handle address. When the object is moved in memory, which happens frequently during garbage collection, only the instance data pointer in the handle needs to change; the reference itself stays the same.

  • Direct pointers: the reference stores the address of the object instance itself, which is how most people picture it. In this case the object must carry the pointer to its type data itself, in the type pointer field of its object header. The specific access flow is shown in the following figure:

The advantage of this approach is speed: it saves the cost of one pointer lookup. Because objects are accessed very frequently in Java, that saving adds up to a significant reduction in execution cost. HotSpot uses direct pointers.
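The trade-off between the two schemes can be sketched with arrays standing in for the heap and the handle pool. Everything here (AccessSchemes, moveObject, the array sizes) is a hypothetical toy, not how a VM actually stores objects.

It only shows what moving an object costs in each scheme. With handles, one handle entry must be patched. With direct pointers, every reference to the object must be patched.

```java
public class AccessSchemes {
    // Toy heap: slot index -> object payload
    static final String[] heap = new String[8];
    // Toy handle pool: handle index -> the object's current heap slot
    static final int[] handles = new int[4];

    // Handle access: reference -> handle -> instance data (two indirections)
    static String viaHandle(int handleRef) {
        return heap[handles[handleRef]];
    }

    // Direct-pointer access: reference -> instance data (one indirection)
    static String viaDirect(int slotRef) {
        return heap[slotRef];
    }

    // Simulates the GC moving an object; only the handle entry is patched here.
    static void moveObject(int from, int to, int handle) {
        heap[to] = heap[from];
        heap[from] = null;
        handles[handle] = to;
    }

    public static void main(String[] args) {
        heap[2] = "obj";
        handles[0] = 2;
        int handleRef = 0;  // a reference in the handle scheme
        int directRef = 2;  // a reference in the direct-pointer scheme

        moveObject(2, 5, 0);

        System.out.println(viaHandle(handleRef)); // obj: the handle was updated
        System.out.println(viaDirect(directRef)); // null: this stale direct reference needs patching
        directRef = 5;                            // what the GC must do for every direct reference
        System.out.println(viaDirect(directRef)); // obj
    }
}
```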

So when you are asked what a reference stores, don't simply say it is the object's address: with handles, it is the address of the handle instead.

Object destruction

This is the most complex part of an object's life cycle, and the essence of Java. In C++, an object created with new must be released manually with delete once it is no longer needed. In Java, reclamation is performed by the virtual machine. Garbage collection (GC) of object memory must solve the following two problems:

When will it be recycled?

An object cannot be reclaimed while it is still in use. Only dead objects can be reclaimed. How do you determine if an object is no longer in use?

  1. Reference counting

  • Principle: attach a reference counter to each object. Every time a new reference to the object is created, the counter increases by 1; every time a reference becomes invalid, it decreases by 1. An object whose counter is 0 can no longer be used, like Object A in the figure above.
  • Advantages: simple in principle, easy to implement, and efficient.
  • Disadvantages: it cannot easily handle objects that reference each other in a cycle, unless circular references are forbidden.
  • In practice: Microsoft COM, the Python language, and others.
  2. Reachability analysis

  • Principle: the search starts from a set of objects called “GC Roots” and follows references downward; the paths it travels are called reference chains. An object may be connected to no GC Root by any reference chain; in graph-theory terms, it is unreachable from the GC Roots. Such an object is proven to be unusable, like Object5, Object6, and Object7 in the figure above.
  • Advantages: it handles circular references and meets the needs of all kinds of situations.
  • Disadvantages: more complex to implement, and traversing the object graph is relatively expensive.
  • In practice: the mainstream Java virtual machines.
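The difference between the two methods shows up on a reference cycle. In the hypothetical toy object graph below, with object names loosely following the figure, obj5 and obj6 reference only each other.

Reference counting still sees a count of 1 for each, so it would never free them. A breadth-first search from the GC Roots, by contrast, shows that they are unreachable and can be collected. LivenessDemo and its helpers are illustrative names, not a real API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LivenessDemo {
    // Toy object graph: object name -> names of the objects it references
    static final Map<String, List<String>> refs = new HashMap<>();

    static void link(String from, String to) {
        refs.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        refs.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Reference counting: each object's count is its number of incoming references.
    static Map<String, Integer> referenceCounts(List<String> roots) {
        Map<String, Integer> counts = new HashMap<>();
        for (String obj : refs.keySet()) counts.put(obj, 0);
        for (List<String> targets : refs.values()) {
            for (String t : targets) counts.merge(t, 1, Integer::sum);
        }
        for (String r : roots) counts.merge(r, 1, Integer::sum); // a root also counts as a reference
        return counts;
    }

    // Reachability analysis: breadth-first search starting from the GC Roots.
    static Set<String> reachable(List<String> roots) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.poll();
            if (seen.add(obj)) work.addAll(refs.getOrDefault(obj, List.of()));
        }
        return seen;
    }

    public static void main(String[] args) {
        refs.clear();
        List<String> roots = List.of("obj1");
        link("obj1", "obj2");  // obj1 (a root) -> obj2: both live
        link("obj5", "obj6");  // obj5 <-> obj6: a cycle that no root reaches
        link("obj6", "obj5");

        System.out.println(referenceCounts(roots)); // obj5 and obj6 each still have count 1
        System.out.println(reachable(roots));       // contains only obj1 and obj2
    }
}
```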

How to recycle

Once the objects to be reclaimed have been found, how to reclaim them is a technical problem in its own right. The choice affects both the reclamation performance of the virtual machine and the allocation of object memory. The collection algorithm is therefore very important; the main algorithms are:

  • Mark-sweep algorithm: divided into two phases, “Mark” and “Sweep”. First, all objects that need to be reclaimed are marked; once marking is complete, all marked objects are reclaimed. This leaves the free memory fragmented.
  • Copying algorithm: divides the available memory into two halves of equal capacity and uses only one half at a time. When that half is used up, the surviving objects are copied to the other half, and the used half is cleared all at once.
  • Mark-compact algorithm: the marking phase is the same as in mark-sweep. Instead of reclaiming the dead objects in place, however, it then moves all surviving objects toward one end and clears all memory beyond that end's boundary.
  • Generational collection algorithm: this algorithm brings no new idea; it simply divides memory into several regions according to object lifetimes. Typically the Java heap is divided into the young generation and the old generation, so that each generation can use the collection algorithm best suited to its characteristics.
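As a minimal sketch of the compact step of mark-compact, the code below assumes the mark phase has already produced a live flag for each slot of a toy heap. It then slides survivors toward the start and clears everything past the new boundary.

A real collector must also update every reference to the moved objects, which this sketch omits. CompactSketch is a hypothetical name. The returned boundary is exactly where a bump-the-pointer allocator would continue, which is why compacting collectors pair naturally with that allocation method.

```java
import java.util.Arrays;

public class CompactSketch {
    // Toy heap: each slot holds an object name or null (free).
    // 'live' is the output of the mark phase, assumed to be already computed.
    static int compact(String[] heap, boolean[] live) {
        int free = 0;  // next destination slot
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && live[i]) {
                heap[free++] = heap[i];  // slide the survivor toward the start
            }
        }
        Arrays.fill(heap, free, heap.length, null);  // clear everything past the boundary
        return free;  // new boundary between used and free memory
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", null, "D", "E"};
        boolean[] live = {true, false, true, false, false, true};
        int top = compact(heap, live);
        System.out.println(Arrays.toString(heap) + " top=" + top);
        // [A, C, E, null, null, null] top=3
    }
}
```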

Because these algorithms involve a lot of material, I only introduce the concepts here. Garbage collection is the most important topic in the Java virtual machine, so rather than cover it in a few words in this article, I plan to devote a separate article to it.

This may surprise readers, but in one of the live products at my previous company (implemented in C++), we almost never allowed new in the project, writing procedure-oriented code instead, for fear of accidentally leaking memory. It really felt like treading on thin ice, and I think it threw away much of the benefit of C++'s object orientation.

Conclusion

Chen Shuo, author of the C++ multithreaded network library muduo, wrote in his book “Linux Multithreaded Server Programming” that “creating objects is easy, but destroying them is hard”. Having looked at object life cycle management in Java, I finally appreciate how true that is. Thanks to the great inventors of Java, the easy part of object life cycle management was left to the user, and the hard part was taken on by the virtual machine.