Starting with an interview question

String s = new String("xyz"); How many instances are created?

This is a classic interview question, and the “standard answer” I saw in a so-called Java bible is as follows:

Two: one "xyz" in the heap, and one s on the stack that points to "xyz".

There is a lot to criticize in this so-called “standard answer”, which we will analyze later.

The answer is ridiculous, but I don’t think the question itself is meaningful either. It doesn’t define what “create” means, and it doesn’t say when the creation happens. At runtime? Does that include class loading? Is there any surrounding code as context? It doesn’t define what an “instance” is either. Is it any Java instance? Only String instances? Does it include the C++ instances inside the JVM?

Clearly, the question is itself a “problematic question”, and the answer is a “problematic answer”.

String structure

Before we analyze this, we need to have a general understanding of the String structure in Java in order to draw memory diagrams later:

As you can see from the figure above, the String class has three attributes:

  • value: the char array used to store the characters.

  • hash: the cached hash code of the String, which defaults to 0 (the hash is computed only when the hashCode() method is actually called).

  • serialVersionUID: used for serialization.

A well-formed question and a reasonable answer

Adding a qualifier to the question above gives us a new question:

String s = new String("xyz"); How many String instances are created?

A fairly good answer to this question can be found on the Internet:

Two. One is the instance corresponding to the string literal "xyz", which lives in the globally shared constant pool. The other is the instance created and initialized by new String(String), whose content (characters) is the same as "xyz". If the string already exists in the constant pool, only one String is created. In addition, there is a variable s on the stack that points to the String instance created by new.

It takes the stack, the heap, and the constant pool into account. I think this is the answer most interviewers are looking for, and maybe the point they want to test with this question.

But this answer is only reasonable, not completely correct.

First of all, I do not understand why so many answers say “constant pool” instead of “string constant pool”. The Java system actually has three constant pools, each with a different concept and purpose.

Second, even if “constant pool” is taken to mean the “string constant pool”, that pool stores references to String instances, not the String objects themselves, which is quite different. The answer also ignores the environment in which the code is executed.

These problems will be analyzed below.

Distinguish between variables and instances

Let’s go back to the opening question and the “standard answer” :

Question: String s = new String("xyz"); How many instances are created?
Answer: Two: one "xyz" in the heap, and one s on the stack that points to "xyz".

Obviously, the person who wrote this answer did not distinguish between variables and instances. In Java, a variable is just a variable: a variable of a reference type only holds a reference to an object instance (or null), and it is not the instance itself. The number of declared variables has no necessary relationship to the number of created instances.

Here’s an example:

String s1 = "xyz";  
String s2 = s1.concat("");  
String s3 = null;  
new String(s1);  

This code involves three String variables:

  • s1, which points to the first String instance listed below
  • s2, which points to the same instance as s1
  • s3, whose value is null and which does not point to any instance

And three String instances:

  • The interned String instance corresponding to the "xyz" literal
  • The interned String instance corresponding to the "" literal
  • A new String instance created by new String(String), with no variable pointing to it
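The claims about s2 and the anonymous new instance can be checked with reference comparisons. The following is a minimal sketch; the class and method names are made up for illustration. It relies on the documented behavior that concat returns the receiver itself when the argument is empty.

```java
class VariablesVsInstances {
    // s2 is a second variable, but concat("") returns the very same instance as s1.
    static boolean sameInstance() {
        String s1 = "xyz";
        String s2 = s1.concat("");
        return s1 == s2;
    }

    // new String(s1) always yields a distinct instance with equal content.
    static boolean freshInstance() {
        String s1 = "xyz";
        String copy = new String(s1);
        return copy != s1 && copy.equals(s1);
    }

    public static void main(String[] args) {
        System.out.println(sameInstance());  // true
        System.out.println(freshInstance()); // true
    }
}
```

Two variables can share one instance, and an instance can exist without any variable referring to it, so counting variables tells you nothing about counting instances.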

Class loading

Back to the question: String s = new String("xyz"); How many String instances are created?

It seems that all the answers on the web analyze the class-loading process and the actual execution together.

That seems fine at first glance, because a class must be loaded before a code snippet in it can run, and a given class loader loads a class at most once.

But let’s look at the bytecode of this code:

In the bytecode, new java/lang/String appears only once, i.e. only one String instance is created. That is, the code in the original question creates only one new String instance each time it executes. The ldc instruction here simply pushes a reference to the String instance ("xyz") that was already created during class loading onto the top of the operand stack; it does not create a new String instance.

Shouldn’t there be two instances? And when was the String instance created?

We all know that the resolution phase of class loading is the process in which the Java virtual machine replaces symbolic references in the constant pool with direct references. According to the JVM specification, a conforming JVM implementation should create and intern a String instance corresponding to the "xyz" literal during class loading, specifically in the resolution phase. This constant is shared globally, and a new String instance needs to be created only if no String with the same content has been interned before.

So you can say that one String instance has already been created during the resolution phase of class loading, and that executing the code creates one more new String instance. Of course, there is nothing wrong with counting the two together either.
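The split between the interned literal and the instance created at execution time can be observed from Java code. This is a small sketch with made-up names: every execution of new String("xyz") yields a fresh instance, while the literal always resolves to the same interned one.

```java
class LiteralVsNew {
    // Each call executes `new String("xyz")` once.
    static String create() {
        return new String("xyz");
    }

    public static void main(String[] args) {
        String a = create();
        String b = create();
        System.out.println(a == b);              // false: one new instance per execution
        System.out.println(a == "xyz");          // false: the new instance is not the interned one
        System.out.println(a.intern() == "xyz"); // true: intern() returns the interned literal instance
        System.out.println("xyz" == "xyz");      // true: the literal is created once and shared
    }
}
```

Calling create() twice produces two distinct new instances but still only one interned "xyz", which is why the answer depends on whether the one-time work done for the literal is counted.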

JVM optimizations

The discussion above only covers the Java language and the Java Virtual Machine as defined by their specifications. Conceptually it holds, but an actual JVM implementation can optimize further, and it is possible that the code snippet in the original question does not create a complete String instance at all (no space is allocated) when it actually executes.

Declaring a “standard answer” without looking at the surrounding code is simply unreasonable.

Let’s take a look at this code:

Running this code keeps creating strings, which consumes memory and triggers frequent GC.

We add the options -XX:+PrintGC -XX:-DoEscapeAnalysis to print GC logs and turn off escape analysis.

Let’s run it:

As expected, the continuous creation of strings consumes memory and leads to frequent GC.

Now let’s change -XX:-DoEscapeAnalysis to -XX:+DoEscapeAnalysis and run the code again:

Something magical happens: as the program keeps running, no more GC logs are printed. Doesn’t creating strings consume memory?

The reality is that, after being optimized by HotSpot VM, the newString() method no longer creates new String instances. Naturally, it no longer consumes memory and no longer triggers GC.
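For readers who want to try the experiment, here is a minimal sketch of the kind of loop described above. It is not the original code: the class name, the run method, and the iteration count are assumptions, and only the idea of a newString() method whose String never escapes comes from the text.

```java
class EscapeDemo {
    // The String created here is never stored or returned, so it does not escape the method.
    static int newString() {
        String s = new String("xyz");
        return s.length();
    }

    // Hypothetical driver: calls newString() repeatedly so the JIT compiles it.
    static long run(int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += newString();
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed iteration count, large enough to trigger JIT compilation.
        System.out.println(run(500_000_000));
    }
}
```

On a JDK 8 HotSpot VM it can be run once with -XX:+PrintGC -XX:-DoEscapeAnalysis and once with -XX:+PrintGC -XX:+DoEscapeAnalysis to compare the GC logs. Whether the GC lines disappear depends on the JVM version and on the JIT actually compiling and inlining the method.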

Now look at the opening question again: can you still simply say that String s = new String("xyz"); creates two String instances?

Escape analysis is just one example. HotSpot VM also has many other optimizations, such as method inlining, scalar replacement, and dead code elimination.

Klass-OOP

If the question is not limited to Java instances, the oop instances inside the JVM cannot be ignored.

To explain this clearly, I need to add some background on the Klass-OOP model. First, a convention: whenever the discussion touches on JVM implementation details, it is based on the JDK 8 HotSpot VM.

HotSpot VM is written in C++, and C++ is an object-oriented language, so the simplest way to represent Java objects would be to generate a C++ class for each Java class. HotSpot VM does not do this. Instead, it uses a Klass-OOP model.

Klass is how the meta-information of a Java class exists in the JVM. When a Java class is loaded by a JVM class loader, it exists in the JVM as a klass.

Oop is how a Java object exists in the JVM. Each time a new object is created, an oop object of the corresponding type is created inside the JVM.

instanceOopDesc represents a non-array object, and arrayOopDesc represents an array object.

Among arrays, objArrayOopDesc represents an array of references, while typeArrayOopDesc represents an array of primitives.

For example: an instance of the String class in Java will have a corresponding instanceOopDesc instance in the JVM.

String constant pool

In the Java architecture, there are three types of constant pools:

  • Constant pool in the class file: exists on disk as part of the class bytecode. It holds two main kinds of constants: literals and symbolic references.

  • Runtime constant pool: part of the method area. When people say “constant pool”, they usually mean this one: the runtime constant pool in the method area.

  • String constant pool: exists in the heap. At the JVM level this pool is a StringTable, which stores only references to java.lang.String instances, not the contents of the String objects. When we say that a string is in the string constant pool, we mean that the StringTable holds a reference to it; if it is not in the pool, the StringTable holds no reference to it.
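That the string constant pool holds references rather than copies can be observed through String.intern(). This is a minimal sketch, assuming JDK 7 or later (where interned strings stay in the heap) and assuming the run-time-built content has never been interned before; the class and method names are made up.

```java
class InternReference {
    static boolean poolKeepsSameReference() {
        // Built at run time, so no equal literal should already be in the pool (assumption).
        String built = new StringBuilder("pool-demo-").append(System.nanoTime()).toString();
        // intern() registers a reference to `built` itself and returns that same reference.
        String canonical = built.intern();
        // A later copy with equal content resolves to the reference already in the pool.
        String copy = new String(built);
        return canonical == built && copy != built && copy.intern() == built;
    }

    public static void main(String[] args) {
        System.out.println(poolKeepsSameReference()); // true on JDK 7+
    }
}
```

Because intern() returns the original object instead of a duplicate, the pool entry must be a reference to that heap instance.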

Today, we are going to look at the string constant pool.

The string constant pool is also called the String Pool. Its counterpart in the JVM is the StringTable class, whose underlying implementation is a Hashtable, so it is based on hashing.

The following code is the method that adds a string to the string constant pool. It is C++ code, but I believe anyone who has studied Java can follow it, or at least understand what it does. A hash value computed from the String content and length is used to locate the bucket index, and then the instanceOopDesc corresponding to the Java String instance is wrapped in a HashtableEntry, which is the storage structure kept in the constant pool.
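The lookup-then-insert idea described here can be sketched in Java. This toy model is not HotSpot's StringTable code: the class name, the fixed bucket count, and the use of String.hashCode() are assumptions made purely for illustration.

```java
class ToyStringTable {
    // One chained entry per interned reference (loosely analogous to HashtableEntry).
    private static final class Entry {
        final int hash;
        final String ref; // a reference to the String, not a copy of its characters
        final Entry next;

        Entry(int hash, String ref, Entry next) {
            this.hash = hash;
            this.ref = ref;
            this.next = next;
        }
    }

    private final Entry[] buckets = new Entry[16];

    String intern(String s) {
        int hash = s.hashCode();                          // assumption: HotSpot uses its own hash function
        int index = (hash & 0x7fffffff) % buckets.length; // locate the bucket from the hash
        for (Entry e = buckets[index]; e != null; e = e.next) {
            if (e.hash == hash && e.ref.equals(s)) {
                return e.ref;                             // equal content already registered
            }
        }
        buckets[index] = new Entry(hash, s, buckets[index]); // wrap the reference in a new entry
        return s;
    }

    public static void main(String[] args) {
        ToyStringTable table = new ToyStringTable();
        String first = new String("xyz");
        String second = new String("xyz");
        System.out.println(table.intern(first) == first);  // true: first reference is stored
        System.out.println(table.intern(second) == first); // true: equal content maps to the stored reference
    }
}
```

The table never copies characters. It only stores and hands back references, which matches the description of the string constant pool above.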

With this background on the string constant pool in place, let’s return to the question at the beginning of this article:

String s = new String("xyz"); How many instances are created?

Let’s draw a memory diagram, which omits the two instanceOopDesc instances that correspond to the Strings.

The answer is not hard to come by:

If you include the C++ instances in the JVM, there are two Java String instances, two instanceOopDesc instances corresponding to those Strings, and one typeArrayOopDesc corresponding to the char[] array. That adds up to five: two String instances plus three oop instances.

Conclusion

String s = new String("xyz"); How many instances are created?

From the analysis above, we can see that every qualifier added to the question changes the answer.

Whether class loading is considered, whether JVM optimization is considered, whether corresponding OOP instances are included, etc., each point is worth talking about.

Next time someone asks you, you might as well share this article with them.

Closing notes

For this article, I read many blog posts by @RednaxelaFX and Zhou Zhiming and learned a lot in the process. Thank you for your contributions to popularizing JVM knowledge and advancing JVM development in China!

Here is a fun story. While searching for material on using HSDB to understand String, I came across a very well-written article and marveled that there are so many low-key experts in China. Then I followed the official account listed next to the article and discovered that the expert was Li Jiapeng (“cold spring”), the founder of PerfMa. My apologies for not recognizing him!

Finally

My knowledge is limited, so this article will inevitably contain mistakes. If you find any, please point them out and I will correct them.

Thank you for reading. Your likes and comments are all encouragement and support for me.

If there is anything you would like to discuss with me, you can follow my WeChat official account “CoderW”. Thank you for your attention!


Code for this article: github.com/xiaoyingzhi…

JVM Specification, Java SE 8 Edition: docs.oracle.com/javase/spec…

Reference article: isfeasible.cn/posts/view/…

Reference article: www.iteye.com/blog/rednax…

Reference article: lovestblog.cn/blog/2014/0…