This is the seventh day of my participation in the August More Text Challenge.

Preface

I ran into a string comparison problem of the kind you meet when you first learn Java. Usually the rule "== compares addresses and equals compares values" gives you the answer. String, however, is special: two strings with the same value created by new String(String) have different addresses, while two strings with the same value created from literals have the same address.

The code is as follows:

    // The first way to create a string: assign a literal
    String str1 = "abc";
    String str2 = "abc";
    // The second way to create a string: call the constructor
    String str3 = new String("xyz");
    String str4 = new String("xyz");
    System.out.println(str1 == str2); // true
    System.out.println(str3 == str4); // false

The two pairs of equal strings give different results because the constant pool and the heap are involved differently in each way of creating a string.

In the first way, the literal "abc" is placed in the constant pool, and str1 and str2 both point to that "abc", so they have the same address. In the second way, "xyz" is first placed in the constant pool, and the constructor then copies that "xyz" into the heap to produce a new String. Each new object is unrelated to the "xyz" in the constant pool, so the two variables refer to two different objects in the heap and therefore have different addresses.

What about intern()? How is it connected to the constant pool?
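As a minimal sketch of the same point, the class below compares both pairs with == and with equals(). The class name LiteralVsNew and the helper sameObject are made up for illustration and are not part of the original example.

```java
public class LiteralVsNew {
    // Hypothetical helper: true only when both references point at the same object.
    public static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String lit1 = "abc";
        String lit2 = "abc";
        String obj1 = new String("xyz");
        String obj2 = new String("xyz");

        System.out.println(sameObject(lit1, lit2)); // true: one pooled object
        System.out.println(lit1.equals(lit2));      // true: same content
        System.out.println(sameObject(obj1, obj2)); // false: two heap objects
        System.out.println(obj1.equals(obj2));      // true: same content
    }
}
```

In both pairs equals() reports equal content; only the identity check tells the literal pair apart from the constructor pair.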

Constant pool

A constant pool is where literals, symbolic references, and direct references are stored. There are two kinds: the class constant pool and the runtime constant pool.

Class constant pool

The class constant pool holds the literals and symbolic references that the compiler records for a class. The string "abc" above is a literal. Symbolic references are the fully qualified names of classes and interfaces, the names and descriptors of fields, and the names and descriptors of methods.

As shown in the figure, the constant pool contains the name and descriptor (that is, the symbolic reference) of the new String(String) constructor.

Runtime constant pool

When we talk about the constant pool, we usually mean the runtime constant pool. When a class is loaded, its class constant pool is read into memory as the runtime constant pool, and during the resolution phase each symbolic reference is resolved into a direct reference that points to the corresponding method or class in memory, based on its description. Before JDK 1.7 pooled strings lived in the method area; since then they live in the heap. The runtime constant pool is dynamic: new constants can be added to it at run time.

intern()

Look at the description of the intern() method: when intern() is called on a string, if the pool already contains a string equal to it, the string from the pool is returned. Otherwise, the string is added to the pool and a reference to it is returned. The string pool is maintained privately by the String class.

This brings in another concept: the string pool.

String pool

The string pool holds references to string objects, not the string objects themselves. A string created in the first way, by literal assignment, is placed in the constant pool, and a reference to that object is stored in the string pool. When another literal is created, the string pool is checked first for a reference to an equal string. If there is one, the new variable points directly to the corresponding object.
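The contract described above can be sketched as follows: two equal strings built at run time are different heap objects, yet intern() hands back one shared reference for both. The class name InternContract, the helper build, and the sample value "key-42" are assumptions made for this illustration.

```java
public class InternContract {
    // Hypothetical helper: builds the string at run time, so the result is
    // a fresh heap object rather than a compile-time literal.
    public static String build(String prefix, int n) {
        return new StringBuilder(prefix).append(n).toString();
    }

    public static void main(String[] args) {
        String s = build("key-", 42);
        String t = build("key-", 42);

        System.out.println(s == t);                   // false: two heap objects
        System.out.println(s.equals(t));              // true: equal content
        System.out.println(s.intern() == t.intern()); // true: one pooled reference
    }
}
```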

In the second way, the string pool is first searched for a string equal to the constructor argument to decide whether a new string has to be created in the constant pool. The constructor then copies the constant pool string to create a new string in the heap. As shown in the constructor source, the parameter named original is the string in the constant pool, and it is copied to produce the new string in the heap. The comments also mention that unless you need an explicit copy of the string, you don't need to use this constructor, because strings are immutable.

Here we test the string pool using intern():
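A small sketch of the copy behaviour described above, assuming nothing beyond the standard String API. The class name CopyConstructorDemo and the helper copyOf are hypothetical names introduced for the example.

```java
public class CopyConstructorDemo {
    // Hypothetical helper wrapping the copy constructor discussed above.
    public static String copyOf(String original) {
        return new String(original);
    }

    public static void main(String[] args) {
        String original = "original";   // literal, referenced from the string pool
        String copy = copyOf(original); // separate object on the heap

        System.out.println(copy == original);          // false: different objects
        System.out.println(copy.equals(original));     // true: same characters
        System.out.println(copy.intern() == original); // true: the pool still refers to the literal
    }
}
```

Because the copy carries the same characters and can never change, it adds nothing over the literal except a second object, which is why the constructor is rarely needed.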

    public static void main(String[] args) {
        // Part 1
        String str1 = "abc";
        String str2 = new String("abc");
        System.out.println(str1.intern() == str1); // true
        System.out.println(str1.intern() == str2); // false
        System.out.println(str1.intern() == str2.intern()); // true

        // Part 2: does a string built from char[] enter the string pool when it is created?
        {
            String str3 = new String(new char[] {'g', 'h'});
            String str4 = "gh";
            System.out.println(str3.intern() == str3); // false
            System.out.println(str3.intern() == str4); // true
        }

        // Part 3: does a string built from char[] enter the string pool after calling intern()?
        // Run this part on its own (comment out Part 2); otherwise "gh" is already pooled.
        {
            String str3 = new String(new char[] {'g', 'h'});
            str3.intern();
            String str4 = "gh";
            System.out.println(str3.intern() == str3); // true (JDK 7 and later)
            System.out.println(str3.intern() == str4); // true
        }
    }

The three parts above are independent tests and should be run separately.

Part 1: str1 creates abc in the constant pool and puts its reference into the string pool. str2 copies abc from the constant pool and creates a new string in the heap. intern() returns from the string pool the reference to str1's abc in the constant pool.

Part 2: str3 creates its string in the heap from a char[], not in the constant pool, so the reference to gh is not automatically put into the string pool. str4 creates gh in the constant pool, so the string pool holds a reference to str4's gh. intern() returns from the string pool the reference to str4's gh in the constant pool.

Part 3: str3 creates its string in the heap from a char[], not in the constant pool, so the reference to gh is not automatically put into the string pool. Calling intern() then adds the reference to str3's gh to the string pool manually. When str4 is created by literal assignment, the string pool is searched for gh and the reference to str3's gh is found, so str4 refers to the same object as str3. What intern() returns from the string pool is the reference to str3's gh in the heap.

It also follows from the code above that, when the string pool has no equal string, intern() puts the reference to the string created in the heap into the string pool.

This also explains why strings are immutable: immutability ensures that multiple references can safely point to the same object in the string pool at the same time. If strings were mutable and an operation through one reference changed the object's value, every other reference would be affected, which is obviously not acceptable.
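To illustrate the sharing argument, here is a hedged sketch in which two variables refer to the same pooled literal and one of them is "modified". The class name SharedLiteralDemo and the helper append are made up for the example.

```java
public class SharedLiteralDemo {
    // Hypothetical helper: concat() never changes the receiver; it returns
    // a new String when the argument is non-empty.
    public static String append(String base, String suffix) {
        return base.concat(suffix);
    }

    public static void main(String[] args) {
        String a = "abc";
        String b = "abc";                 // same pooled object as a

        String longer = append(a, "def"); // a new object; a is left untouched

        System.out.println(a == b);       // true
        System.out.println(b);            // abc
        System.out.println(longer);       // abcdef
    }
}
```

Since no operation on a can alter the shared object, b keeps seeing abc no matter what is done through a.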

Get to the point

Back to the question on Zhihu. The literal "string" is created in the constant pool and its reference is put in the string pool. str1.intern() returns the reference to that object in the constant pool, while str1 itself points to an object in the heap, so the output is false.

toString() in StringBuilder creates a string from a char[]. After abcdef is created in the heap, str2 calls intern(), which puts the heap reference into the string pool and returns it. That reference is the same string object that str2 points to in the heap, so the output is true.

Conclusion

Small problems in Java often branch out into a lot of knowledge. Whether you are studying low-level details or JVM theory, combining the theory with practical cases leads to a deeper understanding. The constant pool discussed in this article, for example, belongs to the study of the class file structure and class loading.
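The code from the question is not reproduced here, so the following is only a sketch of the two situations described above, not the original snippet. It assumes JDK 7 or later on a HotSpot-style JVM. The class name BuilderInternDemo, the helper internReturnsSelf, and the nanoTime suffix are all made up for illustration; the suffix is there only to make it very unlikely that an equal string is already in the pool.

```java
public class BuilderInternDemo {
    // Hypothetical helper: true when intern() returned the very object it was
    // called on, meaning the pool had no equal string and adopted this one.
    public static boolean internReturnsSelf(String s) {
        return s.intern() == s;
    }

    public static void main(String[] args) {
        // Built at run time, so no equal literal exists in any class.
        String fresh = new StringBuilder("abcdef-").append(System.nanoTime()).toString();
        System.out.println(internReturnsSelf(fresh)); // expected true on JDK 7+ (assumption)

        // "string" is a literal in this class, so the pool already refers
        // to a different object than this heap copy.
        String copy = new String("string");
        System.out.println(internReturnsSelf(copy)); // false
    }
}
```

The first result depends on where the JVM keeps pooled strings, which is why it is stated as an assumption; the second follows directly from the literal being pooled before the copy is made.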