From: http://www.hollischuang.com/archives/2517

After the primitive types, String is the most important data type in Java. Many people think it is simple, yet it raises plenty of tricky questions. Here are two typical interview questions; see whether you can answer both correctly:

Q1: How many objects does String s = new String("Hollis"); create?

Q2: What is your understanding of String's intern method?

Both questions come up often in interviews, and most people know the standard answers:

A1: If "Hollis" already exists in the constant pool, it is referenced directly, so only one object is created. If "Hollis" does not exist in the constant pool, it is created there first and then referenced, so two objects are created.

A2: When a String instance str calls intern(), Java checks whether the constant pool already contains a string constant with the same Unicode content. If it does, intern() returns a reference to that constant; if not, it adds a string equal to str to the constant pool and returns a reference to it.

Nothing seems wrong with these two answers, but on closer inspection something does not add up. According to them, new String also checks the constant pool: it references the string if it is already there and creates it in the pool if it is not. So what is intern for? Is the following code pointless?

String s = new String("Hollis").intern();

And if every string created with new first checked the string pool and returned the pooled string, shouldn't both lines of the following code print true?

    String s1 = "Hollis";
    String s2 = new String("Hollis");
    String s3 = new String("Hollis").intern();
    System.out.println(s1 == s2);
    System.out.println(s1 == s3);

However, the actual output (based on jdk1.8.0_73) is:

false
true

Are you a little confused after reading this code? What is going on?

Don't worry, let's go through it step by step.

Literals and run-time constant pools

The JVM optimizes how string constants are instantiated in order to improve performance and reduce memory overhead. To reduce the number of strings created in the JVM, the String class maintains a string constant pool.
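The effect of the pool is easy to observe. The following is a minimal sketch, not code from the original article; the class name LiteralPoolDemo and the method sameLiteralObject are made up for illustration. Two identical literals resolve to one pooled object, so comparing them by reference gives true.

```java
// Illustrative sketch: the class and method names are hypothetical.
class LiteralPoolDemo {

    // Two identical literals resolve to the same pooled String object.
    static boolean sameLiteralObject() {
        String a = "Hollis";
        String b = "Hollis";
        return a == b; // reference comparison, not content comparison
    }

    public static void main(String[] args) {
        System.out.println(sameLiteralObject()); // true
    }
}
```

Only one String object backs both variables here, which is exactly the saving the pool is meant to provide.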

In the method area of the JVM's runtime data areas there is a region called the runtime constant pool, which stores the literals and symbolic references generated at compile time.

If you are familiar with the Class file structure or have decompiled Java code, you probably know that a Class file produced by javac contains a section called the constant pool. For example:

public static void main(String[] args) {
    String s = "Hollis";
}

After compilation, the constant pool contents are as follows:

 Constant pool:
    #1 = Methodref #4.#20 // java/lang/Object."<init>":()V
    #2 = String #21 // Hollis
    #3 = Class #22 // StringDemo
    #4 = Class #23 // java/lang/Object
   ...
   #16 = Utf8 s
   ...
   #21 = Utf8 Hollis
   #22 = Utf8 StringDemo
   #23 = Utf8 java/lang/Object

The more important entries in the Class file's constant pool above are:

   #16 = Utf8 s
   #21 = Utf8 Hollis
   #22 = Utf8 StringDemo

Of these constants, s is the symbolic reference mentioned earlier and Hollis is the literal. At run time, the contents of the Class file's constant pool are loaded into the runtime constant pool. For details about literals, refer to the Java SE specifications.

How many objects new String creates

Now let's look at which objects String s = new String("Hollis"); creates.

At compile time, the symbolic reference s and the literal Hollis are added to the constant pool of the Class file. When the class is loaded, these constants enter the runtime constant pool, and the literal Hollis is added to the string constant pool.

However, this step does not blindly add every constant defined in the class. It first performs a comparison: if the string to be added already exists in the string constant pool, nothing is added again.

So the statement that 'if "Hollis" already exists in the constant pool, it is referenced directly and only one object is created' is really describing how the string literal is created in the string constant pool.

That covers compile time and class loading. At run time, when new String("Hollis"); executes, a string object is created on the Java heap, while the corresponding string literal is stored in the string constant pool. The variable s in String s = new String("Hollis"); lives on the Java virtual machine stack and holds a reference to the string object just created on the heap.

This explains why the following code prints false.

String s1 = new String("Hollis");
String s2 = new String("Hollis");
System.out.println(s1 == s2);

This is because == compares the addresses of the two objects that s1 and s2 refer to on the heap, which of course differ. If you use equals instead, the contents are compared and the result is true.
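The difference between the two comparisons can be checked with a small sketch like the one below. The class and method names (ReferenceVsContentDemo, sameReference, sameContent) are assumptions introduced only for this illustration.

```java
// Illustrative sketch: the class and method names are hypothetical.
class ReferenceVsContentDemo {

    // Each `new` allocates a separate heap object, so the references differ.
    static boolean sameReference() {
        String s1 = new String("Hollis");
        String s2 = new String("Hollis");
        return s1 == s2;
    }

    // equals compares the character content, which is identical.
    static boolean sameContent() {
        String s1 = new String("Hollis");
        String s2 = new String("Hollis");
        return s1.equals(s2);
    }

    public static void main(String[] args) {
        System.out.println(sameReference()); // false
        System.out.println(sameContent());   // true
    }
}
```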

The relationship between the Java heap and the string constant pool differs between JDK versions (since JDK 7, for example, the string pool is located in the heap); they are treated here as two separate regions for convenience. For details, see the Java Virtual Machine specification.

So the answer to how many objects String s = new String("Hollis"); creates is now clear.

The “object” in the constant pool is determined at compile time and created when the class is loaded. If the string constant is already in the constant pool when the class is loaded, this step is omitted. Objects in the heap are determined at runtime, created when the code executes to new.

Dynamic extension of constant pools at run time

Literals and symbolic references produced at compile time are an important part of the runtime constant pool, but they are not all of it. Constants can also be added to the runtime constant pool while the program is running, and that is what String's intern method does.

When a String instance calls intern(), Java checks whether the constant pool already contains a string constant with the same Unicode content. If it does, a reference to that constant is returned; if not, a string equal to the instance is added to the constant pool and a reference to it is returned.

intern() therefore does two things: it puts the string into the constant pool if the pool does not already contain it, and it returns a reference to the pooled constant.
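Both functions can be seen in a short sketch. This is an illustration under assumptions, not code from the original article: the class InternDemo and the helper buildAtRuntime are hypothetical names, and the string is assembled from a char array only to guarantee that it starts out as a fresh heap object.

```java
// Illustrative sketch: the class and method names are hypothetical.
class InternDemo {

    // Builds "Hollis" at run time, so the result is a fresh heap object
    // rather than the pooled literal.
    static String buildAtRuntime() {
        char[] chars = {'H', 'o', 'l', 'l', 'i', 's'};
        return new String(chars);
    }

    public static void main(String[] args) {
        String literal = "Hollis";
        String built = buildAtRuntime();

        System.out.println(built == literal);          // false: heap object vs pooled literal
        System.out.println(built.intern() == literal); // true: intern returns the pooled reference
    }
}
```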

Let’s take a look at that confusing example at the beginning:

    String s1 = "Hollis";
    String s2 = new String("Hollis");
    String s3 = new String("Hollis").intern();
    System.out.println(s1 == s2);
    System.out.println(s1 == s3);

Roughly speaking, String s1 = "Hollis"; and String s3 = new String("Hollis").intern(); do the same thing: both define a string, store its literal in the constant pool, and assign a reference to that literal to the declared variable.

Consider String s3 = new String("Hollis").intern(); in more detail. Without the call to intern, s3 would refer to the object that the JVM created on the heap, just as s2 does. Once intern is executed, s3 refers to the string constant in the string constant pool instead.

Since both s1 and s3 refer to the literal in the string constant pool, s1 == s3. s2, however, refers to an object on the heap, so s2 != s1.

The correct use of intern

So is intern actually redundant in String s3 = new String("Hollis").intern(); ?

Even without intern, the Hollis literal is stored in the Class file's constant pool and loaded into the runtime constant pool, so why bother? In what situations should intern be used?

Before explaining this, let’s take a look at the following code:

    String s1 = "Hollis";
    String s2 = "Chuang";
    String s3 = s1 + s2;
    String s4 = "Hollis" + "Chuang";

After decompiling, the code is as follows:

    String s1 = "Hollis";
    String s2 = "Chuang";
    String s3 = (new StringBuilder()).append(s1).append(s2).toString();
    String s4 = "HollisChuang";

As you can see, the two concatenations are implemented differently after compilation. s3 is turned into StringBuilder and append calls, while s4 is folded directly into a new string constant.

If you are interested, you can also verify that after String s3 = s1 + s2; is compiled, the constant pool contains the two string constants Hollis and Chuang (which actually come from String s1 = "Hollis"; and String s2 = "Chuang";), while the concatenation result HollisChuang is not in the constant pool.

If the code contains only String s4 = "Hollis" + "Chuang"; then the constant pool will contain only HollisChuang, not Hollis or Chuang.

The reason is that the constant pool holds literals whose values are already determined. When a concatenation consists purely of literals, the compiler evaluates it and stores the result in the constant pool as a single string constant.

If any operand of the concatenation is a variable rather than a literal, the whole concatenation is compiled into StringBuilder.append calls (as seen with the JDK 8 compiler used here), because the compiler cannot know the resulting value. It can only be determined at run time.
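The two cases can be told apart at run time by comparing references against the literal "HollisChuang". The sketch below is only an illustration; ConcatDemo and its method names are hypothetical, and the third method is an added assumption showing that interning the run-time result leads back to the pooled constant.

```java
// Illustrative sketch: the class and method names are hypothetical.
class ConcatDemo {

    // Literal + literal is folded by the compiler into one pooled constant.
    static boolean literalConcatIsPooled() {
        String s4 = "Hollis" + "Chuang";
        return s4 == "HollisChuang";
    }

    // Concatenating non-final variables happens at run time and produces
    // a new heap object that is not the pooled constant.
    static boolean variableConcatIsPooled() {
        String s1 = "Hollis";
        String s2 = "Chuang";
        String s3 = s1 + s2;
        return s3 == "HollisChuang";
    }

    // Interning the run-time result yields the pooled constant again.
    static boolean internedConcatIsPooled() {
        String s1 = "Hollis";
        String s2 = "Chuang";
        return (s1 + s2).intern() == "HollisChuang";
    }

    public static void main(String[] args) {
        System.out.println(literalConcatIsPooled());  // true
        System.out.println(variableConcatIsPooled()); // false
        System.out.println(internedConcatIsPooled()); // true
    }
}
```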

This is where intern becomes useful. Programs often obtain strings whose values are known only at run time, not at compile time, so they cannot be added to the constant pool at compile time.

In that case, call intern on strings that are likely to be used frequently. Each time the JVM executes that code, intern returns a reference to the pooled string instead of keeping yet another copy, which reduces the number of string objects that stay alive.

Here is an example that uses String#intern:

import java.util.Random;

// The class name InternTest is illustrative; the original snippet shows only the members.
public class InternTest {
    static final int MAX = 1000 * 10000;
    static final String[] arr = new String[MAX];

    public static void main(String[] args) throws Exception {
        // Ten random integers stand in for values loaded at run time.
        Integer[] DB_DATA = new Integer[10];
        Random random = new Random(10 * 10000);
        for (int i = 0; i < DB_DATA.length; i++) {
            DB_DATA[i] = random.nextInt();
        }
        long t = System.currentTimeMillis();
        for (int i = 0; i < MAX; i++) {
            // intern() makes every element refer to one of only ten pooled strings.
            arr[i] = new String(String.valueOf(DB_DATA[i % DB_DATA.length])).intern();
        }

        System.out.println((System.currentTimeMillis() - t) + "ms");
        System.gc();
    }
}

In the code above we know that the same strings are repeated many times, but their values are determined only at run time. The only way to get them into the constant pool is to call intern explicitly, which reduces the number of duplicate strings kept in memory.
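To make the saving visible without timing anything, one can count how many distinct String objects end up being retained. The following is a rough sketch under assumptions: InternDedupDemo, countDistinctObjects and the "key-" prefix are all hypothetical, and an identity-based set is used so that objects are distinguished by reference rather than by content.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Illustrative sketch: the class name, method name and "key-" prefix are hypothetical.
class InternDedupDemo {

    // Counts how many distinct String objects (by identity, not content)
    // are retained when `total` strings are built from `distinct` values.
    static int countDistinctObjects(int total, int distinct, boolean useIntern) {
        Set<String> identities =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (int i = 0; i < total; i++) {
            // Built at run time, so every iteration yields a new heap object.
            String value = new String("key-" + (i % distinct));
            identities.add(useIntern ? value.intern() : value);
        }
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinctObjects(1000, 10, false)); // 1000
        System.out.println(countDistinctObjects(1000, 10, true));  // 10
    }
}
```

With intern, only as many objects are retained as there are distinct values; the temporary copies become garbage as soon as intern returns the pooled reference.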

Conclusion

Let's return to the question from the beginning. If new String also checks the constant pool, referencing the string when it exists and creating it in the pool when it does not, then what is intern for? Is the following code pointless?

String s = new String("Hollis").intern();

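When the description of intern says that an existing constant's reference is returned, it means that the reference to the pooled literal object is handed directly to the variable being defined. This step does not create another String object on the Java heap.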

Indeed, calling intern in the code above is pointless, because the literal Hollis is already loaded into the runtime constant pool as a compile-time constant.

The confusion above comes from an incomplete understanding of the string constant pool and of literals. Some questions are like this: asked individually, you know each answer, but combined they become confusing. In the end, the knowledge is understood as isolated points that have not yet been connected into a whole.

Discussion of this article is welcome. If anything is inaccurate, please point it out. The examples were written deliberately to illustrate the points, so please bear with anything that seems contrived.


By Hollis_ Public account Hollis


Link: https://juejin.cn/post/6844903625114796046


Source: Juejin


Copyright belongs to the author. Commercial reprint please contact the author for authorization, non-commercial reprint please indicate the source.