Java has eight primitive types and one special type, String. To make these types faster and more memory efficient at run time, Java provides the concept of a constant pool for them. A constant pool is like a cache provided at the Java system level.

The constant pools of the eight primitive types are managed by the system; String is a special case. There are two main ways to get a String into the pool:

  • A String object declared directly in double quotes is stored directly in the constant pool.
  • A String object that is not declared in double quotes can use the intern method that String provides. The intern method looks up the current string in the string constant pool and, if it is not there, puts the current string into the pool.

So let’s focus on the String#intern method.

Let’s take a closer look at how it works.

1. Java code

/**
 * Returns a canonical representation for the string object.
 * <p>
 * A pool of strings, initially empty, is maintained privately by the
 * class <code>String</code>.
 * <p>
 * When the intern method is invoked, if the pool already contains a
 * string equal to this <code>String</code> object as determined by
 * the {@link #equals(Object)} method, then the string from the pool is
 * returned. Otherwise, this <code>String</code> object is added to the
 * pool and a reference to this <code>String</code> object is returned.
 * <p>
 * It follows that for any two strings <code>s</code> and <code>t</code>,
 * <code>s.intern()&nbsp;==&nbsp;t.intern()</code> is <code>true</code>
 * if and only if <code>s.equals(t)</code> is <code>true</code>.
 * <p>
 * All literal strings and string-valued constant expressions are
 * interned. String literals are defined in section 3.10.5 of the
 * <cite>The Java&trade; Language Specification</cite>.
 *
 * @return  a string that has the same contents as this string, but is
 *          guaranteed to be from a pool of unique strings.
 */
public native String intern();

String#intern is a native method, but its Javadoc comment is very clear: if the constant pool already contains a string equal to the current one, the pooled string is returned; if not, the current string is put into the constant pool and then returned.
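The contract in the Javadoc can be checked with a small program. This is a minimal sketch; the class name `InternContractDemo` and the helper `samePooledInstance` are hypothetical names chosen for illustration.

```java
class InternContractDemo {
    // True when both strings resolve to the same pooled instance.
    static boolean samePooledInstance(String s, String t) {
        return s.intern() == t.intern();
    }

    public static void main(String[] args) {
        String a = new String("hello");                               // heap object
        String b = new StringBuilder("hel").append("lo").toString();  // another heap object

        System.out.println(a == b);                          // false: two distinct objects
        System.out.println(a.equals(b));                     // true: same content
        System.out.println(samePooledInstance(a, b));        // true: equal strings share one pooled instance
        System.out.println(samePooledInstance(a, "world"));  // false: different content
    }
}
```

The third line is the "if and only if" rule from the comment: `s.intern() == t.intern()` holds exactly when `s.equals(t)` does.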

2. Native code

According to the author, after Oracle took over Java the JDK 7 source was no longer released publicly. According to the JDK's lead developers, OpenJDK 7 and JDK 7 share the same main code base, with only minor differences in branch code. We can therefore trace the OpenJDK 7 source directly to explore how intern is implemented.

#### Native source: \openjdk7\jdk\src\share\native\java\lang\String.c

Java_java_lang_String_intern(JNIEnv *env, jobject this)  
{  
    return JVM_InternString(env, this);  
}  

\openjdk7\hotspot\src\share\vm\prims\jvm.h

/* 
* java.lang.String 
*/  
JNIEXPORT jstring JNICALL  
JVM_InternString(JNIEnv *env, jstring str);   

\openjdk7\hotspot\src\share\vm\prims\jvm.cpp

// String support ///  
JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))  
  JVMWrapper("JVM_InternString");  
  JvmtiVMObjectAllocEventCollector oam;  
  if (str == NULL) return NULL;  
  oop string = JNIHandles::resolve_non_null(str);  
  oop result = StringTable::intern(string, CHECK_NULL);
  return (jstring) JNIHandles::make_local(env, result);  
JVM_END   

\openjdk7\hotspot\src\share\vm\classfile\symbolTable.cpp

oop StringTable::intern(Handle string_or_null, jchar* name,
                        int len, TRAPS) {
  unsigned int hashValue = java_lang_String::hash_string(name, len);
  int index = the_table()->hash_to_index(hashValue);
  oop string = the_table()->lookup(index, name, len, hashValue);

  // Found
  if (string != NULL) return string;

  // Otherwise, add to symbol to table
  return the_table()->basic_add(index, string_or_null, name, len,
                                hashValue, CHECK_NULL);
}

\openjdk7\hotspot\src\share\vm\classfile\symbolTable.cpp

oop StringTable::lookup(int index, jchar* name,
                        int len, unsigned int hash) {
  for (HashtableEntry<oop>* l = bucket(index); l != NULL; l = l->next()) {
    if (l->hash() == hash) {
      if (java_lang_String::equals(l->literal(), name, len)) {
        return l->literal();
      }
    }
  }
  return NULL;
}

Java calls the C++ StringTable::intern method through JNI. StringTable is implemented much like Java's HashMap, except that it cannot grow automatically. Its default size is 1009.

Note that the string pool is a fixed-size hashtable with a default size of 1009 buckets. If you put too many strings into the pool, hash collisions become frequent and the linked list in each bucket grows very long. The direct consequence of long lists is that String.intern calls become significantly slower, because each call has to walk one of those lists.

In JDK 6 the StringTable size is fixed at 1009, so efficiency drops quickly when the constant pool holds too many strings. In JDK 7 the StringTable size can be set with a JVM argument:
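To see why a fixed bucket count hurts, here is a simplified Java model of the lookup shown in the native code (hash, bucket index, walk the chain, add if missing). It is only a sketch: `FixedSizePoolModel` is a hypothetical class, it uses `String.hashCode()` rather than HotSpot's hash function, and it is not the JVM's actual data structure.

```java
import java.util.ArrayList;
import java.util.List;

class FixedSizePoolModel {
    // One chain per bucket; the bucket count never changes after construction.
    private final List<List<String>> buckets = new ArrayList<List<String>>();

    FixedSizePoolModel(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<String>());
        }
    }

    // Mirrors the lookup-then-add flow: return the pooled string if present, else add this one.
    String intern(String s) {
        int index = Math.floorMod(s.hashCode(), buckets.size());
        List<String> chain = buckets.get(index);
        for (String pooled : chain) {       // cost grows with the chain length
            if (pooled.equals(s)) {
                return pooled;
            }
        }
        chain.add(s);
        return s;
    }

    int longestChain() {
        int longest = 0;
        for (List<String> chain : buckets) {
            longest = Math.max(longest, chain.size());
        }
        return longest;
    }

    public static void main(String[] args) {
        FixedSizePoolModel pool = new FixedSizePoolModel(1009);
        for (int i = 0; i < 100_000; i++) {
            pool.intern("key-" + i);
        }
        System.out.println("longest chain: " + pool.longestChain());
    }
}
```

With 100,000 distinct strings in 1009 buckets, at least one chain must hold 100 or more entries, so a lookup that lands there compares against up to that many strings before it can answer.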

  • -XX:StringTableSize=99991

Many Java programmers have met questions like "how many objects does String s = new String("abc") create?". The main purpose of such questions is to test the programmer's grasp of the string constant pool. The statement above creates two objects: the first is the "abc" string stored in the constant pool, and the second is a String object in the Java heap.
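The "two objects" claim can be observed through reference comparison. A minimal sketch, with `NewStringObjects` and `copyOf` as hypothetical names:

```java
class NewStringObjects {
    // new String(...) always allocates a fresh object, even for pooled content.
    static String copyOf(String s) {
        return new String(s);
    }

    public static void main(String[] args) {
        String literal = "abc";        // the pooled instance created for the literal
        String s = copyOf("abc");      // a separate object allocated by new
        String t = copyOf("abc");      // yet another separate object

        System.out.println(s == literal);           // false
        System.out.println(s == t);                 // false
        System.out.println(s.equals(literal));      // true
        System.out.println(s.intern() == literal);  // true: intern returns the pooled instance
    }
}
```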

Take a look at this code:

public static void main(String[] args) {
    String s = new String("1");
    s.intern();
    String s2 = "1";
    System.out.println(s == s2);

    String s3 = new String("1") + new String("1");
    s3.intern();
    String s4 = "11";
    System.out.println(s3 == s4);
}

The printed result is:

  • On JDK 6: false false
  • On JDK 7: false true

Now move the statement s3.intern(); down one line so that it comes after String s4 = "11";, and move s.intern(); so that it comes after String s2 = "1";. What is the result?

public static void main(String[] args) {
    String s = new String("1");
    String s2 = "1";
    s.intern();
    System.out.println(s == s2);

    String s3 = new String("1") + new String("1");
    String s4 = "11";
    s3.intern();
    System.out.println(s3 == s4);
}

The printed result is:

  • On JDK 6: false false
  • On JDK 7: false false

#### 1. JDK 6

[Figure: JDK 6 diagram]

Note: In the figures, green lines show which content a String object points to; black lines show address references.

As shown above, on JDK 6 every comparison prints false because the constant pool in JDK 6 lives in the Perm area, which is completely separate from the normal Java heap. As mentioned earlier, strings declared in quotes are created directly in the string constant pool, whereas String objects created with new are placed in the Java heap. An object address in the Java heap can never equal an object address in the string constant pool, so the comparisons fail even after String.intern is called.

#### 2. JDK 7

Now for JDK 7. To be clear, in JDK 6 and earlier versions the string constant pool is placed in the Perm area of the heap. The Perm area is a static area that mainly stores information about loaded classes, constant pools, method fragments, and so on, and its default size is only 4m. Heavy use of intern in those versions can therefore directly produce java.lang.OutOfMemoryError: PermGen space. So in JDK 7 the string constant pool was moved from the Perm area to the normal Java heap. One of the main reasons for the move is that the Perm area is too small; JDK 8 goes further, removing the Perm area entirely and introducing a new metaspace area. It seems the JDK developers decided that the Perm area no longer suited the way Java was evolving.

With the string constant pool now in the Java heap, here is why the code prints what it does.

[Figure: JDK 7, diagram 1]

  • In the first code sample, look at s3 and s4 first. String s3 = new String("1") + new String("1"); ends up producing two objects that matter: the "1" in the string constant pool and the object in the Java heap that s3 refers to. (The two anonymous new String("1") objects created along the way are not discussed here.) At this point s3 refers to a string whose content is "11", but the constant pool contains no "11" object.
  • Next, s3.intern(); puts the "11" string of s3 into the string constant pool. Since the pool has no "11" yet, the usual approach, as in the JDK 6 figure, would be to create a new "11" object in the pool. The key point is that in JDK 7 the constant pool is no longer in the Perm area, so the pool does not need to hold its own copy of the object; it can store a reference to the object already on the heap. That reference points to the object s3 refers to, so the two addresses are the same.
  • Finally, String s4 = "11"; declares "11" as a literal, so the JVM looks for it in the constant pool and finds that it already exists, namely as a reference to the object s3 refers to. s4 therefore refers to the same object as s3, and s3 == s4 is true.
  • Now look at s and s2. String s = new String("1"); creates two objects: "1" in the constant pool and a String object in the Java heap. s.intern(); then looks in the constant pool and finds that "1" is already there, so it changes nothing.
  • Next, String s2 = "1"; makes s2 refer to the "1" object in the constant pool. s and s2 therefore refer to different objects, as the figure shows.

[Figure: JDK 7, diagram 2]

  • Now the second code sample, shown in the second diagram. The change from the first sample is that s3.intern(); now comes after String s4 = "11";. String s4 = "11"; therefore runs first, and at that moment the constant pool has no "11" object, so the declaration of s4 creates a new "11" object in the pool. When s3.intern(); runs afterwards, "11" already exists in the constant pool, so s3 and s4 refer to different objects.
  • For s and s2 in the second sample, moving s.intern(); makes no difference: the "1" object was already created in the pool when the first line, String s = new String("1");, ran. The declaration of s2 then takes its reference directly from the constant pool, so the addresses of s and s2 are not equal.

#### Summary

As the example code shows, JDK 7 changed intern and the string constant pool in two main ways:

  • The string constant pool moved from the Perm area to the Java heap.
  • When String#intern is called on a string that is not yet in the pool but exists as an object on the heap, the pool stores a reference to that heap object instead of creating a new object.
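The second point can be observed directly on JDK 7 or later. The sketch below builds a string at run time that is very unlikely to be in the pool already (the `UUID` suffix is only a device for that, and `InternReturnsSelf` and `freshString` are hypothetical names), then checks that `intern()` hands back the very same heap object.

```java
import java.util.UUID;

class InternReturnsSelf {
    // A runtime-built string that is very unlikely to be in the pool yet.
    static String freshString() {
        return "intern-demo-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        String s = freshString();
        // On JDK 7+ the pool records a reference to s itself.
        System.out.println(s.intern() == s);     // true on JDK 7 and later

        String copy = new String(s);             // equal content, different object
        System.out.println(copy == s);           // false
        System.out.println(copy.intern() == s);  // true: the pooled entry is s
    }
}
```

On JDK 6, where the pool keeps its own copy in the Perm area, the first comparison would be expected to print false instead, which matches the results in the samples above.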

1. Using intern correctly

Let’s look at a common example of using the String#intern method.

The code is as follows:

import java.util.Random;

static final int MAX = 1000 * 10000;   // 10 million strings
static final String[] arr = new String[MAX];

public static void main(String[] args) throws Exception {
    // Ten random integers stand in for a small set of distinct values
    Integer[] DB_DATA = new Integer[10];
    Random random = new Random(10 * 10000);
    for (int i = 0; i < DB_DATA.length; i++) {
        DB_DATA[i] = random.nextInt();
    }
    long t = System.currentTimeMillis();
    for (int i = 0; i < MAX; i++) {
        // To run without intern, use the commented-out line instead
        //arr[i] = new String(String.valueOf(DB_DATA[i % DB_DATA.length]));
        arr[i] = new String(String.valueOf(DB_DATA[i % DB_DATA.length])).intern();
    }

    System.out.println((System.currentTimeMillis() - t) + "ms");
    System.gc();
}

The JVM arguments used are -Xmx2g -Xms2g -Xmn1500M. The results are shown below.

  • With intern: 2160ms
  • Without intern: 826ms

From these results, the code without intern generated 10 million strings and took up about 640M of space. The code with intern generated 1345 strings, occupying about 133K in total. In fact the program itself only used 10 distinct strings, so by exact calculation the difference should be a factor of exactly 1 million (10 million objects versus 10). The example is a bit extreme, but it does reflect the huge space savings that intern can produce.

Careful readers will notice that the run takes longer with intern. This is because the program calls intern after every new String, and that call takes time; when memory is plentiful this overhead cannot be avoided. In ordinary use, however, memory is never unlimited, and the garbage collection time caused by the extra space used without intern is far larger than this overhead. After all, 10 million intern calls cost only a little over one second.
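The space saving comes from sharing instances, which can be counted directly without a heap profiler. The sketch below is a scaled-down illustration, not the benchmark above: `InternIdentityCount` and `distinctInstances` are hypothetical names, and an identity-based set is used so that two equal strings still count as two objects.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class InternIdentityCount {
    // Counts how many distinct String objects are retained for `total` values
    // drawn from `distinctValues` different contents.
    static int distinctInstances(int total, int distinctValues, boolean useIntern) {
        Set<String> identities =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (int i = 0; i < total; i++) {
            String s = new String(String.valueOf(i % distinctValues));
            identities.add(useIntern ? s.intern() : s);
        }
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(100_000, 10, false)); // 100000
        System.out.println(distinctInstances(100_000, 10, true));  // 10
    }
}
```

The ratio between the two counts is total divided by the number of distinct contents, which is the same arithmetic as the factor of 1 million in the example above.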

2. Using intern improperly

Having covered how to use intern and how it works, let's look at a problem caused by using it improperly.

While reading an interface with fastjson, we found that after nearly 700,000 records had been read, our log output became very slow. Each log statement took about 30ms, so a request that printed more than two or three log lines took more than twice as long as normal. The problem disappeared after the JVM was restarted, and came back once we continued reading the interface. Let's walk through how the problem arises.

#### 1. Locating the cause from the log4j logs

Printing logs with log4j#info was taking a long time, so we used the HouseMD tool to trace where the time went along the info call stack.

  • trace SLF4JLogger.
  • trace AbstractLoggerWrapper:
  • trace AsyncLogger
org/apache/logging/log4j/core/async/AsyncLogger.actualAsyncLog(RingBufferLogEvent)                sun.misc.Launcher$AppClassLoader@109aca82            1            1ms    org.apache.logging.log4j.core.async.AsyncLogger@19de86bb  
org/apache/logging/log4j/core/async/AsyncLogger.location(String)                                  sun.misc.Launcher$AppClassLoader@109aca82            1           30ms    org.apache.logging.log4j.core.async.AsyncLogger@19de86bb  
org/apache/logging/log4j/core/async/AsyncLogger.log(Marker, String, Level, Message, Throwable)    sun.misc.Launcher$AppClassLoader@109aca82            1           61ms    org.apache.logging.log4j.core.async.AsyncLogger@19de86bb  

The time is spent in the AsyncLogger.location method, whose main job is to call return Log4jLogEvent.calcLocation(fqcnOfLogger);.

The code of Log4jLogEvent.calcLocation() is as follows:

public static StackTraceElement calcLocation(final String fqcnOfLogger) {  
    if (fqcnOfLogger == null) {  
        return null;  
    }  
    final StackTraceElement[] stackTrace = Thread.currentThread().getStackTrace();  
    boolean next = false;  
    for (final StackTraceElement element : stackTrace) {  
        final String className = element.getClassName();  
        if (next) {  
            if (fqcnOfLogger.equals(className)) {  
                continue;  
            }  
            return element;  
        }  
        if (fqcnOfLogger.equals(className)) {  
            next = true;  
        } else if (NOT_AVAIL.equals(className)) {  
            break;  
        }  
    }  
    return null;  
}  

After tracing, the problem was located in Thread.currentThread().getStackTrace();.

#### 2. Tracing the native code of Thread.currentThread().getStackTrace() to confirm String#intern

Thread.currentThread().getStackTrace(); and its native method:

public StackTraceElement[] getStackTrace() {
    if (this != Thread.currentThread()) {
        // check for getStackTrace permission
        SecurityManager security = System.getSecurityManager();
        if (security != null) {
            security.checkPermission(
                SecurityConstants.GET_STACK_TRACE_PERMISSION);
        }
        // optimization so we do not call into the vm for threads that
        // have not yet started or have terminated
        if (!isAlive()) {
            return EMPTY_STACK_TRACE;
        }
        StackTraceElement[][] stackTraceArray = dumpThreads(new Thread[] {this});
        StackTraceElement[] stackTrace = stackTraceArray[0];
        // a thread that was alive during the previous isAlive call may have
        // since terminated, therefore not having a stacktrace.
        if (stackTrace == null) {
            stackTrace = EMPTY_STACK_TRACE;
        }
        return stackTrace;
    } else {
        // Don't need JVM help for current thread
        return (new Exception()).getStackTrace();
    }
}

private native static StackTraceElement[][] dumpThreads(Thread[] threads);

We downloaded the OpenJDK 7 source code and looked up the native implementation in the JDK. The relevant files are:

  • \openjdk7\jdk\src\share\native\java\lang\Thread.c
  • \openjdk7\hotspot\src\share\vm\prims\jvm.h line:294
  • \openjdk7\hotspot\src\share\vm\prims\jvm.cpp line:4382-4414
  • \openjdk7\hotspot\src\share\vm\services\threadService.cpp line:235-267
  • \openjdk7\hotspot\src\share\vm\services\threadService.cpp line:566-577
  • \openjdk7\hotspot\src\share\vm\classfile\javaClasses.cpp line:1635-[1651,1654,1658]

After tracing down into the JVM source, we found that the following three lines were what slowed the whole application down.

oop classname = StringTable::intern((char*) str, CHECK_0);  
oop methodname = StringTable::intern(method->name(), CHECK_0);  
oop filename = StringTable::intern(source, CHECK_0);  

These three lines obtain the class name, method name, and file name. Because class names, method names, and file names are stored in the string constant pool, each of them is looked up through the same intern mechanism as String#intern on every call. What this does not take into account is that the default StringTable has only 1009 buckets and cannot grow, so once the number of strings in the constant pool reaches a certain size, performance drops sharply.

#### 3. fastjson's improper use of String#intern

The slow intern calls were caused by fastjson's improper use of the String#intern method. Tracing the implementation in fastjson:

#### com.alibaba.fastjson.parser.JSONScanner#scanFieldSymbol()

if (ch == '\"') {
    bp = index;
    this.ch = ch = buf[bp];
    strVal = symbolTable.addSymbol(buf, start, index - start - 1, hash);
    break;
}

#### com.alibaba.fastjson.parser.SymbolTable#addSymbol():

/**
 * Constructs a new entry from the specified symbol information and next entry reference.
 */
public Entry(char[] ch, int offset, int length, int hash, Entry next){
    characters = new char[length];
    System.arraycopy(ch, offset, characters, 0, length);
    symbol = new String(characters).intern();
    this.next = next;
    this.hashCode = hash;
    this.bytes = null;
}

fastjson uses the intern method to cache every JSON key in the string constant pool, so that each later read is very fast, which saves a great deal of space and time as long as the keys are stable, as JSON keys usually are. What it did not account for is a large number of changing JSON keys, which puts a huge burden on the string constant pool.

fastjson fixed this bug in version 1.1.24. The program now has a maximum cache size, beyond which no more strings are added to the string constant pool.

Code from com.alibaba.fastjson.parser.SymbolTable#addSymbol() in version 1.1.24 (line 113):

public static final int MAX_SIZE           = 1024;

if (size >= MAX_SIZE) {
    return new String(buffer, offset, len);
}

This problem showed up with 700,000 records; with millions of records the delay per log line could well exceed 30ms. So be careful when using the system-level String#intern!

This article has described how String#intern and the string constant pool are used day to day, how String#intern differs between JDK versions, and the risks of using it improperly, to give a deeper understanding of this system-level method. Knowing this helps us avoid some bugs when we use it and makes our systems more robust.
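The idea behind the size cap can be sketched in a few lines. This is not fastjson's implementation: `BoundedSymbolCache` is a hypothetical class, and it keeps its own `HashMap` instead of calling `String#intern`, so that the cap is easy to observe. Only the `MAX_SIZE` check mirrors the 1.1.24 snippet.

```java
import java.util.HashMap;
import java.util.Map;

class BoundedSymbolCache {
    static final int MAX_SIZE = 1024;   // same cap as in the 1.1.24 snippet

    // Hypothetical private cache standing in for the symbol table.
    private final Map<String, String> symbols = new HashMap<String, String>();

    String addSymbol(char[] buffer, int offset, int len) {
        String candidate = new String(buffer, offset, len);
        String cached = symbols.get(candidate);
        if (cached != null) {
            return cached;              // known key: reuse the shared instance
        }
        if (symbols.size() >= MAX_SIZE) {
            return candidate;           // cap reached: do not cache any more keys
        }
        symbols.put(candidate, candidate);
        return candidate;
    }

    int size() {
        return symbols.size();
    }

    public static void main(String[] args) {
        BoundedSymbolCache cache = new BoundedSymbolCache();
        for (int i = 0; i < 2000; i++) {
            char[] key = ("key" + i).toCharArray();
            cache.addSymbol(key, 0, key.length);
        }
        System.out.println(cache.size());   // 1024
    }
}
```

Stable keys still share one instance, while a flood of ever-changing keys stops growing the cache once the cap is hit; the strings past the cap are ordinary heap objects that the garbage collector can reclaim.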

Here are a few key posts. Thank you!

  • Save Memory by Using String Intern in Java
  • Java String array: is there a size of method?
  • Understanding String Table Size in HotSpot
  • How is Java’s String#intern() method implemented?
  • Changes to String.intern in JDK 7

Source:

Meituan Technology by John_Yang