Preface

You may feel bored when you see the question “What is the length limit of a String?” Indeed, that is how I felt when I first saw it.

However, after tracing the problem in depth, I found that the limit itself matters little. What matters is that the investigation connects many pieces of knowledge, which makes it an excellent question. No wonder similar questions come up in senior-level interviews.

This article walks through the process of tracing String length limits. Keep in mind that the conclusion matters less than the analysis and the knowledge involved: the underlying implementation of String, the range of the int type, the Java Virtual Machine Specification, the source code of the Java compiler, and more.

String source tracing

To find the length limit of the String class, we must first look at the source code of String. JDK 8, the most widely used version, serves as the example here. The underlying implementation of String changed in JDK 9 and later; see “A new optimization for String strings in JDK9”.

We all know that the String class provides a length method. Can we use this method directly to determine the maximum length of a String?

/**
 * Returns the length of this string.
 * The length is equal to the number of <a href="Character.html#unicode">Unicode
 * code units</a> in the string.
 *
 * @return  the length of the sequence of characters represented by this
 *          object.
 */
public int length() {
    return value.length;
}

The documentation doesn’t say what the maximum length is, but you can get a clue from the type of results returned. The result type is int, which means that the range of int values is one of the constraints.

If you already know that int ranges from -2^31 to 2^31-1, good. If you don’t, look at the corresponding wrapper class Integer:

public final class Integer extends Number implements Comparable<Integer> {
    /**
     * A constant holding the minimum value an {@code int} can
     * have, -2<sup>31</sup>.
     */
    @Native public static final int MIN_VALUE = 0x80000000;
    /**
     * A constant holding the maximum value an {@code int} can
     * have, 2<sup>31</sup>-1.
     */
    @Native public static final int MAX_VALUE = 0x7fffffff;
    // ...
}

Both the values and the comments of MIN_VALUE and MAX_VALUE give the range of int. The maximum length of a String should therefore be:

2^31 - 1 = 2147483647

Back to the length method: the returned value comes from value.length, and in JDK 8 value is implemented as a char array:

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];
    // ...   
}

In Java, a char is a 16-bit UTF-16 code unit and takes up two bytes, so the length has to be multiplied by 2 to get the memory used.
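To make the "code unit" wording in the length() documentation concrete, here is a minimal sketch. The class name CodeUnits and its helper methods are made up for illustration; only String.length, String.codePointCount and Character.BYTES are standard API.

```java
public class CodeUnits {
    /** Number of UTF-16 code units, which is what String.length() reports. */
    static int codeUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points, which can be smaller than the length. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String ascii = "A";
        // U+1F600 lies outside the Basic Multilingual Plane,
        // so UTF-16 stores it as a surrogate pair of two chars.
        String emoji = "\uD83D\uDE00";

        System.out.println(codeUnits(ascii));   // 1
        System.out.println(codeUnits(emoji));   // 2
        System.out.println(codePoints(emoji));  // 1
        System.out.println(Character.BYTES);    // 2 bytes per char
    }
}
```

The length limit discussed in this article is therefore a limit on the number of chars, not on the number of user-visible characters.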

The calculation is then:

2^31 - 1 = 2147483647 16-bit Unicode characters
2147483647 * 2 = 4294967294 (bytes)
4294967294 / 1024 = 4194303.998046875 (KB)
4194303.998046875 / 1024 = 4095.9999980926513671875 (MB)
4095.9999980926513671875 / 1024 = 3.99999999813735485076904296875 (GB)
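The same arithmetic can be checked in code. This is only a sketch: the class StringMemory and the method payloadBytes are hypothetical names, and the figure covers just the char data, not object headers or other JVM overhead. The multiplication is done in long because the result does not fit in an int.

```java
public class StringMemory {
    /** Bytes needed for the char data of a string with the given length. */
    static long payloadBytes(int length) {
        // Widen to long first: 2147483647 * 2 would overflow an int.
        return (long) length * Character.BYTES;
    }

    public static void main(String[] args) {
        long bytes = payloadBytes(Integer.MAX_VALUE);
        System.out.println(bytes); // 4294967294

        double gib = bytes / (1024.0 * 1024.0 * 1024.0);
        System.out.println(gib);   // just under 4
    }
}
```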

This means that the character data of a string can occupy at most about 4 GB of memory. However, if you declare a string literal with a length of 100,000, the compiler reports an error with the following message:

Error: Constant string too long

What happened to the 2.1 billion? How can a mere 100,000 characters cause an error? This error comes from a compile-time constraint.

Compile-time limits for string constant pools

Those of you who are familiar with the JVM know that when a string is declared through a literal, it is compiled into the constant pool of the class file as a constant.

String s = "program new horizon";

The constant pool limits the length of a String. Each item in the constant pool has its own type, and the text of a Java string is stored there, encoded in (modified) UTF-8, as an item of type CONSTANT_Utf8.

As you can see in the Java Virtual Machine Specification, strings are defined via CONSTANT_String_info.

You can see that “the value of the string_index entry must be a valid index to the constant pool, and the constant pool entry at that index must be a CONSTANT_Utf8_info(§4.4.7) structure.”

Continue with the definition of CONSTANT_Utf8_info:

The length item gives the number of bytes in the bytes[] array and has type u2. The definition of u2 can also be found in the Java Virtual Machine Specification:

u2 represents an unsigned two-byte quantity. One byte has 8 bits, so two bytes have 16 bits, and the largest value a u2 can represent is 2^16 - 1 = 65535.

This gives the second constraint: the constant pool format of the class file means that the encoded form of a string constant cannot exceed 65535 bytes.
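The effect of a two-byte length field can be observed without touching class files. The sketch below uses DataOutputStream.writeUTF, which also writes a two-byte length followed by modified UTF-8 and throws UTFDataFormatException when the encoded form exceeds 65535 bytes. It is an analogy for the CONSTANT_Utf8_info layout, not the compiler's own code path, and the names U2LengthDemo, fitsInU2 and repeat are invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class U2LengthDemo {
    /** True if the modified UTF-8 form of s fits behind a 2-byte length field. */
    static boolean fitsInU2(String s) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            return false; // encoded form is longer than 65535 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
    }

    /** Builds a string of n copies of c (Java 8 compatible). */
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        // ASCII characters take one byte each.
        System.out.println(fitsInU2(repeat('8', 65535)));      // true
        System.out.println(fitsInU2(repeat('8', 65536)));      // false
        // U+4E2D takes three bytes, so far fewer characters fit.
        System.out.println(fitsInU2(repeat('\u4E2D', 21845))); // true  (65535 bytes)
        System.out.println(fitsInU2(repeat('\u4E2D', 21846))); // false (65538 bytes)
    }
}
```

Note that the limit is counted in encoded bytes, so a string of non-ASCII characters reaches it with fewer characters than a string of ASCII characters.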

Now, if you try to declare a string of length 65535 with a literal:

String s = "8888...8888"; // contains 65535 '8' characters

The compiler reports the same error. Why is that?

This question can also be answered in the Java Virtual Machine Specification (Section 4.7.3):

The specification describes a historical design mistake: if the code of a method is exactly 65535 bytes long and ends with an instruction that is 1 byte long, that instruction cannot be protected by an exception handler. To work around it, compilers limit the maximum size of the code array to 65534 bytes.

If you look at the source code of javac, the Java compiler, you can see this restriction implemented in the Gen class:

/** Check a constant value and report if it is a string that is
 *  too large.
 */
private void checkStringConstant(DiagnosticPosition pos, Object constValue) {
    if (nerrs != 0 || // only complain about a long string once
        constValue == null ||
        !(constValue instanceof String) ||
        ((String)constValue).length() < Pool.MAX_STRING_LENGTH)
        return;
    log.error(pos, "limit.string");
    nerrs++;
}

Pool.MAX_STRING_LENGTH is defined as follows:

public class Pool {
    public static final int MAX_STRING_LENGTH = 0xFFFF;
    // ...
}

The check only accepts lengths strictly less than 0xFFFF (65535). Declare a string of length 65534 instead and it compiles normally. We can therefore conclude that the maximum length of a string literal at compile time is 65534.
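The boundary is easy to get wrong by one, so here is the length condition from checkStringConstant restated as a standalone sketch. The class LiteralLimit and the method acceptedAsLiteral are hypothetical; the constant simply mirrors the value of Pool.MAX_STRING_LENGTH quoted above, and the real compiler performs further checks that are not modelled here.

```java
public class LiteralLimit {
    // Mirrors the value of Pool.MAX_STRING_LENGTH quoted above (an assumption of this sketch).
    static final int MAX_STRING_LENGTH = 0xFFFF;

    /** True if a literal with this many chars passes the length check. */
    static boolean acceptedAsLiteral(int length) {
        // Same comparison as in checkStringConstant: strictly less than 65535.
        return length < MAX_STRING_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(MAX_STRING_LENGTH);         // 65535
        System.out.println(acceptedAsLiteral(65534));  // true
        System.out.println(acceptedAsLiteral(65535));  // false
        System.out.println(acceptedAsLiteral(100000)); // false
    }
}
```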

We know that Java distinguishes compile time from run time, so is there a length limit at run time?

Length limit at run time

At run time, the limit on String comes mainly from its constructors. One of them looks like this:

public String(char value[], int offset, int count) {
   // ...
}

The parameter count determines the length of the string. Since it is an int, the maximum is again 2^31 - 1, and the calculation is the same as before: convert to bits, then to bytes, then to GB:

(2^31 - 1) * 16 / 8 / 1024 / 1024 / 1024 ≈ 4 GB

That is, at run time a string can theoretically hold up to about 4 GB of character data, and attempts to go beyond that limit fail. JDK 9 optimizes String storage by using a byte array instead of a char array, which can save up to half the space for strings that contain only Latin-1 characters.

Of course, this 4GB limit assumes that the JVM can allocate that much available memory.
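A quick way to see that the 65534 limit applies only to literals is to build a longer string at run time. This is a minimal sketch using the String(char[], int, int) constructor shown above; the class RuntimeLength and the method build are made-up names.

```java
import java.util.Arrays;

public class RuntimeLength {
    /** Builds a string of the given length at run time, bypassing the literal limit. */
    static String build(int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, '8');
        // offset 0, count = full array length
        return new String(chars, 0, chars.length);
    }

    public static void main(String[] args) {
        String s = build(100_000);
        System.out.println(s.length()); // 100000
    }
}
```

The same length that the compiler rejects as a literal is unproblematic here, because the string never passes through the constant pool of the class file.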

Summary

From the analysis above, we can draw two conclusions. First, at compile time, the length of a string literal cannot exceed 65534. Second, at run time, the length of a string cannot exceed 2^31 - 1, and its memory usage (about 4 GB at that length) cannot exceed the maximum memory available to the virtual machine.

The conclusion is simple, but have you absorbed the knowledge and the line of reasoning used in this analysis? If not, it is worth reading through again.

About the author: author of the technical book SpringBoot Technology Insider, who loves digging into technology and writing technical articles.

Public account: “program new vision” is the author’s public account; you are welcome to follow it.

Reprints: please contact the Weibo user Zhuan2quan.

