I recently revisited the question of whether a Java String has a length limit, a topic I covered in an earlier article, and found some new insights, so I am reorganizing the material here.

Building on the previous article, this version adds some verification steps and corrects a few errors. In this analysis I will debug the JDK's compilation process, refer to parts of the JVM specification, and cover the topic from all angles.

Because this topic touches on how Java compilation works, it is easier to explain in a video, which I have uploaded to Bilibili: www.bilibili.com/video/BV1uK…

String length limit

To figure this out, we first need to look at the String source code to see if there are any length restrictions or definitions.

The String class has a number of overloaded constructors, several of which let the caller pass in a length:

public String(byte bytes[], int offset, int length) 

The length parameter is an int, so the largest length that can be specified is the maximum value of int.

According to the definition of the Integer class, java.lang.Integer#MAX_VALUE is 2^31 - 1.
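As a quick check of these two facts, here is a minimal sketch. The class name IntLimitDemo and the helper slice are illustrative only, and it uses the four-argument constructor with an explicit charset (rather than the three-argument form shown above) so the result does not depend on the platform default.

```java
import java.nio.charset.StandardCharsets;

public class IntLimitDemo {
    // Hypothetical helper: decode `length` bytes starting at `offset` as ASCII.
    static String slice(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] data = "hello world".getBytes(StandardCharsets.US_ASCII);
        System.out.println(slice(data, 0, 5));                    // prints "hello"
        // The int length parameter tops out at 2^31 - 1.
        System.out.println(Integer.MAX_VALUE == (1L << 31) - 1);  // prints "true"
    }
}
```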

So, can we assume that the maximum length String can support is this value?

Not quite. That is only the maximum length supported when a String is constructed at run time. When a String is defined as a literal, a much smaller limit applies at compile time.

Such as the following code:

String s = "11111...1111"; // contains 100,000 "1" characters

When a string is defined this way, compiling with javac fails with the following error:

error: constant string too long

The String constructor accepts a length of up to 2147483647 (2^31 - 1), so why does a literal of only 100,000 characters fail to compile?

When a String is defined in the form String s = "xxx";, the xxx part is called a literal, and it is compiled into the constant pool of the class file as a constant.

This is where the problem comes from: to enter the constant pool, the literal has to obey the constant pool's rules.

Constant pool limit

As we know, javac is a command to compile a Java file into a class file, so there is a certain format that needs to be followed during class file generation.

CONSTANT_String_info is used to represent a constant object of type java.lang.String, as defined in Section 4.4 of the Java Virtual Machine Specification:

CONSTANT_String_info {
    u1 tag;
    u2 string_index;
}

The value of the string_index item must be a valid index into the constant pool, and the constant pool entry at that index must be a CONSTANT_Utf8_info structure representing a sequence of Unicode code points. This sequence of code points is eventually used to initialize a String object.

The CONSTANT_Utf8_info structure is used to represent the value of a string constant:

CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}

The length item specifies the number of bytes in the bytes[] array, and its type is u2.

According to the specification, u2 denotes an unsigned two-byte quantity. One byte has 8 bits, so two bytes have 16 bits.

The largest value a 16-bit unsigned number can represent is 2^16 - 1 = 65535.

That is, the constant pool format of the class file requires that the encoded form of a string constant be no longer than 65535 bytes.
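One way to observe this limit from ordinary Java code is DataOutputStream.writeUTF, which also writes a two-byte length followed by modified UTF-8 bytes, the same encoding CONSTANT_Utf8_info uses. The sketch below is only an illustration, not how javac writes the constant pool: Utf8LengthDemo and encodedLength are made-up names, and String.repeat assumes Java 11 or later. It shows that the limit counts encoded bytes rather than characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;

public class Utf8LengthDemo {
    // Returns the encoded byte count, or -1 if it does not fit in a u2 length.
    static int encodedLength(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);            // u2 length prefix + modified UTF-8 bytes
        } catch (UTFDataFormatException tooLong) {
            return -1;                  // encoded form is longer than 65535 bytes
        }
        return buffer.size() - 2;       // drop the two length bytes
    }

    public static void main(String[] args) throws IOException {
        System.out.println(encodedLength("1".repeat(65535)));      // 65535
        System.out.println(encodedLength("1".repeat(65536)));      // -1
        // U+4E2D takes 3 bytes, so 21845 of them already fill the u2 range.
        System.out.println(encodedLength("\u4e2d".repeat(21845))); // 65535
        System.out.println(encodedLength("\u4e2d".repeat(21846))); // -1
    }
}
```

A practical consequence is that a literal made of multi-byte characters can hit the byte limit with far fewer than 65535 characters.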

So, let’s try to define strings as follows:

String s = "11111...1111"; // contains 65535 "1" characters

If you compile this with javac, you also get "error: constant string too long". Why?

In fact, this reason can be found in javac code, in the Gen class as follows:

private void checkStringConstant(DiagnosticPosition var1, Object var2) {
    if (this.nerrs == 0 && var2 != null && var2 instanceof String && ((String)var2).length() >= 65535) {
        this.log.error(var1, "limit.string", new Object[0]);
        ++this.nerrs;
    }
}

As you can see from the code, if the parameter type is String and the length is greater than or equal to 65535, the compilation will fail.

You can debug the javac compilation process yourself (the video shows one way to do it) and confirm that the error is reported at this point.

If we try to define a string with 65534 characters, we will see that it compiles normally.
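The 65534 versus 65535 boundary can also be probed without typing a huge literal by hand. The following is a rough sketch under a few assumptions: it runs on a JDK (not a bare JRE) so that the standard javax.tools compiler API is available, it needs Java 11 or later for String.repeat, and the names LiteralLimitProbe, StringSource, and Probe are invented for this illustration.

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class LiteralLimitProbe {
    // In-memory source file, so the long literal never has to exist on disk.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Returns true if a class holding a literal of `length` '1' characters compiles.
    static boolean compiles(int length) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler; run on a JDK");
        }
        String source = "class Probe { String s = \"" + "1".repeat(length) + "\"; }";
        Path outDir = Files.createTempDirectory("literal-probe");
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, diagnostics,
                List.of("-d", outDir.toString()),   // class files go to a temp dir
                null,
                List.of(new StringSource("Probe", source)));
        boolean ok = task.call();
        // Print whatever the compiler reported, e.g. the "too long" error.
        diagnostics.getDiagnostics().forEach(d -> System.out.println(d.getMessage(null)));
        return ok;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("65534 compiles: " + compiles(65534));
        System.out.println("65535 compiles: " + compiles(65535));
    }
}
```

If the compiler behaves as described here, the first call succeeds and the second reports the "constant string too long" error.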

The Java Virtual Machine Specification describes a related 65534 boundary, stated there for the size of a method's code array:

if the Java Virtual Machine code for a method is exactly 65535 bytes long and ends with an instruction that is 1 byte long, then that instruction cannot be protected by an exception handler. A compiler writer can work around this bug by limiting the maximum size of the generated Java Virtual Machine code for any method, instance initialization method, or static initializer (the size of any code array) to 65534 bytes

Run time limit

The String length constraint described above is a compile-time constraint. It applies only when a String is defined as a literal, in the form String s = "";.

So is there a limit at run time? Yes: it is the Integer.MAX_VALUE mentioned earlier, which corresponds to roughly 4 GB of character data. At run time, if the String length exceeds this range, an exception may be thrown. (Prior to JDK 1.9)

int is a 32-bit type, so counting only positive values, the longest possible length is:

2^31 - 1 = 2147483647 16-bit Unicode characters
2147483647 * 16 = 34359738352 bits
34359738352 / 8 = 4294967294 bytes
4294967294 / 1024 = 4194303.998046875 KB
4194303.998046875 / 1024 = 4095.9999980926513671875 MB
4095.9999980926513671875 / 1024 = 3.99999999813735485076904296875 GB

In other words, a String of maximum length would need nearly 4 GB of memory.
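The same arithmetic can be reproduced in a few lines of Java. This is a small sketch with made-up names (MaxStringBytes, bytesFor), and it assumes 2 bytes per character as in the calculation above, ignoring object headers and any more compact internal representation.

```java
public class MaxStringBytes {
    // Bytes needed for `chars` UTF-16 code units (Character.BYTES is 2).
    static long bytesFor(long chars) {
        return chars * Character.BYTES;
    }

    public static void main(String[] args) {
        long bytes = bytesFor(Integer.MAX_VALUE);
        double gigabytes = bytes / (1024.0 * 1024.0 * 1024.0);
        System.out.println(bytes);      // prints 4294967294
        System.out.println(gigabytes);  // just under 4.0
    }
}
```

The parameter is a long on purpose: multiplying Integer.MAX_VALUE by 2 in int arithmetic would overflow.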

Many people may wonder: if the maximum length is below 65535 at compile time, how can a String be longer than 65535 at run time? This is actually quite common, as in the following code:

String s = "";
for (int i = 0; i <100000 ; i++) {
    s+="i";
}

The resulting string has a length of 100,000. I have run into this situation in a real project.

During an earlier system integration, HD pictures had to be transmitted. The agreed method was that the other party would encode the pictures as Base64, and we would decode them back into pictures after receiving them.

An exception was thrown when the Base64-encoded content was assigned to a String.
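For reference, the same run-time construction can be written with a StringBuilder, which avoids creating a new String on every iteration. This is a minimal sketch; the class and method names are illustrative only.

```java
public class RuntimeLengthDemo {
    // Builds a string of n copies of 'i' at run time, with no literal involved.
    static String build(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('i');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = build(100_000);
        System.out.println(s.length());  // prints 100000
    }
}
```

Because the content is assembled while the program runs, it never passes through the class file constant pool, so the 65535 limit does not apply.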

Conclusion

A String does have a length limit. At compile time, a constant in the constant pool may be no longer than 65535, and javac itself enforces a maximum of 65534.

At run time, the length cannot exceed the range of int; otherwise an exception is thrown.

Finally, I recorded a video on this topic (www.bilibili.com/video/BV1uK…), with tips on how to run the experiments, how to look up the Java specifications, and how to debug javac. Further study is welcome.