While browsing ProgramCreek, I found a few small but thought-provoking topics. For example: how does Java’s substring() method work? Soul-searching questions like this are well worth exploring in depth.

I also want to tell you that the research process is very interesting, like hunting for treasure in a maze. At first I feel lost, but after some careful groping I not only find the treasure but also feel a sense of enlightenment. It is a great feeling.

Most junior programmers, and even veterans who neglect their “inner strength”, stay at the level of “knowing that it works without knowing why”: they can use an API, but when asked about the underlying principle they can only scratch their heads, spread their hands, and look puzzled.

I was stuck at that level for a long time. But I decided to change, because “inner strength” is like a foundation: only when the foundation is well laid can we build a tall building that stands the test of time. So let’s take a closer look at how Java’s substring() works. All right, get ready to level up!

01. What does substring() do?

Sub here is the prefix meaning “part of” (as in subset), so substring literally means “a string inside a string.” Seen that way, the method name is quite apt, isn’t it?

The full form of substring() is substring(int beginIndex, int endIndex). This method returns a new string made up of the characters of the original string from index beginIndex up to and including index endIndex - 1.

String cmower = "Silent King II, a funny programmer.";

cmower = cmower.substring(0, 4);

System.out.println(cmower);

The output of the program is:

Sile

Why is that? Let me explain this briefly.

Java’s indexes all start at 0 (some languages, such as Lua and Fortran, start at 1 by default), which differs from the everyday habit of counting from 1. Java does this for the following reasons:

Java’s syntax descends from C, and the JVM itself is written largely in C and C++, where indexes start at 0. That may sound like a circular answer. The real reason is that an index is not really an ordinal number: in a language with pointers, such as C, it is an offset from the starting position. The first element sits at the starting position, so its offset is 0.

There is another explanation as well: in the early days, when computing resources were scarce, starting indexes at 0 was said to make compiled code slightly more efficient than starting at 1.

With that in mind, take another look at the code above and it becomes clear. cmower.substring(0, 4) returns the string “Sile”, which includes the character at the begin index but not the one at the end index, because “S” has index 0, “i” has index 1, “l” has index 2, and “e” has index 3. The character at index 4, “n”, is left out.
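
To make the “begin included, end excluded” rule concrete, here is a small sketch that reuses the string from the example above. The class name SubstringIndexDemo and the helper slice() are my own, invented only for this illustration.

```java
public class SubstringIndexDemo {
    // Hypothetical helper for this demo: returns the characters at indexes [begin, end) of s.
    public static String slice(String s, int begin, int end) {
        return s.substring(begin, end);
    }

    public static void main(String[] args) {
        String cmower = "Silent King II, a funny programmer.";

        // Indexes 0..3 are 'S', 'i', 'l', 'e'; index 4 ('n') is excluded.
        System.out.println(slice(cmower, 0, 4));   // Sile

        // The length of the result is always endIndex - beginIndex (here 11 - 7 = 4).
        System.out.println(slice(cmower, 7, 11));  // King

        // The one-argument overload runs from beginIndex to the end of the string.
        System.out.println(cmower.substring(16));  // a funny programmer.
    }
}
```

A handy consequence of the half-open range is that the length of the result is simply endIndex - beginIndex, with no "+ 1" to remember.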

02. What exactly happens when substring() is called?

Previously, we learned that strings are immutable, so when the substring() method is called, it returns a new string. The reference held by the variable cmower changes to point at that new string, as shown in the figure below.
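
If you want to check this for yourself, the following sketch keeps a second reference to the original string and shows that it is untouched after the call. The class name and the helper firstFour() are made up for this demo.

```java
public class SubstringImmutabilityDemo {
    // Hypothetical helper for this demo: the same call as in the example above.
    public static String firstFour(String s) {
        return s.substring(0, 4);
    }

    public static void main(String[] args) {
        String original = "Silent King II, a funny programmer.";
        String cmower = original;

        // substring() does not modify the receiver; cmower is re-pointed at the returned object.
        cmower = firstFour(cmower);

        System.out.println(original);            // Silent King II, a funny programmer.
        System.out.println(cmower);              // Sile
        System.out.println(cmower == original);  // false
    }
}
```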

To prove that the figure above is completely correct, let’s take a look at the source code for substring() in JDK 7.

public String(char value[], int offset, int count) {

    //check boundary

    this.value = Arrays.copyOfRange(value, offset, offset + count);

}



public String substring(int beginIndex, int endIndex) {

    //check boundary

    int subLen = endIndex - beginIndex;

    return new String(value, beginIndex, subLen);

}

substring() returns a new String object created with new String(), and when that object is constructed, Arrays.copyOfRange() copies the requested characters into a new character array.
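
The key point is that Arrays.copyOfRange() produces an independent array. The sketch below shows that behavior on a plain char[] and is not the JDK source itself; the class name and copyPart() are hypothetical names of my own.

```java
import java.util.Arrays;

public class CopyOfRangeDemo {
    // Hypothetical helper: copies count chars starting at offset,
    // using the same Arrays.copyOfRange call as the constructor quoted above.
    public static char[] copyPart(char[] value, int offset, int count) {
        return Arrays.copyOfRange(value, offset, offset + count);
    }

    public static void main(String[] args) {
        char[] source = "Silent King II".toCharArray();
        char[] part = copyPart(source, 0, 4);

        // Mutating the source afterwards does not affect the copy.
        source[0] = 'X';

        System.out.println(new String(part));  // Sile
        System.out.println(part.length);       // 4
        System.out.println(part == source);    // false
    }
}
```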

But JDK 6 is different. JDK 6? When JDK 13 is already out? Yes, because what I want to tell you is that comparing the JDK source code across versions is a helpful way to learn.

There is a saying that if you want to understand a successful person, you should not only pay attention to what he did after he became famous, but also pay attention to what he did before.

Here is the source code of substring() in JDK 6.

//JDK 6

String(int offset, int count, char value[]) {

    this.value = value;

    this.offset = offset;

    this.count = count;

}



public String substring(int beginIndex, int endIndex) {

    //check boundary

    return  new String(offset + beginIndex, endIndex - beginIndex, value);

}

The substring() method itself isn’t that different from the one in JDK 7, as it also returns a new String object via new String(). But the String() constructor is quite different. JDK 6 simply reuses the existing character array and sets two fields, offset and count, to mark the part of it that the new string uses.

PS: value is the array that actually stores the characters, offset is the index of the string’s first character in that array, and count is the number of characters in the string.

What does that mean?

Calling substring() creates a new String object, but its value field still points to the same character array in memory, as shown in the figure below.

03. Why was the constructor changed in JDK 7?

After looking at the JDK 6 and JDK 7 source code, you may wonder why JDK 7 changed it. Wouldn’t it be better for all the strings to share one character array instead of taking up new memory? Not quite.

Suppose you have a very long string, long enough to go around the earth, and you call substring() to cut out a very small part of it. Because the small string still refers to the entire long character array, that array cannot be garbage collected for as long as the small string is alive. The memory stays occupied, which hurts performance and can amount to a memory leak.

PS: A memory leak is when a program fails to release memory that is no longer in use due to negligence or error.
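
The difference can be modeled with a tiny stand-in class. This is only a sketch under my own assumptions: Slice, shareSubstring(), copySubstring(), and retainedChars() are hypothetical names, not the real String class, and they only mimic the field layout quoted above.

```java
import java.util.Arrays;

public class SharedArrayModel {
    // Hypothetical stand-in for the JDK 6 String fields quoted above; not the real class.
    public static final class Slice {
        final char[] value;  // backing array, possibly shared with other slices
        final int offset;    // index of this slice's first character in value
        final int count;     // number of characters in this slice

        Slice(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        // JDK 6 style: reuse the backing array and only adjust offset and count.
        public Slice shareSubstring(int begin, int end) {
            return new Slice(value, offset + begin, end - begin);
        }

        // JDK 7 style: copy just the requested characters into a new array.
        public Slice copySubstring(int begin, int end) {
            char[] copy = Arrays.copyOfRange(value, offset + begin, offset + end);
            return new Slice(copy, 0, copy.length);
        }

        // Size of the array this slice keeps reachable.
        public int retainedChars() {
            return value.length;
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    public static Slice of(String s) {
        char[] chars = s.toCharArray();
        return new Slice(chars, 0, chars.length);
    }

    public static void main(String[] args) {
        char[] big = new char[1_000_000];
        Arrays.fill(big, 'x');
        Slice whole = new Slice(big, 0, big.length);

        Slice shared = whole.shareSubstring(0, 4);
        Slice copied = whole.copySubstring(0, 4);

        System.out.println(shared + " retains " + shared.retainedChars()); // xxxx retains 1000000
        System.out.println(copied + " retains " + copied.retainedChars()); // xxxx retains 4
    }
}
```

Both slices print the same four characters, but the shared one keeps the whole million-character array reachable, which is exactly the situation described above.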

Before JDK 7 came along, how did people work around it? Here is the answer.

cmower = cmower.substring(0, 4) + "";

Why on earth would an extra + "" solve the memory leak problem? For those of you who don’t believe me, let me break it down.

First, we decompile the bytecode using JAD, and the above line of code reads as follows.

cmower = (new StringBuilder(String.valueOf(cmower.substring(0, 4)))).toString();

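In other words, the compiler turns the “+” operator into a StringBuilder, and StringBuilder.toString() creates a new String with its own character array. So after the line cmower = cmower.substring(0, 4) + ""; executes, cmower points to a different character array than the one used before the substring() call.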
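
As far as I know, another idiom people used on JDK 6 for the same purpose was wrapping the result in new String(...). The sketch below puts the two side by side; the class and method names are my own, and the claim about the JDK 6 copy constructor trimming the array is an assumption I have not re-verified here. On JDK 7 and later neither trick is needed, because substring() already copies.

```java
public class DetachSubstringDemo {
    // The "+ \"\"" idiom from the article: forces a new String to be built.
    public static String detachWithConcat(String s, int begin, int end) {
        return s.substring(begin, end) + "";
    }

    // Assumption: on JDK 6 the String(String) copy constructor trimmed the backing array,
    // so this was used as a more explicit alternative. Redundant on current JDKs.
    public static String detachWithCopy(String s, int begin, int end) {
        return new String(s.substring(begin, end));
    }

    public static void main(String[] args) {
        String cmower = "Silent King II, a funny programmer.";

        System.out.println(detachWithConcat(cmower, 0, 4));  // Sile
        System.out.println(detachWithCopy(cmower, 0, 4));    // Sile
    }
}
```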

PS: If you don’t understand how the “+” operator works, please refer to my previous article “Shame, Java String Concatenation has so many positions”. I won’t repeat it here, or my regular readers will give me a hard time.

04. Finally

To summarize, the substring() method itself hasn’t changed much between JDK 6 and JDK 7, but the String constructor has changed a lot. JDK 7 copies the characters into a new array, while JDK 6 shares the original one. As a result, calling substring() on a long string in JDK 6 may cause a memory leak.


Well, readers, that’s all for this article. Anyone who has read this far is an excellent programmer, and I must give you a thumbs-up 👍. If you haven’t had enough and want to read more, here are a few recommendations.

Soul-searching question: Create Java strings with “” or with the constructor?

Soul-searching question: Why are Java strings immutable?

Soul-searching question: How do I check if a Java array contains a value?

If you found this article useful, please like it, leave a comment, and follow me. Let’s build a good habit together! I have also compiled a list of must-read books for Java programmers, available for free by replying with the keyword “Java”.