Contents

  • Is split() efficient for splitting strings?
  • The JDK provides the string-splitting utility class StringTokenizer
  • A hands-on guide to implementing a more efficient string-splitting utility
  • Conclusion

Today's topic is a small but very practical one. When writing Java code that splits strings, a few simple techniques can noticeably improve performance; in the tests below, a hand-written splitter runs roughly twice as fast as split(). Let's get straight to it.

Is the commonly used split() efficient for splitting strings?

First, we build a very long comma-delimited string to experiment with by joining the numbers 0 through 9999 with commas, as shown below:

public class StringSplitTest {

    public static void main(String[] args) {
        String string = null;
        StringBuffer stringBuffer = new StringBuffer();

        int max = 10000;
        for(int i = 0; i < max; i++) {
            stringBuffer.append(i);
            if(i < max - 1) {
                stringBuffer.append(",");
            }
        }
        string = stringBuffer.toString();
    }
}
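
As an aside, the same test string can be produced without the manual comma bookkeeping. The sketch below is only an alternative illustration; the class name TestDataSketch and the method buildCsv are hypothetical and do not appear in the original code.

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class TestDataSketch {

    // Builds "0,1,2,...,(count-1)"; Collectors.joining inserts the commas
    // between elements, so no "is this the last element" check is needed.
    static String buildCsv(int count) {
        return IntStream.range(0, count)
                .mapToObj(Integer::toString)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        String csv = buildCsv(10000);
        System.out.println(csv.length());
    }
}
```

The loop with StringBuffer works just as well; this version simply makes the intent (join numbers with commas) explicit.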

Next, we time how long it takes to split this very long string 10,000 times in a loop with the basic split() method:

public class StringSplitTest {

    public static void main(String[] args) {
        String string = null;
        StringBuffer stringBuffer = new StringBuffer();

        int max = 10000;
        for(int i = 0; i < max; i++) {
            stringBuffer.append(i);
            if(i < max - 1) {
                stringBuffer.append(",");
            }
        }
        string = stringBuffer.toString();

        long start = System.currentTimeMillis();
        for(int i = 0; i < 10000; i++) {
            string.split(",");
        }
        long end = System.currentTimeMillis();
        System.out.println(end - start);
    }
}

Running this code shows that splitting the string on commas 10,000 times with split() takes more than 2000 milliseconds. The figure varies from run to run but is around 2300 milliseconds.

The JDK provides the string-splitting utility class StringTokenizer

StringTokenizer is a JDK class dedicated to splitting strings, and in this test it performs better than split(). We use it to split the same string 10,000 times and compare the results:

import java.util.StringTokenizer;

public class StringSplitTest {

    public static void main(String[] args) {
        String string = null;
        StringBuffer stringBuffer = new StringBuffer();

        int max = 10000;
        for(int i = 0; i < max; i++) {
            stringBuffer.append(i);
            if(i < max - 1) {
                stringBuffer.append(",");
            }
        }
        string = stringBuffer.toString();

        long start = System.currentTimeMillis();
        for(int i = 0; i < 10000; i++) {
            string.split(",");
        }
        long end = System.currentTimeMillis();
        System.out.println(end - start);

        start = System.currentTimeMillis();
        StringTokenizer stringTokenizer =
                new StringTokenizer(string, ",");
        for(int i = 0; i < 10000; i++) {
            while(stringTokenizer.hasMoreTokens()) {
                stringTokenizer.nextToken();
            }
            stringTokenizer = new StringTokenizer(string, ",");
        }
        end = System.currentTimeMillis();
        System.out.println(end - start);
    }
}

With StringTokenizer, hasMoreTokens() tells you whether another token remains, and nextToken() returns it. Once hasMoreTokens() returns false the tokenizer is exhausted, so a new StringTokenizer object is created for each subsequent pass over the string.

Running this test shows that splitting the string 10,000 times with StringTokenizer takes around 1900 milliseconds. Simply changing how the string is split saves 400 to 500 ms, an improvement of nearly 20%.
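
One behavioral difference is worth knowing before swapping split() for StringTokenizer: the tokenizer treats consecutive delimiters as a single separator, while split() keeps the empty field between them. The sketch below illustrates this; the class name TokenizerSketch and the helper tokenize are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerSketch {

    // Collects every token produced by StringTokenizer into a list.
    static List<String> tokenize(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "a,,b";
        // StringTokenizer skips the empty field between the two commas.
        System.out.println(tokenize(input, ","));             // [a, b]
        // split() keeps that empty field as an empty string.
        System.out.println(Arrays.asList(input.split(",")));  // [a, , b]
    }
}
```

For the test string used in this article, which has no empty fields, both approaches produce the same tokens.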

A hands-on guide to implementing a more efficient string-splitting utility

Next, we write our own string-splitting function:

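// Splits the string on commas using indexOf and substring.
// The substrings are discarded because only the splitting cost is measured.
private static void split(String string) {
  String remainString = string;
  int startIndex = 0;
  int endIndex = 0;
  while(true) {
    // Find the next comma at or after startIndex.
    endIndex = remainString.indexOf(",", startIndex);
    // indexOf returns -1 when no comma remains; 0 is a valid position.
    if(endIndex < 0) {
      break;
    }
    remainString.substring(startIndex, endIndex);
    startIndex = endIndex + 1;
  }
  // The last token follows the final comma.
  remainString.substring(startIndex);
}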

The code above is our custom splitting function. startIndex starts at 0, and each iteration of the while loop finds the index of the next comma at or after startIndex; that index is endIndex. The substring between startIndex and endIndex is one token. startIndex then advances to endIndex + 1, so the next iteration extracts the token before the following comma. When no further comma is found, the loop ends. Let's add this function to the test, as follows:

import java.util.StringTokenizer;

public class StringSplitTest {

    public static void main(String[] args) {
        String string = null;
        StringBuffer stringBuffer = new StringBuffer();

        int max = 10000;
        for(int i = 0; i < max; i++) {
            stringBuffer.append(i);
            if(i < max - 1) {
                stringBuffer.append(",");
            }
        }
        string = stringBuffer.toString();

        long start = System.currentTimeMillis();
        for(int i = 0; i < 10000; i++) {
            string.split(",");
        }
        long end = System.currentTimeMillis();
        System.out.println(end - start);

        start = System.currentTimeMillis();
        StringTokenizer stringTokenizer =
                new StringTokenizer(string, ",");
        for(int i = 0; i < 10000; i++) {
            while(stringTokenizer.hasMoreTokens()) {
                stringTokenizer.nextToken();
            }
            stringTokenizer = new StringTokenizer(string, ",");
        }
        end = System.currentTimeMillis();
        System.out.println(end - start);

        start = System.currentTimeMillis();
        for(int i = 0; i < 10000; i++) {
            split(string);
        }
        end = System.currentTimeMillis();
        System.out.println(end - start);
    }

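    // Splits the string on commas using indexOf and substring.
    // The substrings are discarded because only the splitting cost is measured.
    private static void split(String string) {
        String remainString = string;
        int startIndex = 0;
        int endIndex = 0;
        while(true) {
            // Find the next comma at or after startIndex.
            endIndex = remainString.indexOf(",", startIndex);
            // indexOf returns -1 when no comma remains; 0 is a valid position.
            if(endIndex < 0) {
                break;
            }
            remainString.substring(startIndex, endIndex);
            startIndex = endIndex + 1;
        }
        // The last token follows the final comma.
        remainString.substring(startIndex);
    }
}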
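
The split function in the listing above throws each substring away, which is fine for timing but not for real use. A minimal sketch of a variant that returns the tokens might look like the following; the class name SplitSketch, the method splitToList, and the choice to keep empty fields are assumptions made for illustration, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {

    // Splits input on a single delimiter character using indexOf/substring.
    // Empty fields are kept, including a trailing one.
    static List<String> splitToList(String input, char delimiter) {
        List<String> parts = new ArrayList<>();
        int startIndex = 0;
        int endIndex;
        // indexOf returns -1 once no delimiter remains.
        while ((endIndex = input.indexOf(delimiter, startIndex)) >= 0) {
            parts.add(input.substring(startIndex, endIndex));
            startIndex = endIndex + 1;
        }
        // Whatever follows the last delimiter is the final token.
        parts.add(input.substring(startIndex));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitToList("0,1,2", ','));  // [0, 1, 2]
        System.out.println(splitToList(",a,", ','));    // [, a, ]
    }
}
```

Note that this keeps trailing empty fields, whereas split(",") drops them, so the two are not drop-in replacements for every input.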

Conclusion

After running the code above, our own splitting function takes about 1000 ms. That is roughly 2.3 times as fast as String.split() and nearly twice as fast as StringTokenizer. What if the strings were larger? The gap is likely to widen as the string grows, so the benefit may be even greater for longer inputs.
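
Timings taken with a single currentTimeMillis() loop are sensitive to JIT warm-up and to the JVM optimizing away results that are never used, so the exact numbers will differ between machines and JDK versions. The sketch below shows one way the comparison could be made a little more robust, assuming a warm-up pass and a checksum that consumes every token. All names here (SplitTimingSketch, viaSplit, viaTokenizer, viaIndexOf, timeMillis, sink) are hypothetical, and a dedicated harness such as JMH would be the more rigorous choice.

```java
import java.util.StringTokenizer;
import java.util.function.ToIntFunction;

class SplitTimingSketch {

    // Written after each run so the computed checksums stay observable.
    static volatile long sink;

    // Each variant returns the total length of all tokens, which makes it
    // harder for the JIT to discard the splitting work as unused.
    static int viaSplit(String input) {
        int total = 0;
        for (String part : input.split(",")) {
            total += part.length();
        }
        return total;
    }

    static int viaTokenizer(String input) {
        int total = 0;
        StringTokenizer tokenizer = new StringTokenizer(input, ",");
        while (tokenizer.hasMoreTokens()) {
            total += tokenizer.nextToken().length();
        }
        return total;
    }

    static int viaIndexOf(String input) {
        int total = 0;
        int startIndex = 0;
        int endIndex;
        while ((endIndex = input.indexOf(',', startIndex)) >= 0) {
            total += input.substring(startIndex, endIndex).length();
            startIndex = endIndex + 1;
        }
        return total + input.substring(startIndex).length();
    }

    // Runs the splitter `rounds` times and returns the elapsed milliseconds.
    static long timeMillis(ToIntFunction<String> splitter, String input, int rounds) {
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            checksum += splitter.applyAsInt(input);
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        sink = checksum;
        return elapsedMillis;
    }

    public static void main(String[] args) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            if (i > 0) {
                builder.append(',');
            }
            builder.append(i);
        }
        String input = builder.toString();

        // Warm-up pass: the timings are discarded.
        timeMillis(SplitTimingSketch::viaSplit, input, 1000);
        timeMillis(SplitTimingSketch::viaTokenizer, input, 1000);
        timeMillis(SplitTimingSketch::viaIndexOf, input, 1000);

        // Measured pass, using the same 10,000 rounds as the article.
        System.out.println("split:           " + timeMillis(SplitTimingSketch::viaSplit, input, 10000) + " ms");
        System.out.println("StringTokenizer: " + timeMillis(SplitTimingSketch::viaTokenizer, input, 10000) + " ms");
        System.out.println("indexOf:         " + timeMillis(SplitTimingSketch::viaIndexOf, input, 10000) + " ms");
    }
}
```

Because all three variants return the same checksum for the same input, the checksum also doubles as a quick sanity check that they split the string identically.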
