JDK source code: String

lastIndexOf method

This method returns the index of the last occurrence of the specified character or substring in the string, or -1 if there is none. It is overloaded: you can pass an int (a code point) or a String, each optionally with a starting position from which the search runs backwards. The work is dispatched to StringLatin1 or StringUTF16 depending on the string's coder.

public int lastIndexOf(int ch) {
        return lastIndexOf(ch, length() - 1);
    }
    
public int lastIndexOf(int ch, int fromIndex) {
        return isLatin1() ? StringLatin1.lastIndexOf(value, ch, fromIndex)
                          : StringUTF16.lastIndexOf(value, ch, fromIndex);
    }
    
public int lastIndexOf(String str) {
        return lastIndexOf(str, length());
    }
    
public int lastIndexOf(String str, int fromIndex) {
        return lastIndexOf(value, coder(), length(), str, fromIndex);
    }
    
static int lastIndexOf(byte[] src, byte srcCoder, int srcCount,
                           String tgtStr, int fromIndex) {
        byte[] tgt = tgtStr.value;
        byte tgtCoder = tgtStr.coder();
        int tgtCount = tgtStr.length();
        int rightIndex = srcCount - tgtCount;
        if (fromIndex > rightIndex) {
            fromIndex = rightIndex;
        }
        if (fromIndex < 0) {
            return -1;
        }
        if (tgtCount == 0) {
            return fromIndex;
        }
        if (srcCoder == tgtCoder) {
            return srcCoder == LATIN1
                ? StringLatin1.lastIndexOf(src, srcCount, tgt, tgtCount, fromIndex)
                : StringUTF16.lastIndexOf(src, srcCount, tgt, tgtCount, fromIndex);
        }
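        if (srcCoder == LATIN1) {
            // The target is UTF16, so it cannot occur in a Latin1 source.
            return -1;
        }
        // The source is UTF16 and the target is Latin1.
        return StringUTF16.lastIndexOfLatin1(src, srcCount, tgt, tgtCount, fromIndex);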
    }
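
As a quick illustration of the overloads above, the following sketch calls each one on a sample string. The class name LastIndexOfDemo and the sample text are only for illustration.

```java
class LastIndexOfDemo {
    public static void main(String[] args) {
        String s = "hello world";                    // length 11
        System.out.println(s.lastIndexOf('o'));      // 7: search starts at the last index
        System.out.println(s.lastIndexOf('o', 6));   // 4: search starts at index 6 and runs backwards
        System.out.println(s.lastIndexOf("lo"));     // 3
        System.out.println(s.lastIndexOf("xyz"));    // -1: not found
        System.out.println(s.lastIndexOf(""));       // 11: the empty string matches at fromIndex
    }
}
```

The last call shows the tgtCount == 0 branch: with an empty target the clamped fromIndex, here length(), is returned directly.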

For Latin1-encoded strings the logic is:

Check whether the int value fits in a byte by shifting it right by 8 bits. If the result is 0, every bit above the low 8 is 0; otherwise the character cannot occur in a Latin1 string and -1 is returned.
Take the starting offset as Math.min(fromIndex, value.length - 1).
Scan backwards from that offset and return the index as soon as the byte matches.
Return -1 if no match is found.

public static int lastIndexOf(final byte[] value, int ch, int fromIndex) {
        if (!canEncode(ch)) {
            return -1;
        }
        int off = Math.min(fromIndex, value.length - 1);
        for (; off >= 0; off--) {
            if (value[off] == (byte)ch) {
                return off;
            }
        }
        return -1;
    }
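
The Latin1 search can be tried outside the JDK with a small standalone sketch. The class Latin1SearchSketch is hypothetical, and its canEncode is written from the description above (shift right by 8 bits and compare with 0) rather than copied from StringLatin1.

```java
import java.nio.charset.StandardCharsets;

class Latin1SearchSketch {
    // Assumed form of the check: nothing may remain above the low 8 bits.
    static boolean canEncode(int ch) {
        return (ch >>> 8) == 0;
    }

    // Backward scan over Latin1 bytes, following the steps listed above.
    static int lastIndexOf(byte[] value, int ch, int fromIndex) {
        if (!canEncode(ch)) {
            return -1;
        }
        for (int off = Math.min(fromIndex, value.length - 1); off >= 0; off--) {
            if (value[off] == (byte) ch) {
                return off;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] bytes = "banana".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(canEncode('a'));                  // true
        System.out.println(canEncode(0x4E2D));               // false
        System.out.println(lastIndexOf(bytes, 'a', 5));      // 5
        System.out.println(lastIndexOf(bytes, 'n', 3));      // 2
        System.out.println(lastIndexOf(bytes, 0x4E2D, 5));   // -1
    }
}
```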

UTF16-encoded strings are handled similarly, but Unicode has supplementary planes beyond the Basic Multilingual Plane (BMP). The value passed in is an int (4 bytes), so it can hold any code point. A code point within the BMP is compared as a single 2-byte char; a code point beyond the BMP is stored as a high surrogate followed by a low surrogate, four bytes in total, and both must match.
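
A short sketch of the supplementary-plane case, using only public API. The code point 0x1F600 and the class name SupplementarySearchDemo are arbitrary choices for illustration.

```java
class SupplementarySearchDemo {
    public static void main(String[] args) {
        int cp = 0x1F600;                          // a code point outside the BMP
        char[] pair = Character.toChars(cp);       // high surrogate followed by low surrogate
        String s = "a" + new String(pair) + "b" + new String(pair);

        System.out.println(Character.isBmpCodePoint(cp));  // false
        System.out.println(pair.length);                   // 2
        System.out.println(s.length());                    // 6
        System.out.println(s.lastIndexOf(cp));             // 4: index of the high surrogate
        System.out.println(s.lastIndexOf(cp, 3));          // 1
    }
}
```

The returned index always points at the high surrogate, which is why the second occurrence is reported at 4 even though the string has length 6.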

When searching for a substring, candidate positions are tried from fromIndex backwards, and a position is returned as soon as the whole substring matches there.

The substring method

This method returns a substring of the string. There are two overloads: one takes only the start index, the other takes a start index (inclusive) and an end index (exclusive). The source makes the logic clear: check the bounds, compute the length of the substring, return this if the range covers the whole string, and otherwise create a new String through the Latin1 or UTF16 helper.

public String substring(int beginIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        int subLen = length() - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        if (beginIndex == 0) {
            return this;
        }
        return isLatin1() ? StringLatin1.newString(value, beginIndex, subLen)
                          : StringUTF16.newString(value, beginIndex, subLen);
    }
    
public String substring(int beginIndex, int endIndex) {
        int length = length();
        checkBoundsBeginEnd(beginIndex, endIndex, length);
        int subLen = endIndex - beginIndex;
        if (beginIndex == 0 && endIndex == length) {
            return this;
        }
        return isLatin1() ? StringLatin1.newString(value, beginIndex, subLen)
                          : StringUTF16.newString(value, beginIndex, subLen);
    }
    
// StringLatin1: copy the selected byte range as-is
public static String newString(byte[] val, int index, int len) {
        return new String(Arrays.copyOfRange(val, index, index + len),
                          LATIN1);
    }
    
// StringUTF16: try to compress the range to Latin1 first, else copy two bytes per char
public static String newString(byte[] val, int index, int len) {
        if (String.COMPACT_STRINGS) {
            byte[] buf = compress(val, index, len);
            if (buf != null) {
                return new String(buf, LATIN1);
            }
        }
        int last = index + len;
        return new String(Arrays.copyOfRange(val, index << 1, last << 1), UTF16);
    }
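
The behaviour described above can be observed from the outside with a small sketch. SubstringDemo is an illustrative name; the identity checks rely on the return this branches shown in the source.

```java
class SubstringDemo {
    public static void main(String[] args) {
        String s = "hello";
        System.out.println(s.substring(2));          // llo
        System.out.println(s.substring(1, 3));       // el (end index is exclusive)
        System.out.println(s.substring(0) == s);     // true: the whole range returns this
        System.out.println(s.substring(0, 5) == s);  // true

        // A UTF16 string whose tail contains only Latin1 characters.
        String mixed = "\u4E2D\u6587abc";
        System.out.println(mixed.substring(2));      // abc

        try {
            s.substring(6);                          // begin index beyond the length
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("out of bounds");
        }
    }
}
```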

subSequence method

Equivalent to the substring method.

public CharSequence subSequence(int beginIndex, int endIndex) {
        return this.substring(beginIndex, endIndex);
    }

concat method

This method appends the specified string to the end of this string. The logic is:

If the length of the string to append is 0, return this string itself.
If both strings have the same coder, copy both byte arrays into a new array with Arrays.copyOf and System.arraycopy and return a new String.
If the coders differ, allocate a UTF16 byte array, copy both values into it as UTF16, and return a new String.

public String concat(String str) {
        int olen = str.length();
        if (olen == 0) {
            return this;
        }
        if (coder() == str.coder()) {
            byte[] val = this.value;
            byte[] oval = str.value;
            int len = val.length + oval.length;
            byte[] buf = Arrays.copyOf(val, len);
            System.arraycopy(oval, 0, buf, val.length, oval.length);
            return new String(buf, coder);
        }
        int len = length();
        byte[] buf = StringUTF16.newBytesFor(len + olen);
        getBytes(buf, 0, UTF16);
        str.getBytes(buf, len, UTF16);
        return new String(buf, UTF16);
    }
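
The three branches can be exercised with a brief sketch. ConcatDemo and the sample strings are illustrative only.

```java
class ConcatDemo {
    public static void main(String[] args) {
        String latin = "abc";
        String cjk = "\u4E2D\u6587";                        // needs UTF16

        System.out.println(latin.concat("") == latin);     // true: empty argument returns this
        System.out.println(latin.concat("def"));           // abcdef (same coder)
        String mixed = latin.concat(cjk);                  // different coders
        System.out.println(mixed.length());                // 5
        System.out.println(mixed.equals("abc\u4E2D\u6587")); // true
    }
}
```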

The replace method

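This method replaces every occurrence of one character with another, with separate implementations for the two encodings. The helper returns null when oldChar does not occur, in which case this is returned unchanged.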

public String replace(char oldChar, char newChar) {
        if (oldChar != newChar) {
            String ret = isLatin1() ? StringLatin1.replace(value, oldChar, newChar)
                                    : StringUTF16.replace(value, oldChar, newChar);
            if (ret != null) {
                return ret;
            }
        }
        return this;
    }

public static String replace(byte[] value, char oldChar, char newChar) {
        if (canEncode(oldChar)) {
            int len = value.length;
            int i = -1;
            while (++i < len) {
                if (value[i] == (byte)oldChar) {
                    break;
                }
            }
            if (i < len) {
                if (canEncode(newChar)) {
                    byte buf[] = new byte[len];
                    for (int j = 0; j < i; j++) {   
                        buf[j] = value[j];
                    }
                    while (i < len) {
                        byte c = value[i];
                        buf[i] = (c == (byte)oldChar) ? (byte)newChar : c;
                        i++;
                    }
                    return new String(buf, LATIN1);
                } else {
                    byte[] buf = StringUTF16.newBytesFor(len);
                    inflate(value, 0, buf, 0, i);
                    while (i < len) {
                        char c = (char)(value[i] & 0xff);
                        StringUTF16.putChar(buf, i, (c == oldChar) ? newChar : c);
                        i++;
                    }
                    return new String(buf, UTF16);
                }
            }
        }
        return null;
    }
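
A small sketch of the char overload, covering the no-match case and the case where the new character cannot be encoded in Latin1. ReplaceCharDemo is an illustrative name.

```java
class ReplaceCharDemo {
    public static void main(String[] args) {
        String s = "hello";
        System.out.println(s.replace('l', 'L'));          // heLLo
        System.out.println(s.replace('x', 'y') == s);     // true: oldChar does not occur
        System.out.println(s.replace('l', 'l') == s);     // true: oldChar equals newChar

        // The replacement is outside Latin1, so the result has to be a UTF16 string.
        String widened = s.replace('l', '\u4E2D');
        System.out.println(widened.length());             // 5
        System.out.println(widened.equals("he\u4E2D\u4E2Do")); // true
    }
}
```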

public String replace(CharSequence target, CharSequence replacement) {
        String tgtStr = target.toString();
        String replStr = replacement.toString();
        int j = indexOf(tgtStr);
        if (j < 0) {
            return this;
        }
        int tgtLen = tgtStr.length();
        int tgtLen1 = Math.max(tgtLen, 1);
        int thisLen = length();

        int newLenHint = thisLen - tgtLen + replStr.length();
        if (newLenHint < 0) {
            throw new OutOfMemoryError();
        }
        StringBuilder sb = new StringBuilder(newLenHint);
        int i = 0;
        do {
            sb.append(this, i, j).append(replStr);
            i = j + tgtLen;
        } while (j < thisLen && (j = indexOf(tgtStr, j + tgtLen1)) > 0);
        return sb.append(this, i, thisLen).toString();
    }
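
The loop in the CharSequence overload is easier to follow with concrete inputs. The sketch below (ReplaceSequenceDemo is an illustrative name) includes the empty-target case, where tgtLen1 = Math.max(tgtLen, 1) keeps the search moving forward by one character at a time.

```java
class ReplaceSequenceDemo {
    public static void main(String[] args) {
        System.out.println("a.b.c".replace(".", "-"));   // a-b-c: the target is literal, not a regex
        System.out.println("aaa".replace("aa", "b"));    // ba: matches do not overlap
        System.out.println("abc".replace("", "-"));      // -a-b-c-: the empty target matches at every position
        System.out.println("abc".replace("x", "y"));     // abc: no match
    }
}
```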

replaceFirst and replaceAll

Both are implemented with regular expressions, through Pattern and Matcher.

public String replaceFirst(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceFirst(replacement);
    }
    
public String replaceAll(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceAll(replacement);
    }
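
Because the first argument is compiled as a regex, these methods behave differently from replace when it contains metacharacters. A short sketch (RegexReplaceDemo is an illustrative name):

```java
import java.util.regex.Pattern;

class RegexReplaceDemo {
    public static void main(String[] args) {
        String s = "a.b.c";
        System.out.println(s.replace(".", "-"));                    // a-b-c: literal target
        System.out.println(s.replaceAll(".", "-"));                 // -----: '.' matches any character
        System.out.println(s.replaceAll("\\.", "-"));               // a-b-c: escaped dot
        System.out.println(s.replaceFirst("\\.", "-"));             // a-b.c: only the first match
        System.out.println(s.replaceAll(Pattern.quote("."), "-"));  // a-b-c: quoted literal
    }
}
```

Pattern.quote is one way to pass arbitrary text as a literal pattern without escaping it by hand.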

The split method

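This method splits the string around matches of a regular expression. The implementation first checks whether the regex engine can be avoided: if the regex is a single character that is not a regex metacharacter, or a backslash followed by a character that is not an ASCII letter or digit, and that character is not a surrogate, the string is split directly with indexOf and substring. Otherwise the work is delegated to Pattern.compile(regex).split(this, limit).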

public String[] split(String regex) {
        return split(regex, 0);
    }
    
public String[] split(String regex, int limit) {
        char ch = 0;
        // Fast path: a single non-metacharacter, or a backslash followed by
        // a character that is neither an ASCII digit nor an ASCII letter.
        if (((regex.length() == 1 &&
             ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
             (regex.length() == 2 &&
              regex.charAt(0) == '\\' &&
              (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
              ((ch-'a')|('z'-ch)) < 0 &&
              ((ch-'A')|('Z'-ch)) < 0)) &&
            (ch < Character.MIN_HIGH_SURROGATE ||
             ch > Character.MAX_LOW_SURROGATE))
        {
            int off = 0;
            int next = 0;
            boolean limited = limit > 0;
            ArrayList<String> list = new ArrayList<>();
            while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {    // last piece allowed by the limit
                    int last = length();
                    list.add(substring(off, last));
                    off = last;
                    break;
                }
            }
            // No match at all: return the string itself
            if (off == 0)
                return new String[]{this};

            // Add the remaining segment
            if (!limited || list.size() < limit)
                list.add(substring(off, length()));

            // With limit == 0, trailing empty strings are dropped
            int resultSize = list.size();
            if (limit == 0) {
                while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                    resultSize--;
                }
            }
            String[] result = new String[resultSize];
            return list.subList(0, resultSize).toArray(result);
        }
        return Pattern.compile(regex).split(this, limit);
    }
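
The effect of limit and of regex metacharacters can be seen in a short sketch (SplitDemo is an illustrative name). The comma cases take the fast path; "." on its own is a metacharacter and goes through Pattern.

```java
import java.util.Arrays;

class SplitDemo {
    public static void main(String[] args) {
        String s = "a,b,,";
        System.out.println(Arrays.toString(s.split(",")));      // [a, b]: trailing empty strings dropped
        System.out.println(Arrays.toString(s.split(",", -1)));  // [a, b, , ]: negative limit keeps them
        System.out.println(Arrays.toString(s.split(",", 2)));   // [a, b,,]: at most two pieces

        System.out.println("a.b".split(".").length);            // 0: '.' matches every character
        System.out.println(Arrays.toString("a.b".split("\\."))); // [a, b]: escaped dot
        System.out.println(Arrays.toString("abc".split(",")));  // [abc]: no match returns the string itself
    }
}
```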

The join method

Joins a sequence of elements with a delimiter; the elements can be passed as varargs or as an Iterable. The work is delegated to the StringJoiner class.

public static String join(CharSequence delimiter, CharSequence... elements) {
        Objects.requireNonNull(delimiter);
        Objects.requireNonNull(elements);
        StringJoiner joiner = new StringJoiner(delimiter);
        for (CharSequence cs: elements) {
            joiner.add(cs);
        }
        return joiner.toString();
    }
    
public static String join(CharSequence delimiter,
            Iterable<? extends CharSequence> elements) {
        Objects.requireNonNull(delimiter);
        Objects.requireNonNull(elements);
        StringJoiner joiner = new StringJoiner(delimiter);
        for (CharSequence cs: elements) {
            joiner.add(cs);
        }
        return joiner.toString();
    }
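
Both overloads in use, as a minimal sketch (JoinDemo is an illustrative name). A null element is rendered as the text "null" because add goes through String.valueOf.

```java
import java.util.Arrays;
import java.util.List;

class JoinDemo {
    public static void main(String[] args) {
        System.out.println(String.join("-", "a", "b", "c"));    // a-b-c
        List<String> items = Arrays.asList("x", "y");
        System.out.println(String.join(", ", items));           // x, y
        System.out.println(String.join("-", "a", null, "c"));   // a-null-c
        System.out.println(String.join("-").isEmpty());         // true: no elements
    }
}
```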

The add and toString methods of StringJoiner are as follows. The add method stores each element in a String array, doubling the array when it is full, and keeps a running total of the length of the elements and delimiters in len. The toString method copies the prefix, the elements separated by the delimiter, and the suffix into one char array and returns the resulting string.

public StringJoiner add(CharSequence newElement) {
        final String elt = String.valueOf(newElement);
        if (elts == null) {
            elts = new String[8];
        } else {
            if (size == elts.length)
                elts = Arrays.copyOf(elts, 2 * size);
            len += delimiter.length();
        }
        len += elt.length();
        elts[size++] = elt;
        return this;
    }

public String toString() {
        final String[] elts = this.elts;
        if (elts == null && emptyValue != null) {
            return emptyValue;
        }
        final int size = this.size;
        final int addLen = prefix.length() + suffix.length();
        if (addLen == 0) {
            compactElts();
            return size == 0 ? "" : elts[0];
        }
        final String delimiter = this.delimiter;
        final char[] chars = new char[len + addLen];
        int k = getChars(prefix, chars, 0);
        if (size > 0) {
            k += getChars(elts[0], chars, k);
            for (int i = 1; i < size; i++) {
                k += getChars(delimiter, chars, k);
                k += getChars(elts[i], chars, k);
            }
        }
        k += getChars(suffix, chars, k);
        return jla.newStringUnsafe(chars);
    }
    
private void compactElts() {
        if (size > 1) {
            final char[] chars = new char[len];
            int i = 1, k = getChars(elts[0], chars, 0);
            do {
                k += getChars(delimiter, chars, k);
                k += getChars(elts[i], chars, k);
                elts[i] = null;
            } while (++i < size);
            size = 1;
            elts[0] = jla.newStringUnsafe(chars);
        }
    }
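
StringJoiner can also be used directly, which makes the prefix, suffix and emptyValue branches of toString visible. The helper render and the class StringJoinerDemo below are hypothetical names for this sketch.

```java
import java.util.StringJoiner;

class StringJoinerDemo {
    // Hypothetical helper: joins the items as a bracketed, comma-separated list.
    static String render(String... items) {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        joiner.setEmptyValue("EMPTY");   // returned by toString while nothing has been added
        for (String item : items) {
            joiner.add(item);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(render());            // EMPTY
        System.out.println(render("a"));         // [a]
        System.out.println(render("a", "b"));    // [a, b]
    }
}
```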

toLowerCase method

Converts the string to lowercase. Both encodings are handled; the Latin1 path is taken as the example here.

public String toLowerCase(Locale locale) {
        return isLatin1() ? StringLatin1.toLowerCase(this, value, locale)
                          : StringUTF16.toLowerCase(this, value, locale);
    }
    
public String toLowerCase() {
        return toLowerCase(Locale.getDefault());
    }

Iterate through the byte array, convert each byte to an int, and use Character.toLowerCase to find the first character that is not already lowercase. If there is none, return the string itself.
If the locale's language is tr, az, or lt, hand over to toLowerCaseEx, because these languages have special case mappings whose results may not fit in Latin1.
Otherwise, use System.arraycopy to copy the leading bytes that are already lowercase into a new array, then lowercase the remaining characters one by one.
Create a new String from the new array and return it.

public static String toLowerCase(String str, byte[] value, Locale locale) {
        if (locale == null) {
            throw new NullPointerException();
        }
        int first;
        final int len = value.length;
        for (first = 0 ; first < len; first++) {
            int cp = value[first] & 0xff;
            if (cp != Character.toLowerCase(cp)) {
                break;
            }
        }
        if (first == len)
            return str;
        String lang = locale.getLanguage();
        if (lang == "tr" || lang == "az" || lang == "lt") {
            return toLowerCaseEx(str, value, first, locale, true);
        }
        byte[] result = new byte[len];
        System.arraycopy(value, 0, result, 0, first);  
        for (int i = first; i < len; i++) {
            int cp = value[i] & 0xff;
            cp = Character.toLowerCase(cp);
            if (!canEncode(cp)) {
                return toLowerCaseEx(str, value, first, locale, false);
            }
            result[i] = (byte)cp;
        }
        return new String(result, LATIN1);
    }
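
The special treatment of tr can be seen with a Turkish locale, where the capital I lowercases to the dotless ı (U+0131), a character outside Latin1. The class name LowerCaseDemo is illustrative.

```java
import java.util.Locale;

class LowerCaseDemo {
    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        String s = "TITLE";

        System.out.println(s.toLowerCase(Locale.ROOT));                    // title
        System.out.println(s.toLowerCase(turkish).equals("t\u0131tle"));   // true: dotless i
        System.out.println((int) s.toLowerCase(turkish).charAt(1));        // 305, i.e. U+0131

        String lower = "already lower";
        System.out.println(lower.toLowerCase(Locale.ROOT) == lower);       // true: nothing to change
    }
}
```

Passing an explicit locale such as Locale.ROOT avoids surprises when the default locale of the machine happens to be one of these languages.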

toUpperCase method

Converts the string to uppercase, with the same logic as the conversion to lowercase above.

public String toUpperCase(Locale locale) {
        return isLatin1() ? StringLatin1.toUpperCase(this, value, locale)
                          : StringUTF16.toUpperCase(this, value, locale);
    }
    
public String toUpperCase() {
        return toUpperCase(Locale.getDefault());
    }
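
Uppercasing has its own cases where a Latin1 input does not produce a Latin1 result of the same length. The sketch below uses two such characters; UpperCaseDemo is an illustrative name.

```java
import java.util.Locale;

class UpperCaseDemo {
    public static void main(String[] args) {
        // U+00DF (sharp s) is a Latin1 character that uppercases to two characters.
        System.out.println("stra\u00DFe".toUpperCase(Locale.ROOT));            // STRASSE

        // U+00FF is a Latin1 character whose uppercase form U+0178 is outside Latin1.
        System.out.println("\u00FF".toUpperCase(Locale.ROOT).equals("\u0178")); // true

        System.out.println("abc".toUpperCase(Locale.ROOT));                    // ABC
    }
}
```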

The trim method

Removes leading and trailing whitespace from the string. Both encodings are handled; take Latin1 as an example:

public String trim() {
        String ret = isLatin1() ? StringLatin1.trim(value)
                                : StringUTF16.trim(value);
        return ret == null ? this : ret;
    }

Get the length of the byte array.
Scan from the start to count how many characters to skip; every character whose code is less than or equal to that of a space (0x20) counts as whitespace.
Scan from the end to count how many characters to skip there.
If anything was skipped at either end, create a new String from the remaining range; otherwise return null so that trim returns this.

public static String trim(byte[] value) {
        int len = value.length;
        int st = 0;
        while ((st < len) && ((value[st] & 0xff) <= ' ')) {
            st++;
        }
        while ((st < len) && ((value[len - 1] & 0xff) <= ' ')) {
            len--;
        }
        return ((st > 0) || (len < value.length)) ?
            newString(value, st, len - st) : null;
    }
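
A brief sketch of what the "less than or equal to a space" rule means in practice (TrimDemo is an illustrative name). Control characters such as tab and newline are removed, while Unicode spaces above U+0020 are not.

```java
class TrimDemo {
    public static void main(String[] args) {
        System.out.println("\t hi \n".trim());              // hi
        System.out.println("   ".trim().isEmpty());         // true

        String clean = "hi";
        System.out.println(clean.trim() == clean);          // true: nothing to trim returns this

        // U+2003 (em space) is greater than ' ', so trim leaves it in place.
        System.out.println("\u2003hi".trim().length());     // 3
    }
}
```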

The toString method

Return this directly.

public String toString() {
        return this;
    }

toCharArray method

Converts the string to a char array. For Latin1 the core is (char)(src[srcOff++] & 0xff), which widens each byte to a char as an unsigned value.

public char[] toCharArray() {
        return isLatin1() ? StringLatin1.toChars(value)
                          : StringUTF16.toChars(value);
    }

public static char[] toChars(byte[] value) {
        char[] dst = new char[value.length];
        inflate(value, 0, dst, 0, value.length);
        return dst;
    }
    
public static void inflate(byte[] src, int srcOff, char[] dst, int dstOff, int len) {
        for (int i = 0; i < len; i++) {
            dst[dstOff++] = (char)(src[srcOff++] & 0xff);
        }
    }
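
The & 0xff matters because byte is signed in Java. The following sketch compares the masked and unmasked conversions for the Latin1 byte 0xE9; the two helper methods and the class InflateCharDemo are hypothetical and exist only to show the difference.

```java
class InflateCharDemo {
    // Conversion as used in inflate: mask first, then cast.
    static char masked(byte b) {
        return (char) (b & 0xff);
    }

    // Conversion without the mask: the negative byte is sign-extended.
    static char unmasked(byte b) {
        return (char) b;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE9;                                   // stored as -23
        System.out.println(Integer.toHexString(masked(b)));     // e9
        System.out.println(Integer.toHexString(unmasked(b)));   // ffe9
        System.out.println("caf\u00E9".toCharArray()[3] == '\u00E9'); // true
    }
}
```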

The format method

Formats the string by delegating to Formatter, optionally with an explicit Locale.

public static String format(String format, Object... args) {
        return new Formatter().format(format, args).toString();
    }

public static String format(Locale l, String format, Object... args) {
        return new Formatter(l).format(format, args).toString();
    }
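
The Locale overload changes how numbers are rendered, which a short sketch makes concrete. FormatDemo and the sample values are illustrative.

```java
import java.util.Locale;

class FormatDemo {
    public static void main(String[] args) {
        System.out.println(String.format("%s has %d items", "cart", 3));   // cart has 3 items
        System.out.println(String.format("%05d", 42));                      // 00042
        System.out.println(String.format(Locale.US, "%,.2f", 1234.5));      // 1,234.50
        System.out.println(String.format(Locale.GERMANY, "%,.2f", 1234.5)); // 1.234,50
    }
}
```

The overload without a Locale formats according to the default format locale, so output such as decimal separators can differ between machines.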

The valueOf method

Converts the argument to a String. It is overloaded for many parameter types:

Object: returns "null" if the argument is null, otherwise obj.toString().
char array: creates a new String from the array, optionally from a sub-range.
boolean: returns the string "true" or "false".
char: creates a Latin1-encoded String if compact strings are enabled and the char can be encoded in Latin1, otherwise a UTF16 one.
int: Integer.toString(i).
long: Long.toString(l).
float: Float.toString(f).
double: Double.toString(d).

public static String valueOf(Object obj) {
        return (obj == null) ? "null" : obj.toString();
    }
    
public static String valueOf(char data[]) {
        return new String(data);
    }
    
public static String valueOf(char data[], int offset, int count) {
        return new String(data, offset, count);
    }
    
public static String valueOf(boolean b) {
        return b ? "true" : "false";
    }
    
public static String valueOf(char c) {
        if (COMPACT_STRINGS && StringLatin1.canEncode(c)) {
            return new String(StringLatin1.toBytes(c), LATIN1);
        }
        return new String(StringUTF16.toBytes(c), UTF16);
    }
    
public static String valueOf(int i) {
        return Integer.toString(i);
    }
    
public static String valueOf(long l) {
        return Long.toString(l);
    }
    
public static String valueOf(float f) {
        return Float.toString(f);
    }
    
public static String valueOf(double d) {
        return Double.toString(d);
    }
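
One call per overload, as a minimal sketch (ValueOfDemo is an illustrative name). Note the difference between a null Object, which becomes the text "null", and a null char array, which goes to new String(data) and fails.

```java
class ValueOfDemo {
    public static void main(String[] args) {
        System.out.println(String.valueOf((Object) null));                       // null
        System.out.println(String.valueOf(new char[] {'a', 'b', 'c'}));          // abc
        System.out.println(String.valueOf(new char[] {'a', 'b', 'c'}, 1, 2));    // bc
        System.out.println(String.valueOf(true));                                // true
        System.out.println(String.valueOf('c'));                                 // c
        System.out.println(String.valueOf(42));                                  // 42
        System.out.println(String.valueOf(42L));                                 // 42
        System.out.println(String.valueOf(1.5f));                                // 1.5
        System.out.println(String.valueOf(2.5d));                                // 2.5

        try {
            String.valueOf((char[]) null);       // resolves to valueOf(char[])
        } catch (NullPointerException e) {
            System.out.println("NullPointerException for a null char array");
        }
    }
}
```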

intern method

A native method; String.intern() is implemented inside the JVM. It returns the canonical instance of the string from the string pool.

public native String intern();
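
The effect of the pool is visible through reference comparison. InternDemo is an illustrative name for this sketch.

```java
class InternDemo {
    public static void main(String[] args) {
        String literal = "jdk";                                    // string literals are interned
        String built = new String(new char[] {'j', 'd', 'k'});     // a distinct object with equal content

        System.out.println(built.equals(literal));      // true: same characters
        System.out.println(built == literal);           // false: different objects
        System.out.println(built.intern() == literal);  // true: intern returns the pooled instance
    }
}
```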

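getBytes method

Copies the string's bytes into a destination byte array. If the string's coder matches the requested coder, System.arraycopy is used directly. Otherwise the string is Latin1 and the destination is UTF16, so each byte is inflated to a char and both its high and low bytes are written to the destination.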

void getBytes(byte dst[], int dstBegin, byte coder) {
        if (coder() == coder) {
            System.arraycopy(value, 0, dst, dstBegin << coder, value.length);
        } else {
            StringLatin1.inflate(value, 0, dst, dstBegin, value.length);
        }
    }
    
public static void inflate(byte[] src, int srcOff, byte[] dst, int dstOff, int len) {
        checkBoundsOffCount(dstOff, len, dst);
        for (int i = 0; i < len; i++) {
            putChar(dst, dstOff++, src[srcOff++] & 0xff);
        }
    }
    
static void putChar(byte[] val, int index, int c) {
        index <<= 1;
        val[index++] = (byte)(c >> HI_BYTE_SHIFT);
        val[index]   = (byte)(c >> LO_BYTE_SHIFT);
    }
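
The inflation step can be sketched without JDK internals. The version below is a simplification that assumes big-endian byte order (high byte first), whereas the JDK picks HI_BYTE_SHIFT and LO_BYTE_SHIFT according to the platform; InflateSketch and its inflate method are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class InflateSketch {
    // Widen Latin1 bytes to two bytes per char, high byte first (assumed order).
    static byte[] inflate(byte[] latin1) {
        byte[] utf16 = new byte[latin1.length << 1];
        for (int i = 0; i < latin1.length; i++) {
            int c = latin1[i] & 0xff;            // unsigned value of the byte
            utf16[2 * i] = (byte) (c >> 8);      // high byte, always 0 for Latin1
            utf16[2 * i + 1] = (byte) c;         // low byte
        }
        return utf16;
    }

    public static void main(String[] args) {
        String text = "h\u00E9";
        byte[] inflated = inflate(text.getBytes(StandardCharsets.ISO_8859_1));
        byte[] expected = text.getBytes(StandardCharsets.UTF_16BE);
        System.out.println(Arrays.equals(inflated, expected));   // true
    }
}
```

Comparing against the UTF_16BE encoding of the same text is a convenient check that the widening is correct.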

coder method

Returns the coder of the string. If compact strings are disabled it is always UTF16; otherwise it is the stored coder, either Latin1 or UTF16.

byte coder() {
        return COMPACT_STRINGS ? coder : UTF16;
    }

isLatin1 method

Checks whether the string is Latin1-encoded. This holds only when compact strings are enabled and the coder is LATIN1.

private boolean isLatin1() {
        return COMPACT_STRINGS && coder == LATIN1;
    }
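
Neither coder() nor isLatin1() is public, so the coder cannot be queried directly. As a rough approximation, the hypothetical helper fitsLatin1 below checks the condition under which a string can be stored with the Latin1 coder when compact strings are enabled (they can be turned off with the JVM option -XX:-CompactStrings).

```java
class CoderSketch {
    // Hypothetical helper: true if every char of the string fits in one byte.
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(fitsLatin1("h\u00E9llo"));      // true: U+00E9 is within Latin1
        System.out.println(fitsLatin1("\u4E2D\u6587"));    // false: needs UTF16
        System.out.println(fitsLatin1(""));                // true
    }
}
```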
