The concept of string

  • A String is a finite sequence of zero or more characters, also called a String, usually denoted as s = “a1A2… an”

  • A subsequence consisting of any number of consecutive characters in a string is called a substring of the string. For example, “over” is a substring of “lover”

This article is shared with friends who need to brush the questions in the interview, I have specially arranged it. The technology inside is not clear by a few words. The answers to many questions are actually very simple, but the thinking and logic behind them are not simple. If you want to learn Java engineering, high performance and distributed, simple. Performance tuning, Spring, MyBatis, Netty source code, data structure, JVM, multithreading and so on, due to the space is limited, the following only show a small part of the interview questions, there is a need for a complete version of the friend can click a link jump to get,Link: Stamp here free download, get code: nuggets

String comparison

  • “Silly”, “stypid”, the first letter is “S”, so there is no difference, and the second letter is <” T “because it comes before” T “, so we say “silly” <” stypid”

  • Strings are compared by the encoding between the characters that make up the string, and the encoding refers to the serial number of the characters in the corresponding character set

  • Therefore, two strings are equal only when their string lengths and each character bean are equal

  • For unequal strings, e.g. S = “a1A2… An “, t = “b1b2… Bm “, s < t when one of the following conditions is satisfied

N < m and ai = bi (I =1,2….. ,n)

  • When s = “happen”,t = “happy”, because the first 4 letters of both strings are the same, and the ASCII code of the fifth letter e is 101, the ASCII code of y is 121, e < Y, so S < t

The abstract data type of a string

  • The logical structure of a string is very similar to that of a linear table, except that the string is for a character set, and all elements are characters, so the basic operation of a string is quite different from that of a linear table

  • Linear tables are more concerned with the operation of a single element, such as adding, deleting and checking an element, while strings are more concerned with finding the location of substrings, replacing and other operations

  • Data (Data element) : String elements consist of only one character, and adjacent elements have a precursor and a successor relationship

  • Basic operation

List of storage

Sequential storage

  • The sequential storage structure of a string uses a set of contiguous address storage cells to store the sequence of strings in a string, and allocates a fixed-length storage area to each defined string variable according to a predefined size

  • It is generally possible to store the actual string length at the subscript of array 0, and some books define it to be stored at the last subscript of the array

  • But some programming languages feel that storing a number takes up space, it requires the string value to be followed by an end tag, such as “\0”, this time if you want to know the length of the string, you need to traverse the calculation, but it still takes up space, why?

  • There are some problems with sequential storage, such as concatenation of two strings, insertion of new strings, and string substitution, which can make the string sequence longer than the array length

Chain store

  • The chain storage of string is similar to linear list, but due to the particularity of string structure, each element data in the structure is a character. If the chain list is simply used to store string values, each node corresponds to one character, there will be a great waste of space

  • Therefore, a node can hold more than one character. If the last node is not occupied, it can be completed with “#” or other non-string characters

  • At this time, it is very important to store the appropriate number of characters in a node, which will directly affect the efficiency of string processing, so you need to make a choice according to the actual situation

  • But the chain storage of strings is generally not as flexible as sequential storage, and the performance is not as good as sequential storage structure

Naive pattern matching algorithm

  • The substring location operation is usually called pattern matching of strings. Assuming that the substring T = “Google” is found from the main string S = “goodGoogle”, the following steps are usually required:

1. The first three letters of S and T are matched successfully, but the fourth letter of S fails to match

2. The matching fails when the second, third, and fourth digits of the main string S start

3. The main string S starts at the fifth digit and all six letters are matched successfully

  • In simple terms, each character of the main string is used as the beginning of the substring, and the string to be matched is matched. The main string is looped with the length of T at the beginning of the inner character until the match is successful or all traverses are completed

The efficiency problem

  • The best case is that the match is successful at the beginning, and the time is order one.

  • The worst time complexity is O((n-m+1)*m), where n is the main string length and m is the substring length to be matched

  • Naive pattern matching method is inefficient

KMP pattern matching algorithm

  • If the primary string S = “abcdefgab”, T = “abcdex”, then if the previous naive algorithm is used to match, then the process is as follows. In the process ②③④⑤⑥, the first character and the substring T start with the same character

  • The first character “a” of “abcdex” is not equal to any character in the following string “bcdex”. Since “A” is not equal to any character in the following string, then the first five characters are equal to each other for ①. This means that the first character of the substring T, “A”, cannot be equal to the second through fifth characters of the string S, so ②③④⑤⑥ is redundant

  • If we know that the first character “A” in the T string is not equal to all the following characters in the T string (that’s the premise), and that the second character “B” in the T string is equal to the second character “B” in the S string, The first character “A” in T string and the second character “b” in S string cannot be equal

  • In the same way, if the first character “A” in the T string is not equal to the following character, the “A” in the T string is not equal to the following character “C”, “D”, “E” in the S string, so the process ②③④⑤⑥ is not necessary, just ①⑥ can be reserved

  • ⑥ is reserved because T[6] ≠ S[6] in ①. Although we know T[1] ≠ T[6], we cannot conclude that T[1] is not necessarily equal to S[6].

  • What if the T string also contains the first character “A”?

  • Suppose S = “abcabcabc”, T = “abcabx”, the first 5 characters are exactly equal, and the sixth character is not equal

  • Because the first “A” of T is equal to the fourth “A”, the second “B” is equal to the fifth “B”, and the fourth “A” and the fifth “B” have been compared with the corresponding position in the main string S in ①, so it can be judged that, The first character “A” of T and the second character “b” are not compared with the fourth and fifth characters of S, so ④⑤ can also be omitted