————— the next day —————






What does that mean? Let’s take an example:



In the figure above, the string B is A substring of A, and the first occurrence of B in A is subscript 2 (the first subscript of the string is 0), so 2 is returned.


Let’s take another example:



In the figure above, the string B does not exist in A, so -1 is returned.


To unify the concept, we refer to string A as the main string and string B as the pattern string later in this article.



Xiao Hui’s idea is simple and crude. Let’s use the following example to demonstrate:



In the first round, we start with the first digit of the main string and compare the characters of the main and pattern strings one by one:



Obviously, the first character of the main string is A, and the first character of the pattern string is B, and the two do not match.



In the second round, we move the pattern string back one bit, starting at the second position of the main string, and compare the characters of the main and pattern strings one by one:



The second character of the main string is b, and the second character of the pattern string is also B.



The third character of the main string is B, and the third character of the pattern string is ALSO C, which do not match.



In the third round, we move the pattern string back one more bit, starting with the third bit of the main string, and compare the characters of the main and pattern strings one by one:



The third character of the main string is b, and the third character of the pattern string is also B.



The fourth character of the main string is C, and the fourth character of the pattern string is also C.



The fifth character of the main string is e, and the fifth character of the pattern string is also E.


As a result, the pattern string BCE is a substring of the main string abbcefgh, and the subscript at the position where the main string first appears is 2:



So that’s gray’s solution, which has a name, BF, for Brute Force.




In this case, the first three characters of the pattern string a match the characters of the main string in each round of character matching, until the last character of the pattern string B, the mismatch is found:




As a result, the two strings need to be compared four times in each round, which is obviously wasteful.


Assuming that the length of the main string is M and the length of the pattern string is N, then in this extreme case, the worst time complexity of BF algorithm is O (Mn).



— — — — — —









What does it mean to compare hashes?


Those of you who have used hash tables know that every string can be converted to an integer, using some hash algorithm, and that integer is hashcode:


Hashcode = hash (string)


Obviously, comparing hashCode with just two strings is much easier than comparing two strings character by character.




Given the main and pattern strings as follows (assuming the string contains only 26 lowercase letters) :



As a first step, we need to generate hashCode for the pattern string.


There are various algorithms that generate HashCode, such as:


According to a combined



This is the simplest way, we can think of a as 1, B as 2, and C as 3…… Then add all the characters of the string, and the result is its Hashcode.


bce = 2 + 3 + 5 = 10


However, although this algorithm is simple, it is likely to cause hash conflicts, such as the same HASHcode for BCE, BEC, and CBE.


Convert to base 26



Since the string contains only 26 lower-case letters, we can evaluate each string as a 26 base number.


bce = 2*(26^2) + 3*26 + 5 = 1435


The advantage of this method is that it greatly reduces hash conflicts, but the disadvantage is that it is a large amount of computation and may be beyond the range of integers, and the result needs to be modulo.



For demonstration purposes, we will use the bitwise hash algorithm, so the BCE hashcode is 10:



Second, generate the hashcode for the first equal primary string in the main string.


Since the main string is usually longer than the pattern string, it doesn’t make sense to convert the entire main string to HashCode. It only makes sense to compare substrings of the same length as the pattern string.


So we first generate hashCode, the first substring of the main string that is the same length as the pattern string,

Abb = 1 + 2 + 2 = 5:


Third, compare the two Hashcodes.



Obviously, 5! =10, indicating that the pattern string and the first substring do not match, we continue the next round of comparison.


The fourth step is to generate the hashCode for the second equal primary string in the main string.


BBC = 2 + 2 + 3 = 7:



Fifth, compare the two HashCodes.


Obviously, 7! =10, indicating that the pattern string and the second substring do not match, we continue to the next round of comparison.


Step 6, generate the hashCode for the third equal primary string in the main string.


Bce = 2 + 3 + 5 = 10:



Seventh, compare the two Hashcodes.


Obviously, 10 ==10, both hash values are equal! Does that mean that the two strings are equal?


Don’t get too excited, we need further validation due to the possibility of hash collisions.


Step 8, compare two strings character by character.


The hashCode comparison is just a preliminary verification, and then we need to compare the two strings character by character, like the BF algorithm, and finally determine the two strings match.



Finally, it is concluded that the pattern string BCE is a substring of the main string ABbcefgh, and the first occurrence of the subscript is 2.





What does that mean? Let’s look at another example:


In the figure above, I know that the hashcode of the substring abbcefg is 26. How do I calculate the hashcode of the next substring, bbcefgd?


Instead of resuming the substring’s characters, we can take a simpler approach. Since the new substring is preceded by an A and followed by a D, so:


New HashCode = old HashCode -1+4 = 26-1+4 = 29


The calculation of the next substring bcefgDE is the same:


New HashCode = old HashCode -2+5 = 29-2+5 = 32



Public static int rabinKarp(String STR, String pattern){// static int m = str.length(); // Pattern string length int n = pattern.length(); // Compute mode stringhashThe value of the int patternCode =hash(pattern); // Count the first substring of the main string as long as the pattern stringhashThe value of the int strCode =hash(str.substring(0, n)); // use pattern stringhashValue and local to the main stringhashValue comparison. // If a match is made, an exact comparison is made; If they do not match, the values of adjacent substrings in the main string are calculatedhashValue.for (int i=0; i<m-n+1; i++) {        if(strCode == patternCode && compareString(i, str, pattern)){            returni; } // If not the last round, update the main string from I to I +nhashvalueif(i<m-n){ strCode = nextHash(str, strCode, i, n); }}return- 1; }private static inthash(String str){ int hashcode = 0; // The simplest hashcode is used here: // Take a as 1, b as 2, and C as 3..... And then we add them in bitsfor (int i = 0; i < str.length(); i++) {        hashcode += str.charAt(i)-'a';    }    returnhashcode; }private static int nextHash(String str, inthash, int index, int n){    hash -= str.charAt(index)-'a';    hash += str.charAt(index+n)-'a';    return hash; }private static boolean compareString(int i, String str, String pattern) { String strSub = str.substring(i, i+pattern.length());returnstrSub.equals(pattern); }public static void main(String[] args) { String str ="aacdesadsdfer";    String pattern = "adsd";    System.out.println("Position of first appearance :"+ rabinKarp(str, pattern)); }Copy the code







— — the END — — –



Like this article friends, welcome to pay attention to the public number programmer xiao Grey, watch more exciting content