This article is participating in the nuggets team number online activity, click to see the dachang spring recruiting positions

I. Title Description:

This is a function that implements a very common string match: LeetCode 28. Implement strStr(), which I will write using the classic KMP algorithm. As THE KMP algorithm is very classic, as long as you are a computer major student or have studied the algorithm, you should know the KMP algorithm. Therefore, IN this paper, I will not give a strict proof, but mainly focus on the explanation of the code, and try to make you understand the KMP algorithm.

Ii. Analysis of Ideas:

Let SSS be the given string and PPP be the pattern string to look up. A naive string match starts at bit III of SSS and bit 000th of PPP. If a mismatch is found, the match starts again at bit I +1i + 1I +1 of SSS and bit 000th of PPP.

There are many unnecessary fallbacks in naive practices, and by documenting some of the properties of PPP itself, we can reduce fallbacks.

  1. Prefix ∣mid∣postfix∣chprefix \mid mid \mid postfix \mid chprefix∣mid∣postfix∣ch Prefixprefixprefix is the prefix of PPP, postfixpostfixpostfix is the suffix of PPP, they are equal, also called the longest common prefix, midmidmid is the string between them. CHCHCH represents the character we are currently matching.

  2. If s[I]s[I] S [I]s[I] does not match CHCHCH and SSS, then we do not need to start from p[0]p[0]p[0] p[0]. We can directly match s[I] S [I] and mid[0]mid[0]mid[0]. Mid [0]mid[0]mid[0] mid[0]mid[0] Prefixprefixprefix must match the string before s[I]s[I].

  3. Next [I]next[I] Next [I]next[I] represents the longest common prefix of p[0… I −1]p[0… I −1]p[0… I −1] In particular, next[0]=− 1Next [0]= -1next[0]=−1, which is the core of the KMP algorithm. Note that the longest common prefix here is not strictly true, because a string is also a prefix of itself, and the longest should be itself, meaning the longest common prefix less than itself.

  4. Next [I]next[I]next[0… I −1]next[0… I −1]next[0… I −1] Next [0… I −1] I left this in the comments of the code. For brevity, I used a prefix in the comments instead of the longest public prefix, and the same goes for suffixes.

Iii. AC Code:

class Solution {
private:
    vector<int> next;
public:
    int strStr(string haystack, string needle) {
        buildNext(needle);
        return match(haystack, needle);
    }

    int match(const string &s, const string &p) {
        const int n = s.length(), m = p.length(a);// Use the position of j to indicate whether the match is complete
        // If j == m, all matches must be completed, so there is no need to roll back I in the following loop
        int i = 0, j = 0;
        while (i < n && j < m) {
            // if j = -1, the current position does not match
            // Naturally I will ++ to the next position
            if (j < 0 || s[i] == p[j]) {
                ++i, ++j;
            } else{ j = next[j]; }}if (j == m) {
            return i - j;
        }
        return - 1;
    }

    void buildNext(const string &p) {
        int len = p.length(a);if (len == 0) {
            return;
        }
        next = vector<int>(len);
        // l indicates the subscript at the end of the prefix, and r indicates the subscript at the end of the suffix
        int l = next[0] = - 1, r = 0;
        while (r < len - 1) {
            // Where l, r is not ++ yet
            // if l < 0, p[0...r] has no common prefix or suffix, which would make next[r + 1] = 0;
            // if p[l] == p[r], the length of the common suffix can be increased by 1, and the result of l + 1 is exactly the length of the prefix
            if (l < 0 || p[l] == p[r]) {
                ++l, ++r;
                next[r] = l;
            } else {
                // If none of the above conditions are met, we need to reduce the length of the matched prefix and suffix
                // next[l] = next[ll = next[l]; }}}};Copy the code

Iv. Summary:

The idea of KMP algorithm is very clever, but the code implementation is not very clear, although it is easy to understand, but the implementation may encounter various problems. Either way, it’s best to take the time to do it yourself, and it may be a great harvest.