Topic describes

This is 686 on LeetCode. Repeat string matching with medium difficulty.

Tag: “string hash”, “KMP”

Given two strings a and b, find the minimum number of times string A is repeated such that string B is a substring of the superimposed string A, or -1 if none exists.

Note: the string “ABC” superimposed 0 times is “”, 1 time is “ABC”, and 2 times is “abcabc”.

Example 1:

Input: a = "abcd", b = "cdabcdab" Output: 3Copy the code

Example 2:

Input: a = "a", b = "aa" Output: 2Copy the code

Example 3:

Input: a = "a", b = "a" output: 1Copy the code

Example 4:

Input: a = "ABC ", b = "wxyz" Output: -1Copy the code

Tip:


  • 1 < = a . l e n g t h < = 1 0 4 1 <= a.length <= 10^4

  • 1 < = b . l e n g t h < = 1 0 4 1 <= b.length <= 10^4
  • abConsists of lowercase English letters

Fundamental analysis

First of all, the “lower bound” and “upper bound” of the replication times can be analyzed:

The analysis of the “lower bound” is easy: at least it willaThe replication length is greater than or equal tobIs the length of a match.

After the “lower bound” is defined, the upper bound is the minimum number of copies needed to obtain a clear answer.

Because the pivot string is given byaCopy it multiple times and find the substring from the main stringb, so you can specify the starting position of the substring, no more thanaThe length of the.

That is, if the length goes beyond the initial matching position of length A, it must have been matched before.

Therefore, we know that the maximum “upper bound” of replication times is “lower bound + 111”.

Let the length of a be NNN, the length of b be MMM, the lower bound be c1C1c1, and the upper bound be c2= C1 +1c2 =c1+1c2 = C1 +1.

Therefore, we can replicate C2C2C2 times for A and match B after obtaining the main string. If the end position after successful matching does not exceed N ∗c1n * c1n∗ C1, it means that c1C1C1 can be replicated and c1C1C1 will be returned. If the end position exceeds C2C2C2, c2C2C2 will be returned. −1-1−1 is returned if the match fails.

CARDS often

This is my first version of AC.

Although this is a very obvious sub-string matching problem, but last night more than an hour later than usual, woke up in the morning mental state is not very good, every cell in the body is refusing to write KMP 🤣

Moved crooked brains to write a “card often” practice.

This practice confirms once again that LC’s evaluation mechanism is very strange: it actually counts both the time of single use case and the total time instead of timing each use case separately or counting the total time of use case.

As a result, I directly TLE 666 times before passing (from 700700700 to 100100100), of which 444 times TLE is shown to pass all the samples, but still TLE, I don’t understand why such a confusing mechanism should be set up.

Going back to the practice itself, first copy a to ensure that the length is greater than or equal to B, and then repeatedly “copy-check” within a certain period of time. If it can be found within the specified time, return the number of copies, otherwise return -1.

Code:

import java.time.Clock;
class Solution {
    public int repeatedStringMatch(String a, String b) {
        StringBuilder sb = new StringBuilder();
        int ans = 0;
        while (sb.length() < b.length() && ++ans > 0) sb.append(a);
        Clock clock = Clock.systemDefaultZone();
        long start = clock.millis();
        while (clock.millis() - start < 100) {
            if(sb.indexOf(b) ! = -1) return ans;
            sb.append(a);
            ans++;
        }
        return -1; }}Copy the code
  • Time complexity: O(C)O(C)O(C)
  • Space complexity: O(C)O(C)O(C)

Upper and lower bound properties

After “basic analysis”, we found that “upper and lower bounds” have accurate size relationship, in fact, do not need to use “card” method.

You only need to perform the “upper bound” replication, try to match, and return the answer based on the matching result.

Code:

class Solution {
    public int repeatedStringMatch(String a, String b) {
        StringBuilder sb = new StringBuilder();
        int ans = 0;
        while (sb.length() < b.length() && ++ans > 0) sb.append(a);
        sb.append(a);
        int idx = sb.indexOf(b);
        if (idx == -1) return -1;
        return idx + b.length() > a.length() * ans ? ans + 1: ans; }}Copy the code
  • Time complexity: ⌈mn⌉+1\left \lceil \frac{m}{n} \right \rceil +1 ⌈nm⌉+1 copy and a substring match. Complexity of O (n ∗ (⌈ mn ⌉ + 1)) O (n * (\ left \ lceil \ frac {m} {n} \ \ right rceil + 1)) O (n ∗ (⌈ nm ⌉ + 1))
  • Space complexity: O (n ∗ (⌈ mn ⌉ + 1)) O (n * (\ left \ lceil \ frac {m} {n} \ \ right rceil + 1)) O (n ∗ (⌈ nm ⌉ + 1))

KMP

IndexOf can be implemented through KMP/ string hash. If you are not familiar with KMP, you can check the KMP algorithm in detail. It explains the matching process of KMP and provides a practical template with a large number of illustrations.

Using KMP instead of indexOf effectively takes advantage of the fact that the main string is copied from multiple As.

Code:

class Solution {
    public int repeatedStringMatch(String a, String b) {
        StringBuilder sb = new StringBuilder();
        int ans = 0;
        while (sb.length() < b.length() && ++ans > 0) sb.append(a);
        sb.append(a);
        int idx = strStr(sb.toString(), b);
        if (idx == -1) return -1;
        return idx + b.length() > a.length() * ans ? ans + 1 : ans;
    }

    int strStr(String ss, String pp) {
        if (pp.isEmpty()) return 0;
        
        // Read the length of the original and matching strings respectively
        int n = ss.length(), m = pp.length();
        // Both the original string and the matching string are preceded by a space so that their subscripts start at 1
        ss = "" + ss;
        pp = "" + pp;

        char[] s = ss.toCharArray();
        char[] p = pp.toCharArray();

        // Construct the next array with the length of the matching string (next array is related to matching string)
        int[] next = new int[m + 1];
        // I = 2, j = 0, I < or equal to the length of the matching string
        for (int i = 2, j = 0; i <= m; i++) {
            // next(j) = next(j)
            while (j > 0&& p[i] ! = p[j +1]) j = next[j];
            // if the match is successful, make j++ first
            if (p[i] == p[j + 1]) j++;
            // Update next[I] to end the loop, I ++
            next[i] = j;
        }

        I = 1, j = 0, I is less than or equal to the length of the original string.
        for (int i = 1, j = 0; i <= n; i++) {
            J = next(j)
            while (j > 0&& s[i] ! = p[j +1]) j = next[j];
            // if the match is successful, j++ is used first, and i++ is used after the loop is completed
            if (s[i] == p[j + 1]) j++;
            // If the whole section matches, return the index directly
            if (j == m) return i - m;
        }
        return -1; }}Copy the code
  • Time complexity: ⌈mn⌉+1\left \lceil \frac{m}{n} \right \rceil +1 ⌈nm⌉+1 copy and a substring match. Complexity of O (n ∗ (⌈ mn ⌉ + 1)) O (n * (\ left \ lceil \ frac {m} {n} \ \ right rceil + 1)) O (n ∗ (⌈ nm ⌉ + 1))
  • Space complexity: O (n ∗ (⌈ mn ⌉ + 1)) O (n * (\ left \ lceil \ frac {m} {n} \ \ right rceil + 1)) O (n ∗ (⌈ nm ⌉ + 1))

String hash

Combined with “basic analysis”, we know that this is essentially a substring matching problem, which we can solve using “string hashing”.

Let the length of A be NNN and the length of b be MMM.

The primary string SS is obtained by copying a “upper bound” times. The purpose is to detect whether there is a substring B in SS.

In the string hash, ss and B are concatenated for convenience, and the length after concatenation is lenlenlen, then the hash value of b string is [len−m+1,len][len-m +1,len][len −m+1,len] part (subscript starting from 111), Remember to targettargettarget.

Then enumerate the starting point in the range [1,n][1, n][1,n] and try to find a hash of length MMM that is the same as targettargettarget.

Code:

class Solution {
    public int repeatedStringMatch(String a, String b) {
        StringBuilder sb = new StringBuilder();
        int ans = 0;
        while (sb.length() < b.length() && ++ans > 0) sb.append(a);
        sb.append(a);
        int idx = strHash(sb.toString(), b);
        if (idx == -1) return -1;
        return idx + b.length() > a.length() * ans ? ans + 1 : ans;
    }
    int strHash(String ss, String b) {
        int P = 131;
        int n = ss.length(), m = b.length();
        String str = ss + b;
        int len = str.length();
        int[] h = new int[len + 10], p = new int[len + 10];
        h[0] = 0; p[0] = 1;
        for (int i = 0; i < len; i++) {
            p[i + 1] = p[i] * P;
            h[i + 1] = h[i] * P + str.charAt(i);
        }
        int r = len, l = r - m + 1;
        int target = h[r] - h[l - 1] * p[r - l + 1]; // the hash of b
        for (int i = 1; i <= n; i++) {
            int j = i + m - 1;
            int cur = h[j] - h[i - 1] * p[j - i + 1]; // Substring hash value
            if (cur == target) return i - 1;
        }
        return -1; }}Copy the code
  • Time complexity: ⌈mn⌉+1\left \lceil \frac{m}{n} \right \rceil +1 ⌈nm⌉+1 copy and a substring match. Complexity of O (n ∗ (⌈ mn ⌉ + 1)) O (n * (\ left \ lceil \ frac {m} {n} \ \ right rceil + 1)) O (n ∗ (⌈ nm ⌉ + 1))
  • Space complexity: O (n ∗ (⌈ mn ⌉ + 1)) O (n * (\ left \ lceil \ frac {m} {n} \ \ right rceil + 1)) O (n ∗ (⌈ nm ⌉ + 1))

The last

This is the No.686 of our “Brush through LeetCode” series, which started on 2021/01/01. There are 1916 topics in LeetCode by the start date, some of which have locks.

In this series of articles, in addition to explaining how to solve the problem, I will also present the most concise code possible. If a general solution is involved, the corresponding code template will also be used.

In order to facilitate the students to debug and submit the code on the computer, I set up the relevant warehouse: github.com/SharingSour…

In the repository, you’ll see links to the series of articles, the corresponding code for the series of articles, the LeetCode source links, and other preferred solutions.