The introduction

Regular Expression Matching is a classic dynamic programming problem in LeetCode, but it will also explain violence search, memorization search, and how to better solve the dynamic programming problem of string class


Subject analysis

LeetCode 10. Regular Expression Matching

Regular Expression contains two symbols: ‘.’ and ‘*’. The former can match any character, and the latter can repeat the preceding character zero or more times. Match (” aa “, “a”) = > false, match (” ab “, “*”) = > true, match (” aab “, “c * a * b”) = > true

** does not appear at the beginning of a string, and * cannot be preceded by *, for example, “A **b”. However, our code implementation must be based on this basis. Otherwise, we will not be able to solve the problem by considering cases too much.


Violence to solve

Recursively violent depth-first search method is often tiger balm, search problem here you simply consider two things, one is that whether this problem can be divided into several small sub-problems, 2 it is, after each segment of the subproblem several state, which is considered in the current subproblems, how many kind of different possibilities. So once you know that, you can recursively solve for each state of each subproblem.

So this might be a little abstract, but in this case, the problem is, if you put in a string s, and you match a string P, you want to know if those two strings match. Let’s first think about whether the string comparison problem can be broken down into subproblems, and you can see that strings can be broken down into characters, and then the string comparison problem becomes a character comparison problem, and then we can view the problem as, Decide whether the s/I,… n can match p [j] j,… is the condition of subproblems s [n] I + 1,… can’t match the p [m] j + 1,…, take another look at s and p [I] [j] matches, But the current problem here is whether s[I] and p[j] match. If this is true, we will continue to see whether s[I +1…n] and p[j+1…m] match. Note that when I say s[I] p[j] s[s+1…n] p[j+1…m], I do not mean to only consider the mismatch between the two characters at present, it is just to refer to the current problem and the problem to be considered later, the current problem may only need to compare one character, may need to compare more than one character, This brings us to the second point mentioned earlier, and we also need to consider the state of the current problem. For the string s, there are no special characters, and the characters in the current problem will only be letters, but for p, we need to consider two special symbols, as well as letters, here are all the possibilities, if the current subproblem is s[I,…n] and P [j…m]:

  • S = = p [j], [I] son problem set up or not depends on the child s/I + 1,… n and p [j] j + 1,…
  • P [j] = = ‘. ‘, sub problem set up or not depends on the child s/I + 1,… n and p [j] j + 1,…
  • p[j] ! = ‘.’ && p[j] ! = ‘*’ && p[j] ! = s[I], the current subproblem is not valid
  • P [j] == ‘*’, s[I]! = p[j], the validity of the subproblem depends on the subproblems S [I…n] and P [j+2…m].
  • P [j + 1) = = ‘*’ s = = p [j], [I] son problem set up or not depends on the child s/I + 1,… n and p [j] j,… or s/I,… n and p [j] j + 2,…

Here I explain the first 4 and 5 kinds of circumstances, the title description before said that starting character can’t be * p, that is to say, in the front of the * must have a letter, by definition, here we can count the number of elements in the * in front of the as is zero or one or more, if it is zero, so we can only see, S/I,… n and p/j + 2,… n matches, if one or more, then we can see the s/I + 1,… n and p [j] j,… is set up, P [j] == s[I] or p[j] == ‘.’, we can combine the code to see

public boolean isMatch(String s, String p) {
    if (s.equals(p)) {
        return true;
    }
    
    boolean isFirstMatch = false;
    if(! s.isEmpty() && ! p.isEmpty() && (s.charAt(0) == p.charAt(0) || p.charAt(0) = ='. ')) {
        isFirstMatch = true;
    }
    
    if (p.length() >= 2 && p.charAt(1) = =The '*') {
        / / see s/I,... n and p [j] j + 2,... or the s/I + 1,... n and p [j] j,...
        return isMatch(s, p.substring(2))
                 || (isFirstMatch && isMatch(s.substring(1), p));
    }
    
    // look at s[I +1...n] and p[j+1...m]
    return isFirstMatch && isMatch(s.substring(1), p.substring(1));
}
Copy the code

The implementation above is called a brute force solution because the answer to the subproblem is not recorded, which means that if the answer to the previous subproblem is used at the moment, we have to compute the previously computed subproblem.


Mnemonic search

The above violent solution is because there is no record of the answer, memorization search is in the “silly search” on the basis of adding “notepad”. Explain ahead of time to stay here will give you the code, I to change the direction of the recursion, of course it is not necessary, mainly want to explain, for this problem, considering from backward and once upon a time in the future to consider is feasible, but from the back forward considering overlap cases in more, this “notepad” embodied features will be bigger.

We assume that the current problem is to consider the ith letter of S and the JTH letter of P, so the sub-problem is whether s[0… I] and P [0…j] match:

  • P [j] is a letter, and S [I] == p[j], whether the current subproblem is valid depends on whether the subproblems s[0…i-1] and P [0…j-1] are valid
  • P [j]. ‘ ‘, the sub problem set up or not depends on the child s [0… I – 1) and p [0]… j – 1 is established
  • P [j] is a letter, and s[I]! = p[j], the current subproblem is not valid
  • P [j] is’ * ‘s [I] = = p [1], or p [1] = =’. ‘, the sub problem set up or not depends on the child s [0… I – 1) and p [j] 0… is established
  • P [j] is ‘*’, s[I]! = p [1], the current problems correctly or not depends on the child s [0… I] matches p j – 2] [0,…

Whether it’s from the front to the back, or from the back to the front, as you can see, the points are the same, but we’ve added a notepad here.

public boolean isMatch(String s, String p) {
    if (s.equals(p)) {
        return true;
    }
    
    boolean[] memo = new boolean[s.length() + 1];
    
    return helper(s.toCharArray(), p.toCharArray(), 
                  s.length() - 1, p.length() - 1, memo);
}

private boolean helper(char[] s, char[] p, int i, int j, boolean[] memo) {
    if (memo[i + 1]) {
        return true;
    }
    
    if (i == -1 && j == -1) {
        memo[i + 1] = true;
        return true;
    }
            
    boolean isFirstMatching = false;
    
    if (i >= 0 && j >= 0 && (s[i] == p[j] || p[j] == '. ' 
          || (p[j] == The '*' && (p[j - 1] == s[i] || p[j - 1] = ='. ')))) {
        isFirstMatching = true;
    }
    
    if (j >= 1 && p[j] == The '*') {
        // look at s[0,... I] and p[0,...j-2]
        boolean zero = helper(s, p, i, j - 2, memo);
        // look at s[0,... I -1] and p[0,...j]
        boolean match = isFirstMatching && helper(s, p, i - 1, j, memo);
        
        if (zero || match) {
            memo[i + 1] = true;
        }
        
        return memo[i + 1];
    }
    
    // look at s[0,...i-1] and p[0,...j-1]
    if (isFirstMatching && helper(s, p, i - 1, j - 1, memo)) {
        memo[i + 1] = true;
    }
    
    return memo[i + 1];
}
Copy the code

Except for notepads, there wasn’t much difference. In fact, this is kind of dynamic programming.


Dynamic programming

With the two approaches and explanations above as a foundation, I think iterative dynamic programming should be easy to understand. Instead of recursion, we’ll use a for loop, starting with the code:

public boolean isMatch(String s, String p) {
    if (s.equals(p)) {
        return true;
    }
    
    char[] sArr = s.toCharArray();
    char[] pArr = p.toCharArray();
    
    // dp[i][j] => is s[0, i - 1] match p[0, j - 1] ?
    boolean[][] dp = new boolean[sArr.length + 1][pArr.length + 1];
    
    dp[0] [0] = true;
    
    for (int i = 1; i <= pArr.length; ++i) {
        dp[0][i] = pArr[i - 1] = =The '*' ? dp[0][i - 2] : false;
    }
    
    for (int i = 1; i <= sArr.length; ++i) {
        for (int j = 1; j <= pArr.length; ++j) {
            if (sArr[i - 1] == pArr[j - 1] || pArr[j - 1] = ='. ') {
                // look at s[0,...i-1] and p[0,...j-1]
                dp[i][j] = dp[i - 1][j - 1];
            }
            
            if (pArr[j - 1] = =The '*') {
                // look at s[0,... I] and p[0,...j-2]
                dp[i][j] |= dp[i][j - 2];
                
                if (pArr[j - 2] == sArr[i - 1] || pArr[j - 2] = ='. ') {
                    // look at s[0,... I -1] and p[0,...j]
                    dp[i][j] |= dp[i - 1][j]; }}}}return dp[sArr.length][pArr.length];
}
Copy the code

I’m going to talk about the initialization of the DP array, because we need to consider the empty string case, so we opened the DP array by 1. Dp [0][0] =true, dp[0][0] =true, dp[0][0] =true, dp[0][2]=true, dp[0][2]=true, dp[0][2]=true, dp[0][2]=true That is, s[0,0] matches p[0,2], but the difference here is that 0 represents an empty string.


Summary and thinking about dynamic programming of string matching class

In general, for string matching problem, the subject will be two string input parameters, if confirmed the problem can be decomposed into a series of subproblems, then consider using dynamic programming to solve, can according to the interval to define state, generally only need to consider the head or the tail interval, the problem of the dynamic programming method, We consider whether the head interval, s[0,… I] and p[0,…j] match in dp[I +1][j+1]. If you choose the tail interval, then you need to traverse backwards, just like the mnemonic search explained earlier. In general, DP arrays with dynamic programming for string matching are two-dimensional, although there are exceptions. In my opinion, if the interval and ergodic direction are determined, at least the derivation of the equation of state of dynamic programming will be much clearer.

Then there is the key part of recursive equation is derived, there is no much more special skills, or that sentence, only hand cooked, without him, to say the key words, or prior to determine the current problems and face the problem of contact, or you can think like this “current consideration of subproblems in what circumstances will be solved before the child problem”, or to take the problem, for example, If we remove s[I] and p[j], the problem will become the previously solved subproblems s[0,…i-1] and P [0,…j-1]. If we only remove s[I], If p[j-1] and p[j] are removed, the problem becomes s[0,… I] and P [0,…j-2]. As for why some can be removed and others cannot, That this can only be analyzed according to the meaning of the question, I believe that through the previous analysis should not be difficult to understand.

With the above analysis, here are some considerations for dynamic programming of string matching classes:

  • Consider whether you need to consider the empty string case, if so, you need to open the DP array one more space
  • Before the recursive equation is considered, the interval and ergodic direction of the subproblem are determined
  • When you think about recursive equations, how does the current subproblem become a subproblem that you solved before


That’s all for this time, I hope to help you understand this problem, or understand the string matching class dynamic programming.