This article originated from personal public account: TechFlow, original is not easy, for attention


Wildcard Matching is a difficult problem, but the good news is that there are no algorithms that we haven’t seen before.

Given two strings s and p, s is the parent string and P is the pattern string. A brief explanation of these two concepts, which are commonly used in string matching problems. For those of you who don’t understand, let’s use an inappropriate analogy, we can think of the parent string as a lock, and the pattern string as a key. Some master keys open more than one lock, meaning the key can change and the lock is fixed. What we need to determine is whether the pattern string matches the parent string, which is whether the key can open the lock.


There are two special characters that can occur in the pattern string p. , which matches any single character. There is also *, which matches any string of characters, including empty strings. So this star looks very powerful, one can match arbitrarily. But there’s a constraint here, s and P have to match exactly, not partially. This means that all the characters in p are used. For example, the s string is aab and the p string is *c. * is enough to match all the contents of the s string, but it is not a valid match because a c is not used afterwards.

For simplicity, we use si for the i-th character of the S-string and PI for the i-th character of the p-string.

search

Let’s start with the simple way of thinking about violence first, and it’s easy to see that there’s no simple way to calculate violence in this problem. The reason is simple, because when the * is present, we don’t know how long it should match. It could be 0, it could be any length that matches to the end.

To solve this problem, the best way is to try both and enumerate the lengths that the * symbol should match. However, the number of * may be more than one, and if there are more than one, the decision on how to match the symbol * is not made at the same time, but successively. And the queen of the eight queens is not placed together, but in order, if we regard * as the queen, in fact, this is a more obvious search problem.

If you want to solve this problem as a search problem, it is not difficult, in fact, is a simple enumeration.

We search for the position where the S string and the P string match. We use si to represent the characters that the S string currently needs to match, and PI to represent the characters that the p string currently needs to match. If the character of PI position is not *, then the situation is limited, we only need to determine the upper and lower bounds and whether they match. If the position of PI is *, the situation is a little more complicated, because the presence of * implies three decisions.

The first decision is not to use * to match si at the current position. In this case, it means that we throw away the *, but SI still has not found a match, so SI cannot move. From this point, we can conclude that the next position to move to is Si, PI +1. That is, the pointer in P moves by one bit, but the pointer in S stays put, waiting to continue matching.

The second decision is to match only the current SI and not anything after the SI. In this case, we just need to treat the * as normal character processing, so we should move to si+1, PI +1. In other words, the Pointers in s and P have moved by one place each.

The third decision is * not only matches the current SI, but also matches further down, in which case we need to keep the position of PI unchanged and only move si, that is, we move to si+1 and PI.

But if we take a closer look, we find that these three conditions can actually be combined. In the second case, we only match the current position, which is equivalent to executing the third strategy at the current position and executing strategy 1 at the post-transition position.

For example, the s string is ABC, and the P string is A * C. When we match the second character, we execute strategy 3, and jump to 2, 1. Si is c, PI is *, and if we execute strategy 1, we jump to 2, 2, which is the same as when we execute strategy 2 at the beginning. This can reduce the decision-making process and speed up the process when there are many *.

Finally, since this is a search question, we don’t need to find all the solutions. In this case, using BFS would have been more efficient than using DFS, but unfortunately I tried both methods and couldn’t pass because it timed out. Maybe it’s Python (interpreted languages are inefficient), because I can get by in C++.

It won’t pass, but it’s the right idea. If you don’t mind, take a look at the code:

class Solution:



def isMatch(self, s: str, p: str) -> bool:

n, m = len(s), len(p)

if m == 1 and p[0] = =The '*':

return True

import queue

que = queue.Queue()

que.put((0.0))

while not que.empty():

si, pi = que.get()

if si == n and pi == m:

return True

# if the s string is finished matching, the P string is not finished yet

We need to check if the p string is *

If both ends of the p string are *, then it will match

if si == n:

if p[pi] == The '*':

que.put((si, pi + 1))

continue

continue

elif pi >= m:

continue

If there is no *, then we judge whether it matches

elif p[pi] == '? ' or p[pi] == s[si]:

que.put((si + 1, pi + 1))

continue

# If not, go down in two ways

elif p[pi] == The '*':

que.put((si, pi + 1))

que.put((si+1, pi))

return False

Copy the code

Dynamic programming

Although the above method can not pass, but in fact has given us a lot of inspiration, if we regard the si and PI mentioned above as states, although we use BFS to achieve, but in essence is already the idea of dynamic programming.

In some dynamic programming problems, we also maintain state through queues. We store all the legal states in the queue, and when we move to the new legal states through states and decisions, we continue to insert into the queue. So in this sense, dynamic programming is similar in nature to the search problem, or more accurately, dynamic programming is just an idea, and it can be implemented in a variety of carriers, either using array recursion, or using data structures to maintain state. This is very important, and is the basis for our dynamic programming.

Why do we run out of time?

If we think of this as a dynamic programming, we’ll see that this is a sequential dynamic programming. Briefly, although the state and transition of dynamic programming are not the same in each problem, there are only two overall ideas. One is called pusher and one is called pusher.


In the sequential thinking, we record all the legal states, and then start from the legal states to transfer through decision making, and record the transferred states for subsequent transfer. If we chain these states together, we see that we are following a logical order, which is why we call it down. Reverse is the opposite. We traverse the unknown state and determine the outcome of the current unknown state by exploring its source through the transition relationship between the states. That is, one goes from the known to the unknown, and the other goes from the unknown to its known source.

Why doesn’t it work in this case? Because when * occurs, we still have * in the state we’re pushing down. For example, when we decide to use * to match si, we move to si+1, PI. The backward states also contain *, and will face more states. Many of these states overlap, and many of these overlapping states cannot be solved even by pruning, such as a series of * in a P string. And now we can think about the inverse a little bit differently.

We use dp[I][j] to record the matching states of s[: I +1] and P [:j+1], then what states may be transferred from dp[I][j]? In fact, very limited, if si and PI match, then it can be transferred from dp[i-1][J-1]. If they don’t match, we need to check whether PI is *. If so, then there are two cases, one is * match empty, give si to pi-1, so it can be transferred from dp[I][J-1]. The other is to match si, because * can match more than one number, so at this time can be transferred from dp[i-1][j]. Since there are only a limited number of states for each I and j, we can completely enumerate all I and j backwards, and the complexity is O(nm), which is completely controllable.

This is a very obvious dynamic programming problem if you sort out the above paragraph.

Let’s look at the code and get more details from the code.

class Solution:



def isMatch(self, s: str, p: str) -> bool:

n, m = len(s), len(p)

if m == 1 and p[0] = =The '*':

return True

s = ' ' + s

p = ' ' + p

The first range is the number of columns, the second range is the number of rows

Array dp[n][m

dp = [[False for _ in range(m+2)] for _ in range(n+2)]

dp[0] [0] = True

# I starts at 0, because an * at the beginning of p matches an empty string

for i in range(0, n+1) :

for j in range(1, m+1) :

Si and Pj match

if s[i] == p[j] or p[j] == '? ':

dp[i][j] = dp[i- 1][j- 1] if i > 0 else False

continue

# pj *, then determine whether there is a legitimate transfer upstream

if p[j] == The '*':

dp[i][j] = (dp[i- 1][j] if i > 0 else False) | dp[i][j- 1]

return dp[n][m]

Copy the code

This code is short, but it contains a lot of details, including conditions such as subscripts and boundaries. So I suggest you to understand the idea of writing and then compare to the code to think, I believe there will be more harvest. Because many details may be understood after seeing the wrong case and thinking about it, this is also a good point in LeetCode. You can see the test data and know where you are wrong.

That’s all for today’s article. In addition, in the next article, I shared the use method of LeetCode plug-in in vscode. If you are interested, don’t miss it.

If you feel that something has been gained, please feel free to click on the attention or forward, your help is very important to me.