The question

Given a string s, find the longest palindromic substring in s. You may assume that the maximum length of s is 1000.

Link: https://leetcode.com/problems/longest-palindromic-substring/

Translation

Given a string s, find its longest palindromic substring. We may assume that the length of s is at most 1000.

Examples

Example 1:
Input: "babad"
Output: "bab"
Note: "aba" is also a valid answer.

Example 2:
Input: "cbbd"
Output: "bb"

Analysis

Although LeetCode rates this problem as Medium, it is actually not easy, and it is hard to come up with the optimal solution on your own.

Let's put the clever algorithms aside and start with the simplest method. The easiest approach, of course, is brute-force enumeration, but this problem differs from earlier string problems: we do not need to enumerate every start and end position and test whether each substring is a palindrome. Instead, we can use the fact that a palindrome is symmetric about its center and enumerate the center directly, extending outward as long as the characters on the two sides are equal. There are at most 2n - 1 centers (n on a character and n - 1 between two characters), and each one extends at most n times, so the overall complexity is O(n^2).
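The center enumeration described above can be sketched as follows. This is a minimal illustration, not code from the original article; the function name longest_palindrome_by_center is made up for this example.

```python
def longest_palindrome_by_center(s):
    """Brute force: try every center and expand while both sides match."""
    n = len(s)
    best_start, best_len = 0, 0
    # 2n - 1 centers: n on a character (odd length), n - 1 between two characters (even length)
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2
        while left >= 0 and right < n and s[left] == s[right]:
            left -= 1
            right += 1
        # After the loop, the palindrome found is s[left + 1:right]
        length = right - left - 1
        if length > best_len:
            best_start, best_len = left + 1, length
    return s[best_start:best_start + best_len]


print(longest_palindrome_by_center("babad"))  # bab
print(longest_palindrome_by_center("cbbd"))   # bb
```

When several palindromes share the maximum length, this sketch returns the leftmost one, which is why "babad" gives "bab" rather than "aba".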

Looking at this complexity, you can tell it is not optimal. Still, brute-force enumeration is a pretty good solution here, not as bad as you might think. If you don't believe it, let's look at another solution that seems much more advanced.


Dynamic Programming (DP)


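There is another approach to this problem that takes advantage of the properties of palindromic strings. For a string S, if we reverse it to get S_, every palindromic substring of S also appears in S_. So we can look for the longest common substring of S and S_. One check is needed: a common substring is a palindrome only if its occurrence in S_ is the mirror image of the same span of S, so we compare the indices before accepting a candidate. (The longest common subsequence of S and S_ would instead give the longest palindromic subsequence, whose characters need not be contiguous.)

Introduction to Algorithms solves the closely related longest common subsequence problem with dynamic programming, and the same table-filling idea works here: for every position i in S and every position j in S_, dp[i][j] records the length of the longest common substring of S and S_ that ends at i and j.

For i = 0 or j = 0, dp[i][j] = 0 (string subscripts start at 1 in the table).

We write the code for DP:

n, S_ = len(S), S[::-1]
# dp[i][j]: length of the common substring ending at S[i-1] and S_[j-1]
dp = [[0] * (n + 1) for _ in range(n + 1)]
best, end = 0, 0
for i in range(1, n + 1):
    for j in range(1, n + 1):
        if S[i - 1] == S_[j - 1]:
            dp[i][j] = dp[i - 1][j - 1] + 1
            # Keep the match only if it mirrors the same span of S
            if dp[i][j] > best and n - j == i - dp[i][j]:
                best, end = dp[i][j], i
# S[end - best:end] is the longest palindromic substring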

It is not hard to see that the time complexity of this solution is also O(n^2), and so is the space complexity. In other words, we did all this work without gaining anything. From that point of view, brute-force search is not a bad solution for this problem.

That covers the basic approaches. Now let's get to the best way to solve this problem: Manacher's algorithm, which finds the longest palindromic substring of a text in O(n) time.


Manacher's algorithm


Besides the symmetry we just used, palindromic strings have another property: parity. The length of a palindrome can be odd or even. If it is odd, the center of the palindrome is a single character; if it is even, the center falls between two characters.

Here's an example:

Both aba and abba are palindromes; the former is an odd palindrome and the latter an even one.

The two cases behave differently, which makes them awkward to handle together. To simplify the problem, we preprocess the string so that every palindrome becomes an odd palindrome. This is quite simple: we insert a special character # between every two adjacent characters and at both ends of the string.

Such as:

abba -> #a#b#b#a#

In this way, the palindrome center becomes the # in the middle. Now let's look at the odd palindrome:

aba -> #a#b#a#

The palindrome center is still on b, and it is still an odd palindrome. The preprocessing code:

def preprocess(text):
    # Put '#' at both ends and between every pair of adjacent characters
    new_str = '#'
    for c in text:
        new_str += c + '#'
    return new_str
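As a quick check of the preprocessing step, here is an equivalent one-line version with a few sample calls. It is only a sketch, and it assumes that '#' never occurs in the input text; if it can, another separator has to be chosen.

```python
def preprocess(text):
    # Same result as the loop version: '#' at both ends and between characters.
    # Assumption: '#' itself does not appear in text.
    return '#' + ''.join(c + '#' for c in text)


print(preprocess("abba"))  # #a#b#b#a#
print(preprocess("aba"))   # #a#b#a#
# A text of n characters always becomes 2n + 1 characters, which is odd
print(len(preprocess("abba")))  # 9
```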

Manacher's algorithm uses three variables: the array radis (called P in many write-ups), idx and mr. Let's go through them one by one.

The first is the radis array, which stores, for each position, the radius of the longest palindrome centered at that position. Note that it is the radius, not the length, and the radius counts the center itself.

Let’s take an example:

S:     # a # b # b # a #
radis: 1 2 1 2 5 2 1 2 1

Before thinking about how to compute the radis array, let's look at its properties.

First, if the radius of the palindrome at position i is radis[i], what is its length? Very simple: radis[i] * 2 - 1. And what is the length left after removing the # characters from this palindrome? In other words, what was its length before preprocessing?

The total length is radis[i] * 2 - 1, and the # characters outnumber the letters by one, so the length in the original string is (radis[i] * 2 - 1 - 1) / 2 = radis[i] - 1.

In other words, the radis array directly gives the palindrome lengths in the original string.
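The relation between radis and the original string can be checked on the example table. The snippet below is only an illustration: the helper original_span is hypothetical, and the formula for the start index relies on the fact that a maximal palindrome in the preprocessed string always begins and ends with '#'.

```python
def original_span(i, r):
    """Map center i and radius r in the preprocessed string to
    (start, length) in the original string."""
    return (i - r + 1) // 2, r - 1


original = "abba"
t = "#a#b#b#a#"
radis = [1, 2, 1, 2, 5, 2, 1, 2, 1]  # values from the table above

for i, r in enumerate(radis):
    palindrome = t[i - r + 1:i + r]
    assert palindrome == palindrome[::-1]
    assert len(palindrome) == 2 * r - 1
    start, length = original_span(i, r)
    # Removing '#' leaves exactly r - 1 characters of the original string
    assert palindrome.replace('#', '') == original[start:start + length]

print(original_span(4, 5))  # (0, 4), the whole of "abba"
```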

idx and mr go together. mr is short for most_right: among the palindromes centered before the current position i, it records how far to the right the furthest one reaches. In the code below it is stored as idx + radis[idx], the first position that palindrome does not cover. idx is the center of that palindrome.

It sounds like a mouthful, so consider the following situation:

Here i is less than mr, and idx is the center of the palindrome that reaches mr. So i lies inside the palindrome centered at idx, and we can take its mirror position with respect to idx: idx * 2 - i, which we call i_. What is the use of knowing this mirror position? Very simply, it lets us quickly determine a lower bound for radis[i]. By the time we reach i, we already have the result for i_, and from that result we can deduce the range of radis[i].

radis[i] >= min(radis[i_], mr-i)

Why is that?

Let's go through the cases. Suppose mr - i > radis[i_]. Then the whole palindrome centered at i_ falls inside the palindrome centered at idx, and we can conclude radis[i] = radis[i_]. Why is that?

By symmetry. Suppose the palindrome centered at i were longer, say with radius radis[i_] + 1. That longer palindrome still lies inside the palindrome centered at idx, so its mirror image about idx is also a palindrome, and it is centered at i_. Then radis[i_] would have to be at least radis[i_] + 1, which is a contradiction. If the description is not clear, here is an example:

S:     c a b c b d b c b a
radis:   x_  i_  5   i   x

In this case, radis[idx] = 5 and i - idx = 2, so taking mr = idx + radis[idx] as in the code, mr - i = 5 - 2 = 3, while radis[i_] = 2. So mr - i > radis[i_]. If radis[i] were 3, the character at x would have to equal the character at idx, and by symmetry the character at x_ would also equal the character at idx. Then radis[i_] would be 3 as well, contradicting the fact that it equals 2. So this cannot happen: when mr is far enough away, radis[i_] determines the result at position i.

Now the other case: what happens if mr - i < radis[i_]?

In this case, mr is too close to i, so the radius at the mirror position cannot be reached at i inside the palindrome centered at idx; inside it, the radius at i is only mr - i. But there may still be characters to the right of mr. Can these characters extend the palindrome at i?

S:      XXXXXXXXXXXXXXXXXXXXXXXX
radis:     i_      idx      i  mr

That is, with mr = idx + radis[idx] as in the code, will S[mr] be equal to S[i * 2 - mr], its mirror about i? We can tell without comparing them, and the answer is no. Here is why:

Suppose the character at mr did extend the palindrome at i, that is, S[mr] == S[i * 2 - mr]. Since radis[i_] > mr - i, the palindrome at i_ reaches past the left edge of the palindrome centered at idx and covers the position idx * 2 - mr, the mirror of mr about idx. Say the letter there is a. By the symmetry about i_ and then about idx, S[i * 2 - mr] is also a, so S[mr] would be a as well. Then the two a's on either side would extend the palindrome at idx by 1, which contradicts the value of radis[idx]. Therefore, in this case, radis[i] is exactly mr - i.

So under what circumstances can the radius at i continue to expand?

Only when mr - i == radis[i_]. In that case the palindrome at i_ stops exactly at the left edge of the palindrome centered at idx, so nothing rules out an extension at i on the right side.

Call the left edge of the palindrome centered at idx ml. The palindrome at i_ extends to the left only as far as ml, because the character just beyond ml does not match its mirror about i_. The characters to the right of mr, however, may well match their mirrors about i without affecting the correctness of radis[idx]. So here we continue with a loop that expands the palindrome at position i.
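The three cases can be checked against radii computed by plain expansion. The sketch below is not part of the algorithm itself: naive_radis and check_cases are hypothetical helpers, and mr is taken to be idx + radis[idx], the first position not covered by the palindrome centered at idx.

```python
def naive_radis(s):
    """Radius of the longest palindrome centered at each position, in O(n^2)."""
    n = len(s)
    radis = [1] * n
    for i in range(n):
        while i - radis[i] >= 0 and i + radis[i] < n and s[i - radis[i]] == s[i + radis[i]]:
            radis[i] += 1
    return radis


def check_cases(s):
    """Assert that the three cases hold at every position i < mr."""
    radis = naive_radis(s)
    idx, mr = 0, 0
    for i in range(len(s)):
        if i < mr:
            mirror = radis[2 * idx - i]
            if mirror < mr - i:
                assert radis[i] == mirror   # mirror palindrome fits inside
            elif mirror > mr - i:
                assert radis[i] == mr - i   # cut off exactly at mr
            else:
                assert radis[i] >= mr - i   # only here can it grow further
        if i + radis[i] > mr:
            idx, mr = i, i + radis[i]
    return True


print(check_cases("cabcbdbcba"))
print(check_cases("#a#b#b#a#"))
```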

Although the analysis is long and intricate, the code is short.

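# s is the preprocessed string
n = len(s)
radis = [0] * n
# idx: center of the palindrome that reaches furthest right; mr = idx + radis[idx]
idx, mr = 0, 0
for i in range(n):
    # Lower bound of radis[i] from the mirror position i_ = 2 * idx - i
    radis[i] = 1 if mr <= i else min(radis[2 * idx - i], mr - i)
    # Inside mr, expansion is only possible when radis[i_] == mr - i
    if mr > i and radis[2 * idx - i] != mr - i:
        continue
    # Keep comparing the characters on both sides while staying in bounds
    while i - radis[i] >= 0 and i + radis[i] < n and s[i + radis[i]] == s[i - radis[i]]:
        radis[i] += 1
    # Update idx and mr if this palindrome reaches further right
    if radis[i] + i > mr:
        mr = radis[i] + i
        idx = i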

At this point, Manacher's algorithm is done. We spent a long time introducing it, yet it is only a few lines of code. It is very clever, and it may take a lot of thinking to really understand it the first time around.
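Putting the pieces together, a complete function might look like the sketch below. It is one possible arrangement rather than the article's own code: the function name longest_palindrome is made up, '#' is assumed not to occur in the input, and the early continue is left out because in those cases the while loop stops at its first comparison anyway.

```python
def longest_palindrome(text):
    # Preprocess: '#' at both ends and between characters (assumes no '#' in text)
    s = '#' + ''.join(c + '#' for c in text)
    n = len(s)
    radis = [0] * n
    idx, mr = 0, 0  # mr = idx + radis[idx]
    for i in range(n):
        # Lower bound from the mirror position 2 * idx - i
        radis[i] = 1 if mr <= i else min(radis[2 * idx - i], mr - i)
        # Expand while in bounds and the two sides match
        while i - radis[i] >= 0 and i + radis[i] < n and s[i - radis[i]] == s[i + radis[i]]:
            radis[i] += 1
        if i + radis[i] > mr:
            idx, mr = i, i + radis[i]
    # The largest radius gives the answer: original length is radis - 1
    center = max(range(n), key=lambda i: radis[i])
    length = radis[center] - 1
    start = (center - radis[center] + 1) // 2
    return text[start:start + length]


print(longest_palindrome("babad"))  # bab
print(longest_palindrome("cbbd"))   # bb
```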

But one question remains open: why is an algorithm with a nested loop O(n)?

To understand this, we need to look past the surface. We do not know how many times the inner loop runs for a given i, but two things are certain, and they are enough to settle the complexity.

First, mr only increases; it never gets smaller. Second, mr ranges from 0 to n, and every successful iteration of the inner loop accounts for an increase of mr by at least one.

So even though we do not know how many times mr changes or by how much each time, the inner loop succeeds at most n times in total, and we can be sure that this is an O(n) algorithm.
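One way to see the bound in practice is to count the successful iterations of the inner loop. The function below is an instrumented sketch written for this purpose (count_inner_steps is a hypothetical name); it runs the same loop on an already preprocessed or arbitrary string and returns the count.

```python
def count_inner_steps(s):
    """Run the Manacher loop on s and count successful expansions."""
    n = len(s)
    radis = [0] * n
    idx, mr = 0, 0
    steps = 0
    for i in range(n):
        radis[i] = 1 if mr <= i else min(radis[2 * idx - i], mr - i)
        while i - radis[i] >= 0 and i + radis[i] < n and s[i - radis[i]] == s[i + radis[i]]:
            radis[i] += 1
            steps += 1  # each success moves the reach i + radis[i] one step right
        if i + radis[i] > mr:
            idx, mr = i, i + radis[i]
    return steps


# Even on a worst case for brute force, the count stays within len(s)
s = "a" * 1000
print(count_inner_steps(s) <= len(s))  # True
```

For comparison, plain center expansion on the same all-a string performs on the order of n^2 / 4 comparisons.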
