preface

A new series, “Easy to Understand Data Structures and Algorithms by Looking at Pictures”, is launched, mainly using pictures to describe common data structures and algorithms, easy to read and understand. This series includes dozens of examples of heaps, queues, lists, trees, graphs, sorts, and more.

Longest common subsequence

Longest Common Subsequence (LCS) A sequence, if it is the oldest of two or more known sequences, is called the longest common subsequence.

Also, note that the longest common subsequence is not the same as the longest common substring, as shown in the following example.

There are sequences S1 and S2, where S1=hello and S2=hero. So the longest common subsequence is heo, and the longest common subsequence is He. It can be seen that the difference lies in one allowing discontinuity, one requiring continuity, and the common feature is to maintain sequential nature.

Brute force

Brute force method is the simplest and most intuitive solution, since it is violent that efficiency must be the worst. There areandTwo sequences, the exhaustive process starts by enumerating all possible subsequences, and for sequence X, the number of subsequences reaches, so the time complexity of this part reaches. And the time complexity of each sub-sequence to match sequence Y is, so the time complexity of the whole process is. In other words, when the time complexity of brute force exhaustive method is exponential, and the actual sequence length may be long, it is almost impossible to use this method.

Why is the number of subsequences? All sub-sequences of A sequence can be regarded as the sequence formed after removing several (0 to m) elements from A sequence, such as ABC, {ABC} when removing 0 elements, {BC,AC,AB} when removing 1 element, {C,B,A} when removing 2 elements, and null when removing 3 elements.

Violent exhaustion general steps:

  1. For sequence X, enumerate all subsequences;
  2. For each sub-sequence matching sequence Y in step 1, the oldest matched sequence is recorded;

Dynamic programming

In view of the time complexity of brute force method is too large, another method is needed to solve this problem, dynamic programming. Generally, problems that can be solved by dynamic programming need to conform to three characteristics: optimal substructure, overlapping subproblem and no aftereffect. Just right, the longest common subsequence problem conforms to the characteristics of dynamic programming. The following is the specific analysis of this problem.

Optimal substructure

Suppose you haveandThe longest common subsequence corresponding to X and Y sequences isTo determineThe process is an optimization problem. To analyze optimal substructures, we need to start with the last element of sequence X and sequence Y. There are two cases:

  • if, that is, the last element of sequence X and sequence Y is the same, indicating that this element must be the last element of the common subsequence. In this case, the state transformation formula of the original problem is. It can be seen that in this case, the original problem has been successfully decomposed into sub-problems, and the optimal solution of each stage can be obtained through the optimal solution of the sub-problem, which conforms to the optimal sub-structure.

  • if, that is, the last element of sequence X and sequence Y is not the same. In this case, two situations need to be considered:

    1. ifIs not the last element of the longest common subsequence, then the state transition formula of the problem is, that is, fromandIn both sequences.
    2. ifIs not the last element of the longest common subsequence, then the state transition formula of the problem is, that is, fromandIn both sequences.

Above, the original problem is successfully decomposed into sub-problems, and the optimal solution of the sub-problem eventually constitutes the optimal solution of the whole problem, that is to say, the problem has the property of optimal sub-structure.

Overlapping subproblem

After the above analysis, we decomposed the original problem into three sub-problems:

It can be seen that there are overlapping sub-problems, such as forWhen the sequenceWith the sequenceIf the last element of is not identical, the subproblem continues to decompose intoand, and the previous subproblemIn theThe overlap.

Therefore, the original problem has the nature of overlapping subproblems.

There was no aftereffect

It can be seen from the former transformation formula of face problems that the sub-problems in a certain stage are no longer affected by subsequent decisions after they are determined, that is, the latter process will not affect the previously determined state. Conversely, the optimal solution of a subproblem at a certain stage can be considered to be obtained from some previous states, regardless of how the previous states were obtained.

The recursive formula

All the problems have been analyzed, then define the recursive formula, all the states and state transitions are described clearly by recursive formula.

setforThe length of whichI = 0,... mJ = 0,... n.


Dynamic programming implementation

Dynamic programming can be implemented in two ways, namely, top-down and bottom-up.

Top-down approach: In this approach, the solution of a problem can be expressed recursively using the solution of a subproblem. In addition, the overlapping sub-problems can be memorized, that is, stored in the cache table, each time before solving the sub-problem, we can check the cache table, if the sub-problem has been solved, we can directly use it. For unsolved subproblems, we solve the subproblems first, and then store the solutions to the subproblems in the cache table.

Bottom-up approach: As opposed to top-down, we can work backwards to find another way to reconstruct the problem in a bottom-up way. Instead of solving the original problem directly, we try to solve sub-problems first, then solve larger sub-problems through sub-problems, and iterate toward larger problems to solve the final problem.

The article is too long to be sent out in a top-down manner.

Bottom-up approach

In practical engineering, bottom-up method can use a two-dimensional table to record the solution process of the longest common subsequence.

Now we have the sequence X=HELLO, and the sequence Y=HERO, and we use dynamic programming bottom-up to solve for their longest common subsequence. Notice that the two dimensions of the table are each one dimension larger than their respective sequence length. The extra dimension is used to preserve the initial states, which are both zero, and these initial states are used in the calculation.

Build the subquestion table

We start with the first element of sequence X, compared to the first element of sequence Y, because H=H, so LCS(1,1)=LCS(0,0)+1=1.

Then H! = E, hence the LCS (1, 2) = Max (LCS (1, 1), the LCS (0, 2)), namely the LCS (1, 2) = the LCS (1, 1) = 1.

Then H! = R, so the LCS (1, 3) = Max (LCS (1, 2), the LCS (0, 3)), namely the LCS (1, 3) = the LCS (1, 2) = 1.

Then H! = O, hence the LCS (1, 4) = Max (LCS (1, 3), the LCS (0, 4)), namely the LCS (1, 4) = the LCS (1, 3) = 1.

Similarly, the second element of sequence X is also compared with each element of sequence Y, and the results are filled into the corresponding table. The second element of sequence X is compared as follows.

The third element of sequence X is compared as follows.

The fourth element of sequence X is compared as follows.

The fifth element of sequence X is compared and the final result is as follows.

So you can see that LCS(5,4)=3, which means that the longest common subsequence of sequence X and sequence Y has length 3.

At the same time, we can also see that through the bottom-up method of dynamic programming, we only need to build a table to get the final result, and the time complexity of building the table is O(m*n), which greatly reduces the time complexity.

Construct the longest common subsequence

Sometimes the length of the longest common subsequence cannot meet our requirements. If we want to further obtain the long common subsequence, we need to work back according to the constructed table to get the final result.

In other words, the specified elements of the two sequences are judged to be the same. If they are the same, the sequence can go back one space diagonally, but if they are different, the sequence can go left or up one space diagonally. Depending on the value, the sequence can go left or up, and if the value is the same, it can go left or up.

Following the example above, after the previous procedure, the table has been constructed and the length of the longest common subsequence has been obtained. Now let’s get the longest common subsequence. Starting with the last square,

Because O=O, so O is the element in the longest common subsequence, so I print it out, and then I go back one space diagonally.

Because L! Phi is equal to R, so neither of them is a member of the longest common subsequence, and they both have the same value going up to the left, so you can choose either way, so I’m going to go up one space.

Go ahead, because L! Phi is equal to R, so neither of them is a member of the longest common subsequence, and they both have the same value going up to the left, so you can choose either way, so I’m going to go up one space.

Go ahead, because E! =R, so both are not elements of the longest common subsequence, where the value on the left is greater than the value on the top, and choose to go one space to the left.

So I’m going to go ahead, because E is equal to E, so E is in the longest common subsequence, and I’m going to print it out, and then I’m going to go back one space diagonally.

I’m going to go ahead, because H is H, so H is in the longest common subsequence, and I’m going to print it out, and I’m going to go to the end, and now all I’m going to print out is the longest common subsequence, which is HEO. The time complexity of constructing the longest common subsequence is O(m+ N).

————- Recommended reading ————

Summary of my open Source projects (Machine & Deep Learning, NLP, Network IO, AIML, mysql protocol, Chatbot)

Why to write “Analysis of Tomcat Kernel Design”

2018 summary data structure algorithms

2018 Summary machine learning

2018 Summary Java in Depth

2018 Summary of natural language processing

2018 Summary of deep learning

2018 summary JDK source code

2018 Summary Java concurrency Core

2018 summary reading passage


Talk to me, ask me questions:

Welcome to: