2021/03/11

Problem description

For a sequence X=(x1,x2… ,xm), the subsequence is composed of non-repeating elements in X, and the relative order is unchanged.

For example, sequence (A,b, D, E,f, G), subsequence can be (A,b,e),(a,f, G), etc.

The Longest Common subsequence LCS(Longest Common and intermediate ue) is the most Common sequence in and near one sequence.

Tip: The elements of the longest common subsequence can be discontinuous in the original sequence. The longest common substring must be a continuous subsequence.

Thought analysis

Find the length of the longest common subsequence

To solve the longest common subsequence problem, we can adopt the idea of dynamic programming. Now let’s analyze the optimal substructure of this problem.

Let two sequences Xm=(x1,x2… ,xm),Yn=(y1,y2,… ,yn)X_m=(x_1,x_2,… ,x_m),Y_n=(y_1,y_2,… ,y_n)Xm=(x1,x2,… ,xm),Yn=(y1,y2,… ,yn), their longest common subsequence is Zk=(z1,z2… ,zk)Z_k=(z_1,z_2,… ,z_k)Zk=(z1,z2,… , zk).

The optimal substructure of the problem is as follows:

  • If xm = ynx_m = y_nxm = yn, have xm = yn = zkx_m = y_n = z_kxm = yn = zk, subsequence zk – 1 = (z1, z2,… , zk – 1) Z_ {1} k = (z_1, z_2,… Zk, z_ {} k – 1) – 1 = (z1, z2,… , zk – 1) is Xm – 1, Yn – 1 x_} {m – 1, Y_ {}, n – 1 Xm – 1, Yn – 1 the longest common subsequence
  • If xm≠yn,xm≠ ZKx_m \ ney_n, X_m \ne z_kxm=yn,xm=zk, the subsequence ZKZK is the longest common subsequence of Xm−1,YnX_{m-1},Y_{N} xm −1, yn
  • If xm≠yn,yn≠ ZKx_m \ne y_n,y_n \ne z_kxm=yn,yn=zk, then the subsequence ZKZK is the longest common subsequence of Xm, yn −1X_{m},Y_{n-1} xm,yn −1

A two-dimensional array LCS [I][j] is used to represent the length of the longest common subsequence of Xi,YjX_i,Y_jXi,Yj. According to the characteristics of optimal substructure, the following recursive formula can be obtained:

if X[i] == Y[j] 
lcs[i][j] = lcs[i-1][j-1] + 1 else X[i] ! = Y [j] # regulation: if [j] [I - 1] = c = c [I] [1], the default LCS [I] [j] = LCS [j] [I - 1] LCS [I] [j] = Max {LCS [j], [I - 1] LCS [I] [1]}Copy the code

For ease of calculation, add a row of zeros and a column of zeros to the LCS array. The initial state is all zeros. According to the recursive formula above, the bottom right value of the final LCS array is the length of the longest common subsequence.

For example: two sequences (a,b,g,f),(b,f,g), the procedure for solving the LCS array is as follows,

First, we initialize,

The arrow in the figure indicates the element from which LCS [I][j] is obtained.

The characters corresponding to the yellow grid in the figure form the longest common subsequence, and the final length of the longest common subsequence is 2, and there are two longest common subsequences, respectively :(b,f), (b,g).

public static int getLCSLength(String sequeue1, String sequeue2) {
    int seq1Len = sequeue1.length();
    int seq2Len = sequeue2.length();
    int[][] lcs = new int[seq1Len + 1][seq2Len + 1];
    int i = 0, j = 0;
    /** Initialize the LCS array */
    for (; i < lcs.length; ++i) {
        for (; j < lcs[0].length; ++j) {
            lcs[i][j] = 0; }}/** Dynamic programming to solve the longest common substring length */
    for (i = 1; i < lcs.length; ++i) {
        for (j = 1; j < lcs[0].length; ++j) {
            if (sequeue1.charAt(i - 1) == sequeue2.charAt(j - 1)) {
                lcs[i][j] = lcs[i - 1][j - 1] + 1;
            } else {
                lcs[i][j] = Math.max(lcs[i - 1][j], lcs[i][j - 1]); }}}return lcs[i - 1][j - 1];
}
Copy the code

Solve for the longest common subsequence

Method 1: Use LCS array traceback

Using the LCS array directly, starting at the bottom right and working backwards, you get the longest common subsequence. This is opposite to the above process of solving the length, and the specific algorithm is as follows:

Enter the LCS array, I,jwhile i! =0 && j! = 0If X = = Y [j] [I] # X [I] is the first insert the subsequence X [I] to join the public subsequence, I -, j - else if LCS [j] [I - 1] > = LCS [I]] [j - 1 - else j - ICopy the code

So how do you get all the longest common subsequences?

For X [I]! = Y[j], if LCS [i-1][j] == LCS [I][j-1], then the top and the left should be searched, for example, using the above LCS array backtrace, can find two longest common subsequences :(b,f),(b,g).

Note: when I =4 and j=3, since LCS [4][2]== LCS [3][3], backtracking should be performed in both directions.

Method two: Use the DIR direction array

In addition to the above method, we can also use another two-dimensional array with the same dimension as the LCS, called dir[I][j], which records the source of the current LCS [I][j] value, as defined below:

dir[i] [j] = 1 # LCS [i] [jFrom the LCS []i-1] [j-1+ 1 get dir []i] [j] = 2 #i] [j] from the previous row, LCS [i] [j] = lcs[i-1] [j]
dir[i] [j] = 3 #i] [j] from the previous column, LCS [i] [j] = lcs[i] [j-1]
Copy the code

While evaluating the LCS array, you can evaluate the DIR array as defined above. Similarly, starting from the bottom right of the DIR array and working backwards, the longest common subsequence can be obtained using the following algorithm:

Enter dir array, I,j
whilei! =0&& j! =0
    if dir[i][j] == 1
    	# X[I] is a common subsequence of header insertsX[I] joins the common subsequence, I --,j--else if dir[i][j] == 2
    	i--
    else 
    	j--
Copy the code

Improve the LCS array

Since each row in the calculation of the LCS is only related to the previous row and the current row, it is perfectly possible to use only a two-row array to calculate the LCS, overwriting the previous row after each calculation.

The process of calculating LCS above is improved as follows:

This method improves the time and space complexity of the algorithm, but is only suitable for the problem of finding the longest common subsequence length, which cannot be achieved if a specific subsequence is required (a specific subsequence can also be found if combined with dir arrays).

AC code

/** * The longest common substring result of two sequences, including length and sequence set *@paramSequeue1 sequence 1 *@paramSequeue2 2 *@return LCSResult
 */
public static LCSResult getLCS(String sequeue1, String sequeue2) {
    LCSResult result = new LCSResult();
    int seq1Len = sequeue1.length();
    int seq2Len = sequeue2.length();
    int[][] lcs = new int[seq1Len + 1][seq2Len + 1];
    int i = 0, j = 0;
    /** Initialize the LCS array */
    for (; i < lcs.length; ++i) {
        for (; j < lcs[0].length; ++j) {
            lcs[i][j] = 0; }}/** Dynamic programming to solve the longest common substring length */
    for (i = 1; i < lcs.length; ++i) {
        for (j = 1; j < lcs[0].length; ++j) {
            if (sequeue1.charAt(i - 1) == sequeue2.charAt(j - 1)) {
                lcs[i][j] = lcs[i - 1][j - 1] + 1;
            } else {
                lcs[i][j] = Math.max(lcs[i - 1][j], lcs[i][j - 1]);
            }
        }
    }
    result.setLength(lcs[i - 1][j - 1]);
    result.addResult(getLCS(lcs,sequeue1,sequeue2));
    return result;
}

/** * search a longest common subsequence, iteration implementation * to obtain all subsequences to be completed... *@paramLCS LCS array *@paramSeq1 original sequence 1 *@paramSeq2 original sequence 2 *@returnLongest common subsequence */
private static String getLCS(int[][] lcs,String seq1,String seq2) {
    StringBuilder subsequeue = new StringBuilder();
    int i = lcs.length - 1;
    int j = lcs[0].length - 1;
    while(i! =0&&j! =0) {if(seq1.charAt(i-1) == seq2.charAt(j-1)) {/** Note: each insertion is at the head of the sequence */
            subsequeue.insert(0,seq1.charAt(i-1));
            i--;
            j--;
        }else {
            if(lcs[i-1][j]>=lcs[i][j-1]){
                i--;
            }else{ j--; }}}return subsequeue.toString();
}

/** * The longest common subsequence result * includes the length of the LCS and the corresponding subsequence list */
oublic class LCSResult {
    private int length;
    private List<String> results;

    / / omit getter/setter/toString

    public void addResult(String result) {
        if(this.results == null) {this.results = new ArrayList<>();
        }
        this.results.add(result); }}Copy the code

The above code acquires only one longest common subsequence by default.


This article is participating in the “Nuggets 2021 Spring Recruitment Campaign”, click to see the details of the campaign