How can the LCS problem be transformed into an LIS problem? How can we prove the correctness of the greedy LIS solution?
This article is participating in Python Theme Month. See the link to the event for more details
Problem description
This is problem 1713 on LeetCode, "Minimum Operations to Make a Subsequence", rated Hard.
Tags: "longest common subsequence", "longest increasing subsequence", "greedy", "binary search"
You are given an array target that contains distinct integers, and another integer array arr that may contain duplicate elements.
In each operation, you can insert any integer at any position in arr. For example, if arr = [1,4,1,2], you can insert 3 in the middle to get [1,4,3,1,2]. You may also insert integers at the beginning or the end of the array.
Return the minimum number of operations needed to make target a subsequence of arr.
A subsequence of an array is the array obtained by deleting some elements (possibly none) from the original array without changing the relative order of the remaining elements. For example, [2,7,4] is a subsequence of [4,2,3,7,2,1,4], but [2,4,2] is not.
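The subsequence definition above can be checked mechanically. The following is a minimal sketch; the helper name `is_subsequence` is hypothetical and not part of the problem statement.

```python
from typing import Sequence


def is_subsequence(sub: Sequence[int], seq: Sequence[int]) -> bool:
    """Return True if `sub` can be obtained from `seq` by deleting elements."""
    it = iter(seq)
    # `x in it` consumes the iterator up to and including the first match,
    # so later elements of `sub` can only match later positions of `seq`.
    return all(x in it for x in sub)


print(is_subsequence([2, 7, 4], [4, 2, 3, 7, 2, 1, 4]))  # True
print(is_subsequence([2, 4, 2], [4, 2, 3, 7, 2, 1, 4]))  # False
```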
Example 1:
Input: target = [5,1,3], arr = [9,4,2,3,4]
Output: 2
Explanation: You can add 5 and 1 to make arr = [5,9,4,1,2,3,4], so that target is a subsequence of arr.
Example 2:
Input: target = [6,4,8,1,3,2], arr = [4,7,6,2,3,8,6,1]
Output: 3
Constraints:
 1 <= target.length, arr.length <= $10^5$
 1 <= target[i], arr[i] <= $10^9$
 target does not contain any duplicate elements.
Basic analysis
For convenience, let the length of $target$ be $n$, the length of $arr$ be $m$, and the length of the longest common subsequence of $target$ and $arr$ be $max$. It is not hard to see that the final answer is $n - max$, because every element of $target$ outside the common subsequence has to be inserted.
So in essence this is a longest common subsequence (LCS) problem.
But the naive LCS solution has complexity $O(n * m)$, using the state definition "$f[i][j]$ is the length of the longest common subsequence when considering the first $i$ elements of array a and the first $j$ elements of array b".
In this problem the data range is $10^5$, so the naive LCS solution will inevitably time out.
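For reference, the naive $O(n * m)$ LCS approach described above can be sketched as follows. This is only an illustration for small inputs (the function name `min_operations_naive` is hypothetical); it would time out at the stated data range.

```python
from typing import List


def min_operations_naive(target: List[int], arr: List[int]) -> int:
    n, m = len(target), len(arr)
    # f[i][j]: LCS length of the first i elements of target
    # and the first j elements of arr.
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if target[i - 1] == arr[j - 1]:
                f[i][j] = f[i - 1][j - 1] + 1
            else:
                f[i][j] = max(f[i - 1][j], f[i][j - 1])
    # Every element of target outside the LCS must be inserted.
    return n - f[n][m]


print(min_operations_naive([5, 1, 3], [9, 4, 2, 3, 4]))  # 2
print(min_operations_naive([6, 4, 8, 1, 3, 2], [4, 7, 6, 2, 3, 8, 6, 1]))  # 3
```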
One obvious entry point is that the elements of the $target$ array are distinct, and the LCS problem gains some interesting properties once such extra conditions are added.
One of the classical properties is that the longest common subsequence problem (LCS) can be transformed into the longest ascending subsequence problem (LIS) when the elements of one of the arrays are distinct. At the same time, the LIS problem has a greedy solution with complexity $O(n\log{n})$, obtained by maintaining a monotonic sequence plus binary search.
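The overall approach therefore has three steps: (1) abstract the problem as an LCS problem; (2) use the fact that the elements of $target$ are distinct to transform it into an LIS problem; (3) apply the greedy LIS solution to reach $O(n\log{n})$ complexity.
With the basic direction determined, we now prove the validity and correctness of step $2$ and step $3$.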
Proof
1. Why can an LCS problem be transformed into an LIS problem when the elements of one array are distinct?
The essence is the following fact: when the elements of one of the arrays are distinct, each "common subsequence" corresponds to an "ascending subsequence" of the index array of that array, and vice versa.
We can understand this using the two arrays given in the problem ($target$ and $arr$).
Since the elements of $target$ are all distinct, each element of $target$ maps uniquely to its index.
We can then focus on the elements common to both arrays (ignoring the non-common ones): each "common subsequence" naturally corresponds to an "ascending subsequence" of the index array, and vice versa.
Note: The figure below shows only a fragment of the two arrays. Do not mistake the two arrays for the same length.
If there is a "common subsequence", then by the definition of "subsequence" its elements appear in $target$ in order, so the corresponding indices must be increasing; that is, there is a corresponding "ascending subsequence".
Conversely, an "ascending subsequence" of the index array means that its elements all appear in $target$ and appear there in ascending index order, which matches the definition of a common subsequence; that is, it corresponds to a "common subsequence".
At this point, we have transformed the original LCS problem into an LIS problem.
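The transformation can be made concrete with a small sketch: replace each element of arr that also occurs in target by its index in target, and drop the rest. The helper name `to_index_array` is hypothetical.

```python
from typing import List


def to_index_array(target: List[int], arr: List[int]) -> List[int]:
    # Elements of target are distinct, so value -> index is a unique mapping.
    pos = {value: i for i, value in enumerate(target)}
    # Keep only the common elements, replaced by their index in target.
    return [pos[x] for x in arr if x in pos]


# Example 2 from the problem statement.
print(to_index_array([6, 4, 8, 1, 3, 2], [4, 7, 6, 2, 3, 8, 6, 1]))
# [1, 0, 5, 4, 2, 0, 3]
```

In this output, the ascending subsequence 0, 2, 3 corresponds to the common subsequence 6, 8, 1 of the two arrays.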
2. How do we prove the correctness of the greedy LIS solution?
To solve the LIS problem naively, we define an array $f$, where $f[i]$ is the length of the longest ascending subsequence ending in $nums[i]$.
For a given $f[i]$, we look back over $[0, i - 1]$ for all positions $j$ that $nums[i]$ can be appended to (that is, $nums[j] < nums[i]$), and update $f[i]$ with the maximum of all $f[j] + 1$. Therefore, the complexity of the naive LIS solution is $O(n^2)$.
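The naive $O(n^2)$ LIS solution just described can be sketched as below. The function name `lis_naive` is hypothetical, and it returns the whole $f$ array rather than only its maximum, so the per-position values can be inspected.

```python
from typing import List


def lis_naive(nums: List[int]) -> List[int]:
    n = len(nums)
    # f[i]: length of the longest strictly ascending subsequence ending at nums[i].
    f = [1] * n
    for i in range(n):
        for j in range(i):
            # nums[i] can be appended after nums[j] only if nums[j] < nums[i].
            if nums[j] < nums[i]:
                f[i] = max(f[i], f[j] + 1)
    return f


print(lis_naive([1, 0, 5, 4, 2, 0, 3]))  # [1, 1, 2, 2, 2, 1, 3]
```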
The greedy solution to LIS maintains an extra array $g$.
To summarize, we have two arrays:
 $f$, the dynamic programming array: it has the same meaning as in the naive LIS solution. $f[i]$ is the maximum length of an ascending subsequence ending in $nums[i]$;
 $g$, the greedy array: $g[len] = x$ means that the "minimum ending element" over all ascending subsequences of length $len$ is $x$.
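When we calculate $f[i]$, we need to find the positions $j$ where $nums[j] < nums[i]$, that is, the positions that $nums[i]$ can be appended to.
Instead of a linear traversal, we want to find the answer through the $g$ array:
find the largest index $idx$ that satisfies $g[idx] < nums[i]$; then $f[i] = idx + 1$. If $g$ is monotonic, this search can be done by binary search.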
We can easily prove by contradiction, using the definition of the $g$ array, that $g$ is "monotonically increasing".
Suppose there are positions $i$ and $j$ with $i < j$ such that one of the following holds:

$g[i] = g[j] = x$: this means that some value $x$ is the last element of an ascending subsequence of length $i$ and also the last element of an ascending subsequence of length $j$. By the definition of the $g$ array, $g[i] = x$ means that the "minimum ending element" over all ascending subsequences of length $i$ is $x$. But since $g[j] = x$ and an "ascending subsequence" must be "strictly monotonic", we can take the subsequence of length $j$ and remove its trailing elements (until it has length $i$) to obtain a valid value smaller than $g[i]$. In other words, we find an ascending subsequence of length $i$ whose last element is strictly less than $x$. This contradicts the definition of $g[i]$, so $g[i] = g[j] = x$ never holds.

$g[i] > g[j] = x$: similarly, if the "minimum ending element" of the ascending subsequences of length $j$ is $x$, then truncating such a subsequence to length $i$ gives an ascending subsequence of length $i$ whose last element is strictly less than $x$, and hence less than $g[i]$. This again contradicts the definition of $g[i]$, so $g[i] > g[j]$ never holds.
By the total order relation, having shown that $g[i] = g[j]$ and $g[i] > g[j]$ never hold, we conclude that $g[i] < g[j]$ always holds.
At this point, we have proven that the $g$ array is monotonic, so binary search over $g$ is valid and every $f[i]$ it produces is the same as the value obtained by the naive LIS solution; that is, the greedy solution is correct.
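The two claims in this proof (that $g$ stays strictly increasing, and that the greedy $f$ matches the naive $f$) can be checked empirically. The sketch below is an illustration only, not a proof; the function names are hypothetical, and unlike the text, `g` here is 0-indexed, so `g[k]` is the minimum ending element for length `k + 1`. It uses the standard-library `bisect_left` for the binary search.

```python
from bisect import bisect_left
from typing import List


def lis_naive(nums: List[int]) -> List[int]:
    f = [1] * len(nums)
    for i in range(len(nums)):
        for j in range(i):
            if nums[j] < nums[i]:
                f[i] = max(f[i], f[j] + 1)
    return f


def lis_greedy(nums: List[int]) -> List[int]:
    g: List[int] = []  # g[k]: minimum ending element for length k + 1
    f: List[int] = []
    for x in nums:
        # Number of entries of g strictly less than x, i.e. the length of
        # the longest ascending subsequence that x can extend.
        idx = bisect_left(g, x)
        if idx == len(g):
            g.append(x)
        else:
            g[idx] = x  # g[idx] >= x here, so x is the new minimum
        f.append(idx + 1)
        # The monotonicity argued in the proof: g is strictly increasing.
        assert all(g[k] < g[k + 1] for k in range(len(g) - 1))
    return f


nums = [1, 0, 5, 4, 2, 0, 3]
print(lis_greedy(nums) == lis_naive(nums))  # True
```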
Dynamic programming + greedy + binary search
Following the proof in "Basic analysis", we maintain a greedy array $g$ to update the dynamic programming array $f$. Once the length of the "longest ascending subsequence" is obtained, the one-to-one correspondence between "common subsequences" and "ascending subsequences" gives the length of the "longest common subsequence", from which the answer follows.
Java code:
import java.util.*;

class Solution {
    public int minOperations(int[] t, int[] arr) {
        int n = t.length, m = arr.length;
        // Map each value of target to its (unique) index.
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(t[i], i);
        }
        // Index array: common elements of arr, replaced by their index in target.
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < m; i++) {
            int x = arr[i];
            if (map.containsKey(x)) list.add(map.get(x));
        }
        int len = list.size();
        // f[i]: LIS length ending at list[i]; g[k]: minimum ending element for length k.
        int[] f = new int[len], g = new int[len + 1];
        Arrays.fill(g, Integer.MAX_VALUE);
        int max = 0;
        for (int i = 0; i < len; i++) {
            // Binary search for the largest index whose g value is less than list[i].
            int l = 0, r = len;
            while (l < r) {
                int mid = l + r + 1 >> 1;
                if (g[mid] < list.get(i)) l = mid;
                else r = mid - 1;
            }
            int clen = r + 1;
            f[i] = clen;
            g[clen] = Math.min(g[clen], list.get(i));
            max = Math.max(max, clen);
        }
        return n - max;
    }
}
Python 3 code:
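from math import inf
from typing import List


class Solution:
    def minOperations(self, t: List[int], arr: List[int]) -> int:
        n, m = len(t), len(arr)
        # Map each value of target to its (unique) index.
        pos = {num: i for i, num in enumerate(t)}
        # Index array: common elements of arr, replaced by their index in target.
        lt = []
        for i in range(m):
            x = arr[i]
            if x in pos:
                lt.append(pos[x])
        length = len(lt)
        # f[i]: LIS length ending at lt[i]; g[k]: minimum ending element for length k.
        f, g = [0] * length, [inf] * (length + 1)
        maximum = 0
        for i in range(length):
            # Binary search for the largest index whose g value is less than lt[i].
            l, r = 0, length
            while l < r:
                mid = l + r + 1 >> 1
                if g[mid] < lt[i]:
                    l = mid
                else:
                    r = mid - 1
            clen = r + 1
            f[i] = clen
            g[clen] = min(g[clen], lt[i])
            maximum = max(maximum, clen)
        return n - maximum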
 Time complexity: building the index mapping of $target$ takes $O(n)$; building the mapped array list takes $O(m)$; the greedy LIS solution takes $O(m\log{m})$. The overall complexity is $O(n + m\log{m})$
 Space complexity: $O(n + m)$
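As a cross-check of the complexity analysis, here is a compact variant that is not the code above: it is a sketch assuming only the final LIS length is needed, so it drops the $f$ array and lets the standard-library `bisect_left` do the binary search. The function name `min_operations` is hypothetical.

```python
from bisect import bisect_left
from typing import List


def min_operations(target: List[int], arr: List[int]) -> int:
    pos = {value: i for i, value in enumerate(target)}  # O(n)
    g: List[int] = []  # g[k]: minimum ending index for length k + 1
    for x in arr:  # O(m) iterations, O(log m) each
        if x not in pos:
            continue
        p = pos[x]
        k = bisect_left(g, p)
        if k == len(g):
            g.append(p)
        else:
            g[k] = p
    # len(g) is the LIS length of the index array, i.e. the LCS length.
    return len(target) - len(g)


print(min_operations([5, 1, 3], [9, 4, 2, 3, 4]))  # 2
print(min_operations([6, 4, 8, 1, 3, 2], [4, 7, 6, 2, 3, 8, 6, 1]))  # 3
```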
Finally
This is article No. 1713 in our "Brush through LeetCode" series, which began on 2021/01/01. As of the start date, there were 1916 problems on LeetCode, some of them locked; we will first work through all the unlocked problems.
In this series, in addition to explaining how to solve each problem, I will give the most concise code possible. Where a general solution is involved, the corresponding code template will be provided.
To make it easier for readers to debug and submit the code on a computer, I have set up a related repository: github.com/SharingSour... .
In the repository, you can find the solutions for this series, the corresponding code, the links to the original LeetCode problems, and other preferred solutions.