This is the second day of my participation in the First Challenge 2022

The basic concept

  • Edit distance problem:
    • Editing distance problem is difficult, but the solution is very beautiful, and it is a rare practical algorithm
  • Edit distance usage scenario:
    • For modifying the misplaced content of the article. Limit the article can only modify 20 words, and support to add, delete, replace operations, seek the optimal solution to modify
    • For measuring the similarity of DNA, DNA sequence is composed of A, G, C and T, which can be likened to A string. The similarity of two DNA sequences can be measured by the editing distance. The smaller the editing distance, the more similar the two DNA sequences are

Thought analysis

  • Edit distance problem:
    • Given two strings s1 and s2, only three operations can be used to change s1 to s2 and find the smallest operand
    • I want to make sure that whether I change s1 to S2, or s2 to S1, I get the same result
  • In the longest common subsequence, the problem of dynamic programming of two strings is solved by using two Pointers I and j respectively to the end of the two strings, and then moving forward step by step to reduce the size of the problem
  • Calculate edit distance:
    • The algorithmbase case:
      • S1 and S2 have the same characters. In order to minimize the editing distance, there is no need to perform the task operation, just move
      • If j completes S2, I has not completed S1. In order to minimize the editing distance, s1 can only be shortened to S2 using the delete operation
      • If I completes S1 and j has not completed S2, in order to minimize the editing distance, s1 can only be increased to S2 using an insert operation

The problem solution

  • The base case of the algorithm: if I completes S1 or j completes S2, it can return the remaining length of the other string
  • For each pair of S1 [I] and S2 [j] there are four operations:
ifS1 [I] == s2[j]: no operation, skip I directly,j moves forward at the same timeelseOperation:31: insert -insert delete -delete replace -replaceCopy the code
  • For the 1 of 3 operations, you can use a recursive technique to try all three operations once, and then select the result of the operation based on which operation has the smallest edit distance
def minDistance(s1, s2) - >int:

	# return the minimum edit distance for s1[0,... I] and s2[0,... j]
	def dp(i, j) :
		# base case
		if i == -1: return j + 1 
		if j == -1: return i + 1

		if s1[i] == s2[j]:
			No task operation is required
				# s1 and s2 [0,..., I] [0,..., j] the minimum edit distance s1 [0,..., I - 1) and s2] [0,..., j - 1 the minimum edit distance
				Dp of I, j is dp of I minus one, j minus one.
			return dp(i - 1, j - 1)
		else:
			return min(
				# insert
					S2 [j] = s1[I]; s2[j] = s2[j]
					# move the characters of S2 forward by 1 bit to continue the comparison
					Insert operand increment by 1
				dp(i, j - 1) + 1
				# remove
					Select s1[I] from s1[I]
					# move the characters of S2 forward by 1 bit to continue the comparison
					Add 1 to delete operation
				dp(i - 1, j) + 1
				# replace
					S1 [I] = s2[j] = s2[j
					# At the same time, I and j move forward by 1 bit to continue the comparison
					# replace the operand by 1
				dp(i - 1, j - 1) + 1
			)
	return dp(len(s1) - 1.len(s2) - 1)
Copy the code
  • This method has overlapping sub-problem and needs to be optimized by dynamic programming algorithm
  • How to determine whether there are overlapping subproblems?
    • The algorithm framework of the solution is abstracted
    • Identify the different paths from the subproblem to the original problem
    • Once there are repeated paths, it means there are a large number of repeated paths, that is, there are overlapping sub-problems

Optimization problem

  • There are two ways of using dynamic programming optimization for overlapping subproblems:
    • The memo
    • DP Table

The memo

def minDistance(s1, s2) - >int:

# memo
memo = dict(a)def dp(i, j) :
	if (i, j) in memo:
		return memo[(i, j)]
	...
	if s[i] == s[j]:
		memo[(i, j)] = ...
	else:
		memo[(i, j)] = ...
	return memo[(i, j)]
return dp(len(s1) - 1.len(s2) - 1)
Copy the code

DP Table

  • The first thing is to understandDPThe meaning of array,DPAn array is a two-dimensional array
    • base case: dp[…] [0] and dp [0] […].
    • [I][J]
      • Def dp (I, j) – > int: return to s1 and s2 [0,…, I] [0,…, j] the minimum edit distance
      • Dp [I – 1] [1] : store s1 and s2 [0,…, I] [j] 0,…, the minimum edit distance. Because the values of I and j of the dp function are -1, and the minimum index of the array is 0. So the DP array will be offset by one
int minDistance(String s1, String s2) {
	int m = s1.length(), n = s2.length();
	int[][] dp = new int[m][n];
	// base case
	for (int i = 1; i <=m; i++) {
		dp[i][0] = i;
	}
	for (int j = 1; j <= n; j++) {
		dp[0][j] = j;
	}
	// Compute the DP Table from bottom up
	for (int i = 1; i <= m; i++) {
		for (int j = 1; j <= n; j++) {
			if (s1.isCharAt(i) == s2.isCharAt(j)) {
				dp[i][j] = dp[i - 1][j - 1];
			} else {
				dp = min(
					/ / delete
					dp[i - 1][j] + 1;
					/ / replace
					dp[i - 1][j - 1] + 1;
					/ / insert
					dp[i][j - 1] + 1;) ; }}}// Store the minimum edit distance for s1 and s2
	return dp[m][n];
}

int min(int a, int b, int c) {
	return Math.min(a, Math.min(b, c));
}
Copy the code

conclusion

  • Handle dynamic programming of two strings:
    • According to the algorithm of editing distance to solve
    • Create a DP Table. This makes it easier to find the state transition relationship
  • Because each dp[I][j] is only related to three nearby states, the spatial complexity can be compressed to O(min(m,n)), where M and n are the lengths of two strings O(min(m,n)), where m and n are the lengths of two strings O(min(m,n)), where m and n are the lengths of two strings O(min(m,n))
  • You can add additional information to the DP Table:
Node[][] dp;

class Node {
	// The value of the previous dp array
	int val;
	// Attribute manipulation
	int choice;
}
Copy the code
  • When making an optimal choice, write down the actions and then you can work backwards from the results
  • The final result is dp[m][n], where val has the minimum edit distance and choice has the last operation
  • Repeat this process, step by step back to the starting point dp[0][0], forming a path, according to this path for string editing, is the minimum editing distance
  • Minimum edit distance Java implementation
  • Minimum edit distance C++ implementation:
class DynamicProgramming {
	public:
		int minDistance(String s1, String s2) {
			int m = s1.size(), n = s2.size();
			vector<vector<int>> dp(m + 1, vector<int>(n + 1));
			// basecase
			// when s2 is empty, s1 needs to delete all characters to be the same as s2
			for (int i = 1; i <= m; i++) {
				dp[i][0] = i;
			}
			// When s1 is empty, s2 needs to delete all characters to be the same as s1
			for (int j = 1; j <= n; j++) {
				dp[0][j] = j;
			}
			// Solve from bottom up
			for (int i = 1; i <= m; i++) {
				for (int j = 1; j <= n; j++) {
					if (s1[i - 1] == s2[j - 1]) {
						// The current characters of the two strings are equal
						dp[i][j] = dp[i - 1][j - 1];
					} else {
						// The current characters of the two strings are different
						dp[i][j] = min({
							// delete s1[I]
							dp[i - 1][j] + 1.S1 [j] = s1[I]; s2[j] = s2[j
							dp[i][j - 1] + 1.// Replace s1[I] with s2[j]
							dp[i - 1][j - 1] + 1}); }}}// Store the minimum edit distance for s1 and s2
			returndp[m][n]; }};Copy the code