This is the second day of my participation in the First Challenge 2022

The basic concept

Edit distance problem:
- Editing distance problem is difficult, but the solution is very beautiful, and it is a rare practical algorithm
Edit distance usage scenario:
- For modifying the misplaced content of the article. Limit the article can only modify 20 words, and support to add, delete, replace operations, seek the optimal solution to modify
- For measuring the similarity of DNA, DNA sequence is composed of A, G, C and T, which can be likened to A string. The similarity of two DNA sequences can be measured by the editing distance. The smaller the editing distance, the more similar the two DNA sequences are

Thought analysis

Edit distance problem:
- Given two strings s1 and s2, only three operations can be used to change s1 to s2 and find the smallest operand
- I want to make sure that whether I change s1 to S2, or s2 to S1, I get the same result
In the longest common subsequence, the problem of dynamic programming of two strings is solved by using two Pointers I and j respectively to the end of the two strings, and then moving forward step by step to reduce the size of the problem
Calculate edit distance:
- The algorithmbase case:
  - S1 and S2 have the same characters. In order to minimize the editing distance, there is no need to perform the task operation, just move
  - If j completes S2, I has not completed S1. In order to minimize the editing distance, s1 can only be shortened to S2 using the delete operation
  - If I completes S1 and j has not completed S2, in order to minimize the editing distance, s1 can only be increased to S2 using an insert operation

The problem solution

The base case of the algorithm: if I completes S1 or j completes S2, it can return the remaining length of the other string
For each pair of S1 [I] and S2 [j] there are four operations:

ifS1 [I] == s2[j]: no operation, skip I directly,j moves forward at the same timeelseOperation:3 选 1: insert -insert delete -delete replace -replaceCopy the code

For the 1 of 3 operations, you can use a recursive technique to try all three operations once, and then select the result of the operation based on which operation has the smallest edit distance

def minDistance(s1, s2) - >int:

	# return the minimum edit distance for s1[0,... I] and s2[0,... j]
	def dp(i, j) :
		# base case
		if i == -1: return j + 1 
		if j == -1: return i + 1

		if s1[i] == s2[j]:
			No task operation is required
				# s1 and s2 [0,..., I] [0,..., j] the minimum edit distance s1 [0,..., I - 1) and s2] [0,..., j - 1 the minimum edit distance
				Dp of I, j is dp of I minus one, j minus one.
			return dp(i - 1, j - 1)
		else:
			return min(
				# insert
					S2 [j] = s1[I]; s2[j] = s2[j]
					# move the characters of S2 forward by 1 bit to continue the comparison
					Insert operand increment by 1
				dp(i, j - 1) + 1
				# remove
					Select s1[I] from s1[I]
					# move the characters of S2 forward by 1 bit to continue the comparison
					Add 1 to delete operation
				dp(i - 1, j) + 1
				# replace
					S1 [I] = s2[j] = s2[j
					# At the same time, I and j move forward by 1 bit to continue the comparison
					# replace the operand by 1
				dp(i - 1, j - 1) + 1
			)
	return dp(len(s1) - 1.len(s2) - 1)
Copy the code

This method has overlapping sub-problem and needs to be optimized by dynamic programming algorithm

How to determine whether there are overlapping subproblems?

The algorithm framework of the solution is abstracted

Identify the different paths from the subproblem to the original problem

Once there are repeated paths, it means there are a large number of repeated paths, that is, there are overlapping sub-problems

Optimization problem

There are two ways of using dynamic programming optimization for overlapping subproblems:
- The memo
- DP Table

The memo

def minDistance(s1, s2) - >int:

# memo
memo = dict(a)def dp(i, j) :
	if (i, j) in memo:
		return memo[(i, j)]
	...
	if s[i] == s[j]:
		memo[(i, j)] = ...
	else:
		memo[(i, j)] = ...
	return memo[(i, j)]
return dp(len(s1) - 1.len(s2) - 1)
Copy the code

DP Table

The first thing is to understandDPThe meaning of array,DPAn array is a two-dimensional array
- base case: dp[…] [0] and dp [0] […].
- [I][J]
  - Def dp (I, j) – > int: return to s1 and s2 [0,…, I] [0,…, j] the minimum edit distance
  - Dp [I – 1] [1] : store s1 and s2 [0,…, I] [j] 0,…, the minimum edit distance. Because the values of I and j of the dp function are -1, and the minimum index of the array is 0. So the DP array will be offset by one

int minDistance(String s1, String s2) {
	int m = s1.length(), n = s2.length();
	int[][] dp = new int[m][n];
	// base case
	for (int i = 1; i <=m; i++) {
		dp[i][0] = i;
	}
	for (int j = 1; j <= n; j++) {
		dp[0][j] = j;
	}
	// Compute the DP Table from bottom up
	for (int i = 1; i <= m; i++) {
		for (int j = 1; j <= n; j++) {
			if (s1.isCharAt(i) == s2.isCharAt(j)) {
				dp[i][j] = dp[i - 1][j - 1];
			} else {
				dp = min(
					/ / delete
					dp[i - 1][j] + 1;
					/ / replace
					dp[i - 1][j - 1] + 1;
					/ / insert
					dp[i][j - 1] + 1;) ; }}}// Store the minimum edit distance for s1 and s2
	return dp[m][n];
}

int min(int a, int b, int c) {
	return Math.min(a, Math.min(b, c));
}
Copy the code

conclusion

Handle dynamic programming of two strings:
- According to the algorithm of editing distance to solve
- Create a DP Table. This makes it easier to find the state transition relationship
Because each dp[I][j] is only related to three nearby states, the spatial complexity can be compressed to O(min(m,n)), where M and n are the lengths of two strings O(min(m,n)), where m and n are the lengths of two strings O(min(m,n)), where m and n are the lengths of two strings O(min(m,n))
You can add additional information to the DP Table:

Node[][] dp;

class Node {
	// The value of the previous dp array
	int val;
	// Attribute manipulation
	int choice;
}
Copy the code

When making an optimal choice, write down the actions and then you can work backwards from the results
The final result is dp[m][n], where val has the minimum edit distance and choice has the last operation
Repeat this process, step by step back to the starting point dp[0][0], forming a path, according to this path for string editing, is the minimum editing distance

Minimum edit distance Java implementation
Minimum edit distance C++ implementation:

class DynamicProgramming {
	public:
		int minDistance(String s1, String s2) {
			int m = s1.size(), n = s2.size();
			vector<vector<int>> dp(m + 1, vector<int>(n + 1));
			// basecase
			// when s2 is empty, s1 needs to delete all characters to be the same as s2
			for (int i = 1; i <= m; i++) {
				dp[i][0] = i;
			}
			// When s1 is empty, s2 needs to delete all characters to be the same as s1
			for (int j = 1; j <= n; j++) {
				dp[0][j] = j;
			}
			// Solve from bottom up
			for (int i = 1; i <= m; i++) {
				for (int j = 1; j <= n; j++) {
					if (s1[i - 1] == s2[j - 1]) {
						// The current characters of the two strings are equal
						dp[i][j] = dp[i - 1][j - 1];
					} else {
						// The current characters of the two strings are different
						dp[i][j] = min({
							// delete s1[I]
							dp[i - 1][j] + 1.S1 [j] = s1[I]; s2[j] = s2[j
							dp[i][j - 1] + 1.// Replace s1[I] with s2[j]
							dp[i - 1][j - 1] + 1}); }}}// Store the minimum edit distance for s1 and s2
			returndp[m][n]; }};Copy the code

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Classical dynamic programming problem detailed analysis: the shortest editing distance algorithm principle analysis and solution

The basic concept

Thought analysis

The problem solution

Optimization problem

The memo

DP Table

conclusion

Classical dynamic programming problem detailed analysis: the shortest editing distance algorithm principle analysis and solution

The basic concept

Thought analysis

The problem solution

Optimization problem

The memo

DP Table

conclusion

Related Posts

Centos7 deploys Python, Mysql, Redis, and Java

How does Redisson implement distributed locking?

Why is Hbase query speed still high in the face of tens of billions of data?