Be careful with one thing about depth-first search

1. Introduction

This blog post is the result of reading 20 or so LeetCode questions last week and coming up with some small insights on depth-first search. The general idea this time is:

The working process of depth-first search algorithm
How to traverse using depth-first search algorithm
Two interesting questions
Relationship between depth-first search algorithm and loop

So let’s get started.

2. Depth-first search

Deep First Search (DFS) is a sequential traversal in which the First encountered node is visited, and then, starting from this node, the traversal continues until all nodes are visited (in the case of the connected component == 1). Therefore, IN my opinion, DFS can be regarded as a violence enumeration algorithm. The following example may give you a taste of DFS.

This is a directed, acyclic graph with a connected component of 1. We can store it with an adjacency matrix:

The meaning of this matrix is that if graph[I][j] == 1, then the relation between node I and node j is I -> j, i.e. there is a path from node I to node j. Instead of putting the code in place, I will follow the DFS approach to the diagram above, starting with node 1 and traversing it depth-first. Graph [0][1] is not zero, indicating that there are edges between nodes 1 and 2. Since DFS is accessed first and then traversed downward starting from this node, we first save node 1 and node 2 and then traverse downward starting from node 2, as shown in the following figure:

Then, scan graph[1][0…5] and find that graph[1][2] is not 0, indicating that node 2 is connected to node 3, then save node 3 and traverse down from node 3:

Graph [2][0…5] has no element of value 1, indicating that node 3 is not connected to any other node, i.e., the output degree is 0. Therefore, it is time to go back to the previous level:

Now, we need to scan graph[1][3…5] and find graph[1][3] is not 0, indicating that node 2 is connected to node 4, so we should save node 4 and continue to traverse from node 4 as a starting point:

Graph [3][4] is not 0, indicating that node 4 is connected to node 5, we should save node 5, and then start from node 5 and traverse down:

At this point, we need to scan graph[4][0…5] and find graph[4][5] is not 0, indicating that node 5 is connected to node 6, which should be saved. Then, with node 6 bit nodes, traverse down:

Graph [5][0…5], graph[5], graph[0…5], graph[5], graph[0…5], graph[5], graph[0…5], graph[5]

Graph [4][0…5] is the next level of the graph[3][5…5], but there are no non-0 elements in the graph[3][5…5]. So let’s go back to the previous level:

Graph [1][4…5], graph[1][4…5], graph[1][4…5], graph[1][4…5], graph[1][4…5], graph[1]

Graph [0][4] is not 0, but node 5 (graph[0][4] 4 represents node 5) has been accessed, so we do not need to save node 5 anymore, so we ignore graph[0][4] and continue scanning. Finally, the scan is finished and the traversal is finished.

This process is easy to write if you understand it:

private static void deepFirstSearch(int[][] graph, List<Integer> result, int[] accessFlag, int startVertex) { if (startVertex < graph.length) { result.add(startVertex + 1); accessFlag[startVertex] = 1; for (int i = 0; i < graph[startVertex].length; i++) { if (graph[startVertex][i] == 1 && accessFlag[i] ! = 1) { deepFirstSearch(graph, result, accessFlag, i); }}}}Copy the code

To clarify, the for loop corresponds to the scan described above, and startVertex is the current starting point. Graph [startVertex][I] is not 0, indicating that there is a path from startVertex to I, that is, startVertex -> I, and if node I is not accessed (accessFlag[I] will be set to 1 when accessing node I), then node I is used as the starting point to traverse downward. The if statement checks whether the input data is valid. The method is already written, so we just need to call it as follows to get the result of DFS:

List<Integer> result = new ArrayList<>();
deepFirstSearch(graph, result, new int[graph.length], 0);
Copy the code

To summarize, DFS is encountered first, accessed first, and continues to access from there until all access is completed, much like sequential traversal.

3. How should DFS be used for traversal?

From the above instructions, you must have a preliminary understanding of DFS. So let’s move on. As I said earlier, DFS can be thought of as a violence enumeration algorithm. Combining the above example, we can see that all elements of the adjacency matrix Graph are scanned by the DFS algorithm, so at least from this example, DFS is a violent enumeration algorithm. I think so. OK, now that we know that DFS can be regarded as a violent enumeration algorithm, how should we use it for enumeration? Here, I want to analyze the sequential traversal process of binary trees and even the traversal process of multi-tree, and then extend to the general case to illustrate.

First define the data structure of the tree:

private class Tree {
    int data;
    Tree leftChild;
    Tree rightChild;
}
Copy the code

Naturally, the sequential traversal of the binary tree could be written as follows:

private static void preOrderTraversal(Tree parent) {
    if(tree != null) {
        accessTreeVertex(parent);
        preOrderTraversal(parent.leftChild);
        preOrderTraversal(parent.rightChild);
    }
}
Copy the code

How do you generalize binary tree traversal to multi-tree traversal? It would be nice if we could use a for loop to list all the children of this node. You can. We just need to rewrite the data structure slightly:

private class Tree {
    int data;
    Tree children[];
}
Copy the code

Then iterate as follows:

private static void preOrderTraversal(Tree parent) { if(tree ! = null) { accessTreeVertex(parent); for (int i = 0; i < parent.children.length; i++) { preOrderTraversal(tree.children[i]); }}}Copy the code

Why, what’s the thinking behind this code? ifchildren.length == 2That’s the case with binary trees. Naturally, ifchildren.lengthIs any value, that’s the case for any multi-fork tree, which means you can traverse any multi-fork tree. The idea is to convert the child node into a form that can be enumerated by the for loop. Now, let’s look at the substructure of the multi-fork tree:

The substructure of the multi-fork tree is a parent node with n children (n is any integer greater than 0). How do we traverse this substructure?

First we access the parent node, i.eaccessVertex(parent)
Since the parent node has n children, we access them one by one. Since the child node exists in an array, we can use a for loop.

So iterating over a substructure can be written like this:

// traversal the substructure of tree
// Input: SubStructure of tree

function traversalSubStructure(Tree subTreeRoot):
	access(subTreeRoot)
	for i in 0 to subTreeRoot.children.length
		traversalSubStructure(subTreeRoot.children[i])
    end
end
Copy the code

Since all the substructures of a multifork tree together constitute the entire multifork tree (that is, can be defined recursively), we can recursively complete the traversal. How to use recursion? You just need to confirm the boundary conditions. The initial condition, of course, is to put in a tree root, and to stop recursing downward is to encounter a leaf node whose children field is empty. So:

// Traversal tree
// Input: root of tree

function traversalTree(Tree treeRoot):
	if treeRoot not null then
		access(treeRoot)
		for i in 0 to treeRoot.children.length
			traversalSubStructure(treeRoot.children[i])
        end
    end
end
Copy the code

Having said that, what I want to say is that if a node whose children can be listed in a loop, then we can traverse the graph in a loop + recursion form. Further, we can use recursion + a for loop to implement an n-loop and then iterate. N loops are a natural tool for enumeration. So, we can write the problem as an n-loop, and then turn it into a recursion, and we can use DFS to solve the ergodic problem. Therefore, DFS can be written as follows:

Void DFS(Graph Graph) {if(boundary condition) {for(I; Cyclic conditions; i++) { DFS(graph.child.get(i)); }}}Copy the code

4. Two questions

In the process of grinding, I encountered two very interesting problems, which can be solved by DFS. Here I share them, which are the maze problem and the non-repeating character string.

4.1. Maze problem

To make a long story short, the maze problem is described this way (the description on LeetCode is long, so I won’t post it) : Given a two-dimensional array to represent a maze. The elements in this array are either 0 or 1; 0 indicates that the device can pass, and 1 indicates that the device is a wall and cannot pass. For example, given a two-dimensional array:

int[][] maze = { {1, 0, 0, 0, 0, 1}, {1, 1, 1, 1, 1, 1}, {0, 1, 0, 1, 1, 0}, {0, 1, 0, 0, 1, 0}, {1, 1, 1, 0, 1, 1}, {0, 0, 1, 1, 0}};Copy the code

It represents the maze, with grey for walls and white for paths:

Now, given a starting point and an end point, find a feasible path to get out of the maze.

For example, the starting point is (1, 1) and the ending point is (6, 5).

How to solve the maze problem with the IMPLEMENTATION of DFS? When we reach a point (which, of course, has a value of 0, which is the path), we can choose to go up, down, left or right; The previous problems are all cases of a parent node with n children. We can follow the example of the parent node being the current node and the children being the upper, lower, left and right nodes. The diagram looks like this:

First, iterate over the substructure, pseudo code:

// Traversal currentPoint and its North, South, West and East Point
// Input: maze, currentPoint

function traversalCurrentPoint(Maze maze, Point currentPoint):
	for point in {currentPoint.North, currentPoint.South, currentPoint.West, currentPoint.East} :
		traversalCurrentPoint(point)
    end
end
Copy the code

And then we define the boundary conditions, and the initial conditions of course are given a starting point, and the conditions that stop the downward recursion are:

In the wall
In the end

So, the pseudocode traversed for each substructure:

// Solve Maze Problem
// Input: maze, start, end

function solveMaze(Maze maze, Point start, Point end, Result result):
	if start == end then 
		result.add(start)
		return
	else if is not wall then 
		result.add(start)
		for Point in {start.North, start.South, start.West, start.East} :
			solveMaze(maze, start, end, result)
		result.remove(start)
    end
end
Copy the code

Therefore, the corresponding Java code implementation is:

private static Point[] getPoints(Point currentPoint) {
    int currentX = currentPoint.x;
    int currentY = currentPoint.y;
    return new Point[]{
            new Point(currentX - 1, currentY), // North
            new Point(currentX + 1, currentY), // South
            new Point(currentX, currentY - 1), // West
            new Point(currentX, currentY + 1)  // East
    };
}

private static boolean solveMaze(int[][] maze, List<Point> result, Point start, Point end) {
    int x = start.x;
    int y = start.y;
    if (x == end.x && y == end.y) {
        result.add(start);
        return true;
    } else {
        result.add(start);
        Point[] points = getPoints(start);
        for (int i = 0; i < points.length; i++) {
            if (isValidPoint(maze, points[i])
                   && hasNotAccess(result, points[i])
                   && solveMaze(maze, result, points[i], end)) {
                return true;
            }
        }
        result.remove(start);
    }
    return false;
}
Copy the code

IsValidPoint () determines if it is out of bounds or against a wall; HasNotAccess () determines whether a node has been accessed. This prevents North -> South -> North… Or East -> West -> East… The endless cycle of.

The idea is to turn all the child nodes into something that can be enumerated by the for loop, and at each step, continue enumerating with the current node as the parent (that is, walk down) until you reach an end or a wall. In fact, this DFS code can be converted to an N loop.

4.2. String of non-repeating characters

Actually, this problem is not in LeetCode, I heard it from my classmate. After listening to the problem, I felt very interesting. I thought DFS could be used to solve the problem. Then I coded the code and found it could be solved. Here’s how it goes:

Given a string s with no repeating characters, the ith character of the string is represented by si, that is, s = s1s2S3… si… Sn Indicates k unique character strings consisting of S [1…n] and no repeated characters. Example 1: Input: s = “01234567”, k = 3 Output: “01234567”, “74625310”, “45367201”

Example 2: Input: s =”abc”, k = 5 Output: “abc”, “bca”, “cba”, “acb”, “bac”

The given string S is one-dimensional and obviously cannot be searched directly with DFS. To use DFS, the data must be two-dimensional. So we can construct a matrix, which is two-dimensional, taking s = “abcd” as an example, as follows:

Why would I want to construct a matrix? It’s not hard to see why. I keep saying that DFS code can be converted to an N loop. The n loop is easy to enumerate, so I constructed a 4 by 4 matrix with s in each row so THAT I could write it as follows:

for i in 0 ~ 3:
    for j in 0 ~ 3: 
        for m in 0 ~ 3: 
            for n in 0 ~ 3: 
                rs = s[i] + s[j] + s[m] + s[n]
            end
        end
    end
end
Copy the code

Then I can turn this into a recursive form:

for i in 0 ~ 3: 
	DFS(s, i)
end
Copy the code

If you put it in recursive form, it’s a little bit harder to see, so let’s look at this:

When we get to the ith string, we should go to S1… Sn goes, so we can use loops to enumerate.

So, the corresponding implementation code:

private static void bruteSearch(String[] targetCharMatrix, StringBuilder stringBuilder, List<String> result, int[] accessFlag, int depth) {
    if (depth < targetCharMatrix.length) {
        for (int i = 0; i < targetCharMatrix[depth].length(); i++) {
            if (accessFlag[i] != 1) {
                stringBuilder.append(targetCharMatrix[depth].charAt(i));
                accessFlag[i] = 1;
                bruteSearch(targetCharMatrix, stringBuilder, result, accessFlag, depth + 1);
                accessFlag[i] = 0;
                stringBuilder.deleteCharAt(stringBuilder.length() - 1);
            }
        }
    } else {
        result.add(stringBuilder.toString());
    }
}
Copy the code

First explain the parameter list: targetCharMatrix is the matrix constructed according to S; A stringBuilder is an object used to concatenate characters; Result is the object that holds the result; AccessFlag stores access results. If the ith element is 0, it indicates that the ith character has not been accessed (concatenated); otherwise, if it is 1, it indicates that the ith character has been accessed (concatenated). Depth indicates the number of concatenated characters.

Then there is the body of the method: first, determine whether the maximum depth is exceeded, that is, whether depth is equal to 4. If less than 4: iterate over targetCharMatrix[depth][0…3]. If targetCharMatrix[depth][I] is not accessed, set accessFlag[I] to 1, and use stringBuilder to concatenate targetCharMatrix[depth][I]. Set accessFlag[I] to 0 and remove targetCharMatrix[depth][I] from stringBuilder.

Otherwise, if depth = 4: stringBuilder.length() == 4, add it to result. It’s actually an n cycle problem.

In fact, it is not necessary to pass a matrix (String[] targetCharMatrix) directly, because every row of the matrix is the same. We just need to pass an original string (any row of the matrix) and use depth to control the maximum depth (the number of rows of the matrix). The code is as follows:

private static void bruteSearch(String targetChar, StringBuilder stringBuilder, List<String> result, int[] accessFlag, int depth) {
    if (depth > 0) {
        for (int i = 0; i < targetChar.length(); i++) {
            if (accessFlag[i] != 1) {
                stringBuilder.append(targetChar.charAt(i));
                accessFlag[i] = 1;
                bruteSearch(targetChar, stringBuilder, result, accessFlag, depth - 1);
                accessFlag[i] = 0;
                stringBuilder.deleteCharAt(stringBuilder.length() - 1);
            }
        }
    } else {
        result.add(stringBuilder.toString());
    }
}
Copy the code

5. Reflections triggered by two questions

As explained above, DFS can be written in the form of an n-repeat loop. So what does DFS have to do with n repetition? In its basic form, DFS calls itself within a loop. That is, a loop + recursion:

function DFS(graph, depth):
    if depth < maxDepth then
        for i in 0 ~ graph.get(depth).length():
            access(graph, depth)
            DFS(graph, depth + 1)
        end
    end
end
Copy the code

If graph is 4*4, if DFS(graph, 0) is called, then the first layer:

for i in 0 ~ 3:
    if 0 < 4 then
        access(graph, 0)
        DFS(graph, 0 + 1)
        end
end
Copy the code

The second layer:

for i in 0 ~ 3:
    if 1 < 4 then
        access(graph, 1)
        DFS(graph, 1 + 1)
        end
end
Copy the code

The third layer:

for i in 0 ~ 3:
    if 2 < 4 then
        access(graph, 2)
        DFS(graph, 2 + 1)
    end
end
Copy the code

The fourth floor:

for i in 0 ~ 3:
    if 3 < 4 then
        access(graph, 3)
        DFS(graph, 3 + 1)
    end
end
Copy the code

Replace layer I DFS(graph, (i-1) + 1) with layer I + 1 code:

for i in 0 ~ 3:
    if 0 < 4 then
        access(graph, 0)
        for i in 0 ~ 3:
            if 1 < 4 then
                access(graph, 1)
                for i in 0 ~ 3:
                    if 2 < 4 then
                        access(graph, 2)
                        for i in 0 ~ 3:
                            if 3 < 4 then
                                access(graph, 3)
                                DFS(graph, 3 + 1)
                            end
                        end
                    end
                end
            end
        end
    end
end
Copy the code

As can be seen from the above, DFS is essentially an N-cycle, and DFS can naturally be represented by n-cycle.

In fact, if you look at each layer, the only difference between each layer is the difference in the values of the parameters that are passed, you can also see that this is an n cycle, kind of like a recurrence of mathematical induction. In addition, the above mentioned sub-structure, in fact, I personally think that the problems can be solved recursively, are very similar to the fractal in mathematics.

6. Summary

If an enumeration problem whose substructures collectively make up the problem, like the maze problem mentioned above, can go up, down, left, and right after each step, which is a substructure that collectively makes up the original problem, then the problem can be written in the form of an n-repeat loop. If you can write it as an n cycle, you can convert it to a recursive form, and you can solve it with the idea of DFS.
If an enumeration problem, not its substructure of the problem, we can convert it, if it can be converted into can consist of a substructure, as mentioned above do not repeat the character string, a one-dimensional string natural no substructure could make this problem, then we will construct a matrix. So we can access the first, the second… The NTH element, written in the form of an n-loop, is converted to recursion and solved with the DFS implementation.
DFS is the basis for solving enumeration problems, but not all problems can be solved by copying DFS code. It needs to be analyzed on a case-by-case basis.

My feelings 7.

Before this, my understanding of DFS was just traversal of binary trees. After brushing a certain amount of problems, I gradually found that recursion can solve problems requiring multiple loops, namely the implementation of DFS, as long as the initial conditions and stop conditions are given. So brushing is still useful! To have an understanding of the algorithm, it is still necessary to brush through the questions (no need to see Discussion, which can solve the problems through various operations). That’s it. If this article has any question, please advise, common progress, thank ~!