Contact us: Youdao Technical team assistant: ydtech01 / email: [email protected]

preface

Tree structure, especially binary tree, is frequently used in our daily development process. However, we have not had a systematic and comprehensive understanding and cognition of tree structure before, so we take this opportunity to sort it out.

This article is part of the “Do you Really Know binary trees” series, which mainly introduces the basics of tree structure. After reading this article, if you want to become more proficient in binary trees, you can read another article “Do you really Know binary trees (Hand Tearing algorithm)” (next week).

First, the tree structure foundation

Whereas each node in a linked list can only point to the next node (the linked list here is a one-way list), each node in a tree can have several child nodes. Therefore, a tree structure can be expressed as follows:

interface TreeNode {
  data: any;
  nodes: TreeNode[]
}
Copy the code

1.1 degrees of the tree

PS: In the graph structure, there is also the concept of degree, which is divided into out degree and in degree. If the tree is regarded as a part of the graph, then strictly speaking, the degree of the tree is actually out degree. However, in a tree structure, we usually use the concept of degree to describe how many children the current tree node has.

That is, each node has several children, so the degree of a binary tree is at most 2, and the degree of a linked list (which can be viewed as a tree with only one child) is at most 1.

  • Theorem: In a binary tree, nodes with degree 0 have one more node than nodes with degree 2.

  • Proof: If a tree has n nodes, then the tree must have n-1 edges, that is, the number of points = the number of edges +1(this is true for all trees, not just binary trees) :

This tree has seven nodes, and the lines between nodes, which are edges, only have six.

So, we assume that the number of nodes with degree 0 is N0, the number of nodes with degree 1 is N1, and the number of nodes with degree 2 is n2. Since it is a node with degree 0, it indicates that the number of its edges is N0=0, the number of node edges with degree 1 is N1= N1 * 1, and the number of node edges with degree 2 is n2 =n2 * 2. Then the total number of nodes is:

# As can be seen from the above, the expression for the number of edges is as follows

N0=0;

N1=1*n1;

N2=2*n2;

# Number of nodes = number of edges +1

n0 + n1 + n2 = N0 + N1 + N2 + 1;

# Substitute N0,N1,N2 to get:

n0 + n1 + n2 = 0 + n1 + 2*n2 + 1;

# Simplification:

n0 = n2 +1;

The number of nodes with degree 0 is always one more than the number of nodes with degree 2

Thus, we have proved the above theorem, which may be easier to understand if described in a different way:

In a binary tree, as long as you know how many leaves there are, the number of leaves with degree 2 is the number of leaves minus 1, and conversely, the number of leaves with degree 2 is the number of leaves with degree 2 plus 1.

1.2 Tree traversal

    5
   / \
  1   4
     / \
    3   6
Copy the code

1.3 Tree traversal idea

Trees are inherently good data structures for recursive traversal, because every time you process a left subtree or a right subtree, you’re actually doing recursive traversal.

  • Pre-order traversal: “root node” “Output result of recursive traversal of left subtree” “Output result of recursive traversal of right subtree”

  • Middle order traversal: “Recursively traverses the output of the left subtree” “root node” “recursively traverses the output of the right subtree”

  • Post-order traversal: “Recursively traverses the output of the left subtree” “recursively traverses the output of the right subtree” “root node”

1.4 Divergent thinking

See here, some of you may feel deja vu, is there some knowledge about trees.

In fact, we discussed this topic earlier when we studied the stack data structure. We know that the stack is naturally suitable for expression evaluation, so what is the logical structure of the stack in the process of expression evaluation?

For example, 3*(4+5). In fact, although we are using the idea of a stack in our solution, we are actually logically simulating the operation of a tree. Don’t believe me? Let’s see:

[*] [3] [+] [4] [5]Copy the code

Above, we borrow this expression into a tree structure. When we encounter () in the expression, it indicates that the subexpression in the expression needs to be processed first, so we regard it as a subtree of our binary tree. As we all know, the idea of tree traversal is recursive traversal, which is to solve the problem layer by layer from bottom to top. In this way, in the process of recursive call, he will first solve the subproblem of the right subtree, and then perform the final calculation with the result calculated by the left subtree to get the final result. If you are interested in the results of stack data, you can refer to another article: 0003- Recursion and stack (Solving expression evaluation Problems).

1.5 Restoring the binary Tree

If we know any two of the pre-order traversal results, middle-order traversal results and subsequent traversal results, we can restore a binary tree completely. Such as:

1, 2, 3, 4Copy the code

Above is the output of the two traversal methods. We know that the first node of the preceding traversal must be the root node, so at this point, we already know that the root node of the original binary tree is 1. Next, we take the node of 1 into the output of the middle traversal and find the position of 1. Since the output result of middle-order traversal is left root right, it is not difficult to know that the middle-order traversal output to the left of 1 is the middle-order traversal output of the left subtree of the original binary tree, and the middle-order traversal output to the right of the original binary tree is the middle-order traversal output of the right subtree of the original binary tree. Thus, we can divide the middle-order traversal output into the following blocks:

The left subtree has only one node, 5, but the right subtree is still a sequence, so we continue to go down. We already know the original # by the left subtree of the binary tree, and right subtree sequences, so we have to cut the following preorder traversal results 1 2 3 4 5 # root left subtree Right subtree # after cutting the preorder traversal results, we find the right subtree of the sequence, the sequence of his first is right subtrees of root node, which is 2, Once the root node is found, it is easy to repeat the above steps. In the right subtree of the middle order traversal of the binary tree, the left and right subtrees of the right subtree are 3 and 4 respectively. At this point, we have restored the binary tree 1 / \ 5 2 / \ 3 4Copy the code

Wouldn’t it be easy to have a tree with five nodes? Now, let’s do a slightly harder thinking problem:

Given the pre-order traversal results and middle-order traversal results of the binary tree with 10 nodes, restore the binary tree.

Preceding traversal output sequence: 1, 2, 4, 9, 5, 6, 10, 3, 7, 8

Middle-order traversal output sequence: 4 9 2 10 6 5 1 3 8 7

# from 2. We know that 1 is the root node, so the left subtree sequence is: 4 9 2 10 6 5; Right subtree sequence: 3 8 7 1. Middle order: 4 9 2 10 6 5 1 3 8 7 From 1.2, 2 is the root node, so the sequence of the left subtree is 4 9; 9 is the root of the tree. 9 is the root of the tree. 9 is the root of the tree. 4 is the root node 1.2.1 preorder: 4 9 # 5 is the root node 1.2.1 preorder: 4 9 # The sequence of the right subtree is 8, 7, and 2.1. The sequence of the right subtree is 8, 7, and 2.1. The sequence of the right subtree is 8, 7, and 2.1. 3 8 7 # Assert: 3 is the root node 2.2 prefix: 3 7 8 # 7, 8 # Finally the binary tree looks like this: 1 / \ 2, 3 / \ 4, 5, 7, / / 9, 6, 8, 10Copy the code

1.6 Common classification of binary trees

1.6.1 Complete Binary Tree

Only a binary tree that lacks nodes to the right of the last level is called a complete binary tree, that is, the left side of a complete binary tree is full, and only the right side is allowed empty nodes.

                  1
            /           \
           2             3
          /   \         /  \
         4     5       6
Copy the code

A complete binary tree is an excellent data structure that has two main characteristics that give us a better experience in terms of performance and implementation.

The node number can be calculated

From the complete binary tree above, we can see a rule:

For a node numbered n, its left child root node must be numbered 2n, and its right child’s root node must be numbered 2n+1. For example, the left child root node of figure 2 is numbered 4, which means 2 * 2=4. The right child root node is numbered 5, that is, 2*2+1=5.

So what can we do with this?

As we know, a normal binary tree, in addition to the data domain for storing data, also needs additional storage space for storing Pointers to the left and right subtrees, that is, pointer domain. If we can directly calculate the number of the left and right subtrees of the current node by the rule above, then there is no need to store the storage address of the left and right subtrees. When a tree is large enough, this can save us a considerable amount of storage space.

The above method of replacing the address of record storage by calculation extends an algorithm idea that we often use in our daily work:

Above through the calculation to replace the method of recording storage address, extended a we often used in the daily work of an algorithm idea: the conversion of recording and computing ideas

  • Record (save time, consume space, no calculation, direct value, namely: space for time) : the information stored, when used to take out for use

  • Calculation formula (saving space, consuming time, no need to store, calculating value, namely: time for space) : obtained by calculation, such as 1+1=2 in 2 is the result obtained by calculating 1+1 expression

These two methods have their own advantages and disadvantages. It is meaningless to compare the advantages and disadvantages of these two methods without considering the problem itself. We should consider the specific problem and see which method can bring you more benefits.

Scenario 1: When memory space is limited and computation time is not required, such as running a program on a machine with small memory, we will choose the calculation formula and exchange time for space.

Scene 2: when we memory space is enough big, and the computing speed is required, such as enterprise application server running on a real-time calculation data, we will choose the recording, in space, in time, because of the application of an enterprise, the general memory is large enough, can also be dynamic expansion, at that time, the time the benefits will far outweigh the benefits of space.

Continuous storage space can be used for storage

In addition to the node number (node address) to calculate the features, fully binary tree because his number is continuous, ascending and continuous sequence from top to bottom, so we can put the complete binary tree stored in a continuous storage area, such as: in the array, the array subscript zero deposit element node 1, deposit of 1 elements 2 nodes.

Using this feature, we in the implementation of a complete binary tree, can need not like realize common binary tree alone define a structure, and define the data and pointer domains respectively to store data and pointer respectively, we can use an array to store data directly, this also is we the most common form of full binary tree.

Let’s imagine this: you are in the program implementation using a one-dimensional linear structure, namely the array to say, but in your mind, should be to convert it into two-dimensional tree structure to thinking about the problem, it is also a relatively advanced programming logic thinking ability, let the data structures we see can in the heart of the mind “compiled” into its true runtime.

Of course, it takes a lot of practice to acquire this kind of ability. At least, as I write about this trip, I can’t achieve this level.

1.6.2 Full Binary Tree

A binary tree with no nodes of degree 1 is called a full binary tree, meaning that all nodes have either no children or two children.

PS: We often see many articles and blogs on the Internet that put the definition of perfect binary tree on the full binary tree, which is actually wrong. See the specific definition of perfect binary tree below.

                  1
            /            \
           2              3
         /   \           /  \
        4     5	        6    7
             / \
            8   9
Copy the code

1.6.3 Perfect Binary Tree

All nodes have a degree of 2. This shows that the definition of a perfect binary tree is different from that of a full binary tree. We can say that a perfect binary tree is a special full binary tree.

                  1
            /           \
           2             3
         /   \          /  \
        4     5	       6    7
Copy the code

2. In-depth understanding of tree structure

2.1 the node

The nodes of a tree represent a set, and the child nodes represent disjoint subsets under the parent set. This may be difficult to understand, but let’s look at the following graph:

The binary tree above 5 / \ 2, 3 #, 5 nodes, we can think of it as a complete set, and the two children below, 2 and 3, are two disjoint subsets of the complete set, and the sum of the two subsets should equal the complete setCopy the code

From the graph above, we can draw a conclusion:

A node of the tree represents a set, while child nodes represent the disjoint subsets below the set. The sum of all subsets gives the set.

2.2 the side

Each edge of the tree represents the relationship.

Learn the function of binary trees

3.1 Search operations applicable to various scenarios

Due to the binary tree structure including natural recursive structure, and the characteristics of binary thinking perfectly, make the structure of binary tree and its various variant is very suitable for efficient lookup operation under the various scenarios, we have many design computer underlying variations based on binary tree and binary tree structure, is due to its excellent performance can provide efficient and stable search efficiency.

3.2 Helps understand the basics of high-level data structures

  • Complete binary tree (maintain set maximum value of the magic weapon)

    • The heap
    • Priority queue
  • Multi-fork tree/forest

    • Solve string and related conversion problems of the magic weapon

      • The dictionary tree
      • AC automata
    • A magic weapon to solve connectivity problems

      • Check and set
  • Binary sort tree

    • An underlying implementation of an important data retrieval container in the language standard library

      • AVL tree (binary balanced tree)
      • 2-3 tree (binary equilibrium tree)
      • Red-black tree (binary balanced tree)
  • Important data structures underlying file systems and databases

    • B tree /B+ tree (multi-fork equilibrium tree)

3.3 Best choice for practicing recursive skills

Learning the top-level thinking of recursion:

Design/understand a recursive program:

1. Mathematical induction => structural induction

If k0 is true, assuming ki is true, then k of I plus 1 is also true. As in solving the Fibonacci sequence:

function fib(n) {
	// Make sure that k0 is correct, i.e. the preconditions (boundary conditions) are correct. In this case, k0 is 1 for n=1 and 2 for n=2
  if(n <= 2) return n;
  return fib(n - 1) + fib(n - 2);
}
Copy the code

If you were asked to design a binary tree traversal program, what would you do?

  1. Function meaning: traversal the binary tree with root as the root node.
  2. Boundary conditions: if root is empty, root is returned directly without traversal.
  3. Recursive process: pre-traversal the left subtree and pre-traversal the right subtree respectively.
// Traversal the binary tree with root as the root node
function pre_order(root) {
  // Boundary condition: if root is empty, return root directly
  if(! root)return root;
  console.log(root.val);
  // Recursive process: preorder traversal left subtree and preorder traversal right subtree respectively
  pre_order(root.left);
  pre_order(root.right);
}
Copy the code

2. Give recursive functions a clear meaning

In the code above, fib(n) represents the value of the NTH Fibonacci sequence.

3. Think about boundary conditions

In the above code, our boundary is given as 1 for n=1 and 2 for n=2, requiring special treatment for this boundary.

4. Implement a recursive process

Once you’ve dealt with the boundary problem, you can continue recursively.

3.4 Use the left child has brother method to save space

To convert any non-binary tree to a binary tree, such as a trinomial tree to a binary tree:

# note that we must always ensure that the left side of the binary tree is a child node, node # 1 original trigeminal tree on the right is the brother / | \ 2 3 4 5 6 / \ # according to the left child right brother way into binary tree 1/2 / \ \ 3 4 5\6 # 2 is one child, so in the left subtree, Because 3 is the brother of 2, we put it on the right subtree of 2, 4 is the brother of 3, we put it on the right subtree of 3, 5 is the child of 3, we put it on the left subtree of 3, 6 is the brother of 5, so we put it on the right subtree of 5Copy the code

As you can see, the right subtree of the root node is always empty when only a tree is converted into a binary tree by the left child and right brother method. Then, can we effectively use this right subtree to merge multiple trees into a binary tree? For example, the following example merges two binary trees together to form a forest.

# if you want to combine the two trees into a binary tree? 7 / | \ / 1, 2, 3, 4, 8, 9, 5 6 7 1/2 \ \ / 3 / \ \ 5 4 9\6 # 8 in this way, we will be merged into one tree, two trees or forest. This tree looks like a binary tree, but it actually represents a forest of two treesCopy the code

Known as the Alpha Go monte carlo algorithm source code in the implementation of the concrete implementation of the algorithm, tree search algorithm framework called confidence limit tree algorithm (UCT) is to use the left child right brother method to implement a search tree, used to represent the entire board, normally, if you want to store the situation of a chessboard, Will be stored in a tree structure, but because there are too many checkerboard situations, it is possible to form a tree with more than 100 forks. In Alpha Go, to avoid this situation, the tree with more than 100 forks is converted into a binary tree by the representation of the left child has a brother. Those of you who are interested can check out Pachi.

So why does this save space? You think about it, a fork tree, his each node has three hands domain is used to store the subtree, regardless of whether there is a subtree, all want to reserve the space, the trigeminal tree as those listed above, there are 6 nodes, a total of 18 pointer field, including five effective pointer field only (the so-called effective pointer is a pointer domains is not pointing to the empty, That is, number of edges = number of nodes -1), then there are 18-5=13 Pointers that are empty. If there are 12 valid pointer fields and 5 valid pointer fields, then only 12-5=7 pointer fields are empty, which obviously saves a lot of space compared to the previous 13.

If a k-tree has n nodes, it will have at most K * N edges, and its edges are actually only n-1, then it will waste: K *n – (n-1)=(k-1)*n+1 edges, which means that as we branch more, we waste more space, so we need to convert a k-tree into a binary tree, because the wasted edges of a binary tree are: n+1, depending on which node we actually store data on.

Four, conclusion

At this point, we have talked about some basic knowledge of binary trees, in order to control the length and the acceptance of different bases of partners, I will not carry out a further discussion. I would have to brush the binary tree algorithm with you to consolidate some knowledge of the binary tree, but this will lead to a smelly and long article, so I will split it into two articles. The next article will skip the basics of trees and go straight to binary tree algorithms. Stay tuned.