This is the fifth day of my participation in the August More text Challenge. For details, see:August is more challenging

Postgraduate entrance examination data structure tree – reading extract summary

A Tree is a finite set of n (n≥0) nodes. When n=0, it’s called an empty tree. In any non-empty tree :(1) there is only one specific node called Root; (2) When n > 1, the remaining nodes can be divided into m (m > 0) non-intersecting finite sets T 1, T 2… , T M, where each set is itself a tree and is called a SubTree of the root.

The definition of the tree

We’ve been talking about one-to-one linear structures, but in reality, there are a lot of one-to-many cases that need to be dealt with, so we need to look at this one-to-many data structure, the tree, and think about its properties to solve the related problems that we encounter in programming.

A Tree is a finite set of n (n≥0) nodes. When n=0, it’s called an empty tree. In any non-empty tree :(1) there is only one specific node called Root; (2) When n > 1, the remaining nodes can be divided into m (m > 0) non-intersecting finite sets T 1, T 2… , T M, where each set itself is a tree and is called a SubTree of the root, as shown in Figure 6-2-1.

Figure 6-2-1

The definition of a tree is essentially the recursive approach that we talked about when we talked about stacks. In other words, in the definition of a tree we use the concept of a tree, which is a relatively new way of defining a tree. The subtrees T1 and T2 of Figure 6-2-2 are subtrees of the root node A. Of course, the tree composed of D, G, H and I is a subtree of node B, and the tree composed of E and J is a subtree of node C.

There are two more things to emphasize about the definition of a tree:

1. When n>0, the root node is unique, there can’t be more than one root node, don’t mix it with a real tree, a real tree has many roots, that’s a real tree, a tree in a data structure can only have one root node.

2. When m>0, there is no limit to the number of subtrees, but they must not intersect each other. Two structures like the one in Figure 6-2-3 do not fit the definition of a tree because they both have intersecting subtrees.

Figure 6-2-3

6.2.1 Node classification

A node in a tree contains a data element and branches that point to its subtree. The number of subtrees a node has is called its Degree. Nodes of degree zero are called Leaf or terminal nodes; Nodes whose degree is not zero are called non-terminal or branch nodes. In addition to the root, branch nodes are also called internal nodes. The degree of the tree is the maximum degree of each node in the tree. As shown in Figure 6-2-4, since the maximum degree of this tree node is the degree of node D, which is 3, the degree of the tree is also 3.

Figure 6-2-4

6.2.2 Relationships between Nodes

The root of the subtree of a node is called the Child of that node and, accordingly, the node is called the Parent of the Child. Well, why not father or mother, parents? Well, for a node its parent is the same, the only one, so it can only be called a parent. Children with the same parent are called brothers. The ancestor of a node is all the nodes on the branch from the root to that node. So for H, D, B, and A are its ancestors. Conversely, any node in a subtree whose root is a node is called its descendant. The descendants of B are D, G, H, and I, as shown in Figure 6-2-5.

Figure 6-2-5

6.2.3 Other concepts related to trees

The Level of a node is defined from the root. The root is the first layer, and the children of the root are the second layer. If a node is at level L, then the root of its subtree is at level L +1. Nodes whose parents are on the same level are Cousins to each other. It is clear that D, E and F in Figure 6-2-6 are Cousins, and so are G, H, I and J. The maximum level of nodes in a tree is called the Depth or height of the tree, and the current tree Depth is 4.

Figure 6-2-6 If each subtree of a node in a tree is considered to be ordered from left to right and not interchangeable, the tree is called an ordered tree; otherwise, it is called a disordered tree.

A Forest is a collection of m (m≥0) trees that do not intersect each other. For each node in the tree, the set of its subtrees is the forest. For the tree in Figure 6-2-1, the two subtrees in Figure 6-2-2 can actually be understood as forests.

Comparing the structure of a linear table with a tree, they are quite different, as shown in Figure 6-2-7.

Figure 6-2-7

6.3 Abstract data types of trees

Compared to linear structures, operations on trees are completely different, and here we show some basic and common operations.

6.4 Storage structure of the tree

When we think about storage structures, we think about sequential storage and chained storage, which we discussed in previous chapters.

Let’s start with the sequential storage structure, which stores the data elements of a linear table sequentially in a sequence of storage locations with contiguous addresses. This is natural for linear tables, but what about trees, which are many pairs?

A node in a tree can have more than one child, which means that no matter what order you store all the nodes in the tree into an array, the location of the nodes does not directly reflect the logical relationship. If you think about data elements stored one by one, who is their parent, who is their child? A simple sequential storage structure is not sufficient for tree implementation.

However, by making full use of the characteristics of sequential storage and chain storage structure, we can fully realize the representation of the tree storage structure. We are going to introduce three different notations here: the parent notation, the child notation, and the child sibling notation.

6.4.1 Parent representation

We may not have children because of various reasons, but no matter who can not jump out of the stone, sun Wukong is obviously not a person, so people must have parents. The tree structure is no exception, except for the root, every node, it may not have children, but it must have and only one parent.

We assume that the nodes of the tree are stored in a set of continuous Spaces, and in each node, there is an indicator indicating the location of the parent node in the linked list. In other words, each node knows not only who it is, but also where its parents are. Its nodal structure is shown in Table 6-4-1.

Table 6-4-1

Data is the data domain, which stores the data information of nodes. Parent is a pointer field that stores the subscripts of the node’s parents in the array.

Here is the nodal structure definition code for our parent notation.

#define MAX_TREE_SIZE 100 typedef int TElemType; #define MAX_TREE_SIZE 100 typedef int TElemType; /* struct PTNode */ struct PTNode */ {TElemType data; /* int parent; /* Parent location */} PTNode; PTNode nodes[MAX_TREE_SIZE]; struct PTNode nodes[MAX_TREE_SIZE]; /* array of nodes */ int r,n; */} PTree;Copy the code

With this structure defined, we can implement the parent representation. Since the root node has no parents, we agree that the location field of the root node is set to -1, which means that all of our nodes have the locations of their parents. The tree structure in Figure 6-4-1 and the tree parent representation in Table 6-4-2 are shown.

The 6-4-1 table of 6-4-1

With such a storage structure, we can easily find its parent node based on the parent pointer of the node. The time complexity is O(1) until the parent is -1, indicating that the root of the tree is found. But if we want to know what the child of the node is, sorry, we have to go through the whole structure.

This is really troublesome, can we improve it?

B: Sure. Let’s add a domain for the leftmost child of the node, let’s call it the firstborn domain, so that we can easily get the child of the node. If there are no children, the firstborn domain is set to -1, as shown in Table 6-4-3.

Table 6-4-3

For nodes with zero or one children, this structure solves the problem of finding children. Even if you have two children, knowing who the eldest is, the other is of course the second son.

Another problem scenario, we’re concerned about the relationship between brothers, and the parental representation doesn’t represent that relationship, so what do we do? Well, you can add a right sibling field to represent the sibling relationship, that is, every node if it has a right sibling, records the subscript of the right sibling. Similarly, if the right brother does not exist, the value is -1, as shown in Table 6-4-4.

Table 6-4-4

But if you have a lot of children, more than two. We also pay attention to the parents of the node, the children of the node, and the brothers of the node, and the time traversal requirements are relatively high, so we can also expand this structure to have the parent domain, the eldest son domain, and the right brother domain. The design of the storage structure is a very flexible process. Whether a storage structure design is reasonable depends on whether the operation based on the storage structure is suitable, convenient, and the time complexity is good. Note that more is not always better; design structures when necessary. Just like listening to good music for thousands of times will get bored, and watching a good movie for hundreds of times in a period of time will also get bored, don’t you think?

6.4.2 Child representation

Let’s think about it a whole different way. Since each node in the tree may have more than one subtree, we can consider using a multilinked list, that is, each node has multiple pointer fields, where each pointer points to the root of a subtree. This method is called multilinked list representation. But each node of the tree has a different degree, or number of children. So there are two solutions that can be devised.

Plan a

One is that the number of pointer fields is equal to the degree of the tree, and just to review, the degree of the tree is the maximum degree of each node in the tree. The structure is shown in Table 6-4-5.

Table 6-4-5

Where data is the data field. Child1 through childd are pointer fields that point to children of that node.

For the tree in Figure 6-4-1, the degree of the tree is 3, so the number of our pointer fields is 3. This approach is implemented as shown in Figure 6-4-2.

Figure 6-4-2

This is obviously a waste of space when you have different degrees of nodes in the tree, because there are a lot of nodes, and the pointer field is empty. However, if the degree of the tree is very small, it means that the space is being used, and then the disadvantage of the storage structure becomes the advantage.

Since many pointer fields can be empty, why not allocate space as needed. So we have a second option.

Scheme 2

In the second scenario, the number of pointer fields of each node is equal to the degree of the node. We specially take a position to store the number of pointer fields of the node, and its structure is shown in Table 6-4-6.

Table 6-4-6

Where data is the data field, degree is the degree field, that is, the number of children of the node stored, child1 to childd is the pointer field, pointing to the nodes of each child of the node.

For the tree in Figure 6-4-2, this approach is implemented as shown in Figure 6-4-3.

Figure 6-4-3

This method overcomes the disadvantage of wasting space and has a high utilization rate of space. However, due to the different structure of the linked list of each node and the number of degrees to be maintained, it will bring loss of time in operation.

Can there be a better way to reduce the waste of null Pointers and make the node structure the same?

If we look closely, it makes sense to put each node in an array of sequential storage structures in order to traverse the tree, but it’s not certain how many children each node has, so we create a single linked list of the children of each node to show their relationship.

That’s the child notation we’re talking about. The specific method is to arrange the children of each node, with a single linked list as the storage structure, then n nodes have N children linked list, if it is a leaf node, the single linked list is empty. Then the n head pointer forms a linear table, which adopts the sequential storage structure and is stored in a one-dimensional array, as shown in Figure 6-4-4.

Figure 6-4-4

Therefore, two kinds of node structures are designed. One is the child node of the child linked list, as shown in Table 6-4-7.

Table 6-4-7

Where child is the data field that stores the index of a node in the header array. Next is a pointer field that stores Pointers to the next child of a node.

The other is the header node of the header array, as shown in Table 6-4-8.

Table 6-4-8

Data is the data domain, which stores the data information of a certain node. Firstchild is the header pointer field, which stores the header pointer to the linked list of children of the node.

The following is the structure definition code for our child representation.

*/ #define MAX_TREE_SIZE 100 typedef struct CTNode /* struct CTNode */ {int child; struct CTNode *next; } *ChildPtr; Typepedef struct /* struct */ {TElemType data; ChildPtr firstchild; } CTBox; {CTBox nodes[MAX_TREE_SIZE]; /* array of nodes */ int r,n; */} CTree;Copy the code

So if we want to find a child of a node, or if we want to find a sibling of a node, all we have to do is find a single linked list of children of that node. It’s also handy for traversing the entire tree by looping through an array of head nodes.

But there’s also the problem, how do I know who the parents of a node are? It’s a lot of trouble because you have to go through the whole tree, but can’t you synthesize the parent representation and the child representation? Of course you can. See Figure 6-4-5.

Figure 6-4-5

We call this the parent child representation, which is sort of an improvement on the child representation. As for the specific structural definition of this notation, I will skip over here and leave it to students to design their own.

6.4.3 Sibling representation of children

We looked at the tree’s storage structure from the perspective of the parents and from the perspective of the children, but what if we looked at the tree’s siblings? Of course, in a hierarchy like a tree, you can’t just study the brothers of a node, but we observe that the first child of a node of any tree is unique if it exists, and the right brother is also unique if it exists. Therefore, we set two Pointers to the node’s first child and the node’s right sibling.

The node structure is shown in Table 6-4-9.

Table 6-4-9

Where data is the data domain, firstchild is the pointer domain, storing the storage address of the node’s firstchild, rightsib is the pointer domain, storing the storage address of the node’s right brother.

The structure definition code is as follows.

/* typedef struct CSNode {TElemType data; struct CSNode *firstchild,*rightsib; } CSNode,*CSTree;Copy the code

For the tree in Figure 6-4-1, a schematic of this implementation is shown in Figure 6-4-6.

Figure 6-4-6

This representation, to find a child of a node brought convenience, only through fistChild to find the eldest son of the node, and then through the rightsib of the eldest son of the node to find its second brother, and then go on, until find the specific child. Of course, if you want to find the parents of a node, this notation is also flawed, so what can you do?

Well, if you really need to, you can add a parent pointer field to solve the problem of finding parents quickly, but I won’t go into details here.

And actually the best thing about this notation is that it turns a complex tree into a binary tree. We deformed picture 6-4-6 to look like picture 6-4-7.

Figure 6-4-7