This is the fifth day of my participation in the August Wen Challenge.More challenges in August

This series of articles is a summary of personal learning, if you find mistakes or questions, welcome to comment!

This article is the fifth in the series of relearning data structures. The series of articles are as follows: 1. Time complexity and space complexity of algorithms 2. Relearn data structure — Linked list 3. Relearn data structure — Queue 4. Relearn data structure — Stack 5. Relearn data structures — trees

1. Why is a tree needed

Array storage mode analysis

  • Advantages: Fast access to elements by subscript. Binary lookup can also be used for ordered arrays to speed up retrieval
  • Disadvantages: If a value needs to be retrieved, it needs to be retrieved one by one. Inserting or deleting a value requires overall movement, which is inefficient

Chain storage analysis

  • Advantages: Optimization of array storage to a certain extent (for example: insert a node, just insert the node into the linked list, delete efficiency is also very good)
  • Disadvantages: Retrieval, such as retrieving a value, requires retrieval from the beginning node, which is still inefficient.

Tree storage structure analysis

It can improve the efficiency of data storage and reading. For example, the binary sorting tree can ensure the speed of data retrieval, data insertion, deletion, and modification.

2. Terminology and related concepts

2.1 Related Terms

  • node
  • The root node
  • The parent node
  • Child nodes
  • Leaf nodes (nodes with no children)
  • Weight of node (value of node)
  • Path (find the path from the root node to the corresponding node)
  • layer
  • Degree: For a node, the number of subtrees it has (how many branches it has) is called the degree of the node
  • subtree
  • Height of tree (maximum number of layers)
  • Forest (multiple sub-trees make a forest)

2.2 Related Concepts

Binary tree

There are many kinds of trees, each node has at most two children of a tree called a binary tree. The children of a binary tree are divided into left and right nodes

Full binary tree

A binary tree is said to be full if all leaf nodes are at the last level and the total number of nodes =2^n-1, where n is the number of levels.

Complete binary tree

A binary tree is called a complete binary tree if all the leaves are in the last or penultimate layer and the last leaf is continuous on the left and the penultimate leaf is continuous on the right.

There are two storage structures of binary tree: sequential storage and chain storage.

3. Operations related to chain storage of binary trees

The overall structure of the binary tree is linked list, and the related operations of the tree are increase, deletion, change and check, but as long as you master the tree traversal, delete other basically simple.

3.1 traversal

There are three methods for tree traversal, namely, pre-order, middle-order and post-order (the pre-order and post-order indicates the traversal order of the root node).

  • Sequential traversal: root node, left node, right node
  • Middle order traversal: left node, root node, right node
  • Back-order traversal: left node, right node, root node

implementation

Public HeroNode preTraverSearch(int no){if(this.no == no){return this; } HeroNode resNode = null; if(this.left! =null){ resNode = this.left.preTraverSearch(no); } if(resNode! =null){ return resNode; } if(this.right! =null){ resNode = this.right.preTraverSearch(no); } return resNode; } public HeroNode midTraverSearch(int no){HeroNode resNode = null; if(this.left! =null){ resNode = this.left.midTraverSearch(no); } if(resNode! =null){ return resNode; } if(this.no == no){ resNode = this; } if(resNode! =null){ return resNode; } if(this.right! =null){ resNode = this.right.midTraverSearch(no); } return resNode; } public HeroNode postTraverSearch(int no){HeroNode resNode = null; if(this.left! =null){ resNode = this.left.postTraverSearch(no); } if(resNode! =null){ return resNode; } if(this.right! =null){ resNode = this.right.postTraverSearch(no); } if(resNode! =null){ return resNode; } if(this.no == no){ resNode = this; } return resNode; }Copy the code

3.2 Sequence traversal

  • 1. Sequence traversal first needs to know how many layers there are in the binary tree (recursive implementation)

    Public int depth(HeroNode node){int leftDepth=0; int rightDepth=0; if(node==null){ return 0; } if(node.left! =null){ leftDepth = depth(node.left); } if(node.right! =null){ rightDepth = depth(node.right); } return (leftDepth>rightDepth? leftDepth:rightDepth)+1; }Copy the code
  • 2. Iterate over the output for each layer

    / / sequence output public void levelPrint (HeroNode node, int level) {/ / empty or unreasonable hierarchy tree if (node = = null | | level < 1) {return; } if(level==1){ System.out.print(node+"\t"); } levelPrint(node.left,level-1); levelPrint(node.right,level-1); } public void levelTraverse(HeroNode node){if(node==null){return; } int depth = depth(node); for (int i = 1; i <= depth ; i++) { levelPrint(node,i); System.out.println(); }}Copy the code

3.3 delete

For chain-stored binary trees, you’re going to iterate over them, but it’s really easy to delete them. You simply find the corresponding node to delete and empty it.

implementation

Public void delNode(int no){if(this.left! =null && this.left.no == no){ this.left = null; return; } // The right child is not empty and the right child is the data to delete if(this.right! =null && this.right.no == no){ this.right = null; return; } // The left child is not empty, the left recursive if(this.left! =null){ this.left.delNode(no); } // The right child is not empty, recurse to the right if(this.right! =null){ this.right.delNode(no); }}Copy the code

Chain storage binary tree concrete implementation code

4. Store binary trees in sequence

Sequential storage of binary trees refers to the use of sequential tables (arrays) to store binary trees. It is important to note that sequential storage only works with complete binary trees. In other words, only complete binary trees can use sequential table storage. Therefore, if we want to store ordinary binary trees sequentially, we need to convert ordinary binary trees into full binary trees in advance.

Turning a normal binary tree into a full binary tree is as simple as adding some extra nodes to the tree and “piecing” it together.

Features:

  • The left node of the NTH element is2*n+1
  • The right node of the NTH element is2*n+2
  • The parent of the NTH element is(n-1)/2
  • N represents the number of elements in the binary tree (counting from top to bottom, left to right, starting at 0)
  • Arrays can be converted to binary trees, and binary trees can be converted to arrays (sequential traversal)

Concrete implementation code

5. Cue binary tree

5.1 Introduction

Why is there a cued binary tree? When we want to get the preceding node (precursor node) or the following node (back-end node) of a node in the binary tree, the ordinary binary tree cannot be obtained directly, but can only be obtained by traversing the binary tree once. It is inconvenient to traverse the binary tree every time it involves finding a precursor node or a back-end node. Do you want to have a bidirectional list at this point? Each node stores information about the previous node and the next node. Cue binary tree is based on this structure.

Looking at the binary tree structure below, we can see that the left and right Pointers of nodes 6, 8, 10 and 14 are not fully utilized.

For a binary linked list with n nodes, each node has two pointer fields pointing to the left and right nodes, so there are 2n pointer fields. However, the binary tree with n nodes has n-1 branches (except the root node, no branches point to a node), that is, there are 2n-(n-1)= N +1 null pointer fields. Space for these null pointer fields is wasted, so empty chain fields can be used to hold the precursor and successor of the node. The clue binary tree uses N +1 empty chain domain to store the information of precursor and successor nodes.

We call the pointer to the precursor and successor a clue, and the binary tree with the clue is called a clue binary tree.

The process of traversing an ordinary binary tree in some order to make it a cued binary tree is called cueing. Because the precursor and successor nodes can only be obtained in the traversal of the binary tree, the specific process of cueing is to modify the null pointer in the traversal of the binary tree.

The middle sequence clue is shown in the figure below:

  • The successor of 8 is 3
  • The precursor of 10 is 3 and the successor of 10 is 1
  • The precursor node of 14 is 1 and the successor node of 14 is 6

5.2 Implementation

If you only use empty nodes on the basis of the original binary tree, then you have the problem of not knowing whether the left pointer points to its left child or its predecessor, and the right pointer points to its right child or its successor. So we can add their pointing and setting flags to distinguish them.

If leftType == 0 points to the left subtree, and if 1 points to the precursor

If rightType == 0 points to the right subtree, and if 1 points to the successor

Concrete implementation code

reference


6. Practical application of tree structure

6.1 the heap sort

Before learning about heap sorting, we need to understand what a heap is: a sequence of n elements can be called a heap if the elements in the sequence satisfy one of the following relationships.

  • Small push: ki ≤ k2i and KI ≤ k2i+1 (in the range of N records, the value of the ith keyword is less than the 2nd keywordI keywords, also less than the secondI +1 keyword)Each node has values less than or equal to the values of its left and right nodes
  • Large top heap: Ki ≥ K2i and KI ≥ K2I +1 (in the range of N records, the value of the ith keyword is greater than the value of the 2ndI keywords, but also greater than the secondI +1 keyword)Each node has a value greater than or equal to the value of its right child

The definition of a heap can also be explained using a complete binary tree, because in a complete binary tree the left child of the ith node is exactly the 2i node, and the right child is exactly 2i+1 node. If the sequence can be called a heap, then in a complete binary tree built with the sequence, the value of each root node must be no less than (or no greater than) the values of the left and right child nodes.

For the unordered table {49,38,65,97,76,13,27,49}, its corresponding heap is represented by a complete binary tree:

When the heap is represented as a complete binary tree, the representation is not unique, but it is certain that the root of the tree is either the smallest or the largest value in the unordered table.

By moving the unordered list into a heap, can find the maximum or minimum value in the table directly, then extracted, to the rest of the record to rebuild a heap, take great value or a small value, so can get an ordered sequence, the process for heap sort (ascending commonly use big heap, descending using small heap).

Train of thought

  • To build an unordered sequence into a heap, the heap condition is satisfied (starting from the last non-leaf node), and the large or small top heap is selected according to ascending or descending requirements
  • Swap the top element with the end element, “sinking” the largest element to the end of the array
  • Rearrange the structure so that it meets the heap definition, and then continue swapping the top heap element with the current end element, repeating the adjust + swap step until the sequence is in order.

Illustrative illustration

You are given an array {4,6,8,5,9} and asked to use heap sort to sort the array in ascending order.

Step 1: Construct the initial heap. The given unordered sequence is constructed into a big top heap (generally the big top heap is used for ascending order and the small top heap is used for descending order). Raw array [4, 6, 8, 5, 9]

  • Suppose the given sequenceless structure is as follows

  • At this point, we start from the last non-leaf node (leaf node naturally need not adjust, the first non-leaf node arr. Length /2-1=5/2-1=1, namely the following 6 nodes), from left to right, from bottom to top. In [6,5,9], element 9 is the largest, and 6 and 9 are swapped

  • Find the second non-leaf node 4, since the 9 element in [4,9,8] is the largest, 4 and 9 are swapped

  • At this time, swapping leads to structural disorder of the child roots [4,5,6], further adjustment,6 is the largest in [4,5,6], and swapping 4 and 6.

At this point, we construct a large top heap from an unordered sequence.

Step 2: Swap the top element with the end element to make the end element the largest. We then continue to adjust the heap, swapping the top element with the bottom element to get the second largest element. And so on and so forth.

  • Swap the top element 9 with the bottom element 4

  • Readjust the structure so that it continues to meet the heap definition

  • Then swap the top element 8 with the bottom element 5 to get the second largest element 8

  • The follow-up process, continue to adjust, exchange, so repeated, eventually making the sequence order

The specific implementation

Concrete implementation code

Heap sort is a selective sort, which has the worst, best, average time complexity of O(nlogn), and it is also unstable sort.

6.2 The Huffman Tree

Huffman tree, also known as Huffman tree, Optimal tree, and Optimal binary tree. Before we learn about Huffman trees, we need to know a few nouns.

Nouns related to Huffman tree

  • Path: A path from one node to another in a tree.
  • Path length: In a path, the path length is increased by 1 each time it passes through a node. For example, in a tree, if the number of layers of the root node is 1, the length of the path from the root node to the i-layer node is i-1. The path from root to c in the figure below is 3.
  • Node weight: Assign a new value to each node, which is called the node weight. In the figure below, the weight of node A is 7 and that of node B is 5
  • The weighted path length of a node: the product of the length of the path from the root node to the node and the weight of the point. For example, the weighted path length of node B in the following figure is 2 x 5=10.

The weighted path length of the tree is the sum of the weighted path length of all leaf nodes in the tree. Usually written as “WPL”. For example, the weight path length of the tree shown below is:

WPL = 7 * 1 + 5 * 2 + 2 * 3 + 4 * 3
Copy the code

What is a Huffman tree

When trying to construct a tree with n nodes (all as leaf nodes and all with their own weights), if the tree constructed has the smallest weighted path length (WPL), the tree is called “optimal binary tree”, also known as “Huffman tree”.

When constructing a Huffman tree, to minimize the length of the tree’s weighted path, there is only one principle to follow, that is: the node with greater weight is closer to the root.

implementation

Concrete implementation code

6.3 Binary Sorting Tree

thinking

You are given a sequence of numbers (7, 3, 10, 12, 5, 1, 9) that can be efficiently queried and added to data.

Solution Analysis
An array of

Array not sorted, advantage: quick to add directly to the end of the array. Disadvantages: Slow search (front to back search)

Array sorted, advantage: fast binary lookup. Disadvantages: In order to keep the array in order, the data needs to move slowly when inserting

The list

Linked lists, whether ordered or unordered, are slow to look up, faster to add than arrays, and do not require the data to move as a whole.

Binary sort tree

The demand is being met.

What is a binary sort tree?

Binary Soft(Search) Tree (BST) For any non-leaf node of a binary tree such that the value of the left child is smaller than the value of the parent, and the value of the right child is larger than the value of the parent, such a binary tree is called a binary sort tree (the middle-order traversal is exactly ascending).

Note: If you have the same value, you can put the node on the left child node or the right child node.

For example, for data (7, 3, 10, 12, 5, 1, 9), the corresponding binary sorting tree is:

Binary sort tree deletes nodes

Creating a binary tree (recursively) and traversing a binary tree (middle-order traversal) are relatively simple. The relatively difficult thing to understand is deleting nodes because there are so many things to consider. Because the binary sorting tree needs to meet the condition that the left child node is smaller than the parent node, and the right child node is larger than the parent node, two nodes should be found every time, one is the node to be deleted, and the other is the parent node of the node to be deleted. After finding these two nodes, there are the following three cases:

1. The node to be deleted is a leaf node

  • How to determine: the node to be deleted has no left child and no right child

  • Setting the value of the leaf node to null removes the corresponding node

    If (targetNode.left==null&&targetNode.right==null){// If (parent.left! Value ==value){// targetNode is the left child of the parent node. }else if(parent.right! Value ==value){// The targetNode is the right child of the parent node. }}Copy the code

3. The node to be deleted has two subtrees

  • How to determine: The node to be deleted has both right and left subtrees

  • Find the node corresponding to the minimum value in the right child node and delete it, and replace its node value with the node to be deleted. For example, in the sequence shown in the figure above (7, 3, 10, 12, 5, 1, 9), to delete node 7, the minimum value of its right child node 9 needs to be deleted, and the deletion is completed by replacing 7 with 9 (which satisfies the ascending order of the middle-order traversal result S).

2. The node to be deleted has only one subtree

  • How to read: In addition to the above two cases left is this case

  • There are two cases to consider here

    • The only node to delete is the left subtree

      • The node to be deleted is not the root node
      • The node to be deleted is the root node
    • The only node to delete is the right subtree

      • The node to be deleted is not the root node
      • The node to be deleted is the root node

      Here is a diagram with only left child nodes for easy comprehension. There’s only the right child analogy.

    If (targetNode.left! =null){// If the node to be removed has left children if(parent! =null){ if(parent.left.value==value){ parent.left = targetNode.left; }else {// targetNode is the right child of parent; parent.right = targetNode.right; }}else {// The node to be removed is the root node root = targetNode.left; }}else {// If the node to be removed has a right child if(parent! =null){if(parent.left. Value ==value){// If targetNode is the left child of parent; }else{// If targetNode is the right child of parent parent.right = targetNode.right; }}else {// The node to be deleted has the root node root = targetNode.right; }}Copy the code

Concrete real code

6.4 Balanced binary trees

Look at a problem

You are given a sequence {1,2,3,4,5,6} and asked to create a binary sort tree (the constructed binary sort tree is shown below) and analyze the problems.

Existing problems:

  • The left subtree is completely empty and looks more like a singly linked list in form
  • The query speed is significantly reduced (requiring sequential comparisons) and the BST advantage cannot be taken advantage of because the left subtree is compared each time, which is slower than a single linked list
  • Solution: Balanced binary tree (AVL)

What is a balanced binary tree

Balanced binary tree is also called balanced binary search tree (Self – balancingbinarysearchtree) is also known as AVL tree, which can ensure that the query efficiency is higher.

It has the following characteristics:

  • Balanced binary trees satisfy the conditions of binary sorting trees (balanced binary trees are also binary sorting trees)
  • The absolute value of the height difference between the left and right subtrees is no more than 1, and both the left and right subtrees are balanced binary trees
  • The common realization methods of balanced binary tree are red black tree, AVL, scapegoat tree and stretch tree

A binary sorting tree that is not a balanced binary tree can be converted to a balanced binary tree by rotation. There are three kinds of rotation: left rotation, right rotation and double rotation.

Left rotation

For example, the binary sorting tree constructed by {4,3,6,5,7,8} can be converted to a balanced binary tree by rotation to the left.

Left rotation condition :(right subtree height – left subtree height)>1

Left rotation idea:

  • 1. Create a node based on the value of the current node
  • 2. Set the left subtree of the new node to the left subtree of the current node
  • 3. Set the right subtree of the new node to the left subtree of the right subtree of the current node
  • 4. Replace the value of the current node with the value of the right subtree of the current node
  • 5. Set the value of the right subtree of the current node to that of the right subtree of the current node
  • 6. Set the left subtree of the current node to the value of the new node

Friendly reminder: so many steps are a little dizzy? It doesn’t matter. You don’t have to worry about it. You’ll get it if you draw it a few times. Just like playing rubik’s cube as a child, at the beginning of the back formula, after playing more, without the formula can also put it together.

So here’s my left rotation.

Code implementation:

Public void leftRotate(){// Create a new node with the value of the current Node3 newNode = new Node3(value); // set the left subtree of the newNode to the left subtree of the current node newnode. left = left; // set the right subtree of the newNode to the left subtree of the right subtree of the current node newnode. right = right.left; // Replace the value of the current node with the value of the right child node value = right.value; // Set the right subtree of the current node to the left subtree of the current node's right subtree. // set the left subtree of the current node to the newNode. }Copy the code

Right rotation

For example, a binary sorting tree constructed from the {10,12, 8, 9, 7, 6} sequence can be converted to a balanced binary tree by right rotation.

Right rotation condition :(left subtree height – right subtree height)>1

Right rotation thinking:

  • 1. Use the value of the current node to create a new node
  • 2. Set the left subtree of the new node to the right subtree of the current node
  • 3. Set the right subtree of the new node to the right subtree of the current node
  • 4. Replace the value of the current node with that of the left child node of the current node
  • 5. Set the left child of the current node to the value of the left child in the left subtree of the current node
  • 6. Set the right subtree of the current node to the new node

The rotation process is as follows.

Code implementation:

   public void rightRotate(){
        Node3 newNode = new Node3(value);
        newNode.right = right;
        newNode.left = left.right;
        value = left.value;
        left = left.left;
        right = newNode;
    }
Copy the code

Double rotation

Some sequences cannot be converted into balanced binary trees by a single rotation, so we can achieve this by a double rotation. For example, a sorted binary tree constructed from {10, 11, 7, 6, 8, 9} can be replaced by a balanced binary tree by a double rotation.

Double rotation condition:

  • 1. When the right rotation condition is met: the height of the right subtree of its left subtree is greater than that of the left subtree of its left subtree
  • 2. When the left rotation condition is met: the left subtree height of its right subtree is greater than the right subtree height of its right subtree

Ideas:

  • At the same time, the left subtree of the current node is rotated left, and then the current node is rotated right
  • At the same time, the right subtree of the current node is rotated right, and then the current node is rotated left

Diagram process:

Code implementation:

If (rightHeight()-leftHeight()>1){if(right! =null&& right.leftheight ()> rightHeight()){// Right.rightrotate (); // Rotate the current node leftRotate(); }else {// Rotate leftRotate() directly; } return; } // If (leftHeight()-rightHeight()>1){// If (left! =null&& lef.rightheight ()> lef.leftheight ()){lef.leftrotate (); // Rotate the current node rightward rightRotate(); }else {// Go right to rightRotate(); }}Copy the code

Concrete implementation code

7. Multi-path lookup tree (Understand)

1. Binary tree and B tree

Analysis of binary tree problem

Binary tree is efficient, but it also has problems. Since binary trees need to be loaded into memory, if there are few nodes, there is no problem. If there are many nodes in binary trees, there are the following problems:

  • During binary tree construction, I/O operations (massive data is stored in database files) need to be performed for several times and there are a large number of nodes. Therefore, the speed of binary tree construction is affected.
  • A large number of nodes will also cause a large height of the binary tree, which will slow down the operation speed.

Many tree

In a binary tree, each node has data items and up to two child nodes. Allowing each node to have more items and more children is a multi-fork tree. 2-3 trees, 2-3-4 trees and B trees are multi-fork trees. Multi-fork trees can optimize binary trees by reorganizing nodes and reducing the height of the tree.

The basic introduction of B tree

B-tree improves efficiency by reorganizing nodes, reducing tree height, and reducing I/O operations.

The picture below shows a B tree.

  • B trees reduce the height of the tree by reorganizing the nodes.
  • Designers of file and database systems take advantage of disk prefetch and set the size of a node to be equal to a page (typically 4k) so that each node can be fully loaded with only one IO read.

2. 2-3 tree

The 2-3 tree is the simplest b-tree structure and has the following characteristics

  • 2-3 trees all leaf nodes in the same layer (as long as B trees meet this condition)
  • A node with two nodes is called a two-node node, and a two-node node has either no children or two children
  • A node with three nodes is called a three-node point, and a three-node point has either no node or three nodes.
  • A 2-3 tree is a number made up of two nodes and three nodes.
  • 2-3 trees still satisfy the rules of binary sort tree (BST)

So in addition to the two – and three-four trees, we have a two – and three-four tree, which is a kind of B tree.

3. B tree, B+ tree, and B* tree

B tree introduction

B tree B tree B tree B-tree B-tree B-tree B-tree B-tree It’s a b-tree. When we learn about MySQL, we often hear that some type of index is based on B tree or B+ tree. B tree is shown below

Note to the above image:

  • Order of B tree: the maximum number of nodes. Let’s say that 2 minus 3 trees have order 3, 2 minus 3 minus 4 trees have order 4
  • B tree search: binary search is carried out for the keyword sequence in the node starting from the root node. If the search result is matched, the search ends; otherwise, the son node whose search keyword belongs to the range is entered; Repeat until all sons are grounded where Pointers are null or already leaf nodes.
  • Keyword sets are distributed in the whole tree, that is, both leaf nodes and non-leaf nodes store data.
  • The search may end at a non-leaf node.
  • Search performance is equivalent to a binary search in the keyword set.

Introduction to B+ trees

The B+ tree is a variant of the B tree and a multipath search tree.

Illustration of the above figure:

  • The search of B+ tree is basically the same as that of B tree, but the difference is that B+ tree only hits the leaf node (B tree can hit the non-leaf node), and its performance is equivalent to a binary search in the whole set of keywords.
  • All keywords appear in the linked list of leaf nodes (i.e. data can only be in the leaf node (also called dense index)), and the keywords in the linked list happen to be ordered.
  • Impossible to hit on a non-leaf node
  • Non-leaf nodes are equivalent to all (sparse indexes) of leaf nodes, and leaf nodes are equivalent to data layers that store (keyword) data.
  • B+ trees are more suitable for file indexing systems
  • B trees and B+ trees have their own application scenarios. It is not true that B+ trees are better than B trees and vice versa.

B * tree

The B* tree is a variant of the B+ tree that adds Pointers to siblings at non-root and non-leaf nodes.