
Today we'll talk about another special kind of tree: the heap. The heap data structure has many uses, the most classic of which is heap sort. Heap sort is an in-place sorting algorithm with O(nlogn) time complexity.

We learned earlier that quicksort's average time complexity is O(nlogn). Both sorting algorithms are O(nlogn), and heap sort's time complexity is even more stable than quicksort's (it stays O(nlogn) in the worst case), yet in actual software development quicksort performs better than heap sort. Why?

You may not be able to answer that yet, and the question itself may even seem a little confusing. That's okay. Keep it in mind as we go through today's lesson; you should be able to answer it by the end.

How to understand “heap”?

As we mentioned earlier, a heap is a special kind of tree. Now let's see what kind of tree a heap is. I've listed two requirements; as long as both are met, it's a heap.

  • The heap is a complete binary tree;
  • The value of each node in the heap must be greater than or equal to (or less than or equal to) the value of every node in its subtree.

Let me explain these two points separately.

First, the heap must be a complete binary tree. Remember our definition of a complete binary tree? It requires that every level be completely full except possibly the last, and that the nodes of the last level be packed to the left.

Second, the value of each node in the heap must be greater than or equal to (or less than or equal to) the value of every node in its subtree. We can also put it another way: the value of each node is greater than or equal to (or less than or equal to) the values of its left and right children. The two statements are equivalent.

A heap in which every node's value is greater than or equal to the values of the nodes in its subtree is called a "big-top heap" (a max-heap). A heap in which every node's value is less than or equal to the values of the nodes in its subtree is called a "small-top heap" (a min-heap).

Now that the definition is clear, let's check: are the following binary trees heaps?

The first and second are big-top heaps, the third is a small-top heap, and the fourth is not a heap. In addition, as you can see from the figure, we can build many different heaps from the same set of data.

How do you implement a heap?

To implement a heap, we need to know what operations the heap supports and how to store a heap.

As I said before, a complete binary tree is well suited to array storage. Storing a complete binary tree in an array is very space-efficient: we don't need to store pointers to the left and right child nodes, because we can locate a node's children and parent purely from its array index.

I’ve drawn an example of using an array to store a heap, so you can look at it first.

As can be seen from the figure, the left child of the node at index i is the node at index i∗2, the right child is the node at index i∗2+1, and the parent is the node at index i/2 (integer division).
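To make the index arithmetic concrete, here is a minimal sketch in code (the helper names are mine, not part of the heap code below):

// Index math for a heap stored in an array starting at index 1.
private static int leftChild(int i)  { return i * 2; }
private static int rightChild(int i) { return i * 2 + 1; }
private static int parent(int i)     { return i / 2; } // integer division rounds down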

Now that we know how to store a heap, what operations does a heap support? I've listed the core ones: inserting an element into the heap, and deleting the heap-top element. (Unless otherwise specified, I'll use a big-top heap in the examples below.)

1. Insert an element into the heap

After we insert an element into the heap, the heap must continue to satisfy its two characteristics.

If we simply put the newly inserted element at the end of the heap, as in the figure I've drawn here, it may no longer satisfy the heap's characteristics. We then need to adjust it so that it satisfies them again, a process we call heapification.

There are actually two kinds of heapification: bottom-up and top-down. I'll start with the bottom-up method.

Heapification is very simple: we just follow a path of nodes, going up or down, comparing and swapping along the way.

I've drawn a step-by-step breakdown of the heapification process here. We compare the newly inserted node with its parent. If the relationship child ≤ parent does not hold, we swap the two nodes. We repeat this process until the parent-child size relationship is satisfied.

I’ve translated what I said above about inserting data into the heap into code, so you can look at it together.

public class Heap {
    private int[] a;   // array; data is stored starting at index 1
    private int n;     // maximum number of elements the heap can hold
    private int count; // number of elements currently in the heap

    public Heap(int capacity) {
        a = new int[capacity + 1];
        n = capacity;
        count = 0;
    }

    public void insert(int data) {
        if (count >= n) return; // the heap is full
        ++count;
        a[count] = data;
        int i = count;
        while (i / 2 > 0 && a[i] > a[i / 2]) { // heapify from the bottom up
            swap(a, i, i / 2); // exchange the elements at indices i and i/2
            i = i / 2;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
    }
}
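As a quick illustration, here is how the class might be used (the capacity and the values are arbitrary choices of mine):

Heap heap = new Heap(16);
for (int d : new int[]{33, 27, 21, 16, 13, 15, 19}) {
    heap.insert(d); // each insert keeps a[1..count] a valid big-top heap
}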

2. Delete the heap-top element

From the second point of the heap definition — every node is greater than or equal to (or less than or equal to) the values of the nodes in its subtree — we can see that the heap-top element stores the maximum (or minimum) value in the heap.

Let's say we're working with a big-top heap, so the heap-top element is the largest. After we remove the top element, we need to put the second-largest element at the top, and that element must be one of the top node's two children. We then iteratively delete the second-largest node in the same way, and so on, until a leaf node is deleted.

I've drawn a step-by-step breakdown here as well. The problem with this approach, however, is that the resulting heap no longer satisfies the properties of a complete binary tree.

In fact, we can solve this problem with a slight change of approach. Look at the figure I drew below. We move the last node to the top of the heap and then use the same parent-child comparison method: wherever the parent-child size relationship is violated, we swap the two nodes, and we repeat the process until the relationship holds everywhere. This is top-down heapification.

Because we remove the last element of the array, and the heapification process consists only of swap operations, no "holes" appear in the array, so the result of this heapification method is guaranteed to satisfy the characteristics of a complete binary tree.

I have also translated the deletion process above into code and posted it here, so you can study the two together.

public void removeMax() {
    if (count == 0) return; // the heap is empty; nothing to delete
    a[1] = a[count];
    --count;
    heapify(a, count, 1);
}

private void heapify(int[] a, int n, int i) { // heapify from the top down
    while (true) {
        int maxPos = i;
        if (i * 2 <= n && a[i] < a[i * 2]) maxPos = i * 2;
        if (i * 2 + 1 <= n && a[maxPos] < a[i * 2 + 1]) maxPos = i * 2 + 1;
        if (maxPos == i) break;
        swap(a, i, maxPos);
        i = maxPos;
    }
}

We know that for a complete binary tree with n nodes, the height is no more than log2^n. Heapification compares and swaps along a path between a node and the root (or a leaf), so its time complexity is proportional to the height of the tree, which is O(logn).

The primary logic of both inserting data and removing the heap-top element is heapification, so the time complexity of inserting an element into the heap and of deleting the heap-top element is O(logn).

How do I sort based on the heap?

We've already covered several sorting algorithms. Let's recall: bubble sort, insertion sort, and selection sort, which are O(n²); merge sort and quicksort, which are O(nlogn); and the linear-time sorts.

The sorting algorithm we implement here with the help of the heap data structure is called heap sort. Its time complexity is very stable at O(nlogn), and it is an in-place sorting algorithm. That's remarkably good. How does it achieve this?

We can roughly break down the heap sort process into two large steps, build heap and sort.

1. Build the heap

We start by building the array into a heap in place. “In place” means operating on the original array without resorting to another array. There are two ways to build a heap.

The first uses the idea of inserting an element into the heap that we discussed earlier. Although the array contains n items of data, we can pretend that at first the heap contains only one element: the one at index 1. We then call the insert operation described earlier on the elements at indices 2 through n, inserting them into the heap one by one. This organizes an array of n elements into a heap. A sketch of this first approach is shown right below.
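The lecture only shows code for the second approach further down, but the first is a short loop over the same bottom-up heapification. A sketch, assuming the swap() helper used elsewhere in this lesson (the method name is mine):

// Build a big-top heap by "inserting" a[2..n] one at a time (first approach).
private static void buildHeapByInsertion(int[] a, int n) {
    for (int k = 2; k <= n; ++k) {
        int i = k;
        while (i / 2 > 0 && a[i] > a[i / 2]) { // bottom-up heapification
            swap(a, i, i / 2);
            i = i / 2;
        }
    }
}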

The second approach, which I'll explain in detail here, works in the opposite direction. The first approach processes the array from front to back, and each element is heapified from the bottom up as it is inserted. The second approach processes the array from back to front, and each element is heapified from the top down.

I've given an example and drawn a step-by-step diagram of the heap-building process for the second implementation; take a look. Since heapifying a leaf node downward would only compare it with itself, we start heapifying directly from the last non-leaf node.

It might be easier for programmers to understand this from code, so I've translated the second implementation idea into code; take a look.

private static void buildHeap(int[] a, int n) {
    for (int i = n / 2; i >= 1; --i) {
        heapify(a, n, i);
    }
}

private static void heapify(int[] a, int n, int i) { // heapify from the top down
    while (true) {
        int maxPos = i;
        if (i * 2 <= n && a[i] < a[i * 2]) maxPos = i * 2;
        if (i * 2 + 1 <= n && a[maxPos] < a[i * 2 + 1]) maxPos = i * 2 + 1;
        if (maxPos == i) break;
        swap(a, i, maxPos);
        i = maxPos;
    }
}

private static void swap(int[] a, int i, int j) { // exchange a[i] and a[j]
    int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
}

As you may have noticed in this code, we heapify the nodes with indices from n/2 down to 1; the nodes with indices n/2+1 through n are leaves, so we don't need to heapify them. In fact, for a complete binary tree, the nodes with indices n/2+1 through n are exactly the leaf nodes (the left child of node n/2+1 would sit at index 2∗(n/2+1) > n, so it has no children).

Now, what is the time complexity of the heap building operation?

Each node's heapification takes O(logn) time, so is the total time to heapify roughly n/2 nodes simply O(nlogn)? That answer is correct, but it isn't tight. In fact, the heap-building process of heap sort takes only O(n) time. Let me walk you through why.

Because leaf nodes don't need to be heapified, the nodes that do start at the second-to-last level. During the heapification of a node, the number of comparisons and swaps is proportional to the height k of that node.

I've plotted the number of nodes at each level and their corresponding heights in the figure; take a look. We just need to sum up the heights of all the nodes to get the heap-building time complexity.

We sum the heights of all the non-leaf nodes, which gives the following formula:
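With 2^k nodes of height h−k at level k (counting the root as level 0, with height h), the sum is:

S1 = 1·h + 2·(h−1) + 4·(h−2) + ⋯ + 2^k·(h−k) + ⋯ + 2^(h−1)·1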

Summing this series directly is a little tricky, but we can use a trick many of us learned in high school: multiply both sides of the equation by 2 to get another formula, S2; then line S2 up against S1 with the terms shifted by one position, and subtract S1 from S2 to get S.

The middle part of S is a geometric series, so it can be computed with the geometric series sum formula; the final result is the one drawn in the figure below.
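Carried out, the shift-and-subtract calculation looks like this:

S2 = 2·S1 = 2·h + 4·(h−1) + 8·(h−2) + ⋯ + 2^h·1

S = S2 − S1 = −h + (2 + 4 + ⋯ + 2^(h−1) + 2^h) = 2^(h+1) − h − 2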

Because h = log2^n, substituting into S gives S = O(n), so the time complexity of building the heap is O(n).

2. Sorting

After the heap is built, the data in the array is organized according to the characteristics of a big-top heap. The first element of the array is the heap top, i.e., the largest element. We swap it with the last element, so the largest element ends up at index n.

This process is similar to the "delete the heap-top element" operation described above. After the top element is removed, we put the element at index n on the top of the heap and then, through heapification, rebuild the remaining n−1 elements into a heap. After heapifying, we again take the heap-top element, place it at index n−1, and repeat the process until only the element at index 1 is left in the heap. The sorting is then complete.

I've translated the heap sort process into code as well; it should be easier to understand once you see it.

// n is the number of elements in array a; data is stored from index 1 to n.
public static void sort(int[] a, int n) {
    buildHeap(a, n);
    int k = n;
    while (k > 1) {
        swap(a, 1, k);
        --k;
        heapify(a, k, 1);
    }
}
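Putting it all together, a minimal driver sketch (the class wrapper and test data are mine; note that index 0 is unused because the code stores data starting from index 1):

import java.util.Arrays;

public class HeapSortDemo { // hypothetical wrapper holding sort, buildHeap, heapify, swap
    public static void main(String[] args) {
        int[] a = {0, 7, 5, 19, 8, 4, 1, 20, 13, 16}; // a[0] is a placeholder
        sort(a, 9); // 9 elements, stored at indices 1..9
        System.out.println(Arrays.toString(a)); // [0, 1, 4, 5, 7, 8, 13, 16, 19, 20]
    }
    // ... sort, buildHeap, heapify, and swap from the code above ...
}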

Now, let’s analyze the time complexity, space complexity, and stability of heap sort.

The whole heap sort process needs only a constant amount of temporary storage, so heap sort is an in-place sorting algorithm. Heap sort consists of two operations, building the heap and sorting: building the heap is O(n) and sorting is O(nlogn), so the overall time complexity of heap sort is O(nlogn).

Heap sort is not a stable sorting algorithm, because during sorting the swap of the heap's last node with the heap-top node can change the original relative order of equal values.

That's all for today, but let me add one note. In this lecture and in the code, I assumed that the heap's data is stored starting from array index 1. If you store it starting from index 0 instead, the approach doesn't change at all; the only thing that changes is the formula for computing the indices of a node's children and parent.

If a node's index is i, the index of its left child is 2∗i+1, the right child is 2∗i+2, and the parent is (i−1)/2.
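As a sketch, the 0-based index math in code (the helper names are mine):

// Index math when the heap is stored starting at array index 0.
private static int leftChild(int i)  { return 2 * i + 1; }
private static int rightChild(int i) { return 2 * i + 2; }
private static int parent(int i)     { return (i - 1) / 2; }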

Answering the opening question

Now let's return to the opening question: why does quicksort perform better than heap sort in real development?

I think there are two main reasons.

First, heap sort's data access pattern is not as cache-friendly as quicksort's.

In quicksort, data is accessed sequentially, while in heap sort it is accessed by jumping around. Take the most important operation in heap sort, heapification: heapifying the heap-top element accesses the array elements at indices 1, 2, 4, 8, ... in turn, rather than in a local, sequential pattern like quicksort's partitioning, so it is not CPU-cache-friendly.

Second, for the same data, heap sort performs more data swaps than quicksort.

When we talked about sorting, we discussed two concepts: the degree of order and the degree of disorder. For comparison-based sorting algorithms, the whole sorting process consists of two basic operations, compare and swap (or move). For quicksort, the number of swaps never exceeds the degree of disorder, i.e. the number of inversions in the data.

However, the first step of heap sort is to build a heap, and building the heap shuffles the original relative order of the data, reducing its degree of order. For example, for a set of data that is already sorted, the data becomes even less ordered after the heap is built.

You can run an experiment yourself on the second point: declare a counter variable, add one to it at every swap in the code, and when the sorting finishes the counter holds the total number of swaps. This gives you an intuitive feel for what I just said: heap sort swaps more than quicksort.
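Here is a sketch of that experiment (the counter and the instrumented swap() are mine):

// Route every exchange through one instrumented helper to count swaps.
private static long swapCount = 0;

private static void swap(int[] a, int i, int j) {
    ++swapCount; // one more data exchange
    int tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
}

// After sort(a, n) returns, swapCount holds heap sort's total; reset it and
// run the same data through quicksort's swap to compare the two totals.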

Content summary

Today we talked about the heap data structure. A heap is a complete binary tree whose most important characteristic is that the value of each node is greater than or equal to (or less than or equal to) the values of its children. Heaps therefore fall into two categories: big-top heaps and small-top heaps.

The two most important operations on a heap are inserting an element and deleting the heap-top element, and both rely on heapification. To insert an element, we put the new element at the end of the array and heapify from the bottom up; to delete the heap-top element, we place the last element of the array at the top of the heap and heapify from the top down. Both operations take O(logn) time.

In addition to that, we talked about a classic application of the heap: heap sort. Heap sort consists of two processes, building the heap and sorting. We heapify the nodes with indices from n/2 down to 1 from the top down, which organizes the data in the array into a heap. Then we repeatedly move the heap-top element to the end of the heap, shrink the heap size by one, and heapify again, repeating the process until only one element is left in the heap, at which point the whole array is sorted.
