⚠️ Sorting, I think, is not unfamiliar to any programmer, and it comes up constantly in daily development. Most of us just call the language’s built-in sort API, perhaps without being familiar with the logic behind it or with the sorting algorithms themselves. After reading this article, you should have a clearer picture of your own.

Common sorting algorithms

Here is an article worth a careful read if you are interested.

Ten Classic Sorting algorithms (GIF demo)

Take a look at the picture 👇 for a quick summary.



At first glance there are a lot of sorting algorithms. How do you remember them all, and how do you learn them?

In fact, every algorithm has its own characteristics and its own logic for handling data.



Go from the shallow to the deep: first understand the simple, common ones, and then move on to the more complex ones.

Here is my own, admittedly subjective, suggested learning order:

  1. Bubble sort
  2. Insertion sort
  3. Selection sort
  4. Merge sort
  5. Quick sort
  6. Bucket sort
  7. Counting sort
  8. Radix sort
  9. Heap sort
  10. Shell sort (Hill sort)

Recognize and analyze a sorting algorithm

There are many kinds of sorting algorithms, and we can look at them from multiple angles to find their commonalities and characteristics. That way the knowledge sticks better in later study. There are four questions to ask about a sorting algorithm:

  • What is the execution efficiency of this algorithm?
  • What is the memory consumption of this algorithm?
  • Is this algorithm stable?
  • What is the approximate execution logic of this algorithm?

Approaching an algorithm with these four questions leads to a deeper understanding of it. Let’s explain each of the four directions.

1. Execution efficiency

Time complexity analysis (best, worst, average)

It is fairly objective to consider the execution efficiency of an algorithm from these three cases. In most situations, the most direct way to compare algorithms is to use the average time complexity as a rough judge of efficiency. But the data to be sorted can come in many shapes: nearly ordered, completely unordered, and so on, and these differences can make a big difference to an algorithm’s efficiency. So when considering an algorithm, the key is to know its average time complexity, and secondly to know and care about its best and worst cases.

Coefficients, constants, and lower-order terms of time complexity

In the earlier study of complexity analysis, we concluded that coefficients, constants and lower-order terms can be ignored, because what we analyze is the trend of execution time as the data size n grows, i.e. the large-n case. But in actual work or software applications, we often sort small data sets directly; handling a few hundred or a few thousand items is common. So these coefficients, constants and lower-order terms do matter. We don’t have to be strict about it, but it is best to keep them in mind.

Number of comparisons and moves

Sorting algorithms are built on comparisons, swaps, or moves; only by making these logical judgments and data operations can we produce an ordered list. Different algorithms perform different numbers of these operations, and that can also be part of our understanding of them.

2. Memory consumption

That is the space complexity of the algorithm: how much extra space do we have to allocate, beyond the space the data already occupies, in order to sort it? In sorting algorithms you will often hear the term “sort in place”; as the name implies, it needs only a constant amount of extra space, i.e. space complexity O(1). Bubble, insertion, and selection sort are all in-place sorts.

3. Stability

This is a critical point that must be considered, especially in business development. How do we understand stability? A sort is stable if elements that compare equal keep their relative order after sorting.

[{age: 10, name: 'aa'},
 {age: 13, name: 'dd'},
 {age: 10, name: 'bb'},
 {age: 15, name: 'cc'},
...]

 [{age: 10, name: 'aa'},
 {age: 10, name: 'bb'},
 {age: 13, name: 'dd'},
 {age: 15, name: 'cc'},
...]

In the example above we sort the array by age, and after sorting we want the relative position of entries with the same age to stay the same, so that the result behaves as expected in business logic. See the small check below.
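As a concrete check of this idea, here is a minimal snippet using the example data above. It relies on the fact that Array.prototype.sort is specified to be stable in modern JavaScript engines (ES2019 and later); the variable names are just for illustration.

// Sort by age with the built-in (stable) sort: 'aa' stays ahead of 'bb'.
const people = [
  { age: 10, name: 'aa' },
  { age: 13, name: 'dd' },
  { age: 10, name: 'bb' },
  { age: 15, name: 'cc' }
]
people.sort((a, b) => a.age - b.age)
console.log(people.map(p => p.name)) // ['aa', 'bb', 'dd', 'cc']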

4. Implementation logic

The implementation logic of the various algorithms is also something we need to understand. Even if we cannot write them out from memory, we should at least understand the principle; otherwise it is hard to claim we have really looked at sorting. The key logic of how a sorting algorithm compares and swaps is essential.


In a hurry? Let’s look at the conclusion.

If you just want the most intuitive sense of the sorting algorithms, fast-forward and look at the summary table 👇.




Bubble sort

Logical analysis

Compare the data to be sorted in pairs and swap positions so that the larger of the two moves toward the rear. After the first pass of bubbling, the largest element ends up at the end. Repeat this n times and all the data is sorted. Take [a, b, c, d, e]:

  • a and b are compared; assuming a is greater than b, they switch places: [b, a, c, d, e]
  • a and c are compared; assuming a > c, they switch positions: [b, c, a, d, e]
  • …
  • Assuming a is the largest element in the array, a will be at the end after the first round: [b, c, d, e, a].
  • Do that four more times, and the array is sorted.

Code implementation

function sort (arr) {
  for (let i = 0; i < arr.length - 1; i++) {
    let s = false
    for (let k = 0; k < arr.length - i; k++) {
      if (k + 1 < arr.length - i && arr[k] > arr[k + 1]) {
        [arr[k], arr[k + 1]] = [arr[k + 1], arr[k]]
        s = true
      }
    }
    if (!s) break
  }
}

In this code, each pass puts the current largest number into its final position at the back, so each subsequent pass has one fewer element to look at. And if no swap is performed during a pass, that proves the data is already in order, so we can exit the loop early and return the result.

Time complexity, stability, memory consumption

  • Time complexity

    Best case: on the first pass no swap happens, which proves the array is already ordered. Time complexity O(n).

    Worst case: the array is completely reversed, so we need n passes, and the number of swaps per pass goes from n-1 down to 1 (the first pass swaps n-1 times, the second n-2, ...), for a total of (n - 1 + 1) * (n - 1) / 2. Simplified, the time complexity is O(n²).

    On average, assuming some numbers are already in order, there are roughly half as many swaps as in the worst case, about n*(n-1)/4. The time complexity is still O(n²).
  • Stability

    If we choose not to swap when two elements are equal, the original order of equal data is never disturbed. So bubble sort can be a stable sort.
  • Memory consumption

    In the JS code we can see that only one extra variable is used, to record whether a pass performed any swap. The swapping itself happens on the original array, so this is an in-place sort. The space complexity is O(1).

Insertion sort

Logical analysis

The implementation logic of insertion sort matches its name: insert the element currently being traversed into its proper place. The process can be understood as splitting the array into two intervals: the sorted interval and the unsorted interval. On each iteration, one element is taken from the unsorted interval and inserted into the sorted interval. Using the same sequence [a, b, c, d, e]:

  • Step one: we visit the first element, a. At this point [a] is the sorted interval (it has only one element), and [b, c, d, e] is the unsorted interval.
  • Step two: visit b; if b is less than a, insert b before a and the sorted interval becomes [b, a].
  • Step three: visit c; if c > b and c < a, then c is inserted and the sorted interval becomes [b, c, a].
  • …
  • When all elements have been visited, the sorted interval is the returned result.

With arrays, we know that inserting and deleting data involves moving data around. Of course, in JS we do not have to manage the underlying memory ourselves, but in this sort we handle the element moves explicitly, so we get a deeper feel for the logic of insertion sort.

Code implementation

function insert (arr) {
  for (let i = 1; i < arr.length; i++) {
    // If it is smaller than the previous element, it needs to be inserted
    if (arr[i] < arr[i - 1]) {
      // Save the element to be inserted into the sorted interval
      let theI = arr[i]
      for (let k = i - 1; k >= 0; k--) {
        if (theI >= arr[k]) {
          // The element to be inserted is not smaller than the current element: place it right after
          arr[k + 1] = theI
          break
        } else {
          // Data move: the current element is larger than the element to be inserted, shift it one position back to make space
          arr[k + 1] = arr[k]
          if (k === 0) arr[k] = theI
        }
      }
    }
  }
}

Time complexity, stability, memory consumption

  • Time complexity

    In the best case, the array is already ordered and the traversal needs no insertions. The time complexity is O(n).

    In the worst case, the array is completely reversed, and insertions and data moves start from the second element: the second element moves once, the third twice, and the last element moves n-1 times. Summed up that is n*(n-1)/2, so the time complexity is O(n²).

    On average, roughly half of the data has to be moved for each insertion, so the time complexity is also O(n²).
  • Stability: in the insertion operation we only move data when the element is smaller than an element in the sorted interval. When the data is greater than or equal, we insert it directly behind, which maintains the relative order of equal data. So it is a stable sort.
  • Memory consumption: we move data around, but only one extra variable is used for the element being inserted, so it is an in-place sort. Its space complexity is O(1).

Shell sort (Hill sort)

Shell sort is really another take on insertion sort; it is more efficient than plain insertion sort, but it gives up stability in exchange. Let’s get to know it.

Logical analysis

Shell sort partitions the array by a “step size” (gap) and insertion-sorts each of the resulting subsets. After the first pass, the step size is reduced according to some formula, the array is re-partitioned, and the subsets are insertion-sorted again. This repeats until the step size is 1, at which point the array is no longer partitioned and one final insertion sort produces the result.

Why go round and round partitioning the array like this? The purpose is to make the array more and more ordered, so that the efficiency of that last insertion sort is greatly improved. The key to understanding the “step size” is that how it changes directly affects the sorting efficiency.

**[a, b, c, d, e]** Assuming the initial step size is 2, the original data is divided so that indices 2 apart fall into the same subset: [a, c, e] (indices 0, 2, 4) and [b, d] (indices 1, 3). Insertion-sort these two subsets separately. Then change the step size to 1, so the subset becomes all the elements, and a final insertion sort over everything gives the result.

Admittedly it is not the easiest thing to grasp; read a few different articles and implementations (search Baidu or Juejin) until it becomes your own way of thinking.

Code implementation

function shell (arr) {
  let step = Math.floor(arr.length / 2)
  while (step > 0) {
    for (let i = 0; i < arr.length; i++) {
      // If the element is smaller than the one a step before it, it needs to be inserted
      if (i - step >= 0 && arr[i] < arr[i - step]) {
        // Element to be inserted
        let theI = arr[i]
        for (let k = i - step; k >= 0; k -= step) {
          // Insert
          if (theI >= arr[k]) {
            arr[k + step] = theI
            break
          } else {
            // Data move, the current element is larger than the element to be inserted, move back one step to make space
            arr[k + step] = arr[k]
            if (k - step <= 0) arr[k] = theI
          }
        }
      }
    }
    // Step size decreases
    step = Math.floor(step / 2)
  }
}

Time complexity, stability, memory consumption

  • At best, the time complexity is the same as insertion sort: on an ordered array, Shell sort just runs a few extra passes over the data, so the time is O(n). In the worst case the array is in reverse order and, with the halving step sequence used here, the time complexity is O(n²). The average case is, to be honest, extremely hard to analyze; it depends on the logic of the step-size changes, and for common sequences it is usually quoted as roughly O(n^1.3). You might want to look at the encyclopedia entry.
  • Stability: the moment the array is divided into different subsets by step size, stability is already lost. Insertion-sorting a subset only keeps that subset stable for the moment; as the step size changes, subsets intersect with each other, so stable sorting cannot be guaranteed.
  • Memory consumption

    Looking at the code, it sorts in place; only one extra variable is used for the step size. So the space complexity is O(1).

Finally, let’s look at the encyclopedia to deepen the impression.

Pros and cons

Shell sort does not require a lot of auxiliary space and is about as easy to implement as merge sort. It is an algorithm built on insertion sort with one added idea to improve efficiency. Its time complexity is roughly O(n^(3/2)), and the lower bound of its time complexity is n*log2(n).

Shell sort is not as fast as the O(nlogn) algorithms such as quicksort, so it performs well on medium-sized inputs but is not optimal for very large data. Still, it is much faster than the O(n²) algorithms, and it is very easy to implement: the code is short and simple. In addition, Shell sort’s worst-case performance is not much different from its average performance, whereas quicksort’s worst case is very poor. Some experts advocate starting almost any sorting job with Shell sort and moving to a more advanced algorithm like quicksort only if it proves not fast enough in practice.

In essence, Shell sort is an improvement on direct insertion sort that reduces the number of copies, which makes it much faster. The reason is that when the gap is large, there are very few moves per pass, but the items travel a long distance each time; when the gap shrinks, the number of items moved per pass increases, but by then they are already close to their final positions. It is the combination of these two effects that makes Shell sort so much more efficient than insertion sort.

The performance of the Shell algorithm is closely tied to the chosen gap sequence. The number of key comparisons and object moves can be estimated precisely only for a particular input sequence; finding the relationship between comparison counts, move counts and the choice of increments, and giving a complete mathematical analysis, is still an open mathematical problem.


Selection sort

The logic of selection sort is similar to that of insertion sort: select first, then insert (insertion sort, by contrast, does the selecting implicitly while it inserts). But selection sort rarely shows up in practice. Why? It is an unstable sort, and at the same time complexity as bubble or insertion sort it loses its advantage. (Its running time also does not improve on an already mostly ordered array.)

Logical analysis

During sorting, the original array is divided into two intervals: sorted and unsorted. Each time, the smallest element in the unsorted interval is selected and placed at the end of the sorted interval. Really easy to understand.

Code implementation

// Select sort
function select (arr) {
  let tail = 0
  while (tail <= arr.length - 1) {
    let min = tail
    for (let i = tail; i < arr.length; i++) {
      min = arr[min] < arr[i] ? min : i
    }
    [arr[tail], arr[min]] = [arr[min], arr[tail]]
    tail++
  }
}

Time complexity, stability, memory consumption

  • Time complexity: its logic always scans the whole unsorted interval to find the smallest element to place next, so even if the array itself is already ordered it runs through all the work. The time complexity is O(n²) in all three cases.
  • Stability: it is unstable, because after selecting the minimum it swaps it with the element at the boundary of the sorted interval, which may change the relative positions of equal elements.
  • Memory consumption: from the code, no extra space proportional to the data is requested, so the space complexity is O(1).

Merge sort

Merge sort is an efficient, stable sorting algorithm based on the merge operation. It is a typical application of divide and conquer.

Logical analysis

The main logic is in understanding merge operations. You can easily divide it into three points.

  • Split the array you want to sort in half.
  • Recursively sort each of the two halves; once that is done, both halves are in order.
  • Merge the two already ordered arrays.

The hard part is the middle step: how do you get each half sorted? We just keep splitting until each piece has at most one element, which is trivially ordered. Then we merge pairs back together: the first merge gives ordered runs of two elements, the second merge gives ordered runs of four, and so on until all the elements are merged and the sort is done.

Code implementation

Let’s use pseudocode to understand how to capture the logic described above.

// merge sort
// Pass in the array, start index, end index
function sort(arr,s,e){
  // Array split, take the middle
  let mid = Math.floor((s + e) / 2)
  // Split left and right, get left and right sorted array
  let left = sort(arr, s, mid)
  let right = sort(arr, mid + 1, e)
  // Merge left and right arrays return
  return merge(left,right)
}
function merge(l,r){
  // Merge two ordered arrays
}

First get the pseudocode framework straight; the next time we need a merge-style algorithm we can recall this logical skeleton and just fill in the boundary cases. Take a look at the full code 👇 and note which details were filled in.

function sort (arr, s, e) {
  // The boundary condition
  if (s > e) return []
  if (s === e) return [arr[s]]
  let mid = Math.floor((s + e) / 2)
  let left = sort(arr, s, mid)
  let right = sort(arr, mid + 1, e)
  return merge(left, right)
}
function merge (l, r) {
  // Merge two ordered arrays
  let arr = []
  while (l.length !== 0 && r.length !== 0) {
    if (l[0] <= r[0]) arr.push(l.shift())
    else arr.push(r.shift())
  }
  if (l.length === 0 || r.length === 0) arr = [...arr, ...l.length === 0 ? r : l]
  return arr
}

Time complexity, stability, memory consumption

  • Time complexity

    Assume that T(n) is the time required to merge-sort an array of length n, and that C is the time required for a single element (n = 1).

    Then T(n) is the cost of the split-and-merge operation: the split produces two halves to sort, and merging the left and right arrays costs about n operations, so T(n) = 2 * T(n/2) + n for n > 1.


T(n) = 2*T(n/2) + n
     = 2* (2*T(n/4) + n/2) + n = 4*T(n/4) + 2*n
     = 4* (2*T(n/8) + n/4) + 2*n = 8*T(n/8) + 3*n
     = 8* (2*T(n/16) + n/8) + 3*n = 16*T(n/16) + 4*n
     ......
     = 2^k * T(n/2^k) + k * n
     ......

When n/2^k equals 1 we stop dividing, that is, when k = log2(n). Plugging k into the formula above gives T(n) = C*n + n*log2(n), so the time complexity is O(nlogn). The best, worst and average cases all run through the same logic, so there is no difference between them.

  • Stability: when merging the left and right arrays, if we encounter equal elements we make sure the one from the left array is taken first, so it is a stable sort.
  • Memory consumption: obviously, looking at the code, when merging the left and right arrays we allocate an extra array to hold the merge result, so the space complexity is O(n). If we were working with linked lists, this memory cost could be avoided by merging the two lists in place (see the sketch below).
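As a rough illustration of that linked-list point, here is a minimal sketch of merging two already sorted singly linked lists without allocating a new array. The { value, next } node shape is an assumption made for this example, not something from the article.

// Merge two sorted singly linked lists by re-linking nodes (no extra array).
function mergeLists (a, b) {
  const dummy = { value: null, next: null }
  let tail = dummy
  while (a !== null && b !== null) {
    if (a.value <= b.value) { // <= keeps equal values in their original order (stable)
      tail.next = a
      a = a.next
    } else {
      tail.next = b
      b = b.next
    }
    tail = tail.next
  }
  tail.next = a !== null ? a : b // append whatever is left
  return dummy.next
}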

Quick sort

Quicksort (often just called “quick sort”) is similar to merge sort in that it uses the idea of divide and conquer and recursion to solve the sorting problem.

Logical analysis

How does it work? It picks a pivot (a “distinguishing point”) and compares the elements of the sequence against it, placing the smaller ones on the left and the larger ones on the right. Then it does the same for the left part and the right part, each with its own new pivot, until each part has at most one element left, at which point everything is ordered.

The key lies in the choice of the pivot and in the compare-and-swap operations; quicksort can be implemented as an in-place sort. Sorting efficiency is affected by the pivot choice: it is most efficient when the pivot splits the current sequence into left and right parts of similar length. To keep it simple you can take any element, or use the middle (median) of three or four elements as the pivot.

The implementation of quicksort comes down to two big points:

  • Find the differentiation point, pivot, between the two subscripts
  • According to this distinction point, the elements between two subscripts are divided into left and right subsets

This is repeated until there is only one element left in the left and right subset.

Code implementation

Let’s start by writing pseudocode and looking for a framework

// Pass in the array, start index, end index
function sort(arr,s,e){
	if(s>=e)return 
  // Find the segmentation point and split the array, return the final index of the segmentation point
  let p = partition(arr,s,e)
  sort(arr,s,p-1)
  sort(arr,p+1,e)
}
function partition(arr,s,e){
  // To find the point of differentiation, we can, first of all, pick an arbitrary one
  let pivot = arr[e]
  // Then comes the key place swap, how to place the elements between s and e on the left and right sides
}
Copy the code

We already have the idea of pseudo-code, and the next step is to supplement the entire code and improve the boundary case.

// Pass in the array, start index, end index
function sort (arr, s, e) {
  if (s >= e) return
  // Find the segmentation point and split the array, return the final index of the segmentation point
  let p = partition(arr, s, e)
  sort(arr, s, p - 1)
  sort(arr, p + 1, e)
}
function partition (arr, s, e) {
  // We can take an arbitrary (last element)
  let pivot = arr[e]
  // Then comes the key place swap, how to place the elements between s and e on the left and right sides
  // j marks the boundary of the "smaller than pivot" region: the next slot for a smaller element
  let j = s
  for (let i = s; i < e; i++) {
    if (arr[i] < pivot) {
      [arr[i], arr[j]] = [arr[j], arr[i]]
      j++
    }
  }
  // Swap the pivot element into its final position j
  [arr[j], arr[e]] = [arr[e], arr[j]]
  return j
}

Time complexity, stability, memory consumption

  • Time complexity: quicksort is also solved recursively, so the complexity can be derived from the same kind of recurrence as merge sort. It is O(nlogn) when the pivot splits the sequence into parts of roughly equal length. In the worst case, for example when the chosen pivot happens to be the maximum of the current sequence, every partition produces one part containing everything else and one empty part, so we need roughly n partition passes (the first pass scans n-1 elements, the second n-2, ... the last scans 1), about n/2 elements per pass on average. The time complexity then degrades to O(n²).
  • Stability: during partitioning, the swap operations may break relative order. For example with [1, 4, 3, 6, 4, 2], when we choose 3 as the pivot, the element 4 and the element 2 swap, which changes the relative order of the two 4s. So quicksort is not stable.
  • Memory consumption

In terms of space, although quicksort needs only one element’s worth of auxiliary space for the swaps, it needs stack space to implement the recursion. In the best case each partition splits the sequence into two parts of similar length, and the maximum recursion depth is about log2(n+1); in the worst case the maximum stack depth is n. Overall, the space complexity of quicksort is O(logn).


Heap sort

What is a heap?

Before we understand heap sorting, let’s look at what a heap is. A heap can be understood as a special tree structure. A tree satisfying the following two points can be called a heap.

  1. It’s a complete binary tree.
  2. The value of each node in the heap must be greater than or equal to (or less than or equal to) the values of its children.

What is a complete binary tree?

All the levels are full except possibly the last one, and on the last level the nodes are packed from left to right. The graph 👇 is a complete binary tree.

The red number indicates the order of the heap nodes.



Now that we know what a heap is, how do we store one?

Since it is a complete binary tree, it is best stored in an array, so our heap will be backed by an array.

[0, 10, 9, 8, 7, 6, 5, 4, 3]

Why is the first element of the array 0? It is simply a placeholder so that array indices line up with the node numbering of the heap: the first node, 10, is the array element at index 1, and the second node, 9, is the element at index 2.



And through binary trees and arrays we can get the following conclusions:

If the serial number (index) of a parent node is i,

then the serial numbers of its child nodes are 2*i and 2*i + 1 ⚠️⚠️


And this is particularly critical, because it’s the key to how we build the heap and sort the heap.
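To make the index relationship concrete, here is a tiny sketch using the array above; the helper names parent/leftChild/rightChild are just illustrative, not part of the article’s code.

// 1-based layout: index 0 is unused so the k-th heap node lives at index k.
const parent = i => Math.floor(i / 2)
const leftChild = i => 2 * i
const rightChild = i => 2 * i + 1

const heap = [0, 10, 9, 8, 7, 6, 5, 4, 3]
console.log(heap[leftChild(1)], heap[rightChild(1)]) // 9 8 -> the children of the root (10)
console.log(heap[parent(5)]) // 9 -> the parent of the node at index 5 (value 6)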

How to implement a heap?

With the data from the original array, we can build a heap.

  • Heap insert operation
  • Heap top delete operation

1. Heap insert operation

During each insertion we must maintain the two properties of the heap: it remains a complete binary tree, and each node is greater than or equal to (or less than or equal to) its two children. So every time we insert a node, we check its size relationship with its parent based on its index; if it fits, nothing needs to be done, otherwise we swap them. Suppose we want to maintain a big-top (max) heap.

Every time we insert a piece of data, we keep the heap property by repeatedly checking the node against its parent. The 👇 code is what we call bottom-up maintenance (judging the node’s relationship with its parent).

class Heap {
  constructor(num) {
    // Maximum number of elements in the heap
    this.max = num
    // Array stores heap data
    this.arr = [0]
    this.count = 0
  }
  insert (el) {
    if (this.max <= this.count) return this.arr
    this.count++
    this.arr[this.count] = el
    let i = this.count
    while (true) {
      // Determine the size relationship between the current node and the parent node
      if (i / 2 >= 1) {
        if (i % 2 === 0 && this.arr[i] > this.arr[i / 2]) {
          [this.arr[i], this.arr[i / 2]] = [this.arr[i / 2], this.arr[i]]
          i = i / 2
        } else if (i % 2 === 1 && this.arr[i] > this.arr[(i - 1) / 2]) {
          [this.arr[i], this.arr[(i - 1) / 2]] = [this.arr[(i - 1) / 2], this.arr[i]]
          i = (i - 1) / 2
        } else break
      } else break
    }
    return this.arr
  }
}

2. Heap-top delete operation

The top of the heap must be the largest or the smallest element. Say we are maintaining a big-top heap: the top element must be the largest. When we remove the heap-top element, we need to move the second largest element to the top, and that element must be one of the children; and so on down the tree, maintaining the heap from top to bottom. A trick we can use here: when deleting the top, take the last element of the heap, put it at the top, and then start repairing the parent-child relationships from there. This way no hole is left in the heap. 👇 Take a look at the code for what we call top-down maintenance (judging the node’s relationship with its children).

function removeTop (arr) {
  if (arr.length === 2) return [0]
  // Delete the heap top, replace it with tail data, start maintenance from the heap top
  arr[1] = arr.pop()
  heapify(arr)
  console.log(arr)
}
function heapify (arr) {
  let len = arr.length - 1
  let i = 1
  while (true) {
    let change = i
    if (i * 2 <= len && arr[i] < arr[i * 2]) change = i * 2
    if (i * 2 + 1 <= len && arr[change] < arr[i * 2 + 1]) change = i * 2 + 1
    // The parent node satisfies the relationship between the two children
    if (change === i) break
    [arr[i], arr[change]] = [arr[change], arr[i]]
    i = change
  }
}

In fact, both of these operations are the same process of maintaining the heap; only the direction differs.

3. Build the heap

Think of it as a pyramid that we repair layer by layer, always making sure the part already processed satisfies the heap requirement. There are two ways to do it:

Either start at the top of the heap and walk toward the back of the array, maintaining each node bottom-up (sifting it up, as if it were being inserted).

Or start with the last parent node and walk toward the front, maintaining each node top-down (sifting it down against its children).



Given the raw data in an array, we can heapify it in place, and there is a trick here.



As shown, assume 8 nodes are taken from the original data. The only nodes we need to heapify are the red ones: here we heapify from back to front, maintaining each node top-down (judging its relationship with its children). With this approach the leaf nodes never need to be processed, because they have no children.

So we can start at index n/2 (or (n-1)/2) and walk toward the front, heapifying node by node.

See 👇 code ~ ~

function build (arr) {
  let len = arr.length - 1
  let start = len % 2 === 0 ? len / 2 : (len - 1) / 2
  for (let i = start; i >= 1; i--) {
    heapify(arr, i, len)
  }
}
function heapify (arr, index, end) {
  // Heap, top down
  let len = end
  let i = index
  while (true) {
    let change = i
    if (i * 2 <= len && arr[i] < arr[i * 2]) change = i * 2
    if (i * 2 + 1 <= len && arr[change] < arr[i * 2 + 1]) change = i * 2 + 1
    // The parent node satisfies the relationship between the two children
    if (change === i) break
    [arr[i], arr[change]] = [arr[change], arr[i]]
    i = change
  }
}

Ok, so this gives you a general idea of how to build and maintain a heap. How do we sort with the heap?

Sorting

Once a heap is maintained, one thing we can always guarantee is that the top element is the maximum (or minimum). Using this, we swap the top element with the last element so that the maximum ends up at index n, re-heapify, swap the top with the element at n-1, and so on, over and over, from the last position down to the first. That produces an ascending arrangement. Reusing the logic of the heap-top delete operation, let’s complete the whole sorting code.

// Delete the top element of the heap
const heap = [0, 6, 4, 2, 5, 3, 1, 15, 8]
function removeTop (arr, end) {
  // Delete the heap top, replace it with tail data, start maintenance from the heap top
  [arr[1], arr[end]] = [arr[end], arr[1]]
  heapify(arr, 1, end - 1)
}
function heapify (arr, index, end) {
  // Heap, top down
  let len = end
  let i = index
  while (true) {
    let change = i
    if (i * 2 <= len && arr[i] < arr[i * 2]) change = i * 2
    if (i * 2 + 1 <= len && arr[change] < arr[i * 2 + 1]) change = i * 2 + 1
    // The parent node satisfies the relationship between the two children
    if (change === i) break
    [arr[i], arr[change]] = [arr[change], arr[i]]
    i = change
  }
}
// Heap sort
function heapSort (arr) {
  // Build the heap, starting with the last parent node and working up and down, layer by layer
  let len = arr.length - 1
  let start = len % 2 === 0 ? len / 2 : (len - 1) / 2
  for (let i = start; i >= 1; i--) {
    heapify(arr, i, len)
  }
  // Replace the top element of the heap and heap again
  for (let i = arr.length - 1; i >= 1; i--) {
    removeTop(arr, i)
  }
  return arr
}
heapSort(heap)

Analysis

Time complexity

The heap sort process can be divided into two steps, both of which involve heapifying nodes. We mainly consider how many times each node gets heapified, to get a feel for the time cost.

  • Building the heap
  • The sorting

The heapification time for a single node is roughly O(logn). Building the heap touches about n/2 - 1 nodes, so a rough bound is the product of the two; in fact the heapify cost of each node is proportional to the node’s height, and summing that more carefully (the details are skipped here) shows that the heap-building time is O(n).

Now the sorting phase: it traverses the whole array using the delete-heap-top operation, and each heapification from the top costs O(logn), so sorting takes O(n*logn).

Heap sort as a whole is therefore O(nlogn) + O(n) ≈ O(nlogn).

Memory consumption

As can be seen from the code, the whole process basically operates in place on the array. So the space complexity is O(1).

Stability

The relative order of equal elements is broken during heapification and during the swaps of the sorting phase. So heap sort is unstable.


Bucket sort

The significance of bucket sort is that, under the right circumstances, the time complexity of sorting can be optimized to O(n). How so?

Logical analysis

The main idea is to divide the data into several “ordered” buckets: the buckets themselves have a natural order between them. Once the data is in the buckets, we sort the data inside each bucket with quicksort, merge sort, heap sort, or whatever we like, and then pour the buckets out one by one to form the final ordered data.

How does this optimize the time complexity to O(n)? Say there are n items and m buckets. In the good case the data is evenly distributed, with k = n/m elements per bucket. Sorting one bucket costs O(k*logk), and for m buckets the total is O(m*k*logk) = O(n*logk). When the number of buckets m gets close to the total n, k approaches 1 and logk becomes a tiny constant, so we can view it as optimized to O(n).

Of course, the performance is good, but the preconditions are very important and directly decide its efficiency (a small sketch follows the list below):

  • The ability to divide data naturally into several ordered buckets.
  • The more evenly the data is distributed in the bucket, the better.
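Here is a minimal bucket sort sketch under the assumptions above: the values are plain numbers, and the number of buckets (bucketCount, a parameter name chosen just for this example) is a tuning knob; real data would need a bucketing rule that fits its distribution.

function bucketSort (arr, bucketCount = 10) {
  if (arr.length === 0) return arr
  const min = Math.min(...arr)
  const max = Math.max(...arr)
  const size = (max - min) / bucketCount || 1 // value range covered by one bucket
  const buckets = Array.from({ length: bucketCount }, () => [])
  for (const v of arr) {
    // Buckets are naturally ordered by value range
    const idx = Math.min(Math.floor((v - min) / size), bucketCount - 1)
    buckets[idx].push(v)
  }
  // Sort inside each bucket (any sort works; the built-in one is used here), then concatenate
  return buckets.flatMap(bucket => bucket.sort((x, y) => x - y))
}

console.log(bucketSort([29, 3, 57, 12, 8, 91, 45])) // [3, 8, 12, 29, 45, 57, 91]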

There is another application scenario for bucket sort: external sorting. For example, a large amount of data is stored on an external disk and memory is limited, so the data cannot all be loaded into memory for sorting at once.


Counting sort

Counting sort is a lot like bucket sort, except the buckets can be divided even more precisely. Take ranking college entrance exam scores: the scores are guaranteed to fall between 0 and 750 (in some regions). We can therefore create 751 buckets (a two-dimensional array of length 751), iterate over the candidates’ scores, drop each one directly into the corresponding bucket, and finally expand all the buckets in order to get the sorted result. There are plenty of counting-sort tricks beyond this; the more solutions you see, the better you will understand it. Of course, there are also some limitations (a sketch follows the list):

  • The number of buckets should be a manageable, known constant; if it had to be larger than the total amount of data, counting sort would be pointless.
  • The sorted elements must map naturally to an index in the array; for example, a score of 1 is inserted into the bucket at index 1.
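Here is a minimal counting-sort sketch for the exam-score example above: scores are integers in the fixed range 0..750, so the score itself serves as the bucket index (maxScore is a parameter added just for this sketch).

function countingSort (scores, maxScore = 750) {
  const buckets = Array.from({ length: maxScore + 1 }, () => [])
  for (const s of scores) buckets[s].push(s) // drop each score straight into its bucket
  return buckets.flat()                      // expand the buckets in order
}

console.log(countingSort([680, 512, 512, 750, 0, 333])) // [0, 333, 512, 512, 680, 750]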

Radix sort

Let’s look at radix sort through a question. Suppose we have 10,000 phone numbers and need to sort them by size. Quicksort and merge sort get us to O(nlogn) right away; what about bucket sort or counting sort? An 11-digit phone number is a huge value, so buckets and counts are hard to divide directly. Another way to think about it is to sort by radix, which can be optimized to O(n).

A phone number can be split into 11 digits, and we can sort it digit by digit. For example, if the second digit of phone a is 3 and the second digit of phone b is 8, we do not even need to look at the digits that follow. For the overall sorting logic we use a stable sorting algorithm on each digit, so that after 11 passes the whole set is ordered. For each pass we can use bucket sort with 10 buckets, 0 to 9: all the phone numbers are grouped and re-merged by that single digit, and so on for all 11 passes. Each bucket-sort pass is O(n), and sorting 11 times gives a time complexity of O(11*n), so the result is quite good.

Notice three things here (a sketch follows the list):

  • The data must be suitable for splitting into “digits”, and the digits must have a natural high-to-low progressive relationship.
  • When sorting on each digit, bucket sort or counting sort should be applicable, so that each per-digit pass costs O(n).
  • The number of digits to sort by must be a constant.
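And a minimal radix-sort sketch for the phone-number idea: treat each number as an 11-character string and run a stable 10-bucket pass on each digit, going from the last digit to the first (the usual least-significant-digit order, so that earlier passes are preserved by later ones). The function name and sample numbers are made up for illustration.

function radixSortPhones (phones) {
  let arr = phones
  for (let pos = 10; pos >= 0; pos--) { // 11 digits, least significant first
    const buckets = Array.from({ length: 10 }, () => [])
    for (const p of arr) buckets[Number(p[pos])].push(p) // stable: keep arrival order inside each bucket
    arr = buckets.flat()
  }
  return arr
}

console.log(radixSortPhones(['13899990000', '13100005555', '13899980000']))
// ['13100005555', '13899980000', '13899990000']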

Preview of the next article ~ ~ ~!

The next article will summarize some real problems encountered on LeetCode and put sorting into actual practice!!