Preface

As a self-taught front-end developer without a computer science background, algorithms have long been a mountain blocking my path. I tried grinding LeetCode, but found it took me hours to solve a single problem. Later I read several articles on how to study algorithms, and many made the same point: study the material systematically before doing exercises. Since I work in JavaScript day to day, I picked Data Structures and Algorithms with JavaScript as my introductory book.

After finishing the book, I came away with the same impression as the author of "JS's Sorting Algorithms": it contains many small mistakes, both in the prose and in the code. Still, I have to admit it is well suited for front-end developers getting started with algorithms, because it is concise and fundamental: it explains the principle of each algorithm briefly, without touching on optimizations or the trickier points that interviews probe. So if you need to go deeper, pair it with other books.

This article is my notes from studying the most fundamental category of algorithms: sorting. If you spot any problems, please point them out and I will correct them as soon as possible to avoid misleading anyone.

First, build a simple performance test platform:

// Use a set of functions to hold all sorting algorithms used for testing
let funcs = {
  // Tools: Swap array elements
  toolExch (a, lo, hi) { let temp = a[lo]; a[lo] = a[hi]; a[hi] = temp }
}
// Generate an array with length 10000 and values between 0 and 99 for testing
// Note: The performance of the sorting algorithm is often dependent on the properties of the sorted array
// For example, the number of duplicate data, the distribution of data size, the overall variance of data, etc
// The main direction of this article is to explain the principle of various sorting algorithms
// So generate a set of random numbers directly as test data
let arr = Array.from({ length: 10000 }, v => Math.floor(Math.random() * 100))
// Execute all functions in the collection
for (let key in funcs) {
  // Functions whose names start with "tool" are utility functions and are skipped
  if (key.indexOf('tool') === 0) { continue }
  let temp = Array.from(arr)
  // Use the console functions time and timeEnd to output the code execution time
  console.time(key)
  funcs[key](temp)
  console.timeEnd(key)
}

Bubble sort

Bubble sort is one of the slowest sorting algorithms because of how many element swaps it performs, but it is also the easiest to implement. As it runs, values bubble from one end of the array toward the other. In an ascending sort, for example, each value is compared with its right-hand neighbor and swapped if it is larger, so it keeps bubbling along until it encounters a value larger than itself.

Dynamic graph demonstration

Bubble Sort GIF demo (algorithm visualization source: Visualgo.net/)

Code implementation

let funcs = {
  // Bubble sort
  bubbleSort (arr) {
    // Use a two-level loop to perform sorting
    // Each time the inner loop completes, the outer pointer i moves forward, meaning the data before it is confirmed sorted
    for (let i = 0; i < arr.length - 1; i++) {
      // The inner loop ensures that the smallest data is moved to the left of the array each time
      for (let j = arr.length - 1; j > i; j--) {
        // If the current data value is smaller than the previous one, the two data positions are switched
        // If not, proceed to the next digit
        if (arr[j] < arr[j - 1]) {
          this.toolExch(arr, j - 1, j)
        }
      }
    }
    return arr
  }
}

Selection sort

The principle of selection sort, using ascending order as an example: start at the beginning of the array, compare the first value with every other value, find the smallest, and swap it into the first position; then do the same for the second position with the values after it, and so on. When the comparison reaches the second-to-last position of the array, the whole sort is done.

Like bubble sort, selection sort uses two nested loops, but it is much faster than bubble sort because it swaps data only once per pass.
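This difference in swap counts is easy to verify. The sketch below is an illustration of my own (function names are mine, the loop structure mirrors the implementations in this article) that counts the swaps each algorithm performs on a reversed 10-element array:

```javascript
// Count swaps performed by bubble sort (same loops as the implementation in this article)
function countBubbleSwaps (a) {
  let swaps = 0
  for (let i = 0; i < a.length - 1; i++) {
    for (let j = a.length - 1; j > i; j--) {
      if (a[j] < a[j - 1]) {
        const t = a[j]; a[j] = a[j - 1]; a[j - 1] = t
        swaps++
      }
    }
  }
  return swaps
}

// Count swaps performed by selection sort: exactly one per outer pass
function countSelectionSwaps (a) {
  let swaps = 0
  for (let i = 0; i <= a.length - 2; i++) {
    let index = i
    for (let j = i; j < a.length; j++) {
      if (a[j] < a[index]) index = j
    }
    const t = a[index]; a[index] = a[i]; a[i] = t
    swaps++
  }
  return swaps
}

// On a reversed 10-element array, bubble sort swaps once per inversion (45 swaps)
// while selection sort swaps once per pass (9 swaps)
console.log(countBubbleSwaps([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])) // → 45
console.log(countSelectionSwaps([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])) // → 9
```

Bubble sort's swap count grows with the number of inversions (up to n(n-1)/2), while selection sort never swaps more than n-1 times.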

Dynamic graph demonstration

Selection Sort GIF demo (algorithm visualization source: Visualgo.net/)

Code implementation

let funcs = {
  // Select sort
  selectionSort (arr) {
    // The outer loop maintains a pointer i; each time the inner loop completes a swap, the outer pointer moves forward
    // End the loop once the pointer reaches the second-to-last position, arr.length - 2
    for (let i = 0; i <= arr.length - 2; i++) {
      // index maintains the position of the minimum value in the current inner loop
      let index = i
      // The inner loop looks for the smallest data starting from pointer i
      for (let j = i; j < arr.length; j++) {
        // Update index whenever smaller data is found
        if (arr[j] < arr[index]) index = j
      }
      // Swap the smallest data at index with the current pointer position i
      this.toolExch(arr, index, i)
    }
    return arr
  }
}

Performance comparison

A rough run of the code above (10,000 items) shows that selection sort performs much better than bubble sort.

Insertion sort

Insertion sort also uses two nested loops. Using ascending order as an example: the outer loop maintains a pointer i that moves right starting from the second element. The inner loop maintains a pointer j that moves left from position i; whenever the value to the left of j is larger than the value being inserted, that value is shifted one position to the right, until a value no larger than it is found. The pointer then stops, and the value originally at i is inserted at that position. This guarantees that everything to the left of i is ordered at the end of each inner loop, so the array is sorted once the outer loop finishes.

Dynamic graph demonstration

Insertion Sort GIF demo (algorithm visualization)

Code implementation

let funcs = {
  // Insert sort
  insertionSort (arr) {
    // The outer loop moves to the right
    for (let i = 1; i < arr.length; i++) {
      // Declare the inner loop pointer
      let j = i
      // Record the current data for comparison
      let curr = arr[i]
      // Inner loop to keep current data moving to the left
      // Until a value smaller than the current value is encountered or moved to the left of the array
      while (j > 0 && arr[j - 1] > curr) {
        // Push more data to the right
        arr[j] = arr[j - 1]
        // The pointer moves left
        j--
      }
      // Insert the current data into the correct position, so that the data from 0 to i is ordered
      arr[j] = curr
    }
    return arr
  }
}

Performance comparison

Image from algs4.cs.princeton.edu

The visual trace comparing insertion sort and selection sort in Algorithms (4th Edition) shows that insertion sort performs far fewer comparisons than selection sort, which is why insertion sort performs better.

A rough run of the code above (10,000 items) shows that insertion sort is much faster than selection sort.

Shell sort

The book Data Structures and Algorithms with JavaScript puts Shell sort at the start of its advanced-algorithms section, but Shell sort is really an improvement on insertion sort. It defines a gap sequence and first compares elements that are far apart, so elements far from their final position can get there faster, which reduces the total number of moves. The gap is then shrunk step by step until it reaches 1 and the array is fully ordered.

Robert Sedgewick, co-author of Algorithms (4th Edition), generates the gap sequence in Shell sort dynamically with a formula, and that is the sequence used in the code implementation here. The book presents this as a concise form of Shell sort; in practice, Shell sort's performance is closely tied to the choice of gap sequence.

Comparing data with a gap of 4

Dynamic graph demonstration

Shell sort GIF demo

Code implementation

let funcs = {
  // Shell sort
  shellSort (arr) {
    // Compute the starting (largest) gap of the sequence 1, 4, 13, 40, ...
    let len = arr.length
    let gap = 1
    while (gap < len / 3) {
      gap = gap * 3 + 1
    }
    // Perform a gapped insertion sort for each gap in the sequence
    while (gap >= 1) {
      // Perform insert sort
      for (let i = gap; i < len; i++) {
        let j = i
        let curr = arr[i]
        while (j >= gap && arr[j - gap] > curr) {
          arr[j] = arr[j - gap]
          // Step back by gap instead of 1; this is what makes the insertion sort "gapped"
          j -= gap
        }
        arr[j] = curr
      }
      // Generate the next interval
      gap = (gap - 1) / 3
    }
    return arr
  }
}

Performance comparison

The efficiency of Shell sort depends heavily on the choice of gap sequence. Algorithms (4th Edition) puts it this way: “The performance of the algorithm depends not only on h (the gap), but also on the mathematical properties of the gaps, such as their common factors. Many papers have studied various increment sequences, but none has been able to prove that any particular sequence is ‘the best.’”
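To make this dependence concrete, here is a small sketch of my own (the function names and the halving-sequence comparison are assumptions, not the book's code): a Shell sort that takes its gap sequence as a parameter, so the Knuth 3x+1 sequence used above can be swapped for Shell's original halving sequence:

```javascript
// Shell sort parameterized by its gap sequence (names here are my own)
function shellSortWithGaps (arr, gaps) {
  // gaps must be in decreasing order, ending with 1
  for (const gap of gaps) {
    // Gapped insertion sort, identical to the inner logic above
    for (let i = gap; i < arr.length; i++) {
      const curr = arr[i]
      let j = i
      while (j >= gap && arr[j - gap] > curr) {
        arr[j] = arr[j - gap]
        j -= gap
      }
      arr[j] = curr
    }
  }
  return arr
}

// Knuth's 3x + 1 sequence (..., 13, 4, 1), the one used in the implementation above
function knuthGaps (n) {
  const gaps = []
  for (let gap = 1; gap < n / 3; gap = gap * 3 + 1) gaps.unshift(gap)
  if (gaps.length === 0) gaps.push(1)
  return gaps
}

// Shell's original halving sequence (n/2, n/4, ..., 1)
function halvingGaps (n) {
  const gaps = []
  for (let gap = Math.floor(n / 2); gap >= 1; gap = Math.floor(gap / 2)) gaps.push(gap)
  return gaps
}

// Both produce a correctly sorted result; only the constant factors differ
shellSortWithGaps([5, 2, 9, 1], halvingGaps(4)) // → [1, 2, 5, 9]
```

Running both against the harness at the top of the article shows which sequence wins at a given size; either way, the result does not contradict the book's point that no sequence is provably best.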

Running the above code on 10,000 items shows that Shell sort is much faster than insertion sort at this volume.

Merge sort

Merge sort is a classic example of the divide-and-conquer idea in efficient algorithm design. Its basic principle: split the array in half repeatedly until each piece is a single element, then merge pairs of single elements into ordered pairs, then merge adjacent ordered pieces into larger ordered arrays, until the entire array is ordered.

In code it can be implemented in two ways: recursive, top-down merge sort (see the GIF demo: top-down merge sort), or iterative, bottom-up merge sort (see the image demo: bottom-up merge sort). Each has its advantages: the recursive version is easier to implement but consumes extra stack space; the iterative version has more complex logic but uses less memory and performs better.

The article “JS's Sorting Algorithms” points out:

Good news! Good news! ES6 has added support for tail-call optimization, so mom no longer has to worry about me writing recursion in JavaScript. Note, however, that ES6 tail-call optimization is only enabled in strict mode.

In fact, on the browser side, no major browser except Safari implements tail-call optimization. It is not enabled by default in Node either and must be switched on manually with the --harmony_tailcalls flag. Moreover, even with JS tail-call optimization, problems such as implicit optimization limits and the loss of call-stack information remain. So recursive merge sort on a JS engine still carries performance and stability concerns. See “A Follow-up Inquiry into Tail Recursion” for details.

Dynamic graph demonstration

GIF demo: top-down merge sort (algorithm visualization source: Visualgo.net/)

Image demonstration: bottom-up merge sort

Code implementation

let funcs = {
  // Top down merge sort
  merge (arr) {
    // The recursive sort method receives the array and the start and end positions of the range to sort
    let sort = (a, lo, hi) => {
      // If hi <= lo the range can no longer be halved, so the recursion bottoms out here
      if (hi <= lo) return
      // Calculates the middle of the array to sort
      // mid is the end of the first half of the sort
      // mid + 1 is the starting point for sorting the second half
      let mid = lo + Math.floor((hi - lo) / 2)
      // Recursive calls are made to the first and second halves until no more halves can be divided
      sort(a, lo, mid)
      sort(a, mid + 1, hi)
      // Merge the two halves of the array
      this.toolMerge(a, lo, mid, hi)
    }
    sort(arr, 0, arr.length - 1)
    return arr
  },
  // Bottom-up merge sort
  mergeBU (arr) {
    // Get the array length
    let len = arr.length
    // The outer loop maintains the size sz of the subarrays being merged
    // Since ordered subarrays are merged in pairs, sz doubles on each pass, i.e. sz *= 2
    for (let sz = 1; sz < len; sz *= 2) {
      // The inner loop maintains the starting position lo of each merge
      // lo + sz is the start of the right half; when there is no data beyond lo + sz, the pass ends
      // Each merge covers sz * 2 elements, so lo advances by sz * 2 each time
      for (let lo = 0; lo < len - sz; lo += sz * 2) {
        // Merge the pair of subarrays starting at lo
        // The midpoint is lo + sz - 1
        // The endpoint is lo + sz * 2 - 1, capped at len - 1 for the final, possibly shorter, subarray
        this.toolMerge(arr, lo, lo + sz - 1, Math.min(lo + sz * 2 - 1, len - 1))
      }
    }
    return arr
  },
  // Utility functions: in-place merge
  // It takes an ordered array of two halves, starting position, middle position, and end position
  // The output merges the left and right halves of the array, resulting in a large ordered array
  toolMerge (a, lo, mid, hi) {
    // Declare pointers i and j for traversing the left and right halves respectively
    let i = lo // Start of the left half
    let j = mid + 1 // Start of the right half
    // Declare a temporary array and copy the range into it
    // The merged values will be written back into the original array
    let temp = []
    for (let k = lo; k <= hi; k++) temp[k] = a[k]
    // Iterate over the range, writing values back into the original array
    for (let k = lo; k <= hi; k++) {
      // If the left half is exhausted, take from the right half and advance its pointer
      if (i > mid) { a[k] = temp[j++] }
      // If the right half is exhausted, take from the left half and advance its pointer
      else if (j > hi) { a[k] = temp[i++] }
      // If the right value is strictly smaller, take it and advance the right pointer
      else if (temp[j] < temp[i]) { a[k] = temp[j++] }
      // Otherwise take the left value; taking from the left on ties keeps the merge stable
      else { a[k] = temp[i++] }
    }
  }
}

Performance comparison

Running the above code on 10,000 items, merge sort turns out to be slower than Shell sort, but still faster than the basic sorts such as selection sort and insertion sort.

Algorithms (4th Edition) describes it this way: “In practice, the difference in running time between merge sort and Shell sort is within a constant factor, so relative performance depends on the specific implementation. In theory, no one has been able to prove that Shell sort is linearithmic for random data, so it is possible that the average running time of Shell sort grows faster. Such a gap has been shown to exist in the worst case, but it has no impact in practice.”

In fact, merge sort can also be optimized: switch to insertion sort for small subarrays, and skip the merge entirely when the largest element of the left half is no greater than the smallest element of the right half, since the whole range is then already ordered. So the code above has a lot of room for optimization, and the result does not mean merge sort is inherently slower than Shell sort.
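As a hedged sketch of those two optimizations (the cutoff of 10, the helper names, and the single up-front auxiliary array are my own choices, not the book's exact code), an optimized merge sort might look like this:

```javascript
// Insertion sort restricted to the range [lo, hi]
function insertionSortRange (a, lo, hi) {
  for (let i = lo + 1; i <= hi; i++) {
    const curr = a[i]
    let j = i
    while (j > lo && a[j - 1] > curr) { a[j] = a[j - 1]; j-- }
    a[j] = curr
  }
}

// Standard merge of a[lo..mid] and a[mid+1..hi] through the auxiliary array
function merge (a, aux, lo, mid, hi) {
  for (let k = lo; k <= hi; k++) aux[k] = a[k]
  let i = lo, j = mid + 1
  for (let k = lo; k <= hi; k++) {
    if (i > mid) a[k] = aux[j++]
    else if (j > hi) a[k] = aux[i++]
    else if (aux[j] < aux[i]) a[k] = aux[j++]
    else a[k] = aux[i++]
  }
}

function mergeSortOptimized (arr) {
  // Allocate the auxiliary array once instead of once per merge
  const aux = new Array(arr.length)
  const sort = (lo, hi) => {
    // Optimization 1: small subarrays go to insertion sort
    if (hi <= lo + 10) { insertionSortRange(arr, lo, hi); return }
    const mid = lo + Math.floor((hi - lo) / 2)
    sort(lo, mid)
    sort(mid + 1, hi)
    // Optimization 2: if the largest key of the left half is no greater than
    // the smallest key of the right half, the range is already ordered
    if (arr[mid] <= arr[mid + 1]) return
    merge(arr, aux, lo, mid, hi)
  }
  sort(0, arr.length - 1)
  return arr
}
```

The "already ordered" test makes merge sort nearly linear on sorted or mostly sorted input, at the cost of one extra comparison per merge.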

Quick sort

Many books rate quicksort highly. It is an algorithm with excellent average-case performance that needs only a small auxiliary stack (a small memory footprint), and its principle is very simple.

Quicksort is also divide-and-conquer. It needs a partition point, the pivot. Taking ascending order as an example, the remaining elements of the array larger than the pivot are placed to its right and those smaller to its left. Each side is then partitioned recursively in the same way until the whole array is ordered.

Dynamic graph demonstration

Quick Sort GIF demo (algorithm visualization source: Visualgo.net/)

Code implementation

let funcs = {
  // Use auxiliary arrays to partition; the implementation is very simple
  qSort (arr) {
    if (arr.length === 0) { return [] }
    // Declare auxiliary arrays to hold the data smaller than and larger than the pivot
    // Declare pivot, which takes the first value of the array
    // In fact, the pivot could be any value in the array, and its choice affects the algorithm's final performance
    let lesser = [], greater = [], pivot = arr[0]
    // Iterate over the array: data smaller than pivot goes into lesser
    // data greater than or equal to pivot goes into greater
    for (let i = 1; i < arr.length; i++) {
      if (arr[i] < pivot) {
        lesser.push(arr[i])
      } else {
        greater.push(arr[i])
      }
    }
    // Final output: [...data smaller than pivot, pivot, ...data larger than pivot]
    // Make recursive calls to the split left and right arrays to output the ordered left and right arrays
    // So after all the recursion is complete, the entire array will be sorted
    return this.qSort(lesser).concat(pivot, this.qSort(greater))
  }
}

Performance comparison

The code above leans heavily on JS's native array APIs, so poor performance is to be expected; let's see how it actually does.

Running the above code on 10,000 items, this quicksort performs only slightly better than the unoptimized merge sort above, about as well as plain insertion sort, and far worse than the Shell sort implemented earlier.

Scaling the data up to 100,000 items, this quicksort's performance looks even worse, taking twice as long as the merge sort above. Interestingly, though, as the data volume grows the basic sorting algorithms degrade sharply, while merge sort and quicksort, built on divide-and-conquer, begin to show their advantage.

The pros and cons of quicksort are clearly explained in Algorithms (4th Edition): “The inner loop of quicksort is shorter than that of most sorting algorithms, which means it is fast both in theory and in practice. Its main drawback is that it is very fragile and must be implemented with great care to avoid poor performance.” Quicksort therefore relies on a variety of optimizations to avoid those bad cases.

Optimization of quicksort

Partition by swapping elements in place to reduce the number of operations

Based on the experience from the previous section, we should avoid JS's native array APIs during partitioning, so the partition step is optimized to swap elements between the front and the back of the array in place.

Image demonstration

Code implementation

let funcs = {
  // Partition in place by swapping elements
  qSortOptimizeSegmentation (arr) {
    // Receives array a, start position lo, and end position hi
    // Splits the range into two parts: values left of the partition point are smaller, values right of it are larger
    // Outputs the final position of the partition point
    let partition = (a, lo, hi) => {
      // Declare pointers i and j to traverse the array from the front and from the back; v is the partition value
      let i = lo, j = hi + 1, v = a[lo]
      // The outer loop keeps moving the i and j pointers until they meet
      while (true) {
        // i scans from the front and stops at the first value >= v,
        // i.e. data that needs to move to the right of the partition point
        // j scans from the back in the same way, stopping at the first value <= v
        while (a[++i] < v) { if (i === hi) { break } }
        while (a[--j] > v) { if (j === lo) { break } }
        // Termination condition: the two pointers have met, the scan is complete
        if (i >= j) { break }
        // Swap the two out-of-place values to the other side
        this.toolExch(a, i, j)
      }
      // The range is now split between j and j + 1 into a left part no larger than v and a right part no smaller than v
      // Swap the partition value into position j to complete the partition
      this.toolExch(a, lo, j)
      // Output the partition point position j
      return j
    }
    let qs = (a, lo, hi) => {
      // Process to the end of the array to end the recursion
      if (lo >= hi) { return }
      // Partition the range so the left part is smaller and the right part larger than the partition point, and get its position j
      let j = partition(a, lo, hi)
      // Sort the left and right arrays recursively
      qs(a, lo, j - 1)
      qs(a, j + 1, hi)
    }
    qs(arr, 0, arr.length - 1)
    return arr
  }
}

Performance comparison

Testing with 10,000 items shows that the optimized partition method is noticeably faster.

Increasing the test data to 100,000 items, the quicksort with the optimized partition method outperforms all previous implementations. It also shows that the larger the data set, the more quicksort's performance advantage comes into play.

Use insertion sort to speed up small subarrays

Algorithms (4th Edition) explains the reasons for using insertion sort in small data sets:

As with most recursive sorting algorithms, a simple way to improve quicksort performance is based on the following two points:

  • For small arrays, quicksort is slower than insertion sort
  • Because of recursion, quicksort's sort() method calls itself even on tiny subarrays

Therefore, switch to insertion sort when sorting small subarrays. But how small should a subarray be before switching? The book explains that the best cutoff value is system-dependent, but any value between 5 and 15 works well in most cases.

Code implementation

let funcs = {
  // Switch to insertion sort for small subarrays (cutoff around 10, within the 5-15 range)
  qSortOptimizeSmallDataSet (arr) {
    let partition = (a, lo, hi) => {
      let i = lo, j = hi + 1, v = a[lo]
      while (true) {
        while (a[++i] < v) { if (i === hi) { break } }
        while (a[--j] > v) { if (j === lo) { break } }
        if (i >= j) { break }
        this.toolExch(a, i, j)
      }
      this.toolExch(a, lo, j)
      return j
    }
    let qs = (a, lo, hi) => {
      // Use insertion sort and end the recursion if the distance between the start and end positions is less than or equal to 10
      if (hi <= lo + 10) { a = this.toolInsertionSort(a, lo, hi); return }
      let j = partition(a, lo, hi)
      qs(a, lo, j - 1)
      qs(a, j + 1, hi)
    }
    qs(arr, 0, arr.length - 1)
    return arr
  },
  // Perform insert sort in the specified range
  // Add the start-stop parameters based on the implementation in the insert sort section
  toolInsertionSort (arr, lo, hi) {
    for (let i = lo; i < hi + 1; i++) {
      let j = i
      let curr = arr[i]
      while (j > lo && arr[j - 1] > curr) {
        arr[j] = arr[j - 1]
        j--
      }
      arr[j] = curr
    }
    return arr
  }
}

Performance comparison

qSortOptimizeSmallDataSet on 10,000 items

qSortOptimizeSmallDataSet on 100,000 items

qSortOptimizeSmallDataSet on 1,000,000 items

qSortOptimizeSmallDataSet on 10,000,000 items

Beyond about a million items, the qSortOptimizeSmallDataSet function with the small-subarray optimization shows clearly better results.

Use three-way partitioning to optimize the handling of heavily duplicated data

Three-way partitioning focuses on handling large amounts of duplicate data. Standard quicksort is still purely comparison-based: no matter how many duplicate elements there are, it compares all of them before producing output. Three-way partitioning, by contrast, gathers the elements equal to the partition value into the middle of the array, with smaller elements on the left and larger ones on the right.

Image demonstration

Trace of three-way partitioning (array contents after each iteration of the loop)

Code implementation

let funcs = {
  // Three-way partitioning moves all data equal to the partition value to the middle, so duplicates never have to be partitioned again
  qSortThreeWayPartition (arr) {
    // Receive an array, a start position, and an end position
    let qs = (a, lo, hi) => {
      // Insert sort on data sets less than or equal to 10
      if (hi <= lo + 10) { a = this.toolInsertionSort(a, lo, hi); return }
      // Maintain two pointers lt and gt
      // [lo, lt - 1] holds data smaller than the partition value
      // [gt + 1, hi] holds data larger than the partition value
      // When the loop finishes, [lt, gt] holds the duplicates equal to the partition value
      let lt = lo, gt = hi
      // Maintain a pointer I moving from left to right to iterate over the array
      let i = lo + 1
      // mark the current shard value v
      let v = a[lo]
      // Iterate through the array from left to right until i passes gt
      // (everything beyond gt is already known to be larger than the partition value)
      while (i <= gt) {
        // If the current value is smaller than the partition value, swap it with the value at lt
        // lt++: lt moves right to make room for the newcomer in the "smaller" region
        // i++: the value swapped in from lt is a known duplicate, so the scan moves on
        if (a[i] < v) { this.toolExch(a, lt++, i++) }
        // If the current value is larger than the partition value, swap it with the value at gt
        // gt--: gt moves left to make room for the newcomer in the "larger" region
        // i does not advance, because the value swapped in from gt has not been examined yet
        else if (a[i] > v) { this.toolExch(a, i, gt--) }
        // If the current value equals the partition value, leave it inside [lt, gt]
        // This position is done; i moves forward
        else { i++ }
      }
      // Recursively sort the data that is not equal to the partition value
      qs(a, lo, lt - 1)
      qs(a, gt + 1, hi)
    }
    qs(arr, 0, arr.length - 1)
    return arr
  }
}

Performance comparison

Performance comparison of three-way partitioning quicksort (100,000 items)

Performance comparison of three-way partitioning quicksort (1,000,000 items)

Performance comparison of three-way partitioning quicksort (10,000,000 items)

As the results show, the larger the data volume, the more obvious the advantage of three-way partitioning, because the data contains more duplicates. Since the test data is generated randomly in the range 0 to 100, what happens if we widen or narrow that range to decrease or increase the number of duplicates? Will the performance of three-way partitioning, which depends on grouping duplicates, be affected?

Performance comparison of three-way partitioning quicksort (10,000,000 items, values in the range 0~10)

Performance comparison of three-way partitioning quicksort (10,000,000 items, values in the range 0~100)

Performance comparison of three-way partitioning quicksort (10,000,000 items, values in the range 0~1000)

Performance comparison of three-way partitioning quicksort (10,000,000 items, values in the range 0~10000)

When the generation range narrows to 0~10, the advantage of three-way partitioning grows further; but as the range widens the advantage shrinks rapidly, and by the time the values span 0~10000, three-way partitioning already performs worse than two-way partitioning. Three-way partitioning is therefore best suited to data with many duplicates, for example sorting by gender.

Summary

In fact, Algorithms (4th Edition) alone discusses more quicksort optimizations than are covered here: tuning the cutoff for switching to insertion sort, choosing the pivot by sampling (for example, taking the median of a sample), replacing bounds checks with sentinels in two-way partitioning, and so on. Time does not allow me to study each of these optimizations here, let alone the whole family of heap sort, counting sort, bucket sort, and radix sort; I will fill them in one by one if I get the chance.
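Of those ideas, pivot sampling is simple enough to sketch here. The following illustrates median-of-three pivot selection grafted onto the in-place partition from earlier; it is my own sketch under those assumptions (function names are mine), not the book's code:

```javascript
// Swap helper (equivalent to the toolExch used throughout the article)
function swap (a, i, j) { const t = a[i]; a[i] = a[j]; a[j] = t }

// Order a[lo], a[mid], a[hi] among themselves, then move the median into a[lo]
// so the usual "take the first element as pivot" partition receives a sampled pivot
function medianOfThree (a, lo, hi) {
  const mid = lo + Math.floor((hi - lo) / 2)
  if (a[mid] < a[lo]) swap(a, lo, mid)
  if (a[hi] < a[lo]) swap(a, lo, hi)
  if (a[hi] < a[mid]) swap(a, mid, hi)
  // now a[lo] <= a[mid] <= a[hi]; put the median into the pivot slot
  swap(a, lo, mid)
}

function quickSortMedian3 (arr) {
  const qs = (a, lo, hi) => {
    if (hi <= lo) return
    medianOfThree(a, lo, hi)
    // Same in-place partition as qSortOptimizeSegmentation above
    const v = a[lo]
    let i = lo, j = hi + 1
    while (true) {
      while (a[++i] < v) { if (i === hi) break }
      while (a[--j] > v) { if (j === lo) break }
      if (i >= j) break
      swap(a, i, j)
    }
    swap(a, lo, j)
    qs(a, lo, j - 1)
    qs(a, j + 1, hi)
  }
  qs(arr, 0, arr.length - 1)
  return arr
}
```

Sampling the pivot this way guards against the worst case that a plain first-element pivot hits on already-sorted input, at the cost of a few extra comparisons per partition.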

I have to say, studying algorithms constantly makes you marvel at the magic of computer programs (narrator: that's just how it works!), which is probably the “beauty of algorithms”~

Finally, attach the full version address of the test platform in the article: github.com/yyj08070631…

References

  • JS's Sorting Algorithms
  • Data Structures and Algorithms with JavaScript
  • Algorithms (4th Edition)