  • O(n*n): bubble sort, insertion sort, selection sort
  • O(nlogn): merge sort, quicksort
  • O(n): bucket sort, counting sort, radix sort

A sorting algorithm is mainly evaluated from three aspects: time complexity, space complexity, and stability.

Bubble sort

Bubble sort only operates on adjacent pairs of data. Each bubbling pass compares adjacent elements to check whether they are in the required order, and swaps them if they are not. One pass moves at least one element to its final position; after n passes, the n numbers are sorted.

In fact, the bubbling process can be optimized: if no data is exchanged during a pass, the array is already in order and the remaining passes can be skipped.

public void bubbleSort(int[] a, int n) {
    if (n <= 1) return;
    for (int i = 0; i < n; ++i) {
        boolean flag = false; // records whether any swap happened in this pass
        for (int j = 0; j < n - i - 1; ++j) {
            if (a[j] > a[j+1]) { // swap
                int tmp = a[j];
                a[j] = a[j+1];
                a[j+1] = tmp;
                flag = true;
            }
        }
        if (!flag) break; // no data exchange, exit early
    }
}

Time complexity: In the best case, the data to be sorted is already in order and only one bubbling pass is needed, so the best-case time complexity is O(n). In the worst case, the data to be sorted is in reverse order and n passes are needed, so the worst-case time complexity is O(n*n). The average time complexity is also O(n*n).

Space complexity: The bubbling process only swaps adjacent data and needs only a constant amount of temporary space, so its space complexity is O(1); bubble sort is an in-place sorting algorithm.

Stability: In bubble sort, only a swap can change the relative order of two elements. To keep the algorithm stable, we do not swap two adjacent elements of equal value; equal elements therefore keep their original order after sorting, so bubble sort is a stable sorting algorithm.

Insertion sort

The core idea of insertion sort is to take an element from the unsorted interval, find the appropriate insertion position in the sorted interval, and insert it there so that the sorted interval always stays in order. This process repeats until the unsorted interval is empty and the algorithm ends.

public void insertionSort(int[] a, int n) {
    if (n <= 1) return;
    for (int i = 1; i < n; ++i) {
        int value = a[i];
        int j = i - 1;
        // find the insertion position, shifting larger elements one slot to the right
        for (; j >= 0; --j) {
            if (a[j] > value) {
                a[j+1] = a[j];
            } else {
                break;
            }
        }
        a[j+1] = value; // insert the data
    }
}

Time complexity: If the data to be sorted is already in order, we do not need to move any data, and if we look for the insertion position from the end of the sorted interval toward the front, each insertion needs only one comparison. So in this case the best-case time complexity is O(n). If the array is in reverse order, each insertion is equivalent to inserting the new data at the first position of the array, which requires moving a large amount of data, so the worst-case time complexity is O(n*n). On average, inserting one element into an ordered array takes O(n), and this is repeated n times, so the average time complexity is O(n*n).

Space complexity: The insertion sort algorithm does not require additional storage to run, so the space complexity is O(1), that is, it is an in-place sort algorithm.

Stability: In insertion sort, for elements of equal value, we can insert the later element after the earlier one, so the original relative order stays unchanged; insertion sort is therefore a stable sorting algorithm.

Selection sort

The implementation of selection sort is similar to insertion sort: it also divides the array into a sorted interval and an unsorted interval. But selection sort finds the smallest element in the unsorted interval each time and places it at the end of the sorted interval.

public static void selectionSort(int[] arr) {
    for (int i = 0; i < arr.length - 1; i++) {
        int minIndex = i;
        // compare with each of the remaining unsorted elements to find the minimum
        for (int j = i + 1; j < arr.length; j++) {
            if (arr[j] < arr[minIndex]) {
                minIndex = j;
            }
        }
        // swap the minimum to the end of the sorted interval
        int temp = arr[i];
        arr[i] = arr[minIndex];
        arr[minIndex] = temp;
    }
}

Time complexity: The best-case, worst-case, and average time complexity of selection sort are all O(n*n), because no matter how the original data is arranged, finding the minimum of the unsorted interval requires comparing every remaining element once.

Space complexity: The selection sort algorithm does not require additional storage space to run, so the space complexity is O(1), that is, it is an in-place sorting algorithm.

Stability: Selection sort is an unstable sorting algorithm. Each pass finds the minimum of the remaining unsorted elements and swaps it with the element at the front of the unsorted interval, which can move an element past an equal element and destroy stability. For example, when sorting 5, 8, 5, 2, 9, the first pass swaps the 2 with the first 5, placing that 5 after the second 5. Because of this, selection sort is somewhat inferior to bubble sort and insertion sort.

In addition, from the point of view of code implementation, the data exchange in bubble sort is more costly than the data movement in insertion sort: a swap in bubble sort requires three assignments, while a shift in insertion sort requires only one. So if we want to maximize performance, insertion sort is definitely the first choice.

// bubble sort: a swap needs three assignments
if (a[j] > a[j+1]) { // swap
    int tmp = a[j];
    a[j] = a[j+1];
    a[j+1] = tmp;
    flag = true;
}

// insertion sort: a shift needs only one assignment
if (a[j] > value) {
    a[j+1] = a[j]; // shift data
} else {
    break;
}

Merge sort

The core idea of merge sort is pretty simple. To sort an array, we divide the array into two parts, sort the two parts separately, and merge the sorted parts together so that the entire array is sorted.

// merge sort algorithm, A is the array, n is the size of the array
merge_sort(A, n) {
    merge_sort_c(A, 0, n-1)
}

// recursively sort the subarray A[p...r]
merge_sort_c(A, p, r) {
    // the recursion terminates when p >= r
    if p >= r then return

    // take the midpoint between p and r as q
    q := (p + r) / 2
    // sort A[p...q] and A[q+1...r] separately
    merge_sort_c(A, p, q)
    merge_sort_c(A, q+1, r)
    // merge A[p...q] and A[q+1...r] into A[p...r]
    merge(A[p...r], A[p...q], A[q+1...r])
}

merge(A[p...r], A[p...q], A[q+1...r]) {
    var i := p, j := q+1, k := 0   // initialize variables i, j, k
    var tmp := new array[0...r-p]  // allocate a temporary array the same size as A[p...r]
    while i <= q AND j <= r do {
        if A[i] <= A[j] {
            tmp[k++] = A[i++]
        } else {
            tmp[k++] = A[j++]
        }
    }

    // determine which subarray still has data left
    start := i, end := q
    if j <= r then start := j, end := r

    // copy the remaining data into the tmp array
    while start <= end do {
        tmp[k++] = A[start++]
    }

    // copy the data in tmp back into A[p...r]
    for i := 0 to r-p do {
        A[p+i] = tmp[i]
    }
}

Time complexity: If problem a is decomposed into subproblems b and c, then T(a) = T(b) + T(c) + K, where K is the time consumed to combine the results of subproblems b and c into the result of problem a. Let T(n) be the time to merge-sort n elements; sorting each of the two halves then takes T(n/2), and we know that merge() merges two ordered subarrays in O(n) time. Using the formula above, the time complexity of merge sort satisfies T(n) = 2*T(n/2) + n, which gives O(nlogn).
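Expanding this recurrence makes the result explicit (a standard expansion, assuming for simplicity that n is a power of 2 and that sorting one element takes a constant time C):

T(n) = 2*T(n/2) + n
     = 2*(2*T(n/4) + n/2) + n = 4*T(n/4) + 2*n
     = 4*(2*T(n/8) + n/4) + 2*n = 8*T(n/8) + 3*n
     = ......
     = 2^k * T(n/2^k) + k*n

When n/2^k = 1, that is k = log2(n), this becomes T(n) = C*n + n*log2(n), so T(n) = O(nlogn).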

Space complexity: The merge function of merge sort needs additional storage to merge two ordered subarrays into one ordered array, so the space complexity is O(n); merge sort is not an in-place sorting algorithm.

Stability: During the merge step, if A[p...q] and A[q+1...r] contain elements of equal value, we can copy the element from A[p...q] into the tmp array first, as the pseudocode does with A[i] <= A[j]. This keeps elements of equal value in their original relative order, so merge sort is a stable sorting algorithm.

Quick sort

Quicksort also uses the divide-and-conquer idea. The idea is this: to sort the data in an array with subscripts between p and r, we choose any element between p and r as the pivot (partition point). We traverse the elements between p and r, putting everything less than the pivot on the left, everything greater than the pivot on the right, and the pivot in the middle. After this step, the data in A[p...r] is divided into three parts: A[p...q-1] in front is less than the pivot, A[q] is the pivot, and A[q+1...r] behind is greater than the pivot.

const quickSort = (array) => {
  const sort = (arr, left = 0, right = arr.length - 1) => {
    if (left >= right) {
      // stop when the interval has at most one element
      return
    }
    let i = left
    let j = right
    const baseVal = arr[j] // take the rightmost element as the pivot
    while (i < j) {
      while (i < j && arr[i] <= baseVal) { // scan from the left for an element greater than the pivot
        i++
      }
      arr[j] = arr[i] // move it to the right side
      while (j > i && arr[j] >= baseVal) { // scan from the right for an element less than the pivot
        j--
      }
      arr[i] = arr[j] // move it to the left side
    }
    arr[j] = baseVal // put the pivot into its final position
    sort(arr, left, j - 1) // repeat for the left part
    sort(arr, j + 1, right) // repeat for the right part
  }
  const newArr = array.concat() // copy the array so the original is not modified
  sort(newArr)
  return newArr
}

Time complexity: If each partition operation divides the array into two parts of approximately equal size, then the recurrence for quicksort's time complexity is the same as merge sort's, so quicksort's time complexity is also O(nlogn). But this only holds if every partition chooses a pivot that splits the interval roughly in half, which is very hard to guarantee in practice. Take an extreme example: if the data in the array is already ordered, say 1, 3, 5, 6, 8, and we always choose the last element as the pivot, the two partitions are never equal. We need roughly n partition operations to finish the sort, and each partition scans about n/2 elements on average, so in this case the time complexity of quicksort degrades from O(nlogn) to O(n*n).

Space complexity: The quicksort algorithm does not require additional storage to run, so the space complexity is O(1), that is, it is an in-place sorting algorithm.

Stability: Because partitioning involves swapping elements that are not adjacent, the relative order of equal elements can change. For example, in the sequence 6, 8, 7, 6, 3, 5, 9, 4, the relative order of the two 6's changes after the first partition. So quicksort is not a stable sorting algorithm.

Linear sorting algorithm

  • Linear sort algorithm includes bucket sort, counting sort and radix sort.
  • The linear sorting algorithm has O(n) time complexity.
  • Linear sorting algorithms place very strict requirements on the data to be sorted, so it is important to master the applicable scenarios of these three algorithms.

Bucket sort

Algorithm principle:

  • The data to be sorted is distributed into several ordered buckets, and the data in each bucket is sorted separately.
  • After each bucket is sorted, the data in the buckets is taken out in sequence to form an ordered sequence (see the sketch below).
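A minimal Java sketch of this principle (my own illustration, not code from the original text): it assumes the data are non-negative integers, splits the value range evenly into a fixed number of buckets, sorts each bucket with the JDK's Collections.sort, and then concatenates the buckets in order.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BucketSort {
    // Sort non-negative integers by distributing them into bucketCount ordered buckets.
    public static void bucketSort(int[] a, int bucketCount) {
        if (a.length <= 1) return;
        int max = a[0];
        for (int v : a) max = Math.max(max, v);

        // bucket width: how large a value range each bucket covers
        int width = max / bucketCount + 1;
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < bucketCount; i++) buckets.add(new ArrayList<>());

        // distribute the data into buckets; the bucket order follows the value order
        for (int v : a) buckets.get(v / width).add(v);

        // sort each bucket separately, then take the buckets out in sequence
        int k = 0;
        for (List<Integer> bucket : buckets) {
            Collections.sort(bucket);
            for (int v : bucket) a[k++] = v;
        }
    }
}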

Conditions of use:

  • The data to be sorted needs to be easily divided into m buckets, and buckets have a natural size order.

  • Data is evenly distributed across buckets.

Applicable scenarios:

  • Bucket sort is suitable for external sort.
  • External sort means that the data is stored on an external disk and there is a large amount of data, but the memory is limited and the entire data cannot be loaded into memory.

Application Cases:

  • Requirement description: There is 10GB of order data that needs to be sorted by order amount (assuming the amount is a positive integer), but memory is limited to only a few hundred MB.
  • Solution: Scan the file once to find the range of order amounts, for example 1 yuan to 100,000 yuan, and divide this range into 100 buckets. The first bucket stores orders between 1 and 1,000 yuan, the second bucket stores orders between 1,001 and 2,000 yuan, and so on. Each bucket corresponds to a file, numbered and named according to the bucket's value range (00, 01, 02, ..., 99). Read the 100 small files into memory one by one and sort each with quicksort. After all the files are sorted, read each small file in ascending order of its number and write its contents to one large file.
  • Note: If a single file still cannot fit into memory, apply the same procedure to that file again.

Count sorting

Algorithm principle:

  • Counting sort is really just a special case of bucket sort.

  • If the range of the n data items to be sorted is not large, for example the maximum value is k, the data can be divided into buckets by value, one bucket per possible value.

  • All data values in each bucket are the same, which eliminates the time spent sorting within each bucket.

Conditions of use:

  • Counting sort can only be used when the data range is small. If the data range k is much larger than the number of items to be sorted n, counting sort is not suitable.
  • Counting sort can only sort non-negative integers. Other types need to be converted to non-negative integers without changing their relative sizes.
  • For example, if test scores are accurate to one decimal place, multiply all scores by 10 to convert them to integers first.

Application Cases:

  • Suppose there are only eight examinees with scores between 0 and 5, and the scores are stored in the array A[8] = [2,5,3,0,2,3,0,3].

  • Buckets are represented by an array of size 6, C[6], with subscripts corresponding to the scores 0, 1, 2, 3, 4, 5. C[6] stores the number of examinees with each score. By iterating over the scores once we get C[6] = [2,0,2,3,0,1].

  • Taking the prefix sums of C[6] gives C[6] = [2,2,4,7,7,8], where C[k] now stores the number of examinees with scores less than or equal to k.

  • The sorted scores are stored in array R[8] = [0,0,2,2,3,3,3,5]. How do we get R[8]? We scan array A from back to front. For example, when we scan the 3 at the end of A, we look up C[3] and get 7, which means that so far there are 7 examinees (including this one) with a score less than or equal to 3; so this 3 belongs at the 7th position of R (subscript 6). After this 3 is placed into R, there are 6 examinees left with scores less than or equal to 3, so C[3] is decreased by 1 to 6. Similarly, when the next examinee with a score of 3 is scanned, it is placed at the 6th position of R (subscript 5). After array A has been fully scanned, the data in R is sorted by score. Scanning from back to front is what keeps the sort stable (see the sketch below).
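A Java sketch of the procedure just described (illustrative code, assuming all values are integers in the range 0 to k); the arrays a, c and r correspond to A, C and R in the walkthrough.

// Counting sort for values in [0, k].
public static void countingSort(int[] a, int k) {
    int n = a.length;
    if (n <= 1) return;

    // c[v] first counts how many elements equal v ...
    int[] c = new int[k + 1];
    for (int v : a) c[v]++;
    // ... then, after the prefix sum, c[v] is the number of elements <= v
    for (int v = 1; v <= k; v++) c[v] += c[v - 1];

    // scan a from back to front so that equal elements keep their original order (stable)
    int[] r = new int[n];
    for (int i = n - 1; i >= 0; i--) {
        r[c[a[i]] - 1] = a[i];
        c[a[i]]--;
    }
    System.arraycopy(r, 0, a, 0, n);
}

For the example above, countingSort(new int[]{2,5,3,0,2,3,0,3}, 5) turns the array into [0,0,2,2,3,3,3,5].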


Radix sort

Algorithm principle (taking the sorting of 100,000 mobile phone numbers as an example):

  • If we only compare the size of two mobile phone numbers a and b: if a is already larger than b in the first few digits, there is no need to look at the remaining digits.

  • To compare many mobile phone numbers, we use the idea of a stable sorting algorithm: first sort the phone numbers by the last digit, then re-sort them by the second-to-last digit, and so on, up to the first digit. Note that the per-digit sort must be stable, otherwise this method is not correct.

  • After 11 rounds of sorting (one per digit of an 11-digit phone number), the phone numbers are in order.

  • Each round sorts by a single digit, whose value range is small, so it can be done with bucket sort or counting sort (as shown in the sketch below).
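A minimal Java sketch of this procedure (my own illustration, under the assumption that the phone numbers are stored as equal-length digit strings; each round is a stable counting sort on one character position):

// Radix sort for equal-length digit strings (e.g. 11-digit phone numbers).
// Sorts by the last character first, then the second to last, and so on;
// each round uses counting sort, which is stable, so the overall result is correct.
public static void radixSort(String[] nums, int length) {
    for (int pos = length - 1; pos >= 0; pos--) {
        int[] count = new int[10];
        for (String s : nums) count[s.charAt(pos) - '0']++;
        for (int d = 1; d < 10; d++) count[d] += count[d - 1];

        String[] tmp = new String[nums.length];
        // back-to-front scan keeps the counting sort stable
        for (int i = nums.length - 1; i >= 0; i--) {
            int d = nums[i].charAt(pos) - '0';
            tmp[count[d] - 1] = nums[i];
            count[d]--;
        }
        System.arraycopy(tmp, 0, nums, 0, nums.length);
    }
}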

Conditions of use:

  • Data can be divided into separate “bits” for comparison.
  • There is a hierarchical relationship between the "bits": if the high-order bit of data a is larger than that of data b, the lower-order bits need not be compared.
  • The data range of each bit should not be too large, and linear sorting should be possible, otherwise the time complexity of radix sorting cannot reach O(n).

How to implement a generic, high-performance sorting function

Although linear sorting algorithms have low time complexity, their application scenarios are special, so they lack generality. For small-scale data, an algorithm with O(n*n) time complexity can be chosen; for large-scale data, an O(nlogn) algorithm is more efficient. Therefore, to handle data of arbitrary scale, a generic sorting function is generally implemented with an algorithm whose time complexity is O(nlogn).

Considering that the space complexity of merge sort is O(n), quicksort is more suitable to implement the sorting function. However, the time complexity of quicksort is O(n*n) in the worst case. How to optimize this problem?

So let's see: why is quicksort O(n*n) in the worst case? As mentioned earlier, if the data is originally ordered or nearly ordered and the last element is chosen as the partition point every time, quicksort performs very badly and its time complexity degrades to O(n*n). The main reason for this O(n*n) behavior is that the partition point was not chosen reasonably. What makes a good partition point? Ideally, the two partitions separated by the partition point contain about the same amount of data.

Here are two commonly used and relatively simple methods for choosing the partition point.

  1. Middle of three: we take one element from the beginning, one from the end and one from the middle of the interval, compare the three, and use their median as the partition point. Choosing the median of elements taken at fixed positions is clearly better than simply using a fixed element. However, if the array to be sorted is large, "middle of three" may not be enough, and "middle of five" or "middle of ten" may be used (see the sketch after this list).
  2. Random method: we randomly select one element from the interval to be sorted as the partition point each time. This does not guarantee a good partition point every time, but from a probability point of view it is very unlikely that a bad partition point is chosen every time, so on average the partition points selected this way are fairly good, and the time complexity is unlikely to degrade to the worst-case O(n*n).
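A minimal Java sketch of the "middle of three" rule from point 1 (an illustrative helper, not from the original text): it orders the first, middle and last elements, then moves their median to the end of the interval so that a partition routine which takes the last element as the pivot can be reused unchanged.

// Choose a pivot by the "middle of three" rule and move it to position right,
// so that a partition routine using the last element as the pivot can be reused.
private static void medianOfThree(int[] a, int left, int right) {
    int mid = left + (right - left) / 2;
    // order a[left] <= a[mid] <= a[right]
    if (a[left] > a[mid]) swap(a, left, mid);
    if (a[left] > a[right]) swap(a, left, right);
    if (a[mid] > a[right]) swap(a, mid, right);
    // a[mid] is now the median of the three; move it to the pivot position
    swap(a, mid, right);
}

private static void swap(int[] a, int i, int j) {
    int tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
}

Calling medianOfThree(a, left, right) before each partition step makes the pivot choice much more robust on already-sorted input.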

To avoid stack overflow caused by deep recursion in quicksort, there are two solutions. The first is to limit the recursion depth: once the recursion goes beyond the threshold we set, we stop recursing. The second is to simulate the recursion manually, using a stack allocated on the heap to push and pop the pending sub-intervals ourselves, so the sort is no longer limited by the size of the system call stack.
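A sketch of the second approach (my own illustration, paired with a simple last-element partition to keep it self-contained): the pending sub-intervals are kept on a heap-allocated stack instead of the call stack.

import java.util.ArrayDeque;
import java.util.Deque;

public class IterativeQuickSort {
    // Non-recursive quicksort: sub-intervals waiting to be sorted are kept on a
    // heap-allocated stack, so deep "recursion" cannot overflow the system stack.
    public static void quickSort(int[] a) {
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[]{0, a.length - 1});
        while (!stack.isEmpty()) {
            int[] range = stack.pop();
            int left = range[0], right = range[1];
            if (left >= right) continue;
            int q = partition(a, left, right);
            stack.push(new int[]{left, q - 1});
            stack.push(new int[]{q + 1, right});
        }
    }

    // Simple partition that uses the last element as the pivot.
    private static int partition(int[] a, int left, int right) {
        int pivot = a[right];
        int i = left;
        for (int j = left; j < right; j++) {
            if (a[j] < pivot) {
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                i++;
            }
        }
        int tmp = a[i]; a[i] = a[right]; a[right] = tmp;
        return i;
    }
}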

**For small-scale data, an algorithm with O(n*n) time complexity is not necessarily slower than an O(nlogn) algorithm.** In big O notation we omit the lower-order terms, coefficients, and constants; without omitting them, O(nlogn) may really be O(k*nlogn + c), and k and c may still be large numbers.