
This article explains the principles of 10 common sorting algorithms in detail: bubble sort, selection sort, insertion sort, Shell sort, heap sort, merge sort, quick sort, counting sort, bucket sort, and radix sort. Each sort comes with a Java implementation example.

1 Sorting Overview

The concept of sorting: arranging the input data from smallest to largest (or from largest to smallest) according to a comparison relation.

Sorting stability: if the relative order of equal records remains unchanged, that is, when r[i] = r[j] and r[i] precedes r[j] in the original sequence, r[i] still precedes r[j] in the sorted sequence, then the sorting algorithm is stable. Otherwise it is unstable.

Internal sort and external sort: sorting is divided into internal sort and external sort according to whether all the records to be sorted fit in memory during sorting.

Internal sort keeps all records in memory throughout the sorting process. External sort is used when there are too many records to hold in memory at once, so the sorting process must exchange data between memory and external storage many times. This article focuses on internal sorting methods.

According to the main operations used while sorting, internal sorts fall into four classes: insertion sorts, exchange sorts, selection sorts, and merge sorts. Each class has its own specific methods and time complexities:

Insertion sorts: insertion sort, Shell sort
Exchange sorts: bubble sort, quick sort
Selection sorts: selection sort, heap sort
Merge sorts: merge sort

These 7 sorting algorithms can also be split into two groups by algorithmic sophistication: bubble sort, selection sort, and insertion sort are the simple algorithms, while Shell sort, heap sort, merge sort, and quick sort are the improved algorithms. All 7 are collectively referred to as comparison sorts.

In a comparison sort, the final order of the elements depends only on how they compare with each other: each number determines its position by being compared against others, and the algorithms differ in how many comparisons they make. In bubble sort, for example, the problem size is n and each of roughly n passes makes up to n comparisons, so the average time complexity is O(n²). Merge sort, quicksort, and the like use divide and conquer to cut the problem into about log n levels, so their average time complexity is O(n log n). The advantage of comparison sorting is that it applies to data of any size and any distribution; anything that needs sorting can be comparison-sorted. However, any algorithm that sorts solely by comparing pairs of values needs at least O(n log n) time.
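To make the lower-bound claim concrete, here is the standard decision-tree argument, added as a brief aside (it is not spelled out in the original text):

% A comparison sort must distinguish all n! input permutations,
% so its decision tree has at least n! leaves.
% A binary tree of height h has at most 2^h leaves, hence:
\[
2^{h} \ge n! \quad\Rightarrow\quad h \ge \log_2(n!) = \Theta(n \log n)
\]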

Since sorting an array requires looking at every element at least once, O(n) is a lower bound for all sorting algorithms. So is there any sorting algorithm with linear time complexity O(n)? Under certain conditions there is: counting sort, radix sort, and bucket sort are linear-time sorts, but they all come with "certain requirements"!

Counting sort, radix sort, and bucket sort are non-comparison sorts. A non-comparison sort requires the sorted elements to be integers within an explicit range [m, n]. The position of arr[i] in the sorted array is uniquely determined by counting how many elements precede arr[i]. Since only the number of elements smaller than each element must be determined, one traversal suffices, giving O(n) time. Non-comparison sorts have low time complexity but use extra space to determine the unique positions, so they place certain demands on data size and distribution; that is, their space complexity is high.
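As a minimal illustration of how a non-comparison sort achieves linear time when the values are integers in a known small range, here is a hedged counting sort sketch (not from the original text; counting sort is covered in detail later in the series):

import java.util.Arrays;

public class CountingSortSketch {
    public static void main(String[] args) {
        int[] arr = {3, 1, 4, 1, 5, 2, 0, 3};
        int max = 5; // assumed known upper bound of the value range [0, max]
        int[] count = new int[max + 1];
        // One pass to count occurrences of each value: O(n)
        for (int v : arr) {
            count[v]++;
        }
        // One pass over the range to write values back in order: O(n + max)
        int idx = 0;
        for (int v = 0; v <= max; v++) {
            while (count[v]-- > 0) {
                arr[idx++] = v;
            }
        }
        System.out.println(Arrays.toString(arr)); // [0, 1, 1, 2, 3, 3, 4, 5]
    }
}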

2 Comparison sort

2.1 Bubble Sort

2.1.1 Implementation of bubble sort

In any language, bubble sort is one of the simplest sorting algorithms and one of the first every beginner must know.

Bubble sort principle: compare each number with the one after it; if the former is larger than the latter, swap them. After the first pass, the maximum value sits at the end. The second pass then selects the second-largest value and places it at the second-to-last position, and so on, until the remaining prefix is in order and the sort is complete.

The outer loop controls the number of passes, and the inner loop compares pairs of adjacent numbers.

Bubble sort moves small elements forward and large elements backward. Each comparison involves two adjacent elements, and swaps only happen between them. So if two elements are equal they are not swapped, and if two equal elements are not adjacent, they still will not be swapped even when earlier pairwise swaps eventually make them adjacent. The relative order of equal elements therefore never changes, which makes bubble sort a stable sorting algorithm.

import java.util.Arrays;

public class BubbleSort {
    public static void main(String[] args) {
        int[] arr = new int[]{9, 8, 5, 3, 1, 7};
        // The outer loop controls the number of passes
        // The first pass makes five comparisons, then the maximum, 9, sits at the end
        // The second pass makes only four comparisons and moves the second-largest value, 8, to the second-to-last position
        // The n-th pass needs arr.length - n comparisons
        // The last pass needs one comparison
        for (int i = 0; i < arr.length - 1; i++) {
            // The inner loop compares pairs of adjacent numbers
            for (int j = 0; j < arr.length - 1 - i; j++) {
                // If the former number is greater than the latter, swap them (XOR swap; safe here because j != j + 1)
                if (arr[j] > arr[j + 1]) {
                    arr[j] = arr[j] ^ arr[j + 1];
                    arr[j + 1] = arr[j] ^ arr[j + 1];
                    arr[j] = arr[j] ^ arr[j + 1];
                }
            }
        }
        System.out.println(Arrays.toString(arr));
    }
}

2.1.2 Complexity analysis of bubble sort

Suppose there are n numbers to sort. In bubble sort, the first pass needs n-1 comparisons, the second pass n-2 comparisons, and so on, with the (n-1)-th pass needing one comparison. The total number of comparisons is therefore (n-1) + (n-2) + ... + 1 = n(n-1)/2 ≈ n²/2. This count is fixed and has nothing to do with the order of the input data. The time complexity of bubble sort is therefore O(n²).

However, the number of swaps does depend on the order of the input data. In the extreme case where the input is already sorted from smallest to largest, no swaps are needed at all. Conversely, if the input is sorted from largest to smallest, every comparison triggers a swap. That is, the number of element swaps is at most O(n²).

Bubble sort does not use external auxiliary variables or data structures, so the space complexity is O(1).
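Building on the observation that sorted input needs no swaps, a common refinement (a hedged sketch, not part of the original code) tracks whether a pass performed any swap and stops early when it did not, so already-sorted input finishes in a single O(n) pass:

import java.util.Arrays;

public class BubbleSortEarlyExit {
    public static void main(String[] args) {
        int[] arr = {1, 3, 5, 7, 8, 9}; // already sorted: only one pass is needed
        for (int i = 0; i < arr.length - 1; i++) {
            boolean swapped = false;
            for (int j = 0; j < arr.length - 1 - i; j++) {
                if (arr[j] > arr[j + 1]) {
                    int t = arr[j]; arr[j] = arr[j + 1]; arr[j + 1] = t;
                    swapped = true;
                }
            }
            if (!swapped) {
                break; // no swap in this pass: the array is already sorted
            }
        }
        System.out.println(Arrays.toString(arr));
    }
}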

2.2 Selection Sort

2.2.1 Implementation of selection sort

Selection sort principle: in the first pass, take the first element (index i) and record its index in a temporary variable min = i; compare arr[min] in turn with each following element, updating min to the index of the smaller value whenever one is found. After the inner loop, if min is no longer equal to i, swap arr[min] with arr[i]; this finds the smallest number in the array and places it first at the end of the pass. The next pass compares the second element with those after it, and so on, until the second-to-last element has been processed and the sort is complete.

The outer loop controls the position being filled; the inner loop scans the remaining elements for the minimum.

After a pass of selection, if a smaller element appears after an element equal to the current one, the swap breaks stability. For example, in the sequence 5, 8, 5, 2, 9, the first pass selects the first 5 and swaps it with 2, so the relative order of the two 5's is broken. Selection sort is therefore an unstable sorting algorithm.

import java.util.Arrays;

public class SelectionSort {
    public static void main(String[] args) {
        //int[] arr = new int[]{9, 8, 5, 3, 1, 7};
        int[] arr = new int[]{49, 38, 65, 97, 76, 13, 27, 49, 78};
        // The outer loop controls the index of the position being filled, from 0 to arr.length - 2
        // The first pass makes eight comparisons and moves the minimum, 13, to the first position
        // The second pass makes seven comparisons and moves the second-smallest, 27, to the second position
        // The n-th pass needs arr.length - n comparisons
        // The last pass needs one comparison
        for (int i = 0; i < arr.length - 1; i++) {
            int min = i;
            // The inner loop scans the indexes [i + 1, arr.length - 1]
            for (int j = i + 1; j < arr.length; j++) {
                // Record the index of the smallest number found so far in min
                if (arr[min] > arr[j]) {
                    min = j;
                }
            }
            // If min moved, swap arr[min] with arr[i] (XOR swap; the guard ensures min != i)
            if (min != i) {
                arr[i] = arr[i] ^ arr[min];
                arr[min] = arr[i] ^ arr[min];
                arr[i] = arr[i] ^ arr[min];
            }
        }
        System.out.println(Arrays.toString(arr));
    }
}

2.2.2 Complexity analysis of selection sort

Selection sort uses linear search to find the minimum: pass 1 compares n-1 numbers, pass 2 compares n-2, and so on, with pass n-1 comparing just one. The total number of comparisons is therefore the same as bubble sort: (n-1) + (n-2) + ... + 1 = n(n-1)/2 ≈ n²/2. The time complexity of selection sort matches bubble sort at O(n²).

Each pass performs at most one swap. If the input is already sorted from smallest to largest, no swaps are needed at all; in other words, the number of element swaps is at most O(n). When swapping elements is relatively expensive, selection sort therefore beats bubble sort.

The selection sort only relies on an external auxiliary variable, which is a constant relative to the number of input elements n, and does not rely on other data structures, so the space complexity is O(1).

2.3 Insertion Sort

2.3.1 Implementation of insertion sort

Insertion sort principle: during sorting, the data on the left is treated as already sorted while the data on the right is unsorted. The idea is to repeatedly take the first element from the unsorted region on the right and insert it into its proper position in the sorted region on the left. When the unsorted region on the right is empty, the sort is complete.

The outer loop controls the unsorted numbers on the right, and the inner loop controls the sorted numbers on the left.

import java.util.Arrays;

public class InsertionSort {
    public static void main(String[] args) {
        int[] arr = new int[]{9, 8, 5, 3, 1, 7};
        int j;
        // The outer loop walks the unsorted numbers on the right; the "first number" is treated as "already" sorted, so unsorted data starts at the second element
        for (int i = 1; i < arr.length; i++) {
            int norSort = arr[i];
            // The inner loop walks the sorted numbers on the left, starting with the largest sorted number
            // While the unsorted number is less than the sorted number arr[j], move arr[j] one position to the right
            for (j = i - 1; j >= 0 && norSort < arr[j]; j--) {
                arr[j + 1] = arr[j];
            }
            // When the unsorted number is no longer less than arr[j], arr[j + 1] is its position in the sorted data, so assign norSort to it
            arr[j + 1] = norSort;
        }
        System.out.println(Arrays.toString(arr));
    }
}

2.3.2 Sorting process analysis

This algorithm may be a little more convoluted than bubble sort and selection sort, but let’s take a look at the steps to perform the insertion sort on the array {9, 8, 5, 3, 1, 7}. We can think of an insertion sort of n numbers as an n-1 round sort. In the array above, n=6 means 5 rounds of sorting.

Unsorted: When unsorted, the leftmost number is treated as the sorted data region by default. The array structure is as follows:

Round 1: the outer loop has i = 1 and extracts the unsorted element 8. In the inner loop, j = i - 1 = 0, pointing at element 9 at index 0. Comparing them, 8 < 9, so 9 is assigned to index 0 + 1 = 1; j-- then becomes -1, failing j >= 0, and the inner loop ends. The next step assigns the unsorted element 8 to index -1 + 1 = 0. The outer iteration ends, and so does round 1. The array structure is as follows:

Round 2: the outer loop has i = 2 and extracts the unsorted element 5. In the inner loop, j = i - 1 = 1, pointing at element 9 at index 1. Comparing them, 5 < 9, so 9 is assigned to index 1 + 1 = 2; j-- becomes 0, which satisfies j >= 0, so the 8 at index 0 is compared with 5. Since 5 < 8, the second inner iteration assigns 8 to index 0 + 1 = 1; j-- becomes -1, failing j >= 0, and the inner loop ends. The next step assigns the unsorted element 5 to index -1 + 1 = 0. The outer iteration ends, and so does round 2. The array structure is as follows:

After all 5 rounds, the array is sorted:

2.3.3 Complexity analysis of insert Sort

In insertion sort, the extracted value is compared with the numbers to its left. As in the walkthrough above, if the number on the left is already smaller, there is no need to keep comparing: the round ends immediately with no moves. So if the table to be sorted is already ordered, each of the n-1 rounds needs only one comparison, giving O(n) time.

However, if the extracted number is smaller than all the sorted numbers on its left, comparisons and moves must continue until it reaches the far left of the sequence; the n-th outer round then requires n inner comparisons. In the worst case, when the input is in descending order, round 1 takes 1 operation, round 2 takes 2, and so on up to n-1, so the time complexity is O(n²), the same as bubble sort and selection sort.

When the table data is mostly ordered, insertion sort is faster than bubble sort and selection sort. Insertion sort shines when part of the data is already sorted, and the larger the sorted part, the better. As a rule of thumb, insertion sort alone is not recommended once the input size exceeds about 1000.

The space complexity of insertion sort is constant, O(1).
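A related variant worth knowing (a hedged sketch, not part of the original article) is binary insertion sort: the insertion point in the sorted left region is found with binary search, cutting comparisons to O(n log n), although the element moves remain O(n²). Searching for the first position after any equal elements keeps it stable:

import java.util.Arrays;

public class BinaryInsertionSort {
    public static void main(String[] args) {
        int[] arr = {9, 8, 5, 3, 1, 7};
        for (int i = 1; i < arr.length; i++) {
            int key = arr[i];
            // Binary search in arr[0..i-1] for the first index whose value is > key
            int lo = 0, hi = i;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (arr[mid] <= key) { // <= keeps equal elements in order (stable)
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            // Shift arr[lo..i-1] one slot right and insert the key
            System.arraycopy(arr, lo, arr, lo + 1, i - lo);
            arr[lo] = key;
        }
        System.out.println(Arrays.toString(arr));
    }
}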

So far, we have seen that bubble sort, selection sort, and insertion sort all run in O(n²); the following sections introduce sorting algorithms that break this quadratic time barrier.

2.4 Shell Sort

2.4.1 Principle and implementation of Shell sort

For a long time, people believed that despite the variety of sorting algorithms (such as the three described above), the O(n²) time complexity seemed insurmountable.

Shell sort is a sorting algorithm proposed by D. L. Shell in 1959, and one of the first to break the quadratic time barrier. Unfortunately, it took several years after its discovery to prove its subquadratic time bound.

Shell sort is an improvement based on the following two properties of insertion sort:

  1. Insertion sort is highly efficient on almost-ordered data, approaching the efficiency of a linear-time sort.
  2. However, on large and mostly unordered data, insertion sort is inefficient, because it can move data only one position at a time; its work is proportional to n², so the work is only small when n is small.

The Shell sort algorithm first groups the records to be sorted by some increment d1 < n of the subscript: records whose indexes differ by d1 fall into the same group, and each group is insertion-sorted separately. At that point, arr[i] <= arr[i + d1] for every valid i, and the whole sequence becomes "relatively ordered". It is then regrouped with a smaller increment d2 < d1 and each group is sorted again, making the records even more "relatively ordered". The grouping and sorting repeat until the increment dn shrinks to 1, when all records form one group for a final, complete insertion sort, which finishes the sort.

Because of the increment sequence d1, d2, ..., dn (dn = 1), Shell sort is also called diminishing increment sort. The method is essentially grouped insertion: numbers separated by a distance (the increment) are compared, so each move leaps across multiple elements and a single comparison can eliminate more than one inversion. Each round works on small groups, and each round leaves the data more "relatively ordered", so the overall efficiency is higher than direct insertion sort.

Shell sort performs multiple insertion sorts. A single insertion sort is stable and does not change the relative order of equal elements, but during the different grouped insertion sorts, equal elements may move within their own groups, so their relative order can end up disturbed. Shell sort is therefore unstable.

One implementation of Shell sort is as follows:

import java.util.Arrays;

public class ShellSort {
    public static void main(String[] args) {
        int[] arr = new int[]{49, 38, 65, 97, 76, 13, 27, 49, 78, 34, 12, 64, 1};
        int j;
        // Shell increment: the initial gap is arr.length / 2, halved each round (gap /= 2) until it reaches 0, which ends the sort
        for (int gap = arr.length / 2; gap > 0; gap /= 2) {
            /* Insertion sort is applied within each group, but unlike traditional insertion sort it moves data by leaps of size gap */
            // The outer loop walks the unsorted numbers of a group; again the "first number" of each group is "already" sorted, so unsorted data starts at index 0 + gap
            for (int i = gap; i < arr.length; i++) {
                int norSort = arr[i];
                // The inner loop walks the sorted numbers of the group, starting with the largest sorted one
                // While the unsorted number is less than the sorted number arr[j], move arr[j] right by gap positions
                for (j = i - gap; j >= 0 && norSort < arr[j]; j -= gap) {
                    arr[j + gap] = arr[j];
                }
                // When the unsorted number is no longer less than arr[j], arr[j + gap] is its insertion position within the group
                arr[j + gap] = norSort;
            }
        }
        System.out.println(Arrays.toString(arr));
    }
}

2.4.2 Sorting process analysis

Shell sort is an improvement on insertion sort; the sorting process is described in detail below.

The ungrouped data is shown below.

Round 1: the initial Shell increment is arr.length / 2 = 4. Insertion sort is then performed within each group formed with gap 4. The structure after grouping is shown below:

After inserting and sorting each group, the structure is as follows:

As you can see, after grouping, each group contains few elements, so insertion sort within a group is efficient. Although the elements are not fully ordered after the first round, smaller elements have moved forward, not step by step but in leaps, leaving the data more "relatively ordered" than at the start, which speeds up the second round of Shell sorting.

The second round:

The Shell increment is 4 / 2 = 2. Insertion sort is then performed within each group formed with gap 2. The structure after grouping is shown below:

After inserting and sorting each group, the structure is as follows:

As you can see, the elements are still not completely ordered after the second round, but the data has become "more ordered" than after the first round, which speeds up the third round of Shell sorting.

The third round:

The Shell increment is 2 / 2 = 1, meaning this is the final round; after it the elements will be fully sorted. With gap 1, insertion sort runs over the data as one group. The structure after grouping is shown below:

As you can see, eventually all the elements are grouped into a group for the total insertion sort. After sorting, the structure is as follows:

As you can see, because the data is already nearly ordered after the first two rounds of Shell sorting, the final insertion sort needs very few moves to finish, much faster than direct insertion sort on the initial data. In fact, only 4 inner-loop moves are needed:

2.4.3 Complexity analysis of Hill sort

As the case above shows, the key to Shell sort is not sorting at random but forming subsequences of records separated by an "increment", achieving leap-style moves that raise the sorting efficiency. The choice of "increment" is critical.

There are two common types of delta sequences, Shell delta sequences and Hibbard delta sequences.

The recursion formula of the Shell increment sequence is: D1 = ⌊N/2⌋, Dk+1 = ⌊Dk/2⌋, i.e. N/2, N/4, ..., 1.

The worst-case time complexity with the Shell increment sequence is O(N²).

The recursion formula of the Hibbard increment sequence is: Dk = 2^k - 1, i.e. 1, 3, 7, 15, ...

The worst-case time complexity with the Hibbard increment sequence is O(N^(3/2)), and the average time complexity is about O(N^(5/4)), better than the O(N²) of direct insertion sort. With such sequences, Shell sort breaks the quadratic time barrier.

Note that the last increment of the sequence must equal 1. Also, because records move in leaps, Shell sort is not a stable sorting algorithm. Its time complexity is tied directly to the increment sequence: with a poorly chosen sequence, it can barely beat direct insertion sort.
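For illustration, here is a hedged sketch of the earlier implementation adapted to the Hibbard increments 1, 3, 7, 15, ... described above; the only changes are how the initial gap is generated and how it is shrunk:

import java.util.Arrays;

public class ShellSortHibbard {
    public static void main(String[] args) {
        int[] arr = {49, 38, 65, 97, 76, 13, 27, 49, 78, 34, 12, 64, 1};
        // Largest Hibbard increment 2^k - 1 that is smaller than the array length
        int gap = 1;
        while (gap * 2 + 1 < arr.length) {
            gap = gap * 2 + 1;
        }
        // Step down through the Hibbard sequence: ..., 15, 7, 3, 1
        for (; gap > 0; gap = (gap - 1) / 2) {
            // Gapped insertion sort, identical in shape to the Shell-increment version
            for (int i = gap; i < arr.length; i++) {
                int norSort = arr[i];
                int j;
                for (j = i - gap; j >= 0 && norSort < arr[j]; j -= gap) {
                    arr[j + gap] = arr[j];
                }
                arr[j + gap] = norSort;
            }
        }
        System.out.println(Arrays.toString(arr));
    }
}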

Shell sort does not use external auxiliary variables or data structures, so its space complexity is O(1).

2.5 Heap Sort

2.5.1 Principle and implementation of heap sort

Simple selection sort selects the smallest of the n records to be sorted through n-1 comparisons per pass. Unfortunately, it never reuses the results of those comparisons: many comparisons in a later pass repeat work already done in earlier passes, because the earlier results were not saved, so the total number of comparisons is relatively large.

If the result of each comparison could be remembered while selecting the smallest record, and later selections could adjust the remaining records using those results, the overall sorting efficiency would be much higher. Heap sort is exactly such an improvement on simple selection sort, and the improvement is dramatic. The heap sort algorithm was invented by Floyd and Williams in 1964, together with the "heap" data structure itself. Heaps are tree structures, so some basics are needed; if you are not familiar with binary trees, see this column: Trees.

A heap is a binary tree with the following properties:

  1. It is a complete binary tree; an important property of complete binary trees is that their nodes map perfectly onto an array.
  2. Either every node's value is greater than or equal to the values of its left and right children, called a max-heap (big-top heap), shown below on the left; or every node's value is less than or equal to the values of its children, called a min-heap (small-top heap), shown on the right.

If the nodes are numbered from 1 in level order, then thanks to the natural structure of a complete binary tree (a basic property of binary trees), the nodes satisfy:

Max-heap: k[i] >= k[2i] && k[i] >= k[2i+1] (1 <= i <= n/2)

Min-heap: k[i] <= k[2i] && k[i] <= k[2i+1] (1 <= i <= n/2)

If the nodes are numbered from 0 and mapped into an array, the relationship between nodes is as follows:

Max-heap: arr[i] >= arr[2i+1] && arr[i] >= arr[2i+2] (0 <= i <= n/2 - 1)

Min-heap: arr[i] <= arr[2i+1] && arr[i] <= arr[2i+2] (0 <= i <= n/2 - 1)

The max-heap maps to an array structure:

The min-heap maps to an array structure:

Here n is the length of the array, and n/2 - 1 is exactly the index of the last non-leaf node of the array.
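A small sketch of the 0-based index arithmetic behind this mapping (the helper names are illustrative, not from the original code):

public class HeapIndexDemo {
    // For a complete binary tree stored in an array with 0-based indexes:
    static int parent(int i) { return (i - 1) / 2; }
    static int leftChild(int i) { return 2 * i + 1; }
    static int rightChild(int i) { return 2 * i + 2; }
    static int lastNonLeaf(int n) { return n / 2 - 1; } // n = array length

    public static void main(String[] args) {
        int n = 9; // e.g. the 9-element array used below
        System.out.println("last non-leaf index: " + lastNonLeaf(n)); // 3
        System.out.println("children of 1: " + leftChild(1) + ", " + rightChild(1)); // 3, 4
        System.out.println("parent of 8: " + parent(8)); // 3
    }
}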


The principle of heap sort: first build the unordered sequence of n values into a max-heap or min-heap (ascending order generally uses a max-heap, descending order a min-heap). Swap the top element with the last element, so the last element holds the maximum (or minimum), and remove it from the heap. Re-adjust the remaining n-1 elements into a max/min-heap, swap top and last again, remove the last element, and obtain the second-largest (or second-smallest) value. Repeat until only one element remains in the heap; the heap sort is then done and the sequence is ordered.

Heap sort is also an unstable sorting algorithm, because records are compared and exchanged in leaps. In addition, the heap must be built no matter how few numbers there are, and the initial build requires many comparisons, so heap sort is not well suited to small sequences.

An implementation of heap sort (both the max-heap and min-heap versions) is as follows:

import java.util.Arrays;

public class HeapSort {
    public static void main(String[] args) {
        int[] arr = new int[]{49, 38, 65, 97, 76, 13, 27, 49, 78};
        // The max-heap sort algorithm (ascending order)
        bigHeapSort(arr);
        System.out.println(Arrays.toString(arr));
        // The min-heap sort algorithm (descending order)
        // As you can see, the max-heap and min-heap versions are almost identical
        smallHeapSort(arr);
        System.out.println(Arrays.toString(arr));
    }

    /**
     * Max-heap sort (ascending order)
     *
     * @param arr the data to sort
     */
    private static void bigHeapSort(int[] arr) {
        /* 1. Build the max-heap */
        /* i starts at the index of the last non-leaf node and works down to 0; since element indexes start at 0, the last non-leaf node is arr.length / 2 - 1, a property of complete binary trees */
        for (int i = arr.length / 2 - 1; i >= 0; i--) {
            buildBigHeap(arr, i, arr.length);
        }
        /* 2. Start heap sort: i = arr.length - 1 is the last element of the heap; the loop ends when i reaches 0 */
        for (int i = arr.length - 1; i > 0; i--) {
            // Swap the top and the end of the heap
            swap(arr, 0, i);
            // Rebuild the max-heap; the heap is now one element smaller
            buildBigHeap(arr, 0, i);
        }
    }

    /**
     * Build the max-heap
     *
     * @param arr    the array
     * @param i      index of a non-leaf node
     * @param length the heap length
     */
    private static void buildBigHeap(int[] arr, int i, int length) {
        // Save the current non-leaf element first, because it may keep moving down
        int temp;
        // Index of the node's (larger) child
        int childIndex;
        /* The loop checks whether the parent is at least as large as both children; it ends when the left child index reaches the heap length or the parent is no smaller than both children */
        for (temp = arr[i]; (childIndex = 2 * i + 1) < length; i = childIndex) {
            // childIndex + 1 < length means the node has a right child; if the right child is larger than the left, childIndex moves to it
            if (childIndex + 1 < length && arr[childIndex] < arr[childIndex + 1]) {
                childIndex++;
            }
            // If the larger child is bigger than the saved root value, swap so the max-heap root stays larger than its children
            // Since the child changed, the subtree rooted at that child may be affected, so the loop continues into it
            if (arr[childIndex] > temp) {
                swap(arr, i, childIndex);
            } else {
                // The parent is no smaller than the larger child, which satisfies the max-heap condition, so stop
                break;
            }
        }
    }

    /**
     * Min-heap sort (descending order)
     *
     * @param arr the data to sort
     */
    private static void smallHeapSort(int[] arr) {
        /* 1. Build the min-heap */
        for (int i = arr.length / 2 - 1; i >= 0; i--) {
            buildSmallHeap(arr, i, arr.length);
        }
        /* 2. Start heap sort, from the last element of the min-heap, until i = 0 ends the loop */
        for (int i = arr.length - 1; i > 0; i--) {
            // Swap the top and the end of the heap
            swap(arr, 0, i);
            // Rebuild the min-heap; the heap is one element smaller than before the swap
            buildSmallHeap(arr, 0, i);
        }
    }

    /**
     * Build the min-heap
     *
     * @param arr    the array
     * @param i      index of a non-leaf node
     * @param length the heap length
     */
    private static void buildSmallHeap(int[] arr, int i, int length) {
        int temp;
        int childIndex;
        /* The loop ends when the left child index reaches the heap length or the parent is no larger than both children */
        for (temp = arr[i]; (childIndex = 2 * i + 1) < length; i = childIndex) {
            // If the right child exists and is smaller than the left child, childIndex points at the right child
            if (childIndex + 1 < length && arr[childIndex] > arr[childIndex + 1]) {
                childIndex++;
            }
            // If the smaller child is below the saved root value, swap so the min-heap root stays smaller than its children, and continue into that subtree
            if (arr[childIndex] < temp) {
                swap(arr, i, childIndex);
            } else {
                // The parent is no larger than the smaller child, which satisfies the min-heap condition, so stop
                break;
            }
        }
    }

    /**
     * Swap two elements (XOR swap; only safe because a != b at every call site)
     *
     * @param arr the array
     * @param a   index of the first element
     * @param b   index of the second element
     */
    private static void swap(int[] arr, int a, int b) {
        arr[a] = arr[a] ^ arr[b];
        arr[b] = arr[a] ^ arr[b];
        arr[a] = arr[a] ^ arr[b];
    }
}

2.5.2 Sorting process analysis

2.5.2.1 Building the heap

In the first big loop the algorithm builds the heap: i starts at the index arr.length / 2 - 1 = 3 and runs 3 → 2 → 1 → 0, which are exactly the non-leaf nodes.

The following diagram shows the logical structure of an array mapped to a complete binary tree:

Building the sequence to be sorted into a heap means working bottom-up and right-to-left: each non-leaf node is treated as a root and, together with its subtree, adjusted into a heap.

Let's look at the first step, building the heap, using the max-heap as an example.

// Build the max-heap
for (int i = arr.length / 2 - 1; i >= 0; i--) {
    buildBigHeap(arr, i, arr.length);
}

private static void buildBigHeap(int[] arr, int i, int length) {
    // Save the current non-leaf element first, because it may keep moving down
    int temp;
    // Index of the node's (larger) child
    int childIndex;
    /* The loop checks whether the parent is at least as large as both children; it ends when the left child index reaches the heap length or the parent is no smaller than both children */
    for (temp = arr[i]; (childIndex = 2 * i + 1) < length; i = childIndex) {
        // If the right child exists and is larger than the left child, childIndex moves to the right child
        if (childIndex + 1 < length && arr[childIndex] < arr[childIndex + 1]) {
            childIndex++;
        }
        // If the larger child is bigger than the saved root value, swap so the max-heap root stays larger than its children
        // Since the child changed, the subtree rooted at that child may be affected, so the loop continues into it
        if (arr[childIndex] > temp) {
            swap(arr, i, childIndex);
        } else {
            // The parent is no smaller than the larger child, so the max-heap condition holds and the loop stops
            break;
        }
    }
}

First loop:

i = 3, temp = arr[3] = 97. The left child index is childIndex = 2 * i + 1 = 7; since the right child arr[8] = 78 is larger than the left child arr[7] = 49, childIndex becomes 8.

It then checks whether the parent's value is at least that of the larger child. arr[3] = 97 > arr[8] = 78, so no swap is needed; the loop breaks immediately and the next outer iteration begins. The array is unchanged.

Second loop:

i = 2, temp = arr[2] = 65. The left child index is childIndex = 2 * i + 1 = 5; since the right child arr[6] = 27 is larger than the left child arr[5] = 13, childIndex becomes 6.

arr[2] = 65 > arr[6] = 27, so again no swap is needed; the loop breaks and the next outer iteration begins. The array is unchanged.

Third loop:

i = 1, temp = arr[1] = 38. The left child index is childIndex = 2 * i + 1 = 3; the left child arr[3] = 97 is larger than the right child arr[4] = 76, so childIndex stays 3.

It then checks whether the parent is at least as large as the larger child. arr[1] = 38 < arr[3] = 97, so they must be swapped, and the array becomes:

Because the child changed, the subtree rooted at it may be affected, so the loop continues there after the swap: i = 3, childIndex = 2 * i + 1 = 7; since the right child arr[8] = 78 is larger than the left child arr[7] = 49, childIndex becomes 8.

arr[3] = 38 < arr[8] = 78, so they are swapped again, and the array becomes:

The loop continues: i = 8, so the left child index childIndex = 2 * i + 1 = 17, which is not less than the length 9, so the third loop ends.

Fourth loop:

i = 0, temp = arr[0] = 49. The left child index is childIndex = 2 * i + 1 = 1; the left child arr[1] = 97 is larger than the right child arr[2] = 65, so childIndex stays 1.

It then checks whether the parent is at least as large as the larger child. arr[0] = 49 < arr[1] = 97, so they must be swapped, and the array becomes:

Because the child changed, the subtree rooted at it may be affected, so the loop continues there after the swap: i = 1, childIndex = 2 * i + 1 = 3; the left child arr[3] = 78 is larger than the right child arr[4] = 76, so childIndex stays 3.

arr[1] = 49 < arr[3] = 78, so they are swapped again, and the array becomes:

The loop continues into the affected subtree: i = 3, childIndex = 2 * i + 1 = 7; the left child arr[7] = 49 is no smaller than the right child arr[8] = 38, so childIndex stays 7.

Now arr[3] = 49 equals arr[7] = 49, so the comparison arr[childIndex] > temp is false; no swap is needed, and the loop breaks. The array is unchanged.

The outer loop then decrements i to -1, i >= 0 fails, the first big loop ends, and the max-heap is built.

Mapped to a complete binary tree, the structure is as follows:

As you can see, this complete binary tree satisfies the max-heap property exactly: every parent node is greater than or equal to its children.

All that's left is the second part: the repeated swap-and-rebuild.

2.5.2.2 Swap and rebuild

/* 2. Start heap sort: i = arr.length - 1 is the last element of the heap; the loop ends when i reaches 0 */
for (int i = arr.length - 1; i > 0; i--) {
    // Swap the top and the end of the heap
    swap(arr, 0, i);
    // Rebuild the max-heap
    buildBigHeap(arr, 0, i);
}

With the first part understood, the second part should be easy to follow.

The element at the top of the heap (index 0) is swapped with the element at the end of the heap, and that last element is then "kicked out" of the heap. Because the top element has changed, the heap must be rebuilt, but not with the full loop from the first part: only the subtree affected by the new top element needs adjusting, so buildBigHeap is called with i = 0. Since the end of the heap has been removed, the length passed in becomes arr.length - 1 = 8. The positions of the elements in the array are as follows:

Red represents the heap elements and black the elements that were "kicked out" of the heap. Then the heap rebuild begins.

i = 0, temp = arr[0] = 38. The left child index is childIndex = 2 * i + 1 = 1; the left child arr[1] = 78 is larger than the right child arr[2] = 65, so childIndex stays 1.

It then checks whether the parent is at least as large as the larger child. arr[0] = 38 < arr[1] = 78, so they must be swapped, and the array becomes:

Because the child changed, the subtree rooted at it may be affected, so the loop continues there after the swap: i = 1, childIndex = 2 * i + 1 = 3; since arr[4] = 76 is larger than arr[3] = 49, childIndex becomes 4.

arr[1] = 38 < arr[4] = 76, so they are swapped again, and the array becomes:

The loop continues: i = 4, childIndex = 2 * i + 1 = 9, which is not less than the heap length 8, so the rebuild ends. The logical structure of the heap (a complete binary tree) is now:

Here the red nodes represent the nodes that were "kicked out" of the heap, and you can see that the adjusted tree again fully satisfies the max-heap property.

All that remains is to repeat the loop until finally i = 0: eight elements have been "kicked out" of the heap and only the smallest element remains, at which point the loop ends and the heap sort is complete. The array structure is as follows:

Notice that when an element leaves the heap, only the subtree affected by the element moved to the top needs to be rebuilt; no other data is touched and no redundant comparisons are made, because the ordering information gathered while building the heap is preserved. That is why heap sort is fast.

2.5.3 Complexity analysis of heap sort

The running time of heap sort is mainly spent on the initial construction of the heap and on the repeated sift-downs while rebuilding the heap during sorting.

Building the heap:

For a heap of depth k, construction starts at the rightmost node of the second-to-last level: every node on that level compares itself with its children and swaps if needed (no swap if already in order). On the third-to-last level, a node again compares with its children and swaps; if a swap happened, it may have to continue comparing and swapping down one subtree, and so on up the tree.

So the work contributed by level i is s = 2^(i-1) * (k-i), where i is the level number, 2^(i-1) is the number of nodes on that level, and (k-i) is the number of levels a node may sift down; each step costs a constant number of comparisons and a possible swap, and the constant can be factored out and ignored.

Summing over the levels: S = 2^(k-2) * 1 + 2^(k-3) * 2 + ... + 2^1 * (k-2) + 2^0 * (k-1); the leaf level needs no sifting, so i runs from k-1 down to 1.

This sums to S = 2^k - k - 1. Since k is the depth of the complete binary tree, k = log₂(n) + 1; substituting gives S ≈ 2n - log₂(n) - 2, so building the heap takes O(n) time.
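Written as a closed-form sum in standard notation (a restatement of the arithmetic above):

% Total sift-down work when building a heap of depth k (leaves excluded):
\[
S = \sum_{i=1}^{k-1} 2^{\,i-1}(k-i) = 2^{k} - k - 1
\]
% With k = \lfloor \log_2 n \rfloor + 1 this gives:
\[
S \approx 2n - \log_2 n - 2 = O(n)
\]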

Rebuilding the heap during sorting:

Each rebuild means one node leaves the heap, so the heap size shrinks by one. The rebuild function costs k = log₂(n), where k is the number of heap levels; as the heap shrinks, the level count and hence the cost per rebuild decrease. In total, n-1 rebuild cycles are needed, and the i-th cycle makes about log(i) comparisons, adding up to log 2 + log 3 + ... + log(n-1) + log n ≈ log(n!). Since log(n!) is of the same order as n log n, this phase is O(n log n).

Overall, then, the time complexity of heap sort is O(n + n log n) = O(n log n). Since heap sort is insensitive to the initial order of the records, its best, worst, and average time complexity are all O(n log n). This is clearly far better than the O(n²) of bubble sort, simple selection sort, and direct insertion sort. Heap sort is somewhat harder to implement, however, because of the heap's comparatively complex structure (the example above uses a logical heap; the physical structure is still an array).

In terms of space, heap sort does not really depend on a physical heap structure: it operates on the original array, so its space complexity is constant, O(1). Note that some heap sort implementations use additional data structures, which greatly increases the space complexity; the in-place algorithm shown here is the better choice.

2.6 Merge Sort

2.6.1 Principle and implementation of merge sort

Merge Sort is an efficient sorting algorithm built on the merge operation and a typical application of Divide and Conquer: the divide phase splits the problem into small subproblems and solves them recursively, while the conquer phase "merges" the answers produced by the divide phase.

Merge sort principle: given an initial sequence of n records, first split the sequence into two subsequences of (roughly) equal length, then keep splitting each subsequence the same way (via recursion) until n trivially ordered subsequences of length 1 remain. Then merge them pairwise to obtain ⌈n/2⌉ (the smallest integer not less than n/2) ordered subsequences of length 2 or 1; merge pairwise again, and repeat (as the recursion unwinds) until one ordered sequence of length n is obtained. This is also called 2-way merge sort.

Merge sort is a stable sorting algorithm (the order of equal elements does not change), and in speed it is second only to quicksort.

One merge sort implementation is as follows:

import java.util.Arrays;

public class MergeSort {

    public static void main(String[] args) {
        int[] arr = new int[]{49, 38, 65, 97, 76, 13, 27, 49, 78};
        mergeSort(arr, new int[arr.length], 0, arr.length - 1);
        System.out.println(Arrays.toString(arr));
    }

    /**
     * Split and merge
     *
     * @param arr      the array to sort
     * @param aidedArr the auxiliary array
     * @param l        start index of the part to split
     * @param r        end index of the part to split
     */
    private static void mergeSort(int[] arr, int[] aidedArr, int l, int r) {
        // If the (sub)array has two or more elements, split it; otherwise return from the recursion
        if (l < r) {
            /* 1. Split */
            // The midpoint of the split becomes the end index of the left subarray
            int mid = (r + l) / 2;
            // Split off the left subarray, then recursively keep splitting it until only one-element, trivially "ordered" subarrays remain
            mergeSort(arr, aidedArr, l, mid);
            // Split off the right subarray and recursively keep splitting it the same way
            mergeSort(arr, aidedArr, mid + 1, r);
            /* 2. Sort and merge the two halves */
            merge(arr, aidedArr, l, mid, mid + 1, r);
        }
    }

    /**
     * Sort-merge
     *
     * @param arr        the array to sort
     * @param aidedArr   the auxiliary array
     * @param leftStart  start index of the left subarray
     * @param leftEnd    end index of the left subarray
     * @param rightStart start index of the right subarray
     * @param rightEnd   end index of the right subarray
     */
    private static void merge(int[] arr, int[] aidedArr, int leftStart, int leftEnd, int rightStart, int rightEnd) {
        // The write index m of the auxiliary array starts at leftStart; numElements is the combined size of the two adjacent subarrays, used later for the copy-back
        int m = leftStart, numElements = rightEnd - leftStart + 1;
        // While both subarrays still have elements, compare their current heads and append the smaller one to the auxiliary array
        // The write position then advances by 1, and the loop continues until one subarray is exhausted
        while (leftStart <= leftEnd && rightStart <= rightEnd) {
            aidedArr[m++] = arr[leftStart] <= arr[rightStart] ? arr[leftStart++] : arr[rightStart++];
        }
        // If the left subarray still has elements, the loop above did not finish adding them
        while (leftStart <= leftEnd) {
            aidedArr[m++] = arr[leftStart++];
        }
        // If the right subarray still has elements, the loop above did not finish adding them
        while (rightStart <= rightEnd) {
            aidedArr[m++] = arr[rightStart++];
        }
        // Finally copy the merged segment back to the same indexes of the original array; this step keeps the original array sorted segment by segment
        for (int j = 0; j < numElements; j++, rightEnd--) {
            arr[rightEnd] = aidedArr[rightEnd];
        }
    }
}

2.6.2 Sorting process analysis

As you can see from the code, the actual merge sort is not too difficult, but the important thing is to understand the divide-and-conquer idea and how to implement it cleverly with the help of recursion.

2.6.2.1 Splitting

/* 1. Split */
// The midpoint of the split becomes the end index of the left subarray
int mid = (r + l) / 2;
// Split off the left subarray, then recursively keep splitting it until only one-element, trivially "ordered" subarrays remain
mergeSort(arr, aidedArr, l, mid);
// Split off the right subarray and recursively keep splitting it the same way
mergeSort(arr, aidedArr, mid + 1, r);

The splitting part is only the few lines above, but it uses recursion: as long as the subarray has at least two elements, it is split into two subarrays, and the left and right subarrays are recursively split in turn. The recursion's return condition is the check at the start: l < r. If l is not less than r, the subarray has only one element, the body of the if is skipped, and the recursive call returns; at that point the whole sequence has been split into one-element subarrays. After each recursive call returns, the merge method performs the sort-merge.

The following figure shows the process of splitting, which is 8 times in 4 rounds:

2.6.2.2 Sort-merge

/**
 * Sort-merge
 *
 * @param arr        the array to sort
 * @param aidedArr   the auxiliary array
 * @param leftStart  start index of the left subarray
 * @param leftEnd    end index of the left subarray
 * @param rightStart start index of the right subarray
 * @param rightEnd   end index of the right subarray
 */
private static void merge(int[] arr, int[] aidedArr, int leftStart, int leftEnd, int rightStart, int rightEnd) {
    // The write index m of the auxiliary array starts at leftStart; numElements is the combined size of the two adjacent subarrays, used later for the copy-back
    int m = leftStart, numElements = rightEnd - leftStart + 1;
    // While both subarrays still have elements, compare their current heads and append the smaller one to the auxiliary array
    // The write position then advances by 1, and the loop continues until one subarray is exhausted
    while (leftStart <= leftEnd && rightStart <= rightEnd) {
        aidedArr[m++] = arr[leftStart] <= arr[rightStart] ? arr[leftStart++] : arr[rightStart++];
    }
    // If the left subarray still has elements, the loop above did not finish adding them
    while (leftStart <= leftEnd) {
        aidedArr[m++] = arr[leftStart++];
    }
    // If the right subarray still has elements, the loop above did not finish adding them
    while (rightStart <= rightEnd) {
        aidedArr[m++] = arr[rightStart++];
    }
    // Finally copy the merged segment back to the same indexes of the original array; this step keeps the original array sorted segment by segment
    for (int j = 0; j < numElements; j++, rightEnd--) {
        arr[rightEnd] = aidedArr[rightEnd];
    }
}

The code above is straightforward: it compares the elements of the two subarrays pairwise, writing them from smallest to largest into the corresponding index range of the auxiliary array; once that segment is sorted, the elements are copied back to the same indexes of the original array, completing one merge, and the next sort-merge begins. Each merge works on the results of earlier splits, exploiting the fact that after a recursive call returns, execution continues in the caller, layer by layer. Let's break this down step by step.

Because of the recursion, the method that returns first performs its merge first. Combining the split diagram above with the code (the left subarray is recursively split before the right), the deepest split returns first, and within a level the left subarray's call returns first. The sort-merge process and order are therefore as follows:

2.6.3 Complexity analysis of merge Sort

Time complexity:

For a sequence of n elements, each round of merging scans all n elements once, costing O(n). The recursion depth is ⌈log₂ n⌉, so the whole merge sort goes through about log n rounds, for a total time complexity of O(n log n). Merging compares elements pairwise with no jumps, so merge sort is a stable sorting algorithm, and O(n log n) is its best, worst, and average time performance.
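The same bound follows from the standard recurrence (a restatement, assuming for simplicity that n is a power of two):

% Each call splits into two halves and merges in linear time:
\[
T(n) = 2\,T(n/2) + cn, \qquad T(1) = c
\]
% Unrolling gives log2(n) levels, each costing cn:
\[
T(n) = cn\log_2 n + cn = O(n \log n)
\]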

Space complexity:

The space used by merge sort is the temporary array plus the recursion stack: n + log n, so the space complexity is O(n). Merge sort is stable and efficient in time, but it does consume extra space.

2.7 Quick Sort

2.7.1 Principles of quicksort

Shell sort is an upgrade of direct insertion sort and belongs to the insertion class; heap sort is an upgrade of simple selection sort and belongs to the selection class. Quicksort is an upgrade of what we considered the slowest algorithm, bubble sort; both belong to the exchange class. Quicksort was proposed by C. A. R. Hoare in 1960. Compared with bubble sort, quicksort increases the distance records move per comparison: records with larger keys move directly from the front toward the back, and records with smaller keys move directly from the back toward the front, instead of swapping only adjacent pairs as bubble sort does, so the total number of comparisons and swaps is smaller and the speed naturally higher.

The primitive-type sort in the Java source code uses quicksort, which suits primitive types well because their comparison and data-movement costs are balanced. There are many concrete quicksort implementations, broadly divided into single-pivot quicksort (as described above, with one pivot) and dual-pivot quicksort (the quicksort in JDK 1.8 is a dual-pivot variant, with two pivots).

Since JDK 1.7, sorting of objects (via Comparator or Comparable) in the Java source code uses TimSort, which is based on merge sort. Comparing object types is relatively expensive, and merge sort makes the fewest comparisons of all the popular algorithms, at the cost of relatively more data movement (in Java only object references are moved, not the objects themselves, so that movement is cheap).

TimSort is an upgrade of merge sort blended with insertion sort. It is a stable, adaptive, iterative merge sort: on partially sorted arrays it recognizes the already-sorted runs and needs far fewer than O(n log n) comparisons, while on random arrays it costs about the same as traditional merge sort. The Java implementation is based on the version Tim Peters originally wrote for Python in 2002: the Python implementation of TimSort. TimSort is not one of the ten classic sorting algorithms, but it is faster than them; such modern, faster algorithms will be covered separately in a later article.

The principle of quicksort: one pass of partitioning splits the data to be sorted into two independent parts such that every element of one part is smaller than every element of the other (pick a pivot; one part holds the values smaller than the pivot, the other the values larger), which at the same time puts the pivot into its correct place in the array. The same method is then applied to each of the two parts, until the whole sequence is ordered. As you can see, this is again the divide-and-conquer idea, and the sorting can proceed recursively.

Quicksort is an unstable sorting algorithm, that is, the relative positions of multiple identical values may change at the end of the algorithm.

2.7.2 Optimization of quicksort

Before providing the implementation, let’s explain a few problems and how to solve them.

2.7.2.1 How to select a reference value

Each partition needs a reference value (pivot). If the chosen pivot lands in the middle of the sequence, the two subsequences are each about half the original length, and the running time of quicksort is O(nlogn), the same as merge sort.

A common poor choice is to use the first element as the pivot. If the input is truly random this works fine, but if the input is in ascending or descending order, each partition produces one subsequence that is only one element shorter than before, while the other is empty. The recursion then touches every element, and the time degrades to O(n^2). In practice the input is rarely truly random; it is usually partially ordered. So arbitrarily taking a fixed element as the pivot is inadvisable.

Another option is to pick a random element of the sequence as the pivot. This is perfectly correct and, to some extent, removes the performance bottleneck of quicksort on basically ordered sequences. However, random numbers are usually expensive to generate (for example, Java's Random, which is only pseudo-random anyway), so the method is correct but not very practical.
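For completeness, the random-pivot idea looks like the following (a hypothetical sketch, not the approach used by the implementations later in this article):

import java.util.concurrent.ThreadLocalRandom;

public class RandomPivot {
    // Swap a randomly chosen element into position left; the usual partition
    // can then proceed with arr[left] as the pivot.
    static void chooseRandomPivot(int[] arr, int left, int right) {
        int pivotIndex = ThreadLocalRandom.current().nextInt(left, right + 1);
        int tmp = arr[left];
        arr[left] = arr[pivotIndex];
        arr[pivotIndex] = tmp;
    }
}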

The common correct approach is the median-of-three method: take three keys, sort them, and use the middle one as the pivot. Usually the three are the left end, the right end, and the middle element (they could also be chosen at random). This guarantees the pivot is not the smallest or largest of the three, and probabilistically it is very unlikely that all three samples sit near the minimum or maximum of the whole sequence, so the chance of getting a pivot near the median increases greatly. Since the sequence is unordered anyway, picking three random positions is no better than taking the left, middle, and right elements, and a random number generator adds its own time overhead, so random selection is not used here.

If the amount of data is very large, the median-of-nine (ninther) method can be used instead.
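A hedged sketch of that idea follows (the helper names are my own illustration; Tukey's ninther takes the median of three median-of-three samples spread across the range):

public class Ninther {
    // Index of the median of arr[a], arr[b], arr[c]
    static int medianIndex(int[] arr, int a, int b, int c) {
        if (arr[a] < arr[b]) {
            return arr[b] < arr[c] ? b : (arr[a] < arr[c] ? c : a);
        }
        return arr[a] < arr[c] ? a : (arr[b] < arr[c] ? c : b);
    }

    // Tukey's ninther: sample nine elements in three groups of three and
    // return the median of the three group medians as the pivot index.
    static int ninther(int[] arr, int left, int right) {
        int len = right - left + 1;
        int eighth = len / 8;
        int mid = left + len / 2;
        int m1 = medianIndex(arr, left, left + eighth, left + 2 * eighth);
        int m2 = medianIndex(arr, mid - eighth, mid, mid + eighth);
        int m3 = medianIndex(arr, right - 2 * eighth, right - eighth, right);
        return medianIndex(arr, m1, m2, m3);
    }
}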

2.7.2.2 Implementation of segmentation policy

After the pivot is found, all that remains is how to split the array around it. There are many segmentation strategies; here we introduce one that is both correct and efficient: the two-pointer method, which works inward from both ends of the array.

Example array: {38, 49, 65, 97, 49, 64, 27, 49, 78}.

The pivot is first selected using the median-of-three method, and a few elements are then swapped so that the pivot ends up at the leftmost index:

So 49 becomes our pivot, a much better choice than 38.

Once the pivot is selected, the real partitioning begins. The partitioning stage moves the elements smaller than the pivot to its left and the elements larger than the pivot to its right. The two-pointer method works as follows:

Two pointers are used: pointer i starts at the element just after the pivot, and pointer j starts at the last element. Move i to the right past elements smaller than the pivot, until it finds an element greater than or equal to the pivot or reaches j; move j to the left past elements larger than the pivot, until it finds an element less than or equal to the pivot. Once both pointers have stopped: if i < j, swap the elements they point to and keep scanning; if i >= j, the partition pass is done, and the next step is to find the pivot's correct position in the array.

Finding the pivot's correct position is very simple: just swap the value at position j with the pivot. With that, the partition is truly complete and the pivot's sorted position has been found; what remains is to run the whole operation recursively on the subarrays on either side of the pivot. The recursion returns when a subarray has no more than one element.

One detail worth considering: what should happen when an element equals the pivot? Here we adopt the strategy of stopping the pointers anyway. Imagine a sequence whose elements are all equal: this strategy causes many swaps of identical elements, which looks pointless, but the benefit is that i and j end up meeting near the middle, so the array splits into two nearly equal subarrays. That mirrors merge sort and keeps the running time around O(nlogn). A side effect of these swaps is that quicksort is an unstable sorting algorithm: the relative order of equal values may have changed by the time the algorithm ends.

Before and after the first exchange:

arr[i] = arr[j] = 49.

Before and after the second exchange:

Continue searching, before and after the third exchange:

Then i and j continue searching separately, and the following situation occurs:

Following the two-pointer steps above, i >= j now holds, which means the partition pass is complete and we can find the pivot's position. As stated earlier, this is very simple: swap the element at j's final position with the pivot:

At this point the sorted position j of pivot 49 has been found and the element placed there. As you can see, the resulting subarrays satisfy the partition condition: every element of the left subarray is less than or equal to the pivot, and every element of the right subarray is greater than or equal to it. The pivot itself can therefore be excluded when the two subarrays are partitioned recursively:

The following is the split process diagram of the left and right subarrays, which is relatively simple:

We can see that after the right subarray is split, the left subarray still has three elements, so it must be split recursively once more:

As you can see, once the recursive partitioning has returned all the way up (a subarray stops recursing when it has no more than one element), the array is already sorted:

Does this feel similar to merge sort? Once you grasp the idea, especially the recursive partitioning, quicksort is actually quite simple!

2.7.2.3 Sorting small arrays

As the quicksort algorithm shows, the array is eventually split down to subarrays of length one. If the array is very small, quicksort actually performs worse than direct insertion sort (the best of the simple sorts). The reason is that quicksort relies on recursion: the recursion overhead is negligible when sorting large volumes of data, but it is not worthwhile for arrays with only a few records.

So we can add a check: when the array length falls below some threshold, use insertion sort instead of quicksort. There is no fixed rule for this value; Data Structures and Algorithm Analysis in Java suggests 10, while JDK1.8's Arrays.sort() for primitives uses 47.

2.7.3 Implementation of single-pivot quicksort

2.7.3.1 Common Implementation

The following is a common implementation of single-pivot quicksort, using quicksort alone (no insertion-sort cutoff):

import java.util.Arrays;

public class QuickSort {
    public static void main(String[] args) {
        //int[] arr = new int[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        int[] arr = new int[]{5, 4, 3, 2, 1, 0, -1, -2, -3, -4};
        //int[] arr = new int[]{49, 38, 65, 97, 64, 49, 27, 49, 78};
        //int[] arr = new int[]{49, 38, 49, 97, 49, 49, 49, 49, 49, 49, 38, 49, 97, 49, 49, 49, 49, 49};
        //int[] arr = new int[]{49, 38, 65, 97, 65, 13, 27, 49, 78};
        //int[] arr = new int[]{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        //int[] arr = new int[]{1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1};
        //int[] arr = new int[]{49, 65, 38, 97, 49, 78, 27, 11, 49, 49, 65, 38, 97, 49, 78, 27, 11, 49};
        quickSort(arr);
        System.out.println(Arrays.toString(arr));
    }

    /**
     * Quicksort entry point.
     *
     * @param arr the array to sort
     */
    private static void quickSort(int[] arr) {
        if (arr == null || arr.length <= 1) {
            return;
        }
        quickSort(arr, 0, arr.length - 1);
    }

    /**
     * Core quicksort algorithm, implemented recursively.
     *
     * @param arr   the array to sort
     * @param left  start index
     * @param right end index
     */
    private static void quickSort(int[] arr, int left, int right) {
        /* Recurse only while the range holds at least two elements */
        if (left < right) {
            /*1. Partition and get the pivot's correct index */
            int baseIndex = partition(arr, left, right);
            /*2. Recurse on the two subarrays produced by the split; the pivot's
                 position is already final, so its index can be excluded */
            quickSort(arr, left, baseIndex - 1);
            quickSort(arr, baseIndex + 1, right);
        }
    }

    /**
     * Partition: pick a pivot and split the array into two parts, one with
     * values smaller than the pivot and one with values larger.
     *
     * @param arr   the array to partition
     * @param left  left start index
     * @param right right end index
     * @return the pivot's correct index in the whole array
     */
    private static int partition(int[] arr, int left, int right) {
        // Take the leftmost value as the pivot (not ideal, kept simple for demonstration)
        int base = arr[left];
        /*2. Partition with two pointers */
        // Record the initial index values; they are used below
        int i = left, j = right;
        while (true) {
            /* Scan from left to right first, then from right to left; the order
               must not be reversed, otherwise the code has to change */
            // Scan from left to right until a number greater than or equal to base is found
            // i never passes j
            while (arr[++i] < base && i < j) {
            }
            // Scan from right to left until a number less than or equal to base is found
            // j may end up equal to i
            // j may also end up at left, which means base is the smallest number
            while (arr[j] > base) {
                --j;
            }
            // Both scans have stopped: either a swappable pair was found (i < j)
            // or the pass is over (i >= j)
            // If a pair was found, swap the two numbers in the array
            if (i < j) {
                swap(arr, i, j);
            } else {
                break;
            }
            // Before continuing, decrement j so that a value equal to base keeps
            // moving toward the center instead of staying in place
            --j;
        }
        /*3. Put the pivot into its correct position, which is index j */
        arr[left] = arr[j];
        arr[j] = base;
        /* Return the pivot's correct index */
        return j;
    }

    /**
     * Swap two elements via XOR; safe here only because it is never called with a == b.
     *
     * @param arr the array
     * @param a   index of the first element
     * @param b   index of the second element
     */
    private static void swap(int[] arr, int a, int b) {
        arr[a] = arr[a] ^ arr[b];
        arr[b] = arr[a] ^ arr[b];
        arr[a] = arr[a] ^ arr[b];
    }
}

2.7.3.2 Optimized implementation

This version selects the pivot with the median-of-three method and switches to insertion sort when the number of elements in the range falls below a threshold.

import java.util.Arrays;

public class QuickSort3 {
    public static void main(String[] args) {
        //int[] arr = new int[]{0, 1, 2, 3, 4, 5, 6, 7, 8};
        //int[] arr = new int[]{5, 4, 3, 2, 1, 0, -1, -2, -3};
        int[] arr = new int[]{38, 49, 65, 97, 49, 64, 27, 49, 78};
        //int[] arr = new int[]{49, 38, 49, 97, 49, 49, 49, 49, 49, 49, 38, 49, 97, 49, 49, 49, 49, 49};
        //int[] arr = new int[]{49, 38, 65, 97, 65, 13, 27, 49, 78};
        //int[] arr = new int[]{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        //int[] arr = new int[]{1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1};
        //int[] arr = new int[]{49, 65, 38, 97, 49, 78, 27, 11, 49, 49, 65, 38, 97, 49, 78, 27, 11, 49};
        quickSort(arr);
        System.out.println(Arrays.toString(arr));
    }

    /**
     * Quicksort entry point.
     *
     * @param arr the array to sort
     */
    private static void quickSort(int[] arr) {
        if (arr == null || arr.length <= 1) {
            return;
        }
        quickSort(arr, 0, arr.length - 1);
    }

    /**
     * When the range length is less than or equal to 4, use direct insertion sort
     * (the value is kept small for demonstration; in practice it can be larger).
     */
    private static final int INSERTION_SORT_THRESHOLD = 4;

    /**
     * Core quicksort algorithm, implemented recursively.
     *
     * @param arr   the array to sort
     * @param left  start index
     * @param right end index
     */
    private static void quickSort(int[] arr, int left, int right) {
        /* If the range is longer than the threshold, use quicksort; otherwise insertion sort */
        if (right - left + 1 > INSERTION_SORT_THRESHOLD) {
            /*1. Partition and get the pivot's correct index */
            int baseIndex = partition(arr, left, right);
            /*2. Recurse on the two subarrays produced by the split; the pivot's
                 position is already final, so its index can be excluded */
            quickSort(arr, left, baseIndex - 1);
            quickSort(arr, baseIndex + 1, right);
        } else {
            insertionSort(arr, left, right);
        }
    }

    /**
     * Partition: pick a pivot and split the array into two parts, one with
     * values smaller than the pivot and one with values larger.
     *
     * @param arr   the array to partition
     * @param left  left start index
     * @param right right end index
     * @return the pivot's correct index in the whole array
     */
    private static int partition(int[] arr, int left, int right) {
        /*1. Select the pivot using the median-of-three method */
        int base = median3(arr, left, right);
        /*2. Partition with two pointers */
        // Record the initial index values; they are used below
        int i = left, j = right;
        while (true) {
            /* Scan from left to right first, then from right to left; the order
               must not be reversed, otherwise the code has to change */
            // Scan from left to right until a number greater than or equal to base is found
            // i never passes j
            while (arr[++i] < base && i < j) {
            }
            // Scan from right to left until a number less than or equal to base is found
            // j may end up equal to i
            // j may also end up at left, which means base is the smallest number
            while (arr[j] > base) {
                --j;
            }
            // Both scans have stopped: either a swappable pair was found (i < j)
            // or the pass is over (i >= j)
            // If a pair was found, swap the two numbers in the array
            if (i < j) {
                swap(arr, i, j);
            } else {
                break;
            }
            // Before continuing, decrement j so that a value equal to base keeps
            // moving toward the center instead of staying in place
            --j;
        }
        /*3. Put the pivot into its correct position, which is index j */
        arr[left] = arr[j];
        arr[j] = base;
        /* Return the pivot's correct index */
        return j;
    }

    /**
     * Direct insertion sort, used when the range is at most the threshold length.
     *
     * @param a     the array
     * @param left  start index of the range
     * @param right end index of the range
     */
    private static void insertionSort(int[] a, int left, int right) {
        int j;
        for (int p = left + 1; p <= right; p++) {
            int noSort = a[p];
            for (j = p - 1; j >= left && noSort < a[j]; j--) {
                a[j + 1] = a[j];
            }
            a[j + 1] = noSort;
        }
    }

    /**
     * Median-of-three pivot selection; compared with always taking a fixed
     * position, this tends to produce a better pivot.
     *
     * @param arr   the array
     * @param left  left index
     * @param right right index
     * @return the pivot value, the median of the three samples
     */
    private static int median3(int[] arr, int left, int right) {
        // Compute the index of the middle element
        int center = (left + right) / 2;
        // Swap left and right so that arr[left] <= arr[right]
        if (arr[right] < arr[left]) {
            swap(arr, left, right);
        }
        // Swap center and right so that arr[center] <= arr[right]
        if (arr[right] < arr[center]) {
            swap(arr, center, right);
        }
        // Swap center and left so that arr[center] <= arr[left]
        if (arr[left] < arr[center]) {
            swap(arr, left, center);
        }
        // Now arr[center] <= arr[left] <= arr[right], so arr[left] holds the median of the three
        // A further swap can leave the array slightly "more ordered"; this step can be omitted
        swap(arr, center, left + 1);
        return arr[left];
    }

    /**
     * Swap two elements. The indices must be checked for equality first:
     * XOR-swapping an element with itself would zero it out.
     *
     * @param arr the array
     * @param a   index of the first element
     * @param b   index of the second element
     */
    private static void swap(int[] arr, int a, int b) {
        if (a != b) {
            arr[a] = arr[a] ^ arr[b];
            arr[b] = arr[a] ^ arr[b];
            arr[a] = arr[a] ^ arr[b];
        }
    }
}

2.7.4 Complexity analysis of quicksort

When splitting the sequence, a pivot must be selected. If the chosen pivot makes the two subsequences each about half the original length, the running time of quicksort is O(nlogn), the same as merge sort: after halving the sequence logn times, each subsequence holds a single element and is sorted, and so is the whole sequence.

In the worst case, the sequence to be sorted is in ascending or descending order, and each partition produces one subsequence with one fewer record than the last, the other being empty. Drawn as a recursion tree, this is a completely skewed tree. In this case n-1 recursive calls are made, and the i-th partition needs n-i key comparisons to place the i-th record (the pivot), just like selection sort. So the number of comparisons is (n-1)+(n-2)+...+1 = n(n-1)/2, and the total time is O(n^2).

If every number in the data is equally likely to be chosen as the pivot, the average running time is O(nlogn). However, truly random numbers are hard to generate, and generating them takes extra time.

Quicksort uses recursion, so its space complexity is the recursion depth: from O(logn) in the best case to O(n) in the worst case.
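If the worst-case O(n) stack depth is a concern, a common remedy (not used in the implementations above; a hedged sketch meant to live inside the QuickSort class, reusing its partition method) is to recurse only into the smaller partition and loop over the larger one, which caps the stack depth at O(logn):

// Recurse into the smaller side, iterate over the larger side;
// the stack depth is then O(logn) even in the worst case.
static void quickSortBounded(int[] arr, int left, int right) {
    while (left < right) {
        int baseIndex = partition(arr, left, right);
        if (baseIndex - left < right - baseIndex) {
            quickSortBounded(arr, left, baseIndex - 1);
            left = baseIndex + 1;  // continue with the larger right side in the loop
        } else {
            quickSortBounded(arr, baseIndex + 1, right);
            right = baseIndex - 1; // continue with the larger left side in the loop
        }
    }
}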

2.8 Summary

The indicators of the 7 algorithms are compared as follows:

In terms of the simplicity of the algorithm, we divide the seven algorithms into two categories:

  1. Simple algorithm: bubble, simple selection, direct insertion.
  2. Improved algorithm: Hill, heap, merge, fast.

On average, the last three improved algorithms are clearly better than Hill sort, and all four are far better than the first three simple algorithms.

At best, bubble sort and direct insertion sort do better, which means that if your input sequences are always basically ordered, there is no need to reach for the four more complex improved algorithms.

At worst, heap sort and merge sort do better than quicksort and the simple sorts.

As for the number of records to sort: the smaller it is, the more appropriate a simple sort becomes; conversely, the larger n is, the more appropriate an improved sort becomes. This is why, when optimizing quicksort, we add a threshold below which it switches to insertion sort.

A comparison of the number of moves performed by the three simple sorting algorithms is shown below:

If moving data is what costs you time, simple selection sort is a good choice!

3. Non-comparative sort

3.1 Counting Sort

3.1.1 Principle and implementation of counting sort

Counting sort is a stable, linear-time sorting algorithm. It requires the input data to be integers within a definite range, and equal numbers keep their relative positions before and after sorting.

Counting sort applies only when the difference between the maximum and minimum values is not large.

Counting sort works as follows:

  1. Find the maximum max and minimum min of the array to be sorted, and prepare an auxiliary count array help of length (max-min+1).
  2. Allocation: count how many times each element arr[i] occurs in the original array and store the count at index arr[i]-min of the auxiliary count array.
  3. Collection: traverse the auxiliary array from index 0 and fill the target array in order: arr[index++] = i + min.
import java.util.Arrays;

public class CountingSort {
    public static void main(String[] args) {
        // The array elements are required to be integers
        //int[] arr = new int[]{3, 3, 5, 7, 2, 4, 10, 1, 13, 15, 3, 5, 6};
        int[] arr = new int[]{49, 38, 65, 97, 65, 13, 27, 49, 78};
        //int[] arr = new int[]{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        //int[] arr = new int[]{5, 4, 3, 2, 1, 0, -1, -2, -3};
        //int[] arr = new int[]{1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1};
        /*1. Get the maximum and minimum values (assumed obtainable in advance) */
        int max = Arrays.stream(arr).max().getAsInt();
        int min = Arrays.stream(arr).min().getAsInt();
        /*2. The auxiliary count array has size max - min + 1 */
        int[] help = new int[max - min + 1];
        /*3. Allocation: fill the auxiliary count array */
        // Count the occurrences of each element value: increment the auxiliary array at index value - min
        for (int value : arr) {
            help[value - min]++;
        }
        System.out.println("help after filling: " + Arrays.toString(help));
        /*4. Collection: walk the auxiliary array and write the values back in order: arr[index++] = i + min */
        int index = 0;
        for (int i = 0; i < help.length; i++) {
            while (help[i]-- > 0) {
                arr[index++] = i + min;
            }
        }
        System.out.println("arr after sorting: " + Arrays.toString(arr));
        System.out.println("help after use: " + Arrays.toString(help));
    }
}

As the implementation shows, counting sort only suits cases where the difference between the maximum and the minimum is not very large. When the difference is small, the auxiliary array stays short even if there are many elements to sort; otherwise the auxiliary array becomes very long and the space complexity shoots up.

After counting sort, elements with equal values keep the relative order they had in the input; in other words, counting sort is a stable sorting algorithm. The implementation above, however, is not a stable one. It works only for primitive types, for which two equal numbers are indistinguishable anyway. A stable version is provided below, but it requires all elements to be greater than or equal to 0.

import java.util.Arrays;

public class CountingSortStable {
    public static void main(String[] args) {
        // The stable version of counting sort requires the array elements to be integers greater than or equal to 0
        //int[] arr = new int[]{3, 3, 5, 7, 2, 4, 10, 1, 13, 15, 3, 5, 6};
        //int[] arr = new int[]{49, 38, 65, 97, 65, 13, 27, 49, 78};
        //int[] arr = new int[]{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        //int[] arr = new int[]{5, 4, 3, 2, 1, 0, -1, -2, -3}; // elements below 0 cause an error
        int[] arr = new int[]{1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1};
        /*1. Get the maximum value (assumed obtainable in advance) */
        int max = Arrays.stream(arr).max().getAsInt();
        /*2. The auxiliary count array has size max + 1, since all elements are >= 0 */
        int[] help = new int[max + 1];
        // Count the occurrences of each element value: increment the auxiliary array at index value
        for (int value : arr) {
            help[value]++;
        }
        // Turn the counts into prefix sums: help[j] then holds the number of elements less than or equal to j
        for (int j = 1; j < help.length; j++) {
            help[j] += help[j - 1];
        }

        /* Traversing in reverse order is important: this step guarantees the stability of counting sort */
        // The result array temporarily stores the sorted output
        int[] result = new int[arr.length];
        for (int i = arr.length - 1; i >= 0; i--) {
            result[help[arr[i]] - 1] = arr[i];
            // Decrement the count so the next equal value lands just before this one, preserving stability
            help[arr[i]]--;
        }
        // Copy the data from the temporary array back into arr
        for (int i = 0; i < arr.length; i++) {
            arr[i] = result[i];
        }
        System.out.println("arr after sorting: " + Arrays.toString(arr));
    }
}

In addition, counting sort is essentially a special case of bucket sort: when the number of buckets reaches its maximum, (max-min+1), bucket sort becomes counting sort. Each "bucket" here is a single slot of the auxiliary array. With that in mind, let's look at bucket sort.

3.1.2 Complexity analysis of counting sort

The counting sort algorithm does not use comparisons between elements. It uses the actual values of elements to determine their position in the output array. Therefore, counting sort algorithm is not a comparison-based sorting algorithm, and thus its computation time lower bound is no longer O(nlogn).

The complexity of unstable versions is as follows:

Average time complexity: O(n + (max - min))
Best time complexity: O(n + (max - min))
Worst time complexity: O(n + (max - min))
Space complexity: O(max - min)

Here max is the maximum value and min the minimum; if max - min = O(n), the time complexity is O(n).

Since counting sort needs an extra auxiliary array of length (max - min) + 1 and uses no recursion, the space complexity is O(max - min). Likewise, if max - min = O(n), the space complexity is also O(n). The stable version's complexity is as follows:

Its average, best, and worst time complexity are all O(n + k), where k is the maximum value of the sequence to be sorted; it needs two auxiliary arrays, so the space complexity is O(n + k).

3.2 Bucket Sort

3.2.1 Principles and implementation of bucket sort

Bucket sort is also known as bin sort. In counting sort above, the difference between the maximum and minimum determines how many buckets are prepared. Imagine that difference being very large: the number of buckets would explode, and the space complexity with it. Counting sort no longer applies there, and bucket sort can be used instead.

In that case a bucket does not always hold a single element; it usually holds several. Real bucket sort in fact works on the same principle as a hash table. Like counting sort, bucket sort requires knowing the range of the elements in advance, i.e., the minimum and maximum.

Bucket sort can be used when the maximum and minimum differ greatly, but the data must be fairly evenly distributed; otherwise the data may pile up in a single bucket and bucket sort loses its effect.

Stability: whether bucket sort is stable depends on the algorithm used to sort each bucket. Bucket sort can be implemented stably, so it is counted as a stable sorting algorithm.

The bucket sorting process is as follows:

  1. Find the maximum max and minimum min of the array to be sorted and prepare the bucket containers. Since the number of elements per bucket is not fixed, each bucket stores its elements in a list. The number of buckets is k = (max - min) / arr.length + 1.
  2. Iterate through arr, compute each element's bucket position according to a mapping rule (similar to a hash table), and distribute the elements to be sorted into the buckets.
  3. Sort each bucket individually, usually with quicksort.
  4. Iterate through the buckets in order, writing the sorted elements back into the original array.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BucketSort {
    public static void main(String[] args) {
        // The array elements are required to be integers
        //int[] arr = new int[]{3, 3, 5, 7, 2, 4, 10, 1, 13, 15, 3, 5, 6};
        //int[] arr = new int[]{1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15};
        int[] arr = new int[]{49, 38, 65, 97, 65, 13, 27, 49, 78, 49, 38, 65, 97, 65, 65, 13, 27, 49, 78, 100};
        //int[] arr = new int[]{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        //int[] arr = new int[]{5, 4, 3, 2, 1, 0, -1, -2, -3};
        //int[] arr = new int[]{1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1};

        /*1. Get the maximum and minimum values (assumed obtainable in advance) */
        int max = Arrays.stream(arr).max().getAsInt();
        int min = Arrays.stream(arr).min().getAsInt();

        /*2. Prepare the bucket containers */
        // Bucket count formula
        int bucketsLength = getBucketsLength(arr, max, min);
        List<Integer>[] buckets = new List[bucketsLength];
        for (int i = 0; i < bucketsLength; i++) {
            buckets[i] = new ArrayList<>();
        }

        /*3. Compute the bucket each element of arr belongs to and place it there */
        // The value range covered by each bucket
        int bucketRange = getBucketRange(max, min, bucketsLength);
        // Put each element into its bucket
        for (int value : arr) {
            // Each element is mapped to a bucket index by the function f(key)
            int bucketIndex = getBucketIndex(min, bucketRange, value);
            buckets[bucketIndex].add(value);
        }
        System.out.println("Buckets after filling: " + Arrays.toString(buckets));

        /*4. Sort each bucket individually */
        for (List<Integer> list : buckets) {
            // Collections.sort uses TimSort (ComparableTimSort) on the bucket elements,
            // an improved sort based on merge sort and faster than plain merge sort
            Collections.sort(list);
        }
        System.out.println("Buckets after sorting: " + Arrays.toString(buckets));

        /*5. Write the sorted bucket contents back into the original array in order */
        int index = 0;
        for (List<Integer> bucket : buckets) {
            for (Integer integer : bucket) {
                arr[index++] = integer;
            }
        }
        System.out.println("arr after sorting: " + Arrays.toString(arr));
    }

    /**
     * Get the number of buckets.
     *
     * @param arr the array to sort
     * @param max maximum value of the array
     * @param min minimum value of the array
     * @return the number of buckets
     */
    private static int getBucketsLength(int[] arr, int max, int min) {
        return (max - min) / arr.length + 1;
    }

    /**
     * Get the value range covered by each bucket.
     *
     * @param max           maximum value of the array
     * @param min           minimum value of the array
     * @param bucketsLength number of buckets
     * @return the bucket range, measured as a direct difference of values (e.g. 8 - 1 = 7)
     */
    private static int getBucketRange(int max, int min, int bucketsLength) {
        return (max - min + 1) / bucketsLength;
    }

    /**
     * Get the bucket index of an element.
     *
     * @param min         minimum value of the array
     * @param bucketRange range covered by each bucket
     * @param value       the element's value
     * @return the bucket index
     */
    private static int getBucketIndex(int min, int bucketRange, int value) {
        return (value - min) / (bucketRange + 1);
    }
}

3.2.2 Complexity analysis of bucket sort

Assuming the data is evenly distributed, the average number of elements per bucket is n/k, where k is the number of buckets. If quicksort is used inside each bucket, sorting one bucket costs O((n/k)log(n/k)). The total time complexity is O(n) + k * O((n/k)log(n/k)) = O(n + nlog(n/k)) = O(n + nlogn - nlogk). As k approaches n, the time complexity of bucket sort approaches O(n). That is, more buckets means higher time efficiency, but also higher space complexity. When the number of buckets reaches (max-min+1), it becomes counting sort.

Whether the elements spread evenly across the buckets also depends on the original data: if the elements are distributed extremely unevenly, bucket sort is not appropriate. As the code shows, the number of buckets and the choice of the mapping function f(key) also affect how evenly the elements are distributed. To make bucket sort efficient, we need to do the following two things:

  1. Increase the number of buckets as far as the available space allows;
  2. Choose a mapping function f(key) that distributes the n input values evenly across the k buckets.

3.3 Radix Sort

3.3.1 Implementation of radix sort

Radix sort is a non-comparison sort for non-negative integers, and the maximum value must be known in advance. Instead of comparing whole element values directly, radix sort splits each element into several digit parts and sorts by one part at a time according to fixed rules, eventually producing the final ordered list. In other words, radix sort relies on another (stable) sorting method per digit; it is essentially a multi-key sort. Note again that the elements must be non-negative integers.

Radix sort is also a kind of bucket sort: bucket sort partitions by value range, while radix sort partitions by digit. Radix sort can be viewed as multiple rounds of bucket sort, one round per digit.

Radix sort is a stable sort algorithm.

Specific ideas are as follows:

  1. Unify all the non-negative integers to be sorted to the same number of digits. Base 10 is the usual choice, but base 16 or even base 2 also works; the only prerequisite is being able to find the maximum value and thus the longest digit count. Let the longest number have d digits in base k; the digit positions are the keys to compare on.
  2. Sort stably by each digit position in turn (stable, because each pass must preserve the result of the previous one: for example, the tens-digit pass preserves the units-digit order, and the hundreds-digit pass preserves the tens-digit order). Radix sort can be LSD (Least Significant Digit first) or MSD (Most Significant Digit first): LSD starts from the rightmost digit of the key, MSD from the leftmost.
    1. MSD: sort starting from the highest digit; counting sort can be used for each key (see the hedged sketch after this list).
    2. LSD: sort starting from the lowest digit; bucket sort can be used for each key.
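Since only the LSD variant is implemented below, here is a minimal hedged sketch of the MSD idea (the structure and names are my own illustration, not from the original): one counting pass partitions the range by the current digit, then each digit bucket is recursed into on the next lower digit.

public class MsdRadixSortSketch {
    // Initial call: msdSort(arr, 0, arr.length - 1, exp), with exp the highest
    // power of ten in the maximum value (e.g. 100 for three-digit numbers).
    static void msdSort(int[] arr, int left, int right, int exp) {
        if (left >= right || exp == 0) {
            return; // at most one element, or no digits left to examine
        }
        int[] start = new int[11]; // start[d + 1] counts occurrences of digit d
        int[] aux = new int[right - left + 1];
        for (int i = left; i <= right; i++) {
            start[digit(arr[i], exp) + 1]++;
        }
        for (int d = 1; d <= 10; d++) {
            start[d] += start[d - 1]; // prefix sums: bucket d occupies [start[d], start[d + 1])
        }
        int[] offset = start.clone();
        for (int i = left; i <= right; i++) {
            aux[offset[digit(arr[i], exp)]++] = arr[i]; // distribute stably into aux
        }
        System.arraycopy(aux, 0, arr, left, aux.length);
        for (int d = 0; d < 10; d++) {
            // Recurse into each digit bucket on the next lower digit
            msdSort(arr, left + start[d], left + start[d + 1] - 1, exp / 10);
        }
    }

    static int digit(int value, int exp) {
        return (value / exp) % 10;
    }
}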

The figure below illustrates padding the numbers to the same length and then sorting starting from the lowest digit. Working from the lowest digit up to the highest, the whole sequence becomes ordered:

LSD code implementation is as follows:

import java.util.Arrays;

public class RadixSort {
    public static void main(String[] args) {
        // The array elements are required to be non-negative integers
        //int[] arr = new int[]{12, 3, 55, 78, 102, 0, 88, 61, 30, 12, 3, 55, 78, 102, 0, 88, 61, 30};
        //int[] arr = new int[]{1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15};
        //int[] arr = new int[]{11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25};
        //int[] arr = new int[]{49, 38, 65, 97, 65, 13, 27, 49, 78, 49, 38, 65, 97, 65, 65, 13, 27, 49, 78, 100};
        //int[] arr = new int[]{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        //int[] arr = new int[]{5, 4, 3, 2, 1, 0, -1, -2, -3}; // values below 0 cause an exception
        int[] arr = new int[]{1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1};

        /*1. Get the largest element and count how many digits it has */
        int max = Arrays.stream(arr).max().getAsInt();
        int num = 0;
        while (max != 0) {
            max = max / 10;
            num++;
        }

        // Auxiliary count array: one bucket per digit 0..9
        int[] help = new int[10];

        // Temporary array for distributing and collecting the data
        int[] bucket = new int[arr.length];

        // k is the digit position: 1 the units digit, 2 the tens digit, 3 the hundreds digit
        for (int k = 1; k <= num; k++) {
            // Reset the counts so data from the previous round does not interfere
            for (int i = 0; i < 10; i++) {
                help[i] = 0;
            }

            /* Count how many elements have 0,1,2,...,9 as their k-th digit;
               conceptually, each element is dropped into the bucket for its k-th digit */
            for (int value : arr) {
                help[getFigure(value, k)]++;
            }

            /* Prefix sums: help[i] += help[i - 1]. Afterwards, the elements whose k-th
               digit is j occupy the index range help[j-1] .. help[j]-1 of the output,
               i.e. the last of them lands at index help[j]-1 */
            for (int i = 1; i < 10; i++) {
                help[i] += help[i - 1];
            }

            /* Move the data into the temporary array. Traverse from the back, because each
               bucket is filled from its back index: traversing from the front would reverse
               the relative order of equal digits, and the sort would no longer be stable */
            for (int i = arr.length - 1; i >= 0; i--) {
                // Get the k-th digit of arr[i]
                int j = getFigure(arr[i], k);
                // The sorted index for arr[i] is help[j] - 1
                bucket[help[j] - 1] = arr[i];
                // Decrement the bucket count so the next value with the same digit
                // is placed just before this one
                help[j]--;
            }

            // Copy the data from the temporary array back into arr
            for (int i = 0; i < arr.length; i++) {
                arr[i] = bucket[i];
            }
        }

        System.out.println("arr after sorting: " + Arrays.toString(arr));
    }

    /**
     * Get the k-th digit of the integer i.
     *
     * @param i the integer
     * @param k the digit position (1-based, from the units digit)
     * @return the value of the k-th digit
     */
    private static int getFigure(int i, int k) {
        int[] a = {1, 10, 100}; // supports numbers of up to 3 digits; extend for larger values
        return (i / a[k - 1]) % 10;
    }
}

3.3.2 Complexity analysis of radix sort

Let n be the number of elements to sort, d the number of digits of the largest number, and k the base (for example, with decimal numbers the base is 10: each digit has at most 10 possible values, so at most 10 buckets are needed to map the array's elements).

Processing one digit requires traversing both the original array and the count array, which costs O(n + k); the total time complexity is therefore O(d * (n + k)).

Radix sort uses a count array of length k and a temporary array of length n to hold the elements, so the space complexity is O(k + n).

Radix sort distributes and collects digit by digit with a stable pass, so it is stable.

When base 2 is used, k = 2 is minimal but the digit count d is maximal, so the time complexity O(d * (n + k)) gets larger while the space complexity O(n + k) gets smaller.
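To switch the LSD implementation above to base 2, essentially only the digit extractor changes, along with shrinking help to length 2 (a hedged sketch with a hypothetical method name):

// k-th binary digit of value, 1-based from the lowest bit; the outer loop
// would then run once per bit of the maximum value, with only 2 buckets.
private static int getBinaryFigure(int value, int k) {
    return (value >> (k - 1)) & 1;
}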

3.4 Summary

Radix sort, counting sort and bucket sort all make use of the concept of buckets, but there are obvious differences in the use of buckets:

  1. Counting sort: each bucket stores a single key value;
  2. Bucket sort: each bucket stores a range of values;
  3. Radix sort: elements are allocated to buckets according to each digit of the key value.

Related references:

  1. Algorithms
  2. Data Structures and Algorithms
  3. Big Talk Data Structure
  4. Diagram of Algorithms
  5. My First Algorithm Book

If anything is unclear or you would like to discuss, feel free to leave a comment. I also hope you will like, bookmark, and follow; I will keep publishing all kinds of Java learning posts!