Sorting algorithm

Compare the algorithms

| Sorting algorithm | Average time | Best time | Worst time | Space complexity | In-place? | Comparison-based? | Stable? |
|---|---|---|---|---|---|---|---|
| Bubble sort | O(n^2) | O(n) | O(n^2) | O(1) | In-place | Yes | Yes |
| Selection sort | O(n^2) | O(n^2) | O(n^2) | O(1) | In-place | Yes | No |
| Insertion sort | O(n^2) | O(n) | O(n^2) | O(1) | In-place | Yes | Yes |
| Shell sort | O(nlogn) | O(nlogn) | O(n^2) | O(1) | In-place | Yes | No |
| Merge sort | O(nlogn) | O(nlogn) | O(nlogn) | O(n) | Out-of-place | Yes | Yes |
| Quick sort | O(nlogn) | O(nlogn) | O(n^2) | O(logn) | In-place | Yes | No |
| Heap sort | O(nlogn) | O(nlogn) | O(nlogn) | O(1) | In-place | Yes | No |
| Counting sort | O(n+k) | O(n+k) | O(n+k) | O(n+k) | Out-of-place | No | Yes |
| Radix sort | O(n*k) | O(n*k) | O(n*k) | O(n) | Out-of-place | No | Yes |
| Bucket sort | O(n+k) | O(n+k) | O(n^2) | O(n+k) | Out-of-place | No | Yes |

Bubble sort

Algorithm steps

Given an array of N elements, bubble sort will:

  1. Compare adjacent elements. If the first one is bigger than the second, swap them.
  2. Do the same for each pair of adjacent elements, starting with the first pair and ending with the last pair. At this point, the last element should be the largest number.
  3. Repeat this step for all elements except the last one.
  4. Keep repeating the above steps for fewer and fewer elements at a time until there are no more pairs to compare.

The demo

Reference code

public class BubbleSort {
    public static void bubbleSort(int[] arr) {
        for (int i = 0; i < arr.length - 1; i++) {
            // Flag: if a pass makes no swap, the array is already ascending and we can exit early
            boolean flag = false;
            for (int j = 0; j < arr.length - 1 - i; j++) {
                if (arr[j] > arr[j + 1]) {
                    flag = true;
                    int tmp = arr[j];
                    arr[j] = arr[j + 1];
                    arr[j + 1] = tmp;
                }
            }
            if (!flag) {
                break;
            }
        }
    }
}

Complexity analysis

A comparison and a swap each take constant bounded time, which we’ll call c. Bubble sort has two nested loops. The outer loop runs n−1 iterations, but the inner loop gets shorter each pass:

  1. When i = 0: n−1 iterations (compare and possibly swap)
  2. When i = 1: n−2 iterations
  3. When i = n−2: 1 iteration
  4. When i = n−1: 0 iterations

Therefore, the total number of iterations = (n−1) + (n−2) + … + 1 + 0 = n(n−1)/2, and the total time = c * n(n−1)/2 = O(n^2).

Best case: the array is already ascending, so a single inner-loop pass makes no swaps and the early-exit flag breaks the outer loop; the time complexity is O(n).

Worst case: the array is in descending order; the loop never breaks early, and the complexity is O(n^2).

Selection sort

Algorithm description

  1. First, select the smallest (or largest) element from the unsorted sequence and place it at the beginning of the sequence.

  2. Then find the smallest (or largest) element among the remaining unsorted elements and append it to the end of the sorted sequence.

  3. Repeat until no unsorted elements remain.

The demo

Reference code

public class SelectionSort {
    public static void selectionSort(int[] arr) {
        for (int i = 0; i < arr.length - 1; i++) {
            int minIndex = i;
            // Find the index of the smallest element in the unsorted part
            for (int j = i + 1; j < arr.length; j++) {
                if (arr[minIndex] > arr[j]) {
                    minIndex = j;
                }
            }
            // Swap it to the front of the unsorted part
            if (minIndex != i) {
                int tmp = arr[i];
                arr[i] = arr[minIndex];
                arr[minIndex] = tmp;
            }
        }
    }
}

Algorithm analysis

The average, best, and worst time complexity are all O(n^2). Also, the swap step can move an element past other equal elements (e.g., sorting {2, 2', 1}: the first pass swaps the leading 2 with 1, placing it after 2'), changing the relative order of equal elements, so selection sort is not stable.

Insertion sort

Algorithm description

Insertion sort, also commonly known as direct insertion sort, is an efficient algorithm for sorting a small number of elements. The algorithm inserts each record into an already ordered sequence to obtain a new ordered sequence: the first element is treated as an ordered sub-sequence, then each element from the second onward is inserted into the ordered sub-sequence in order, until the whole sequence is ordered.


The demo

Reference code

public class InsertionSort {
    public static void insertionSort(int[] arr) {
        // The first element is ordered by default; start with the second element and find its insertion position
        for (int i = 1; i < arr.length; i++) {
            int x = arr[i]; // x is the element to be inserted
            int j = i - 1;
            // Walk backwards through the ordered part, shifting larger elements right
            for (; j >= 0 && arr[j] > x; j--) {
                arr[j + 1] = arr[j];
            }
            // Insert x in the correct position
            arr[j + 1] = x;
        }
    }
}

Algorithm analysis

Time complexity analysis

The outer loop executes n−1 times, which is obvious. The number of inner-loop iterations depends on the input array: in the best case the array is already sorted, arr[j] > x is always false, no data needs to move, and the inner loop runs in O(1); in the worst case the array is in reverse order, arr[j] > x is always true, every insertion happens at the front of the array, and the inner loop takes O(n). So the best-case time is O(n × 1) = O(n) and the worst-case time is O(n × n) = O(n^2).

Stability

During insertion we only shift elements that are strictly greater than the element being inserted, so equal elements are never moved past each other. Insertion sort is therefore a stable algorithm.

Shell sort

Shell sort is a kind of insertion sort, also known as "diminishing increment sort"; it is a more efficient, improved version of the direct insertion sort algorithm. Shell sort is an unstable sorting algorithm.

Shell sort groups records by a certain increment of their indices and sorts each group with direct insertion sort. As the increment decreases, each group contains more and more elements; when the increment decreases to 1, the whole sequence is exactly one group and the algorithm terminates.

Shell sort is an improvement based on the following two properties of insertion sort:

  1. Insertion sort is highly efficient on almost-ordered data, approaching the efficiency of a linear-time sort.
  2. But insertion sort is generally inefficient because it moves data only one position at a time.

In insertion sort, many elements must be moved to insert an element into an earlier position (e.g., for the sequence {2,3,4,5,1}, inserting 1 requires moving 2, 3, 4, 5 each one position backward). The idea of Shell sort is to allow exchanges between elements that are far apart.

In Shell sort, we compare and exchange elements that are h positions apart, then keep decreasing h until h equals 1, at which point one last ordinary insertion sort completes the job. That h is called the increment (or gap).

For demonstration purposes, the increment is initialized to half the length of the sequence and halved each time. For {5,8,6,4,2,1,7}, the following figure shows the steps and results of the algorithm.

The sequence of group spans 4, 2, 1 is called the increment sequence.

The demo

Reference code

public class ShellSort {
    public static void shellSort(int[] arr) {
        int h = arr.length / 2; // initial increment (gap)
        while (h >= 1) {
            // Insertion sort over each gap-h subsequence
            for (int i = h; i < arr.length; i++) {
                int x = arr[i];
                int j = i - h;
                for (; j >= 0 && arr[j] > x; j -= h) {
                    arr[j + h] = arr[j];
                }
                arr[j + h] = x;
            }
            h /= 2;
        }
    }
}

The execution time of Shell sort depends on the increment sequence, and many schemes have been studied. A widely used increment method is h = length / 3 + 1; dividing by 3 tends to minimize the number of exchanges and thus the running time. The code is as follows (note the extra check so the loop terminates once the gap reaches 1):

public class ShellSort {
    public static void shellSort(int[] arr) {
        int h = arr.length / 3 + 1; // initial increment (gap)
        while (h >= 1) {
            for (int i = h; i < arr.length; i++) {
                int x = arr[i];
                int j = i - h;
                for (; j >= 0 && arr[j] > x; j -= h) {
                    arr[j + h] = arr[j];
                }
                arr[j + h] = x;
            }
            if (h == 1) {
                break; // 1 / 3 + 1 == 1, so without this check the loop would never end
            }
            h = h / 3 + 1;
        }
    }
}

Algorithm analysis

Shell sort uses coarse-grained passes to push the complexity below O(n^2), but in some extreme cases the worst-case time complexity of Shell sort is still O(n^2), even slower than direct insertion sort. Why is that?

Look at the array above: following the same procedure, whether the increment is 4 or 2, no elements within any group are exchanged. Only when the increment drops to 1 does the array get adjusted, at which point it degenerates into a plain insertion sort.

For such arrays, Shell sort adds the cost of grouping without reducing the work of direct insertion sort.

From this we can conclude that the execution time of Shell sort depends on the incremental sequence.

Common characteristics of good incremental sequences:

  1. The last increment must be 1;

  2. Values in the sequence (especially adjacent ones) should avoid being multiples of one another as much as possible.

Shell sort best case: O(nlogn); worst case: O(n^2); average case: T(n) = O(nlogn).
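For illustration, here is a minimal sketch, not from the original article, using Knuth's well-known increment sequence 1, 4, 13, 40, ... (h = 3h + 1), which fits both properties above: the last increment is 1, and successive increments such as 4, 13, 40 avoid being multiples of one another. The class name is mine.

public class ShellSortKnuth {
    public static void shellSort(int[] arr) {
        int n = arr.length;
        // Build Knuth's increment sequence 1, 4, 13, 40, ... up to n/3
        int h = 1;
        while (h < n / 3) {
            h = 3 * h + 1;
        }
        while (h >= 1) {
            // Gap-h insertion sort
            for (int i = h; i < n; i++) {
                int x = arr[i];
                int j = i - h;
                for (; j >= 0 && arr[j] > x; j -= h) {
                    arr[j + h] = arr[j];
                }
                arr[j + h] = x;
            }
            h /= 3; // shrink the gap: ..., 13 -> 4 -> 1 -> 0 (loop ends)
        }
    }
}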

Merge sort

Algorithm description

Merge operations work as follows:

Step 1: Allocate a space equal to the sum of the lengths of the two sorted sequences; this space will hold the merged sequence.

Step 2: Set two pointers, one at the start of each sorted sequence.

Step 3: Compare the elements the two pointers point to, copy the smaller one into the merge space, and advance that pointer.

Step 4: Repeat step 3 until one pointer passes the end of its sequence.

Step 5: Copy all remaining elements of the other sequence directly to the end of the merged sequence.

The demo

Reference code

Version 1: top-down recursion

public class MergeSort {
    public static void sort(int[] arr) {
        if (arr.length <= 1) {
            return;
        }
        mergeSort(arr, 0, arr.length - 1);
    }

    private static void mergeSort(int[] arr, int left, int right) {
        if (left < right) {
            int mid = (right - left) / 2 + left; // split in half
            mergeSort(arr, left, mid);           // recursively sort each half
            mergeSort(arr, mid + 1, right);
            merge(arr, left, mid, right);        // merge subroutine
        }
    }

    // Merge the two ordered ranges arr[left...mid] and arr[mid+1...right]
    private static void merge(int[] arr, int left, int mid, int right) {
        int n = right - left + 1;
        int[] data = new int[n]; // temporary array for the merge
        int k = 0;
        int i = left;
        int j = mid + 1;
        while (i <= mid && j <= right) {
            data[k++] = arr[i] <= arr[j] ? arr[i++] : arr[j++];
        }
        while (i <= mid) {
            data[k++] = arr[i++];
        }
        while (j <= right) {
            data[k++] = arr[j++];
        }
        for (int m = 0; m < n; m++) {
            arr[m + left] = data[m];
        }
    }
}

The code above is a top-down recursive merge sort: take the target array, split it in half recursively until each piece contains one element, then merge the pieces pairwise back up until the whole array is merged and sorted.

Now consider bottom-up merge sort: instead of recursing, we merge adjacent runs directly, doubling the run size each pass.

Version 2: bottom-up iteration

public class MergeSortBU {
    public static void mergeSortBU(int[] arr) {
        int n = arr.length;
        for (int size = 1; size < n; size *= 2) { // run size doubles: 1, 2, 4, 8, ...
            for (int i = 0; i < n - size; i += size * 2) {
                // Merge arr[i...i+size-1] and arr[i+size...min(i+size*2-1, n-1)]
                merge(arr, i, i + size - 1, Math.min(i + 2 * size - 1, n - 1));
            }
        }
    }

    // Merge the two ordered ranges arr[left...mid] and arr[mid+1...right]
    private static void merge(int[] arr, int left, int mid, int right) {
        int n = right - left + 1;
        int[] data = new int[n];
        int k = 0;
        int i = left;
        int j = mid + 1;
        while (i <= mid && j <= right) {
            data[k++] = arr[i] <= arr[j] ? arr[i++] : arr[j++];
        }
        while (i <= mid) {
            data[k++] = arr[i++];
        }
        while (j <= right) {
            data[k++] = arr[j++];
        }
        for (int m = 0; m < n; m++) {
            arr[m + left] = data[m];
        }
    }
}

Algorithm analysis

Is it a stable sort?

The key to the stability of merge sort is the merge() function, the part of the code that merges two ordered subarrays into one ordered array. When the elements at the two pointers are equal (the comparison arr[i] <= arr[j] uses <=, not <), the element from the left subarray is copied first. This ensures that elements with the same value keep the same relative order before and after the merge, so merge sort is a stable sorting algorithm.

Time complexity

We’re looking at the time complexity of the top-down recursion,

Merge sort involves recursion, so how do we analyze the time complexity of recursive code? Recursion applies when a problem A can be decomposed into subproblems B and C, so that solving A reduces to solving B and C and then combining their results. If we define the time to solve A as T(a), and the times to solve B and C as T(b) and T(c), we get the recurrence relation:

T(a) = T(b) + T(c) + K

where K is the time consumed to combine the results of subproblems B and C into the result of A.

Using this formula, let’s analyze the time complexity of merge sort.

Let’s say merge-sorting n elements takes T(n); then each of the two subarrays takes T(n/2), and the merge() function merges two ordered subarrays in O(n) time. The time complexity of merge sort is therefore given by:

T(1) = C             (for n = 1, only constant time is needed)
T(n) = 2 * T(n/2) + n    (for n > 1)

Let's expand the recurrence step by step:

T(n) = 2*T(n/2) + n
     = 2*(2*T(n/4) + n/2) + n   = 4*T(n/4) + 2*n
     = 4*(2*T(n/8) + n/4) + 2*n  = 8*T(n/8) + 3*n
     = 8*(2*T(n/16) + n/8) + 3*n = 16*T(n/16) + 4*n
     ......
     = 2^k * T(n/2^k) + k*n

Continuing this way, we get T(n) = 2^k * T(n/2^k) + k*n. When T(n/2^k) = T(1), that is, when n/2^k = 1, we get k = log2(n). Substituting k back into the formula gives T(n) = C*n + n*log2(n). In big-O notation, T(n) = O(nlogn), so merge sort runs in O(nlogn).

Space complexity

Merge sort is O(nlogn) in every case, which looks quite good (even quicksort is O(n^2) in the worst case). But merge sort is not as widely used as quicksort. Why? Because it has a fatal "weakness": merge sort is not an in-place sorting algorithm. The merge step, which combines two ordered arrays into one, requires additional storage space.

So what is the space complexity of merge sort: O(n), or O(nlogn)?

If we added up space across the recursion the way we add up time, the whole merge process would appear to need O(nlogn) space. But the space complexity of recursive code does not accumulate the way time complexity does: although each merge operation requires extra memory, the temporary space is freed as soon as that merge completes. Only one merge is executing at any moment, so only one temporary buffer is in use, and it never exceeds the size of n elements. The space complexity is therefore O(n).
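A practical consequence of this observation: since only one temporary buffer is ever live, we can allocate a single n-element buffer up front and reuse it in every merge, instead of allocating inside merge(). A minimal sketch under that assumption (the class name and the buf parameter are mine, not part of the original code):

public class MergeSortBuffered {
    public static void sort(int[] arr) {
        if (arr.length <= 1) {
            return;
        }
        int[] buf = new int[arr.length]; // one shared O(n) buffer, allocated once
        mergeSort(arr, buf, 0, arr.length - 1);
    }

    private static void mergeSort(int[] arr, int[] buf, int left, int right) {
        if (left < right) {
            int mid = (right - left) / 2 + left;
            mergeSort(arr, buf, left, mid);
            mergeSort(arr, buf, mid + 1, right);
            merge(arr, buf, left, mid, right);
        }
    }

    // Merge arr[left...mid] and arr[mid+1...right] through the shared buffer
    private static void merge(int[] arr, int[] buf, int left, int mid, int right) {
        int k = left;
        int i = left;
        int j = mid + 1;
        while (i <= mid && j <= right) {
            buf[k++] = arr[i] <= arr[j] ? arr[i++] : arr[j++];
        }
        while (i <= mid) {
            buf[k++] = arr[i++];
        }
        while (j <= right) {
            buf[k++] = arr[j++];
        }
        for (int m = left; m <= right; m++) {
            arr[m] = buf[m];
        }
    }
}

This does not change the O(n) space bound, but it avoids allocating a fresh temporary array in every merge call.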

Quick sort

Quicksort sorts through repeated comparison and exchange; the process is as follows:

  • (1) First choose a pivot value, typically the left or right boundary element, and use it to divide the array into two parts.

  • (2) Move data greater than the pivot to the right part of the array and data less than or equal to the pivot to the left part. At this point every element in the left part is less than or equal to the pivot, while every element in the right part is greater than it. This process is called partitioning.

  • (3) The left and right parts can then be sorted independently. For the left part, choose another pivot and split it again into a smaller-value left and a larger-value right; the right part is treated the same way.

  • (4) Repeat until partitioning is no longer possible (a recursive definition). After recursively sorting the left side, recursively sort the right side; when both parts are sorted, the whole array is sorted.

The demo

The sample code

public class QuickSort {
    public static void sort(int[] arr) {
        quickSort(arr, 0, arr.length - 1);
    }

    public static void quickSort(int[] arr, int left, int right) {
        if (left < right) {
            int pivotIndex = partition(arr, left, right);
            quickSort(arr, left, pivotIndex - 1);
            quickSort(arr, pivotIndex + 1, right);
        }
    }

    private static int partition(int[] arr, int left, int right) {
        int pivot = arr[left]; // choose the left boundary as the pivot
        int l = left;
        int r = right;
        while (l < r) {
            // Since the pivot is on the left, the right pointer must move first
            while (l < r && arr[r] > pivot) {
                r--;
            }
            while (l < r && arr[l] <= pivot) {
                l++;
            }
            if (l < r) {
                int tmp = arr[l];
                arr[l] = arr[r];
                arr[r] = tmp;
            }
        }
        // Swap the pivot with the point where the pointers meet
        int p = arr[l];
        arr[l] = arr[left];
        arr[left] = p;
        return l;
    }
}

In fact, there are many ways to write the partition function. The code above uses the two-pointer method; there is also the "hole-digging" method, which interested readers can explore. A sketch of it follows.
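Here is a minimal sketch of the hole-digging partition, under the same left-pivot convention as the code above (the method name partitionDig is mine, not from the original article). Instead of swapping pairs, it lifts the pivot out to leave a "hole" and moves elements into the hole from alternating ends:

// Hole-digging partition: digging out the pivot leaves a hole at arr[left];
// elements move into the hole from alternating ends until the pointers meet,
// and the pivot finally fills the last hole.
private static int partitionDig(int[] arr, int left, int right) {
    int pivot = arr[left]; // the first hole is at index left
    int l = left;
    int r = right;
    while (l < r) {
        // Scan from the right for an element <= pivot; move it into the hole at l
        while (l < r && arr[r] > pivot) {
            r--;
        }
        if (l < r) {
            arr[l++] = arr[r]; // the hole moves to position r
        }
        // Scan from the left for an element > pivot; move it into the hole at r
        while (l < r && arr[l] <= pivot) {
            l++;
        }
        if (l < r) {
            arr[r--] = arr[l]; // the hole moves back to position l
        }
    }
    arr[l] = pivot; // fill the final hole with the pivot
    return l;
}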

Algorithm analysis

Comparison with merge sort

Merge sort works bottom-up: it solves the subproblems first and then merges. Quicksort works top-down: it partitions first and then handles the subproblems. Although merge sort is a stable sorting algorithm with O(nlogn) time complexity, it is not an in-place algorithm; as we saw earlier, the main reason is that the merge function cannot be executed in place. Quicksort achieves in-place sorting through a cleverly designed in-place partition function, which solves merge sort's problem of excessive memory use.

Complexity analysis

The best case of quicksort, as with merge sort, occurs when each partition splits the array into two nearly equal halves. Then the recursion depth is only O(logn); since each level performs O(n) comparisons, the time complexity is O(nlogn) and the space complexity is O(logn).

Now suppose we always take the left boundary as the pivot and every element to its right is larger: each partition then peels off a single element, the recursion tree becomes so unbalanced that it degenerates into a linked list, the recursion depth becomes n, the time complexity becomes O(n^2), and the space complexity becomes O(n).

How do we avoid this problem? It is actually easy to solve: do not pick a fixed pivot position, pick one at random. This variant is called randomized quicksort, and its expected running time is O(nlogn) on any input array of n elements.

To sum up: the average time complexity of quicksort is O(nlogn), the best case is O(nlogn), and the worst case is O(n^2). Its space complexity (the recursion stack) is O(logn) on average and in the best case, and O(n) in the worst case.

Randomized quicksort

Reference code

public class QuickSort {
    public static void sort(int[] arr) {
        quickSort(arr, 0, arr.length - 1);
    }

    public static void quickSort(int[] arr, int left, int right) {
        if (left < right) {
            int pivotIndex = partition(arr, left, right);
            quickSort(arr, left, pivotIndex - 1);
            quickSort(arr, pivotIndex + 1, right);
        }
    }

    private static int partition(int[] arr, int left, int right) {
        // Pick a random index in the interval [left, right] and swap it to the left boundary
        int randomIndex = left + (int) (Math.random() * (right - left + 1));
        swap(arr, left, randomIndex);
        int pivot = arr[left];
        int l = left;
        int r = right;
        while (l < r) {
            while (l < r && arr[r] > pivot) {
                r--;
            }
            while (l < r && arr[l] <= pivot) {
                l++;
            }
            if (l < r) {
                swap(arr, l, r);
            }
        }
        // Swap the pivot with the point where the pointers meet
        swap(arr, left, l);
        return l;
    }

    private static void swap(int[] arr, int i, int j) {
        int tmp = arr[i];
        arr[i] = arr[j];
        arr[j] = tmp;
    }
}

Heap sort

A heap is a complete binary tree with the following property: the value of each node is greater than or equal to the values of its left and right children (a max heap, or "big-top" heap), or the value of each node is less than or equal to the values of its left and right children (a min heap, or "small-top" heap).

Heap sort exploits the properties of binary heaps

Steps of the heapsort algorithm:

1. Build the unordered array into a binary heap (a max heap, for ascending order).

2. Repeatedly move the heap top to the end of the unsorted region, then adjust the heap to produce a new top.

The demo

public class HeapSort {
    public static void heapSort(int[] arr) {
        int n = arr.length;
        // 1. Heapify: build a max heap, sifting down from the last non-leaf node back to the root
        for (int i = n / 2 - 1; i >= 0; i--) {
            shiftDown(arr, i, n);
        }
        // 2. Repeatedly move the heap top to the end of the unsorted region, then re-adjust the heap
        for (int i = n - 1; i > 0; i--) {
            swap(arr, 0, i);      // swap the heap top with the current last element
            shiftDown(arr, 0, i); // sink from the top of the heap to restore the heap property
        }
    }

    // Sift arr[k] down within the first n elements:
    // if a child is larger than the current node, swap and continue; otherwise stop
    private static void shiftDown(int[] arr, int k, int n) {
        while (2 * k + 1 < n) {
            int i = 2 * k + 1; // left child
            if (i + 1 < n && arr[i + 1] > arr[i]) {
                i = i + 1;     // pick the larger child
            }
            if (arr[k] > arr[i]) {
                break;
            }
            swap(arr, i, k);
            k = i;
        }
    }

    private static void swap(int[] arr, int i, int j) {
        int tmp = arr[i];
        arr[i] = arr[j];
        arr[j] = tmp;
    }
}

Using this array as an example, let's look at the heapify process in the first step.

For a complete binary tree stored in an array, the last non-leaf node has index n/2 - 1; we build the max heap by sifting down from that node back to the first node (the root).

Complexity analysis

Let’s review the steps of heapsort again:

  1. Build the unordered array into a binary heap.

  2. Repeatedly move the heap top to the end of the collection, then adjust the heap to produce a new top.

The first step, building the unordered array into a binary heap, requires about n/2 loop iterations, each calling the shiftDown method once, so its cost is bounded by n/2 * logn, i.e., O(nlogn). (A tighter analysis shows that bottom-up heap construction is in fact O(n), but the O(nlogn) bound suffices here.)

The second step requires n-1 loop iterations, each calling shiftDown once, so its cost is (n-1) * logn and its time complexity is O(nlogn).

So the overall time complexity is O(nlogn), and both the worst-case and best-case time complexity are O(nlogn) as well.

Heap sort and quicksort are both unstable sorting algorithms with O(nlogn) average time; quicksort's worst case is O(n^2), while heap sort's is O(nlogn).

Counting sort

Counting sort is a non-comparison sorting algorithm. Its advantage is that when sorting integers within a known range, its complexity is O(n+k) (where k is the size of the integer range), faster than any comparison-based sorting algorithm. This trades space for time: when O(k) > O(nlog(n)), it becomes less efficient than comparison-based sorting, whose theoretical lower bound is O(nlog(n)) (e.g., merge sort, quicksort, heap sort).

The core of counting sort is to use the input values as keys into an extra array that counts their occurrences. As a linear-time sort, counting sort requires the input to consist of integers within a definite range.

Algorithm description

  1. Take O(n) time to scan the entire sequence A and obtain min and max
  2. Create a new array B of length (max - min + 1)
  3. Each index of array B records the number of occurrences of the corresponding element value in A
  4. Finally, output the sorted sequence: traverse array B and output each value as many times as it was counted

The demo

The sample code

Version 1

// version 1
public class CountingSort {
    public static void countingSort(int[] arr) {
        // Find the maximum value
        int max = arr[0];
        for (int i = 0; i < arr.length; i++) {
            if (max < arr[i]) {
                max = arr[i];
            }
        }
        int k = max + 1;
        // Create the count array
        int[] countArr = new int[k];
        // For each element of the input, increase the corresponding count by 1
        for (int i = 0; i < arr.length; i++) {
            countArr[arr[i]]++;
        }
        // Write each value back as many times as it was counted
        int index = 0;
        for (int i = 0; i < k; i++) {
            while (countArr[i] > 0) {
                arr[index++] = i;
                countArr[i]--;
            }
        }
    }
}

We’ve implemented a counting sort above, but this algorithm is not stable! Think about why: it regenerates the output values from the counts rather than moving the original elements, so although plain ints look identical, equal keys carrying different satellite data would lose their original order. Let's transform it into a stable algorithm in a second version.

Version 2

// version 2
public class CountingSort {
    public static int[] countingSort(int[] arr) {
        // Find the maximum value
        int max = arr[0];
        for (int i = 0; i < arr.length; i++) {
            if (max < arr[i]) {
                max = arr[i];
            }
        }
        int k = max + 1;
        // Create the count array
        int[] countArr = new int[k];
        // For each element of the input, increase the corresponding count by 1
        for (int i = 0; i < arr.length; i++) {
            countArr[arr[i]]++;
        }
        // Prefix sums: countArr[i] now holds the number of elements <= i,
        // i.e., one past the final sorted position of the last occurrence of i
        for (int i = 1; i < k; i++) {
            countArr[i] = countArr[i] + countArr[i - 1];
        }
        int[] res = new int[arr.length];
        // Traverse the input from right to left so equal elements keep their order
        for (int i = arr.length - 1; i >= 0; i--) {
            res[--countArr[arr[i]]] = arr[i]; // place arr[i] at its correct sorted position
        }
        return res;
    }
}

Let’s examine the key part of this code:

for (int i = 0; i < arr.length; i++) {
    countArr[arr[i]]++;
}
for (int i = 1; i < k; i++) {
    countArr[i] = countArr[i] + countArr[i - 1];
}

After the first loop, countArr[v] holds the number of occurrences of the value v. After the prefix-sum loop, countArr[v] holds the number of elements less than or equal to v, which is exactly one past the final sorted position of the last occurrence of v. Traversing the input from back to front and writing each element to res[--countArr[arr[i]]] therefore fills equal values from their highest position downward, preserving their original relative order. That is what makes version 2 stable.

With the stability problem solved, one issue remains.

Suppose our array does not start near 0: if the minimum is 90 and the maximum is 99, allocating a count array of length 100 wastes the first 90 slots.

To solve this, instead of (the maximum value of the input + 1), we use (the difference between the maximum and minimum + 1) as the length of the count array, and use the minimum value of the array as an offset when indexing into it.

So the final version 3 code is as follows

Version 3

// version 3
public class CountingSort {
    public static int[] countingSort(int[] arr) {
        // Find both the maximum and the minimum
        int max = arr[0];
        int min = arr[0];
        for (int i = 0; i < arr.length; i++) {
            if (arr[i] > max) {
                max = arr[i];
            }
            if (arr[i] < min) {
                min = arr[i];
            }
        }
        int k = max - min + 1;
        // Create the count array
        int[] countArr = new int[k];
        // For each element of the input, increase the corresponding count by 1
        for (int i = 0; i < arr.length; i++) {
            countArr[arr[i] - min]++; // take care to subtract the offset
        }
        // Prefix sums: compute final sorted positions
        for (int i = 1; i < k; i++) {
            countArr[i] = countArr[i] + countArr[i - 1];
        }
        int[] res = new int[arr.length];
        for (int i = arr.length - 1; i >= 0; i--) {
            res[--countArr[arr[i] - min]] = arr[i]; // place arr[i] at its sorted position, again subtracting the offset
        }
        return res;
    }
}

Algorithm analysis

We analyze the complexity of the final version 3.

Time complexity: the passes cost n (finding min and max) + n (counting) + (k-1) (prefix sums) + n (placing elements) = O(n+k).

Space complexity: the count array needs k slots and the result array needs n, so O(n+k).

Conclusion:

Counting sort is suitable for integer sequences with a small data range. When k << n (massive data, limited range), the time complexity approaches O(n).

Radix sort

Radix sort uses buckets: as the name implies, it looks at the ones digit, then the tens digit, then the hundreds digit, and so on, distributing the elements to be sorted into buckets 0 through 9 on each pass. In some cases, radix sort is more efficient than other comparison-based sorts.

For example, take 2, 22, 31, 1221, 90, 85, 105:

Ones:      90, 31, 1221, 2, 22, 85, 105
Tens:      2, 105, 1221, 22, 31, 85, 90
Hundreds:  2, 22, 31, 85, 90, 105, 1221
Thousands: 2, 22, 31, 85, 90, 105, 1221

Each pass builds on the previous one: when the digits in the current pass are equal, the elements are not moved (they keep the order established by the previous pass), so radix sort is a stable sort.

The demo

The sample code

Let’s start with the non-negative integer case.

// version 1
public class RadixSort {
    public static void radixSort(int[] arr) {
        int max = getMax(arr);
        // Run a counting sort for each digit, from the lowest to the highest
        for (int exp = 1; max / exp > 0; exp *= 10) {
            countSort(arr, exp);
        }
    }

    private static void countSort(int[] arr, int exp) {
        int n = arr.length;
        int[] output = new int[n]; // output array
        int[] count = new int[10]; // 10 buckets, one for each digit 0..9
        // Count occurrences of each digit
        for (int i = 0; i < n; i++) {
            count[(arr[i] / exp) % 10]++;
        }
        // Prefix sums: the correct positions of the elements for the current digit
        for (int i = 1; i < 10; i++) {
            count[i] += count[i - 1];
        }
        // Build the output array (right to left, to keep the sort stable)
        for (int i = n - 1; i >= 0; i--) {
            output[--count[(arr[i] / exp) % 10]] = arr[i];
        }
        // Each round, copy the output array back to the original array
        for (int i = 0; i < n; i++) {
            arr[i] = output[i];
        }
    }

    private static int getMax(int[] arr) {
        int max = arr[0];
        for (int i = 1; i < arr.length; i++) {
            if (arr[i] > max) {
                max = arr[i];
            }
        }
        return max;
    }
}

The algorithm above only supports sequences of non-negative integers, so we modify it to support negative numbers as well.

Set the bucket count to 20, where indices [0, 9] correspond to negative digits and [10, 19] to non-negative digits, and take the maximum by absolute value.

// version 2
public class RadixSort {
    public static void radixSort(int[] arr) {
        int max = getAbsMax(arr);
        // Run a counting sort for each digit, from the lowest to the highest
        for (int exp = 1; max / exp > 0; exp *= 10) {
            countSort(arr, exp);
        }
    }

    private static void countSort(int[] arr, int exp) {
        int n = arr.length;
        int[] output = new int[n]; // output array
        int k = 20;                // 20 buckets: digits -9..-1 map to 1..9, digits 0..9 map to 10..19
        int[] count = new int[k];
        int offset = 10;           // offset for negative digits
        // Count occurrences of each (signed) digit
        for (int i = 0; i < n; i++) {
            count[(arr[i] / exp) % 10 + offset]++;
        }
        // Prefix sums: the correct positions of the elements for the current digit
        for (int i = 1; i < k; i++) {
            count[i] += count[i - 1];
        }
        // Build the output array (right to left, to keep the sort stable)
        for (int i = n - 1; i >= 0; i--) {
            output[--count[(arr[i] / exp) % 10 + offset]] = arr[i];
        }
        // Each round, copy the output array back to the original array
        for (int i = 0; i < n; i++) {
            arr[i] = output[i];
        }
    }

    private static int getAbsMax(int[] arr) {
        int max = Math.abs(arr[0]);
        for (int i = 1; i < arr.length; i++) {
            if (Math.abs(arr[i]) > max) {
                max = Math.abs(arr[i]);
            }
        }
        return max;
    }
}

Complexity analysis

Each call to countSort takes n + r time, where r is the radix (the bucket array has length 10 for base 10, or 20 in the negative-number version). The outer loop over digit positions runs a constant k times, where k is the number of digits of the largest absolute value, so the total time is:

O(k * (n + r)) = O(k * n)

The space complexity is O(n + r) = O(n).

Radix sort uses counting sort for each digit pass, and counting sort is stable, so radix sort is a stable sort.

Bucket sort

A bucket is a container that can be implemented with a variety of data structures, such as an array, a queue, or a stack.

Bucket sort is not bound by the O(nlogn) lower limit of comparison sorting: it traverses the sequence once to distribute the elements into a finite number of buckets, then sorts each bucket separately. The in-bucket sort can be comparison-based or not.

Algorithm steps

Bucket sorting by comparison is divided into four steps:

1. Set up an appropriate number of buckets;

2. Traverse the sequence to be sorted and put each element into its corresponding bucket;

3. Sort each non-empty bucket with an appropriate sorting algorithm;

4. Visit the buckets in order and put their elements back into the corresponding positions of the original sequence.

Reference code

Bucket sort can also handle floating-point data; here the element type is double.

import java.util.*;

public class BucketSort {
    public static void main(String[] args) {
        double[] arr = {10.3, 60.4, 30.7, -20.4, 10, -70.60};
        bucketSort(arr);
        System.out.println(Arrays.toString(arr));
    }

    public static void bucketSort(double[] arr) {
        int n = arr.length;
        double max = arr[0];
        double min = arr[0];
        for (double v : arr) {
            if (v > max) {
                max = v;
            }
            if (v < min) {
                min = v;
            }
        }
        // The number of buckets must be designed flexibly according to the scale and
        // distribution of the data; how the buckets are divided determines how well the algorithm performs
        int bucketNum = (int) ((max - min) / n + 1);
        // Initialize the buckets
        LinkedList<Double>[] bucket = new LinkedList[bucketNum];
        for (int i = 0; i < bucketNum; i++) {
            bucket[i] = new LinkedList<Double>();
        }
        // Distribute the elements into their corresponding buckets
        for (int i = 0; i < n; i++) {
            int index = (int) ((arr[i] - min) / n);
            bucket[index].add(arr[i]);
        }
        // Sort the elements of each bucket; the in-bucket sorting algorithm is optional
        for (int i = 0; i < bucket.length; i++) {
            // Collections.sort uses TimSort, an upgraded insertion + merge sort: efficient and stable
            Collections.sort(bucket[i]);
        }
        // Take the elements back out of the buckets in order
        int index = 0;
        for (int i = 0; i < bucket.length; i++) {
            for (double v : bucket[i]) {
                arr[index++] = v;
            }
        }
    }
}

Algorithm analysis

Complexity

The time complexity of bucket sorting N data items is divided into three parts:

  1. N operations to load each data item into its bucket
  2. Sorting the data inside the buckets: ∑ O(Ni * logNi), where Ni is the amount of data in the i-th bucket
  3. N operations to fetch the data back out of the buckets

For N data items and M buckets, each bucket holds N/M items on average, so the average time complexity of bucket sort is:

O(N) + O(M*(N/M)*log(N/M)) + O(N) = O(N + N*(logN - logM)) = O(N + N*logN - N*logM)

When N = M, i.e., in the limiting case of one data item per bucket, bucket sort achieves its best efficiency of O(N).

For the same amount of data, more buckets and a more even distribution mean higher bucket-sort efficiency; in other words, bucket sort trades space for efficiency.

Space complexity

Space complexity generally refers to the extra storage space required during algorithm execution.

In bucket sort, you need to create M extra buckets and N extra elements

So the bucket sort space is O(N+M).

Stability

Stability means that if a comes before b and a equals b, then after sorting a should still come before b; if so, the algorithm is stable.

In bucket sort, if a is already in a bucket, an equal element b arriving later is inserted after a (with an insertion-style placement scanning from right to left, an element is inserted to the right of any element it is not less than).

So bucket sort is stable.

Of course, if the in-bucket sort uses an unstable algorithm such as quicksort, the result is not stable; but since stability can be achieved, bucket sort is generally considered stable.

Scope of application

Bucket sort is mainly suitable for evenly distributed numeric sequences, so that the data spreads evenly across the buckets; that is when it reaches maximum efficiency.


References:

  • visualgo.net/zh/sorting
  • www.geeksforgeeks.org/sorting-alg…
  • www.cnblogs.com/xiaowenshu/…