• Sorting algorithm (I) – bubble, insert, select
  • Sorting algorithm (two) – merge, fast sorting

The three sorting algorithms are all O(n) in time.

Bucket sort

Bucket sorting refers to sorting buckets based on the data to be sorted, and then sorting data into buckets. After sorting buckets, the sorting is complete.

Bucket sort is O(n), so let’s analyze it. Suppose there is an array to be sorted, the number of elements is N, m buckets are divided, and each bucket is sorted by quick sort with the time complexity of O(klogk). At this time, the time complexity of bucket sorting is M * K * logk. Since k= N /m, m * N/M * logn/m = Nlogn /m. When the number of buckets m is close to the number of data n, log(n/m) is a very small constant, and the time complexity of bucket sorting is close to O(n).

However, bucket sorting has high requirements on data, and the sorting data should be evenly distributed among buckets as much as possible. In extreme cases, the time complexity will be reduced to O(nlogn).

Under normal circumstances: according to the previous derivation formula, we can get the time complexity close to O(n).

In extreme cases, the time complexity degrades to O(nlogn) because the data is distributed in a bucket, which is equivalent to quicksort.

For example, if 1GB of data needs to be sorted but the memory is only 100 MB, the bucket sort algorithm can be used to divide 1GB of data into 10 files. If some files exceed 100 MB due to uneven data distribution, the files exceeding 100 MB can be split into multiple buckets. Sort each bucket individually until the split file is no larger than 100M.

Count sorting

Counting sort also has the concept of buckets. Buckets are divided according to the maximum value of the array to be sorted. If the maximum value of the array to be sorted is K, buckets are divided into K +1. The elements in each bucket are the same, so there is no sorting within the bucket.

Sorting process:

  • Buckets are divided and each element is counted in the corresponding bucket
  • After counting buckets, add them from left to right
  • Sorting (difficult)

The sorting process of counting sort is the most difficult to understand, and we break it down using a graph that starts at the end of the array to be sorted

Counting sort also has special requirements for sorting data:

  1. It cannot be a negative number. If there is a negative number, you need to change the negative number to a positive number and sort it. For example, if the minimum value is -100, you need to +100 the whole array and sort it
  2. The value must be an integer. If there are decimal digits, convert them to integers and count them by buckets
  3. The maximum number of sorts cannot be too large; otherwise, buckets are not suitable for sorting. For example, when sorting id numbers, buckets are 18 digits

Radix sort

As mentioned above, id numbers are not suitable for counting sort, so we can use radix sort instead.

Radix sort also uses the concept of buckets, is to sort the value of each bit of sorting elements, sorting process can be divided into 0 to 9 such as 10 buckets, first from the lowest sort, put the corresponding bucket and then take out in order, until the highest bit is finished sorting. The number of loops is determined by the element with the highest median of all elements to be sorted. For example, if the minimum value is 3 and the maximum value is 12345678, the rest of 3 is filled with 0 and 00000003. Sort each bit once, and then sort eight times.

In addition, buckets can be divided not only by numbers, but also by single English letters, but the range of buckets cannot be too large.

Time complexity: For an array with length N, the time complexity of one round of sorting by unit number is O(n). If the longest one-digit length is K, it needs to sort by k rounds, and the complexity is O(n * k). K cannot be too large and can be ignored as a constant, so the time complexity of radix sort is approximately O(n).

Find a more graphic GIF from somewhere else

conclusion

Sorting algorithm Bucket-based approach Data requirements
Bucket sort Buckets are allocated based on numeric ranges Data is evenly distributed to ensure that data is evenly distributed to all buckets
Count sorting Allocate buckets based on individual elements It must be a positive integer. The maximum value of the element cannot be too large
Radix sort Buckets are allocated based on each digit Data can be divided into independent ‘bits’ for comparison, and the range of independent’ bits’ cannot be too large

Bucket sort, counting sort and radix sort are all O(n) time complexity sorting algorithms based on the concept of buckets. Sorting efficiency is high, but the data have certain requirements, so the application is not very widespread.

So in the right scene, choose the right sorting method, can greatly improve the sorting efficiency.