Sorting problem is a classic problem in the algorithm, but also a required course in the computer science data structure course, in the face of many such as insert sort, quick sort, heap sort, merge sort and so on classical sorting algorithm, JDK implementer is how to choose sorting algorithm? Arrays. Sort, collections. sort. This paper tries to analyze the implementation source code of JDK 1.8 and learn how to use the sorting algorithm in the actual industrial environment.

An overview of the

The implementation of the Collections sort method ends up calling the Sort method of Arrays, so we only analyze the sort method of Arrays. In this section, we try to look at the sort method of Arrays from a macro perspective. The Sort methods of Arrays are divided into two types: basic and Object. Arrays.sort(int[]) If the number of array elements is less than NSERTION_SORT_THRESHOLD(47), then use the improved insertion sort to sort, 2. If the number of elements is greater than the insertion sort threshold and less than the QUICKSORT_THRESHOLD(286), then the improved biaxial quicksort is used for sorting, 3. If the number of elements is greater than the threshold of quicksort, which algorithm to continue using is determined by the disorder degree of the array, which is determined by the number of ordered sequences divided into different arrays. If the number of ordered sequences is greater than MAX_RUN_COUNT(67), the original array is considered to be basically unordered, and biaxial quicksort is still used. If it is less than MAX_RUN_COUNT, the original array is considered basically ordered and is sorted using merge sort. Arrays.sort(Object[]) If the number of elements in the array is smaller than MIN_MERGE(32),2. If the number is larger than MIN_MERGE, the array is divided into ordered blocks for merge sort. To sum up, both the sort of basic type and the sort of Object type use a variety of sort combinations. The reason for this combination is that different sorting methods need to be adapted to different sorting scales. For example, when the sorting scale is small, Insertion sort of O(n^2) can achieve better sorting performance in proportion to quicksort, and merge sort is a better choice than quicksort when the array is basically ordered. In general, different sorting methods are used depending on the size of the array. Next, we examine the implementation through detailed code.

Arrays.sort(int[])

The Sort method of The Arrays directly calls the Sort method of DualQivotQuicksort. So biaxial quicksort is the key.



If the size of the array is smaller than QUICKSORT_THRESHOLD, the sort method is used to sort the array.



Whether the length of the array is smaller than INSERTION_SORT_THRESHOLD, if so, use insert sort, and use leftMost to control whether to use traditional insert sort or improved insert sort. The modified insertion sort is called pair insertion sort. The basic idea is to carry out two unsorted arrays at a time and ensure that (A1 > A2) is in front of A1 at the same time.

If (length < INSERTION_SORT_THRESHOLD) {if (leftmost) {// Default, Optimized for Server VM at the insertion sort is used in case of * the leftmost part. */ for (int i = left, j = i; i < right; j = ++i) { int ai = a[i + 1]; while (ai < a[j]) { a[j + 1] = a[j]; if (j-- == left) { break; } } a[j + 1] = ai; /* * do {if (left >= right) {if (left >= right) {return; } } while (a[++left] >= a[left - 1]); /* * Every element from adjoining part plays the role * of sentinel, therefore this allows us to avoid the * left range check on each iteration. Moreover, we use * the more optimized algorithm, so called pair insertion * sort, which is faster (in the context of Quicksort) * than traditional implementation of insertion sort. */ for (int k = left;  ++left <= right; k = ++left) { int a1 = a[k], a2 = a[left]; if (a1 < a2) { a2 = a1; a1 = a[left]; } while (a1 < a[--k]) { a[k + 2] = a[k]; } a[++k + 1] = a1; while (a2 < a[--k]) { a[k + 1] = a[k]; } a[k + 1] = a2; } int last = a[right]; while (last < a[--right]) { a[right + 1] = a[right]; } a[right + 1] = last; } return; }Copy the code

If the value is not less than INSERTION_SORT_THRESHOLD, then use the quintuple method to find the value of the five positions for biaxial quicksort



It’s worth explaining here, the idea of two-axis quicksort is as opposed to one-axis quicksort, classic one-axis quicksort, you pick one element at a time as pivot value pivot, and then use a double pointer to move less than pivot to the left of pivot, and greater than pivot to the right, so that pivot is in its final position, This process is called a sort, and the call to a sort on both sides of the recursion can get the final sort result.

The basic idea of two-axis quicksort is that two elements can be placed in the final position at one time. Assuming that the values of these two axes are Pivot1 and pivot2, the final array is divided into three parts by these two elements after sorting: <pivot1, >= Pivot1 and <=pivot2, >pivot2. To achieve this division, three Pointers are needed for operation: I, K, and J. The Pointers to the left of I are smaller than Pivot1, and those to the right of J are larger than Pivot1. K is used for scanning, and when K and J meet, the scanning ends. For the concrete implementation of biaxis quicksort, please refer to this article”SinglePivotQuickSort and DualPivotQuickSort and their JAVA implementationsThis article is very well written.

Compared to single-axis quicksort, double-axis quicksort can obtain faster sorting efficiency, let’s take a look at the key implementation of the source code, from the source code, when all values of the 5 points are not the same, select the second point and the fourth point as a double-axis quicksort.





Otherwise, the third point is used for uniaxial quicksort



If the size of an array is smaller than QUICKSORT_THRESHOLD (286), the size of an array is larger than QUICKSORT_THRESHOLD (286). If the size of an array is larger than QUICKSORT_THRESHOLD (286), the size of an array is larger than QUICKSORT_THRESHOLD. Count is used for statistics, and if count is once equal to MAX_RUN_COUNT, it is considered basically unordered, using the method we mentioned above for quicksort.



Basic disorder check



In the process of the above statistics count, as well as using the run array to record all the basic order of the last element of an array of subscripts, if the decision result is the basic and orderly, the use of the final sort merge sort, method is through the run method to record the index of the merging two orderly sequence at a time, eventually making orderly array, I don’t want to post long merge code here.

So basically, that’s the sort algorithm for the base type.

Arrays.sort(Object[])

The Arrays.sort method’s Object array requires the Comparable interface to be implemented because it is in its implementation

LegacyMergeSort is the classic merge sort for Object sorts, but it’s going to be deprecated. By default, the sort method from the ComparableSort feature is used.



First, if the length of the array is smaller than MIN_MERGE(32), the binary insertion sort binarySort method is called to sort the array. Binary insertion sort is used to find the insertion position during insertion, which is a bit faster than linear search.



If MIN_MERGE is not allowed, merge sort is used. Merge sort is used by dividing the array into equal blocks (except for the last block), and then dividing the array into binary insertion sort. After sorting, pushRun is used to push the initial position and length of the current array. The mergeCollapse method merges two small arrays at a time, and by the end of the loop, the array has been sorted.



Well, that’s basically it

reference

DualPivotQuicksort Data array sort: Arrays.sort() and DualPivotQuickSort and their JAVA implementation of arrays.sort () method source analysis

The statement

This article was first published on my blog, welcome to follow! Knight has been commissioned to protect the rights of the articles of this site, reproduced to indicate the source of the article, the author retains the ownership of the article.