The JDK’s Collection framework for storing a set of objects is Collection. One set of operations for the Collection framework is Collections, which contains a variety of operations for Collections, such as sort, search, swap, reverse, copy, and so on.
This article covers sorting operations for Collections.
public static <T extends Comparable<? super T>> void sort(List<T> list) {
list.sort(null);
}
Copy the code
We can see that collections.sort () calls sort(), the default method in the List interface, as follows:
@SuppressWarnings({"unchecked", "rawtypes"})
default void sort(Comparator<? super E> c) {
Object[] a = this.toArray();
Arrays.sort(a, (Comparator) c);
ListIterator<E> i = this.listIterator();
for(Object e : a) { i.next(); i.set((E) e); }}Copy the code
Call arrays.sort () in the list.sort () method; The Sort method from the Arrays class looks like this:
public static <T> void sort(T[] a, Comparator<? super T> c) {
if (c == null) {
sort(a);
} else {
if (LegacyMergeSort.userRequested)
legacyMergeSort(a, c);
else
TimSort.sort(a, 0, a.length, c, null.0.0); }}Copy the code
If there is a custom Comparator, we call timsor.sort (). If there is no custom Comparator, we call comparableTimsor.sort (). Only Java. Util. TimSort. Sort can use custom Comparator, and java.util.Com parableTimSort. Do not use the Comparator sort.
ComparableTimSort.sort
static void sort(Object[] a, int lo, int hi, Object[] work, int workBase, int workLen) {
asserta ! =null && lo >= 0 && lo <= hi && hi <= a.length;
int nRemaining = hi - lo;
if (nRemaining < 2)
return;
// The array length is less than 32
if (nRemaining < MIN_MERGE) {
// Find the maximum number of increments or decrement, if decrement, the array is strictly inverted
int initRunLen = countRunAndMakeAscending(a, lo, hi);
binarySort(a, lo, hi, lo + initRunLen);
return;
}
// The array length is greater than or equal to 32` ` ` ` ` ` ` `/** * walk the array once from left to right, find the natural path, * extend the short natural run to the MIN element, and merge the run * leaving the stack unchanged. * /
ComparableTimSort ts = new ComparableTimSort(a, work, workBase, workLen);
int minRun = minRunLength(nRemaining);
do {
// Identify next run
int runLen = countRunAndMakeAscending(a, lo, hi);
// If run is short, extend to min(minRun, nRemaining)
if (runLen < minRun) {
int force = nRemaining <= minRun ? nRemaining : minRun;
binarySort(a, lo, lo + force, lo + runLen);
runLen = force;
}
// Push run onto pending-run stack, and maybe merge
ts.pushRun(lo, runLen);
ts.mergeCollapse();
// Advance to find next run
lo += runLen;
nRemaining -= runLen;
} while(nRemaining ! =0);
// Merge all remaining runs to complete sort
assert lo == hi;
ts.mergeForceCollapse();
assert ts.stackSize == 1;
}
Copy the code
Main steps:
The array length is smaller than 32
@SuppressWarnings({"fallthrough", "rawtypes", "unchecked"})
private static void binarySort(Object[] a, int lo, int hi, int start) {
// All elements with indexes less than start are sorted
assert lo <= start && start <= hi;
if (start == lo)
start++;
for(; start < hi; start++) { Comparable pivot = (Comparable) a[start];// Set left (and right) to the index where a[start] (pivot) belongs
int left = lo;
int right = start;
assert left <= right;
// Binary search
while (left < right) {
int mid = (left + right) >>> 1;
if (pivot.compareTo(a[mid]) < 0)
right = mid;
else
left = mid + 1;
}
assert left == right;
/* * The invariants still hold: pivot >= all in [lo, left) and * pivot < all in [left, start), so pivot belongs at left. Note * that if there are elements equal to pivot, left points to the * first slot after them -- that's why this sort is stable. * Slide elements over to make room for pivot. */
int n = start - left; // The number of elements to move
// Empty the position for insertion pivot
switch (n) {
case 2: a[left + 2] = a[left + 1];
case 1: a[left + 1] = a[left];
break;
default: System.arraycopy(a, left, a, left + 1, n); } a[left] = pivot; }}Copy the code
The array length is greater than or equal to 32
static void sort(Object[] a, int lo, int hi, Object[] work, int workBase, int workLen) {
// The number of arrays is less than 32.// The number of arrays is greater than 32
/** * March over the array once, left to right, finding natural runs, * extending short natural runs to minRun elements, and merging runs * to maintain stack invariant. */
ComparableTimSort ts = new ComparableTimSort(a, work, workBase, workLen);
// Calculate the length of run
int minRun = minRunLength(nRemaining);
do {
// Identify next run
// Find the maximum number of consecutive ascending orders
int runLen = countRunAndMakeAscending(a, lo, hi);
// If run is short, extend to min(minRun, nRemaining)
// If the run length is smaller than the specified minRun length, binary insertion sort is performed first
if (runLen < minRun) {
int force = nRemaining <= minRun ? nRemaining : minRun;
binarySort(a, lo, lo + force, lo + runLen);
runLen = force;
}
// Push run onto pending-run stack, and maybe merge
ts.pushRun(lo, runLen);
// merge
ts.mergeCollapse();
// Advance to find next run
lo += runLen;
nRemaining -= runLen;
} while(nRemaining ! =0);
// Merge all remaining runs to complete sort
assert lo == hi;
// merge all runs
ts.mergeForceCollapse();
assert ts.stackSize == 1;
}
Copy the code
-
Calculate the minimum length of run, minRun
A) If the array size is 2 to the NTH power, return 16 (MIN_MERGE / 2);
B) Otherwise, shift bitwise to the right (i.e. divide by 2) until you find a number between 16 and 32;
/**
* Returns the minimum acceptable run length for an array of the specified
* length. Natural runs shorter than this will be extended with
* {@link #binarySort}.
*
* Roughly speaking, the computation is:
*
* If n < MIN_MERGE, return n (it's too small to bother with fancy stuff).
* Else if n is an exact power of 2, return MIN_MERGE/2.
* Else return an int k, MIN_MERGE/2 <= k <= MIN_MERGE, such that n/k
* is close to, but strictly less than, an exact power of 2.
*
* For the rationale, see listsort.txt.
*
* @param n the length of the array to be sorted
* @return the length of the minimum run to be merged
*/
private static int minRunLength(int n) {
assert n >= 0;
int r = 0; // Becomes 1 if any 1 bits are shifted off
while (n >= MIN_MERGE) {
r |= (n & 1);
n >>= 1;
}
return n + r;
}
Copy the code
- Find the minimum increment length, if the length is less than minRun, use insert sort to supplement the number of minruns, the operation is the same as the number less than 32.
- Use the stack to record the length of each run, merging if one of the following conditions is true until the number is constant:
runLen[i - 3] > runLen[i - 2] + runLen[i - 1]
runLen[i - 2] > runLen[i - 1]
Copy the code
/** * Examines the stack of runs waiting to be merged and merges adjacent runs * until the stack invariants are reestablished: * * 1. runLen[i - 3] > runLen[i - 2] + runLen[i - 1] * 2. runLen[i - 2] > runLen[i - 1] * * This method is called each time a new run is pushed onto the stack, * so the invariants are guaranteed to hold for i < stackSize upon * entry to the method. */ private void mergeCollapse() { while (stackSize > 1) { int n = stackSize - 2; if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1]) { if (runLen[n - 1] < runLen[n + 1]) n--; mergeAt(n); } else if (runLen[n] <= runLen[n + 1]) { mergeAt(n); } else { break; // Invariant is established } } }Copy the code
A simple optimization is made on the merging method and the general merging sort. If the two runs are run1 and run2, run gallopRight and binarySearch are used to find the first element k in run1, then the elements before k in run1 are the smallest. Then, look in run2 for len2, the element after len2 in run2, and the element after len2 in run2 will be the largest element after the merge. Finally, depending on len1 and Len2, call mergeLo or mergeHi to merge the remaining elements.
/** * Merges the two runs at stack indices i and i+1. Run i must be * the penultimate or antepenultimate run on the stack. In other words, * i must be equal to stackSize-2 or stackSize-3. * * @param i stack index of the first of the two runs to merge */ @SuppressWarnings("unchecked") private void mergeAt(int i) { assert stackSize >= 2; assert i >= 0; assert i == stackSize - 2 || i == stackSize - 3; int base1 = runBase[i]; int len1 = runLen[i]; int base2 = runBase[i + 1]; int len2 = runLen[i + 1]; assert len1 > 0 && len2 > 0; assert base1 + len1 == base2; /* * Record the length of the combined runs; if i is the 3rd-last * run now, also slide over the last run (which isn't involved * in this merge). The current run (i+1) goes away in any case. */ runLen[i] = len1 + len2; if (i == stackSize - 3) { runBase[i + 1] = runBase[i + 2]; runLen[i + 1] = runLen[i + 2]; } stackSize--; /* * Find where the first element of run2 goes in run1. Prior elements * in run1 can be ignored (because they're already in place). */ int k = gallopRight((Comparable<Object>) a[base2], a, base1, len1, 0); assert k >= 0; base1 += k; len1 -= k; if (len1 == 0) return; /* * Find where the last element of run1 goes in run2. Subsequent elements * in run2 can be ignored (because they're already in place). */ len2 = gallopLeft((Comparable<Object>) a[base1 + len1 - 1], a, base2, len2, len2 - 1); assert len2 >= 0; if (len2 == 0) return; // Merge remaining runs, using tmp array with min(len1, len2) elements if (len1 <= len2) mergeLo(base1, len1, base2, len2); else mergeHi(base1, len1, base2, len2); }Copy the code
- And then finally we merge and we have unmerged runs, and we know that the number of runs is 1.
example
For demonstration purposes, I set minRun in TimSort directly to 2, otherwise I can’t demonstrate with a very small array. Also change MIN_MERGE to 2 (the default is 32) to avoid binary insert sort directly.
1. The initial array for,5,1,2,6,8,10,12,4,3,9,11,13,15,16,14 [7]
2. Look for the first row of ascending or descending sequence: [1, 5, 7],6,8,10,12,4,3,9,11,13,15,16,14 [2]
3. StackSize =1, so no merge, continue to find the second run
4. Find a decreasing sequence, adjust the order: [1, 5, 7],6,8,10,12 [2],3,9,11,13,15,16,14 [4]
RunLen [0] <= runLen[1
1) gallopRight: Find where the first element of run1 should be inserted in run0 (” 2 “should be inserted after” 1 “), and then ignore the elements of run0 before (all smaller than the first element of run1).
2) gallopLeft: Find where the last element of run0 should be inserted in run1 (” 7 “should be inserted before” 8 “), and then ignore the elements of run1 after it (all larger than the last element of run0).
This leaves only [5,7] [2,6] and mergeLow results: [1,2,5,6,7,8,10,12] [4,3,9,11,13,15,16,14]
6. Look for the ascending or descending sequence of consecutive,2,5,6,7,8,10,12 [1] [3, 4],11,13,15,16,14 [9]
RunLen [0] > runLen[1]
8. Looking for a continuous ascending or descending sequence:,2,5,6,7,8,10,12 [1] [3, 4],11,13,15,16 [9] [14]
9. Because runLen[1] <= runLen[2], merge is required
-
Use gallopRight, found in normal order. Have to,2,5,6,7,8,10,12 [1] [3,4,9,11,13,15,16] [14]
-
Finally only [14] this element:,2,5,6,7,8,10,12 [1] [3,4,9,11,13,15,16] [14]
-
Merge because runLen[0] <= runLen[1] + runLen[2]. Because runLen[0] > runLen[2], run1 and run2 are merged first. Otherwise run0 and run1 will be merged first
After completion of results:,2,5,6,7,8,10,12 [1],4,9,11,13,14,15,16 [3]
- After completion of results:,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 [1]