The JDK’s Collection framework for storing a set of objects is Collection. One set of operations for the Collection framework is Collections, which contains a variety of operations for Collections, such as sort, search, swap, reverse, copy, and so on.

This article covers sorting operations for Collections.

public static <T extends Comparable<? super T>> void sort(List<T> list) {
    list.sort(null);
}
Copy the code

We can see that collections.sort () calls sort(), the default method in the List interface, as follows:

@SuppressWarnings({"unchecked", "rawtypes"})
default void sort(Comparator<? super E> c) {
    Object[] a = this.toArray();
    Arrays.sort(a, (Comparator) c);
    ListIterator<E> i = this.listIterator();
    for(Object e : a) { i.next(); i.set((E) e); }}Copy the code

Call arrays.sort () in the list.sort () method; The Sort method from the Arrays class looks like this:

public static <T> void sort(T[] a, Comparator<? super T> c) {
    if (c == null) {
        sort(a);
    } else {
        if (LegacyMergeSort.userRequested)
            legacyMergeSort(a, c);
        else
            TimSort.sort(a, 0, a.length, c, null.0.0); }}Copy the code

If there is a custom Comparator, we call timsor.sort (). If there is no custom Comparator, we call comparableTimsor.sort (). Only Java. Util. TimSort. Sort can use custom Comparator, and java.util.Com parableTimSort. Do not use the Comparator sort.

ComparableTimSort.sort

static void sort(Object[] a, int lo, int hi, Object[] work, int workBase, int workLen) {
    asserta ! =null && lo >= 0 && lo <= hi && hi <= a.length;

    int nRemaining  = hi - lo;
    if (nRemaining < 2)
        return; 

    // The array length is less than 32
    if (nRemaining < MIN_MERGE) {
        // Find the maximum number of increments or decrement, if decrement, the array is strictly inverted
        int initRunLen = countRunAndMakeAscending(a, lo, hi);
        binarySort(a, lo, hi, lo + initRunLen);
        return;
    }

    // The array length is greater than or equal to 32` ` ` ` ` ` ` `/** * walk the array once from left to right, find the natural path, * extend the short natural run to the MIN element, and merge the run * leaving the stack unchanged. * /
    ComparableTimSort ts = new ComparableTimSort(a, work, workBase, workLen);
    int minRun = minRunLength(nRemaining);
    do {
        // Identify next run
        int runLen = countRunAndMakeAscending(a, lo, hi);

        // If run is short, extend to min(minRun, nRemaining)
        if (runLen < minRun) {
            int force = nRemaining <= minRun ? nRemaining : minRun;
            binarySort(a, lo, lo + force, lo + runLen);
            runLen = force;
        }

        // Push run onto pending-run stack, and maybe merge
        ts.pushRun(lo, runLen);
        ts.mergeCollapse();

        // Advance to find next run
        lo += runLen;
        nRemaining -= runLen;
    } while(nRemaining ! =0);

    // Merge all remaining runs to complete sort
    assert lo == hi;
    ts.mergeForceCollapse();
    assert ts.stackSize == 1;
}
Copy the code

Main steps:

The array length is smaller than 32

@SuppressWarnings({"fallthrough", "rawtypes", "unchecked"})
private static void binarySort(Object[] a, int lo, int hi, int start) {
    // All elements with indexes less than start are sorted
    
    assert lo <= start && start <= hi;
    if (start == lo)
        start++;
    for(; start < hi; start++) { Comparable pivot = (Comparable) a[start];// Set left (and right) to the index where a[start] (pivot) belongs
        int left = lo;
        int right = start;
        assert left <= right;
        // Binary search
        while (left < right) {
            int mid = (left + right) >>> 1;
            if (pivot.compareTo(a[mid]) < 0)
                right = mid;
            else
                left = mid + 1;
        }
        assert left == right;

        /* * The invariants still hold: pivot >= all in [lo, left) and * pivot < all in [left, start), so pivot belongs at left. Note * that if there are elements equal to pivot, left points to the * first slot after them -- that's why this sort is stable. * Slide elements over to make room for pivot. */
        int n = start - left;  // The number of elements to move
        // Empty the position for insertion pivot
        switch (n) {
            case 2:  a[left + 2] = a[left + 1];
            case 1:  a[left + 1] = a[left];
                     break;
            default: System.arraycopy(a, left, a, left + 1, n); } a[left] = pivot; }}Copy the code

The array length is greater than or equal to 32

static void sort(Object[] a, int lo, int hi, Object[] work, int workBase, int workLen) {
        // The number of arrays is less than 32.// The number of arrays is greater than 32
        /** * March over the array once, left to right, finding natural runs, * extending short natural runs to minRun elements, and merging runs * to maintain stack invariant. */
        ComparableTimSort ts = new ComparableTimSort(a, work, workBase, workLen);
        // Calculate the length of run
        int minRun = minRunLength(nRemaining);
        do {
            // Identify next run
            // Find the maximum number of consecutive ascending orders
            int runLen = countRunAndMakeAscending(a, lo, hi);

            // If run is short, extend to min(minRun, nRemaining)
            // If the run length is smaller than the specified minRun length, binary insertion sort is performed first
            if (runLen < minRun) {
                int force = nRemaining <= minRun ? nRemaining : minRun;
                binarySort(a, lo, lo + force, lo + runLen);
                runLen = force;
            }

            // Push run onto pending-run stack, and maybe merge
            ts.pushRun(lo, runLen);
            // merge
            ts.mergeCollapse();

            // Advance to find next run
            lo += runLen;
            nRemaining -= runLen;
        } while(nRemaining ! =0);

        // Merge all remaining runs to complete sort
        assert lo == hi;
        // merge all runs
        ts.mergeForceCollapse();
        assert ts.stackSize == 1;
    }
Copy the code
  1. Calculate the minimum length of run, minRun

    A) If the array size is 2 to the NTH power, return 16 (MIN_MERGE / 2);

    B) Otherwise, shift bitwise to the right (i.e. divide by 2) until you find a number between 16 and 32;

/**
     * Returns the minimum acceptable run length for an array of the specified
     * length. Natural runs shorter than this will be extended with
     * {@link #binarySort}.
     *
     * Roughly speaking, the computation is:
     *
     *  If n < MIN_MERGE, return n (it's too small to bother with fancy stuff).
     *  Else if n is an exact power of 2, return MIN_MERGE/2.
     *  Else return an int k, MIN_MERGE/2 <= k <= MIN_MERGE, such that n/k
     *   is close to, but strictly less than, an exact power of 2.
     *
     * For the rationale, see listsort.txt.
     *
     * @param n the length of the array to be sorted
     * @return the length of the minimum run to be merged
     */
    private static int minRunLength(int n) {
        assert n >= 0;
        int r = 0;      // Becomes 1 if any 1 bits are shifted off
        while (n >= MIN_MERGE) {
            r |= (n & 1);
            n >>= 1;
        }
        return n + r;
    }
Copy the code
  1. Find the minimum increment length, if the length is less than minRun, use insert sort to supplement the number of minruns, the operation is the same as the number less than 32.
  2. Use the stack to record the length of each run, merging if one of the following conditions is true until the number is constant:
runLen[i - 3] > runLen[i - 2] + runLen[i - 1] 
runLen[i - 2] > runLen[i - 1]
Copy the code
/** * Examines the stack of runs waiting to be merged and merges adjacent runs * until the stack invariants are reestablished: * * 1. runLen[i - 3] > runLen[i - 2] + runLen[i - 1] * 2. runLen[i - 2] > runLen[i - 1] * * This method is called each time a new run is pushed onto the stack, * so the invariants are guaranteed to hold for i < stackSize upon * entry to the method. */ private void mergeCollapse()  { while (stackSize > 1) { int n = stackSize - 2; if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1]) { if (runLen[n - 1] < runLen[n + 1]) n--; mergeAt(n); } else if (runLen[n] <= runLen[n + 1]) { mergeAt(n); } else { break; // Invariant is established } } }Copy the code

A simple optimization is made on the merging method and the general merging sort. If the two runs are run1 and run2, run gallopRight and binarySearch are used to find the first element k in run1, then the elements before k in run1 are the smallest. Then, look in run2 for len2, the element after len2 in run2, and the element after len2 in run2 will be the largest element after the merge. Finally, depending on len1 and Len2, call mergeLo or mergeHi to merge the remaining elements.

/** * Merges the two runs at stack indices i and i+1. Run i must be * the penultimate or antepenultimate run on the stack. In other words, * i must be equal to stackSize-2 or stackSize-3. * * @param i stack index of the first of the two runs to merge */ @SuppressWarnings("unchecked") private void mergeAt(int i) { assert stackSize >= 2; assert i >= 0; assert i == stackSize - 2 || i == stackSize - 3; int base1 = runBase[i]; int len1 = runLen[i]; int base2 = runBase[i + 1]; int len2 = runLen[i + 1]; assert len1 > 0 && len2 > 0; assert base1 + len1 == base2; /* * Record the length of the combined runs; if i is the 3rd-last * run now, also slide over the last run (which isn't involved * in this merge). The current run (i+1) goes away in any case. */ runLen[i] = len1 + len2; if (i == stackSize - 3) { runBase[i + 1] = runBase[i + 2]; runLen[i + 1] = runLen[i + 2]; } stackSize--; /* * Find where the first element of run2 goes in run1. Prior elements * in run1 can be ignored (because they're already  in place). */ int k = gallopRight((Comparable<Object>) a[base2], a, base1, len1, 0); assert k >= 0; base1 += k; len1 -= k; if (len1 == 0) return; /* * Find where the last element of run1 goes in run2. Subsequent elements * in run2 can be ignored (because they're already in place). */ len2 = gallopLeft((Comparable<Object>) a[base1 + len1 - 1], a, base2, len2, len2 - 1); assert len2 >= 0; if (len2 == 0) return; // Merge remaining runs, using tmp array with min(len1, len2) elements if (len1 <= len2) mergeLo(base1, len1, base2, len2); else mergeHi(base1, len1, base2, len2); }Copy the code
  1. And then finally we merge and we have unmerged runs, and we know that the number of runs is 1.

example

For demonstration purposes, I set minRun in TimSort directly to 2, otherwise I can’t demonstrate with a very small array. Also change MIN_MERGE to 2 (the default is 32) to avoid binary insert sort directly.

1. The initial array for,5,1,2,6,8,10,12,4,3,9,11,13,15,16,14 [7]

2. Look for the first row of ascending or descending sequence: [1, 5, 7],6,8,10,12,4,3,9,11,13,15,16,14 [2]

3. StackSize =1, so no merge, continue to find the second run

4. Find a decreasing sequence, adjust the order: [1, 5, 7],6,8,10,12 [2],3,9,11,13,15,16,14 [4]

RunLen [0] <= runLen[1

1) gallopRight: Find where the first element of run1 should be inserted in run0 (” 2 “should be inserted after” 1 “), and then ignore the elements of run0 before (all smaller than the first element of run1).

2) gallopLeft: Find where the last element of run0 should be inserted in run1 (” 7 “should be inserted before” 8 “), and then ignore the elements of run1 after it (all larger than the last element of run0).

This leaves only [5,7] [2,6] and mergeLow results: [1,2,5,6,7,8,10,12] [4,3,9,11,13,15,16,14]

6. Look for the ascending or descending sequence of consecutive,2,5,6,7,8,10,12 [1] [3, 4],11,13,15,16,14 [9]

RunLen [0] > runLen[1]

8. Looking for a continuous ascending or descending sequence:,2,5,6,7,8,10,12 [1] [3, 4],11,13,15,16 [9] [14]

9. Because runLen[1] <= runLen[2], merge is required

  1. Use gallopRight, found in normal order. Have to,2,5,6,7,8,10,12 [1] [3,4,9,11,13,15,16] [14]

  2. Finally only [14] this element:,2,5,6,7,8,10,12 [1] [3,4,9,11,13,15,16] [14]

  3. Merge because runLen[0] <= runLen[1] + runLen[2]. Because runLen[0] > runLen[2], run1 and run2 are merged first. Otherwise run0 and run1 will be merged first

After completion of results:,2,5,6,7,8,10,12 [1],4,9,11,13,14,15,16 [3]

  1. After completion of results:,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 [1]