Fast row profile

Quick sort (quick sort) is the algorithm often encountered in the tool class algorithm, the so-called tool class algorithm is there are a lot of algorithms or thinking questions are based on the same idea to answer, then this kind of algorithm is investigated probability is very high, for this kind of algorithm thinking and exploration is also very meaningful!

Quicksort is also a sorting algorithm that is widely used in practice, especially in C++ or Java basic types.

Why basic types? Merge sort this is a common question when comparing merge sort with fast sort. The reason is that the two kinds of sort have their own characteristics. Merge sort has more moves and the least comparison of elements. The running time of the algorithm is: 1) comparing elements; 2) Move elements. So quicksort is better for primitive types that are cheaper to compare, whereas for generic types that take longer to compare, such as implementing the Comparator interface, merge sorts with fewer comparisons should be considered.

Its average running time is zero, but unstable, and its worst-case time complexity isHowever, such instability can be avoided by optimizing the algorithm, which will be discussed next.

Common writing

    public static void QSort(int[] a, int left, int right) {
        if(left >= right) {
            return; Int base = a[left]; int i = left; int j = right; // Move the element to make the baseline valuewhile(I < j) {// move left to the frontwhile(i < j && base <= a[j]) { j--; } / / moves to the rightwhile(i < j && base >= a[i]) {
                i++;
            }
            if(i < j) { swap(a, i, j); }} // Swap base with the last element smaller than base (a, left, I); QSort(a, left, i-1); QSort(a, I +1, right); }Copy the code

Quicksort is a divide-and-conquer recursive algorithm that describes the most common implementation of this quicksort for arraysThe basic algorithm for sorting consists of the following four steps:

  • ifIf the number of elements is 0 or 1, return;
  • takeAny element in, known as theHub for the yuan(pivot), the above implementation takes the first element of the array as the pivot element;
  • In addition to the arrayThe remainder of is divided into two disjoint sets, a value greater than or equal toIs composed of elements less than or equal toElement composition of;
  • The collectionRepeat the previous steps to recursively fast-sort.

The process of fast sorting is shown in the following GIF:

The problem of imbalance arises

As mentioned above, there is an imbalance in fast sorting, but this imbalance can be solved by optimizing the algorithm. When does that cause an imbalance?

1) Caused by hub element selection

The performance of quicksort is highly dependent on the choice of hub elements for common writingSelect the first elementThe strategy of being a hub element is extremely dangerous. If the input is pre-sorted or anti-sorted, the hub element will produce extremely lopsided segmentation – all elements are presentThe set or the elements are all divided intoCollection.

And this bad situation will happen in all recursions, this imbalance situation the time cost is zeroEven more embarrassing is if the first element is selected as the pivot element and the input is pre-sorted, the time consumption is quadratic, but the result is nothing.

Test case: Leetcode 217. Duplicate elements exist

    public boolean containsDuplicate(int[] nums) {
        if (nums == null) {
            return false;
        }
        QSort(nums, 0, nums.length - 1);
        for (int i = 0; i < nums.length; i++) {
            if (i + 1 < nums.length && nums[i] == nums[i + 1]) {
                return true; }}return false;
    }
Copy the code

When fast sorting is done in the usual way, the last test case of the problem is a large, pre-sorted array of integers that causes a timeout due to an unbalanced quadratic time.

2) Caused by segmentation strategy

Then, we consider the imbalance caused by the segmentation strategy. The comparison and exchange strategy of the third element in the quicksort step is called the segmentation strategy, which can be understood as the strategy of dividing the array into two non-intersecting subarrays according to the size relationship with the hub element, as shown in the following figure:

The hub element in the figure is 5, and the number less than 5 is on the left of the segmentation result 5, and the number greater than 5 is on the right. It is important to note that all elements in the array are different, and the optimization of the partition strategy focuses on what to do if there are duplicate elements in the array.

The optimal segmentation strategy is expected to split the array into two subarrays with similar number of elements, while the bad segmentation strategy will produce two imbalanced subarrays, that is, the imbalance problem occurs. In extreme cases, the result will be the same as the time complexity when the first element is selected as the hub element in pre-sorting.

Let’s consider an extreme case where all elements of an array are equal, using a common notation to look at the algorithm’s split strategy:

/ / shift to the leftwhile(i < j && base <= a[j]) { j--; } / / moves to the rightwhile(i < j && base >= a[i]) {
                i++;
            }
Copy the code

Firstly, the R pointer moves left to find the first element less than the hub value. Note that the strategy adopted for elements identical with the hub element is non-stop (continue to move when equal elements are encountered), so the right shift will move left until L == R ends, as shown below:

It turns out, obviously, that subarrayIs empty,It is a highly unbalanced segmentation strategy that contains five elements except the hub element.

Optimization of hub element selection

From the algorithm described above, there are many choices for the hub elements, no matter which element in the array is selected, the sorting can be done. However, as mentioned above, some bad choices can lead to imbalance. The following choices are discussed:

  • A wrong approach

The wrong way to select is to use the first or last element as the pivot element. This is acceptable if the input array is random, but if the input is pre-sorted or unsorted, it can cause an imbalance and time complexity increases.

This method produces poor results as shown in the previous Leetcode algorithm problem, resulting in algorithm timeouts, so we should avoid this method.

  • A safe way to do it

One safe policy is to randomly select hub elements. This strategy is generally very safe unless there is a problem with the random number generator, because random hub elements cannot always continuously produce poor segmentation. However, random number generation is usually expensive and not worth the cost.

  • Three slightly value segmentation method in thinking, the most optimal hub for the yuan should be a number of the array is divided into two elements of similar subarray, actually is also in the median number of array elements, namely the optimal hub for the yuan is the median, but if every selection when calculate the median array, and requires a lot of time and, apparently, nor desirable.

After comprehensive consideration, a method of median estimation is proposed —- three-digit median segmentation method. The basic idea is that the median of the three elements on the left end, right end and center is used as the pivot element, which is actually the estimation of the median. The selection process is as follows:

The code is implemented as follows. The parameter is an array of hub elements to be selected, and the value of the hub elements is returned.

Private int median3(int[] a,int I,int j) {private int median3(int[] a,int I,int j) {private int median3(int[] a,int I,int j) {if (a[m] < a[i]) {
            swap(a, i, m);
        }
        if (a[j] < a[i]) {
            swap(a, i, j);
        }
        if(a[j] < a[m]) { swap(a, j, m); } // place the hub value at j-1; swap(a, m, j - 1);return a[j - 1];
    }
Copy the code

Sort the elements of left end A [left], right end A [right] and center position A [center], then place the hub element at position A [right-1]. Advantage 1: The position of a[left] and a[right] is the correct position of segmentation, so the interval to be segmented in the latter order can be reduced to [left+1,right-2]. Advantage two: the hub metadata stored in A [right-1] can act as a warning mark to prevent transgression.

Optimization of segmentation strategy

Recalling the previous imbalance caused by the split strategy, the detail of the split strategy is how to deal with those elements equal to the pivot element. The problem is whether the L pointer and R pointer stop when they meet the element equal to the pivot element. Then there are three strategies:

  • LPointers andRThe hands are non-stop;
  • LPointers andRThe hands stopped;
  • LPointers andROne of them stops, one of them doesn’t stop.

Considering the extreme cases where all elements are equal, it is clear that both the stop and stop strategies actually result in an unbalanced case, splitting the two arrays into extremely unbalanced ones. (Refer to the previous diagram of unbalanced segmentation strategies)

So given that we’re pursuing a balanced strategy, it’s better to do unnecessary swapping to create two balanced subarrays than to risk two extremely unbalanced subarrays. Therefore, when LR encounters an element equal to the pivot element, both Pointers are stopped to avoid the occurrence of quadratic time.

Optimized fast platoon

    public void QSort(int[] a, int left, int right) {
        if(left >= right) {
            return; } // int base = median3(a, left, right); int i = left; int j = right - 1;while(i < j) {
            while(i < j && base > a[++i]) {}
            while(i < j && base < a[--j]) {}
            if(i < j) { swap(a, i, j); } } swap(a, i, right - 1); QSort(a, left, i - 1); QSort(a, i + 1, right); } private void swap(int[] a, int i, int j) { int temp = a[i]; a[i] = a[j]; a[j] = temp; } private int median3(int[] a,int I,int j) {private int median3(int[] a,int I,int j) {int m = (I + j) >> 1;if (a[m] < a[i]) {
            swap(a, i, m);
        }
        if (a[j] < a[i]) {
            swap(a, i, j);
        }
        if(a[j] < a[m]) { swap(a, j, m); } // put the pivot element in j-1; swap(a, m, j - 1);return a[j - 1];
    }
Copy the code

Small details of the comparison between the common writing method and the optimal fast sorting, it is found that in addition to the optimization strategy mentioned above, there are some changes: for example, in the common writing method, the R pointer moves left first, and the L pointer moves right after the optimization, and the L pointer moves right first. Why? The reason: In common writing hub for the yuan in the left, right hand left shift must stop at a less than or equal to the hub yuan elements corresponding to the location of the (assuming the index), followed by the left pointer value moves to the right hypothesis has been less than hubs, will stop at the same location and right pointer (index), segmentation of the last step is to exchange hub and left pointer, While the left pointer points to a value less than or equal to the hub element (i.e. the index of the right pointer moved to the left), the final segmentation result is less than the hub element and the value is on the left, so there is no problem. In the common writing method, if the left pointer moves first, then the left pointer stops at a position greater than or equal to the hub element. If the last left pointer is exchanged with the hub element, a value greater than the hub element is moved to the left, which is obviously not feasible. Conclusion: Therefore, it is concluded that the moving order of the left and right Pointers is determined by the position of the hub element to be exchanged. If the hub element is on the left (commonly written as the hub element), a value less than or equal to the hub element should be exchanged with it, while the right pointer must be moved left to get a value less than or equal to the hub element. And the hub element on the right (optimized hub element) then should be a value greater than or equal to the hub element and it exchange, so use the first left move scheme!

An example of an unbalanced test case can be tried again with the optimized algorithm: Leetcode 217. Duplicate elements exist, execute result.

The last

This is the first blog in nuggets, write bad place hope we correct!

Data Structure and Algorithm Analysis of Java Language description -7.7 Quick sort image source: Fast sort dynamic graph unbalanced fast sort call stack segmentation strategy