“This is the 16th day of my participation in the First Challenge 2022. For details: First Challenge 2022.”


This paper continues the sorting algorithm summary 1

Quicksort – version 1

Algorithm idea:

Typical application of divide-and-conquer algorithm

The first element arr[0] in the array is taken as the marker element, and the array to be sorted is divided into two parts through a quick sort. The element value of the left part is smaller than the marker element, while the element value of the right part is greater than or equal to the marker element. It then recursively applies quicksort to the two-part array until it is all sorted.

From the figure above, we can see that after the first quicksort, the position of the flag element is its final position; We can also see that partition (grouping arrays) is the core operation of quicksort, unlike merge algorithms, which simply divide elements into two parts based on array subscripts.

If you understand the following figure, this version of the quicksort algorithm will be solved.

We use the pointer I to traverse all elements except the first element, and use the pointer j to represent the position of the last element < v. Assuming we are accessing element E (the index of which is the pointer I), the following two situations will obviously occur:

// The pseudocode is shown as follows
if (e < v){
	swap(arr[j + 1],arr[i]);
	j++;
	i++;
}else{
	i++;
}
	
Copy the code

At the end of the process, we get something like thisAll you need to do is swap v and arr[j], and the first quicksort is done. All that’s left is to continue the recursive processing of the two-part array.

The code is as follows:

#include <iostream>
#include <algorithm>


template <typename T>
int __partition(T arr[],int l,int r){
	T pivot = arr[l];
	int j = l;
	for (int i = l + 1; i <= r; i++){if (arr[i] < pivot){
			std::swap(arr[i],arr[j + 1]);
			j++;
		}
	}
	std::swap(arr[l],arr[j]);
	return j;
}

template <typename T>
void __quickSort(T arr[],int l ,int r){
	if (l >= r)
		return ;
	//partition procedure, which returns the index of the final position of the element in the array
	int p = __partition(arr,l,r);
	__quickSort(arr,l,p- 1);
	__quickSort(arr,p+1,r);
	
}

template <typename T>
void quickSort(T arr[],int n){
	__quickSort(arr,0,n- 1);
}
Copy the code

The quicksort test for this edition is as follows

Next, we use this algorithm to sort an array with 100,000 nearly ordered elements. The test results are as follows:

The program crashed earlier…

The reason for this is that the system maintains a call stack for our recursive calls. When we choose the first element in a nearly ordered array as pivot, the rest of the elements are almost all larger than the pivot and continue to be recursively called, so the system maintains a call stack that is too deep. So that in this case, the time complexity of this version of quicksort degrades to O(n2)O(n^2)O(n2). Next, we will improve this version for this situation.

1.1 Quicksort – improvements to version 1

We randomly select pivot in order to obtain a balanced partition, and the specific code changes are as follows:

// Quicksort 1 improved version
#include <iostream>
#include <algorithm>

template <typename T>
int __partition1_enhance(T arr[],int l,int r) {
	// Optimization: arR [L] and random position of the numerical exchange
	std::swap(arr[rand() % (r - l + 1) + l],arr[l]);

	T flag = arr[l];

	int j = l;
	for (int i = l + 1; i <= r; i++) {
		if (arr[i] > flag) {
			//do nothing, just i++
		} else {
			std::swap(arr[j + 1],arr[i]);
			j++;

			// Can be abbreviated as
			// std::swap(arr[++j],arr[i]);
		}
	}
	std::swap(arr[l],arr[j]);
	return j;
}


template <typename T>
void __quickSort1_enhance(T arr[],int l,int r) {

	if (l >= r)
		return;

	int p = __partition1_enhance(arr,l,r);
	__quickSort1_enhance(arr,l,p- 1);
	__quickSort1_enhance(arr,p+1,r);
}


template <typename T>
void quickSort1_enhance(T arr[],int n) {
	// Set the random number seed
	srand(time(NULL));
	__quickSort1_enhance(arr,0,n- 1);
}

Copy the code

Test results:

As you can see, randomly selecting pivot makes a big improvement in sorting an almost ordered array.

Further, we use the optimized algorithm to sort an array containing a large number of duplicate elements (100000 data, element range [0,5]). The test results are as follows:

We see that for the same large amount of data, different data characteristics, the efficiency of the same quicksort algorithm is almost completely different, we continue to analyze why the improved version of quicksort is so weak for the array containing a large number of repeated elements. In fact, the reason is the same as before, also caused by the depth of recursion. With a large number of repeating elements (assuming that element 5 repeats the most), our algorithm says that once Pivot chooses 5, all 5 of the remaining elements must be on the right side of pivot (or on the left, depending on the implementation, but either way is unacceptable to us. This results in a very large imbalance in the size of the recursive array.

So our next optimization strategy is to try to randomly distribute duplicate elements on both sides of pivot rather than all on one side of pivot, which leads to the next two-way quicksort

Two-way quicksort

We use Pointers I and j to traverse from left to right and from right to left respectively, and the pseudocode is as follows:

/ / pseudo code
while(true) {
        while(i <= r && arr[i] < pivot) i++;
        while(j >l && arr[j] > pivot) j--;
        if (i > j) break;
        swap(arr[j],arr[i]);
        i++;
        j--;
    }
    swap(arr[j],arr[l]);
Copy the code

In this way, for I, the elements >= v all go to the position of J, and for j, the elements <= v all go to the position of I, which also achieves the purpose of uniform distribution of repeating elements on both sides. The specific implementation code is as follows:

#include <iostream>
#include <algorithm>
#include <ctime>
#include "InsertionSort.h"


template <typename T>
int __partition2(T arr[],int l,int r) {
	std::swap(arr[rand() % (r - l + 1) + l],arr[l]);

	T pivot = arr[l];
	int i = l + 1;
	int j = r;
	while(true) {
		while(i <= r && arr[i] < pivot) i++;
		while(j >l && arr[j] > pivot) j--;
		if (i > j) break;
		std::swap(arr[j],arr[i]);
		i++;
		j--;
	}
	std::swap(arr[j],arr[l]);
	return j;

}

template <typename T>
void __quickSort(T arr[],int l,int r) {
	if (r - l >= 15) {
		insertionSort(arr,l,r);
		return ;
	}

	int q = __partition2(arr, l ,r);
	__quickSort(arr,l,q - 1);
	__quickSort(arr,q + 1,r);
}

template <typename T>
void quickSort2(T arr[],int n) {
	srand(time(NULL));
	__quickSort(arr,0,n- 1);
}

Copy the code

Our two-way merge sort has been modified to sort arrays with a large number of repeating elements quite well. But if we think about it, since there are a lot of repeating elements, if I pick just one of them as pivot, then there must be a lot of elements in pivot equal to Pivot, and if we remove the repeating elements when we recurse, then we can imagine that we’re improving the performance of the algorithm to some extent. Let’s do one last optimization for quicksort. , many languages have their own sorting algorithm is the implementation of this idea, is well-known three-way quicksort.

Three-way quicksort

Three-way quicksort takes V as pivot, and divides elements into <v, = v, >v. After partition is completed, only merge the <v and >v parts.

We iterate over the elements with pointer I, use pointer lt to represent the index of the last element <v, and gt to represent the index of the first element >v. If we are accessing element E with index I, then there are three possible situations

if (e > v){
	swap(arr[i],arr[gt - 1]);
	gt--;
}else if (e < v){
	swap(arr[i],arr[lt + 1]);
	lt++;
	i++;
}else{
	i++;
}
Copy the code

The specific code is as follows:

#include <iostream>
#include <ctime>
#include <algorithm>

template <typename T>
void __quickSort3(T arr[],int l,int r){
	if (l >= r)
		return;
		
	/ / partition process
	std::swap(arr[l],arr[rand() % (r - l + 1) + l]);
	T pivot = arr[l];
	int lt = l;
	int gt = r + 1;
	int i = l + 1;
	while (i < gt){
		if ( arr[i] > pivot){
			std::swap(arr[--gt],arr[i]);
		}else if (arr[i] < pivot){
			std::swap(arr[++lt],arr[i]);
			i++;
		}else{
			i++;
		}
	}
	__quickSort3(arr,l,lt);
	__quickSort3(arr,gt,r);
} 

template <typename T>
void quickSort3(T arr[],int n){
	__quickSort3(arr,0,n- 1);
}
Copy the code

At this point, all versions of our quicksort algorithm are complete. See my Github for the complete code and test cases