Common internal sorting algorithms include: insertion sort, Shell sort, selection sort, bubble sort, merge sort, quicksort, heap sort, radix sort, and so on.

Bubble sort

Idea: Compare adjacent elements. If the first is larger than the second, swap them; the larger number is then compared with the next element;

Do the same for each pair of adjacent elements, from the first pair at the beginning to the last pair at the end. After one pass, the last element is the largest;

Repeat the pass for all elements except the last one;

Keep repeating the above steps with fewer and fewer elements each time, until there are no more pairs to compare;

def bubbleSort(arr):
    for i in range(1, len(arr)):
        for j in range(0, len(arr)-i):
            if arr[j] > arr[j+1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr

Selection sort

Ideas:

  1. Find the smallest (largest) element in the unsorted sequence and place it at the starting position;
  2. Continue to find the smallest (largest) element among the remaining unsorted elements and place it at the end of the sorted sequence;
  3. Repeat step 2 until all elements are sorted;
def selectionSort(arr):
    for i in range(len(arr) - 1):
        minIndex = i
        # find the index of the smallest element in the unsorted part
        for j in range(i + 1, len(arr)):
            if arr[j] < arr[minIndex]:
                minIndex = j
        # swap it to the front of the unsorted part
        if i != minIndex:
            arr[i], arr[minIndex] = arr[minIndex], arr[i]
    return arr

Insertion sort

Ideas:

  1. Start with the first element, which can be considered sorted

  2. Take the next element and scan the sorted sequence from back to front

  3. If a (sorted) element is larger than the new element, move it one position to the right

  4. Repeat step 3 until you find a sorted element that is less than or equal to the new element

  5. Insert the new element after that position

  6. Repeat steps 2 to 5

def insertionSort(arr):
    for i in range(1, len(arr)):
        preIndex = i - 1
        current = arr[i]
        # shift larger sorted elements one position to the right
        while preIndex >= 0 and arr[preIndex] > current:
            arr[preIndex+1] = arr[preIndex]
            preIndex -= 1
        arr[preIndex+1] = current
    return arr

Shell sort

Shell sort, also known as the diminishing increment sort, is a more efficient, improved version of insertion sort. However, Shell sort is an unstable sorting algorithm.

Shell sort is an improvement on insertion sort based on the following two of its properties:

  • Insertion sort is highly efficient on data that is almost sorted, where it approaches linear time;
  • But insertion sort is generally inefficient, because it can only move data one position at a time;

The basic idea of Shell sort is to divide the whole sequence of records into several subsequences and perform direct insertion sort on each. Once the records in the whole sequence are "basically ordered", a direct insertion sort is performed on all of them.

Specific ideas:

  1. Compute an increment (gap) value

  2. With a gap of, say, 7, insertion-sort the subsequence of elements at indices 0, 7, 14, 21, …

  3. Then sort the subsequence at indices 1, 8, 15, …, and so on for each offset

  4. When all subsequences at this gap are sorted, reduce the gap to, say, 3 and repeat steps 2 and 3

  5. When the gap is finally reduced to 1, the sequence is basically in order, and one ordinary insertion sort finishes the job

def shell_sort(arr):
    n = len(arr)
    gap = n // 2
    while gap > 0:
        # insertion sort on each gap-separated subsequence
        for i in range(gap, n):
            current = arr[i]
            j = i
            while j - gap >= 0 and arr[j - gap] > current:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = current
        gap //= 2  # halve the gap each pass
    return arr

Merge sort

Merge sort is a very typical application of the divide-and-conquer approach.

So what is the idea of divide and rule?

Merge sort uses divide and conquer: keep decomposing a complex problem into smaller problems; once the small problems are solved, the big problem can be solved by working backward.

There are 3 steps:

1. Decompose: split the list into 2 sublists, split each sublist into 2 more, and keep splitting until each final sublist contains only 1 element;

2. Solve: since each final list has only one element, comparing their sizes is trivial; each subproblem is then solved the same way from the bottom up;

3. Merge: combine the solved subproblems to get the final result;

The sorting is recursive: when the answer to a problem is unknown, the computer keeps decomposing it from the top down until the smallest problems are answered, then returns the results of each subproblem from the bottom up.

Merge sort can be implemented in two ways:

  1. Top-down recursion;

  2. Bottom-up iteration (any recursive method can be rewritten iteratively);

Merge sort with recursion:

def mergeSort(arr):
    # base case: a list of 0 or 1 elements is already sorted
    if len(arr) <= 1:
        return arr
    middle = len(arr) // 2
    # slice the list into a left part and a right part
    left, right = arr[0:middle], arr[middle:]
    # sort each half recursively, then merge the results
    return merge(mergeSort(left), mergeSort(right))

def merge(left, right):
    result = []
    # take the smaller head element while both lists are non-empty
    while left and right:
        if left[0] <= right[0]:
            result.append(left.pop(0))
        else:
            result.append(right.pop(0))
    # append whatever remains in either list
    while left:
        result.append(left.pop(0))
    while right:
        result.append(right.pop(0))
    return result

Merge sort with iteration:

def myMergeSort(alist):
    n = len(alist)
    i = 1  # current width of the runs to merge
    while i < n:
        left_start = 0
        while left_start < n - i:
            left_end = left_start + i
            right_start = left_end
            right_end = right_start + i
            if right_end > n:
                right_end = n
            left = alist[left_start:left_end]
            right = alist[right_start:right_end]
            # merge the two adjacent runs
            merged = []
            while left and right:
                if left[0] < right[0]:
                    merged.append(left.pop(0))
                else:
                    merged.append(right.pop(0))
            merged.extend(left if left else right)
            # write the merged run back into the original list
            alist[left_start:right_end] = merged
            left_start += i * 2  # move the cursor right to the next pair of runs
        i *= 2  # double the run width each pass
    return alist

Quicksort

Quicksort uses divide-and-conquer to divide an array into two subarrays.

Like merge sort, quicksort is a typical application of divide-and-conquer in sorting algorithms. Quicksort is essentially a recursive divide-and-conquer method based on bubble sort.

Quicksort lives up to its name: as soon as you hear it, you know it is fast and efficient! It is one of the fastest sorting algorithms for large data sets. Although its worst case is O(n²), in most cases it outperforms other algorithms whose average time complexity is O(n log n).

Basic idea:

1. First pick a number from the sequence as the pivot.

2. Partition: put everything greater than the pivot to its right, and everything less than or equal to it to its left.

3. Repeat step 2 on the left and right intervals until each interval contains only one number.

Therefore, for most weakly ordered random sequences, quicksort outperforms merge sort.

def quick_sort(array, left, right):
    if left >= right:
        return
    low = left
    high = right
    key = array[low]  # use the first element as the pivot
    while left < right:
        # scan from the right for an element <= pivot
        while left < right and array[right] > key:
            right -= 1
        array[left] = array[right]
        # scan from the left for an element > pivot
        while left < right and array[left] <= key:
            left += 1
        array[right] = array[left]
    array[right] = key  # place the pivot in its final position
    quick_sort(array, low, left - 1)
    quick_sort(array, left + 1, high)

Counting sort

Counting sort is a sorting algorithm that is not based on comparisons; almost all of the algorithms introduced so far sort by comparing elements. The time complexity of counting sort is O(n + k), where n is the number of elements and k is the range of the values, so when k is not much larger than n it runs in roughly O(n) time, faster than any comparison-based sorting algorithm.

Ideas:

  1. Find the largest and smallest elements in the array to be sorted
  2. Count the number of occurrences of each element with value i, and store it in the i-th entry of a count array C
  3. Accumulate the counts (starting from the first entry in C, add each entry to the previous one)
  4. Fill the target array in reverse: place each element i at position C(i) of the new array, subtracting 1 from C(i) for each element placed
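The steps above can be sketched in Python as follows (a minimal sketch; the function name countingSort and the value-offset for negative numbers are choices of this example, not from the original text):

```python
def countingSort(arr):
    if not arr:
        return arr
    # step 1: find the smallest and largest values
    lo, hi = min(arr), max(arr)
    # step 2: count occurrences of each value
    counts = [0] * (hi - lo + 1)
    for num in arr:
        counts[num - lo] += 1
    # step 3: prefix sums, so counts[i] is the number of elements <= lo + i
    for i in range(1, len(counts)):
        counts[i] += counts[i - 1]
    # step 4: fill the output from back to front to keep the sort stable
    result = [0] * len(arr)
    for num in reversed(arr):
        counts[num - lo] -= 1
        result[counts[num - lo]] = num
    return result
```

Filling from the back is what makes this version stable: equal elements keep their original relative order.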

Bucket sort

Bucket sort is an upgraded version of counting sort. It makes use of a mapping function, and the key to its efficiency lies in choosing that mapping function well.

Bucket sort works by dividing the data into a finite number of buckets, assuming that the input data is evenly distributed, and sorting each Bucket separately (it is possible to use another sorting algorithm or to continue sorting using Bucket sort recursively).

Basic idea:

  1. Set up a fixed number of empty buckets;
  2. Iterate over the input data and put each item into its corresponding bucket;
  3. Sort each non-empty bucket;
  4. Splice together the sorted data from the non-empty buckets.
def bucketSort(nums):
    # one bucket per possible value, used as a counter
    max_num = max(nums)
    bucket = [0] * (max_num + 1)
    # count each value into its bucket
    for i in nums:
        bucket[i] += 1
    # read the buckets back in order
    sort_nums = []
    for j in range(len(bucket)):
        if bucket[j] != 0:
            for y in range(bucket[j]):
                sort_nums.append(j)
    return sort_nums

Heap sort

Heap sort, as its name implies, is an algorithm that uses heap data structure to sort.

If you are not familiar with heap data structures, check out the article – watch the animation to understand heap easily

If you’re familiar with the heap data structure, you should know that a heap is a priority queue with two implementations, the max-heap and the min-heap; since we sort in ascending order here, we’ll use the max-heap.

We can regard the heap (a max-heap by default here) as a complete binary tree in which every node’s value is no smaller than its children’s, so the element at the top of the heap is always the maximum of the whole tree. Because the heap must maintain this property, whenever the data inside the heap changes we must re-heapify.

Since the top element is always the maximum of the whole tree, once the data is in a heap we can simply take the element from the top: the first element taken is the maximum. Rebuild the heap and take the top element again: that is the second largest value. Take elements over and over, and the data comes out in order.

Basic idea:

  1. Build the heap: initialize the array to be sorted as a max-heap.
  2. Swap the top element with the last element, forming a new max-heap that excludes the last element.
  3. Since the swap in step 2 breaks the heap property, the max-heap must be rebuilt. Repeat this process until only one element remains in the heap.
class Solution(object):
    def heap_sort(self, nums):
        i, l = 0, len(nums)
        self.nums = nums
        # build the initial max-heap from the last non-leaf node upward
        for i in range(l // 2 - 1, -1, -1):
            self.build_heap(i, l - 1)
        # after the max-heap is built, swap the root with the end node,
        # then restore the heap property on the reduced heap
        for j in range(l - 1, -1, -1):
            nums[0], nums[j] = nums[j], nums[0]
            self.build_heap(0, j - 1)
        return nums

    def build_heap(self, i, l):
        """Sift node i down within nums[0..l] to restore the max-heap."""
        nums = self.nums
        left, right = 2 * i + 1, 2 * i + 2
        large_index = i
        if left <= l and nums[i] < nums[left]:
            large_index = left
        if right <= l and nums[large_index] < nums[right]:
            large_index = right
        # if a child is larger, swap it up and continue sifting down
        if large_index != i:
            nums[i], nums[large_index] = nums[large_index], nums[i]
            self.build_heap(large_index, l)

Radix sort

Radix sort is a non-comparative integer sorting algorithm. Its principle is to cut integers into digits by place value and compare them digit by digit. Since strings (such as names or dates) and floating-point numbers in particular formats can also be represented as integers, radix sort is not restricted to integers.

Basic idea:

  1. Get the largest number in the array and find its number of digits;
  2. Pad numbers with fewer digits with leading zeros;
  3. Distribute: starting from the ones place, put each number into one of buckets 0 to 9 according to that digit;
  4. Collect: put the data from buckets 0 to 9 back into the array in order;
  5. Repeat steps 3~4 up through the highest digit to complete the sort.
def getbit(num, i):
    # extract the digit of num at weight i (i = 1, 10, 100, ...)
    return (num % (i * 10) - (num % i)) // i

def getMax(numList):
    # linear scan for the maximum value
    maxNum = numList[0]
    for i in range(len(numList)):
        if numList[i] > maxNum:
            maxNum = numList[i]
    return maxNum

def radixSort(numList):
    if len(numList) == 0 or len(numList) == 1:
        return numList
    maxNum = getMax(numList)
    # count how many digits the largest number has
    bitCount = 0
    index = 1
    while maxNum // index:
        bitCount += 1
        index *= 10
    currentBit = 1  # weight of the digit being sorted: 1, 10, 100, ...
    while currentBit <= 10 ** (bitCount - 1):
        # distribute: place each number into the bucket for its current digit
        buckets = [[] for i in range(10)]
        for num in numList:
            currentBitNum = getbit(num, currentBit)
            buckets[currentBitNum].append(num)
        # collect: read buckets 0..9 back in order
        res = []
        for i in range(10):
            for j in range(len(buckets[i])):
                res.append(buckets[i][j])
        numList = res
        currentBit *= 10
    return numList

Summary of sorting algorithms

1. Bubble sort is almost the worst sort

2. When sorting random numbers, when the data set is very small, insertion sort is faster than comparison sort

Only at n=10 is quicksort relatively slow while insertion sort and Shell sort are relatively fast, because insertion sort and Shell sort both belong to the insertion family, while quicksort and bubble sort belong to the exchange family, and exchanges consume a large share of resources when the amount of data is small.

3. In the case of small amount of data, insertion sort is superior to other sorts in basic ordered data sort

In the results for basically ordered data, at n=10 and n=100 insertion sort takes less time, because the data is basically ordered and few insertions are needed. Although insertion sort compares elements one by one, with so little data the comparisons do not consume too large a share of resources.

4. Regardless of whether the data is random or basically ordered, the larger the data volume, the more obvious the advantage of fast sorting

Quicksort is really fast. When the data set reaches the level of 100,000 elements, bubble sort takes more than 800 seconds while quicksort takes only 0.3 seconds, and as the data volume grows, the gap between them only widens.

5. A hybrid quicksort optimization works well

For sorting large data sets, first use quicksort to bring the data to a basically ordered state, then switch to insertion sort once a partition becomes small enough, because insertion sort outperforms quicksort on small, basically ordered data sets!
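A minimal sketch of this hybrid idea, assuming an arbitrary cutoff of 16 elements (the cutoff value and the function name hybrid_sort are choices of this example, not from the original text; the partition scheme mirrors the quick_sort shown earlier):

```python
def hybrid_sort(arr, left, right, cutoff=16):
    # small partitions: fall back to insertion sort on arr[left..right]
    if right - left + 1 <= cutoff:
        for i in range(left + 1, right + 1):
            current = arr[i]
            j = i - 1
            while j >= left and arr[j] > current:
                arr[j + 1] = arr[j]
                j -= 1
            arr[j + 1] = current
        return
    # otherwise partition as in quicksort and recurse on both sides
    low, high = left, right
    key = arr[low]  # use the first element as the pivot
    while left < right:
        while left < right and arr[right] > key:
            right -= 1
        arr[left] = arr[right]
        while left < right and arr[left] <= key:
            left += 1
        arr[right] = arr[left]
    arr[left] = key  # pivot lands in its final position
    hybrid_sort(arr, low, left - 1, cutoff)
    hybrid_sort(arr, left + 1, high, cutoff)
```

The insertion-sort branch also serves as the recursion's base case, since any interval of size 0 or 1 is below the cutoff.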

Special thanks:

This is probably the best article on top 10 sorting algorithms in the Eastern Hemisphere.

GitHub Hot project: Using Python to implement all algorithms – Zhihu (zhihu.com)