preface

The last post was not displayed in the specified format when talking about time complexity and space complexity (obviously there is no problem when previewing). I am obsessive-compulsive and slightly optimized to re-publish, in order to make friends look comfortable.

Last time we talked about the direct insertion sort, when comparing the ordered data and the data to be inserted, we compare them by traversing in sequence. When the amount of data is large, we need to consider further optimization. Split insertion sort improves efficiency by reducing the number of comparisons between ordered data and data to be inserted.

The body of the

1. Get familiar with half search

1.1 The idea of half search algorithm

Split search, also known as binary search, only applies to the ordered order list;

Idea (assuming the order list is ascending) :

  • The specified value is first compared to the middle element in the order table;

  • If equal, it means found, then return the position of the element;

  • If not, continue to search;

    If the specified value is less than the middle element in the order table, the first half is searched;

    If the specified value is greater than the middle element in the order table, the second half is searched;

Repeat the process until the element is found; Or find all the data, that is, search failure;

1.2 Half search implementation and analysis

The algorithm code is as follows (look up data in the ascending order table)

The result is as follows:

Parse the lookup step procedure as follows:

In the figure, the data indicated by the red, green and yellow arrows are respectively high, middle and low index bits, and the blue is the number to be searched in the sequence table.

Steps to obtain data:

Above steps:

  • In step 1, low is initially set to 0, and high is initially assigned to the number of elements in the sequence table minus 1, which is 5; So when we loop in, we’re going to assign mid to (low+high)/2, because mid is int, we’re going to round it up, so mid is going to be 2; Then, the data 66 with index bit 2 was compared with the data 92 to be searched, and it was found that 92 was greater than 66, so it was necessary to continue searching in the latter part, so the value of low was changed to mid+1, that is, 3.
  • The second step enters the loop and continues to assign the value of mid to (low+high)/2. Since the value of low in the previous step was 3, the value of high remained unchanged and was still 5, the mid value obtained was 4. Then, the data 92 with index bit 4 was compared with the data 92 to be searched. Return to the location mid found at that time.

What happens when the lookup fails:

Above steps:

  • In step 1, low is initially set to 0, and high is initially assigned to the number of elements in the sequence table minus 1, which is 5; So when we loop in, we’re going to assign mid to (low+high)/2, because mid is int, we’re going to round it up, so mid is going to be 2; Then, data 66 with index bit 2 is compared with data 921, which needs to be searched. It is found that 921 is greater than 66, so it needs to continue searching in the latter part. Therefore, the value of low is changed to mid+1, that is, 3.
  • The second step enters the loop and continues to assign the value of mid to (low+high)/2. Since the value of low in the previous step was 3, the value of high remained unchanged and was still 5, the mid value obtained was 4. Then, the data 92 with index bit 4 was compared with the data 921 to be searched, and 921 was found to be larger than 92. So change the value of low to mid+1, which is 5;
  • The third step enters the loop and continues to assign the value of mid to (low+high)/2. Since the value of low in the previous step was 5, the value of high remains unchanged, so the mid value is 5(where the high, middle and low all point to the same position). Then, the data 100 with index bit 5 is compared with the data 921 that needs to be searched. If 921 is greater than 100, we need to continue searching in the latter part, so change the value of low to mid+1, that is, 6.
  • When step 4 enters the loop, low changes to 6 at step 3 and high remains the same, still 5. Low is greater than high, which does not meet the loop condition, indicating that the query has been completed but no data has been queried. Jump out of the loop and return -1.
1.3 Analyzing the performance of the split search algorithm

Time complexity

If the size of the data passed is N, there are n elements; The search is performed in n/2 elements for the first time, n/(22) elements for the second time, and n/(23) elements for the third time. If the element is found after x times, the time complexity is O(log2n).

Spatial complexity

Because several fixed intermediate variables (low,mid and High) are used in the search process, the memory consumed in the algorithm is at a constant level, so the space complexity is O(1).

The stability of

Since it only searches but does not change the position of elements in the algorithm, the half-search algorithm is stable.

To sum up, the time complexity of insertion sort is O(log2n) and the space complexity is O(1), which is a stable algorithm.

2. Figure out split insertion sort

2.1 Idea of split insertion sort algorithm

Binary insertion sort is the optimization of direct insertion sort, insertion sort directly in the process of comparison, in turn, traverse an ordered list of elements and to insert the data comparison, and binary insertion sort is to replace the original traverse an ordered list in turn with binary search algorithm, more efficiency, to find the right position, the corresponding elements need to be backward shift, Then insert the element into the empty space; Repeat until the sorting is complete.

2.2 Implementation and analysis of split insertion sorting algorithm

Code implementation (ascending) :

The running effect is as follows:

Step analysis is shown as follows:

Step description:

The green box represents the sorted list, the arrow represents the next element to be inserted, and the yellow box represents the remaining unordered elements. The yellow square is the mid position found in each half-cut, and the green square is the position vacated in the last ordered list.

  • Copy the original data array to arrayB in the new array. The main purpose of this step is not to declare additional temporary variables later, but also to achieve simple logic for the subsequent core code and reduce excessive judgment.

  • Step 1 takes the first element as an ordered list (the first element is 2), the next element to be inserted is 5, and places 5 in the sentry position, where the index is 0. And then I cut it in half, low is 1 at the beginning, high is 1 the first time; Since there is only one element in the ordered list at the beginning, 2 will be found. Compared with the sentinel value 5, 2 is less than 5, so we need to continue searching in the latter part of the ordered list. Change low to mid+1, which is now 2 and greater than high, and break out of the loop. There is no need to move position, keep current position.

  • At step 2, the elements in the ordered list are 2 and 5, and the next element to be inserted is 6. Put 6 in the sentry position, that is, the position with index 0. Then cut it in half:

    In step 2-1, the initial low is 1, and the value of high is calculated as 2. According to low and high, mid is calculated to be 1(since mid is an integer, 3 is divided by 2 and the whole is 1). The value of mid is compared with the element to be inserted 6, 2 is less than 6, and it is necessary to continue searching in the latter part of the ordered list. Then the value of low is changed to mid plus 1 and low is 2.

    In steps 2-1, low is 2, and high is still 2. According to low and high, mid is calculated to be 2. The value 5 of mid is compared with the element to be inserted 6. 5 is less than 6, so it is necessary to continue searching in the latter part of the ordered list. If the circular condition is not met, exit half; No need to move position, keep current position unchanged;

  • At step 3, the elements in the ordered list are 2, 5, and 6. The next element to be inserted is 1, and 1 is placed in the sentry position, that is, the position with index 0. Then cut it in half:

    In step 3-1, the initial low is 1, and the value of high is calculated as 3. According to low and high, mid is calculated to be 2. The value 5 of mid is compared with the element to be inserted 1. If 5 is greater than 1, we need to continue searching in the first half of the ordered list, then change the value of high to mid minus 1, and high is 1.

    In steps 3-2, the initial low is 1, and in the previous step, the value of high is calculated as 1. Mid is calculated to be 1 according to low and high, and the value of mid 2 is compared with the element to be inserted 1. If 2 is greater than 1, it is necessary to continue searching in the first half of the ordered list. Then the value of high is changed to mid minus 1, and high is 0. Continue the loop if low is greater than high, if the condition is not satisfied, out of the loop;

    In the third step, according to the half-search and comparison, the appropriate position is high+1, that is, the element to be inserted needs to be inserted to position 1, and the elements 2, 5 and 6 need to be shifted backward successively to vacate the position of index bit 1, and the element to be inserted 1 needs to be inserted into this index bit.

  • At step 4, the elements in the ordered list are 1, 2, 5, and 6. The next element to be inserted is 9. Put 9 in the sentry position, that is, the position with index 0. Then cut it in half:

    In step 4-1, the initial low is 1, and the value of high is calculated as 4. According to low and high, mid is calculated to be 2(since mid is an integer, 5 is divided by 2 and the whole is 2). The value of mid is compared with the element to be inserted 9, 2 is less than 9, and it is necessary to continue searching in the latter part of the ordered list. Then the value of low is changed to mid+1 and low is 3.

    In steps 4-2, low is calculated as 3, and the value of high is still 4. According to low and high, mid is calculated to be 3(since mid is an integer, 7 is divided by 2 and the whole is 3). The value of mid is compared with element 9 to be inserted. 5 is less than 9, and it is necessary to continue searching in the latter part of the ordered list.

    In steps 4-3, low is calculated as 4, and the value of high is still 4. Mid is calculated to be 4 according to low and high, and the value of mid is compared with element 9 to be inserted. 6 is less than 9, so it is necessary to continue searching in the latter part of the ordered list. Then the value of low is changed to mid+1, and low is 5. Continue the loop if low is greater than high, if the condition is not satisfied, out of the loop;

  • The fifth step is, the ordered list of elements 1, 2, 5, 6, 9, the next element to be inserted is 3, 3 into the sentry position, that is, index 0 position; Then cut it in half:

    In step 5-1, the initial low is 1, and the value of high is calculated as 5. Mid is calculated to be 3 according to low and high, and the value 5 of mid is compared with the element to be inserted 3. 5 is greater than 3, so it is necessary to continue searching in the first half of the ordered list. Then the value of high is changed to mid minus 1, and high is 2.

    In steps 5-2, the initial low is 1. In the previous step, the value of high is calculated as 2. According to low and high, mid is calculated to be 1(3 divided by 2 is rounded to 1). The value of mid is compared with the element 3 to be inserted. 1 is less than 3, so it is necessary to continue searching in the latter part of the ordered list.

    In steps 5-3, low is calculated as 2 in the previous step, and high is calculated as 2 in steps 5-1. According to low and high, mid is calculated to be 2, and the value of mid is compared with the element to be inserted 3. 2 is less than 3, so it is necessary to continue searching in the latter part of the ordered list. Then, the value of low is changed to mid plus 1, and low is 3. Continue the loop. If the loop condition is not met, the loop is terminated

    In step 5-4, according to the half-search and comparison, the appropriate position is high+1, that is, the element to be inserted needs to be inserted into position 3, and the elements 5, 6 and 9 need to be shifted backward successively to vacate the position of index bit 3, and the element to be inserted into this index bit 3. Finally, the sorting results 1, 2, 3, 5, 6, 9 are obtained.

2.3 Analysis of split insertion sorting algorithm

Time complexity

There are two layers of loops in the algorithm process. The first layer needs to traverse all elements, so the time complexity is O(n). The second layer of the cycle contains two parts of the algorithm. The first step is to find the position through the half-cut algorithm. The time complexity has been analyzed at the beginning and is O(log2n). The second step is to vacate the space after finding the position, and the corresponding element needs to be shifted. The time complexity is O(n). Then the time complexity of the overall algorithm is the time complexity of the outer loop multiplied by the time complexity of the inner loop. Remove the coefficient and constant and take the larger one, and the result is O(n2).

Spatial complexity

Only several fixed intermediate variables (I, J, Low,mid, High, arrayB [0]) are used in the core part of the algorithm, so the memory consumed in the algorithm is a constant, and the space complexity is O(1).

The stability of

In the process of algorithm, if the semicolon algorithm is used to find the position, the greater than sign is used to compare the value. Therefore, when equal data is encountered, the position will not be changed, so the semicolon algorithm is stable.

To sum up, the time complexity and space complexity of split insertion sort are O(n2) and O(1), which are stable algorithms.

conclusion

There are two kinds of algorithms, half-search algorithm, half-insertion sort algorithm; In the end, the idea of insertion sort is the same, but it is optimized when comparing ordered lists; For the sorting of small data volume, no optimization is felt, when the data volume is large, the comparison efficiency is significantly improved; If you do not understand the small partner debugging code to try, no more can leave a message, there is time to reply in time.

Thank you for your likes, favorites and comments. Continue next time **~~ **

A handsome guy who was made ugly by the program, pay attention to “Code variety circle “, learn with me ~~~