“This is the fifth day of my participation in the Gwen Challenge in November. Check out the details: The Last Gwen Challenge in 2021.”

preface

The front-end algorithm series is a record of my algorithm learning, which mainly analyzes the learning algorithm knowledge from common algorithms, data structure, algorithm thinking and common skills. It realizes deliberate practice through LeetCode platform, and practices Feynman learning method through digging gold and the output of STATION B. I will update the quality content and synchronously update it to Nuggets, B station and Github in the future, so as to record the complete process of learning algorithm. Welcome to communicate, like and collect more, let us make progress together, daydayUp 👊

Directory address: directory section

Code address: Github

Bilibili – 100 days algorithm series

A, thinking

Merge sort as one of the classic divide-and-conquer algorithm, the main design idea is divide and conquer, by dividing the original array into two piles, and then sorting respectively, in the process of sorting the two arrays respectively, we can use any kind of sorting method, also can continue to use merge sort through recursive way. Finally, merge the two ordered arrays. In this way, we can divide it into two groups and sort it separately, which is called two-way merge. In addition, in a specific scenario, we can also divide it into multiple blocks and sort it separately, which is called multi-way merge.

2. Realize two-way merge

function mergeSort(nums) {
    // 1. Split the array into two parts
    // Merge the final result
    dfs(nums, 0, nums.length)
}

var dfs = function(nums, l, r) {
    if (l >= r) return
    let mid = Math.floor((l + r) / 2)
    dfs(nums, l, mid)
    dfs(nums, mid + 1, r)
    mergeTwoIntervals(nums, l, mid, mid + 1, r)
}

// 2. Merge two ordered arrays
var mergeTwoIntervals = function(nums, l1, r1, l2, r2) {
    let i = l1, j = l2, k = 0, res = new Array(r2 - l1 + 1)
    / / merge
    while(i <= r1 || j <= r2) {
        if (j > r2 || (i <= r1 && nums[i] <= nums[j])) {
            res[k++] = nums[i++]
        } else {
            // The value is either from the first array or from the second
            // if (i > r1 || (j <= r2 && nums[j] <= nums[i]))
            res[k++] = nums[j++]
        }
    }
    / / replace
    for(let i=l1; i<r2; i++) {
        nums[i] = res[i - l1]
    }
}
Copy the code

Advanced multiway merge

1. Why multiple roads

Think about this for a moment

With a computer with 2GB of ram, how do I sort a 40GB file on my hard disk?Copy the code

For this problem, we can make full use of the divide and conquer idea of merge sort, first divide the 40GB file into 40 1GB file blocks for sorting (internal sorting), and then use merge sort multi-way merge idea to create 40 Pointers, each time calculate the minimum value of which to add to the array. Of course, extra space is required during the final merge, but we can use hard disk space (outer sort) to fill the merged data into a new file.

This will only use 1GB of memory for sorting and only 40 Pointers for subsequent merging.

So one of the great things about merge sort is that:

The final merge operation can be performed using external sort, so it has some advantages when dealing with sorting large amounts of data.Copy the code

Other notes:

Internal sort: Uses in-memory sort, in which data is read

External sort: use external memory sort, in the process of sorting can use the capacity of the hard disk to sort

2. Multi-way implementation

In order to ensure the principle of least knowledge, we do not implement the operation of reading, writing and inner sorting of files here. In addition, we can use the small top heap to realize the minimum value of forty files under normal circumstances. The specific implementation method will be explained in the data structure section.

// The logic of the following functions does not need to be concerned, as long as the input and output are understood
// f.split (file, num) splits a file into smaller files and returns an array of files
// f.als (file) retrieves all the values in the file and returns an array
// f.val (file) retrieves the first value in the file without returning false
// f.shift (file) removes the first value in the file and returns it
// f.ush (name, text) appends the content to a file named name


function threeMergeSort(bigFile) {
    const num = 40
    // 1. Split the file into 40 files
    let files = F.split(bigFile, num)
    // 2. Sort each file internally
    for(let file of files) {
        let vals = F.vals(file)
        mergeSort(vals) // Sort the current file data using the above recursive version of two-way mergeF.push(file.name, ... vals)// Write the sorted array back into the file
    }
    // merge all files
    while(true) {
        let minIdx = -1, minVal = null
        for(let i=0; i<num; i++) {
            let val = F.val(files[i])
            if(! val)continue;
            if (minIdx == -1 || val < minVal) {
                minIdx = i
                minVal = val
            }
        }
        if(minIdx ! = = -1) {
            // Delete the smallest value in the file and add it to the output file
            F.unshift(files[minIdx])
            F.push('output.text', minVal)
        } else {
            // If minIdx is -1, there are no values in all files
            break; }}}Copy the code

3. Summary

So that’s the end of the code and the idea of binary merge and multiple merge, and I don’t know if you learned that, but whether it’s binary merge or multiple merge, the general idea is to split and sort and then merge the results. In addition, if you have any questions or supplements, please leave a message in the comments section.

other

About small top heap implementation – article link TODO