Abstract

A parallel lookup set is a very efficient data structure for solving the problem of whether data sets are connected or not. For example, on a network, different terminals may be connected. If there are terminals A, B, and C, A connects TO B and A connects to C. So we can say that A, B, and C belong to the same connected component. This kind of connection relation analysis of parallel lookup set will be our common tools. Based on application scenarios and collection, the following problems can be solved

  • Give me two points, and tell me are they connected? (If connected, there is no need to give a specific path, the case of a specific path needs to be based on DFS algorithm)
  • Give two points and connect the two points

(Please go directly to the bottom of the article.)

Keywords: parallel search set, connectivity analysis, path compression

Thought analysis

Given a set of data, there are six points of difference, such as [[0,3],[1,2],[2,5], where each subarray represents a connected relationship between two points. The details are shown in the following figure

and check set
0_a 1_b 2_b
3_a 4_c 5_b

It can be easily seen from the figure that the six points are divided into three groups a, B and C according to the connection relation. So how can we represent this connection? So we need to build a data structure to solve the following problems

  • Find the grouping of nodes in the lookup set
  • Determine whether two nodes are connected
  • Associate the two nodes
  • Get the total number of groups
// To solve the problem of whether different nodes have the same root
class UnionFind {
	constructor(n) {
		this.parent = []
		this.count = 0
        this.init(n)
	}
	// Initialize a parallel set of size n
	init(n) {}
	// Find the grouping of nodes in the lookup set
	find(node) {}
	// Check whether two nodes are connected
	some(left, right) {
		return this.find(left) === this.find(right)
	}
	// Associate the two nodes
	union(left, right) {
		let index_l = this.find(left)
		let index_r = this.find(right)
		if(index_l ! == index_r) {/* 这里待补充将两个点连接起来的代码
            .......
            */
            this.count--
            return true
		}
        return false
	}
	// Get the total number of groups
	getCount() {
		return this.count
	}
}
Copy the code

Intuitively we can just use a one-dimensional array to represent the grouping of points. That [‘ a ‘, ‘b’, ‘b’, ‘a’, ‘c’, ‘b’]. We can also select the subscript of a point based on connectivity as a grouping marker, such as parent= [0,1,1,0,4,1]. That is, the parent of the I node is parent[I]. Indicates that I and parent[I] are connected. So our init() method can be designed like this

// Initializes a set and looks it up. Each node is associated with itself
init(n) {
	this.parent.length = 0
	for (let i = 0; i < n; i++) {
		this.parent[i] = i
	}
    this.count=n
}

Copy the code

Next, we’ll discuss step by step how to design the other apis in this data structure. There are the following design methods

The quick – find algorithm

As you can see from the proposed UnionFind class above, the find() method is used so frequently that every critical method needs to use the find() method first. Therefore, we need to design find() as efficiently as possible. Intuitively, we are most efficient when the time complexity is O(1) each time we call find(). So we can design find() as follows.

find(node) {
		return this.parent[node]
	}
Copy the code

All of these calls to find() take O(1) time. The corresponding union() method is

union(left, right) {
		let index_l = this.find(left)
		let index_r = this.find(right)
		if(index_l ! == index_r) {for(let i=0; i<this.parent.length; i++){// Change the grouping of all points belonging to the index_L group to index_r
				if(this.parent[i]==index_l){
					this.parent[i]=index_r
				}
			}
			this.count--
			return true
		}
		reurn false
	}
Copy the code

The union() method above iterates through and looks up the set each time it is called, finding the nodes that need to be modified and changing the grouping that the nodes belong to. Therefore, if the number of new paths is M and the size of search set is N, the time complexity is O(m*n). When the data scale is large, the complexity of square order is not ideal, so we need to explore a more efficient union() method.

Quick-union && weighted quick-Union algorithm

According to the discussion of The Quick-union algorithm, we find that the time complexity of union () method in the Quick-union algorithm is too high, so another method, quick-union, is proposed here. This algorithm design mainly uses the concept of tree, each point stores its parent node. When looking for groups, keep looking up until the parent node is itself. If the parent = [0, 2). When we look up the group of the node whose index is 2, we first look up its parent node whose index is 1. Because the parent of a node with index 1 is not itself, you keep looking up until you find a node with index 0. In this way, we can determine that the nodes in index 2 are in the same group as those in index 0.

find(node){
    while(this.parent[node]! ==node){ node=this.parent[node]
    }
    return node 
}
Copy the code
union(left, right) {
	let index_l = this.find(left)
	let index_r = this.find(right)
	if(index_l ! == index_r) {this.parent[index_l]=index_r
		this.count--
		return true
	}
	return false }
Copy the code

In the design of quick-union algorithm, union () is very efficient, but the efficiency of find () is obviously problematic. The problem is that if the height of the tree is high, the while loop inside find() will be called many times. So is there a way to reduce the height of the tree so that the whole tree is balanced. Looking at the union () method, we see that the code is hard-coded in that we always set the parent of index_L to index_R. That is, we always treat the left tree as a subtree attached to the right tree. Consider the following (left image)

When the number of left trees is large, if we still connect the left tree as a subtree to the right tree. So the height of the whole tree is going to be higher. Another way to think about it is that every time we do a merge, we choose to use it based on the size of each tree

This.parent [index_l]=index_r or this.parent[index_r]= index_L. That is, we always connect small trees as subtrees to large trees. As shown on the right in the figure above. The height of the entire tree will be balanced, which will improve the efficiency of find (). We can use a size array at initialization to store how many nodes are under each node. The initial value is 1. Int () and union () are encoded as follows


    // initialize a set and look up the set
    init(n) {
        this.parent.length = 0
        for (let i = 0; i < n; i++) {
            this.parent[i] = i
		}
		this.count=n
        this.size = new Array(n).fill(1)}Copy the code
  // Associate two nodes, that is, they share a root node. The sum and union join the root nodes of the two nodes
    union(left, right) {
        let l = this.find(left)
        let r = this.find(right)
        if(l ! = r) {// The left side is smaller, so merge the left side into the right tree
            if (this.size[l] < this.size[r]) {
                this.parent[l] = r
                this.size[r] += this.size[l]
            } else {
                this.parent[r] = l
                this.size[l] += this.size[r]
            }
            this.count--
            return true
        }
        return false
    }
Copy the code

Path to the compression

After the above analysis, weighted Quick-Union algorithm is already a good design. Not only does this make union () less complex, but it also flattens the tree of relationships between nodes to help find () execute faster. Based on this, we should think that the best case is that each tree has two layers, which is flattened into a tree of height 2, so that we can use O (1) complexity quickly every time we find(). So how do you do that? It’s very simple. Look at the code

  // Find the root node of the node in the parallel lookup set
  find(node) {
  	while(node ! =this.parent[node]) {
  		// Path compression, set the parent of the child node to the parent node on each lookup. This will continuously flatten the query tree.
  		this.parent[node] = this.parent[this.parent[node]]
        
  		node = this.parent[node]
  	}
  	return node
  }
Copy the code

The important third line is that every time we look for a parent node we set the parent node of the current node to its grandfather node. So it’s directly connected to grandpa. This reduces the height by one layer. When implemented to a certain extent. The whole tree must have only two floors. This is much more efficient for find ().

Conclusion code, js and lookup set

class UnionFind {
	// To solve the problem of whether different nodes have the same root
	constructor(n) {
		this.parent = [] / / and set
		this.size = [] // Total number of nodes under each node
		this.count = 0
		this.init(n)
	}
	// initialize a set and look up the set
	init(n) {
		this.parent.length = 0
		for (let i = 0; i < n; i++) {
			this.parent[i] = i
		}
		this.count = n
		this.size = new Array(n).fill(1)}// Find the root node of the node in the parallel lookup set
     find(node) {
        while(node ! =this.parent[node]) {
            // Path compression, set the parent of the child node to the parent node on each lookup. This will continuously flatten the query tree.
         this.parent[node] = this.parent[this.parent[node]]
		 node = this.parent[node]
        }
        return node
    }
	// Check whether two nodes have the same root node
	some(left, right) {
		return this.find(left) == this.find(right)
	}
	// Associate two nodes, that is, they share a root node. The sum and union join the root nodes of the two nodes
	union(left, right) {
		let l = this.find(left)
		let r = this.find(right)
		if(l ! = r) {// The left side is smaller, so merge the left side into the right tree
			if (this.size[l] < this.size[r]) {
				this.parent[l] = r
				this.size[r] += this.size[l]
			} else {
				this.parent[r] = l
				this.size[l] += this.size[r]
			}
			// The connected component decreases by 1
			this.count--
			return true
		}
		return false
	}
    // Get the total number of groups
	getCount() {
		return this.count
	}
}
Copy the code

The resources

Dm_vincent’s blog post