This paper mainly introduces the implementation of search set algorithm and related optimization.

And look up the set Union Find

  1. Realization of graph correlation algorithm.

  2. A different kind of tree structure

Connectivity Problem

To visualize the connection problem:

Are the top left, top right, bottom right connected?

Significance: the role in practical application

  • The connection status between nodes in a network

    • A network is an abstract concept: a network of users
  • Social networking: The connections (friends) between users A and B on Facebook. Is it reachable?

  • Music, movies, books and multimedia form a network.

  • Internet A network of web pages

  • Routers form networks with each other

  • Reason traffic, flight scheduling are networks

Set class implementation in mathematics

A union is the realization of a union. & query

Connection problems & path problems

Fewer questions to answer than the path question (what is the path? The connection question only asks if there is a connection)

  • Compare to binary search: sequential search answers rank incidentally. And the position of the previous elements
  • Compare to SELECT: sorting answers more questions. Select answers fewer questions
  • Compare to the heap: only care about the maximum and minimum.

Does it answer questions in addition to the question itself? There are probably more efficient algorithms out there. : Because efficient algorithms don’t need to answer extra questions.

Implement the simplest Union lookup set Union Find

For a set of data, two main actions are supported:

  • union( p , q )
  • find( p )

To answer a question

  • isConnected( p , q )

The simplest representation; The array. 0, 1.

0 to 4 5-9

0-4 is a group, 5-9 is a group. There are relationships between groups, and elements within a group have the same ID

parity

Odd is a group, even is a group.

namespace UF1 { class UnionFind { private: int *id; int count; public: UnionFind(int n) { count = n; id = new int[n]; // Each element is a set for (int I = 0; i < n; i++) id[i] = i; } ~UnionFind() { delete[] id; } // pass the element p and return the id of the element. int find(int p) { assert(p >= 0 && p < count); return id[p]; } bool isConnected(int p, int q) { return find(p) == find(q); } // Pass in two elements and void unionElements(int p, int q) {// Find the id of two elements int pID = find(p); int qID = find(q); If (pID == qID) return; for (int i = 0; i < count; I++) / / from the beginning to the end of the scanning time complexity O (n) if (id = = [I] pID) id [I] = qID; }}; }Copy the code

Testhelper.h:

Namespace UnionFindTestHelper{//n is the amount of data void testUF1(int n){// srand(time(NULL)); UF1::UnionFind uf = UF1::UnionFind(n); time_t startTime = clock(); For (int I = 0; i < n ; i ++ ){ int a = rand()%n; int b = rand()%n; uf.unionElements(a,b); O(n)} for(int I = 0; i < n ; i ++ ){ int a = rand()%n; int b = rand()%n; uf.isConnected(a,b); O(1)} time_endtime = clock(); cout<<"UF1, "<<2*n<<" ops, "<<double(endTime-startTime)/CLOCKS_PER_SEC<<" s"<<endl; }}Copy the code

main.cpp:

int main() {

    int n = 100000;

    UnionFindTestHelper::testUF1(n);

    return 0;
}
Copy the code

Running results:

UF1, 200,000 OPS, 32.3533 s [Finishedin39.7 s]Copy the code

Quick find only requires the O(1) level. But it was slow

And look up another set of implementation ideas

General implementation ideas

Think of each element as a node.

Element nodes

Each element has a pointer to its parent node. Then the uppermost parent pointer points to itself.

Quick Union

Array to store father

parent(i) = i;

union 3 4

union 3 8

union 6 5

union 9 4

We want to connect 9 to the root of 4, 8. In the array: 4-3-8-8 8 is the root of 4. 9 points to 8. 4 and 9 are linked together: because the roots are the same.

results

  • The connection between 6 and 2 is root 0 of 6 and root 1 of 2 selects 1 to hang 0.

Code implementation

namespace UF2{ class UnionFind{ private: int* parent; int count; public: UnionFind(int count){ parent = new int[count]; this->count = count; for( int i = 0 ; i < count ; i ++ ) parent[i] = i; } ~UnionFind(){ delete[] parent; } int find(int p){assert(p >= 0 &&p < count); while( p ! = parent[p] ) p = parent[p]; return p; } // find connected (int p, int q) {return find(p) == find(q); Void unionElements(int p, int q){int pRoot = find(p); int qRoot = find(q); if( pRoot == qRoot ) return; // Attach root to another root parent[pRoot] = qRoot; }}; }Copy the code

Running results:

UF1, 20000 OPS, 0.246341 s UF2, 20000 OPS, 0.059387sCopy the code

When n is large, method 1 is better.

And look up the optimization of the set

Question 1:

Union 9,4 and union 4, 9

union 9 4

9 has fewer elements. Point it to the root of 4. The number of tree layers formed is low.

Union-find namespace UF3{class UnionFind{private: int* parent; // parent[I] indicates the parent of the ith element int* sz; // sz[I] specifies the number of elements in the set root I. Public: // constructor UnionFind(int count){parent = new int[count]; sz = new int[count]; this->count = count; for( int i = 0 ; i < count ; i ++ ){ parent[i] = i; sz[i] = 1; } // destructor ~UnionFind(){delete[] parent; delete[] sz; Int find(int p){assert(p >= 0 &&p < count); Parent [p] == p while(p! = parent[p] ) p = parent[p]; return p; Bool isConnected(int p, int q) {return find(p) == find(q); Void unionElements(int p, int q){int pRoot = find(p); int qRoot = find(q); if( pRoot == qRoot ) return; If (sz[pRoot] < sz[qRoot]){parent[pRoot] = qRoot; parent[pRoot] = qRoot; sz[qRoot] += sz[pRoot]; } else{ parent[qRoot] = pRoot; sz[pRoot] += sz[qRoot]; }}}; }Copy the code

Running results:

UF2, 200000 ops, 19.3316 s
UF3, 200000 ops, 0.0184 s
Copy the code

Analysis of the

  • For UF1, although isConnected only takes O(1) time, the union operation takes O(n) time; The algorithm complexity of the overall testing process is O(n^2)
  • For UF2, its time performance is O(n*h), where H is the maximum height of the tree expressed by the parallel search set
    • Here, strictly speaking, h has nothing to do with logn, but you can see it this way
    • We’re going to optimize for h later, but in general, h is much less than n
    • So the UF2 test results we achieved are much better than UF1, and the larger n is, the more obvious it is 🙂
  • For UF3, its time performance is still O(n*h), where H is the maximum height of the tree expressed by the parallel lookup set
    • However, because UF3 can guarantee the balance of tree with higher probability, the performance is better

Rank based and search set optimization

Analysis of the

It is not entirely reasonable to combine 4 and 2 above and depend on the size of the set to determine who points to whom. The number of layers makes the most sense.

Rank based optimization

Rank [I] specifies the height of the tree whose root node is I

namespace UF4{ class UnionFind{ private: int* rank; // rank[I] = int* parent; // parent[I] indicates the parent of the ith element int count; Public: // constructor UnionFind(int count){parent = new int[count]; rank = new int[count]; this->count = count; for( int i = 0 ; i < count ; i ++ ){ parent[i] = i; rank[i] = 1; } // destructor ~UnionFind(){delete[] parent; delete[] rank; Int find(int p){assert(p >= 0 &&p < count); Parent [p] == p while(p! = parent[p] ) p = parent[p]; return p; Bool isConnected(int p, int q) {return find(p) == find(q); Void unionElements(int p, int q){int pRoot = find(p); int qRoot = find(q); if( pRoot == qRoot ) return; If (rank[pRoot] < rank[qRoot]){parent[pRoot] = qRoot; parent[pRoot] = qRoot; } else if( rank[qRoot] < rank[pRoot]){ parent[qRoot] = pRoot; } else{ // rank[pRoot] == rank[qRoot] parent[pRoot] = qRoot; rank[qRoot] += 1; }}}; }Copy the code

Analysis of the

  • For UF3, its time performance is still O(n*h), where H is the maximum height of the tree expressed by the parallel search set. However, because UF3 can guarantee the balance of the tree with a higher probability, its performance is better

  • UF4 has been optimized compared with UF3, but there are fewer cases in the optimized areas, so the better performance is not obvious, and even worse performance under some data, because there are more judgments.

The results

2000000 ops, 0.313945 s
Copy the code

Path Compression

So we’ve been optimizing the union. Actually we can optimize Find as well. Since each node stores its parent node, all nodes can have countless (multiple) children. In the search value, you can move up the node where the root is not found.

Analysis of the

Let’s say we want to find4

We connect the father of 4 to the father of 4’s father (it doesn’t matter if 3 is the root, because for the root, the father of 3 is still 3).

Consider the parent of 4 :2.

The final result:

Modify find

int find(int p){
assert( p >= 0 && p < count );

  // path compression 1
  while( p != parent[p] ){
    parent[p] = parent[parent[p]];
    p = parent[p];
  }
}
Copy the code

Code implementation of optimal results

  // Path compression 2, a recursive algorithm
  if( p ! = parent[p] ) parent[p] = find( parent[p] );return parent[p];
Copy the code

Final condition

  • Write a recursive function that calls findx and returns the root of the x node. Make each parentx point to the result of findx. The result of findx is also the result of Findparentx. When looking for x, point x’s Findparent result to the parent result.

The optimization is not obvious. Even because of recursive consumption. So what’s best in theory isn’t necessarily good in practice.

After the optimization of searching set and searching set operation, the time complexity is almost O(1)


— — — — — — — — — — — — — — — — — — — — — — — — — gorgeous line — — — — — — — — — — — — — — — — — — — —

After watching the friends can click a like/follow, your support is the biggest encouragement to me.

Personal blog tomato technology small stack and gold digging home page

If you want to know more, please follow my wechat official account: Tomato Technology Xiaozhan