What is the nature of a minimum spanning tree? The Prim algorithm cracked the mystery

This article originated from personal public account: TechFlow, original is not easy, for attention

In today’s 20 articles on algorithms and data structures, we continue with minimum spanning tree algorithms to finish it off.

In the last article, we mainly studied the Kruskal algorithm of minimum spanning tree. Today we’re going to look at the Prim algorithm to understand this problem from another perspective.

From the edge to point

Let’s briefly review the principle of Kruskal’s algorithm, which is very simple despite the length of the previous article. So essentially we sort all the edges of the graph by length, and then we add them in that order as the backbone of the tree.

In this process, in order to avoid the destruction of the tree structure, we use the look-up set algorithm as maintenance. Only edges that are not part of a connected block are considered, otherwise a loop is formed.

Many people, after learning this algorithm, will understand it as a greedy problem, or a usage scenario for parallel lookup sets. That’s true, but there’s a better explanation.

The explanation is the expansion of the edge set.

The whole process that Kruskal runs through is that we keep picking edges to add to the tree, and for a graph of n points, we need n-1 edges. If we focus on the set of edges that we picked out, then we start the algorithm as an empty set, and by the end of the run, it has n minus 1 edges, and it’s saturated.

Of course, if you look at it another way, if we focus on points, then the process of building a minimum spanning tree can also be viewed as an extension of a set of points. It’s just different from edges, edges can be selected, but points can’t, points can only be covered by selecting edges. For example, take a look at the following image:

In the figure, we have a constructed tree on our left, and when we connect AE, we cover E with edges. A point is dependent on an edge, and you can’t cover a point without going through an edge.

We start with an empty set, and every edge after that connects a covered point to an uncovered point, except for the first edge that covers two points. So, again, n points can be covered by n minus 1 edges. This is the core idea of Prim, which is the extension of the set of points.

The overall train of thought

Once we understand that the core idea of Prim is an extension of a set of points, it becomes easy, because every time we choose an edge we must have a covered point and an uncovered point. So instead of being pieced together like Kruskal, we’re generating trees that grow side by side.

Let’s look at an example:

We have connected four points ABCD, where the length of CE is 7, the length of DF is 9, and the length of EF is 5. Although the length of EF is smaller than that of CE, we cannot choose it because we have to connect a covered point with an uncovered point, even though the distance of EF is smaller. CE is the only choice, so the tree gets bigger and bigger as the algorithm runs. If it was Kruskal, we would connect EF first, then CE, and the whole algorithm would run in separate parts, and the final tree would be pieced together gradually.

Compared to the Kruskal maintenance collection, it is much easier to determine whether our maintenance point is covered or not. Since the selected edges of the tree are not modified, we simply use an array to mark each position as overridden. You can do this with a simple bool, which is very convenient.

So we’re left with just one problem, how do we make sure that the sum of paths of the tree that we generate is minimal?

Prim’s answer to this question, like Kruskal’s, is greed. We expand by selecting the smallest edge at a time, and Kruskal sorts all the edges and decides whether to select them in turn. So how does Prim use greed?

It’s actually very simple, and it’s easy for us to figure it out. Prim algorithm has a limit on edges and can only select edges between covered points and uncovered points. So let’s give these edges a name, they’re called augmented edges. So, obviously what we’re going to do is we’re going to choose one of the shortest extendable edges to augment.

The only question that remains is, how do we choose and maintain the shortest extendable edge, do we sort it every time we expand it?

Obviously not, because it would be too expensive to sort every time, so we could use a data structure to maintain the edges and sort them by their length. This data structure should be familiar, as we’ve seen it several times before — the priority queue.

The keys that we’re sorting are pretty obvious, the length of the edges, and it’s pretty easy to determine whether the edges are legal, and we just have to determine if there are any points that aren’t covered. So the whole process is strung together. We can first organize the process and write its process:

Select a point u as covered and queue all connected edges of u and loop through eject an edge from the head of the queue if an edge is legally ejected out of the loop get two ends of the edge and queue all edges of the uncovered end until all points are coveredCopy the code

Finally, let’s take a look at Python’s implementation, starting with the priority queue section, which we can implement using off-the-shelf HEAPQ.

import heapq

class PriorityQueue:
  
    def __init__(self):
 self._queue = []  self._index = 0   def push(self, item, priority):  We pass in two arguments, one is the array to store the elements, and the other is the element to store, which is a tuple.  The default heap size is from small to large  heapq.heappush(self._queue, (priority, self._index, item))  self._index += 1   def pop(self):  return heapq.heappop(self._queue)[- 1]   def empty(self):  return len(self._queue) == 0 Copy the code

Then it is the realization of Prim algorithm. Here, for convenience of storage, we use adjacency list to store edge information. An adjacency list is an array of linked lists, and each element in the array is the head of the list. The linked list stores information about all the edges of a node.

For example, a linked list with subscript 1 in an adjacency list stores information about all edges connected to the node 1. This data structure is often used when we store trees and graphs, but it’s not complicated, and we don’t really need to implement a linked list, because we can simulate it with arrays.

edges = [[1.2.7], [2.3.8], [2.4.9], [1.4.5], [3.5.5], [2.5.7], [4.5.15], [4.6.6], [5.6.8], [6.7.11], [5.7.9]]

if __name__ == "__main__":
    Record point whether overwrite
    visited = [False for _ in range(11)]
 visited[1] = True  An adjacency list can be interpreted as a two-dimensional array  adj_table = [[] for _ in range(11)]  # u and v represent the two endpoints and w represents the length of the line segment  We put v and w into the subscript u  # put u and w into the subscript v  for (u, v, w) in edges:  adj_table[u].append([v, w])  adj_table[v].append([u, w])   que = PriorityQueue()   # We choose 1 as the starting point  Queue all edges adjacent to 1  for edge in adj_table[1] : que.push(edge, edge[1])   ret = 0  # There are seven points, and we need to add six edges  for i in range(7) : If the queue is empty, the tree cannot be formed  while not que.empty():  u, w = que.pop()  If the connected endpoint is already overridden, skip  if visited[u]:  continue  # mark as overwritten  visited[u] = True  ret += w  Queue all edges connected to it  for edge in adj_table[u]:  que.push(edge, edge[1])  break   print(ret) Copy the code

At the end

This concludes our introduction to Prim. In fact, Prim and Kruskal are two sides of the minimum spanning tree algorithm in essence, and they are the same in essence, namely, augmenting. The only difference is that one is augmented by points and the other is augmented by edges. However, since the augmentation of points also depends on edges, Prim not only uses points to judge whether they are covered, but also uses edge information to augment points.

If you simply start with the logic of the algorithm and fail to understand its essence, it is not only easy to confuse the two algorithms, but also easy to get confused when writing code, do not know exactly what to maintain, what to expand.

The idea of augmentation is often used in graph theory-related algorithms (such as network flow), not just in minimum spanning trees, so understanding this concept will be important for the rest of our study. I hope you get the gist of it.

Today’s article is here, the original is not easy, scan code to pay attention to me, get more wonderful articles.

What is the nature of a minimum spanning tree? The Prim algorithm cracked the mystery

From the edge to point

The overall train of thought

At the end

Related Posts

23423

Boosting method, AdaBoost and GBDT

GANs and inefficient mapping