Suppose we were to connect all 15 of the largest Msas to the hyperloop network. The goal is to minimize the cost of the network’s rollout, which means using the least number of tracks. The question then becomes: how do you connect all the Msas with the least amount of track?

4.4.1 Weight treatment

To understand the number of tracks needed to build an edge, you need to know the distance that edge represents. Now it is time to reintroduce the concept of weights. In the Hyperloop network, the weight of an edge is the distance between two connected Msas. Figure 4-5 is almost the same as Figure 4-2, except that each edge is weighted to indicate the distance (in miles) between the two vertices connected by the edge.

 

Figure 4-5 Weighted diagram of the 15 largest MSA in the United States. The weights represent the distance between the two MSA in miles (1 mile ≈1.6093 km).

To handle weights, we subclass WeightedEdge for Edge and WeightedGraph for Graph. Each WeightedEdge has a float type data associated with it representing its weight. Jarnik’s algorithm, which will be introduced in a moment, compares two edges and determines which edge has a lower weight. Using numerical weights makes it easy to compare. The detailed code is shown in Listing 4-6.

Listing 4-6weighted_edge.py

from __future__ import annotations
from dataclasses import dataclassfrom edge import Edge@dataclassclass WeightedEdge(Edge):    weight: float    def reversed(self) -> WeightedEdge:
        return WeightedEdge(self.v, self.u, self.weight)
    # so that we can order edges by weight to find the minimum weight edge    def __lt__(self, other: WeightedEdge) -> bool:
        return self.weight<other.weight
    def __str__(self) -> str:
        return f"{self.u} {self.weight}> {self.v}"
Copy the code

The implementation code for WeightedEdge is not much different from that for Edge, except that a weight attribute is added and the “<” operator is implemented via __lt__() so that two WeightedEdge can be compared to each other. The “<” operator deals only with weights (and not with the inherited attributes u and v), because Jarnik’s algorithm focuses only on finding the least weighted edge.

As listing 4-7 shows, WeightedGraph inherits most of its functionality from Graph, including an __init__ method and a convenient way to add WeightedEdge, as well as implementing its own __str__() method. It also has a new method, neighbors_for_index_with_weights(), which returns not only each neighbor, but also the weight of the edge that reached that neighbor. This method is useful for its __str__().

Listing 4-7weighted_graph.py

from typing import TypeVar, Generic, List, Tuple
from graph import Graphfrom weighted_edge import WeightedEdgeV = TypeVar('V') # type of the vertices in the graph
class WeightedGraph(Generic[V], Graph[V]):    def __init__(self, vertices: List[V] = []) -> None:
        self._vertices: List[V] = vertices
        self._edges: List[List[WeightedEdge]] = [[] for _ in vertices]
    def add_edge_by_indices(self, u: int, v: int, weight: float) -> None:
        edge: WeightedEdge = WeightedEdge(u, v, weight)        self.add_edge(edge) # call superclass version
    def add_edge_by_vertices(self, first: V, second: V, weight: float) -> None:
        u: int = self._vertices.index(first)
        v: int = self._vertices.index(second)
        self.add_edge_by_indices(u, v, weight)
    def neighbors_for_index_with_weights(self, index: int) -> List[Tuple[V, float]]:
        distance_tuples: List[Tuple[V, float]] = []        for edge in self.edges_for_index(index):
            distance_tuples.append((self.vertex_at(edge.v), edge.weight))
        return distance_tuples
    def __str__(self) -> str:
        desc: str = ""
        for i in range(self.vertex_count):
            desc += f"{self.vertex_at(i)} -> {self.neighbors_for_index_with_weights(i)}\n"
        return desc
Copy the code

Now you can actually define the weighted graph. Here we’ll use a weighted graph shown in Figure 4-5 called city_graph2. The detailed code is shown in Listing 4-8.

Listing 4-8weighted_graph.py (continued)

if __name__ == "__main__":
    city_graph2: WeightedGraph[str] = WeightedGraph(["Seattle", "San Francisco", 
      "Los Angeles", "Riverside", "Phoenix", "Chicago", "Boston", "New York", "Atlanta",
      "Miami", "Dallas", "Houston", "Detroit", "Philadelphia", "Washington"])
    city_graph2.add_edge_by_vertices("Seattle", "Chicago", 1737)
    city_graph2.add_edge_by_vertices("Seattle", "San Francisco", 678)
    city_graph2.add_edge_by_vertices("San Francisco", "Riverside", 386)
    city_graph2.add_edge_by_vertices("San Francisco", "Los Angeles", 348)
    city_graph2.add_edge_by_vertices("Los Angeles", "Riverside", 50)
    city_graph2.add_edge_by_vertices("Los Angeles", "Phoenix", 357)
    city_graph2.add_edge_by_vertices("Riverside", "Phoenix", 307)
    city_graph2.add_edge_by_vertices("Riverside", "Chicago", 1704)
    city_graph2.add_edge_by_vertices("Phoenix", "Dallas", 887)
    city_graph2.add_edge_by_vertices("Phoenix", "Houston", 1015)
    city_graph2.add_edge_by_vertices("Dallas", "Chicago", 805)
    city_graph2.add_edge_by_vertices("Dallas", "Atlanta", 721)
    city_graph2.add_edge_by_vertices("Dallas", "Houston", 225)
    city_graph2.add_edge_by_vertices("Houston", "Atlanta", 702)
    city_graph2.add_edge_by_vertices("Houston", "Miami", 968)
    city_graph2.add_edge_by_vertices("Atlanta", "Chicago", 588)
    city_graph2.add_edge_by_vertices("Atlanta", "Washington", 543)
    city_graph2.add_edge_by_vertices("Atlanta", "Miami", 604)
    city_graph2.add_edge_by_vertices("Miami", "Washington", 923)
    city_graph2.add_edge_by_vertices("Chicago", "Detroit", 238)
    city_graph2.add_edge_by_vertices("Detroit", "Boston", 613)
    city_graph2.add_edge_by_vertices("Detroit", "Washington", 396)
    city_graph2.add_edge_by_vertices("Detroit", "New York", 482)
    city_graph2.add_edge_by_vertices("Boston", "New York", 190)
    city_graph2.add_edge_by_vertices("New York", "Philadelphia", 81)
    city_graph2.add_edge_by_vertices("Philadelphia", "Washington", 123)
    print(city_graph2)
Copy the code

Because WeightedGraph implements __str__(), we can gracefully print city_graph2. All vertices connected to each vertex and their weights are displayed in the output.

Seattle -> [('Chicago', 1737), ('San Francisco', 678)]
San Francisco -> [('Seattle', 678), ('Riverside', 386), ('Los Angeles', 348)]
Los Angeles -> [('San Francisco', 348), ('Riverside', 50), ('Phoenix', 357)]
Riverside -> [('San Francisco', 386), ('Los Angeles', 50), ('Phoenix', 307), ('Chicago',
   1704)]
Phoenix -> [('Los Angeles', 357), ('Riverside', 307), ('Dallas', 887), ('Houston', 1015)]
Chicago -> [('Seattle', 1737), ('Riverside', 1704), ('Dallas', 805), ('Atlanta', 588), 
    ('Detroit', 238)]
Boston -> [('Detroit', 613), ('New York', 190)]
New York -> [('Detroit', 482), ('Boston', 190), ('Philadelphia', 81)] 
Atlanta -> [('Dallas', 721), ('Houston', 702), ('Chicago', 588), ('Washington', 543),
    ('Miami', 604)]
Miami -> [('Houston', 968), ('Atlanta', 604), ('Washington', 923)]
Dallas -> [('Phoenix', 887), ('Chicago', 805), ('Atlanta', 721), ('Houston', 225)]
Houston -> [('Phoenix', 1015), ('Dallas', 225), ('Atlanta', 702), ('Miami', 968)]
Detroit -> [('Chicago', 238), ('Boston', 613), ('Washington', 396), ('New York', 482)]
Philadelphia -> [('New York', 81), ('Washington', 123)]
Washington -> [('Atlanta', 543), ('Miami', 923), ('Detroit', 396), ('Philadelphia', 123)]
Copy the code

4.4.2 Searching for the Minimum Spanning Tree

A tree is a special kind of graph in which only one path exists between any two vertices, meaning that there is no loop (cycle) in the tree, sometimes called acyclic. A loop can be thought of as a loop. A loop is said to exist if the graph can be traversed from a starting vertex, without repeating any edges, and back to the starting vertex. Any graph that is not a tree can be made a tree by trimming the edges. Figure 4-6 illustrates the process of turning a graph into a tree by trimming edges.

 

Figure 4-6 In the left figure, there is a loop between vertices B, C, and D, so it is not a tree. In the picture on the right, the edge connecting C and D has been trimmed away, so it is a tree

A Connected graph is a graph that can take a path from one vertex to another. All graphs in this chapter are connected graphs. A spanning tree is a tree that connects all the vertices of a graph. A minimum spanning tree is a tree that connects each vertex of a weighted graph with the minimum total weight (relative to other spanning trees). For each weighted graph, we can find its minimum spanning tree efficiently.

There’s a lot of jargon going on here! “Find the minimum spanning tree” means the same thing as “join all vertices in a weighted graph with the least weight,” which is the key point. This is an important and practical problem for anyone designing networks (transportation networks, computer networks, etc.) : How can all the nodes in the network be connected at the lowest cost? The cost here could be power lines, tracks, roads or anything else. In terms of telephone networks, another way of asking the question is: What is the minimum cable length required to connect each phone?

1. Revisit priority queues

Priority queues were covered in Chapter 2. Jarnik’s algorithm will need to use priority queues. We can import the PriorityQueue class from the package in Chapter 2, see the notes immediately before Listing 4-5 for details, or we can copy the class into a new file and put it in the package in this chapter. For completeness, in Listing 4-9, we recreate the PriorityQueue from Chapter 2, assuming that the import statements will be placed in a separate file.

Listing 4-9Priority_queue.py

from typing import TypeVar, Generic, List
from heapq import heappush, heappopT = TypeVar('T')
class PriorityQueue(Generic[T]):    def __init__(self) -> None:
         self._container: List[T] = []
    @property    def empty(self) -> bool:
        return not self._container  # not is true for empty container
    def push(self, item: T) -> None:
        heappush(self._container, item)  # in by priority
    def pop(self) -> T:
        return heappop(self._container)  # out by priority
    def __repr__(self) -> str:
        return repr(self._container)
Copy the code

2. Calculate the total weight of the weighted path

Before we can develop a way to find a minimum spanning tree, we need to develop a function that detects the total weight of a solution. The solution of the minimum spanning tree problem will consist of a list of weighted edges that make up the tree. First, we define a WeightedPath as a WeightedEdge list. Then we define a total_weight() function that takes a WeightedPath list and adds the weights of all the edges to get the total weight. The detailed code is shown in Listing 4-10.

Listing 4-10mst.py

from typing import TypeVar, List, Optional
from weighted_graph import WeightedGraph
from weighted_edge import WeightedEdge
from priority_queue import PriorityQueue
V = TypeVar('V') # type of the vertices in the graph
WeightedPath = List[WeightedEdge] # type alias for paths
def total_weight(wp: WeightedPath) -> float:
    return sum([e.weight for e in wp])
Copy the code

3. Jarnik algorithm

Jarnik’s algorithm for finding minimum spanning trees divides the graph into two parts: vertices that are being generated and vertices that have not yet been added to the minimum spanning tree. Its working steps are as follows.

(1) Select any vertex to be included in the minimum spanning tree.

(2) Find the edge with the least weight connecting the minimum spanning tree and the vertices not yet added to the tree.

(3) Add the vertex at the end of the least weighted edge to the minimum spanning tree.

(4) Repeat steps 2 and 3 until every vertex in the graph has been added to the minimum spanning tree.

Note that the Jarnik algorithm is often called the Prim algorithm. In the late 1920s two Czech mathematicians OtakarBorůvka and Vojtě chJarnik worked hard to minimize the cost of laying wires and came up with an algorithm for solving the minimum spanning tree problem. Their proposed algorithm was “rediscovered” decades later by others [3].

To run Jarnik’s algorithm efficiently, priority queues are needed. Every time a new vertex is added to the minimum spanning tree, all outgoing edges connected to vertices outside the tree are added to the priority queue. The eject from the priority queue must be the least weighted edge, and the algorithm will continue to run until the priority queue is empty. This ensures that the least weighted edge will always be added to the tree first. If the popped edge is connected to an existing vertex in the tree, it is ignored.

The MST () in Listing 4-11 fully implements Jarnik’s algorithm [4] and comes with a utility function to print WeightedPath.

A caveat is that Jarnik’s algorithm does not necessarily work on directed graphs, nor does it work on disconnected graphs.

Listing 4-11mst.py (continued)

def mst(wg: WeightedGraph[V], start: int = 0) -> Optional[WeightedPath]:
    if start > (wg.vertex_count - 1) or start < 0:
        return None
    result: WeightedPath = [] # holds the final MST
    pq: PriorityQueue[WeightedEdge] = PriorityQueue()    visited: [bool] = [False] * wg.vertex_count # where we've been
    def visit(index: int):        visited[index] = True # mark as visited
        for edge in wg.edges_for_index(index): 
            # add all edges coming from here to pq            if not visited[edge.v]:
                pq.push(edge)    visit(start) # the first vertex is where everything begins
    while not pq.empty: # keep going while there are edges to process
        edge = pq.pop()        if visited[edge.v]:
            continue # don't ever revisit
        # this is the current smallest, so add it to solution        result.append(edge)         visit(edge.v) # visit where this connects
    return result
def print_weighted_path(wg: WeightedGraph, wp: WeightedPath) -> None:
    for edge in wp:
        print(f"{wg.vertex_at(edge.u)} {edge.weight}> {wg.vertex_at(edge.v)}")
    print(f"Total Weight: {total_weight(wp)}")
Copy the code

Let’s go through MST () line by line.

def mst(wg: WeightedGraph[V], start: int = 0) -> Optional[WeightedPath]:
    if start > (wg.vertex_count - 1) or start < 0:
        return None
Copy the code

This algorithm returns a WeightedPath object that represents a minimum spanning tree. The starting position of the algorithm is irrelevant (assuming the graph is connected and undirected), so the default is a vertex with index 0. If start is invalid, MST () returns None.

result: WeightedPath = [] # holds the final MST
pq: PriorityQueue[WeightedEdge] = PriorityQueue()visited: [bool] = [False] * wg.vertex_count # where we've been
Copy the code

Result will be the final place to store the weighted path, which contains the minimum spanning tree. A weighte Edge is added to result as the least-weighted edge is popped up and new areas of the graph are traversed. Jarnik is considered one of the greedy algorithms because it always selects the least weighted edge. Pq is used to store newly discovered edges and pop up the next lowest weighted edge. Visited is used to record the visited index of vertices, which can also be accomplished with Set, similar to the explored in BFS ().

def visit(index: int):
    visited[index] = True # mark as visited
    for edge in wg.edges_for_index(index): 
        # add all edges coming from here
        if not visited[edge.v]:
            pq.push(edge)
Copy the code

Visit () is an internally useful function that marks vertices as visited and adds edges to pQ from unvisited vertices. Notice how easy it is to find edges that belong to a vertex using the adjacency list model.

visit(start) # the first vertex is where everything begins
Copy the code

Unless the graph is disconnected, it doesn’t matter which vertex is accessed first. If the graph is disconnected and consists of discrete parts, then the tree returned by MST () covers only one part of the graph, the part of the graph to which the starting node belongs.

while not pq.empty: # keep going while there are edges to process
    edge = pq.pop()
    if visited[edge.v]:
        continue # don't ever revisit
    # this is the current smallest, so add it
    result.append(edge) 
    visit(edge.v) # visit where this connects
return result
Copy the code

As long as there are edges in the priority queue, we pop them up and check to see if they lead to vertices that have not yet been added to the tree. Because the priority queue is sorted in ascending order, the least weighted edge is popped first. This ensures that the result does have a minimum total weight. If the pop-up edge does not lead to an unexplored vertex, it is ignored; otherwise, because it is by far the least weighted edge, it is added to the result set and the new vertices it leads to are explored. If there are no edges left to explore, the result is returned.

Finally, back to the problem of connecting America’s 15 biggest Msas with the least track hyperloop network. The resulting path is the minimum spanning tree for city_graph2. Let’s try running MST () on city_graph2, as shown in Listing 4-12.

Listing 4-12mst.py (continued)

if __name__ == "__main__": city_graph2: WeightedGraph[str] = WeightedGraph(["Seattle", "San Francisco", "Los Angeles", "Riverside", "Phoenix", "Chicago", "Boston", "New York", "Atlanta", "Miami", "Dallas", "Houston", "Detroit", "Philadelphia", "Washington"]) city_graph2.add_edge_by_vertices("Seattle", "Chicago", 1737) city_graph2.add_edge_by_vertices("Seattle", "San Francisco", 678) city_graph2.add_edge_by_vertices("San Francisco", "Riverside", 386) city_graph2.add_edge_by_vertices("San Francisco", "Los Angeles", 348) city_graph2.add_edge_by_vertices("Los Angeles", "Riverside", 50) city_graph2.add_edge_by_vertices("Los Angeles", "Phoenix", 357) city_graph2.add_edge_by_vertices("Riverside", "Phoenix", 307) city_graph2.add_edge_by_vertices("Riverside", "Chicago", 1704) city_graph2.add_edge_by_vertices("Phoenix", "Dallas", 887) city_graph2.add_edge_by_vertices("Phoenix", "Houston", 1015) city_graph2.add_edge_by_vertices("Dallas", "Chicago", 805) city_graph2.add_edge_by_vertices("Dallas", "Atlanta", 721) city_graph2.add_edge_by_vertices("Dallas", "Houston", 225) city_graph2.add_edge_by_vertices("Houston", "Atlanta", 702) city_graph2.add_edge_by_vertices("Houston", "Miami", 968) city_graph2.add_edge_by_vertices("Atlanta", "Chicago", 588) city_graph2.add_edge_by_vertices("Atlanta", "Washington", 543) city_graph2.add_edge_by_vertices("Atlanta", "Miami", 604) city_graph2.add_edge_by_vertices("Miami", "Washington", 923) city_graph2.add_edge_by_vertices("Chicago", "Detroit", 238) city_graph2.add_edge_by_vertices("Detroit", "Boston", 613) city_graph2.add_edge_by_vertices("Detroit", "Washington", 396) city_graph2.add_edge_by_vertices("Detroit", "New York", 482) city_graph2.add_edge_by_vertices("Boston", "New York", 190) city_graph2.add_edge_by_vertices("New York", "Philadelphia", 81) city_graph2.add_edge_by_vertices("Philadelphia", "Washington", 123) result: Optional[WeightedPath] = mst(city_graph2) if result is None: print("No solution found!" ) else: print_weighted_path(city_graph2, result)Copy the code

Thanks to the beautiful printWeightedPath() method, the minimum spanning tree is pretty readable.

Seattle 678> San Francisco
San Francisco 348> Los Angeles
Los Angeles 50> Riverside
Riverside 307> Phoenix
Phoenix 887> Dallas
Dallas 225> Houston
Houston 702> Atlanta
Atlanta 543> Washington
Washington 123> Philadelphia
Philadelphia 81> New York
New York 190> Boston
Washington 396> Detroit
Detroit 238> Chicago
Atlanta 604> Miami
Total Weight: 5372
Copy the code

In other words, this is the shortest combination of the total side lengths connecting all the MSA in the weighted graph, requiring at least an orbit of 8645 km. Figure 4-7 shows the minimum spanning tree.

 

The highlighted edges in Figure 4-7 represent the minimum spanning tree connecting all 15 Msas

This article is excerpted from The Essence of Algorithms: Python Implementations of Classic Computer Science Problems. By David Kopec. Translated by Dai Xu.

 

This book is an algorithm tutorial for middle and advanced programmers, using classical algorithms, coding techniques and principles to solve some classical problems in computer science. Of nine chapters of the book, not only introduces the recursive, result cache and a basic programming components, such as the operation also tells the story of common search algorithm, common graph algorithm, neural network, genetic algorithm and k-means clustering algorithm, against the search algorithm, etc., using the Python advanced features such as the type hinting at all levels and through the concrete practice scheme, examples and exercises.

This book combines computer science with practical problems such as applications, data, performance, etc., with unique positioning and classic examples. It is suitable for intermediate and advanced Python programmers with some programming experience to improve their technical, programming, and application skills in solving practical problems with Python.