Small knowledge, big challenge! This article is participating in the creation activity of “Essential Tips for Programmers”.

preface

Hello! Friend!!!

Thank you very much for reading haihong’s article, if there are any mistakes in the article, please point out ~

 

Self-introduction ଘ(੭, ᵕ)੭

Nickname: Haihong

Tag: programmer monkey | C++ contestant | student

Introduction: because of C language to get acquainted with programming, then transferred to the computer major, had the honor to get some national awards, provincial awards… Has been confirmed. Currently learning C++/Linux/Python

Learning experience: solid foundation + more notes + more code + more thinking + learn English well!

 

Little White stage of learning Python

The article is only used as my own study notes for the establishment of knowledge system and review

The problem is not to learn one more question to understand one more question

Know what is, know why!

Graph theory model – Dijkstra

Dijkstra can find the shortest path from one vertex to another.

Basic concept

  • Undirected graph: a graph is undirected if every edge in it is directionless.
  • Directed graph: a graph is directed if every edge is directed.
  • Mixed graph: a graph in which some edges are oriented and some edges are unoriented is called mixed graph.

Sample 1

As shown in the figure below, we need to go from ① to ⑨. The red characters on each edge represent the length of the edge. How do we find the shortest path from ① to ⑨?

Steps:

  1. Mark ① as P and others as T to find the point from ① to which the shortest edge currently reaches and change the point T to P
  2. Find the shortest edge of all P points on the reachable T mark points, and change the reached point T to P
  3. Repeat the steps to change T to P at the guiding end point

Process demonstration: the circle plus the number represents each vertex, the number inside () represents the current walking distance

1. (1) (0) 2. (1) (2) (1) 3. (1) (2) (5) and (3) 4. (1) (2) (5) and (3) (1) (4) and (3) 5. (1) (2) (5) 7 (5) (1) (4) and (3) 6. (1) (2) (5) 7) 6 (6) (1) (4) and (3) 7. (1) (2) (5) 7) 6 (6) (1) (2) (5) (3) (6) (1) (4) and (3) 8. (1) (2) (5) 7) 6 (6) (1) (2) (5) (3) (4) (7) (1) (4) and (3) 9. (1) (2) (5) 7) 6 (8) (1) (2) (5) (3) (4) (7) (1) (4) and (3) 10. (1) (2) (5) 7) 6 (8) (1) (2) (5) all landowners pet-name ruby (8) (1) (2) (5) (3) (4) (7) (1) (4) and (3)

So the shortest path is ① ② ⑤ ⑦ ⑨ with length 8

Weighted adjacency matrix: A weighted adjacency matrix is a matrix that represents the adjacency relationship between vertices. For example, the weighted adjacency matrix of the graph above is shown as follows

Note: there is an error in the original PPT, point 9 -> point 7 should be INF.

Each point is zero away from itself (the main diagonal is zero)

If two adjacent points on the graph are connected, the distance is the value of the matrix, undirected symmetric about the main diagonal; The only directed paths have numbers

The unconnectable distance is infinity, inf

The Demo code

Runtime environment: Vs Code
# There are some mistakes in the code of PPT and it has been modified ~
# The following code has been tested to work
from collections import defaultdict 
from heapq import * 
inf = 99999 # disconnected value
mtx_graph = [[0.1, inf, 3, inf, inf, inf, inf, inf],
             [1.0.5, inf, 2, inf, inf, inf, inf],
             [inf, inf, 0.1, inf, 6, inf, inf, inf],
             [inf, inf, inf, 0, inf, 7, inf, 9, inf],
             [inf, 2.3, inf, 0.4.2, inf, 8],
             [inf, inf, 6.7, inf, 0, inf, 2, inf],
             [inf, inf, inf, inf, inf, 1.0, inf, 3],
             [inf, inf, inf, inf, inf, inf, 1.0.2], 
             [inf, inf, inf, inf, 8, inf, inf, 2.0]]
m_n = len(mtx_graph)# Order of weighted connection matrix
edges = [] # save the distance between two connected points (point A, point B, distance)
for i in range(m_n): 
    for j in range(m_n): 
        ifi! =jandmtx_graph[i][j]! =inf: edges.append((i,j,mtx_graph[i][j]))def dijkstra(edges, from_node, to_node) : 
    go_path = [] 
    to_node = to_node-1 
    g = defaultdict(list) 
    for l,r,c in edges:
        # L: point A, r: point B, c: distance
        Point A -> (distance, point B)
        g[l].append((c,r)) 
    q, seen = [(0, from_node-1, ()),set(a)while q: 
        (cost, v1, path) = heappop(q)Minimum cost of heap eject current path
        if v1 not in seen: 
            seen.add(v1) 
            path = (v1, path) 
            if v1 == to_node: 
                break 
            for c, v2 in g.get(v1, ()): 
                if v2 not in seen: 
                    heappush(q, (cost+c, v2, path)) 
                
    ifv1! =to_node:# unreachable
        return float('inf'), []
    if len(path)>0: 
        left=path[0] 
        go_path.append(left) 
        right=path[1] 
        while len(right)>0: 
            left=right[0] 
            go_path.append(left) 
            right=right[1] 
            
        go_path.reverse() # Reverse transformation
        for i in range(len(go_path)): # label plus 1
            go_path[i]=go_path[i]+1 
    return cost, go_path 
leght, path = dijkstra(edges, 1.9) 
print('The shortest distance is:'+str(leght)) 
print('The path forward is:'+str(path))
Copy the code

The results

The shortest distance is:8The forward path is: [1.2.5.7.9]
Copy the code

Graph theory model -Floyd

Floyd algorithm can solve the shortest path problem (multi-source shortest path) between any two points through dynamic programming, and can correctly deal with the shortest path problem with negative weight

Key principles:

d[i][j]=min(d[i][k]+d[k][j])(1<=k<=n)d[i][j] = min(d[i][k] + d[k][j]) (1 <= k <= n)d[i][j]=min(d[i][k]+d[k][j])(1<=k<=n) Enumeration intermediate point k, find the smallest d + d [I] [k] [k] [I] [k] [j] d + d [k] [I] [k] [j] d + d [k] [j], [I] [j] as a d d [I] [j] d minimum [I] [j]

Key conclusions:

Let’s say that in the shortest path between I and j, the node with the largest number is x. Then d[I][j]d[I][j] [j]d[I][j] must get the minimum value when k = x in the outer loop.

Strong induction:

Let the largest intermediate number from I to x be x1, and the largest intermediate number from x to j be x2. Since x is the largest intermediate number from I to j, it is obvious that x1

According to the conclusion that when k = x1k = x_1k = x1 d [I] [I] [x] [x] d d [I] [x] has made the minimum, when k = x2k = x_2k = x2 d [x] [x] [j] [j] d d [x] [j] has made the most of little value

So k = xk = xk = x, d [I] [I] [x] [x] d d [I] [x], and [x] [j] d d d [x] [j] [x] [j] have to obtain the minimum value.

So when k=xk= xk=x, Execute d [I] [j] = min (d [I] [j], [I] [k] d + d [k] [j]) have d [I] [I] [j] [j] d = min (d [I] [j], [I] [k] d + d [k] [j]) To d [I] [j] [I] [j] d = min (d [I] [j], [I] [k] d + d [k] [j]) have d [I] [j].

The sample 2

Similar to example 1, except this time we find the shortest distance and path from each point to the other, and the red number on each side represents the length of that side.

The Demo code

from collections import defaultdict 
from heapq import * 
import numpy as np
inf = 99999 # disconnected value
mtx_graph = [[0.1, inf, 3, inf, inf, inf, inf, inf],
             [1.0.5, inf, 2, inf, inf, inf, inf],
             [inf, inf, 0.1, inf, 6, inf, inf, inf],
             [inf, inf, inf, 0, inf, 7, inf, 9, inf],
             [inf, 2.3, inf, 0.4.2, inf, 8],
             [inf, inf, 6.7, inf, 0, inf, 2, inf],
             [inf, inf, inf, inf, inf, 1.0, inf, 3],
             [inf, inf, inf, inf, inf, inf, 1.0.2], 
             [inf, inf, inf, inf, 8, inf, inf, 2.0]]
def Floyd(graph) : 
    N=len(graph) 
    A=np.array(graph)
    path=np.zeros((N,N)) 
    for i in range(0,N): 
        for j in range(0,N): 
            ifA[i][j]! =inf: path[i][j]=jfor k in range(0,N): 
        for i in range(0,N): 
            for j in range(0,N):
                A[I][j]+A[k][j]
                A[I][k]+A[k][j]
                if A[i][k]+A[k][j]<A[i][j]: 
                    A[i][j]=A[i][k]+A[k][j] 
                    path[i][j]=path[i][k]
    for i in range(0,N): 
        for j in range(0,N): 
            path[i][j]=path[i][j]+1 
    print('Distance =') 
    print(A) 
    print('path =') 
    print(path) 

Floyd(mtx_graph)
Copy the code

The results

5. 8. 8. 8. 8. 8. 9. Because the original mtx_graph setup was wrong: point 9 to point 7 should be INF instead of 3

Conclusion:

  • The distance from point 1 to point 9 is 8 (the first row, ninth column in the distance matrix in the result)
  • Path: [1-2-5-7-9]

How do we look at paths?

  • View the path matrix from the final result
  • Point 1 to point 9 first of all, row 1, column 9 has a value of 2
  • This is 2 in the middle, and then row 2, column 9, is 5
  • Now row 5, column 9, has a value of 7
  • And then row 7, column 9, has a value of 9
  • Get the final path [1-2-5-7-9]

3 airport route design

The data set comes from the aviation industry and has some basic information about the routes. There is a starting point and destination for a journey. There are also columns representing arrival and departure times for each leg of the journey. This data set is well suited for analysis as a graph. Imagine several cities (nodes) connected by air routes (edges).

If you are an airline, you can ask the following questions:

  • What is the shortest way to get from A to B? Think of it in terms of distance and time.
  • Is there a way to get from C to D?
  • Which airports have the busiest traffic?
  • Which airport is “in between” most other airports? So it can be turned into a local transit point.

0. Airlines. CSV data

1. Data import and observe variables

import numpy as np 
import pandas as pd 
data = pd.read_csv('.. /Profile/Airlines.csv') 
data.shape # Data size

# (100, 16) 100 rows, 16 columns
Copy the code

The results Look at the types of individual variables

data.dtypes The type of each variable
Copy the code

The results

2. Data cleaning

# convert sched_dep_time to 'STD' -- scheduled departure time
data['std'] = data.sched_dep_time.astype(str).str.replace('(\d{2}$)'.' ') + ':' + data.sched_dep_time.astype(str).str.extract('(\d{2}$)', expand=False) + '00'
 
# convert sched_arr_time to "STA" -- scheduled arrival time
data['sta'] = data.sched_arr_time.astype(str).str.replace('(\d{2}$)'.' ') + ':' + data.sched_arr_time.astype(str).str.extract('(\d{2}$)', expand=False) + '00'
 
# convert dep_time to 'ATD' - actual departure time
data['atd'] = data.dep_time.fillna(0).astype(np.int64).astype(str).str.replace('(\d{2}$)'.' ') + ':' + data.dep_time.fillna(0).astype(np.int64).astype(str).str.extract('(\d{2}$)', expand=False) + '00'
 
# Convert ARR_time to 'ATA' - actual time of arrival
data['ata'] = data.arr_time.fillna(0).astype(np.int64).astype(str).str.replace('(\d{2}$)'.' ') + ':' + data.arr_time.fillna(0).astype(np.int64).astype(str).str.extract('(\d{2}$)', expand=False) + '00'
Copy the code

3. Merge time information

data['date'] = pd.to_datetime(data[['year'.'month'.'day']]) 
data = data.drop(columns = ['year'.'month'.'day']) 
Copy the code

4. Create the diagram

import networkx as nx
 
FG = nx.from_pandas_edgelist(data, source='origin', target='dest', edge_attr=True.)# View all nodes
FG.nodes()
Copy the code

The results

#NodeView(('EWR', 'MEM', 'LGA', 'FLL', 'SEA', 'JFK', 'DEN', 'ORD', 'MIA', 'PBI', 'MCO', 'CMH', 'MSP', 'IAD', 'CLT', 'TPA', 'DCA', 'SJU', 'ATL', 'BHM', 'SRQ', 'MSY', 'DTW', 'LAX', 'JAX', 'RDU', 'MDW', 'DFW', 'IAH', 'SFO', 'STL', 'CVG', 'IND', 'RSW', 'BOS', 'CLE'))
Copy the code

Query all edges in the graph

# View all edges
FG.edges()
Copy the code

The results

#EdgeView([('EWR', 'MEM'), ('EWR', 'SEA'), ('EWR', 'MIA'), ('EWR', 'ORD'), ('EWR', 'MSP'), ('EWR', 'TPA'), ('EWR', 'MSY'), ('EWR', 'DFW'), ('EWR', 'IAH'), ('EWR', 'SFO'), ('EWR', 'CVG'), ('EWR', 'IND'), ('EWR', 'RDU'), ('EWR', 'IAD'), ('EWR', 'RSW'), ('EWR', 'BOS'), ('EWR', 'PBI'), ('EWR', 'LAX'), ('EWR', 'MCO'), ('EWR', 'SJU'), ('LGA', 'FLL'), ('LGA', 'ORD'), ('LGA', 'PBI'), ('LGA', 'CMH'), ('LGA', 'IAD'), ('LGA', 'CLT'), ('LGA', 'MIA'), ('LGA', 'DCA'), ('LGA', 'BHM'), ('LGA', 'RDU'), ('LGA', 'ATL'), ('LGA', 'TPA'), ('LGA', 'MDW'), ('LGA', 'DEN'), ('LGA', 'MSP'), ('LGA', 'DTW'), ('LGA', 'STL'), ('LGA', 'MCO'), ('LGA', 'CVG'), ('LGA', 'IAH'), ('FLL', 'JFK'), ('SEA', 'JFK'), ('JFK', 'DEN'), ('JFK', 'MCO'), ('JFK', 'TPA'), ('JFK', 'SJU'), ('JFK', 'ATL'), ('JFK', 'SRQ'), ('JFK', 'DCA'), ('JFK', 'DTW'), ('JFK', 'LAX'), ('JFK', 'JAX'), ('JFK', 'CLT'), ('JFK', 'PBI'), ('JFK', 'CLE'), ('JFK', 'IAD'), ('JFK', 'BOS')])
Copy the code

Calculate the average edge density of the graph

# Note that our 100 rows are all from 3 airports
nx.draw_networkx(FG, with_labels=True) 
nx.algorithms.degree_centrality(FG) 

# Average edge density of graph
nx.density(FG)

# 0.09047619047619047
Copy the code

Run results (visualize graphs) Find the average shortest path length of all paths in the figure

The average shortest path length of all paths in the graph
nx.average_shortest_path_length(FG) 

# 2.36984126984127
Copy the code

The results

For a node of degree K – find the average neighbor degree

# For a node of degree K - what is the average of its neighbor degrees
nx.average_degree_connectivity(FG) # For a node of degree k - What is the average of its neighbours' degree?

#{20: 1.95, 1: 19.307692307692307, 2: 19.0625, 17: 2.0588235294117645, 3: 19.0}
Copy the code

The results

5. Calculate the route

# Find all routes from JAX to DFW
for path in nx.all_simple_paths(FG, source='JAX', target='DFW') :print(path)

Find the Dijkstra path from JAX to DFW.
dijpath = nx.dijkstra_path(FG, source='JAX', target='DFW')
dijpath # ['JAX', 'JFK', 'SEA', 'EWR', 'DFW'] Last line in the figure below
Copy the code

The results

Find the shortest flight time of all routes

Find the shortest flight time of all routes
shortpath = nx.dijkstra_path(FG, source='JAX', target='DFW', weight='air_time')
shortpath
# ['JAX', 'JFK', 'BOS', 'EWR', 'DFW']
Copy the code

Running results:

Matters needing attention

  • The start point and destination can be nodes, and other information should be node or edge attributes; A single edge can be thought of as a journey. Such trips will have different times, flight numbers, tail numbers and other relevant information.
  • Note that year, month, day, and time information is scattered in many columns; To create a date and time column containing all of this information, separate the scheduled and actual arrival and departure times; You should end up with four date-time columns (eta, ETA, ETA, and ETA)
  • Time format problem
  • Data type problem
  • Trouble with NaN values

conclusion

Source of learning: STATION B and its classroom PPT, the code is reproduced

PPT mistakes a little bit more with a long time to learn the principle of finding mistakes ah ~

The above code has been successfully run! Environment: Vs Code

The essay is just a study note, recording a process from 0 to 1

Hope to help you, if there is a mistake welcome small partners to correct ~