Public account: You and the cabin. Author: Peter. Editor: Peter

Hi, I’m Peter

What a Sankey diagram can tell you about working people

This article introduces a relatively uncommon visualization drawn with Plotly: the Sankey diagram, which is a great tool for showing how data flows.

Although Sankey diagrams are not used as often as bar charts and pie charts, I still like them very much.

My first exposure to Sankey diagrams was through Pyecharts (to be covered in a later article). This article shows how to draw them with Plotly.

1. A brief introduction to Sankey diagrams

1.1 What is a Sankey diagram

A Sankey diagram, also called a Sankey energy distribution diagram or Sankey energy balance diagram, is a specific type of flow diagram that describes how quantities flow from one set of values to another. It is named after Matthew Henry Phineas Riall Sankey, an Irish-born engineer and captain in the Royal Engineers.

In 1898, he used this kind of chart to show the energy efficiency of a steam engine. It appeared in an article on the energy efficiency of steam engines in the Proceedings of the Institution of Civil Engineers, and this first energy flow diagram was later named the Sankey diagram after him.

Charles Minard's map of Napoleon's Russian campaign of 1812, drawn in 1869, is a flow map that overlays a Sankey diagram on a geographic map. It shows the strength of Napoleon's army as it advances and retreats.

1.2 Characteristics of Sankey diagram

The main characteristics of a Sankey diagram:

  1. The total flow at the start and at the end is the same: the width of a main branch equals the sum of the widths of the branches that split off from it, which preserves conservation of energy
  2. Within the diagram, different lines represent different flow distributions, and the width of a node represents the size of the flow in that state

A Sankey diagram consists of three elements: nodes, flows and edges
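The conservation property can be checked directly on link data. The following is a minimal sketch, assuming links are given as parallel `source`, `target` and `value` lists (the format used later in this article); the helper name `flow_balance` and the sample numbers are made up for illustration.

```python
from collections import defaultdict

def flow_balance(source, target, value):
    """Return {node: (inflow, outflow)} for every node that appears in a link."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for s, t, v in zip(source, target, value):
        outflow[s] += v
        inflow[t] += v
    nodes = sorted(set(source) | set(target))
    return {n: (inflow[n], outflow[n]) for n in nodes}

# Hypothetical flow: node 0 feeds nodes 1 and 2, which both feed node 3.
source = [0, 0, 1, 2]
target = [1, 2, 3, 3]
value = [6, 4, 6, 4]

balance = flow_balance(source, target, value)

# Intermediate nodes have both incoming and outgoing links; their totals should match.
unbalanced = [n for n, (i, o) in balance.items() if i > 0 and o > 0 and i != o]
total_in = sum(o for i, o in balance.values() if i == 0)   # flow leaving pure sources
total_out = sum(i for i, o in balance.values() if o == 0)  # flow arriving at pure sinks

print(balance)
print("unbalanced intermediate nodes:", unbalanced)
print("total in:", total_in, "total out:", total_out)
```

An empty `unbalanced` list together with equal totals means the data satisfies the first characteristic above. Real datasets often include losses, in which case a dedicated "loss" node is one way to keep the totals equal.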

Sankey diagrams are often used for visual data analysis in energy, material composition, finance and other fields. At the end of this article, a real-life example illustrates their use.

Another example of a Sankey diagram is the economic situation of a country or region.

2. Basic Sankey diagram

The following example draws a basic Sankey diagram with plotly.graph_objects:

import pandas as pd
import numpy as np

import plotly.express as px
import plotly.graph_objects as go

# Construct the data

label = ["Node 0", "Node 1", "Node 2", "Node 3", "Node 4", "Node 5"]
# source and target hold index positions into label; Python lists are 0-indexed
source = [0, 0, 0, 1, 1, 0]  # parent nodes
target = [2, 3, 5, 4, 5, 4]  # child nodes
value = [9, 3, 6, 2, 7, 8]   # flow carried by each source -> target link

# Build the dictionaries used for drawing
link = dict(source=source, target=target, value=value)
node = dict(label=label, pad=200, thickness=20)  # node labels, spacing and thickness

# Create the Sankey trace
data = go.Sankey(link=link, node=node)

# Draw and display
fig = go.Figure(data)
fig.show()

To explain the drawing code above, the following data has to be prepared:

  • label: the name of each node
  • source: the parent node of each link, given in Plotly as the node's index in label (Python indexes start at 0)
  • target: the child node that each link flows into, also given as an index in label
  • value: the size of the flow from the parent node to the child node
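Writing index lists by hand is error-prone once there are more than a few nodes. The following is a minimal sketch of one way to derive them from named links; the helper `build_sankey_lists` is hypothetical (not part of Plotly), and it assumes node names are unique and that sorting them alphabetically gives an acceptable order.

```python
def build_sankey_lists(edges):
    """Convert (source_name, target_name, value) tuples into label/source/target/value lists."""
    # Collect every distinct node name and give each one an index.
    label = sorted({name for src, dst, _ in edges for name in (src, dst)})
    index = {name: i for i, name in enumerate(label)}
    source = [index[src] for src, _, _ in edges]
    target = [index[dst] for _, dst, _ in edges]
    value = [val for _, _, val in edges]
    return label, source, target, value

# The same links as in the basic example, written with node names instead of indexes.
edges = [
    ("Node 0", "Node 2", 9),
    ("Node 0", "Node 3", 3),
    ("Node 0", "Node 5", 6),
    ("Node 1", "Node 4", 2),
    ("Node 1", "Node 5", 7),
    ("Node 0", "Node 4", 8),
]

label, source, target, value = build_sankey_lists(edges)
print(label)
print(source)
print(target)
print(value)
```

The resulting lists can be passed to `go.Sankey` as `node=dict(label=label)` and `link=dict(source=source, target=target, value=value)`.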

Another way to write it is:

fig = go.Figure(data=[go.Sankey(
    node = dict(
      pad = 200,
      thickness = 20,
      line = dict(color = "black", width = 0.1),
      label = ["Node 0", "Node 1", "Node 2", "Node 3", "Node 4", "Node 5"],
      color = "blue"
    ),
    link = dict(
      source = [0, 0, 0, 1, 1, 0],  # indexes into label
      target = [2, 3, 5, 4, 5, 4],
      value = [9, 3, 6, 2, 7, 8]
  ))])

fig.update_layout(title_text="Sankey diagram drawn with Plotly", font_size=10)
fig.show()

3. Sankey diagram based on JSON data

The Plotly documentation provides an example that draws a Sankey diagram from a JSON file downloaded from a given URL:

1. Read the JSON file and convert it to a Python dictionary

import json            # standard-library JSON parser
import urllib.request  # urlopen lives in the urllib.request submodule

url = 'https://raw.githubusercontent.com/plotly/plotly.js/master/test/image/mocks/sankey_energy.json'
response = urllib.request.urlopen(url)  # fetch the JSON file
data = json.loads(response.read())      # parse the JSON into a Python dictionary

How can the dictionary be exported to a JSON file with readable formatting?

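with open("sankey.json", "w", encoding="utf-8") as f:  # "w" overwrites; appending on a rerun would leave invalid JSON
    json.dump(data,                # data to write
              f,                   # file object
              indent=2,            # indent nested levels by 2 spaces, one item per line
              sort_keys=True,      # write keys in sorted order
              ensure_ascii=False)  # write non-ASCII characters (e.g. Chinese) as-is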
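To see what the formatting options do without downloading anything, here is a small sketch using `json.dumps`, which accepts the same `indent`, `sort_keys` and `ensure_ascii` options and returns a string. The `sample` dictionary is a made-up miniature that only imitates the `data['data'][0]['node']` / `['link']` nesting used in this section; it is not taken from the real file.

```python
import json

# Hypothetical miniature of the nesting used in this section.
sample = {
    "data": [{
        "type": "sankey",
        "node": {"label": ["煤炭", "Electricity"]},
        "link": {"source": [0], "target": [1], "value": [42]},
    }]
}

compact = json.dumps(sample)  # defaults: one line, non-ASCII escaped as \uXXXX
pretty = json.dumps(sample, indent=2, sort_keys=True, ensure_ascii=False)

print(compact)
print(pretty)

# The options only change the text layout; parsing it back gives the same data.
restored = json.loads(pretty)
print(restored == sample)
```

With `ensure_ascii=False` the Chinese label is written as-is, which is why the file above is opened with `encoding="utf-8"`.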

The general layout of the beautified file (partial screenshot):

2. Draw the Sankey diagram from the dictionary data

opacity = 0.6   # transparency setting (not used below)

fig = go.Figure(data=[go.Sankey(
    valueformat = ".0f",
    valuesuffix = "TWh",
    # Node definition
    node = dict(
      pad = 15,        # spacing between nodes
      thickness = 15,  # node width
      line = dict(color = "black", width = 0.5),
      label = data['data'][0]['node']['label'],  # labels and colors come from the JSON data
      color = data['data'][0]['node']['color']),
    # Link definition: parent nodes, child nodes, flow values, link labels and colors
    link = dict(
      source = data['data'][0]['link']['source'],
      target = data['data'][0]['link']['target'],
      value = data['data'][0]['link']['value'],
      label = data['data'][0]['link']['label'],
      color = data['data'][0]['link']['color']))])

# Note: HTML tags can be used in the title
fig.update_layout(title_text="Sankey diagram drawn by Plotly from a JSON file, via <a href='https://bost.ocks.org/mike/sankey/'>Mike Bostock</a>",
                  font_size=10)
fig.show()

You can also set the background color of the graph:

import plotly.graph_objects as go
import json
import urllib.request

# Read the data online and convert it to a dictionary
url = 'https://raw.githubusercontent.com/plotly/plotly.js/master/test/image/mocks/sankey_energy.json'
response = urllib.request.urlopen(url)
data = json.loads(response.read())

# Set the chart parameters
fig = go.Figure(data=[go.Sankey(
    valueformat = ".0f",
    valuesuffix = "TWh",
    node = dict(
      pad = 15,
      thickness = 15,
      line = dict(color = "black", width = 0.5),
      label = data['data'][0]['node']['label'],
      color = data['data'][0]['node']['color']
    ),
    link = dict(
      source = data['data'][0]['link']['source'],
      target = data['data'][0]['link']['target'],
      value = data['data'][0]['link']['value'],
      label = data['data'][0]['link']['label']
  ))])

# Set the background colors
fig.update_layout(
    hovermode = 'x',
    title="Sankey diagram: changing the background color",
    font=dict(size = 10, color = 'white'),
    plot_bgcolor='green',
    paper_bgcolor='black'   # background of the whole image (the black part)
)

fig.show()

4. Customized Sankey diagrams

4.1 Sankey diagram with user-defined node positions

In the Sankey diagram drawn here, the position of each node is set explicitly through its x and y values:

import plotly.graph_objects as go

fig = go.Figure(go.Sankey(
    arrangement = "snap",
    node = {
        "label": ["Node 0", "Node 1", "Node 2", "Node 3", "Node 4", "Node 5"],  # node names
        "x": [0.2, 0.1, 0.5, 0.7, 0.3, 0.5],  # x and y together set each node's position
        "y": [0.6, 0.5, 0.2, 0.4, 0.2, 0.5],
        "pad": 1},  # spacing between nodes
    link = {
        "source": [0, 0, 1, 2, 3, 4, 3, 5],  # parent nodes
        "target": [5, 3, 4, 3, 0, 2, 2, 3],  # child nodes
        "value": [8, 12, 12, 11, 11, 10, 11, 12]}))  # flow values

fig.show()

Judging from the resulting chart, the origin of the canvas coordinates appears to be at the top-left corner, with x increasing to the right and y increasing downward.
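Typing x and y values by hand gets tedious as the number of nodes grows. The following is a rough sketch of one way to generate them, assuming each node has been assigned to a column (0 is the leftmost) and that positions are normalized to the range 0 to 1 as in the example above. The helper `column_layout` and its `margin` parameter are hypothetical and not part of Plotly.

```python
def column_layout(columns, margin=0.05):
    """Return (x, y) lists in [0, 1]; columns[i] is the column index of node i.

    Nodes that share a column are spread evenly from top to bottom in index order.
    """
    n_cols = max(columns) + 1
    span = 1 - 2 * margin

    # Group node indexes by column.
    per_col = {}
    for node, col in enumerate(columns):
        per_col.setdefault(col, []).append(node)

    x = [0.0] * len(columns)
    y = [0.0] * len(columns)
    for col, nodes in per_col.items():
        for rank, node in enumerate(nodes):
            # Spread columns across the width; a single column sits in the middle.
            x[node] = margin + span * (col / (n_cols - 1) if n_cols > 1 else 0.5)
            # Centre each node in its own horizontal band, top band first.
            y[node] = margin + span * ((rank + 0.5) / len(nodes))
    return [round(v, 3) for v in x], [round(v, 3) for v in y]

# Six nodes arranged in three columns of two nodes each (made-up arrangement).
columns = [0, 0, 1, 1, 2, 2]
x, y = column_layout(columns)
print(x)
print(y)
```

The two lists can then be supplied as the `"x"` and `"y"` entries of the `node` dictionary. Because y grows downward, the first node in each column ends up above the second.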

4.2 Customize node and edge colors

The color_node and color_link lists below are passed as the color of the nodes and of the links to customize the node and edge colors of a Sankey diagram:

import plotly.graph_objects as go

# Construct the node data

label = ["Node 0", "Node 1", "Node 2", "Node 3", "Node 4", "Node 5"]
source = [0, 0, 0, 1, 1, 0]
target = [2, 3, 5, 4, 5, 4]
value = [9, 3, 6, 2, 7, 8]


# Custom colors: one per node and one per link
color_node = ['#E8C9B0', '#48C9B0', '#A8C9B0', '#AF7AC5', '#AF7AC5', '#AF7AC5']

color_link = ['#A6E3D7', '#D6E3D7', '#A6E3D7', '#CBB4D5', '#CBB4D5', '#CBB4D5']

# Build the dictionaries used for drawing
link = dict(source=source, target=target, value=value, color=color_link)
node = dict(label=label, pad=200, thickness=20, color=color_node)  # node labels, spacing, thickness and colors

# Create the Sankey trace
data = go.Sankey(link=link, node=node)

# Draw and display
fig = go.Figure(data)
fig.show()
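Instead of choosing link colors by hand, one option is to derive each link's color from its source node and make it semi-transparent. The sketch below assumes node colors are given as `#RRGGBB` strings and relies on Plotly accepting `rgba(...)` color strings; the helper `hex_to_rgba` is made up for illustration.

```python
def hex_to_rgba(hex_color, opacity):
    """Convert a '#RRGGBB' string into an 'rgba(r, g, b, a)' string."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return f"rgba({r}, {g}, {b}, {opacity})"

color_node = ['#E8C9B0', '#48C9B0', '#A8C9B0', '#AF7AC5', '#AF7AC5', '#AF7AC5']
source = [0, 0, 0, 1, 1, 0]
opacity = 0.6

# Each link takes the color of the node it starts from, at reduced opacity.
color_link = [hex_to_rgba(color_node[s], opacity) for s in source]
print(color_link)
```

The resulting `color_link` list has one entry per link and can be passed as `color=color_link` in the `link` dictionary, which keeps links visually tied to their parent node while letting overlaps show through.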

5. Sankey diagram example: monthly expenses

The following uses Xiao Ming's total spending for one month to show how to draw a Sankey diagram from real-world data.

1. First, look at the spending data (made-up data) that Xiao Ming compiled.

Xiao Ming's spending falls into five main categories: accommodation, catering, transportation, clothing and red envelopes. Each category is broken down into its own sub-items and their amounts.

2. Collate the data to show the spending from the parent level to the child level

Because drawing a Sankey diagram requires the links between parent and child nodes, the data first has to be summarized as follows:

The figure below shows the collated data for the five main categories:

The figure below shows the collated parent and child data for each sub-item:
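The collation step can be sketched in code. The records below are invented placeholders, not Xiao Ming's actual figures, and the root node name "Total expenses" is likewise an assumption; the sketch also assumes that category and item names never collide.

```python
from collections import defaultdict

# Hypothetical records (category, item, amount) used only for illustration.
records = [
    ("Accommodation", "Rent", 2000),
    ("Accommodation", "Utilities", 300),
    ("Catering", "Breakfast", 200),
    ("Catering", "Lunch", 600),
    ("Transportation", "Subway", 150),
]

root = "Total expenses"

# Sum the amounts per category for the first level of the diagram.
category_total = defaultdict(int)
for category, _, amount in records:
    category_total[category] += amount

# Level 1: root -> category. Level 2: category -> item.
edges = [(root, cat, total) for cat, total in category_total.items()]
edges += [(cat, item, amount) for cat, item, amount in records]

# Assign node indexes in order of first appearance.
label = []
for src, dst, _ in edges:
    for name in (src, dst):
        if name not in label:
            label.append(name)

source = [label.index(src) for src, _, _ in edges]
target = [label.index(dst) for _, dst, _ in edges]
value = [val for _, _, val in edges]

print(label)
print(source, target, value, sep="\n")
```

These four lists have the same shape as in the basic example, so they can be handed to `go.Sankey(node=dict(label=label), link=dict(source=source, target=target, value=value))`. Because each category total is computed from its items, inflow equals outflow at every category node.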

Details: What a Sankey diagram can tell you about working people